xeps/xep-0029.xml

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE xep SYSTEM 'xep.dtd' [
  <!ENTITY % ents SYSTEM "xep.ent">
%ents;
]>
<?xml-stylesheet type='text/xsl' href='xep.xsl'?>
<xep>
    <header>
        <title>Definition of Jabber Identifiers (JIDs)</title>
        <abstract>This document defines the exact nature of a Jabber Identifier (JID). <em>Note well: this document is superseded by the XMPP Core memo defined by the IETF's XMPP Working Group.</em></abstract>
        &LEGALNOTICE;
        <number>0029</number>
        <status>Retracted</status>
        <type>Standards Track</type>
        <sig>Standards</sig>
        <dependencies>None</dependencies>
        <supersedes>None</supersedes>
        <supersededby>XMPP Core</supersededby>
        <shortname>N/A</shortname>
        <author>
            <firstname>Craig</firstname>
            <surname>Kaes</surname>
            <email>craigk@jabber.com</email>
            <jid>craigk@jabber.com</jid>
        </author>
        <revision>
            <version>1.1</version>
            <date>2003-10-03</date>
            <initials>psa</initials>
            <remark>Changed status to Retracted. This document is superseded by the XMPP Core memo defined by the IETF's XMPP Working Group.</remark>
        </revision>
        <revision>
            <version>1.0</version>
            <date>2002-05-15</date>
            <initials>psa</initials>
            <remark>Changed status to Draft.</remark>
        </revision>
        <revision>
            <version>0.2</version>
            <date>2002-05-01</date>
            <initials>cak</initials>
            <remark>
                Added info on restricting resource identifier length; standardized nomenclature.
            </remark>
        </revision>
        <revision>
            <version>0.1</version>
            <date>2002-04-24</date>
            <initials>cak</initials>
            <remark>Initial version</remark>
        </revision>
    </header>
    <section1 topic='Introduction'>
        <p>Jabber Identifiers (JIDs) uniquely identify individual entities in the Jabber network. To date, their syntax has been defined by convention, existing implementations, and available documentation. As it exists, certain characters that are allowed in JIDs cause ambiguity, and the lack of a size limit on resources defies database schemas and causes some trivial JID operations to require dynamic memory allocation. This document seeks to both define and improve the existing JID syntax. This document will not explain the general usage or nature of JIDs, instead focusing on syntax.</p>
    </section1>
    <section1 topic='JIDs'>
        <p>JIDs consist of three main parts:</p>
          <ol>
            <li>The node identifier (optional)</li>
            <li>The domain identifier (required)</li>
            <li>The resource identifier (optional)</li>
          </ol>
        <p>JIDs are encoded UTF-8. A grammar will be presented first, followed by specific clarifying and further restricting remarks.</p>
        <section2 topic='Grammar'>
            <p>
                <code>
&lt;JID&gt; ::= [&lt;node&gt;"@"]&lt;domain&gt;["/"&lt;resource&gt;]
&lt;node&gt; ::= &lt;conforming-char&gt;[&lt;conforming-char&gt;]*
&lt;domain&gt; ::= &lt;hname&gt;["."&lt;hname&gt;]*
&lt;resource&gt; ::= &lt;any-char&gt;[&lt;any-char&gt;]*
&lt;hname&gt; ::= &lt;let&gt;|&lt;dig&gt;[[&lt;let&gt;|&lt;dig&gt;|"-"]*&lt;let&gt;|&lt;dig&gt;]
&lt;let&gt; ::= [a-z] | [A-Z]
&lt;dig&gt; ::= [0-9]
&lt;conforming-char&gt; ::= #x21 | [#x23-#x25] | [#x28-#x2E] |
                      [#x30-#x39] | #x3B | #x3D | #x3F |
                      [#x41-#x7E] | [#x80-#xD7FF] |
                      [#xE000-#xFFFD] | [#x10000-#x10FFFF]
&lt;any-char&gt; ::= [#x20-#xD7FF] | [#xE000-#xFFFD] |
               [#x10000-#x10FFFF]
                </code>
            </p>
        </section2>
        <section2 topic='Domain Identifier'>
            <p>A domain identifier is a standard DNS hostname as specified in RFC952 <note><link url='http://www.ietf.org/rfc/rfc952.txt'>http://www.ietf.org/rfc/rfc952.txt</link></note> and RFC1123. <note><link url='http://www.ietf.org/rfc/rfc1123.txt'>http://www.ietf.org/rfc/rfc1123.txt</link></note> It is case-insensitive 7-bit ASCII and limited to 255 bytes.  It is the only required component of a JID.</p>
        </section2>
        <section2 topic='Node Identifier'>
            <p>Node identifiers are restricted to 256 bytes, They may contain any Unicode character higher than #x20 with the exception of the following:</p>
            <ol>
                <li>#x22 (")</li>
                <li>#x26 (&amp;)</li>
                <li>#x27 (')</li>
                <li>#x2F (/)</li>
                <li>#x3A (:)</li>
                <li>#x3C (&lt;)</li>
                <li>#x3E (&gt;)</li>
                <li>#x40 (@)</li>
                <li>#x7F (del)</li>
                <li>#xFFFE (BOM)</li>
                <li>#xFFFF (BOM)</li>
            </ol>
            <p>Case is preserved, but comparisons will be made in case-normalized canonical form.</p>
        </section2>
        <section2 topic='Resource Identifier'>
            <p>Resources identifiers are case-sensitive and are limited to 256 bytes. They may include any Unicode character greater than #x20, except #xFFFE and #xFFFF.</p>
        </section2>
        <section2 topic='Limited Resources'>
            <p>To date, resource identifiers have not had a fixed limit on their length.  This document seeks to limit it to 256 bytes for the following reasons:</p>
            <ol>
                <li>In order to perform JID manipulations safely, one cannot use stack space if there is no limit.  This forces temporary calculations onto the heap which is unnecessarily costly.</li>
                <li>As a fixed length character field, a resource identifier is more easily stored in, searched on, and retrieved from a database.  If an end user may store an encyclopedia's worth of information in their resource, then the only way it can be stored without truncating it is to store it as a large object (BLOB or CLOB).  Depending on the database used, that makes it either grossly inefficient or impossible to search on.</li>
                <li>There exist denial of service attacks stemming from an unlimited resource length.</li>
            </ol>
            <p>In a worst-case encoding, such as Han ideographs, 256 bytes will provide enough storage space for 64 character points.  This provides a lower bound on the number of characters a node may have in its resource.</p>
            <p>Specifying limits in terms of bytes instead of characters is somewhat arbitrary once a lower bound for characters is established.  This document proposes limits in terms of bytes mainly because doing so results in parsing efficiency; specifically, an implementation does not have to un-encode the UTF-8 string for the sole purpose of further restricting character sets that require fewer than four bytes per character point.  It is sufficient to have a lower bound on characters and an upper bound on bytes.</p>
        </section2>
    </section1>
</xep>