git-svn-id: file:///home/ksmith/gitmigration/svn/xmpp/trunk@957 4b5297f7-1745-476d-ba37-a9c6900126ab
Peter Saint-Andre 2007-06-13 20:36:57 +00:00
parent 0f162d8672
commit c057ae9f41
1 changed files with 90 additions and 79 deletions


@ -23,10 +23,10 @@
&hildjj;
&stpeter;
<revision>
<version>1.1pre2</version>
<date>in progress, last updated 2007-06-07</date>
<version>1.1pre3</version>
<date>in progress, last updated 2007-06-13</date>
<initials>psa</initials>
<remark><p>Specified that \20 must not be included at the beginning or end of a node identifier; added note about native JIDs with escaped characters; added mapping for IRC addresses; modified terminology to consistently use escaping and unescaping rather than encoding and decoding.</p></remark>
<remark><p>Specified that \20 must not be included at the beginning or end of a node identifier; added security consideration regarding potential confusion caused by mismatch between software that does and software that does not perform JID escaping; added note about existing native JIDs that contain escaped characters; added mapping for IRC addresses; modified terminology to consistently use the terms escaping and unescaping rather than the terms encoding and decoding.</p></remark>
</revision>
<revision>
<version>1.0</version>
@ -78,7 +78,7 @@
</revision>
</header>
<section1 topic='Introduction' anchor='intro'>
<p>&xmppcore; defines the Nodeprep profile of stringprep (&rfc3454;), which specifies that the following nine Unicode code points are disallowed in the node identifier portion of a Jabber Identifier (hereafter we refer to these as "the disallowed characters"):</p>
<p>&xmppcore; defines the Nodeprep profile of stringprep (&rfc3454;), which specifies that the following nine Unicode code points are disallowed in the node identifier portion of a Jabber Identifier (JID):</p>
<ul>
<li>U+0020 (" ") <note>In fact all ASCII and non-ASCII space characters are disallowed, since the Nodeprep profile of stringprep prohibits all the characters specified in Appendices C.1.1 and C.1.2 of <cite>RFC 3454</cite>; however, all of these characters reduce to U+0020, also called SP.</note></li>
<li>U+0022 (")</li>
@ -90,53 +90,36 @@
<li>U+003E (&gt;)</li>
<li>U+0040 (@)</li>
</ul>
<p>This restriction is an inconvenience for users who have one or more of the disallowed characters in their desired usernames, particularly in the case of the ' character, which is common in names like O'Hara and D'Artagnan. The restriction is a positive hardship if existing email addresses are mapped to JIDs, since some of the disallowed characters are allowed in the username portion of an email address (specifically, the characters &amp; ' / as described in Sections 3.2.4 and 3.2.5 of &rfc2822;).</p>
<p>If the &amp; character had not been in the list of disallowed characters, then normal XML escaping conventions (as specified in &w3xml;) could have been used, with the result that D'Artagnan (for example) could have been rendered as D&amp;apos;artagnan [sic]. Since there are good reasons for each of the disallowed characters, another escaping mechanism is needed.</p>
<p>It might have been desirable to use percent-encoding (e.g., %27 for the ' character) as specified in Section 2.1 of &rfc3986;. However, that approach was rejected since the % character is an often-used character in existing JIDs (e.g., to replace the @ character in gateway addresses) and the resulting ambiguity would have caused misdelivered or undeliverable messages. Therefore, a new mechanism is described herein to escape only the disallowed characters and only in the node identifier portion of JIDs.</p>
<p>This restriction is an inconvenience for users who have one or more of these "disallowed characters" in their desired usernames, particularly in the case of the ' character, which is common in names like O'Hara and D'Artagnan. The restriction is a positive hardship if existing email addresses are mapped to JIDs, since some of the disallowed characters are allowed in the username portion of an email address (specifically, the characters &amp; ' / as described in Sections 3.2.4 and 3.2.5 of &rfc2822;).</p>
<p>To overcome this restriction, we define a way to escape the disallowed characters in JIDs. An escaped JID contains none of the disallowed characters and therefore can be transported by native XMPP implementations without modification (e.g., existing XMPP servers do not require modification in order to handle escaped JIDs). The escaped JID is unescaped only for presentation to a human user (typically by an XMPP client) or for gatewaying to a non-XMPP system (such as an LDAP database or a messaging system that does not use XMPP).</p>
</section1>
<section1 topic='Requirements' anchor='reqs'>
<p>This document addresses the following requirements:</p>
<ol>
<li>The escaping mechanism shall apply to the node identifier portion of a JID only, and MUST NOT be applied to domain identifiers or resource identifiers.</li>
<li>Escaped JIDs MUST conform to the definition of a Jabber ID as specified in <cite>RFC 3920</cite>, including the Nodeprep profile of stringprep. In particular this means that even after passing through Nodeprep, the JID MUST be valid, with the result that Unicode look-alikes like U+02BC (Modifier Letter Apostrophe) MUST NOT be used.</li>
<li>It MUST NOT be possible for clients to use this escaping mechanism to avoid the goal of stringprep; namely, that JIDs that look alike should have the same character representation after being processed by stringprep. Therefore, this mechanism MUST NOT be applied to any characters other than the disallowed characters.</li>
<li>Existing JIDs that include portions of the escaping mechanism MUST continue to be valid.</li>
<li>The escaping mechanism MUST NOT break commonly deployed Jabber/XMPP software implementations such as servers, components, gateways, and clients.</li>
<li>The escaping mechanism SHOULD NOT place undue strain upon server implementations; implementations or deployments that do not need to unescape SHOULD be able to ignore the escaping mechanism.</li>
<li><p>The escaping mechanism shall apply to the node identifier portion of a JID only, and MUST NOT be applied to domain identifiers or resource identifiers.</p></li>
<li><p>Escaped JIDs MUST conform to the definition of a Jabber ID as specified in <cite>RFC 3920</cite>, including the Nodeprep profile of stringprep. In particular this means that even after passing through Nodeprep, the JID MUST be valid, with the result that Unicode look-alikes like U+02BC (Modifier Letter Apostrophe) MUST NOT be used.</p></li>
<li><p>It MUST NOT be possible for clients to use this escaping mechanism to avoid the goal of stringprep; namely, that JIDs that look alike should have the same character representation after being processed by stringprep. Therefore, this mechanism MUST NOT be applied to any characters other than the disallowed characters. <note>In certain circumstances the escaping character itself ("\") may also be escaped.</note></p></li>
<li><p>Existing JIDs that include portions of the escaping mechanism MUST continue to be valid.</p></li>
<li><p>The escaping mechanism MUST NOT break commonly deployed Jabber/XMPP software implementations such as servers, components, gateways, and clients.</p></li>
<li><p>The escaping mechanism SHOULD NOT place undue strain upon server implementations; implementations or deployments that do not need to unescape SHOULD be able to ignore the escaping mechanism.</p></li>
</ol>
</section1>
<section1 topic='Discovery' anchor='discovery'>
<p>If an entity needs to discover whether another entity supports JID escaping, it MUST send a disco#info request to the other entity as specified in &xep0030;.</p>
<example caption='Client requests features'><![CDATA[
<iq type='get'
from='porthos@musketeers.bourbon.gov/gate'
to='irc.shakespeare.lit'
id='info1'>
<query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>
]]></example>
<p>If the queried entity supports JID escaping, it MUST return a <strong>jid\20escaping</strong> [sic] feature in its reply.</p>
<example caption='Service responds with features'><![CDATA[
<iq type='result'
to='porthos@musketeers.bourbon.gov/gate'
from='irc.shakespeare.lit'
id='info1'>
<query xmlns='http://jabber.org/protocol/disco#info'>
...
<feature var='jid\20escaping'/>
</query>
</iq>
]]></example>
</section1>
<section1 topic='Transformations' anchor='transforms'>
<section2 topic='Concepts' anchor='concepts'>
<p>This document specifies escaping each disallowed character as \hexhex -- where "hexhex" is the hexadecimal value of the Unicode code point in question, ignoring the leading "00" in the code point (e.g., 27 for the ' character, resulting in an escaping of \27). (Note: This escaping method is quite similar to that used for disallowed characters in LDAP distinguished names, as specified in &rfc2253;.) Full escaping and unescaping transformations for all nine disallowed characters are provided in the following sections. In addition, escaping and unescaping transformations are shown for the \ character in case it also needs to be escaped when it occurs in a JID or non-XMPP address as part of a string that corresponds to one of the other escaped characters.</p>
<p>This document specifies that each disallowed character shall be escaped as \hexhex -- where "hexhex" is the hexadecimal value of the Unicode code point in question, ignoring the leading "00" in the code point (e.g., 27 for the ' character, resulting in an escaping of \27).</p>
<p>If the &amp; character had not been in the list of disallowed characters, then normal XML escaping conventions (as specified in &w3xml;) could have been used, with the result that D'Artagnan (for example) could have been rendered as D&amp;apos;artagnan [sic]. Since there are good reasons why &amp; is a disallowed character, another escaping mechanism is needed.</p>
<p>It might have been desirable to use percent-encoding (e.g., %27 for the ' character) as specified in Section 2.1 of &rfc3986;. However, that approach was rejected since the % character is an often-used character in existing JIDs (e.g., to replace the @ character in gateway addresses) and the resulting ambiguity would have caused misdelivered or undeliverable messages.</p>
<p>To avoid the problems associated with using &amp; or % as the escaping character, this document specifies a new escaping mechanism that uses the backslash character ("\") followed by "hexhex" (the hexadecimal value of the Unicode code point in question). This escaping method is quite similar to that used for disallowed characters in LDAP distinguished names (see &rfc2253;) but is used only for the characters that are disallowed in XMPP node identifiers (as well as the escaping character itself in certain special situations).</p>
<p>Here is an example of an escaped JID (this would be displayed but never natively transported as "d'artagnan@musketeers.lit"):</p>
<code>
d\27artagnan@musketeers.lit
</code>
<p>This document describes full escaping and unescaping transformations for all nine disallowed characters. In addition, escaping and unescaping transformations are shown for the \ character in case it also needs to be escaped when it occurs in a JID or non-XMPP address as part of a character sequence that corresponds to one of the escaped characters.</p>
<p>Note: All transformations are exactly as specified below. CASE IS SIGNIFICANT. Lowercase was selected since Nodeprep will case fold to lowercase for US-ASCII characters such as A, C, E, and F.</p>
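<p>The \hexhex rule can be illustrated with a minimal sketch that derives each escape sequence from its code point. The names DISALLOWED_CODE_POINTS, escape_sequence, and ESCAPES are hypothetical and chosen only for illustration; the sketch formats the code point as two lowercase hexadecimal digits, consistent with the note on case above.</p>

```python
# Illustrative sketch only; the names below are assumptions, not part of the spec.

# The nine code points that Nodeprep disallows in a node identifier.
DISALLOWED_CODE_POINTS = [0x20, 0x22, 0x26, 0x27, 0x2F, 0x3A, 0x3C, 0x3E, 0x40]


def escape_sequence(code_point: int) -> str:
    """Return a backslash followed by two lowercase hex digits."""
    return "\\%02x" % code_point


# Map each disallowed character to its escaped form.
ESCAPES = {chr(cp): escape_sequence(cp) for cp in DISALLOWED_CODE_POINTS}

# The escaping character itself uses the same form when it needs escaping.
BACKSLASH_ESCAPE = escape_sequence(0x5C)

if __name__ == "__main__":
    for char, escaped in ESCAPES.items():
        print(repr(char), escaped)
```

<p>Deriving the table from the code points, rather than typing it by hand, keeps the hexadecimal digits lowercase by construction.</p>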
</section2>
<section2 topic='Escaping Transformation' anchor='escaping'>
<section2 topic='Escaping Transformations' anchor='escaping'>
<p>The escaping transformations are defined in the following table. Typically, escaping is performed only by a client that is processing information provided by a human user in unescaped form, or by a gateway to some external system (e.g., email or LDAP) that needs to generate a JID.</p>
<table caption='Mapping from Unescaped to Escaped Characters'>
<tr><th>Unescaped Character</th><th>Escaped Character</th></tr>
@ -151,17 +134,18 @@
<tr><td>@</td><td>\40</td></tr>
<tr><td>\</td><td>\5c</td></tr>
</table>
<p>* Note: The string \20 MUST NOT be the first or last character of an escaped node identifier. <note>For a similar restriction, see Section 2.4 of <cite>RFC 2253</cite>.</note></p>
<example caption="JID Escaping: Porthos starts a chat, typing into his client the JID d'artagnan@musketeers.bourbon.gov:"><![CDATA[
<p>* Note: The character sequence \20 MUST NOT appear at the beginning or end of an escaped node identifier. <note>For a similar restriction, see Section 2.4 of <cite>RFC 2253</cite>.</note></p>
<p>In the following example, Porthos starts a chat with D'Artagnan, typing into his client the string "d'artagnan@musketeers.lit" (which is escaped by his client to "d\27artagnan@musketeers.lit").</p>
<example caption="JID Escaping"><![CDATA[
<message
from='porthos@musketeers.bourbon.gov/gate'
to='d\27artagnan@musketeers.bourbon.gov'
from='porthos@musketeers.lit/gate'
to='d\27artagnan@musketeers.lit'
type='chat'>
<body>And do you always forget your eyes when you run?</body>
</message>
]]></example>
</section2>
<section2 topic='Unescaping Transformation' anchor='unescaping'>
<section2 topic='Unescaping Transformations' anchor='unescaping'>
<p>The unescaping transformations are defined in the following table. Typically, unescaping is performed only by a client that wants to display JIDs containing escaped characters to a human user, or by a gateway to some external system (e.g., email or LDAP) that needs to generate identifiers for foreign systems.</p>
<table caption='Mapping from Escaped to Unescaped Characters'>
<tr><th>Escaped Character</th><th>Unescaped Character</th></tr>
@ -176,10 +160,11 @@
<tr><td>\40</td><td>@</td></tr>
<tr><td>\5c</td><td>\</td></tr>
</table>
<example caption="JID Escaping: D'Artagnan the elder sends SMTP mail through a gateway:"><![CDATA[
<p>In the following example, D'Artagnan the elder sends a message through an SMTP mail gateway (the JID is "tr&#233;ville\40musketeers.lit@smtp.gascon.fr" and the destination email address is "tr&#233;ville@musketeers.lit").</p>
<example caption="JID Unescaping"><![CDATA[
<message
from='d\27artagnan@gascon.fr/elder'
to=']]>tr&#xe9;ville\40musketeers.bourbon.gov@smtp.example.com<![CDATA['>
to=']]>tr&#xe9;ville\40musketeers.lit@smtp.gascon.fr<![CDATA['>
<body>I recommend my son to you.</body>
</message>
]]></example>
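<p>The unescaping table can be sketched as a single left-to-right pass over the node identifier. The function name unescape_node is hypothetical; the sketch assumes that only the ten lowercase sequences in the table are recognized and that every other backslash sequence is left untouched.</p>

```python
import re

# Illustrative sketch only; unescape_node is an assumed name.
# Only the ten lowercase sequences from the unescaping table are matched.
_ESCAPED = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def unescape_node(node: str) -> str:
    """Unescape a node identifier for display or for a foreign system."""
    # re.sub scans once from left to right, so the backslash produced by
    # unescaping \5c is never re-read as the start of another escape.
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), node)


if __name__ == "__main__":
    print(unescape_node("d\\27artagnan"))
    print(unescape_node("tr\u00e9ville\\40musketeers.lit"))
```

<p>A single pass matters for sequences such as \5c27, which unescapes to the four characters \27 rather than to the ' character.</p>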
@ -190,26 +175,26 @@
<section2 topic='Native Processing' anchor='bizrules-processing'>
<p>The following processing rules apply to native XMPP implementations:</p>
<ol>
<li>A client SHOULD render an escaped character as its unescaped equivalent when presenting it to a human user (e.g., present \27 as the ' character).</li>
<li>A server or gateway MAY unescape an escaped character for communication with external systems (e.g. LDAP), but only <em>after</em> the Nodeprep profile of stringprep has been applied.</li>
<li>The unescaping transformation MUST be NFKC-safe -- i.e., it MUST conform to Unicode normalization form KC (see Appendix B.3 of <cite>RFC 3454</cite>).</li>
<li>An entity MUST NOT include the unescaped version of a disallowed character over the wire in any XML stanzas sent to another entity.</li>
<li>An entity MUST NOT use the unescaped version of a disallowed character when comparing two JIDs.</li>
<li>The string \20 MUST NOT be the first or last character of an escaped node identifier.</li>
<li>If the string \5c is included in the source address, it too MUST be escaped (to \5c5c).</li>
<li><p>A compliant client MUST render an escaped character as its unescaped equivalent when presenting it to a human user (e.g., present \27 as the ' character), but MAY provide a way for the user to view the escaped JID in its wire format (e.g., to compare two JIDs).</p></li>
<li><p>A server or gateway MAY unescape an escaped character for communication with external systems (e.g., LDAP), but only <em>after</em> the Nodeprep profile of stringprep has been applied.</p></li>
<li><p>The unescaping transformation MUST be NFKC-safe -- i.e., it MUST conform to Unicode normalization form KC (see Appendix B.3 of <cite>RFC 3454</cite>).</p></li>
<li><p>An entity MUST NOT include the unescaped version of a disallowed character over the wire in any XML stanzas sent to another entity (since by definition the unescaped version of a disallowed character violates Nodeprep).</p></li>
<li><p>An entity MUST NOT use the unescaped version of a disallowed character when comparing two JIDs.</p></li>
<li><p>The character sequence \20 MUST NOT appear at the beginning or end of an escaped node identifier.</p></li>
<li><p>If the character sequence \5c is included in the source address, it too MUST be escaped (to \5c5c). <note>It is possible that some existing JIDs already contain character sequences matching "\5chexhex" (where "hexhex" is the hexadecimal value of the Unicode code point for a disallowed character or the backslash character), which may result in confusion between escaped JIDs and their presentation in a client; however, a survey of one large XMPP deployment yielded no instances of such sequences or even of the character sequence "\5c".</note></p></li>
</ol>
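<p>Two of the rules above lend themselves to a mechanical check on a node identifier before it is sent over the wire: it contains no unescaped disallowed character, and it neither begins nor ends with the escape sequence \20. The following is a minimal sketch under those assumptions; check_escaped_node is a hypothetical helper and does not replace Nodeprep processing.</p>

```python
import re
from typing import List

# Illustrative sketch only; check_escaped_node is an assumed name.
_DISALLOWED = {chr(cp) for cp in (0x20, 0x22, 0x26, 0x27, 0x2F, 0x3A, 0x3C, 0x3E, 0x40)}
_ESCAPED = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def check_escaped_node(node: str) -> List[str]:
    """Return a list of problems found in an escaped node identifier."""
    problems = []
    bad = sorted(c for c in set(node) if c in _DISALLOWED)
    if bad:
        found = " ".join("U+%04X" % ord(c) for c in bad)
        problems.append("unescaped disallowed character(s): " + found)
    # Tokenize left to right so that \5c20 (an escaped backslash followed
    # by the digits 20) is not mistaken for an escaped space.
    tokens = list(_ESCAPED.finditer(node))
    if tokens and tokens[0].start() == 0 and tokens[0].group(1) == "20":
        problems.append("node identifier begins with \\20")
    if tokens and tokens[-1].end() == len(node) and tokens[-1].group(1) == "20":
        problems.append("node identifier ends with \\20")
    return problems


if __name__ == "__main__":
    print(check_escaped_node("space\\20cadet"))
    print(check_escaped_node("\\20cadet"))
```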
</section2>
<section2 topic='Address Transformation Algorithm' anchor='bizrules-algorithm'>
<p>When transforming an unescaped address into an escaped address, an implementation MUST adhere to the following process:</p>
<p>When transforming a non-XMPP address into an escaped JID, an implementation MUST adhere to the following process:</p>
<ol>
<li>If the original address is a URI, it MUST first be properly decoded according to the rules in <cite>RFC 3986</cite> before it is transformed into a JID.</li>
<li>If the original address is a URI, the URI scheme component MUST be removed.</li>
<li>If there are any instances of strings that correspond to escapings of the disallowed characters (e.g., the string "\27") in the original address, the leading backslash character MUST be escaped to the string "\5c".</li>
<li>All disallowed characters in the original address MUST be properly escaped in the resulting JID (as described above).</li>
<li><p>If the original address is a URI, it MUST first be properly decoded according to the rules in <cite>RFC 3986</cite> before it is transformed into a JID.</p></li>
<li><p>If the original address is a URI, the URI scheme component MUST be removed.</p></li>
<li><p>If there are any instances of character sequences that correspond to escapings of the disallowed characters (e.g., the character sequence "\27") or the escaping character (i.e., the character sequence "\5c") in the original address, the leading backslash character MUST be escaped to the character sequence "\5c" (e.g., resulting in the character sequences "\5c27" or "\5c5c").</p></li>
<li><p>All disallowed characters in the original address MUST be properly escaped in the resulting JID (as described above).</p></li>
</ol>
<p>While the fourth step should be clear from the foregoing text and the second step is necessary since XMPP addresses are not URIs, the meaning of the first and third steps may not be obvious.</p>
<p>Regarding step one, many non-XMPP messaging systems use URIs to identify addresses (examples include the mailto:, sip:, sips:, im:, pres:, and wv: URI schemes) or follow some other encoding rules for an identifier (e.g., an LDAP distinguished name). Before transforming a non-XMPP address or identifier into a JID, the address or identifier MUST first be decoded according to the rules specified for that type of address or identifier in order to ensure that the proper characters are transformed.</p>
<p>Regarding step three, it is possible for some non-XMPP addresses to contain strings that correspond to JID-escaped characters (e.g., "\27"). Consider a Wireless Village address of &lt;wv:\3and\2is\5cool@example.com&gt; -- if that address were directly converted into a JID, the resulting XMPP address would be \3and\2is\5cool@example.com, which could be construed as :nd\2is\ool@example.com if JID escaping logic is applied. Therefore the leading \ character and the \ character before the string 5c MUST be converted to the string "\5c" during the transformation, leading to a JID of \5c3and\2is\5c5cool@example.com (which would be presented to a human user as \3and\2is\5cool@example.com).</p>
<p>Regarding step three, it is possible for some non-XMPP addresses to contain character sequences that correspond to JID-escaped characters (e.g., "\27"). Consider a Wireless Village address of &lt;wv:\3and\2is\5cool@example.com&gt; -- if that address were directly converted into a JID, the resulting XMPP address would be \3and\2is\5cool@example.com, which could be construed as :nd\2is\ool@example.com if JID escaping logic is applied. Therefore the leading \ character and the \ character before the character sequence 5c MUST be converted to the character sequence "\5c" during the transformation, leading to a JID of \5c3and\2is\5c5cool@example.com (which would be presented to a human user as \3and\2is\5cool@example.com).</p>
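<p>The four steps can be sketched as follows. This is an illustration under stated assumptions rather than a normative implementation: the names escape_node and uri_to_jid are hypothetical, the node identifier is taken to be everything before the last "@", anything after a "?" is treated as URI headers and dropped, and percent-decoding is delegated to Python's urllib.parse.unquote. In line with step three, a backslash is escaped only where it begins a character sequence that would otherwise be read as an escape.</p>

```python
import re
from urllib.parse import unquote

# Illustrative sketch only; escape_node and uri_to_jid are assumed names.
_HEX_PAIRS = "20|22|26|27|2f|3a|3c|3e|40|5c"
_DISALLOWED = {chr(cp) for cp in (0x20, 0x22, 0x26, 0x27, 0x2F, 0x3A, 0x3C, 0x3E, 0x40)}
# A backslash that is immediately followed by one of the ten hex pairs.
_AMBIGUOUS_BACKSLASH = re.compile(r"\\(?=(?:%s))" % _HEX_PAIRS)


def escape_node(node: str) -> str:
    """Apply steps three and four to an unescaped node identifier."""
    # Step three: protect backslashes that would look like escapes.
    node = _AMBIGUOUS_BACKSLASH.sub(lambda m: "\\5c", node)
    # Step four: escape each disallowed character as \hexhex (lowercase).
    return "".join("\\%02x" % ord(c) if c in _DISALLOWED else c for c in node)


def uri_to_jid(uri: str) -> str:
    """Apply all four steps to a URI of the form scheme:local@domain."""
    # Step two: remove the scheme. Scheme names contain no percent-encoded
    # characters, so removing the scheme before decoding gives the same result.
    _scheme, _, rest = uri.partition(":")
    rest = rest.split("?", 1)[0]  # drop mailto-style headers (assumption)
    rest = unquote(rest)          # step one: percent-decode
    node, _, domain = rest.rpartition("@")
    return escape_node(node) + "@" + domain


if __name__ == "__main__":
    print(uri_to_jid("wv:\\3and\\2is\\5cool@example.com"))
    print(uri_to_jid("mailto:here%27s_a_wild_%26_%2Fcr%zy%2F_address@example.com"))
```

<p>With the Wireless Village address above, the sketch yields \5c3and\2is\5c5cool@example.com; the sequence \2i is left alone because it does not correspond to any escaped character.</p>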
</section2>
<section2 topic='Exceptions' anchor='bizrules-exceptions'>
<p>In order to maintain as much backward compatibility as possible, partial escape sequences and escape sequences corresponding to characters not on the list of disallowed characters MUST be ignored.</p>
@ -218,7 +203,7 @@
<example caption='Invalid escape sequence 2'><strong>foob\41r</strong> is not modified (to <strong>foobAr</strong>) by escaping or unescaping transformations.</example>
</section2>
<section2 topic='JID Escaping vs. Older Methods' anchor='bizrules-othermethods'>
<p>When a client attempts to communicate with another entity through a gateway, it needs to know which escaping mechanism to use. A client MUST assume that the gateway does not support the JID escaping mechanism unless it explicitly discovers support for the <strong>jid\20escaping</strong> [sic] feature via Service Discovery as shown above. If there are any errors in the service discovery exchange or if support for JID escaping is not discovered, the client SHOULD proceed as follows:</p>
<p>When a client attempts to communicate with another entity through a gateway, it needs to know which escaping mechanism to use. A client MUST assume that the gateway does not support the JID escaping mechanism unless it explicitly discovers support for the <strong>jid\20escaping</strong> [sic] feature as described under <link url='#disco'>Determining Support</link>. If there are any errors in the service discovery exchange or if support for JID escaping is not discovered, the client SHOULD proceed as follows:</p>
<ol>
<li>If the gateway supports the 'jabber:iq:gateway' protocol (as specified in &xep0100;), use that protocol.</li>
<li>If the gateway does not support the 'jabber:iq:gateway' protocol, use customary escaping mechanisms (such as transformation of the @ character to the % character).</li>
@ -238,24 +223,24 @@
<section2 topic='Jabber Identifiers' anchor='examples-xmpp'>
<p>The following table shows user input, the escaped JID for sending over the wire, and client display (same as user input) for node identifiers that might be used in native JIDs. The examples are numbered for easy reference. Naturally, a client that does not perform JID escaping would display the JIDs in their escaped form (e.g., "space\20cadet" instead of "space cadet").</p>
<table caption='JID Examples'>
<tr><th>#</th><th>User Input</th><th>JID on the Wire</th><th>Client Display</th></tr>
<tr><td>1</td><td>space cadet</td><td>space\20cadet</td><td>space cadet</td></tr>
<tr><td>2</td><td>call me "ishmael"</td><td>call\20me\20\22ishmael\22</td><td>call me "ishmael"</td></tr>
<tr><td>3</td><td>at&amp;t guy</td><td>at\26t\20guy</td><td>at&amp;t guy</td></tr>
<tr><td>4</td><td>d'artagnan</td><td>d\27artagnan</td><td>d'artagnan</td></tr>
<tr><td>5</td><td>/.</td><td>\2f.</td><td>/.</td></tr>
<tr><td>6</td><td>::foo::</td><td>\3a\3afoo\3a\3a</td><td>::foo::</td></tr>
<tr><td>7</td><td>&lt;foo&gt;</td><td>\3cfoo\3e</td><td>&lt;foo&gt;</td></tr>
<tr><td>8</td><td>user@host</td><td>user\40host</td><td>user@host</td></tr>
<tr><td>9</td><td>c:\net</td><td>c\3a\5cnet</td><td>c:\net</td></tr>
<tr><td>10</td><td>c:\\net</td><td>c\3a\5c\5cnet</td><td>c:\\net</td></tr>
<tr><td>11</td><td>c:\cool stuff</td><td>c\3a\5ccool\20stuff</td><td>c:\cool stuff</td></tr>
<tr><td>12</td><td>c:\5commas</td><td>c\3a\5c5commas</td><td>c:\5commas</td></tr>
<tr><th>#</th><th>User Input</th><th>Escaped JID</th><th>Client Display</th></tr>
<tr><td>1</td><td>space cadet@example.com</td><td>space\20cadet@example.com</td><td>space cadet@example.com</td></tr>
<tr><td>2</td><td>call me "ishmael"@example.com</td><td>call\20me\20\22ishmael\22@example.com</td><td>call me "ishmael"@example.com</td></tr>
<tr><td>3</td><td>at&amp;t guy@example.com</td><td>at\26t\20guy@example.com</td><td>at&amp;t guy@example.com</td></tr>
<tr><td>4</td><td>d'artagnan@example.com</td><td>d\27artagnan@example.com</td><td>d'artagnan@example.com</td></tr>
<tr><td>5</td><td>/.fanboy@example.com</td><td>\2f.fanboy@example.com</td><td>/.fanboy@example.com</td></tr>
<tr><td>6</td><td>::foo::@example.com</td><td>\3a\3afoo\3a\3a@example.com</td><td>::foo::@example.com</td></tr>
<tr><td>7</td><td>&lt;foo&gt;@example.com</td><td>\3cfoo\3e@example.com</td><td>&lt;foo&gt;@example.com</td></tr>
<tr><td>8</td><td>user@host@example.com</td><td>user\40host@example.com</td><td>user@host@example.com</td></tr>
<tr><td>9</td><td>c:\net@example.com</td><td>c\3a\5cnet@example.com</td><td>c:\net@example.com</td></tr>
<tr><td>10</td><td>c:\\net@example.com</td><td>c\3a\5c\5cnet@example.com</td><td>c:\\net@example.com</td></tr>
<tr><td>11</td><td>c:\cool stuff@example.com</td><td>c\3a\5ccool\20stuff@example.com</td><td>c:\cool stuff@example.com</td></tr>
<tr><td>12</td><td>c:\5commas@example.com</td><td>c\3a\5c5commas@example.com</td><td>c:\5commas@example.com</td></tr>
</table>
</section2>
<section2 topic='Email Addresses' anchor='examples-email'>
<p>The address format for an Internet mailbox is specified in <cite>RFC 2822</cite>. The identifier of interest in this context is the "addr-spec" address and more particularly the "dot-atom" rule specified in Section 3.2.4, i.e., the email address shorn of angle brackets, display names, comments, quoted strings, and the like. Because some deployments of XMPP messaging systems may want to re-use existing email addresses as JIDs, it is helpful to define how to transform an email address into a JID.</p>
<p>In general, it is straightforward to transform an email address (i.e., a "dot-atom") into a JID, since traditional email addresses allow US-ASCII characters only rather than the nearly full range of Unicode code points allowed in a JID. <note>This specification does not cover recent efforts to define internationalized email addresses.</note> However, there are three characters allowed in the local-part of an email address that are not allowed in the node identifier portion of a JID: namely, the characters &amp; ' / as described in Sections 3.2.4 and 3.2.5 of <cite>RFC 2822</cite>. In order to transform these characters, a compliant implementation MUST use the methods specified herein.</p>
<p>The address format for an Internet mailbox is specified in <cite>RFC 2822</cite>. The identifier of interest in this context is the "addr-spec" address and more particularly the "dot-atom-text" rule specified in Section 3.2.4, i.e., the email address shorn of angle brackets, display names, comments, quoted strings, and the like. Because some deployments of XMPP messaging systems may want to re-use existing email addresses as JIDs, it is helpful to define how to transform an email address into a JID.</p>
<p>In general, it is straightforward to transform an email address (i.e., a "dot-atom-text") into a JID, since traditional email addresses allow US-ASCII characters only rather than the nearly full range of Unicode code points allowed in a JID. <note>This specification does not cover recent efforts to define internationalized email addresses.</note> However, there are three characters allowed in the local-part of an email address that are not allowed in the node identifier portion of a JID: namely, the characters &amp; ' / as described in Sections 3.2.4 and 3.2.5 of <cite>RFC 2822</cite>. In order to transform these characters, a compliant implementation MUST use the methods specified herein.</p>
<example caption='An Email Address Containing JID-Disallowed Characters'><![CDATA[
here's_a_wild_&_/cr%zy/_address@example.com
]]></example>
@ -265,7 +250,7 @@ here\27s_a_wild_\26_\2fcr%zy\2f_address@example.com
<example caption='The JID as Presented to a User'><![CDATA[
here's_a_wild_&_/cr%zy/_address@example.com
]]></example>
<p>(Note: Because the backslash character is forbidden in the "dot-atom" construction, an email address should not contain a string that corresponds to one of the escaped characters specified in the <link url="#transforms">Transformations</link> section of this document; therefore, no such examples are shown; see below under <link url="#examples-imps">IMPS Addresses</link>.)</p>
<p>(Note: Because the backslash character is forbidden in the "dot-atom-text" construction, an email address should not contain a character sequence that corresponds to one of the escaped characters specified in the <link url="#transforms">Transformations</link> section of this document; therefore, no such examples are shown.)</p>
<p>An email address may also exist in the form of a mailto: URI as specified in &rfc2368;. Before transforming a mailto: URI into a JID, it MUST be URL-decoded and all headers MUST be removed, leaving a mailbox identifier, as shown in the following example.</p>
<example caption='A mailto: URI Containing JID-Disallowed Characters'><![CDATA[
mailto:here%27s_a_wild_%26_%2Fcr%zy%2F_address@example.com?subject=that%20is%20crazy%21
@ -358,7 +343,7 @@ here\27s_a_wild_\26_\2fcr%zy\2f_address_for\3a\3cwv\3e(\22IMPS\22)@example.com
<example caption='The JID as Presented to a User'><![CDATA[
here's_a_wild_&_/cr%zy/_address_for:<wv>("IMPS")@example.com
]]></example>
<p>Unlike the foregoing address types, IMPS addresses are allowed to contain backslashes. This implies that it is possible for an IMPS address to contain a string that corresponds to one of the escaped character representations for code points that are disallowed in XMPP node identifiers. An example would be the IMPS address &lt;wv:\3and\2is\5cool@example.com&gt;, where the string "\3a" could be interpreted as the : character (and the string "\5c" as "\") if that IMPS address is directly converted into a JID. Therefore, the leading \ character MUST be transformed to "\5c" (and the source string "\5c" to "\5c5c") in order to avoid possible ambiguity. Thus the transformed JID would be &lt;\5c3and\2is\5c5cool@example.com&gt;, which would be presented to a user as &lt;\3and\2is\5cool@example.com&gt;.</p>
<p>Unlike the foregoing address types, IMPS addresses are allowed to contain backslashes. This implies that it is possible for an IMPS address to contain a character sequence that corresponds to one of the escaped character representations for code points that are disallowed in XMPP node identifiers. An example would be the IMPS address &lt;wv:\3and\2is\5cool@example.com&gt;, where the character sequence "\3a" could be interpreted as the : character (and the character sequence "\5c" as "\") if that IMPS address is directly converted into a JID. Therefore, the leading \ character MUST be transformed to "\5c" (and the source character sequence "\5c" to "\5c5c") in order to avoid possible ambiguity. Thus the transformed JID would be &lt;\5c3and\2is\5c5cool@example.com&gt;, which would be presented to a user as &lt;\3and\2is\5cool@example.com&gt;.</p>
<p>If an IMPS address contains a private resource, a gateway between XMPP and IMPS should process the resource and append it to the end of the JID; however, such gateway behavior is out of scope for this document.</p>
<p>The foregoing example showed how to transform a wv: URI into a JID. However, it also may be necessary to convert a JID into a wv: URI, as shown in the following example.</p>
<example caption='User Enters Address, Including Disallowed Characters'><![CDATA[
@ -373,7 +358,7 @@ wv:here%27s_a_wild_%26_%2Fcr%zy%2F_address_for%3A%3Cwv%3E%28%22IMPS%22%29@exampl
</section2>
<section2 topic='LDAP Distinguished Names' anchor='examples-ldap'>
<p>Within the Lightweight Directory Access Protocol (see &rfc2251;), a "distinguished name" (DN) is a hierarchically-organized string representation that uniquely identifies a user, system, or organization. It is possible that some messaging systems use LDAP distinguished names to identify entities that can communicate using the system (e.g., this is reputed to be the case for certain releases of the Lotus Sametime system sold by IBM), and in any case it may be helpful to transform an LDAP distinguished name into an XMPP address for identification or addressing purposes.</p>
<p>As previously mentioned, a UTF-8 string representation of LDAP distinguished names is specified in <cite>RFC 2253</cite>. This representation specifies that the characters , + " \ &lt; &gt; ; are to be escaped with the backslash character (e.g., the string "\," would be used to escape the , character) and that any other non-US-ASCII characters are to be escaped using a string of the form "\xx".</p>
<p>As previously mentioned, a UTF-8 string representation of LDAP distinguished names is specified in <cite>RFC 2253</cite>. This representation specifies that the characters , + " \ &lt; &gt; ; are to be escaped with the backslash character (e.g., the character sequence "\," would be used to escape the , character) and that any other non-US-ASCII characters are to be escaped using a character sequence of the form "\xx".</p>
<p>The following example shows a distinguished name (and transformations thereof) for a person whose common name is "D'Artagnan Saint-Andr&#233;" and who is associated with an organization called "Example &amp; Company, Inc." whose domain name is "example.com":</p>
<example caption='A Distinguished Name'>
CN=D'Artagnan Saint-Andr&#xe9;,O=Example &amp; Company, Inc.,DC=example,DC=com
@ -414,11 +399,37 @@ somenick!user\22\26\27\2f\3a\3c\3e\5c3address@example.com
<example caption='The JID as Presented to a User'><![CDATA[
somenick!user"&'/:<>\3address@example.com
]]></example>
<p>Like IMPS addresses, IRC addresses are allowed to contain backslashes. This implies that it is possible for an IMPS address to contain a string that corresponds to one of the escaped character representations for code points that are disallowed in XMPP node identifiers. An example is shown above.</p>
<p>Like IMPS addresses, IRC addresses are allowed to contain backslashes. This implies that it is possible for an IRC address to contain a character sequence that corresponds to one of the escaped character representations for code points that are disallowed in XMPP node identifiers. An example is shown above.</p>
</section2>
</section1>
<section1 topic='Determining Support' anchor='disco'>
<p>If an entity needs to determine whether another entity supports JID escaping, it MUST send a disco#info request to the other entity as specified in &xep0030;.</p>
<example caption='Client requests features'><![CDATA[
<iq type='get'
from='porthos@musketeers.lit/gate'
to='irc.shakespeare.lit'
id='info1'>
<query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>
]]></example>
<p>If the queried entity supports JID escaping, it MUST return a <strong>jid\20escaping</strong> [sic] feature in its reply.</p>
<example caption='Service responds with features'><![CDATA[
<iq type='result'
to='porthos@musketeers.lit/gate'
from='irc.shakespeare.lit'
id='info1'>
<query xmlns='http://jabber.org/protocol/disco#info'>
...
<feature var='jid\20escaping'/>
</query>
</iq>
]]></example>
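<p>As a non-normative illustration, a requesting entity might inspect the disco#info reply along the following lines. This is a minimal sketch in Python using only the standard library; the function name supports_jid_escaping is hypothetical, and the sketch assumes the reply is available as a standalone XML string rather than as part of a live XML stream.</p>
<example caption='Hypothetical Sketch of Checking a disco#info Reply'><![CDATA[
```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = 'http://jabber.org/protocol/disco#info'
JID_ESCAPING_FEATURE = 'jid\\20escaping'  # literal backslash, then "20escaping"


def supports_jid_escaping(iq_xml):
    """Return True if the reply is a result that advertises JID escaping."""
    iq = ET.fromstring(iq_xml)
    if iq.get('type') != 'result':
        return False
    query = iq.find('{%s}query' % DISCO_INFO_NS)
    if query is None:
        return False
    # Feature elements inherit the disco#info namespace from the query element.
    return any(
        feature.get('var') == JID_ESCAPING_FEATURE
        for feature in query.findall('{%s}feature' % DISCO_INFO_NS)
    )
```
]]></example>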
</section1>
<section1 topic='Security Considerations' anchor='security'>
<p>If an entity (e.g., a client or a gateway) performs JID escaping, it MUST do so consistently (for example, a client or server MUST consistently apply JID escaping and unescaping to the JIDs it handles) so that the entity does not present the same JID in two different ways or present two different JIDs in the same way.</p>
<p>Naturally, if one entity performs JID escaping and another entity does not perform JID escaping, the same JID could be presented differently by those entities (e.g., the JID d\27artagnan@musketeers.lit would be presented as d'artagnan@musketeers.lit by a client that performs JID escaping but as d\27artagnan@musketeers.lit by a client that does not perform JID escaping). By the same token, two different JIDs could be presented in the same way by those entities (e.g., the JID foo\5cbar@example.com would be presented as foo\bar@example.com by a client that performs JID escaping and the JID foo\bar@example.com would be presented as foo\bar@example.com by a client that does not perform JID escaping). These differing presentations could be a source of confusion (e.g., the same human user could use two different clients, one of which performs JID escaping and one of which does not). This confusion may have security implications since in rare instances messages and other information could be directed to an entity other than the intended recipient; unfortunately, this is unavoidable until all XMPP clients support JID escaping.</p>
<p>An entity that performs JID escaping MUST NOT compare unescaped versions of JIDs; otherwise, messages and other information could be directed to an entity other than the intended recipient.</p>
<p>An entity that transforms a non-XMPP address into a JID MUST follow the algorithm specified in the <link url="#bizrules-algorithm">Address Transformation Algorithm</link> section of this document; otherwise, messages and other information could be directed to an entity other than the intended recipient.</p>
</section1>