git-svn-id: file:///home/ksmith/gitmigration/svn/xmpp/trunk@972 4b5297f7-1745-476d-ba37-a9c6900126ab
Peter Saint-Andre 2007-06-19 01:45:15 +00:00
parent 3101af385a
commit c34d297869
1 changed file with 14 additions and 14 deletions


@@ -23,8 +23,8 @@
&hildjj;
&stpeter;
<revision>
-<version>1.1pre3</version>
-<date>in progress, last updated 2007-06-13</date>
+<version>1.1</version>
+<date>2007-06-18</date>
<initials>psa</initials>
<remark><p>Specified that \20 must not be included at the beginning or end of a node identifier; added security consideration regarding potential confusion caused by mismatch between software that does and software that does not perform JID escaping; added note about existing native JIDs that contain escaped characters; added mapping for IRC addresses; modified terminology to consistently use the terms escaping and unescaping rather than the terms encoding and decoding.</p></remark>
</revision>
@@ -109,7 +109,7 @@
<section1 topic='Transformations' anchor='transforms'>
<section2 topic='Concepts' anchor='concepts'>
<p>This document specifies that each disallowed character shall be escaped as \hexhex -- where "hexhex" is the hexadecimal value of the Unicode code point in question, ignoring the leading "00" in the code point (e.g., 27 for the ' character, resulting in an escaping of \27).</p>
-<p>If the &amp; character had not been in the list of disallowed characters, then normal XML escaping conventions (as specified in &w3xml;) could have been used, with the result that D'Artagnan (for example) could have been rendered as D&amp;apos;artagnan [sic]. Since there are good reasons why &amp; is a disallowed character, another escaping mechanism is needed.</p>
+<p>If the &amp; character had not been in the list of disallowed characters, then normal XML escaping conventions (as specified in &w3xml;) could have been used, with the result that D'Artagnan (for example) could have been rendered as D&amp;apos;artagnan [sic].</p>
<p>It might have been desirable to use percent-encoding (e.g., %27 for the ' character) as specified in Section 2.1 of &rfc3986;. However, that approach was rejected since the % character is an often-used character in existing JIDs (e.g., to replace the @ character in gateway addresses) and the resulting ambiguity would have caused misdelivered or undeliverable messages.</p>
<p>To avoid the problems associated with using &amp; or % as the escaping character, this document specifies a new escaping mechanism that uses the backslash character ("\") followed by "hexhex" (the hexadecimal value of the Unicode code point in question). This escaping method is quite similar to that used for disallowed characters in LDAP distinguished names (see &rfc2253;) but is used only for the characters that are disallowed in XMPP node identifiers (as well as the escaping character itself in certain special situations).</p>
<p>Here is an example of an escaped JID (this would be displayed but never natively transported as "d'artagnan@musketeers.lit"):</p>
@@ -160,7 +160,7 @@
<tr><td>\40</td><td>@</td></tr>
<tr><td>\5c</td><td>\</td></tr>
</table>
-<p>In the following example, D'Artagnan the elder sends a message through an SMTP mail gateway (the JID is "tr&#233;ville\40musketeers.lit@smtp.gascon.fr" and the destination email address is "tr&#233;ville@musketeers.lit").</p>
+<p>In the following example, D'Artagnan the elder sends a message through an SMTP mail gateway (the JID is "treville\40musketeers.lit@smtp.gascon.fr" and the destination email address is "treville@musketeers.lit").</p>
<example caption="JID Unescaping"><![CDATA[
<message
from='d\27artagnan@gascon.fr/elder'
@@ -177,20 +177,20 @@
<ol>
<li><p>A compliant client MUST render an escaped character as its unescaped equivalent when presenting it to a human user (e.g., present \27 as the ' character), but MAY provide a way for the user to view the escaped JID in its wire format (e.g., to compare two JIDs).</p></li>
<li><p>A server or gateway MAY unescape an escaped character for communication with external systems (e.g., LDAP), but only <em>after</em> the Nodeprep profile of stringprep has been applied.</p></li>
<li><p>The unescaping transformation MUST be NFKC-safe -- i.e., it MUST conform to Unicode normalization form KC (see Appendix B.3 of <cite>RFC 3454</cite>).</p></li>
<li><p>An entity MUST unescape only the specified sequences and MUST NOT unescape sequences that do not match the specified sequences.</p></li>
<li><p>An entity MUST NOT include the unescaped version of a disallowed character over the wire in any XML stanzas sent to another entity (since by definition the unescaped version of a disallowed character violates Nodeprep).</p></li>
<li><p>An entity MUST NOT use the unescaped version of a disallowed character when comparing two JIDs.</p></li>
<li><p>The character sequence \20 MUST NOT be the first or last character of an escaped node identifier.</p></li>
-<li><p>If the character sequence \5c is included in the source address, it too MUST be escaped (to \5c5c). <note>It is possible that some existing JIDs already contain character sequences matching "\5chexhex" (where "hexhex" is the hexadecimal value of the Unicode code point for a disallowed character or the backslash character), which may result in confusion between escaped JIDs and their presentation in a client; however, a survey of one large XMPP deployment yielded no instances of such sequences or even of the character sequence "\5c".</note></p></li>
+<li><p>If there are any instances of character sequences that correspond to escapings of the disallowed characters (e.g., the character sequence "\27") or the escaping character (i.e., the character sequence "\5c") in the unescaped address, the leading backslash character MUST be escaped to the character sequence "\5c" (e.g., resulting in the character sequences "\5c27" or "\5c5c"). <note>It is possible that some existing JIDs already contain character sequences matching "\5chexhex" (where "hexhex" is the hexadecimal value of the Unicode code point for a disallowed character or the backslash character), which may result in confusion between escaped JIDs and their presentation in a client; however, a survey of one large XMPP deployment yielded no instances of such sequences or even of the character sequence "\5c".</note></p></li>
</ol>
</section2>
<section2 topic='Address Transformation Algorithm' anchor='bizrules-algorithm'>
-<p>When transforming a non-XMPP address into an escaped JID, an implementation MUST adhere to the following process:</p>
+<p>When transforming a non-XMPP ("source") address into an escaped JID, an implementation MUST adhere to the following process:</p>
<ol>
-<li><p>If the original address is a URI, it MUST first be properly decoded according to the rules in <cite>RFC 3986</cite> before it is transformed into a JID.</p></li>
-<li><p>If the original address is a URI, the URI scheme component MUST be removed.</p></li>
-<li><p>If there are any instances of character sequences that correspond to escapings of the disallowed characters (e.g., the character sequence "\27") or the escaping character (i.e., the character sequence "\5c") in the original address, the leading backslash character MUST be escaped to the character sequence "\5c" (e.g., resulting in the character sequences "\5c27" or "\5c5c").</p></li>
-<li><p>All disallowed characters in the original address MUST be properly escaped in the resulting JID (as described above).</p></li>
+<li><p>If the source address is a URI, it MUST first be properly decoded according to the rules in <cite>RFC 3986</cite> before it is transformed into a JID.</p></li>
+<li><p>If the source address is a URI, the URI scheme component MUST be removed.</p></li>
+<li><p>If there are any instances of character sequences that correspond to escapings of the disallowed characters (e.g., the character sequence "\27") or the escaping character (i.e., the character sequence "\5c") in the source address, the leading backslash character MUST be escaped to the character sequence "\5c" (e.g., resulting in the character sequences "\5c27" or "\5c5c").</p></li>
+<li><p>All disallowed characters in the source address MUST be properly escaped in the resulting JID (as described above).</p></li>
</ol>
<p>While the fourth step should be clear from the foregoing text and the second step is necessary since XMPP addresses are not URIs, the meaning of the first and third steps may not be obvious.</p>
<p>Regarding step one, many non-XMPP messaging systems use URIs to identify addresses (examples include the mailto:, sip:, sips:, im:, pres:, and wv: URI schemes) or follow some other encoding rules for an identifier (e.g., an LDAP distinguished name). Before transforming a non-XMPP address or identifier into a JID, the address or identifier MUST first be decoded according to the rules specified for that type of address or identifier in order to ensure that the proper characters are transformed.</p>
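The four-step transformation in this hunk can be sketched in Python as follows. This is a minimal, non-authoritative illustration. The helper name `source_to_node` and its `is_uri` flag are hypothetical. The full list of ten disallowed characters is assumed from the published XEP-0106 table, since only the `@` and `\` rows are visible in this diff. Scheme removal handles only the opaque `scheme:rest` form used by schemes such as mailto: and sip:. The order matters: escape-like backslash sequences are protected (step three) before any disallowed characters are escaped (step four), so the `\hexhex` sequences produced in step four are never themselves re-escaped.

```python
import re
from urllib.parse import unquote

# Assumption: the disallowed characters and escapings from the published
# XEP-0106 table (only the @ and \ rows are visible in this diff).
ESCAPES = {
    " ": r"\20", '"': r"\22", "&": r"\26", "'": r"\27", "/": r"\2f",
    ":": r"\3a", "<": r"\3c", ">": r"\3e", "@": r"\40",
}
# A backslash followed by one of these codes would later be read as an escape.
ESCAPE_LIKE = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def source_to_node(source: str, is_uri: bool = False) -> str:
    """Hypothetical helper: turn a non-XMPP source address into an escaped node."""
    if is_uri:
        # Step 1: undo RFC 3986 percent-encoding.
        source = unquote(source)
        # Step 2: drop the scheme (only "scheme:rest" forms such as mailto:).
        _scheme, sep, rest = source.partition(":")
        if not sep:
            raise ValueError("URI has no scheme component")
        source = rest
    if source.startswith(" ") or source.endswith(" "):
        # Business rule: \20 must not begin or end an escaped node identifier.
        raise ValueError("node identifier must not start or end with a space")
    # Step 3: protect backslashes that already look like escapes (\27 -> \5c27).
    source = ESCAPE_LIKE.sub(r"\\5c\1", source)
    # Step 4: escape every disallowed character; other characters pass through.
    return "".join(ESCAPES.get(ch, ch) for ch in source)
```

For the SMTP gateway example in this commit, `source_to_node("mailto:treville@musketeers.lit", is_uri=True) + "@smtp.gascon.fr"` yields the JID `treville\40musketeers.lit@smtp.gascon.fr`. A lone backslash that does not begin an escape-like sequence (as in `c:\net`) is left as-is.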
@@ -224,9 +224,9 @@
<p>The following table shows user input, the escaped JID for sending over the wire, and client display (same as user input) for node identifiers that might possibly be used in native JIDs. The examples are numbered for easy reference. Naturally, a client that does not perform JID escaping would display the JIDs in their escaped form (e.g., "space\20cadet" instead of "space cadet").</p>
<table caption='JID Examples'>
<tr><th>#</th><th>User Input</th><th>Escaped JID</th><th>Client Display</th></tr>
-<tr><td>1</td><td>space cadet@example.com</td><td>space\20cadet@example.com</td><td>space cadet@example.com</td></tr>
-<tr><td>2</td><td>call me "ishmael"@example.com</td><td>call\20me\20\22ishmael\22@example.com</td><td>call me "ishmael"@example.com</td></tr>
-<tr><td>3</td><td>at&amp;t guy@example.com</td><td>at\26t\20guy@example.com</td><td>at&amp;t guy@example.com</td></tr>
+<tr><td>1</td><td>space&#160;cadet@example.com</td><td>space\20cadet@example.com</td><td>space&#160;cadet@example.com</td></tr>
+<tr><td>2</td><td>call&#160;me&#160;"ishmael"@example.com</td><td>call\20me\20\22ishmael\22@example.com</td><td>call&#160;me&#160;"ishmael"@example.com</td></tr>
+<tr><td>3</td><td>at&amp;t&#160;guy@example.com</td><td>at\26t\20guy@example.com</td><td>at&amp;t&#160;guy@example.com</td></tr>
<tr><td>4</td><td>d'artagnan@example.com</td><td>d\27artagnan@example.com</td><td>d'artagnan@example.com</td></tr>
<tr><td>5</td><td>/.fanboy@example.com</td><td>\2f.fanboy@example.com</td><td>/.fanboy@example.com</td></tr>
<tr><td>6</td><td>::foo::@example.com</td><td>\3a\3afoo\3a\3a@example.com</td><td>::foo::@example.com</td></tr>
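The "Escaped JID" to "Client Display" columns of this table follow from the unescaping business rules in this commit. The sketch below is hedged and non-authoritative. The helper name `display_jid` is hypothetical. It assumes that only the exact lowercase escape codes from the published XEP-0106 table are unescaped, and that the node identifier ends at the first `@`, which holds because a literal `@` in the node is always escaped to `\40`. A single left-to-right scan unescapes `\5c27` to `\27` rather than to `'`, so an escaped backslash is never unescaped twice. Per the rule against comparing unescaped forms, this output is meant for display only, and JID comparison should keep using the escaped wire form.

```python
import re

# Assumption: escape codes from the published XEP-0106 table. Only these exact
# sequences are unescaped; any other backslash sequence is left untouched.
UNESCAPE = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def display_jid(jid: str) -> str:
    """Hypothetical helper: render an escaped JID for a human user."""
    node, sep, rest = jid.partition("@")
    if not sep:
        return jid  # no node identifier, so there is nothing to unescape
    # re.sub scans left to right without overlap, so "\5c27" becomes "\27".
    shown = UNESCAPE.sub(lambda m: chr(int(m.group(1), 16)), node)
    # The domain and any resource are passed through unchanged.
    return shown + "@" + rest
```

Applied to the escaped JIDs in rows 1 through 6, this reproduces the "Client Display" column. For the full JID in the unescaping example, `d\27artagnan@gascon.fr/elder`, it returns `d'artagnan@gascon.fr/elder`.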