%ents; ]>
BOSH Script Syntax This specification provides historical documentation regarding the "alternative script syntax" first defined in Version 1.6 of XEP-0124. &LEGALNOTICE; 0252 Deferred Historical Standards Council XMPP Core XEP-0124 NOT_YET_ASSIGNED &ianpaterson; 0.1 2008-10-31 psa

Initial published version.

0.0.1 2008-08-14 ip

Extracted from XEP-0124 version 1.6.

The cross domain security restrictions of some runtime environments permit clients to access pure XML text only if it was received from a specific server (e.g., the hostname a Web client was downloaded from). Surprisingly, the same environments typically permit clients to receive and execute scripts from any server. The Security Considerations section below describes the significant risks of deploying Script Syntax.

To enable domain-restricted clients to use BOSH with any connection manager, this section proposes an optional alternative to the standard "BOSH Pure Syntax" defined in &xep0124;. The "BOSH Script Syntax" defined here essentially inserts each <body/> element sent by the client into an HTTP GET header instead of into the body of a POST request. Each <body/> element sent by the connection manager is wrapped inside an &ecma262; string and function call. No changes to the <body/> element or to any other aspects of the BOSH protocol are needed.

If, and only if, a client is unable to use the Pure Syntax defined in XEP-0124, then it MAY instead request the use of the Script Syntax defined herein.

A BOSH client can send a session request to a BOSH connection manager for using Script Syntax instead of Pure Syntax.

If the connection manager supports Script Syntax then it MUST send its Session Creation Response using Script Syntax, and all subsequent client requests and connection manager responses within the session MUST be sent using Script Syntax. If the connection manager does not support the "BOSH Script" syntax then it SHOULD return either an 'item-not-found' terminal binding error (in Script Syntax) or an HTTP 404 (Not Found) error in response to the client's session request.

Note: The line break in the body of the HTTP response in the following example is included only to improve readability. In practice there MUST be no line breaks.

") ]]>

Clients MUST make the following changes to convert their requests to Script Syntax:

  1. Certain octets within the UTF-8 encoded <body/> element SHOULD be replaced according to the rules for escaping octets within URIs defined by &rfc3986;. Therefore all octets except those representing 7-bit alphanumeric characters or the characters -._~!$&'()*+,;=:@/? should be substituted with a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits that represent the value of the octet.

  2. A '?' character and the URI-encoded <body/> element MUST be appended to the URI at which the connection manager is operating within its server.

  3. The resulting URI MUST be sent to the connection manager within an HTTP GET request.

  4. Include extra HTTP headers to prevent request/response caching or storage by any intermediary.

Note: All whitespace between "GET " and " HTTP/1.1" in the HTTP GET header lines in the following two examples is included only to improve readability. In practice there MUST be no whitespace.

Although RFC 2616 does not limit the length of HTTP URIs, the runtime environment of the client might restrict the length of the URI that it can include in each GET request. Internet Explorer versions 4.0 thru 7.0 have a maximum path length of 2048 characters and a maximum URL length of 2083 characters. Other popular browsers appear to have no limit. In these cases the client MUST reduce the content of the <body/> element accordingly and send the remaining content in subsequent HTTP GET requests wrapped in new <body/> elements (with incremented 'rid' attributes). This is possible since, unlike Pure Syntax, with Script Syntax the connection manager MUST treat the string of characters between the opening and closing <body> tags of each request as part of an octet stream instead of as a set of complete XML stanzas. The content of any one <body/> element MUST NOT be parsed in isolation from the rest of the stream.

Connection managers MUST make the following changes to convert their responses to Script Syntax:

1. Certain characters within the <body/> element MUST be replaced according to the rules for escaping characters within strings defined by ECMAScript. The necessary substitutions are summarised in the table below.

CharacterUnicode Code Point ValueEscape sequence
"U+0022\"
Line Feed (New Line)U+000A\n
Carriage ReturnU+000D\r
Line SeparatorU+2028\u2028
Paragraph SeparatorU+2029\u2029
\U+005C\\

Each Unicode format-control character (i.e., the characters in category "Cf" in the Unicode Character Database, e.g., LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) MUST also be substituted by its Unicode escape sequence (e.g. \u200e or \u200f).

2. The following eight characters MUST be prepended to the <body/> element:

_BOSH_("

3. The following two characters MUST be appended to the <body/> element:

")

4. If the client request does not possess a 'content' attribute, then the HTTP Content-Type header of responses MUST be either "text/javascript; charset=utf-8" or "application/x-javascript; charset=utf-8".

5. Include extra HTTP headers to prevent caching or storage by any intermediary.

Note: All line breaks in the bodies of the HTTP responses in the following two examples are included only to improve readability. In practice there MUST be no line breaks.

") ]]> \n \n I said \"Hi!\"\n\n") ]]>

Note: As with Pure Syntax, each <body/> element sent to the client MUST encapsulate zero or more complete XML stanzas.

Any connection manager (not only legacy connection managers) can indicate that it does not support Script Syntax by sending an HTTP 404 error code (instead of sending a bad-request error using Scrypt Syntax).

To avoid the storage of private communications by third parties, when using the alternative Script Syntax connection managers MUST (and clients SHOULD) include all the appropriate HTTP/1.0 and/or HTTP/1.1 headers necessary to ensure as far as possible that no request or response will ever be cached or stored by any intermediary.

The alternative Script Syntax returns code for the client to execute. This code is typically executed immediately without any validation and with the same rights as the code of the client itself. This vulnerability could be exploited to steal passwords and private keys, or to fabricate messages sent from and received by the client, or to forward or modify priviledged information on the servers to which the client has access, or to interfere with any aspect of the client's functionality -- limited only by the extent of the runtime environment ("sandbox"), by the extent that naive users can be tricked into doing things outside that environment, and by the attacker's fertile imagination. Therefore, although the client could use Script Syntax with any connection manager on the network, in practice it MUST take care to employ it only with connection managers that the client's user trusts (as much as the server from which the client was downloaded). To prevent a-man-in-the-middle from manipulating the code clients SHOULD only use Script Syntax over encrypted connections (see above). If the client was downloaded over an encrypted connection then it MUST NOT use Script Syntax over connections that are not encrypted.