From 76efa4a6946a7ae526e8916b5fa22aca7406c491 Mon Sep 17 00:00:00 2001
From: Peter Saint-Andre
Cleaned up text and examples, and added material about the HTTP bindings (currently only BOSH, with WebSocket to be added in a future revision).
Rough draft.
Establishing an XMPP session requires a fairly large number of round trips between the initiating entity and the receiving entity. In many deployment scenarios, it would be helpful to reduce the number of round trips and, in general, the time needed to establish a session. This document defines protocols and best practices to do just that.
Note: Various parts of this document might be moved to separate documents at some point.
Establishing an XMPP session requires a fairly large number of round trips between the initiating entity and the receiving entity. In many deployment scenarios, it would be helpful to reduce the number of round trips and therefore the time needed to establish a session. This document defines protocols and best practices to do just that.
In accordance with &rfc6120;, before attempting to establish a stream the initiating entity needs to determine the IP address and port at which to connect, usually by means of DNS lookups as described in Section 3.2 of RFC 6120. Implementations SHOULD cache the results of DNS lookups in order to avoid this step whenever possible.
In accordance with &rfc6120;, before attempting to establish a stream over TCP the initiating entity needs to determine the IP address and port at which to connect, usually by means of DNS lookups as described in Section 3.2 of RFC 6120. Implementations SHOULD cache the results of DNS lookups in order to avoid this step whenever possible.
Similar considerations apply to connections established over one of the HTTP bindings, i.e., either BOSH (see &xep0124; and &xep0206;) or WebSocket (see &rfc6455; and &xmppoverwebsocket;).
XMPP applications SHOULD cache whatever information they can about the peer, especially stream features data and &xep0030; information.
To facilitate such caching, servers SHOULD include &xep0115; data in stream features as shown in Section 6.3 of XEP-0115. Note that for maximum benefit the server MUST include all of the stream features it supports in its replies to "disco#info" queries (i.e., not advertise such features only during stream establishment).
XMPP clients SHOULD cache roster information, and servers SHOULD make such caching possible, using &xep0237; as subsequently included in Section 2.1.1 of &rfc6121;.
One method of speeding the connection process is pipelining of requests, as in &rfc2920; and the QUICKSTART extension proposed for SMTP (&smtpquickstart;). The application of similar principles to XMPP was originally suggested by Tony Finch in February 2008 <http://mail.jabber.org/pipermail/standards/2008-February/017966.html>.
The primary method of speeding the connection process is pipelining of requests, as in &rfc2920; and the QUICKSTART extension proposed for SMTP (&smtpquickstart;). The application of similar principles to XMPP was originally suggested by Tony Finch in February 2008 <http://mail.jabber.org/pipermail/standards/2008-February/017966.html>.
In essence, pipelining relies on two assumptions:
Together, these assumptions enable the parties to reduce the number of round trips needed to complete the stream negotiation process.
Note well that pipelining at the XMPP layer is not to be confused with HTTP pipelining, which was added to HTTP in version 1.1 and which is not encouraged when using the HTTP bindings for XMPP.
If an XMPP server supports pipelining, it MUST advertise a stream feature of <pipelining xmlns='urn:xmpp:features:pipelining'/>.
If both parties support pipelining, they can proceed as follows (the examples use the XML from Section 9.1 of RFC 6120 for the client-server stream establishment, but the same principles apply to server-to-server streams).
In Step 1, the client assumes that the server supports the XMPP STARTTLS extension so it pipelines its initial stream header, the <starttls/> command, and the TLS ClientHello message.
If both parties support pipelining, they can proceed as follows over the TCP binding (the examples use the XML from Section 9.1 of RFC 6120 for the client-server stream establishment, but the same principles apply to server-to-server streams).
+In the client-to-server half of the first exchange, the client assumes that the server supports the XMPP STARTTLS extension so it pipelines its initial stream header, the <starttls/> command, and the TLS ClientHello message.
+In Step 2, the server pipelines its response stream header, stream features advertisement, STARTTLS <proceed/> response, and TLS ServerHello messages (which might include ServerHello, Certificate, ServerKeyExchange, CertificateRequest, and ServerHelloDone -- see &rfc5246; for details).
-In Step 3, the parties complete the TLS negotiation.
-In Step 4, the server knows that the client will need to restart the stream so it proactively attaches its response stream header and stream features after the TLS Finished message.
-Now the parties complete the TLS negotiation; for our purposes we don't count these round trips because they are the same no matter whether we use pipelining or not (i.e., some combination of the TLS messages specified in RFC 5246).
+At the end of the TLS negotiation, the server knows that the client will need to restart the stream, so it proactively attaches its response stream header and stream features in the same TCP packet as the TLS Finished message, thus starting the next exchange.
+In Step 5, the client pipelines its initial stream header with the command for initiating the SASL authentication process (including SASL "initial response" data as explained in Section 6.3.10 of RFC 6120 to reduce the number of round trips).
-At this point the client and server might exchange multiple SASL-related messages, depending on the SASL mechanism in use. This specification does not attempt to reduce the number of round trips involved in the challenge-response sequence.
-When the client suspects that it is sending its final SASL response, it SHOULD append an initial stream header and resource binding request.
-At this point the client and server might exchange multiple SASL-related messages, depending on the SASL mechanism in use. Because this specification does not attempt to reduce the number of round trips involved in the challenge-response sequence, we do not describe these exchanges here.
+With pipelining, when the client suspects that it is sending its final SASL response, it appends an initial stream header and a resource binding request.
+In Step 8, the server informs the client of SASL success (including "additional data with success" as explained in Section 6.3.10 of RFC 6120 to reduce the number of round trips), sends a response stream header and stream features, and informs the client of successful resource binding.
-The XMPP stream negotiation process in RFC 6120 required at least 19 round trips (including 4 for TLS negotiation). With pipelining, the number of round trips is reduced to 8.
+Without pipelining, this exchange would require another 3 round trips; with pipelining it requires only 1.
+Therefore, without pipelining the XMPP exchanges for stream establishment require at least 6 round trips (and perhaps more depending on the SASL mechanism used); with pipelining the minimum number of round trips is 3.
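The round-trip totals above can be checked with a small bookkeeping sketch. The step list follows the non-pipelined negotiation order of RFC 6120; the grouping into three pipelined exchanges, the names `UNPIPELINED_STEPS`, `PIPELINED_EXCHANGES`, and `round_trips`, and the single-step SASL assumption are illustrative only and are not part of the specification.

```python
# Round-trip bookkeeping for XMPP stream setup over TCP (sketch only).
# Each entry is one client-request / server-response pair at the XMPP layer
# when nothing is pipelined. TLS handshake round trips are excluded, and a
# single-step SASL mechanism (initial response, no challenges) is assumed.
UNPIPELINED_STEPS = [
    "stream header -> stream header + features",
    "<starttls/> -> <proceed/>",
    "stream header (post-TLS) -> stream header + features",
    "SASL <auth/> -> <success/>",
    "stream header (post-SASL) -> stream header + features",
    "resource bind request -> bind result",
]

# Assumed grouping of those steps into pipelined exchanges (indices into
# UNPIPELINED_STEPS); chosen for illustration to mirror the walkthrough.
PIPELINED_EXCHANGES = [[0, 1], [2], [3, 4, 5]]


def round_trips(extra_sasl_round_trips: int = 0) -> tuple[int, int]:
    """Return (without_pipelining, with_pipelining) round-trip counts.

    Extra SASL challenge-response round trips are added to both counts,
    because pipelining does not attempt to reduce them.
    """
    without = len(UNPIPELINED_STEPS) + extra_sasl_round_trips
    with_pipelining = len(PIPELINED_EXCHANGES) + extra_sasl_round_trips
    return without, with_pipelining


if __name__ == "__main__":
    print(round_trips())
```

Passing a non-zero `extra_sasl_round_trips` models a multi-step SASL mechanism; both counts grow by the same amount, so the saving from pipelining stays at three round trips under these assumptions.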
Naturally, for typical client-to-server sessions, additional round trips are needed so that the client can gather service discovery information, retrieve the roster, etc. As noted, these steps can be reduced or eliminated by using entity capabilities and roster versioning.
In the HTTP bindings (BOSH and WebSocket) channel encryption occurs at the HTTP layer and therefore the first exchange shown above for the TCP binding is not used.
+For now, this section focuses on BOSH. A future version of this document will discuss WebSocket (once draft-moffitt-xmpp-over-websocket has been updated to include examples).
+When pipelining is used, a BOSH client can include its XMPP authentication (SASL) request in the BOSH session creation request, as shown in the following example.
+