Data Sequencing

This specification defines an XMPP extension that enables a requesting entity to receive a large data set only if the set has changed; the primary use case is sequencing of roster changes for more efficient downloading of the roster information.

&LEGALNOTICE;

0237 Experimental Standards Track Standards Council XMPP Core XMPP IM NOT_YET_ASSIGNED &stpeter;

0.4 2008-09-17 psa

Defined new namespace and generalized to handle service discovery and other use cases in addition to rosters.

0.3 2008-04-21 psa

Defined protocol solely in terms of full rosters and roster pushes (no more roster diffs); added implementation notes; clarified server behavior if cached version is unavailable.

0.2 2008-03-06 psa

Renamed to data sequencing; clarified server behavior.

0.1 2008-03-05 psa

Initial published version; per Council consensus, removed optionality regarding semantics of the version attribute.

0.0.3 2008-03-05 psa

Corrected semantics of version attribute (should be a strictly increasing sequence number but may be any unique identifier).

0.0.2 2008-03-04 psa

Clarified description of roster diff; added diff attribute and specified its use in roster results; specified use of version attribute in roster pushes.

0.0.1 2008-03-04 psa

First draft.

Certain XMPP technologies can return large data sets to users (examples are rosters as specified in &xmppim; and item lists as specified in &xep0030;). Although &xep0059; provides a generic way to page through such data sets, it does not provide a way to learn if the data set has changed since it was last retrieved. If the client could cache the data set (e.g., the roster) and retrieve only changes to the data set, certain use cases (e.g., the login process) could be significantly streamlined. This feature might be especially valuable over low-bandwidth connections such as those common in mobile environments. This document defines a method for such streamlining, via the concept of data sequencing.

This document defines a <seq/> element qualified by the 'urn:xmpp:tmp:seq' namespace &NSNOTE;. This element can be included in any IQ request that might result in a large data set. Because only one child element is allowed in an IQ stanza, the <seq/> element MUST be included as a child of the payload element (i.e., as a grandchild of the IQ stanza).

The <seq/> element is defined as empty (except when used to advertise a stream feature). It possesses a single attribute: 'num'.

The value of the 'num' attribute MUST be a non-negative integer representing a strictly increasing sequence number that is increased (but not necessarily incremented-by-one) with any change to the data set.
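The constraints on 'num' can be sketched in a few lines. This is only an illustration: the helper names `parse_num` and `next_num` are hypothetical and do not appear in this specification, and rejecting anything other than ASCII digits is an assumption about how strictly an implementation might parse the attribute.

```python
def parse_num(value: str) -> int:
    """Parse a 'num' attribute value as a non-negative integer.

    Assumption: only ASCII digits are accepted, so signs, decimal
    points, whitespace, and the empty string are all rejected.
    """
    if not value.isascii() or not value.isdigit():
        raise ValueError(f"invalid sequence number: {value!r}")
    return int(value)


def next_num(current: int, step: int = 1) -> int:
    """Return a sequence number strictly greater than `current`.

    The step may be larger than one, because the number only has to
    increase with each change, not increment by one.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return current + step
```

For instance, a data set at sequence number 305 could move directly to 317 with `next_num(305, 12)`.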

If a client supports data sequencing and knows that the server does so (see Determining Support), it SHOULD include the <seq/> element in its request for the roster, where the 'num' attribute is set to the sequence number associated with its last cache of the roster.


If the client has not yet cached the roster or the cache is lost or corrupted, but the client wishes to bootstrap the use of data sequencing, it SHOULD include the <seq/> element with the 'num' attribute set to a value of zero (0).
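The two request cases above (a cached sequence number, or a bootstrap value of zero) might be built as in the following sketch. The function name `build_roster_request` and its parameters are hypothetical, the 'jabber:iq:roster' namespace is the standard roster namespace, and the IQ element is left unqualified for brevity rather than placed in 'jabber:client'.

```python
import xml.etree.ElementTree as ET

ROSTER_NS = "jabber:iq:roster"
SEQ_NS = "urn:xmpp:tmp:seq"


def build_roster_request(iq_id, cached_num=None, bootstrap=True):
    """Build a roster get, adding <seq/> as a child of the payload element.

    cached_num: sequence number of the client's cached roster, if any.
    bootstrap:  when there is no cache, send num='0' to start sequencing.
    """
    iq = ET.Element("iq", {"type": "get", "id": iq_id})
    query = ET.SubElement(iq, f"{{{ROSTER_NS}}}query")
    if cached_num is not None:
        num = cached_num
    elif bootstrap:
        num = 0
    else:
        # Behave like a client that does not use data sequencing.
        return iq
    ET.SubElement(query, f"{{{SEQ_NS}}}seq", {"num": str(num)})
    return iq
```

Calling `ET.tostring(build_roster_request("r1", 305))` serializes the request; the exact prefix names chosen by the serializer are not significant.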

Naturally, if the client does not support data sequencing or does not wish to bootstrap use of data sequencing, it will behave like an RFC-3921-compliant client by not including the <seq/> element.

If the roster has not changed since the version enumerated by the client, the server MUST return an empty IQ-result.


If the roster has changed since the version enumerated by the client, the server MUST return a &QUERY; element that includes the latest sequence number.

The &QUERY; element MUST either contain the complete roster (including the sequence number to indicate that the roster has changed) or be empty (indicating that roster changes will be sent as interim roster pushes).

In general, if returning the complete roster would use less bandwidth than sending individual roster pushes to the client (e.g., if the roster contains only a few items), the server SHOULD return the complete roster.


However, if returning the complete roster would use more bandwidth than sending individual roster pushes to the client (e.g., if the roster contains many items, only a few of which have changed), the server SHOULD return an empty &QUERY; element, then send individual roster pushes.
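The server-side choice described above can be sketched as a simple cost comparison. Everything in this sketch is an assumption made for illustration: the name `plan_roster_reply`, the fixed `STANZA_OVERHEAD` figure, the use of serialized item length as a bandwidth estimate, the tie-breaking in favor of the complete roster, and the convention that `changed=None` means the server cannot work out what changed since the client's number and therefore falls back to the complete roster.

```python
from typing import Dict, Optional

STANZA_OVERHEAD = 60  # hypothetical fixed byte cost of one IQ wrapper


def plan_roster_reply(client_num: int, current_num: int,
                      roster: Dict[str, str],
                      changed: Optional[Dict[str, str]]) -> str:
    """Choose how to answer a sequenced roster request.

    roster:  JID -> serialized roster item (the complete roster).
    changed: JID -> serialized item for each entry changed since
             client_num, or None if that cannot be determined.
    """
    if client_num == current_num:
        return "empty-result"
    if changed is None:
        return "full-roster"
    # One IQ carrying every item versus one IQ per changed item.
    full_cost = STANZA_OVERHEAD + sum(len(item) for item in roster.values())
    push_cost = sum(STANZA_OVERHEAD + len(item) for item in changed.values())
    if full_cost <= push_cost:
        return "full-roster"
    return "empty-query-then-pushes"
```

A real server would likely use a better estimate than string length; the point is only that the decision reduces to comparing two costs.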


The interim roster pushes can be understood as follows:

  1. Imagine that the client had an active presence session for the entire time between its cached roster version (in this case, 305) and the new roster version (317).
  2. During that time, the client might have received roster pushes related to data sequence numbers 306, 307, 310, 311, 313, 314, 315, and 317 (the sequence numbers must be strictly increasing, but the sequence need not be contiguous).
  3. However, some of those roster pushes might have contained intermediate updates to the same roster item (e.g., changes in the subscription state for bill@shakespeare.lit from "none" to "to" and from "to" to "both").
  4. The interim roster pushes would not include all of the intermediate steps, only the final result of all changes applied while the client was in fact offline.
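The four steps above amount to keeping only the last recorded change per roster item. The following sketch assumes, purely for illustration, that the server keeps a change log of `(num, jid, state)` tuples in increasing sequence order; the function name `collapse_changes` and the log layout are hypothetical.

```python
def collapse_changes(log, cached_num):
    """Reduce a change log to the interim pushes a client needs.

    log:        iterable of (num, jid, state) in increasing num order.
    cached_num: sequence number of the client's cached roster.

    Returns one (num, jid, state) entry per changed JID, carrying the
    final state only, sorted by sequence number.
    """
    latest = {}
    for num, jid, state in log:
        if num > cached_num:
            # A later change to the same JID replaces the earlier one.
            latest[jid] = (num, state)
    return sorted((num, jid, state) for jid, (num, state) in latest.items())
```

With a log in which bill@shakespeare.lit moves to "to" at 306 and to "both" at 310, a client cached at 305 would receive a single push for that item, reflecting "both".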

The client can determine when the interim roster pushes have ended by comparing the sequence number it received on the empty &QUERY; element against the sequence number it receives in roster pushes.
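One way a client might perform this comparison is sketched below. The class name `InterimTracker` is hypothetical, and the rule that the interim pushes are complete once a push carries a number equal to or greater than the one on the empty query is an interpretation of the text above rather than something it states explicitly.

```python
class InterimTracker:
    """Track interim roster pushes after an empty query reply."""

    def __init__(self, target_num: int):
        self.target_num = target_num  # number received on the empty query
        self.last_num = None          # number of the most recent push
        self.done = False

    def on_push(self, num: int) -> bool:
        """Record one roster push; return True once the client has caught up."""
        if self.last_num is not None and num <= self.last_num:
            raise ValueError("roster pushes must arrive in increasing order")
        self.last_num = num
        if num >= self.target_num:
            self.done = True
        return self.done
```

A client cached at 305 that receives an empty query with 317 would create `InterimTracker(317)` and feed it each push until `on_push` returns True.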

When the server sends subsequent roster pushes to the client, it MUST include the updated data sequence number. Roster pushes MUST occur in sequence order. The sequence number contained in a roster push MUST be unique. A "change to the roster" is any addition of, update to, or removal of a roster item that would result in a roster push, including changes in subscription states, as described in RFC 3921 or rfc3921bis.


If the requesting entity supports data sequencing and knows that the responding entity does so (see Determining Support), it MAY include the <seq/> element in its disco#items request, where the 'num' attribute is set to the sequence number associated with its last cache of the items.


As above, if the requesting entity has not yet cached the data set (or the cache is lost or corrupted) but wishes to bootstrap the use of data sequencing, it SHOULD include the <seq/> element with the 'num' attribute set to a value of zero (0).

If the set of disco items has not changed since the version enumerated by the requesting entity, the responding entity MUST return an empty IQ-result.


If the set of disco items has changed since the version enumerated by the requesting entity, the responding entity MUST return a &QUERY; element that includes the latest sequence number.

The &QUERY; element MUST either contain the complete set of items (including the sequence number to indicate that the set has changed) or be empty (indicating that changes will be sent as notifications as specified in &xep0230;).

In general, if returning the complete set of items would use less bandwidth than sending individual notifications (e.g., if the set contains only a few items), the responding entity SHOULD return the complete set.


However, if returning the complete set would use more bandwidth than sending individual notifications (e.g., if the complete set contains many items, only a few of which have changed), the responding entity SHOULD return an empty &QUERY; element, then send individual notifications.
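On the requesting side, the three possible replies can be told apart by inspecting the result, as in the sketch below. It assumes that the sequence number is carried in a <seq/> child of the query element, mirroring the request; the function name `classify_items_reply` and the returned labels are hypothetical. The 'http://jabber.org/protocol/disco#items' namespace is the standard one for item lists.

```python
import xml.etree.ElementTree as ET

DISCO_ITEMS_NS = "http://jabber.org/protocol/disco#items"
SEQ_NS = "urn:xmpp:tmp:seq"


def classify_items_reply(xml_text):
    """Classify an IQ-result for a sequenced disco#items request.

    Returns (kind, num, jids), where kind is "unchanged", "full",
    or "await-notifications".
    """
    iq = ET.fromstring(xml_text)
    query = iq.find(f"{{{DISCO_ITEMS_NS}}}query")
    if query is None:
        # Empty IQ-result: the cached set is still current.
        return ("unchanged", None, [])
    seq = query.find(f"{{{SEQ_NS}}}seq")
    num = None
    if seq is not None and seq.get("num") is not None:
        num = int(seq.get("num"))
    jids = [item.get("jid")
            for item in query.findall(f"{{{DISCO_ITEMS_NS}}}item")]
    if jids:
        return ("full", num, jids)
    return ("await-notifications", num, [])
```

Note that this sketch cannot distinguish a complete set that happens to be empty from an empty query that announces notifications; an implementation would need its own rule for that case.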


The requesting entity can determine when the interim notifications have ended by comparing the sequence number it received on the empty &QUERY; element against the sequence number it receives in the notifications.

When the responding entity sends subsequent notifications to the requesting entity, it MUST include the updated sequence number. Notifications MUST occur in sequence order. The sequence number contained in a notification MUST be unique.


If a server supports data sequencing, it MUST inform the connecting entity when returning stream features during the stream negotiation process; at the latest, when informing a client that resource binding is required. This is done by including a <seq/> element qualified by the 'urn:xmpp:tmp:seq' namespace &NSNOTE;.
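A connecting entity could check for the stream feature as in the following sketch. The function name `server_supports_sequencing` is hypothetical; the sketch assumes the features element has already been extracted from the stream as a standalone XML string with its namespace declarations intact.

```python
import xml.etree.ElementTree as ET

STREAMS_NS = "http://etherx.jabber.org/streams"
SEQ_NS = "urn:xmpp:tmp:seq"


def server_supports_sequencing(features_xml: str) -> bool:
    """Return True if the stream features advertise data sequencing."""
    features = ET.fromstring(features_xml)
    if features.tag != f"{{{STREAMS_NS}}}features":
        raise ValueError("not a stream features element")
    # Support is signalled by a <seq/> child in the sequencing namespace.
    return features.find(f"{{{SEQ_NS}}}seq") is not None
```

A client would typically record the result for the lifetime of the stream and consult it before adding <seq/> to its requests.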


In order for an application to determine whether an entity supports this protocol, where possible it SHOULD use the dynamic, presence-based profile of service discovery defined in &xep0115;. However, if an application has not received entity capabilities information from an entity, it SHOULD use explicit service discovery instead.

It is possible that caching of data sets (rather than holding them in memory only for the life of the session) could introduce new vulnerabilities. Implementations are advised to appropriately protect cached data sets.

This document requires no interaction with &IANA;.

Until this specification advances to a status of Draft, the associated namespace for its stream feature shall be "urn:xmpp:tmp:seq". Upon advancement of this specification, the &REGISTRAR; shall issue a permanent namespace in accordance with the process defined in Section 4 of &xep0053;; the requested namespace is "urn:xmpp:seq", which is thought to be unique per the XMPP Registrar's requirements.


Thanks to Dave Cridland, Richard Dobson, Fabio Forno, Alexander Gnauck, Juha Hartikainen, Joe Hildebrand, Justin Karneges, and Pedro Melo for their comments.