Using Efficient XML Interchange (EXI) Format in XMPP

This specification describes how EXI compression can be used in XMPP networks.

This XMPP Extension Protocol is copyright (c) 1999 - 2013 by the XMPP Standards Foundation (XSF).

Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.

## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. In no event shall the XMPP Standards Foundation or the authors of this Specification be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification. ##

In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising out of the use or inability to use the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.

This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which may be found at <http://www.xmpp.org/extensions/ipr-policy.shtml> or obtained by writing to XSF, P.O. Box 1641, Denver, CO 80201 USA).

Number: xxxx
Status: ProtoXEP
Type: Standards Track
SIG: Standards
Approver: Council
Dependencies: XMPP Core, XEP-0001, XEP-0138
Short name: NOT_YET_ASSIGNED
Author: Peter Waher (peter.waher@clayster.com, peter.waher@jabber.org, http://se.linkedin.com/pub/peter-waher/1a/71b/a29/)
Revision: 0.0.1, 2013-03-12, pwa

First draft.

The Efficient XML Interchange (EXI) Format <http://www.w3.org/TR/exi/> is an efficient way to compress XML documents and XML fragments. This document describes how EXI can be used in XMPP streams to compress the data transmitted between the server and the client. For certain applications, such as sensor networks, EXI is a vital component: by decreasing packet size, it enables sensors with limited memory to communicate efficiently. The strong support in EXI for generating efficient stub code is also vital for building efficient code on constrained devices.

Activating EXI compression requires a prior handshake in which the server and client agree on a set of parameters. Some of these parameters may increase the compression ratio at the cost of processing power and readability. These parameters include the EXI options described later in this document, such as alignment, compression, strict, preserve, selfContained, blockSize, valueMaxLength and valuePartitionCapacity.

These parameters are discussed in more depth in the following sections. Default values are also available for users who want to start evaluating EXI compression.

The single most important property to agree on, however, is the set of schemas to use during EXI compression. EXI compresses XML much more efficiently if schemas exist that describe the format of the expected XML. Since the server cannot be expected to know all possible XML schemas, this document provides a mechanism for exchanging schemas, so that the server can adapt its compression to the needs of the client.

This XEP is based on Stream Compression (XEP-0138). When the client connects to the XMPP server, it receives a list of features supported by the server:

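<stream:features>
  <compression xmlns='http://jabber.org/features/compress'>
    <method>zlib</method>
    <method>lzw</method>
    <method>exi</method>
  </compression>
</stream:features>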

Support for EXI compression is detected by the existence of the exi compression method in the features stanza.
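The detection step can be sketched in a few lines of Python. This is only an illustration: the function name supports_exi and the sample FEATURES string are assumptions made for the example, while the feature namespace is the one defined by XEP-0138.

```python
import xml.etree.ElementTree as ET

# Namespace of the stream compression feature, as defined by XEP-0138.
FEATURE_NS = "http://jabber.org/features/compress"


def supports_exi(features_xml: str) -> bool:
    """Return True if the stream features advertise the 'exi' method."""
    root = ET.fromstring(features_xml)
    methods = root.iter(f"{{{FEATURE_NS}}}method")
    return any((m.text or "").strip() == "exi" for m in methods)


# Hypothetical features stanza used only to exercise the function.
FEATURES = """
<stream:features xmlns:stream='http://etherx.jabber.org/streams'>
  <compression xmlns='http://jabber.org/features/compress'>
    <method>zlib</method>
    <method>lzw</method>
    <method>exi</method>
  </compression>
</stream:features>
"""

if __name__ == "__main__":
    print(supports_exi(FEATURES))
```

A client would run such a check once, right after receiving the stream features, and skip the EXI negotiation entirely when the method is absent.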

If the client attempts to activate an EXI stream at this point, before the EXI properties have been negotiated, the server must respond with a setup-failed response.

<compress xmlns='http://jabber.org/protocol/compress'>
  <method>exi</method>
</compress>

When the client decides to activate EXI compression, it sends a setup stanza containing parameter proposals to the server.

The server in turn responds with a setupResponse stanza containing the parameters it can accept, based on the initial values provided by the client. Buffer sizes and similar limits may have been changed, but only lowered, never raised.
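The "only lowered, never raised" rule can be illustrated with a small sketch. The function names and the numeric values are assumptions made for the example; only the parameter names blockSize, valueMaxLength and valuePartitionCapacity come from this document.

```python
from typing import Dict


def negotiate_limits(proposed: Dict[str, int],
                     server_max: Dict[str, int]) -> Dict[str, int]:
    """Server side: accept each proposal, lowered to the server ceiling if needed.

    A proposal is never raised; parameters without a server ceiling pass through.
    """
    accepted = {}
    for name, value in proposed.items():
        ceiling = server_max.get(name)
        accepted[name] = value if ceiling is None else min(value, ceiling)
    return accepted


def only_lowered(proposed: Dict[str, int], accepted: Dict[str, int]) -> bool:
    """Client side: check that no accepted value exceeds what was proposed."""
    return all(accepted.get(name, value) <= value
               for name, value in proposed.items())


if __name__ == "__main__":
    # Hypothetical proposal and server ceilings.
    proposal = {"blockSize": 4096, "valueMaxLength": 64,
                "valuePartitionCapacity": 100}
    ceilings = {"blockSize": 1024, "valuePartitionCapacity": 500}
    answer = negotiate_limits(proposal, ceilings)
    print(answer, only_lowered(proposal, answer))
```

A constrained client can use a check like only_lowered to reject a response that would demand larger buffers than it offered.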

Note: Schema files are identified using three properties: the target namespace, the byte size and the MD5 hash. The MD5 hash provides a way to detect small changes in the file, even if the byte size and namespace are the same.

Schema files that the server does not have, based on namespace, byte size and MD5 hash, are marked with the missingSchema element instead of the normal schema element.
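As a rough sketch, the three identifying properties can be derived from the raw bytes of a schema file as shown below. The names SchemaId, schema_id and missing_schemas are hypothetical helpers introduced for this example, not part of the protocol.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Iterable, List, NamedTuple, Set


class SchemaId(NamedTuple):
    """The three properties that identify a schema file."""
    namespace: str
    byte_size: int
    md5_hash: str


def schema_id(schema_bytes: bytes) -> SchemaId:
    """Derive the identifying triple from the exact bytes of a schema file."""
    root = ET.fromstring(schema_bytes)
    namespace = root.get("targetNamespace", "")
    # hexdigest() returns lowercase hexadecimal digits.
    digest = hashlib.md5(schema_bytes).hexdigest()
    return SchemaId(namespace, len(schema_bytes), digest)


def missing_schemas(requested: Iterable[SchemaId],
                    known: Set[SchemaId]) -> List[SchemaId]:
    """Return the requested schemas that the server does not hold."""
    return [s for s in requested if s not in known]


if __name__ == "__main__":
    # Hypothetical schema; the namespace is made up for the example.
    xsd = (b"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' "
           b"targetNamespace='urn:example:a'/>")
    print(schema_id(xsd))
```

Because the hash is computed over the exact bytes, two files with the same namespace and the same size but different content still produce different identifiers.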

At this point, the client can choose to abort the EXI enablement sequence if it cannot accommodate the parameter settings proposed by the server. The XMPP session will continue to work in its current state.

If the server lacks information about a schema file, this is specified in the response through the missingSchema elements. The client can at this point either accept that these schema files are not available, which makes compression less efficient, or upload the missing schema files to the server. Uploading schema files requires the device to have sufficient buffers and memory to store and upload the schema files in the first place. If uploading the schema files is not possible, consider installing them manually on the server.

To upload a schema file, the client simply sends the schema file using an uploadSchema element, as follows:

PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4NCjx4czpzY2hlbWENCiAgICB4
bWxuczp4cz0naHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEnDQogICAgdGFyZ2V0TmFt
ZXNwYWNlPSd1cm46eG1wcDpzbjpwcm92aXNpb25pbmcnDQogICAgeG1sbnM9J3Vybjp4bXBwOnNu
...
dmlsZWdlJz4NCgkJPHhzOmF0dHJpYnV0ZSBuYW1lPSdpZCcgdHlwZT0nUHJpdmlsZWdlSWQnIHVz
ZT0ncmVxdWlyZWQnLz4NCgk8L3hzOmNvbXBsZXhUeXBlPg0KIA0KPC94czpzY2hlbWE+DQo=

The schema itself is sent to the server using base64 encoding. This ensures that a binary exact copy is transferred, preserving the encoding, processing instructions, and so on. The server then computes the target namespace, byte size and MD5 hash from the schema file it received.
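The binary-exact round trip can be sketched as follows. The helper names encode_schema and decode_schema are assumptions for this example; FIRST_CHUNK is the first base64 line of the upload example above, and decoding it shows that the XML declaration and the CRLF line endings survive untouched.

```python
import base64


def encode_schema(schema_bytes: bytes) -> str:
    """Client side: turn the exact schema bytes into a base64 payload."""
    return base64.b64encode(schema_bytes).decode("ascii")


def decode_schema(payload: str) -> bytes:
    """Server side: recover the exact bytes from the payload.

    The payload may be wrapped over several lines, so whitespace is
    removed before decoding.
    """
    return base64.b64decode("".join(payload.split()))


# First line of the base64 payload shown in the upload example.
FIRST_CHUNK = ("PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4NCjx4"
               "czpzY2hlbWENCiAgICB4")

if __name__ == "__main__":
    print(decode_schema(FIRST_CHUNK))
```

Since the decoded bytes equal the original bytes, the byte size and MD5 hash computed by the server match the values the client computed locally.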

If the client desires, it can test the EXI setup again by sending a new setup stanza. This step is optional, but it can be used to verify that the uploaded schema files and any new property values are accepted by the server.

The server should now respond with agreement. Its response carries the agreement attribute, which the server must set to true if it agrees with the proposal from the client. The client in turn can check this attribute as a quick way to determine whether agreement exists.
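A client-side check of the agreement attribute might look like the sketch below. The element names setupResponse and missingSchema and the attribute name agreement come from this document; the namespace urn:example:exi, the helper names and the sample stanzas are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def has_agreement(response_xml: str) -> bool:
    """Return True if a setupResponse carries agreement='true'."""
    root = ET.fromstring(response_xml)
    return _local(root.tag) == "setupResponse" and root.get("agreement") == "true"


def count_missing_schemas(response_xml: str) -> int:
    """Count the missingSchema children of the response."""
    root = ET.fromstring(response_xml)
    return sum(1 for child in root if _local(child.tag) == "missingSchema")


# Hypothetical responses; 'urn:example:exi' is a placeholder namespace.
AGREED = ("<setupResponse xmlns='urn:example:exi' agreement='true'>"
          "<schema/></setupResponse>")
PENDING = ("<setupResponse xmlns='urn:example:exi'>"
           "<schema/><missingSchema/></setupResponse>")

if __name__ == "__main__":
    print(has_agreement(AGREED), count_missing_schemas(PENDING))
```

Checking the single attribute first lets a constrained client avoid comparing every returned parameter against its proposal.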

As an alternative to uploading a schema file to the server, the client can ask the server to download a schema file by itself. This is done using the downloadSchema command, which gives the URL of the schema.

The server tries to download the schema by itself, and then computes the target namespace, byte size and MD5 Hash from the downloaded schema.

When the schema has been downloaded, a successful download response is returned.

If the download fails, the server returns a response that identifies the kind of failure. A distinct response is returned in each of the following cases:

An HTTP error occurred while trying to download the schema.
The URL could not be resolved.
A timeout occurred during the download attempt.
The URL points to something that is not a schema.

If some other error occurs, unforeseen by this specification, the server can simply respond with a generic error message.
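A server might map download failures to these cases roughly as in the sketch below. The outcome labels (for example "http-error") are placeholders invented for the example, not the names used on the wire, and the fetch callable is injected so the logic can be exercised without network access.

```python
import socket
import urllib.error
import xml.etree.ElementTree as ET
from typing import Callable

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def classify_download(url: str, fetch: Callable[[str], bytes]) -> str:
    """Fetch a schema and return a label for the outcome (labels are hypothetical)."""
    try:
        data = fetch(url)
    except urllib.error.HTTPError:
        return "http-error"
    except OSError as exc:
        # URLError wraps the underlying cause in its 'reason' attribute.
        reason = getattr(exc, "reason", exc)
        if isinstance(reason, socket.gaierror):
            return "unresolved-url"
        if isinstance(reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "other-error"
    except Exception:
        return "other-error"
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "not-a-schema"
    # Only a document whose root element is xs:schema counts as a schema.
    if root.tag != f"{{{XSD_NS}}}schema":
        return "not-a-schema"
    return "success"


if __name__ == "__main__":
    def fake_fetch(url: str) -> bytes:
        return b"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>"

    print(classify_download("http://example.org/schema.xsd", fake_fetch))
```

In a real deployment, fetch could wrap urllib.request.urlopen with a timeout; it is left abstract here so that each failure case can be simulated.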

Note: Downloading a schema might retrieve a version that does not correspond to the desired version of the schema. In this case it is therefore more important that the client checks that the server actually has the version of the schema required by the client.

When EXI option negotiation has been completed, the client can tell the server that it is ready to start compression. It does this using the normal compress stanza, as follows:

<compress xmlns='http://jabber.org/protocol/compress'>
  <method>exi</method>
</compress>

The server, now knowing how the EXI engine should be configured for the current session, responds with an acknowledgement that the compression method has been accepted.

When the client receives acknowledgement that the compression method has been accepted, it restarts the stream, as explained in XEP-0138.

The following segment is taken from the EXI specification. It describes the different EXI options that need to be negotiated before enabling EXI.

The alignment option is used to control the alignment of event codes and content items. The value is one of bit-packed, byte-alignment or pre-compression, of which bit-packed is the default value assumed when the "alignment" element is absent in the EXI Options document. The option values byte-alignment and pre-compression are effected when "byte" and "pre-compress" elements are present in the EXI Options document, respectively. When the value of compression option is set to true, alignment of the EXI Body is governed by the rules specified in 9. EXI Compression instead of the alignment option value. The "alignment" element MUST NOT appear in an EXI options document when the "compression" element is present.

The alignment option value bit-packed indicates that the event codes and associated content are packed in bits without any padding in-between.

The alignment option value byte-alignment indicates that the event codes and associated content are aligned on byte boundaries. While byte-alignment generally results in EXI streams of larger sizes compared with their bit-packed equivalents, byte-alignment may provide a help in some use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in-place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.

The alignment option value pre-compression indicates that all steps involved in compression (see section 9. EXI Compression) are to be done with the exception of the final step of applying the DEFLATE algorithm. The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.

The compression option is a Boolean used to increase compactness using additional computational resources. The default value "false" is assumed when the "compression" element is absent in the EXI Options document whereas its presence denotes the value "true". When set to true, the event codes and associated content are compressed according to 9. EXI Compression regardless of the alignment option value. As mentioned above, the "compression" element MUST NOT appear in an EXI options document when the "alignment" element is present.

The strict option is a Boolean used to increase compactness by using a strict interpretation of the schemas and omitting preservation of certain items, such as comments, processing instructions and namespace prefixes. The default value "false" is assumed when the "strict" element is absent in the EXI Options document whereas its presence denotes the value "true". When set to true, those productions that have NS, CM, PI, ER, and SC terminal symbols are omitted from the EXI grammars, and schema-informed element and type grammars are restricted to only permit items declared in the schemas. A note in section 8.5.4.4.2 Adding Productions when Strict is True describes some additional restrictions consequential of the use of this option. The "strict" element MUST NOT appear in an EXI options document when one of "dtd", "prefixes", "comments", "pis" or "selfContained" element is present in the same options document.

The preserve option is a set of Booleans that can be set independently to each enable or disable a share of the format's capacity determining whether or how certain information items can be preserved in the EXI stream. Section 6.3 Fidelity Options describes the set of information items affected by the preserve option. The presence of "dtd", "prefixes", "lexicalValues", "comments" and "pis" in the EXI Options document each turns on fidelity options Preserve.comments, Preserve.pis, Preserve.dtd, Preserve.prefixes and Preserve.lexicalValues whereas the absence denotes turning each off. The elements "dtd", "prefixes", "comments" and "pis" MUST NOT appear in an EXI options document when the "strict" element is present in the same options document. The element "lexicalValues", on the other hand, is permitted to occur in the presence of "strict" element.

The selfContained option is a Boolean used to enable the use of self-contained elements in the EXI stream. Self-contained elements may be read independently from the rest of the EXI body, allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in an EXI options document when one of "compression", "pre-compression" or "strict" elements are present in the same options document. The default value "false" is assumed when the "selfContained" element is absent from the EXI Options document whereas its presence denotes the value "true".

The datatypeRepresentationMap option specifies an alternate set of datatype representations for typed values in the EXI body as described in 7.4 Datatype Representation Map. When there are no "datatypeRepresentationMap" elements in the EXI Options document, no Datatype Representation Map is used for processing the EXI body. This option does not take effect when the value of the Preserve.lexicalValues fidelity option is true (see 6.3 Fidelity Options), or when the EXI stream is a schema-less EXI stream.

The blockSize option specifies the block size used for EXI compression. When the "blockSize" element is absent in the EXI Options document, the default blocksize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.

The valueMaxLength option specifies the maximum length of value content items to be considered for addition to the string table. The default value "unbounded" is assumed when the "valueMaxLength" element is absent in the EXI Options document.

The valuePartitionCapacity option specifies the maximum number of value content items in the string table at any given time. The default value "unbounded" is assumed when the "valuePartitionCapacity" element is absent in the EXI Options document. Section 7.3.3 Partitions Optimized for Frequent use of String Literals specifies the behavior of the string table when this capacity is reached.
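The MUST NOT rules quoted above can be collected into a small validator. This is a sketch under the assumption that a proposal is held as plain Python values; the function name option_conflicts and its parameters are hypothetical.

```python
from typing import FrozenSet, List, Optional

# Preserve options that must not be combined with strict.
# lexicalValues is deliberately absent: it is permitted together with strict.
STRICT_INCOMPATIBLE = frozenset({"dtd", "prefixes", "comments", "pis"})


def option_conflicts(alignment: Optional[str] = None,
                     compression: bool = False,
                     strict: bool = False,
                     preserve: FrozenSet[str] = frozenset(),
                     self_contained: bool = False) -> List[str]:
    """Return human-readable descriptions of violated option constraints.

    alignment is None when no alignment element is given (bit-packed default).
    """
    problems = []
    if compression and alignment is not None:
        problems.append("alignment and compression are mutually exclusive")
    clash = sorted(set(preserve) & STRICT_INCOMPATIBLE)
    if strict and clash:
        problems.append("strict cannot be combined with preserve options: "
                        + ", ".join(clash))
    if self_contained:
        blockers = []
        if compression:
            blockers.append("compression")
        if alignment == "pre-compression":
            blockers.append("pre-compression")
        if strict:
            blockers.append("strict")
        if blockers:
            problems.append("selfContained cannot be combined with: "
                            + ", ".join(blockers))
    return problems


if __name__ == "__main__":
    print(option_conflicts(strict=True, preserve=frozenset({"comments"})))
```

Running such a check before sending a proposal avoids a negotiation round trip for combinations that no conforming EXI processor could accept.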

To successfully implement a network with clients having limited memory, such as sensor networks, care should be taken to make sure necessary schema files are preinstalled on the server, to avoid the necessity to upload schema files from the clients. Clients with limited memory might be unable to perform this task.

An alternative may be to add a richer client to the network that can upload the schema files to the server dynamically. Any client that uploads a schema file makes that schema file available for EXI compression to every other client in the network.

Schema files uploaded to the server should be cached on the server in some kind of schema repository. If memory is limited on the server, schema files should be sorted by last access. Schema files with the oldest last access timestamp could be removed to maintain the cache within an approved cache size.

Note that schema files have three keys: Target namespace, byte size and MD5 Hash. Multiple versions of a schema file may exist (that is, with the same target namespace but different byte sizes or MD5 hash codes). Note also, that for any practical purpose, schema files can be stored using only the MD5 hash as a key, since it is highly improbable that two different schema files will have the same MD5 hash (unless consciously created that way). MD5 hash values are always in lower case.
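A schema repository with last-access eviction, keyed by the lowercase MD5 hash, could be sketched as follows. The class name SchemaCache and the byte budget are assumptions made for the example, and an in-memory dictionary stands in for whatever storage a server really uses.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class SchemaCache:
    """Keeps schema files keyed by MD5 hash, evicting the least recently used."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0

    def put(self, schema: bytes) -> str:
        """Store a schema and return its lowercase MD5 key."""
        key = hashlib.md5(schema).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # counts as an access
            return key
        self._entries[key] = schema
        self._size += len(schema)
        # Drop the entries with the oldest last access until the cache fits,
        # but always keep the entry that was just added.
        while self._size > self.max_bytes and len(self._entries) > 1:
            _, old = self._entries.popitem(last=False)
            self._size -= len(old)
        return key

    def get(self, md5_hash: str) -> Optional[bytes]:
        """Look up a schema by MD5 hash and mark it as recently used."""
        key = md5_hash.lower()
        schema = self._entries.get(key)
        if schema is not None:
            self._entries.move_to_end(key)
        return schema


if __name__ == "__main__":
    cache = SchemaCache(max_bytes=10)
    key = cache.put(b"aaaa")
    print(key, cache.get(key))
```

Using the ordering of the dictionary as the last-access order keeps eviction cheap, which matters if the server itself runs on limited hardware.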

When the server lacks information about a given XML schema, the client has two options for updating the server. Either it uploads the schema, or it asks the server to download one.

Uploading a schema has the advantage that the client knows exactly which version the server receives. It has the disadvantage that the client needs to store the schema and send a possibly large schema to the server. If EXI is used because the device has limited memory, uploading a schema might not be an option.

Downloading a schema has the advantage that the size of the schema does not matter. The disadvantage is that asynchronous errors might occur, so the client needs to pay attention to the responses returned by the server when downloading schemas. Also, downloading a schema might retrieve a version that does not correspond to the desired version of the schema. In this case it is therefore more important that the client checks that the server actually has the version of the schema required by the client.

If two XMPP clients communicate with each other through an XMPP server, and both clients use EXI compression, the server may forward binary packets unchanged only if both EXI compressed channels have exactly the same setup. If any parameter differs, the server MUST recompress packets sent through it.

Because the server always needs to decompress incoming EXI compressed packets to decode their headers, skipping the recompression step saves the server some processing power, but not all of it. Note that in some networks similar compression settings might be common, while in others differing compression settings are the norm.
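The forwarding decision reduces to an equality test over the negotiated setups, as in the sketch below. The ExiSetup class and its field names are assumptions for the example; the defaults mirror the EXI defaults quoted earlier (bit-packed alignment, no compression, a block size of 1,000,000 and unbounded string table limits, here represented as None).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ExiSetup:
    """Negotiated EXI parameters of one client-to-server channel."""
    alignment: str = "bit-packed"
    compression: bool = False
    strict: bool = False
    preserve: FrozenSet[str] = frozenset()
    self_contained: bool = False
    block_size: int = 1_000_000
    value_max_length: Optional[int] = None          # None means unbounded
    value_partition_capacity: Optional[int] = None  # None means unbounded
    schemas: FrozenSet[str] = frozenset()           # MD5 hashes of the schemas


def can_forward_unchanged(sender: ExiSetup, receiver: ExiSetup) -> bool:
    """True only if every negotiated parameter is identical on both channels."""
    return sender == receiver


if __name__ == "__main__":
    a = ExiSetup(strict=True, schemas=frozenset({"0123abcd"}))
    b = ExiSetup(strict=True, schemas=frozenset({"0123abcd"}))
    c = ExiSetup(strict=True, block_size=1024, schemas=frozenset({"0123abcd"}))
    print(can_forward_unchanged(a, b), can_forward_unchanged(a, c))
```

Comparing the complete setup, including the schema set, is the conservative choice: a single differing parameter means the receiver could not decode the sender's bytes.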

Note that EXI compressed information, even though it is hard for humans to decode, is by no means encrypted. If sensitive data is to be sent over an EXI compressed channel, encryption should be considered as well.

This document requires no interaction with the Internet Assigned Numbers Authority (IANA).

REQUIRED.

Comments are preserved. Must not be used together with the strict option.
Processing instructions are preserved. Must not be used together with the strict option.
DTD is preserved. Must not be used together with the strict option.
Prefixes are preserved. Must not be used together with the strict option.
Lexical form of element and attribute values can be preserved in value content items. Can be used together with the strict option.

Thanks to Joachim Lindborg for all valuable feedback.