<?xml version='1.0' encoding='UTF-8'?>
<!-- TODO: Add sequence diagrams. -->
<!DOCTYPE xep SYSTEM 'xep.dtd' [
<!ENTITY % ents SYSTEM 'xep.ent'>
%ents;
]>
<?xml-stylesheet type='text/xsl' href='xep.xsl'?>
<xep>
<header>
<title>Efficient XML Interchange (EXI) Format</title>
<abstract>This specification describes how EXI compression can be used in XMPP networks.</abstract>
&LEGALNOTICE;
<number>0322</number>
<status>Experimental</status>
<type>Standards Track</type>
<sig>Standards</sig>
<approver>Council</approver>
<dependencies>
<spec>XMPP Core</spec>
<spec>XEP-0001</spec>
<spec>XEP-0138</spec>
</dependencies>
<supersedes/>
<supersededby/>
<shortname>NOT_YET_ASSIGNED</shortname>
<author>
<firstname>Peter</firstname>
<surname>Waher</surname>
<email>peter.waher@clayster.com</email>
<jid>peter.waher@jabber.org</jid>
<uri>http://se.linkedin.com/pub/peter-waher/1a/71b/a29/</uri>
</author>
<revision>
<version>0.1</version>
<date>2013-04-16</date>
<initials>psa</initials>
<remark><p>Initial published version approved by the XMPP Council.</p></remark>
</revision>
<revision>
<version>0.0.4</version>
<date>2013-03-19</date>
<initials>pwa</initials>
<remark>
<p>Added support for uploading EXI-compressed schema files.</p>
</remark>
</revision>
<revision>
<version>0.0.3</version>
<date>2013-03-15</date>
<initials>pwa</initials>
<remark>
<p>Added definition: EXI body.</p>
<p>Added note regarding preservation of namespace prefixes.</p>
<p>Corrected the language.</p>
</remark>
</revision>
<revision>
<version>0.0.2</version>
<date>2013-03-13</date>
<initials>pwa</initials>
<remark>
<p>Added support for session-wide buffers and string tables.</p>
</remark>
</revision>
<revision>
<version>0.0.1</version>
<date>2013-03-12</date>
<initials>pwa</initials>
<remark>
<p>First draft.</p>
</remark>
</revision>
</header>
<section1 topic='Introduction' anchor='intro'>
<p>
The Efficient XML Interchange (EXI) Format <note>Efficient XML Interchange (EXI) Format &lt;<link url='http://www.w3.org/TR/exi/'>http://www.w3.org/TR/exi/</link>&gt;.</note> is an
efficient way to compress XML documents and XML fragments. This document provides information on how EXI can be used in XMPP streams to efficiently compress data transmitted between
the server and the client. For certain applications (such as applications in sensor networks) EXI is a vital component: it decreases packet size, enabling sensors with limited memory to
communicate efficiently. The strong support in EXI for generating efficient stub code is also vital for building efficient code in constrained devices.
</p>
<p>
Activating EXI compression requires a prior handshake in which the server and client agree on a set of parameters. Some of these parameters may increase the compression ratio
at the cost of processing power and readability. These parameters include:
</p>
<ul>
<li>Schemas to use.</li>
<li>EXI version number.</li>
<li>Data alignment (bit-packed, byte-alignment, pre-compression).</li>
<li>Whether EXI-compressed data should be further compressed using additional compression.</li>
<li>Strict or loose adherence to schemas.</li>
<li>Whether comments, processing instructions, DTDs, prefixes, lexical values, etc. should be preserved.</li>
<li>Whether self-contained elements should be allowed.</li>
<li>Alternate data type representations for typed values.</li>
<li>Block size for EXI compression.</li>
<li>Maximum string length of value content items in string tables.</li>
<li>Value partition capacity.</li>
</ul>
<p>
These parameters will be discussed in greater depth in the following sections. There are also default values that can be used to commence evaluating EXI compression.
</p>
<p>
The single most important property to agree on, however, is the set of schemas to use during EXI compression. EXI compresses XML much more efficiently if schemas exist
describing the format of the expected XML. Since the server cannot be expected to know all possible XML schemas, this document provides a mechanism whereby schemas can be
exchanged, so that the server can adapt its compression to the needs of the client.
</p>
</section1>
<section1 topic='Use Cases' anchor='usecases'>
<section2 topic='Detecting support'>
<p>
This XEP is based on &xep0138;. When the client connects to the XMPP Server, it will receive a list of features supported by the server:
</p>
<example caption='Search Features'>
<![CDATA[
<stream:features>
<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>
<compression xmlns='http://jabber.org/features/compress'>
<method>zlib</method>
<method>lzw</method>
<method>exi</method>
</compression>
</stream:features>]]>
</example>
<p>
Support for EXI compression is detected by the existence of the <strong>exi</strong> compression method in the <strong>features</strong> stanza.
</p>
</section2>
<section2 topic='Invalid setup'>
<p>
If the client attempts to activate an EXI stream at this point, before the negotiation of EXI properties has been performed, the server must respond with a
<strong>setup-failed</strong> response.
</p>
<example caption='Invalid setup'>
<![CDATA[
<compress xmlns='http://jabber.org/protocol/compress'>
<method>exi</method>
</compress>
<failure xmlns='http://jabber.org/protocol/compress'>
<setup-failed/>
</failure>]]>
</example>
</section2>
<section2 topic='Proposing compression parameters'>
<p>
When the client decides to activate EXI compression, it sends a <strong>setup</strong> stanza containing parameter proposals to the server as follows:
</p>
<example caption='Proposing compression parameters'>
<![CDATA[
<setup xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true' blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
<schema ns='urn:xmpp:sn:provisioning' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'/>
</setup>]]>
</example>
<p>
The server in turn responds with a <strong>setupResponse</strong> stanza containing the parameters it can accept, based on the initial values provided by the client.
The server may change values such as buffer sizes, but it may only lower them, never raise them.
</p>
<example caption='Unable to accommodate parameters'>
<![CDATA[
<setupResponse xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true'
blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
<missingSchema ns='urn:xmpp:sn:provisioning' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'/>
</setupResponse>]]>
</example>
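<p>
As a hedged illustration of the lowering rule, the following sketch shows a server that has both proposed schemas but can only offer smaller buffers. The lowered values
(a <strong>blockSize</strong> of 512 and a <strong>valuePartitionCapacity</strong> of 50) are hypothetical and chosen only for illustration; all other values are those proposed by the client in the setup request above.
</p>
<example caption='Server lowering proposed values (illustrative sketch)'>
<![CDATA[
<!-- Hypothetical values: blockSize lowered from 1024 to 512, valuePartitionCapacity from 100 to 50 -->
<setupResponse xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true'
blockSize='512' valueMaxLength='32' valuePartitionCapacity='50'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
<schema ns='urn:xmpp:sn:provisioning' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'/>
</setupResponse>]]>
</example>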
<p>
<strong>Note:</strong> A schema file is identified by three properties: its <strong>target namespace</strong>, its <strong>byte size</strong> and its
<strong>MD5 hash</strong>. The <strong>MD5 hash</strong> provides a way to detect small changes in the file, even if the byte size and namespace are the same.
</p>
<p>
Schema files that the server does not have (based on namespace, byte size and MD5 hash) are marked with the <strong>missingSchema</strong> element instead of the
normal <strong>schema</strong> element.
</p>
<p>
At this point the client can choose to abort the EXI enablement sequence if it cannot accept the parameter settings proposed by the server.
The XMPP session will continue to work in its current state. Aborting requires no further action from the client.
</p>
</section2>
<section2 topic='Uploading new schema files'>
<p>
If the server lacks a schema file, it indicates this in the response using <strong>missingSchema</strong> elements. At this point, the client can
either accept that these schema files are not available, making compression less efficient, or upload the missing schema files to the server. Of course,
uploading schema files requires the device to have sufficient buffers and memory to store and upload the schema files in the first place. (If it is not possible to upload the
schema files, consideration should be given to installing the schema files manually at the server.)
</p>
<p>
To upload a schema file, the client simply sends the schema file using an <strong>uploadSchema</strong> element, as follows:
</p>
<example caption='Uploading schema file'>
<![CDATA[
<uploadSchema xmlns='http://jabber.org/protocol/compress/exi' contentType='Text'>
PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4NCjx4czpzY2hlbWENCiAgICB4
bWxuczp4cz0naHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEnDQogICAgdGFyZ2V0TmFt
ZXNwYWNlPSd1cm46eG1wcDpzbjpwcm92aXNpb25pbmcnDQogICAgeG1sbnM9J3Vybjp4bXBwOnNu
...
dmlsZWdlJz4NCgkJPHhzOmF0dHJpYnV0ZSBuYW1lPSdpZCcgdHlwZT0nUHJpdmlsZWdlSWQnIHVz
ZT0ncmVxdWlyZWQnLz4NCgk8L3hzOmNvbXBsZXhUeXBlPg0KIA0KPC94czpzY2hlbWE+DQo=
</uploadSchema>]]>
</example>
<p>
The schema itself is sent to the server using base64 encoding. This ensures that a binary-exact copy is transferred, preserving encoding, processing instructions, etc. The
server then computes the <strong>target namespace</strong>, <strong>byte size</strong> and <strong>MD5 Hash</strong> from the sent schema file.
</p>
<p>
If the client desires, it can test the EXI setup again. This is optional, but can be used to verify that the uploaded schema files and any new property values
are accepted by the server.
</p>
<example caption='Testing newly uploaded schema files'>
<![CDATA[
<setup xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true' blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
<schema ns='urn:xmpp:sn:provisioning' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'/>
</setup>]]>
</example>
<p>
And the server should then respond:
</p>
<example caption='Agreement between client and server'>
<![CDATA[
<setupResponse xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true'
blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100' agreement='true'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
<schema ns='urn:xmpp:sn:provisioning' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'/>
</setupResponse>]]>
</example>
<p>
Note the <strong>agreement</strong> attribute in the response this time. The server must set this attribute to true if it agrees with the proposal from the client.
The client in turn can check this attribute as a quick way to check if agreement exists.
</p>
</section2>
<section2 topic='Uploading compressed schema files'>
<p>
The <strong>uploadSchema</strong> command has an optional attribute called <strong>contentType</strong> that can be used to send different types of documents
to the server. This is not a MIME content type, but an enumeration with the following options:
</p>
<table caption='contentType values'>
<tr>
<th>Value</th>
<th>Description</th>
</tr>
<tr>
<td>Text</td>
<td>The schema is sent as plain text. If no encoding is provided in the XML header of the schema file, UTF-8 encoding is assumed. This is the default value.</td>
</tr>
<tr>
<td>ExiBody</td>
<td>The schema file is sent as an EXI compressed file, but only the body is sent. *</td>
</tr>
<tr>
<td>ExiDocument</td>
<td>The schema file is sent as an EXI compressed file. The entire file, including the EXI header, is provided. *</td>
</tr>
</table>
<p>
(*) These options assume that the following default EXI options are used. It is assumed that the XMPP server has more capabilities than the client, so the XMPP server must support this
set of options. The schema files can be precompressed and stored as binary files on the client for easier transmission.
</p>
<table caption='Default EXI options'>
<tr>
<th>Option</th>
<th>Default value</th>
</tr>
<tr>
<td>Version</td>
<td>1</td>
</tr>
<tr>
<td>alignment</td>
<td>bit-packed</td>
</tr>
<tr>
<td>compression</td>
<td>false</td>
</tr>
<tr>
<td>strict</td>
<td>false</td>
</tr>
<tr>
<td>fragment</td>
<td>false</td>
</tr>
<tr>
<td>preserve</td>
<td>all false</td>
</tr>
<tr>
<td>selfContained</td>
<td>false</td>
</tr>
<tr>
<td>schemaId</td>
<td>No schema</td>
</tr>
<tr>
<td>datatypeRepresentationMap</td>
<td>No map</td>
</tr>
<tr>
<td>blockSize</td>
<td>1000000 (one million)</td>
</tr>
<tr>
<td>valueMaxLength</td>
<td>unbounded</td>
</tr>
<tr>
<td>valuePartitionCapacity</td>
<td>unbounded</td>
</tr>
</table>
<p>
Since EXI compression does not preserve the exact binary representation of the schema file (for instance, it doesn't preserve white space), the server
cannot correctly compute the byte size and MD5 hash of the file. Therefore, the client needs to provide this information in the <strong>uploadSchema</strong>
command using the <strong>bytes</strong> and <strong>md5Hash</strong> attributes. These attributes are mandatory when EXI-compressed schema files are uploaded to the
server. Also note that the byte size and MD5 hash should be computed on the original XML Schema file, not on the compressed or decompressed version.
</p>
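<p>
The following sketch shows what uploading an EXI-compressed schema might look like. It assumes that the compressed payload is base64-encoded in the same way as a plain-text schema, which this
document does not state explicitly, and it reuses the byte size and MD5 hash given for the <strong>urn:xmpp:sn:provisioning</strong> schema in the earlier examples. The payload itself is elided.
</p>
<example caption='Uploading an EXI-compressed schema file (illustrative sketch)'>
<![CDATA[
<!-- bytes and md5Hash describe the original XML Schema file, not the EXI-compressed payload -->
<uploadSchema xmlns='http://jabber.org/protocol/compress/exi' contentType='ExiBody' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'>
...
</uploadSchema>]]>
</example>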
</section2>
<section2 topic='Downloading new schema files on server'>
<p>
As an alternative to uploading a schema file to the server, the client can ask the server to download a schema file by itself. This is done using the <strong>downloadSchema</strong>
command, as follows:
</p>
<example caption='Downloading new schema files on server'>
<![CDATA[
<downloadSchema xmlns='http://jabber.org/protocol/compress/exi' url='http://schemavault.se/compress/sn/provisioning.xsd'/>]]>
</example>
<p>
The server tries to download the schema by itself, and then computes the <strong>target namespace</strong>, <strong>byte size</strong> and <strong>MD5 Hash</strong>
from the downloaded schema.
</p>
<p>
When the schema has been downloaded, the following successful download response is returned:
</p>
<example caption='Schema successfully downloaded'>
<![CDATA[
<downloadSchemaResponse xmlns='http://jabber.org/protocol/compress/exi' url='http://schemavault.se/compress/sn/provisioning.xsd' result='true'/>]]>
</example>
<p>
If an HTTP error occurred while trying to download the schema, a response as follows is returned:
</p>
<example caption='HTTP Error'>
<![CDATA[
<downloadSchemaResponse xmlns='http://jabber.org/protocol/compress/exi' url='http://schemavault.se/compress/sn/provisioning.xsd' result='false'>
<httpError code='404' message='NotFound'/>
</downloadSchemaResponse>]]>
</example>
<p>
If the URL could not be resolved, the following response is returned:
</p>
<example caption='Invalid URL'>
<![CDATA[
<downloadSchemaResponse xmlns='http://jabber.org/protocol/compress/exi' url='urk://example.com/schema.xsd' result='false'>
<invalidUrl message='Unrecognized scheme.'/>
</downloadSchemaResponse>]]>
</example>
<p>
If a timeout occurred during the download attempt, the following response is returned:
</p>
<example caption='Timeout'>
<![CDATA[
<downloadSchemaResponse xmlns='http://jabber.org/protocol/compress/exi' url='http://schemavault.se/compress/sn/provisioning.xsd' result='false'>
<timeout message='No response returned.'/>
</downloadSchemaResponse>]]>
</example>
<p>
If the URL points to something that is not a schema, the following response is returned:
</p>
<example caption='Invalid Content Type'>
<![CDATA[
<downloadSchemaResponse xmlns='http://jabber.org/protocol/compress/exi' url='http://schemavault.se/compress/sn/provisioning.xsd' result='false'>
<invalidContentType contentTypeReturned='text/html'/>
</downloadSchemaResponse>]]>
</example>
<p>
If an error occurs that is unforeseen by this specification, the server can simply respond with a generic error message, as follows:
</p>
<example caption='Other types of errors'>
<![CDATA[
<downloadSchemaResponse xmlns='http://jabber.org/protocol/compress/exi' url='http://schemavault.se/compress/sn/provisioning.xsd' result='false'>
<error message='No free space left.'/>
</downloadSchemaResponse>]]>
</example>
<p>
<strong>Note:</strong> The server might download a version of the schema that does not correspond to the version the client needs.
It is therefore more important in this case that the client checks that the server actually has the version of the schema required by the client.
</p>
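<p>
One way for the client to perform this check, sketched here for the hypothetical case where the downloaded file differs from the version the client needs, is to repeat the <strong>setup</strong>
request after a successful download and inspect the response. If the server still reports the schema using a <strong>missingSchema</strong> element, the downloaded version does not match the byte size
and MD5 hash required by the client.
</p>
<example caption='Checking the downloaded schema version (illustrative sketch)'>
<![CDATA[
<setup xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true' blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
<schema ns='urn:xmpp:sn:provisioning' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'/>
</setup>
<!-- Hypothetical outcome: the server downloaded a different version, so the required one is still missing -->
<setupResponse xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true'
blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
<missingSchema ns='urn:xmpp:sn:provisioning' bytes='6303' md5Hash='e5301add51f3b24c15a71256b53daa47'/>
</setupResponse>]]>
</example>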
</section2>
<section2 topic='Start compression'>
<p>
When EXI option negotiation has been completed, the client can tell the server that it is ready to start compression. It does this using the normal <strong>compress</strong>
stanza, as follows:
</p>
<example>
<![CDATA[
<compress xmlns='http://jabber.org/protocol/compress'>
<method>exi</method>
</compress>]]>
</example>
<p>
The server now has the necessary knowledge on how the EXI engine should be configured for the current session and it responds as follows:
</p>
<example caption='Compression accepted'>
<![CDATA[
<compressed xmlns='http://jabber.org/protocol/compress'/>]]>
</example>
<p>
When the client receives acknowledgement that the compression method has been accepted, it restarts the stream, as explained in
<link url='http://xmpp.org/extensions/xep-0138.html#usecase'>XEP 0138</link>, except that it <strong>must not</strong> resend the <strong>&lt;stream&gt;</strong>
start element sequence. Similarly, the client must not send a <strong>&lt;/stream&gt;</strong> element when closing the session. Closing the connection is sufficient.
</p>
</section2>
</section1>
<section1 topic='Implementation Notes' anchor='impl'>
<section2 topic='EXI options'>
<p>
The following segment is taken from the <link url='http://www.w3.org/TR/exi/#options'>EXI specification</link>. It describes the different EXI options that need to be negotiated before enabling EXI.
</p>
<p>
The <strong>alignment option</strong> is used to control the alignment of event codes and content items. The value is one of bit-packed, byte-alignment or pre-compression, of which bit-packed is the default value assumed when the "alignment" element is absent in the EXI Options document. The option values byte-alignment and pre-compression are effected when "byte" and "pre-compress" elements are present in the EXI Options document, respectively. When the value of compression option is set to true, alignment of the EXI Body is governed by the rules specified in <link url='http://www.w3.org/TR/exi/#compression'>9. EXI Compression</link> instead of the alignment option value. The "alignment" element MUST NOT appear in an EXI options document when the "compression" element is present.
</p>
<p>
The alignment option value <strong>bit-packed</strong> indicates that the event codes and associated content are packed in bits without any padding in-between.
</p>
<p>
The alignment option value <strong>byte-alignment</strong> indicates that the event codes and associated content are aligned on byte boundaries. While byte-alignment generally results in EXI streams of larger sizes compared with their bit-packed equivalents, byte-alignment may provide a help in some use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in-place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.
</p>
<p>
The alignment option value <strong>pre-compression</strong> indicates that all steps involved in compression (see section <link url='http://www.w3.org/TR/exi/#compression'>9. EXI Compression</link>) are to be done with the exception of the final step of applying the DEFLATE algorithm. The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.
</p>
<p>
The <strong>compression option</strong> is a Boolean used to increase compactness using additional computational resources. The default value "false" is assumed when the "compression" element is absent in the EXI Options document whereas its presence denotes the value "true". When set to true, the event codes and associated content are compressed according to <link url='http://www.w3.org/TR/exi/#compression'>9. EXI Compression</link> regardless of the alignment option value. As mentioned above, the "compression" element MUST NOT appear in an EXI options document when the "alignment" element is present.
</p>
<p>
The <strong>strict option</strong> is a Boolean used to increase compactness by using a strict interpretation of the schemas and omitting preservation of certain items, such as comments, processing instructions and namespace prefixes. The default value "false" is assumed when the "strict" element is absent in the EXI Options document whereas its presence denotes the value "true". When set to true, those productions that have NS, CM, PI, ER, and SC terminal symbols are omitted from the EXI grammars, and schema-informed element and type grammars are restricted to only permit items declared in the schemas. A note in section <link url='http://www.w3.org/TR/exi/#addingProductionsStrict'>8.5.4.4.2 Adding Productions when Strict is True</link> describes some additional restrictions consequential of the use of this option. The "strict" element MUST NOT appear in an EXI options document when one of "dtd", "prefixes", "comments", "pis" or "selfContained" element is present in the same options document.
</p>
<p>
The <strong>preserve option</strong> is a set of Booleans that can be set independently to each enable or disable a share of the format's capacity determining whether or how certain information items can be preserved in the EXI stream. Section <link url='http://www.w3.org/TR/exi/#fidelityOptions'>6.3 Fidelity Options</link> describes the set of information items affected by the preserve option. The presence of "dtd", "prefixes", "lexicalValues", "comments" and "pis" in the EXI Options document each turns on fidelity options Preserve.comments, Preserve.pis, Preserve.dtd, Preserve.prefixes and Preserve.lexicalValues whereas the absence denotes turning each off. The elements "dtd", "prefixes", "comments" and "pis" MUST NOT appear in an EXI options document when the "strict" element is present in the same options document. The element "lexicalValues", on the other hand, is permitted to occur in the presence of "strict" element.
</p>
<p>
The <strong>selfContained option</strong> is a Boolean used to enable the use of self-contained elements in the EXI stream. Self-contained elements may be read independently from the rest of the EXI body, allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in an EXI options document when one of "compression", "pre-compression" or "strict" elements are present in the same options document. The default value "false" is assumed when the "selfContained" element is absent from the EXI Options document whereas its presence denotes the value "true".
</p>
<p>
The <strong>datatypeRepresentationMap option</strong> specifies an alternate set of datatype representations for typed values in the EXI body as described in <link url='http://www.w3.org/TR/exi/#datatypeRepresentationMap'>7.4 Datatype Representation Map</link>. When there are no "datatypeRepresentationMap" elements in the EXI Options document, no Datatype Representation Map is used for processing the EXI body. This option does not take effect when the value of the Preserve.lexicalValues fidelity option is true (see <link url='http://www.w3.org/TR/exi/#fidelityOptions'>6.3 Fidelity Options</link>), or when the EXI stream is a schema-less EXI stream.
</p>
<p>
The <strong>blockSize option</strong> specifies the block size used for EXI compression. When the "blockSize" element is absent in the EXI Options document, the default blocksize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.
</p>
<p>
The <strong>valueMaxLength option</strong> specifies the maximum length of value content items to be considered for addition to the string table. The default value "unbounded" is assumed when the "valueMaxLength" element is absent in the EXI Options document.
</p>
<p>
The <strong>valuePartitionCapacity option</strong> specifies the maximum number of value content items in the string table at any given time. The default value "unbounded" is assumed when the "valuePartitionCapacity" element is absent in the EXI Options document. Section <link url='http://www.w3.org/TR/exi/#encodingOptimizedForMisses'>7.3.3 Partitions Optimized for Frequent use of String Literals</link> specifies the behavior of the string table when this capacity is reached.
</p>
<p>
The <strong>sessionWideBuffers</strong> option controls the lifetime of buffers and string tables. If set to true, all buffers, string tables, etc. will be maintained during the entire session.
This may improve performance over time, since strings already present in the tables can be omitted from the compressed binary stream, but it might also in some cases degrade performance, since more entries are
available in the tables, requiring more bits to encode strings. The default value is false, meaning that buffers, string tables, etc., are cleared between each stanza. (This option
is EXI/XMPP specific.)
</p>
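<p>
As a sketch, a client that wants buffers and string tables to persist for the whole session might propose the option as follows. This assumes that <strong>sessionWideBuffers</strong> is negotiated as
an attribute of the <strong>setup</strong> element alongside the other options; that placement is an assumption, since none of the examples in this document show it.
</p>
<example caption='Proposing session-wide buffers (illustrative sketch)'>
<![CDATA[
<!-- Assumption: sessionWideBuffers is carried as an attribute of setup, like blockSize -->
<setup xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='true' blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100'
sessionWideBuffers='true'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
</setup>]]>
</example>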
</section2>
<section2 topic='Transmission of EXI bodies and Session-wide Buffers'>
<p>
The transmission of EXI-compressed stanzas takes the form of a sequence of EXI bodies. In order for the recipient to be able to correctly interpret these incoming
EXI bodies, the sender is required to flush any pending bits after the End Document (ED) event that ends each stanza and then send any pending bytes available
in the output buffer. Since this ensures that each EXI body starts on a byte boundary, it permits the recipient to decompress the body into an XML stanza.
</p>
<p>
Therefore, each stanza sent on the stream must be compressed separately, reusing the same options as used by the stream.
(Options are not sent on the stream, only the generated EXI bodies.)
</p>
<p>
Compression of the stanza must be done in document mode, not fragment mode, including the Start Document (SD) and End Document (ED) events.
If there are unwritten bits pending after the End Document (ED) event (after the end of the stanza), zero bits are written until a byte boundary is reached.
The recipient must ignore any bits in the last byte that follow the End Document event.
</p>
<p>
During setup of the EXI compression engine, the client can choose if buffers are to be reused between stanzas, or cleared between each stanza. This is done
using the EXI over XMPP specific option <strong>sessionWideBuffers</strong>, which is false by default, meaning buffers and string tables are cleared between
each stanza.
</p>
<p>
There may be cases where maintaining buffers and string tables throughout the session is preferable. Since strings seen in earlier stanzas are already available in the buffers,
they don't need to be output in the stream again the first time they appear in a later stanza. However, the number of strings in the tables increases, and so does the number of bits required to
encode them. Depending on the type of communication performed, either setting might give better results. If the same type of message is always
sent, maintaining string buffers may be more efficient. But if the client sends many different types of messages, clearing buffers may be more efficient.
</p>
<p>
Note that the stream of EXI bodies is indefinite. It only stops when the session is closed, i.e. when the socket connection is dropped. Therefore, the buffers can grow
indefinitely unless control is maintained on what types of messages are sent, their contents (specifically string values), and to whom they are sent (JIDs being strings).
All string tables and buffers must be cleared when a connection is lost.
</p>
<p>
Note also that if it must be possible to join a session in the middle of the flow and listen to the communication, tables and buffers need to be cleared between each
stanza; otherwise the listener will not be able to decode the binary stream correctly.
</p>
</section2>
<section2 topic='Preserving prefixes'>
<p>
Normally, prefixes are not preserved during EXI compression and decompression. If the communicating parties (sending client, XMPP server(s) and receiving clients)
interpret incoming stanzas and content according to namespace, this should be sufficient. However, some implementations do not check namespaces but rely on the prefix names used.
In such cases, all communicating parties are required to enable the preserve prefixes option during negotiation.
</p>
<p>
<strong>Note:</strong> It is not sufficient that one party enables this option. Both sender and receiver are required to enable this option, or prefix names will be
lost in the transmission.
</p>
<p>
Note also that preserving prefix names results in less efficient compression. Therefore, all clients implementing EXI compression should strive to parse incoming
XML based on namespace, not prefix name.
</p>
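<p>
A proposal that enables prefix preservation might look like the following sketch. The attribute name <strong>preservePrefixes</strong> is an assumption made for illustration and does not appear in the
examples in this document. The <strong>strict</strong> option is set to false because, as described under EXI options, "prefixes" must not be combined with "strict".
</p>
<example caption='Proposing preservation of prefixes (illustrative sketch)'>
<![CDATA[
<!-- Assumption: preservePrefixes is a hypothetical attribute name for the Preserve.prefixes fidelity option -->
<setup xmlns='http://jabber.org/protocol/compress/exi' version='1' strict='false' preservePrefixes='true'
blockSize='1024' valueMaxLength='32' valuePartitionCapacity='100'>
<schema ns='urn:xmpp:sn' bytes='8092' md5Hash='18829242ca7a72a552a7e15af5b9e44d'/>
</setup>]]>
</example>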
</section2>
<section2 topic='Networks containing clients having limited memory'>
<p>
To successfully implement a network with clients having limited memory, such as sensor networks, care should be taken to make sure necessary schema files are
preinstalled on the server, to avoid the necessity to upload schema files from the clients. Clients with limited memory might be unable to perform this task.
</p>
<p>
An alternative may be to add a richer client to the network, one that can upload the schema files to the server dynamically. Any client uploading
a schema file makes that schema file available for EXI compression to any other client in the network.
</p>
</section2>
<section2 topic='Caching schema files'>
<p>
Schema files uploaded to the server should be cached on the server in some kind of schema repository. If memory is limited on the server, schema files should be
sorted by last access. Schema files with the oldest last access timestamp could be removed to maintain the cache within an approved cache size.
</p>
<p>
Note that schema files have three keys: <strong>Target namespace</strong>, <strong>byte size</strong> and <strong>MD5 Hash</strong>. Multiple versions of a schema file
may exist (that is, with the same target namespace but different byte sizes or MD5 hash codes). Note also, that for any practical purpose, schema files can be stored
using only the MD5 hash as a key, since it is highly improbable that two different schema files will have the same MD5 hash (unless consciously created that way). MD5 hash
values are always in <strong>lower case</strong>.
</p>
</section2>
<section2 topic='Uploading vs. Downloading schemas'>
<p>
When the server lacks information about a given XML schema, the client has two options for updating the server. Either it uploads the schema, or it asks the server to
download one.
</p>
<p>
Uploading a schema has the advantage that the client knows exactly which version the server receives. It has the disadvantage that the client needs to store the schema
and send a possibly large schema to the server. If EXI is used because the device has limited memory, uploading a schema might not be an option.
</p>
<p>
Downloading a schema has the advantage that the size of the schema does not matter. The disadvantage is that asynchronous errors might occur, so the client needs to pay attention
to the responses returned by the server when downloading schemas. Also, the server might download a version that does not correspond to the desired version
of the schema. So it is more important in this case that the client checks that the server actually has the version of the schema required by the client.
</p>
</section2>
<section2 topic='Server decompression and recompression vs. binary forwarding'>
<p>
If two XMPP clients communicate with each other through an XMPP server, and both clients use EXI compression, the server may forward
binary packets unchanged only if both EXI-compressed channels have exactly the same setup. If any parameter is different, the server MUST always recompress
packets sent through it.
</p>
<p>
Since the server always needs to decompress incoming EXI-compressed packets to decode headers, omitting the recompression step might save the server
some processing power, but not all of it. Note that in some networks it might be common to use similar compression settings, while in others different compression
settings are more common.
</p>
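<p>
As a sketch of the comparison, consider the option sets negotiated on two channels, written here as &lt;setup/&gt; elements. The attribute values and the schema reference are
illustrative assumptions. The two sets differ only in the blockSize attribute, which is already enough to rule out binary forwarding and require recompression.
</p>
<example caption='Two channel setups that differ in one parameter (illustrative values)'>
<![CDATA[
<!-- Channel between client A and the server -->
<setup xmlns='http://jabber.org/protocol/compress/exi'
       compression='true' strict='true' blockSize='1024'>
  <schema ns='urn:example:sensors' bytes='4096' md5Hash='0123456789abcdef0123456789abcdef'/>
</setup>

<!-- Channel between the server and client B: same schema, different blockSize -->
<setup xmlns='http://jabber.org/protocol/compress/exi'
       compression='true' strict='true' blockSize='4096'>
  <schema ns='urn:example:sensors' bytes='4096' md5Hash='0123456789abcdef0123456789abcdef'/>
</setup>]]>
</example>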
</section2>
</section1>
<section1 topic='Security Considerations' anchor='security'>
<p>
Note that EXI-compressed information, even though it is hard for humans to decode, is by no means encrypted. If sensitive data is to be sent over an EXI-compressed
channel, encryption should be considered as well.
</p>
</section1>
<section1 topic='IANA Considerations' anchor='iana'>
<p>This document requires no interaction with &IANA;.</p>
</section1>
<section1 topic='XMPP Registrar Considerations' anchor='registrar'>
<p>REQUIRED.</p>
<!-- TODO -->
</section1>
<section1 topic='XML Schema' anchor='schema'>
<code>
<![CDATA[
<?xml version='1.0' encoding='UTF-8'?>
<xs:schema
xmlns:xs='http://www.w3.org/2001/XMLSchema'
targetNamespace='http://jabber.org/protocol/compress/exi'
xmlns='http://jabber.org/protocol/compress/exi'
elementFormDefault='qualified'>
<xs:element name='setup' type='Setup'/>
<xs:element name='setupResponse' type='SetupResponse'/>
<xs:complexType name='Setup'>
<xs:choice minOccurs='0' maxOccurs='unbounded'>
<xs:element name='schema' type='Schema'/>
<xs:element name='datatypeRepresentationMap' type='DatatypeRepresentationMap'/>
</xs:choice>
<xs:attributeGroup ref='Options'/>
</xs:complexType>
<xs:complexType name='SetupResponse'>
<xs:complexContent>
<xs:extension base='Setup'>
<xs:choice minOccurs='0' maxOccurs='unbounded'>
<xs:element name='missingSchema' type='Schema'/>
</xs:choice>
<xs:attribute name='agreement' type='xs:boolean' use='optional' default='false'/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name='Schema'>
<xs:attribute name='ns' type='xs:string' use='required'/>
<xs:attribute name='bytes' type='xs:positiveInteger' use='required'/>
<xs:attribute name='md5Hash' type='MD5Hash' use='required'/>
</xs:complexType>
<xs:complexType name='DatatypeRepresentationMap'>
<xs:attribute name='type' type='xs:string' use='required'/>
<xs:attribute name='representAs' type='xs:string' use='required'/>
</xs:complexType>
<xs:attributeGroup name='Options'>
<xs:attribute name='version' type='xs:positiveInteger' use='optional' default='1'/>
<xs:attribute name='alignment' type='Alignment' use='optional' default='bit-packed'>
<xs:annotation>
<xs:documentation>The alignment option is used to control the alignment of event codes and content items.
The value is one of bit-packed, byte-alignment or pre-compression, of which bit-packed is the default value
assumed when the "alignment" element is absent in the EXI Options document. The option values byte-alignment
and pre-compression are effected when "byte" and "pre-compress" elements are present in the EXI Options
document, respectively. When the value of compression option is set to true, alignment of the EXI Body is
governed by the rules specified in 9. EXI Compression instead of the alignment option value. The "alignment"
element MUST NOT appear in an EXI options document when the "compression" element is present.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='compression' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>The compression option is a Boolean used to increase compactness using additional
computational resources. The default value "false" is assumed when the "compression" element is absent in
the EXI Options document whereas its presence denotes the value "true". When set to true, the event codes
and associated content are compressed according to 9. EXI Compression regardless of the alignment option
value. As mentioned above, the "compression" element MUST NOT appear in an EXI options document when the
"alignment" element is present.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='strict' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>The strict option is a Boolean used to increase compactness by using a strict
interpretation of the schemas and omitting preservation of certain items, such as comments, processing
instructions and namespace prefixes. The default value "false" is assumed when the "strict" element is
absent in the EXI Options document whereas its presence denotes the value "true". When set to true, those
productions that have NS, CM, PI, ER, and SC terminal symbols are omitted from the EXI grammars, and
schema-informed element and type grammars are restricted to only permit items declared in the schemas.
A note in section 8.5.4.4.2 Adding Productions when Strict is True describes some additional restrictions
consequential of the use of this option. The "strict" element MUST NOT appear in an EXI options document
when one of "dtd", "prefixes", "comments", "pis" or "selfContained" element is present in the same
options document.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='preserveComments' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>Comments are preserved. Must not be used together with the strict option.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='preservePIs' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>Processing instructions are preserved. Must not be used together with the strict
option.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='preserveDTD' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>DTD is preserved. Must not be used together with the strict option.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='preservePrefixes' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>Prefixes are preserved. Must not be used together with the strict option.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='preserveLexical' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>Lexical form of element and attribute values can be preserved in value content items.
Can be used together with the strict option.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='selfContained' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>The selfContained option is a Boolean used to enable the use of self-contained elements
in the EXI stream. Self-contained elements may be read independently from the rest of the EXI body,
allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in an EXI
options document when one of "compression", "pre-compression" or "strict" elements are present in the
same options document. The default value "false" is assumed when the "selfContained" element is absent
from the EXI Options document whereas its presence denotes the value "true".</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='blockSize' type='xs:positiveInteger' use='optional' default='1000000'>
<xs:annotation>
<xs:documentation>The blockSize option specifies the block size used for EXI compression. When the
"blockSize" element is absent in the EXI Options document, the default blocksize of 1,000,000 is used.
The default blockSize is intentionally large but can be reduced for processing large documents on
devices with limited memory.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='valueMaxLength' type='xs:positiveInteger' use='optional'>
<xs:annotation>
<xs:documentation>The valueMaxLength option specifies the maximum length of value content items to be
considered for addition to the string table. The default value "unbounded" is assumed when the
"valueMaxLength" element is absent in the EXI Options document.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='valuePartitionCapacity' type='xs:positiveInteger' use='optional'>
<xs:annotation>
<xs:documentation>The valuePartitionCapacity option specifies the maximum number of value content
items in the string table at any given time. The default value "unbounded" is assumed when the
"valuePartitionCapacity" element is absent in the EXI Options document. Section 7.3.3 Partitions
Optimized for Frequent use of String Literals specifies the behavior of the string table when this
capacity is reached.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name='sessionWideBuffers' type='xs:boolean' use='optional' default='false'>
<xs:annotation>
<xs:documentation>If set to true, all buffers, string tables, etc. will be maintained during the
entire session. This may improve performance over time since strings can be omitted in the
compressed binary stream, but it might also in some cases degrade performance since more options
are available in the tables, requiring more bits to encode strings. The default value is false,
meaning that buffers, string tables, etc., are cleared between each stanza.</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:attributeGroup>
<xs:simpleType name='MD5Hash'>
<xs:restriction base='xs:string'>
<xs:pattern value='[0-9a-f]{32}'/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name='Alignment'>
<xs:restriction base='xs:string'>
<xs:enumeration value='bit-packed'>
<xs:annotation>
<xs:documentation>The alignment option value bit-packed indicates that the event codes and associated
content are packed in bits without any padding in-between.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value='byte-alignment'>
<xs:annotation>
<xs:documentation>The alignment option value byte-alignment indicates that the event codes and
associated content are aligned on byte boundaries. While byte-alignment generally results in EXI
streams of larger sizes compared with their bit-packed equivalents, byte-alignment may provide a
help in some use cases that involve frequent copying of large arrays of scalar data directly out
of the stream. It can also make it possible to work with data in-place and can make it easier to
debug encoded data by allowing items on aligned boundaries to be easily located in the stream.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value='pre-compression'>
<xs:annotation>
<xs:documentation>The alignment option value pre-compression indicates that all steps involved
in compression (see section 9. EXI Compression) are to be done with the exception of the final
step of applying the DEFLATE algorithm. The primary use case of pre-compression is to avoid a
duplicate compression step when compression capability is built into the transport protocol. In
this case, pre-compression just prepares the stream for later compression.</xs:documentation>
</xs:annotation>
</xs:enumeration>
</xs:restriction>
</xs:simpleType>
<xs:element name='uploadSchema'>
<xs:complexType>
<xs:simpleContent>
<xs:extension base='xs:base64Binary'>
<xs:attribute name='contentType' type='ContentType' use='optional' default='Text'/>
<xs:attribute name='bytes' type='xs:positiveInteger' use='optional'/>
<xs:attribute name='md5Hash' type='MD5Hash' use='optional'/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:simpleType name='ContentType'>
<xs:restriction base='xs:string'>
<xs:enumeration value='Text'/>
<xs:enumeration value='ExiBody'/>
<xs:enumeration value='ExiDocument'/>
</xs:restriction>
</xs:simpleType>
<xs:element name='downloadSchema' type='DownloadSchema'/>
<xs:element name='downloadSchemaResponse' type='DownloadSchemaResponse'/>
<xs:complexType name='DownloadSchema'>
<xs:attribute name='url' type='xs:string' use='required'/>
</xs:complexType>
<xs:complexType name='DownloadSchemaResponse'>
<xs:complexContent>
<xs:extension base='DownloadSchema'>
<xs:choice minOccurs='0' maxOccurs='1'>
<xs:element name='httpError'>
<xs:complexType>
<xs:attribute name='code' type='xs:positiveInteger' use='required'/>
<xs:attribute name='message' type='xs:string' use='required'/>
</xs:complexType>
</xs:element>
<xs:element name='invalidUrl'>
<xs:complexType>
<xs:attribute name='message' type='xs:string' use='required'/>
</xs:complexType>
</xs:element>
<xs:element name='timeout'>
<xs:complexType>
<xs:attribute name='message' type='xs:string' use='required'/>
</xs:complexType>
</xs:element>
<xs:element name='invalidContentType'>
<xs:complexType>
<xs:attribute name='contentTypeReturned' type='xs:string' use='required'/>
</xs:complexType>
</xs:element>
<xs:element name='error'>
<xs:complexType>
<xs:attribute name='message' type='xs:string' use='required'/>
</xs:complexType>
</xs:element>
</xs:choice>
<xs:attribute name='result' type='xs:boolean' use='required'/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:schema>
]]>
</code>
</section1>
<section1 topic='Acknowledgements' anchor='ack'>
<p>Thanks to Joachim Lindborg, Yusuke Doi, Takuki Kamiya, Tina Beckman, Karin Forsell, Jeff Freund and Rumen Kyusakov for their valuable feedback.</p>
</section1>
</xep>