<remark><p>Modified cid generation rules to use a hash of the data instead of a UUID (of the form algo+hash@bob.xmpp.org); modified caching rules to typically base checking on the hash, not the sender JID.</p></remark>
<remark><p>Simplified the protocol; removed fetch element because the cid: URI uniquely identifies the data; changed the name of the protocol to something more catchy.</p></remark>
</revision>
<revision>
<version>0.6</version>
<date>2008-08-06</date>
<initials>psa/ps</initials>
<remark><p>More clearly described recommended protocol and usage; added fetch element to diambiguate data from reference; cleaned up text throughout.</p></remark>
<remark><p>Removed alt attribute; more clearly specified where to include the data element in message, presence, and IQ stanzas; moved use cases to other specifications; removed service discovery features; modified examples.</p></remark>
<remark><p>Generalized text regarding inclusion of parameters in type attribute per RFC 2045; added max-age attribute, matching semantics from RFC 2965; added section on caching of data; more clearly specified generation of Content-ID.</p></remark>
<remark><p>Initial published version.</p></remark>
</revision>
<revision>
<version>0.0.4</version>
<date>2008-01-29</date>
<initials>psa</initials>
<remark><p>Separately described service discovery feature for inclusion of the data element in file previews.</p></remark>
</revision>
<revision>
<version>0.0.3</version>
<date>2007-12-27</date>
<initials>psa</initials>
<remark><p>Described use cases for previewing data to be exchanged in file transfers and for inclusion of media information in data forms.</p></remark>
</revision>
<revision>
<version>0.0.2</version>
<date>2007-12-18</date>
<initials>psa</initials>
<remark><p>Changed syntax to not use data: URL scheme; added cid and type attributes; described use cases for messaging and data retrieval.</p></remark>
<p>Sometimes it is desirable to include a small bit of binary data in an XMPP stanza. Typical use cases might be to include icon or emoticon in a message, a thumbnail in a file transfer request, a rasterized image in a whiteboarding session, or a small bit of media in a data form. Currently, there is no lightweight method for including such data in an XMPP stanza, since existing methods (e.g., &xep0047;) are designed for larger blobs of data and therefore require some form of negotiation (e.g., via &xep0096; or &xep0234;).</p>
<p>This document specifies just such a lightweight method. The key building blocks are:</p>
<ol>
<li>A Content-ID ("cid") that uniquely identifies the data.</li>
<li>A <data/> element (similar to the data: URL scheme defined in &rfc2397;) that enables the sender and recipient to exchange the data identified by the cid.</li>
<p>The RECOMMENDED approach is for the sender to include the cid when communicating with the recipient. The recipient SHOULD then check its cache of data to determine if the data identified by that cid is cached. If the data is cached, the recipient would then load its cached data. If the data is not cached, the recipient would then retrieve the data by sending an IQ-get to the sender (or potentially some other entity) containing an empty <data/> element whose 'cid' attribute specifies the data to be retrieved, to which the sender would reply with an IQ-result containing a <data/> element whose XML character data provides the binary data.</p>
<p>The <data/> element MUST be used only to encapsulate small bits of binary data and MUST NOT be used for large data transfers. Naturally the definitions of "small" and "large" are rather loose. In general, the data SHOULD NOT be more than 8 kilobytes, and dedicated file transfer methods (e.g., &xep0065; or &xep0047;) SHOULD be used for exchanging blobs of data larger than 8 kilobytes. However, implementations or deployments MAY impose their own limits.</p>
<p>If the data to be shared is particularly small (e.g., less than 1k), then the sender MAY send it directly by including a <data/> element directly in a &MESSAGE;, &PRESENCE;, or &IQ; stanza. The following rules apply:</p>
<ol>
<li>When the <data/> element is directly included in an XMPP &MESSAGE; or &PRESENCE; stanza, it SHOULD be a first-level child of the stanza.</li>
<li>When the <data/> element is directly included in an XMPP &IQ; stanza, it MUST be a child of the appropriate first-level child (since the IQ stanza must not include more than one first-level child).</li>
<li>When the <data/> element is used to retrieve the data from the sender as described under <linkurl='#retrieving'>Retrieving Uncached Data</link>, it MUST be a first-level child of the stanza.</li>
<p>The sender can refer to data that it hosts by including a cid in the data it sends. The following example shows how to include the cid in &xep0071; but any appropriate format can be used, such as &xep0221;.</p>
<examplecaption='An XHTML-IM message with a cid'><![CDATA[
<p>Data is requested and transferred using the XMPP &IQ; stanza type by making reference to the cid. In particular, the recipient requests the binary data by sending an IQ-get containing an empty <data/> element with a 'cid' attribute that matches the cid URI previously communicated.</p>
<p>It is RECOMMENDED for the recipient to cache data; however, the recipient MAY opt not to cache data, for example because it runs on a device that does not have sufficient space for data storage.</p>
<p>The default behavior is for the recipient to cache the data only for the life of the entity's application session (not a client's presence session with the server or the controlling user's communication session with the contact from whom the user received the data); that is, the recipient would clear the cache when the application is terminated or restarted.</p>
<p>As a hint regarding the suggested period for caching the data, the sender MAY include a 'max-age' attribute whenever it sends a <data/> element. The meaning of the 'max-age' attribute exactly matches that of the Max-Age attribute from <cite>RFC 2965</cite>.</p>
<p>If it is not suggested to cache the data (e.g., because it is ephemeral), the value of the 'max-age' attribute MUST be "0" (the number zero).</p>
<p>A recipient SHOULD cache data based on the hash of the data as encapsulated in the cid. However, if a hash cannot be extracted from the cid, if the recipient does not support the hashing algorithm used, or the recipient does not support hashes, then the recipient SHOULD cache based on the JID of the sender.</p>
<p>To exchange binary data, the data is encapsulated as the XML character data of a <data/> element qualified by the 'urn:xmpp:bob' namespace, where the data MUST be encoded as Base64 in accordance with Section 4 of &rfc4648; (note: the Base64 output MUST NOT include whitespace and MUST set the number of pad bits to zero).</p>
<td>A Content-ID that can be mapped to a cid: URL as specified in &rfc2111;. The 'cid' value SHOULD be of the form algo+hash@bob.xmpp.org, where the "algo" is the hashing algorithm used (e.g., "sha1" for the SHA-1 algorithm as specified in &rfc3174;) and the "hash" is the hex output of the algorithm applied to the binary data itself.</td>
<td>A suggestion regarding how long (in seconds) to cache the data; the meaning matches the Max-Age attribute from &rfc2965;.</td>
<td>RECOMMENDED</td>
</tr>
<tr>
<td>type</td>
<td>The value of the 'type' attribute MUST match the syntax specified in &rfc2045;. That is, the value MUST include a top-level media type, the "/" character, and a subtype; in addition, it MAY include one or more optional parameters (e.g., the "audio/ogg" MIME type in the example shown below includes a "codecs" parameter as specified in &rfc4281;). The "type/subtype" string SHOULD be registered in the &ianamedia;, but MAY be an unregistered or yet-to-be-registered value.</td>
<p>If an entity supports the protocol specified herein, it MUST advertise that fact by returning a feature of "urn:xmpp:bob" in response to &xep0030; information requests.</p>
<p>In order for an application to determine whether an entity supports this protocol, where possible it SHOULD use the dynamic, presence-based profile of service discovery defined in &xep0115;. However, if an application has not received entity capabilities information from an entity, it SHOULD use explicit service discovery instead.</p>
<p>The ability to include arbitrary binary data implies that it is possible to send scripts, applets, images, and executable code, which may be potentially harmful. To reduce the risk of such exposure, an implementation MAY choose to not display or process such data but instead either completely ignore the data, show only the value of the 'alt' attribute, or prompt a human user for approval (either explicitly via user action or implicitly via a list of approved entities from whom the user will accept binary data without per-event approval).</p>