<title>Stateless Inline Media Sharing (SIMS)</title>
<abstract>This specification describes a protocol for stateless asynchronous media sharing with integrity and transport flexibility. It allows clients to provide a good interoperable user experience in combination with Carbons and MAM.</abstract>
<p>File sharing in XMPP has mainly been addressed by synchronous solutions like &xep0096; and &xep0234;.
However, these extensions only address the transfer of files and there is more to file sharing than the simple transfer of the data.</p>
<p>Extensions that go beyond the simple transfer of data are &xep0329; and &xep0363;. XEP-0329 allows sharing folder structures with other users, allowing them to browse the remote folder and fetch interesting files using existing file-transfer protocols. XEP-0363 describes a protocol to ask a server component for an HTTP storage URL to which a client can upload a file via HTTP PUT, and then share the public URL with other users. While this provides some form of asynchronous file sharing, it does not provide integrity protection and requires a server component.</p>
<p>This proposal aims to provide a protocol that will enable XMPP clients to implement a great user experience (UX) around the process of sharing media in conversations. Shared media can take any form of static media like photos, videos, documents, compressed archives, etc.
This is directly reflected in the requirements of this extension outlined in the following sections.</p>
<p>The state of sharing media with chat partners in the XMPP community is a protocol zoo in 2016. There are three major protocols for sharing media in XMPP.</p>
<p>&xep0231; is designed for small media, i.e. less than 8 KB in size, that is hosted server-side and transferred Base64 encoded in-band of an existing XMPP stream. Example use-cases are custom emoticons that are referenced in &xep0071; img-tags, or thumbnails for &xep0234;.</p>
<p>&xep0234; describes a peer-to-peer protocol for synchronous file-transfer between two XMPP entities. It attempts a direct transmission, followed by a proxied transmission, via &xep0260;. If neither works it will fall back to &xep0261;, which transfers the data in-band over the existing XMPP stream.</p>
<p>&xep0363; was designed as a simpler-to-implement alternative to &xep0231;. This is achieved by reusing the HTTP APIs in today's mobile and language SDKs. It requires a server component from which clients can request HTTP URLs to upload data to, and then share the corresponding download URL as part of plain text in a conversation.</p>
</section2>
<section2 topic='Comparison of File Transfer Protocols'>
<table caption='Comparison of File Transfer Protocols in 2016'>
<p>The file element is the same as in &xep0234;. It MUST specify media-type, size, description, and one or more hash elements as described in &xep0300;.
The hash elements are essential as they provide end-to-end file integrity and allow efficient caching and flexible retrieval methods.</p>
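<p>As an illustration of how a sending client might produce the values for the hash elements, the following is a minimal sketch rather than a normative procedure. It assumes Base64-encoded digests and an illustrative choice of two algorithms; the names ALGOS and compute_hashes are hypothetical and do not appear in any XEP.</p>

```python
import base64
import hashlib
import io

# 'algo' attribute values mapped to hashlib names.
# The selection of algorithms is an assumption made for this sketch.
ALGOS = {"sha-256": "sha256", "sha3-256": "sha3_256"}


def compute_hashes(stream, chunk_size=64 * 1024):
    """Read a binary stream once and return {algo: Base64 digest}."""
    hashers = {algo: hashlib.new(name) for algo, name in ALGOS.items()}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Feed the same chunk to every hasher so the file is read only once.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {
        algo: base64.b64encode(hasher.digest()).decode("ascii")
        for algo, hasher in hashers.items()
    }


# Usage with an in-memory stream; a real client would open the media file.
hashes = compute_hashes(io.BytesIO(b"Hello World!"))
```

<p>Computing more than one digest in a single pass keeps the cost low for large files while still offering receivers a choice of algorithms.</p>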
</section2>
<section2 topic='Receiving a shared photo'>
<p>On receipt of a reference to a &lt;media-sharing&gt; element inside a message, a client SHOULD check its local storage for media matching any of the provided hashes. If the media has already
been retrieved and is available, no transfer needs to be initiated and the image can be displayed inline in the chat.</p>
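<p>The local lookup could be sketched as follows. This is only an illustration under the assumption that the client keeps an index keyed by algorithm and digest; the class MediaCache, its methods, and the sample digests and path are hypothetical.</p>

```python
class MediaCache:
    """In-memory index from (algo, digest) to a local file path."""

    def __init__(self):
        self._index = {}

    def add(self, hashes, path):
        # Index the file under every hash so that any one of them can hit.
        for algo, digest in hashes.items():
            self._index[(algo, digest)] = path

    def lookup(self, hashes):
        # Return the path for the first advertised hash we know, else None.
        for algo, digest in hashes.items():
            path = self._index.get((algo, digest))
            if path is not None:
                return path
        return None


cache = MediaCache()
# Placeholder digests and path, for illustration only.
cache.add({"sha-256": "AAAA", "sha3-256": "BBBB"}, "/tmp/summit.jpg")
hit = cache.lookup({"sha3-256": "BBBB"})   # file already stored locally
miss = cache.lookup({"sha-256": "CCCC"})   # unknown media, must be fetched
```

<p>Indexing under every advertised hash means a later message that lists only a subset of the algorithms can still be served from local storage.</p>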
<p>If the media file is not available locally, it can be obtained from one of the references in the &lt;sources&gt; element. If a client supports HTTP downloads, it can simply download HTTP references.</p>
<p>If not, it can fetch the media file via a &xep0358; URI reference in the sources and initiate a Jingle File-Transfer. If the client does not support &xep0358;, it can attempt fetching the media file via &xep0234; by using the hash elements in the file element as described in <link url='http://xmpp.org/extensions/xep-0234.html#requesting'>Jingle File-Transfer</link>.</p>
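<p>The fallback order described above might be expressed as in the sketch below. It assumes that HTTP sources can be recognised by an http or https scheme and Jingle publication references by an xmpp scheme; the function plan_retrieval, its return labels, and the sample URIs are hypothetical.</p>

```python
from urllib.parse import urlsplit


def plan_retrieval(source_uris, supports_http=True, supports_jingle_pub=True):
    """Return a (method, uri) pair following the fallback order in the text."""
    schemes = [(urlsplit(uri).scheme.lower(), uri) for uri in source_uris]
    if supports_http:
        for scheme, uri in schemes:
            if scheme in ("http", "https"):
                return ("http", uri)
    if supports_jingle_pub:
        for scheme, uri in schemes:
            # Assumption: an xmpp: URI in the sources is a Jingle publication.
            if scheme == "xmpp":
                return ("jingle-pub", uri)
    # Last resort: request the file by hash via Jingle File Transfer.
    return ("jingle-ft-by-hash", None)


# Hypothetical source references, for illustration only.
sources = [
    "https://download.montague.lit/abc/summit.jpg",
    "xmpp:romeo@montague.lit/orchard?jingle;id=abc",
]
choice = plan_retrieval(sources)
```

<p>The capability flags let the same routine serve clients that lack an HTTP stack or do not implement Jingle publication references.</p>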
<p>A client MAY retrieve the file from sources other than those mentioned in the sources element. This may be via &xep0234; from the sender's other resources or from a
media caching service located at the local service. The standardization of such a cache is out of scope for this document.</p>
<p>Regardless of the transport method used to obtain the file, the received content MUST be verified against one of the hashes.</p>
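<p>A verification step along these lines could look like the sketch below. It assumes Base64-encoded digests and a small set of locally supported algorithms; SUPPORTED and verify_content are hypothetical names, and accepting the content as soon as one supported hash matches is a policy choice made for this sketch.</p>

```python
import base64
import hashlib
import hmac

# Algorithms this hypothetical client can compute.
SUPPORTED = {"sha-256": "sha256", "sha3-256": "sha3_256"}


def verify_content(data, advertised):
    """Return True if data matches an advertised hash we can compute."""
    for algo, expected_b64 in advertised.items():
        name = SUPPORTED.get(algo)
        if name is None:
            continue  # unknown algorithm, try the next advertised hash
        try:
            expected = base64.b64decode(expected_b64, validate=True)
        except ValueError:
            continue  # malformed Base64 cannot match anything
        actual = hashlib.new(name, data).digest()
        # Constant-time comparison of the raw digests.
        if hmac.compare_digest(actual, expected):
            return True
    # No supported hash matched: the content must not be trusted.
    return False


advertised = {"sha-256": "f4OxZX/x/FO5LcGBSKHWXfwtSx+j1ncoSt3SABJtkGk="}
ok = verify_content(b"Hello World!", advertised)
```

<p>A stricter client could instead reject the content if any supported hash disagrees, which guards against a sender advertising inconsistent hashes.</p>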
<p>This XEP delegates actual transport of the media data to one of the existing file-transfer XEPs. Thus a client supporting this XEP MUST implement &xep0234; and &xep0363;.</p>
<p>If a user's server supports &xep0363;, the client SHOULD upload the file to the service and add the retrieval URL to the &lt;sources&gt; tag, unless the user specifically asked not to store media in the cloud.</p>
<p>Using &xep0363; for media file transfer greatly improves the UX, since the HTTP server has a higher availability than XMPP end-user clients and can easily handle the load of the many requests that result from sharing media in &xep0045; and &xep0369; rooms.</p>
</section2>
<section2 topic='Media Support'>
<p>Sharing the raw data of media does not provide a complete user experience. Ideally, clients need to be able to display the media inline in the chat. For this we set baseline requirements for the audio, video and picture formats that a client supports for display. These requirements are shown in the following table.</p>
<p>A client will usually send only one format per media type if it creates the media itself.</p>
<table caption='Mandated encoding and in-line display support.'>
<tr>
<th>Media Type</th>
<th>Mime Type</th>
<th>Format/Container</th>
<th>Codec</th>
<th>Requirement</th>
<th>Comment</th>
</tr>
<tr>
<td>Audio</td>
<td>audio/m4a</td>
<td>MPEG4</td>
<td>AAC</td>
<td>SHOULD</td>
<td>Can be encoded/decoded by stock <link url='https://developer.android.com/guide/appendix/media-formats.html'>Android</link> and <link url='https://developer.apple.com/library/content/documentation/Miscellaneous/Conceptual/iPhoneOSTechOverview/MediaLayer/MediaLayer.html'>iOS</link> systems.</td>
</tr>
<tr>
<td>Image</td>
<td>image/jpeg</td>
<td>-</td>
<td>JPEG</td>
<td>SHOULD</td>
<td>Supported on common desktop and mobile systems. Use for photos.</td>
</tr>
<tr>
<td>Image</td>
<td>image/png</td>
<td>-</td>
<td>PNG</td>
<td>SHOULD</td>
<td>Supported on common desktop and mobile systems. Use for non-photos.</td>
</tr>
<tr>
<td>Video</td>
<td>image/gif</td>
<td>-</td>
<td>GIF</td>
<td>SHOULD</td>
<td>Long-established, widespread animation format supported everywhere.</td>
</tr>
<tr>
<td>Video</td>
<td>video/mp4</td>
<td>MPEG4</td>
<td>H.264 AVC</td>
<td>SHOULD</td>
<td>Can be encoded/decoded by stock <link url='https://developer.android.com/guide/appendix/media-formats.html'>Android</link> and <link url='https://developer.apple.com/library/content/documentation/Miscellaneous/Conceptual/iPhoneOSTechOverview/MediaLayer/MediaLayer.html'>iOS</link> systems.</td>
</tr>
</table>
</section2>
<section2 topic='Automatic retrieval of shared media'>
<p>Depending on the size of the shared media, a client MAY want to automatically download and display the media instead of fetching and displaying the thumbnail. The size threshold depends on the network environment the client currently runs in.</p>
<p>If a client supports automatic retrieval it MUST disclose this feature to the end user and provide a way to disable it, as it may result in high network traffic.</p>
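<p>One way to combine the size threshold, the network environment and the user's setting is sketched below. The limits in AUTO_LIMITS and the network labels are hypothetical values chosen for illustration; actual thresholds are a matter of client policy.</p>

```python
# Hypothetical per-network limits in bytes; real values are a client policy.
AUTO_LIMITS = {
    "wifi": 5 * 1024 * 1024,
    "cellular": 512 * 1024,
    "roaming": 0,
}


def should_auto_download(size, network, user_enabled=True):
    """Decide whether to fetch media of `size` bytes without asking."""
    if not user_enabled:
        return False  # the user switched automatic retrieval off
    # Unknown network types get a limit of 0, so nothing is auto-fetched.
    limit = AUTO_LIMITS.get(network, 0)
    return limit > 0 and size <= limit


decision = should_auto_download(200_000, "wifi")
```

<p>When the function returns False, the client would fall back to showing the thumbnail or a placeholder and wait for the user to request the download.</p>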
</section2>
<section2 topic='MUC and MIX related rules'>
<p>In cases where media is shared in a &xep0045; or &xep0369; room, the sender has to expect that a large number of clients may retrieve the shared media automatically. Ideally, multiple sources, including HTTP or other high-availability sources, are provided in the &lt;sources&gt; tag of the &lt;media-sharing&gt; tag when the media is shared in a MUC/MIX room.</p>
<p class='box'>TODO: Describe protocol for MIX members to advertise media availability to peers in a dedicated MIX channel PubSub node. Maybe as a dedicated XEP.</p>
<p>For the media sharing described in this XEP to work, it is REQUIRED for MAM to store the whole stanza instead of only the body content. If the MAM component of the user's server strips away the &lt;media-sharing&gt; tag, any shared media will be missing in archived messages.</p>
<p>If sensitive media is shared, a client MAY add relevant hints for the server via &xep0334;.</p>
</section2>
<section2 topic='XHTML-IM related rules'>
<p>To refer to shared media in an XHTML-IM message, this XEP takes advantage of the requirement for hash elements in the file metadata and of &rfc6920; and its ni URI format. Using the URI format, XHTML-IM can easily refer to media that is attached to a message via a &lt;media-sharing&gt; element, as shown in the following example.</p>
<example caption='Sharing an image with a contact'><![CDATA[<message to='julient@shakespeare.lit' from='romeo@montague.lit'>
<body>Look at the nice view from the summit.</body>
<html xmlns='http://jabber.org/protocol/xhtml-im'>
<body xmlns='http://www.w3.org/1999/xhtml'>
<p>Look at the nice <span style='font-weight:bold;display:inline'>view</span> from the summit.</p>
<p>This way the client can acquire the content-addressable resource mentioned in the img-tag in the XHTML-IM message and, when finished, show it in the rendered XHTML-IM message.</p>
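<p>To illustrate how an ni URI for such an img-tag can be derived from the media content, the following sketch builds one with an empty authority. It assumes the sha-256 algorithm and the unpadded base64url encoding of RFC 6920; the function name ni_uri is hypothetical.</p>

```python
import base64
import hashlib


def ni_uri(data, algo="sha-256"):
    """Build an RFC 6920 'ni' URI with an empty authority for data."""
    # Only sha-256 is mapped here; other algorithms are out of scope
    # for this sketch and raise KeyError.
    names = {"sha-256": "sha256"}
    digest = hashlib.new(names[algo], data).digest()
    # RFC 6920 uses the URL-safe Base64 alphabet without '=' padding.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///{algo};{value}"


uri = ni_uri(b"Hello World!")
```

<p>Because the URI is derived only from the content, a receiving client can map it back to the matching hash element in the file metadata and to a locally stored copy of the media.</p>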
<p>The size element in the file element allows clients to load small files automatically and, for larger files, to give users a hint of how long a transfer might take.</p>
<p>The OPTIONAL thumbnail element in the file element improves the user experience as it provides further hints for users on whether the file could be of interest to them.</p>
<p>The desc element in the file element is critical for clients, as it enables them to provide accessibility to users who use screen readers.</p>
<section2 topic='Clearing of privacy sensitive metadata'>
<p>Mobile devices are able to attach the geographic location where a photo was taken to the photo. It is RECOMMENDED that a client implementing this XEP attempt to detect privacy-exposing metadata in shared media and, if any is found, provide the user with an option to clear the media of such metadata.</p>
</section2>
<section2 topic='The value and cost of end-to-end media integrity'>
<p>Requiring end-to-end media integrity prevents trivial server-side optimizations or other processing of shared media, as these would change the cryptographic hash of the media file. On the other hand, requiring a matching cryptographic hash guarantees that everybody sees exactly the same media that a user has shared in a group conversation.</p>