File Information Sharing

This document specifies a simple extension to existing protocols that allows an entity to request information about files.

&LEGALNOTICE;

xxxx, ProtoXEP, Standards Track, Standards, Council
XMPP Core, XEP-0234, XEP-0059, XEP-0265, XEP-0055, XEP-0135
fis
Jefry Lagrange, jefry.reyes@gmail.com, j.lagrange@jabber.org, j.lagrange@gajim.org
0.0.2, 2012-10-14, jl, Second draft

XMPP extensions provide ways of transferring files between peers (such as &xep0234; and &xep0096;). However, file transfers currently have to be initiated by the hosting entity. The &xep0234; extension provides a way to request files, but it requires the requesting entity to already have enough information about the requested file to identify it uniquely.

This document defines an extension that allows an entity to request information about the files offered by a hosting entity, so that the requesting entity can later request any file it is interested in through a file transfer.

IRC users have been able to bypass the limitations of the protocol by using bots that provide information about files and initiate transfers on command. A major downside of this method is that not every user is capable of sharing their files. The aim of this document is to provide similar functionality while making it easier for users to offer and request information about files.

Microsoft's proprietary MSN IM client used to provide similar functionality through "Sharing Folders", but this was replaced by Windows Live SkyDrive.

  1. Enable a requesting entity to traverse the shared directory of an offering entity. (REQUIRED)
  2. Enable a requesting entity to get detailed information about files. (REQUIRED)
  3. Enable a requesting entity to search files hosted by an offering entity. (OPTIONAL)
  4. Allow the use of MUCs to share information about files among the users. (OPTIONAL)

This protocol assumes the existence of a shared directory (either virtual or physical). The hosting entity must not advertise empty directories. The hosting entity is responsible for maintaining the structure of that directory (such as not allowing two files with the same name and preventing cycles within directories). The hosting entity is in no way required to present the same shared directory to different requesters.
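The rule against advertising empty directories can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which a directory is a dictionary mapping names to either a sub-directory or a file size; neither the model nor the name `prune_empty` comes from this document. It also assumes that a directory holding only empty directories counts as empty. Using a dictionary keyed by name means two entries with the same name cannot coexist in one directory.

```python
from typing import Dict, Union

# Hypothetical model (an assumption, not part of the protocol): a directory
# maps names to either a sub-directory (dict) or a file size in bytes (int).
Tree = Dict[str, Union[int, "Tree"]]


def prune_empty(tree: Tree) -> Tree:
    """Return a copy of `tree` without directories that contain no files."""
    pruned: Tree = {}
    for name, node in tree.items():
        if isinstance(node, dict):
            child = prune_empty(node)
            if child:  # keep only directories that still hold something
                pruned[name] = child
        else:
            pruned[name] = node
    return pruned


# Illustrative shared folder; "drafts" and "empty" are made-up names.
shared = {
    "test.txt": 1022,
    "pics": {"test4.png": 10740, "drafts": {}},
    "empty": {},
}
advertised = prune_empty(shared)
```

Here `advertised` keeps `test.txt` and `pics/test4.png`, while `empty` and `pics/drafts` are dropped before anything is sent to a requester.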

If a requesting entity wishes to traverse the shared folder of an offering entity, it can do so by querying the root directory, as shown in the following example:


If the offering entity has files to share, it MUST respond with the root-level files of its shared folder. Files and directories at the root level MUST NOT be children of any "directory" tag. In order to save bandwidth, the offering entity MAY omit all the children of the file tag except the "name", which is required and MUST always be present.

The file tag has the same attributes as defined in &xep0096;. The "name" attribute is required and must be included in every response. The "size" attribute is only required when responding with detailed information about a file.

Example values: test.txt (date 1969-07-21T02:56:15Z; description "This is a test. If this were a real file..."; size 1022; hash 552da749930852c69ae5d2141d3766b1) and test2.txt (name only).

If the offering entity has no files to offer:


If the requesting entity wants to get detailed information about a file, it can do so by providing the file's full path.

Example value: test2.txt

When replying with detailed information about a file, the offering entity must always include the "name" and "size".

Example values: test2.txt with size 1000; test3.png and test4.png.

Sometimes the list of files is too big to be traversed efficiently, or there are too many peers offering files. This extension allows simple file search to ease the discovery of files.

The requesting entity can request the fields on which a search can be performed.


The offering entity replies with its search fields. The fields in the following example are required, and they MUST be supported.


Different files have different metadata, and not all of it may be completely covered by the fields mentioned above. Of course, any metadata that does not fit could be included in "desc"; however, this can make searching difficult, as it is not clearly defined what to look for.

Fields can be extended using data forms, as defined in &xep0055;.

Example values: jabber:iq:search form; jabber:iq:search form with the value image; jabber:iq:search result with pics/test3.png and pics/test4.png.

If a requesting entity wishes to search for a particular keyword in a file's name, it only needs to send the keyword within the 'name' tag. The keyword MUST NOT be a full path.

The requester may also use the 'desc' tag to match keywords.

Example values: keyword test; results test.txt, test2.txt, pics/test3.png, pics/test4.png.
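The keyword rule above can be sketched on the offering side as follows. This is only an illustration under stated assumptions: the index `FILES`, the helper `search_by_name`, the entry `notes/readme.md`, case-insensitive substring matching, and "/" as the path separator are not defined by this document. The keyword is compared against the last path component only, and a keyword that looks like a path is rejected.

```python
from typing import Dict, List

# Hypothetical index (an assumption): full path in the shared folder -> desc.
FILES: Dict[str, str] = {
    "test.txt": "This is a test. If this were a real file...",
    "test2.txt": "",
    "pics/test3.png": "",
    "pics/test4.png": "",
    "notes/readme.md": "",
}


def search_by_name(keyword: str, files: Dict[str, str]) -> List[str]:
    """Return the paths whose file name contains `keyword`."""
    if "/" in keyword:
        # The keyword is matched against names, so a full path is refused.
        raise ValueError("keyword must not be a full path")
    needle = keyword.lower()  # case-insensitive matching is an assumption
    return [
        path for path in files
        if needle in path.rsplit("/", 1)[-1].lower()
    ]


matches = search_by_name("test", FILES)
```

With this index, `matches` holds the four paths whose file name contains "test", each reported with its full path so that it can be requested later.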

When simple keyword matching is not enough to search files efficiently, regular expressions (as defined in &xep0004;) can be used.

Example values: jabber:iq:search form with the pattern *.png; results pics/test3.png, pics/test4.png.

For the most part, discovering files in a MUC is exactly the same as what has been described in this document. However, there are some considerations to keep in mind.

First, it is RECOMMENDED that a participant in a MUC have a single shared folder associated with the entire room, rather than advertising different files to different participants of the room. This reduces the complexity of the client software. Also, due to the volatile nature of the participants in a room, keeping track of permissions is more trouble than it is worth.

Second, a participant may discover files of all the participants in the room by sending the request to the room itself. It is RECOMMENDED that the search capabilities of this protocol be used for this.

If a considerable number of files are being shared by the offering entity, its response might be too big for the server to handle, as there may be a limit on the size of stanzas in the current stream. To solve this, extensions have been devised, and their implementation is hereby recommended along with the implementation of this extension.

&xep0059; defines a way of limiting the results of a request. There are some considerations to keep in mind when using &xep0059; along with this extension.

First, &xep0059; specifies that the requesting entity is the one that sets the limit on the number of items that can be returned. It is therefore up to the requesting entity to choose a sensible number.

Second, since this protocol represents the directory tree by allowing file tags to be children of directory tags, it becomes difficult to define items for &xep0059;. Therefore, when responding to a request using &xep0059;, the offering entity MUST NOT send directory tags with files as their children; file tags must be sent separately, with their path (starting at the root shared folder) in their name attribute.

Example values: test.txt, test2.txt, pics/test3.png, pics/test4.png.
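The flattening that this rule implies can be sketched as follows. This is a minimal illustration that reuses a hypothetical dictionary model of the shared folder. The helpers `flatten` and `page` are assumptions, as is the size given for test3.png. `page` only mimics the idea of a requester-chosen maximum and a resume point; it is not an implementation of &xep0059; itself.

```python
from typing import Dict, Iterator, List, Optional, Union

# Hypothetical model (an assumption): name -> sub-directory or size in bytes.
Tree = Dict[str, Union[int, "Tree"]]


def flatten(tree: Tree, prefix: str = "") -> Iterator[str]:
    """Yield every file as a path starting at the root shared folder."""
    for name, node in tree.items():
        path = prefix + name
        if isinstance(node, dict):
            yield from flatten(node, path + "/")
        else:
            yield path


def page(paths: List[str], max_items: int,
         after: Optional[str] = None) -> List[str]:
    """Return at most `max_items` paths following `after` (or from the start).

    Raises ValueError if `after` is not a known path.
    """
    start = paths.index(after) + 1 if after is not None else 0
    return paths[start:start + max_items]


shared = {
    "test.txt": 1022,
    "test2.txt": 1000,
    "pics": {"test3.png": 2048, "test4.png": 10740},  # 2048 is made up
}
paths = list(flatten(shared))
first_page = page(paths, 2)
second_page = page(paths, 2, after=first_page[-1])
```

Each entry in `paths` stands for one file item, so no directory entry with children is ever needed in a paged response.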

One obvious way to overcome the limitations of sending large stanzas in-band is to transfer that information out of band. &xep0065; could be used for that purpose. Its implementation is RECOMMENDED when the offering entity has so many files that advertising them in-band would not be practical.

It is further recommended that, when using XEP-0065, the entire directory structure, along with all the files in the shared folder and its subfolders, be exchanged in a single reply. All file attributes should also be included. This avoids wasting bandwidth on repeatedly initiating out-of-band streams.

As previously discussed, when requesting detailed information about a file, only the "name" and "size" attributes are required, but it is strongly RECOMMENDED that the hash attribute be included in order to reduce the chances of sending the wrong file. When requesting that the file be transferred using &xep0234;, the information provided has to identify the file uniquely. It is therefore RECOMMENDED that, when requesting a file, the full path of the file in the shared folder be included in the "name" attribute.

Example values: name pics/test4.png; size 10740.

A denial of service is possible by repeatedly requesting files. Implementers are advised to take this into consideration and include queues and limits in their implementations.
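One possible form of such a limit is a sliding-window counter per requester, sketched below. The class `RequestLimiter`, its parameters, and the chosen policy are assumptions made for illustration; this document does not mandate any particular mechanism, and queueing is not covered by the sketch.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class RequestLimiter:
    """Allow at most `limit` requests per requester in any `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, requester: str) -> bool:
        """Record the request and return True, or return False if over limit."""
        now = self.clock()
        stamps = self._seen[requester]
        # Forget requests that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


# Usage with a fake clock; the address is a made-up example.
fake_now = [0.0]
limiter = RequestLimiter(limit=2, window=10.0, clock=lambda: fake_now[0])
results = [limiter.allow("requester@example.org") for _ in range(3)]
```

An offering entity could call `allow` with the requester's address before answering, and reply with an error or defer the request when it returns False.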