<p>Update groupchat-messages-in-user-archive advice, introducing fields and disco features to make behaviour explicit in future implementations, in light of Last Call feedback.</p>
<remark><p>Document the ability to page through results by message UIDs, define the <archived/> element, and various minor improvements.</p></remark>
<p>An archive contains a collection of messages relevant to a particular XMPP address, e.g. a user, MUC, pubsub node, server. Note: while a service might have many "archives" as defined here (one per JID capable of being queried) this is a conceptual distinction,
and a server is not bound to any particular implementation or arrangement of data stores.</p>
<p>Exactly which messages a server archives is up to implementation and deployment policy,
but it is expected that all messages that hold meaningful content, rather than state changes such as Chat State Notifications, would be archived. Rules are specified later in this document.</p>
<p>A stored message consists of at least the following pieces of information:</p>
<ul>
<li>A timestamp of when the message was sent (for an outgoing message) or received (for
an incoming message).</li>
<li>The remote JID that the stanza is to (for an outgoing message) or from (for an
incoming message).</li>
<li>A server-assigned UID that MUST be unpredictable and unique within the archive.</li>
</ul>
<p>Note that 'incoming' and 'outgoing' messages are viewed within the context of the archived JID, rather than the system as a whole. For example, if romeo@montague.lit sent a message to juliet@capulet.lit, it would be an outgoing message in the context of archiving for Romeo, and an incoming message in the context of archiving for Juliet.</p>
<section2 topic='Order of messages' anchor='archive_order'>
<p>Order within the archive MUST be preserved: messages are kept in the same order in which the client originally received them (or would have received them if online). Throughout this document the term 'chronological order' refers to this order. Implementors should take care not to rely on timestamps alone for ordering messages, as multiple messages may share the same timestamp.</p>
</section2>
<section2 topic='Message retention and deletion' anchor='archives_deletion'>
<p>A server MAY impose limits on the size of an individual archive. For example a server might begin
to discard old messages once the archive reaches a certain size, or only keep messages until they
reach a certain age. Any such deleted messages MUST be the oldest in the archive, i.e. it is not permitted
to create gaps or "holes" in the archive. The UIDs of deleted messages MUST NOT be reused for new messages.</p>
<p>Servers that archive messages sent and received on behalf of local users MUST expose these archives to the user on the user's bare JID.</p>
<p>While this document talks about 'clients' and 'servers', as these are the common cases, the querying entity (referred to as a 'client') need not be an XMPP client as defined by RFC6120, but could potentially be any type of entity, and the queried entity (referred to as a 'server') need not be an XMPP server as defined by RFC6120, although access controls might prohibit any given entity from being able to access an archive.</p>
<section2 topic='Communicating the archive ID' anchor='archives_id'>
<p>When a message is archived, the server MUST add an <stanza-id/> element as defined in &xep0359; to the message, which informs the recipient of where and under what ID the message is stored. When doing this the server MUST follow the business rules defined in XEP-0359. The 'by' attribute MUST be set to the address of the archive. For regular users that’s the bare JID of the account and for MUC that’s the bare JID of the room.</p>
<p>Servers MUST NOT include the <stanza-id/> element in messages addressed to JIDs that do not have permissions to access the archive, such as a user's outgoing messages to their contacts. However, servers SHOULD include the element as a child of the forwarded message when using &xep0280;.</p>
<example caption='Client receives a message that has been archived'><![CDATA[
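<!-- Illustrative sketch only: the addresses, body text and 'id' value are
     made-up placeholders. The 'by' attribute carries the bare JID of the
     archive, which here is the recipient's own account. -->
<message to='juliet@capulet.lit/chamber'
         from='romeo@montague.lit/orchard'
         type='chat'>
  <body>Call me but love, and I'll be new baptized.</body>
  <stanza-id xmlns='urn:xmpp:sid:0'
             by='juliet@capulet.lit'
             id='28482-98726-73623'/>
</message>
]]></example>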
<p>Note: Previous versions of this protocol did not specify any interaction with stanza-id, and clients MUST NOT interpret XEP-0359 IDs in messages as archive IDs unless the server advertises support for 'urn:xmpp:mam:2' specifically.</p>
<p>To ensure that the client knows when the results are complete, the server MUST send the &IQ; result after the last query result has been sent
to the client. The client can optionally include a 'queryid' attribute in their query, which allows the client to match results to their initiating query.</p>
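<p>The following sketch illustrates how a 'queryid' might tie results back to a query. It assumes the usual result wrapper (a forwarded, delay-stamped message inside a <result/> element); all 'id', 'queryid', UID and timestamp values are invented placeholders.</p>
<example caption='Illustrative sketch: a query with a queryid, one result, and the final IQ'><![CDATA[
<!-- Client query carrying a 'queryid' -->
<iq type='set' id='juliet1'>
  <query xmlns='urn:xmpp:mam:2' queryid='f27'/>
</iq>

<!-- Each result message echoes the same 'queryid' -->
<message id='aeb213' to='juliet@capulet.lit/chamber'>
  <result xmlns='urn:xmpp:mam:2' queryid='f27' id='28482-98726-73623'>
    <forwarded xmlns='urn:xmpp:forward:0'>
      <delay xmlns='urn:xmpp:delay' stamp='2010-07-10T23:08:25Z'/>
      <message xmlns='jabber:client'
               from='romeo@montague.lit/orchard'
               to='juliet@capulet.lit/chamber'
               type='chat'>
        <body>Call me but love, and I'll be new baptized.</body>
      </message>
    </forwarded>
  </result>
</message>

<!-- The IQ result is sent only after the last result message -->
<iq type='result' id='juliet1'>
  <fin xmlns='urn:xmpp:mam:2' complete='true'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <first index='0'>28482-98726-73623</first>
      <last>28482-98726-73623</last>
      <count>1</count>
    </set>
  </fin>
</iq>
]]></example>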
<p>If a 'with' field is present in the query form, the server MUST only return messages if they match the supplied JID. A message in a user's archive matches if the JID matches either the to or from of the message. An item in a MUC archive matches if the publisher of the item matches the JID; note that this should only be available to entities that would already have been allowed to know the publisher of the events (e.g. this could not be used by a visitor to a semi-anonymous MUC).</p>
<p>To allow querying for messages the user sent to themselves, the client needs to set the 'with' field to the account JID. In that case, the server MUST only return results where both the 'to' and 'from' match the bare JID (either as bare or by ignoring the resource), as otherwise every message in the archive would match.</p>
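<p>As a sketch of the JID filter described above, a query restricted to a single JID might look like the following. The JID and 'id' values are placeholders; in this case Juliet supplies her own account JID in order to retrieve messages she sent to herself.</p>
<example caption='Illustrative sketch: filtering results by JID'><![CDATA[
<iq type='set' id='juliet1'>
  <query xmlns='urn:xmpp:mam:2'>
    <x xmlns='jabber:x:data' type='submit'>
      <field var='FORM_TYPE' type='hidden'>
        <value>urn:xmpp:mam:2</value>
      </field>
      <!-- Only messages whose 'to' and 'from' both match this bare JID
           are returned, because it is the archive owner's own JID -->
      <field var='with'>
        <value>juliet@capulet.lit</value>
      </field>
    </x>
  </query>
</iq>
]]></example>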
<p>If any UID requested by the client in any of the 'before-id', 'after-id' or 'ids' form fields is not present in the archive, the server MUST return an item-not-found error in response to the query.</p>
<section3 topic='Including groupchat results in a user archive' anchor='query-include-groupchat'>
<p>If the server advertises that it includes groupchat messages in a user's archive (see <link url='#support'>Determining support</link>), a client may query a user archive and request that they be included in the results by setting the 'include-groupchat' field to 'true'.
</p>
<example caption='Querying the archive and including groupchat messages in results'><![CDATA[
<iq type='set' id='juliet1'>
  <query xmlns='urn:xmpp:mam:2'>
    <x xmlns='jabber:x:data' type='submit'>
      <field var='FORM_TYPE' type='hidden'>
        <value>urn:xmpp:mam:2</value>
      </field>
      <field var='include-groupchat'>
        <value>true</value>
      </field>
      ...
    </x>
  </query>
</iq>
]]></example>
<p>If the server advertises that it includes groupchat messages in the archive, or it advertises that it doesn't, a client may request that they not be included by setting the 'include-groupchat' field to 'false'.</p>
<example caption='Querying the archive and excluding groupchat messages from results'><![CDATA[
<iq type='set' id='juliet1'>
  <query xmlns='urn:xmpp:mam:2'>
    <x xmlns='jabber:x:data' type='submit'>
      <field var='FORM_TYPE' type='hidden'>
        <value>urn:xmpp:mam:2</value>
      </field>
      <field var='include-groupchat'>
        <value>false</value>
      </field>
      ...
    </x>
  </query>
</iq>
]]></example>
<p>Note that where the client doesn't specify the 'include-groupchat' field, it is implementation-defined whether groupchat messages are included in the results (see <link url='#business_rules'>Business Rules</link>). Clients MUST NOT include this field where servers don't advertise support, as the server would reject such a form.</p>
<section3 topic='Retrieving form fields' anchor='query-form'>
<p>In order for the client to find out about additional fields the server might support, it can send an iq stanza of type 'get' addressed to the archive.</p>
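<p>A minimal sketch of such a request is shown here, assuming an empty <query/> payload in the 'urn:xmpp:mam:2' namespace; the 'id' value is an arbitrary placeholder.</p>
<example caption='Illustrative sketch: client requests the supported query form'><![CDATA[
<iq type='get' id='form1'>
  <query xmlns='urn:xmpp:mam:2'/>
</iq>
]]></example>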
<p>If the client understands any of the additional fields it MAY proceed to include any of them in subsequent queries. It is not required to include any or all of the supported fields in queries.</p>
<p>A special note about the 'ids' field: this field is of type 'list-multi' which typically is used to allow the client to select from a provided list of options. In this case the list of all possible ids MUST NOT be provided by the server, as it is likely to be extremely large. Instead the server MUST include a &xep0122; <validate/> element that signals the list is open to arbitrary values provided by the client.</p>
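<p>As a sketch, the 'ids' field in the form returned by the server might be declared as follows. The 'xs:string' datatype is an assumption made for this illustration; the <open/> child is what signals that arbitrary client-supplied values are accepted.</p>
<example caption='Illustrative sketch: the ids field with open validation'><![CDATA[
<field type='list-multi' var='ids'>
  <!-- No <option/> children: the server does not enumerate possible IDs -->
  <validate xmlns='http://jabber.org/protocol/xdata-validate' datatype='xs:string'>
    <open/>
  </validate>
</field>
]]></example>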
<p>As specified in &xep0068;, names of custom fields SHOULD use Clark notation to avoid conflicts with other extensions.</p>
<example caption="Client uses two discovered query fields in a query"><![CDATA[
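<!-- Illustrative sketch only: both field names below are hypothetical
     custom fields in Clark notation under an example.com namespace, and
     their values are invented. A real client would use whatever fields
     the server returned in its form. -->
<iq type='set' id='query4'>
  <query xmlns='urn:xmpp:mam:2'>
    <x xmlns='jabber:x:data' type='submit'>
      <field type='hidden' var='FORM_TYPE'>
        <value>urn:xmpp:mam:2</value>
      </field>
      <field type='text-single' var='{http://example.com/}free-text-search'>
        <value>new baptized</value>
      </field>
      <field type='text-single' var='{http://example.com/}sender-avatar-sha1'>
        <value>b43c2e45ed4ab97c65812bc5ec537e34e3a9c7aa</value>
      </field>
    </x>
  </query>
</iq>
]]></example>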
<p>Note that as the 'with', 'start' and 'end' fields MUST be implemented by servers, clients are able to submit forms using combinations of only these fields without needing to first fetch the form from the server. The types of these fields MUST be 'jid-single', 'text-single' and 'text-single' respectively. A server MUST NOT rely on a client having first requested the form before submitting queries.</p>
<p>The archive results MUST be sorted in chronological order, both within the returned results and within the ordering of RSM. For example, if a client were to request the first 10 stanzas in an archive, then use RSM to request the next 10 stanzas (by providing the 'after' element with the UID of the 10th stanza in the first results), all 20 result stanzas would be received in chronological order.</p>
<p>Note: There is no concept of an "open query", and servers MUST be prepared to receive arbitrary page requests at any time.</p>
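<p>The paging behaviour can be sketched as two independent queries. The 'id' values and the UID are invented for illustration; the UID in the <after/> element stands for whatever UID the 10th result of the first page carried.</p>
<example caption='Illustrative sketch: requesting the first page, then the next page'><![CDATA[
<!-- First page: at most 10 results from the start of the archive -->
<iq type='set' id='page1'>
  <query xmlns='urn:xmpp:mam:2'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <max>10</max>
    </set>
  </query>
</iq>

<!-- Next page: at most 10 results following the given UID -->
<iq type='set' id='page2'>
  <query xmlns='urn:xmpp:mam:2'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <max>10</max>
      <after>09af3-cc343-b409f</after>
    </set>
  </query>
</iq>
]]></example>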
<p>RSM does not define the behaviour of including both <before> and <after> in the same request. To retrieve a range of items between two known ids, use before-id and after-id in the query form instead.</p>
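<p>A range query of this kind might be sketched as follows, assuming the server's form lists the 'before-id' and 'after-id' fields. Both UIDs and the 'id' value are placeholders.</p>
<example caption='Illustrative sketch: retrieving the items between two known IDs'><![CDATA[
<iq type='set' id='range1'>
  <query xmlns='urn:xmpp:mam:2'>
    <x xmlns='jabber:x:data' type='submit'>
      <field var='FORM_TYPE' type='hidden'>
        <value>urn:xmpp:mam:2</value>
      </field>
      <!-- Lower bound of the requested range -->
      <field var='after-id'>
        <value>28482-98726-73623</value>
      </field>
      <!-- Upper bound of the requested range -->
      <field var='before-id'>
        <value>09af3-cc343-b409f</value>
      </field>
    </x>
  </query>
</iq>
]]></example>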
<p>If the UID contained within an <after> or <before> element is not present in the archive, the server MUST return an item-not-found error in response to the query.</p>
<example caption='Message id not found in archive'><![CDATA[
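<!-- Illustrative sketch only: the 'id' value is a placeholder and the
     'cancel' error type is an assumption of this sketch. -->
<iq type='error' id='q29303'>
  <error type='cancel'>
    <item-not-found xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>
]]></example>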
<p>When the results returned by the server are complete (that is, when they have not been limited by the maximum size of the result page, whether specified by the client or enforced by the server), the server MUST include a 'complete' attribute on the <fin> element, with a value of 'true'; this informs the client that it doesn't need to perform further paging to retrieve the requested data. If it is not the last page of the result set, the server MUST either omit the 'complete' attribute, or give it a value of 'false'.</p>
<example caption='Server completes a result with the last page of messages'><![CDATA[
<!-- result messages -->
<iq type='result' id='u29303'>
  <fin xmlns='urn:xmpp:mam:2' complete='true'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <first index='0'>23452-4534-1</first>
      <last>390-2342-22</last>
      <count>16</count>
    </set>
  </fin>
</iq>
]]></example>
<p>Sometimes (e.g. due to network or storage partitioning, or other transient errors) the server might return results to a client that are unstable (e.g. they might later change in sequence or content). In such a situation the server MUST stamp the <fin> element with a 'stable' attribute with a value of 'false'. If the server knows that the data it's serving are stable, it MUST either stamp a 'stable' attribute with a value of 'true' or omit the attribute. An example of when unstable might legitimately be returned is if the MAM service uses a clustered data store and a query covers a time period for which the data store has not yet converged; the server could return best-guess results and tell the client that they may be unstable. A client SHOULD NOT cache unstable results long-term without later confirming (by reissuing appropriate queries) that they have become stable.</p>
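<p>A response flagged as unstable might be sketched as follows; the 'id' and UID values are placeholders. The 'complete' attribute is omitted here, so this sketch makes no claim about whether further pages exist.</p>
<example caption='Illustrative sketch: server flags a result page as unstable'><![CDATA[
<!-- result messages -->
<iq type='result' id='u29303'>
  <!-- stable='false': sequence or content may change on a later query -->
  <fin xmlns='urn:xmpp:mam:2' stable='false'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <first index='0'>23452-4534-1</first>
      <last>390-2342-22</last>
      <count>16</count>
    </set>
  </fin>
</iq>
]]></example>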
</section3>
<section3 topic='Requesting the last page'>
<p>To request the page at the end of the archive (i.e. the most recent messages), include just an empty <before/> element in the RSM part of the query. As defined by RSM, this will return the last page of the archive.</p>
<example caption='A request for the last page in an archive'><![CDATA[
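<!-- Illustrative sketch only: the 'id' and page size are placeholders.
     The empty <before/> element asks RSM for the last page. -->
<iq type='set' id='page1'>
  <query xmlns='urn:xmpp:mam:2'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <max>10</max>
      <before/>
    </set>
  </query>
</iq>
]]></example>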
<p>Within the returned page, all results are still in chronological order, that is, the first result you receive will be the oldest item in the page, and the last result you receive will be the last item in the archive.</p>
<p>When planning a query, a client may wish to learn the current state of the archive. This includes information about the first/last entries in the archive.</p>
<p>When the archive advertises support for 'urn:xmpp:mam:2#extended' then the archive supports queries for this metadata via an iq of type 'get' to the
archive's address, with a <metadata/> payload in the 'urn:xmpp:mam:2' namespace.</p>
<p>The server response includes a <metadata/> element containing information about the archive. This element MUST include <start/> and <end/>
elements, each of which has an 'id' attribute and an XEP-0082 formatted 'timestamp' attribute, describing the first and last messages in the archive respectively.</p>
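<p>The exchange described above might be sketched as follows. The 'id' attributes on the stanzas, the message UIDs and the timestamps are all invented for illustration.</p>
<example caption='Illustrative sketch: querying and receiving archive metadata'><![CDATA[
<!-- Client asks the archive for its metadata -->
<iq type='get' id='jui8921rr9'>
  <metadata xmlns='urn:xmpp:mam:2'/>
</iq>

<!-- Server describes the first and last messages in the archive -->
<iq type='result' id='jui8921rr9'>
  <metadata xmlns='urn:xmpp:mam:2'>
    <start id='YWxwaGEg' timestamp='2008-08-22T21:09:04Z'/>
    <end id='b21lZ2Eg' timestamp='2020-04-20T14:34:21Z'/>
  </metadata>
</iq>
]]></example>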