<remark><p>Document the ability to page through results by message UIDs, define the <archived/> element, and various minor improvements.</p></remark>
<p>An archive contains a collection of messages relevant to a particular XMPP address, e.g. a user, MUC, pubsub node, server. Note: while a service might have many "archives" as defined here (one per JID capable of being queried) this is a conceptual distinction,
and a server is not bound to any particular implementation or arrangement of data stores.</p>
<p>Exactly which messages a server archives is up to implementation and deployment policy,
but it is expected that all messages that hold meaningful content, rather than state changes such as Chat State Notifications, would be archived. Rules are specified later in this document.</p>
<p>A stored message consists of at least the following pieces of information:</p>
<ul>
<li>A timestamp of when the message was sent (for an outgoing message) or received (for
an incoming message).</li>
<li>The remote JID that the stanza is to (for an outgoing message) or from (for an
incoming message).</li>
<li>A server-assigned UID that MUST be unpredictable and unique within the archive.</li>
<li>The message stanza itself. The entire original stanza SHOULD be stored, but at a
minimum only the &BODY; tag MUST be preserved (ie. the server might, at its
discretion, strip certain extensions from messages before storage).</li>
</ul>
<p>Note that 'incoming' and 'outgoing' messages are viewed within the context of the archived JID, rather than the system as a whole. For example, if romeo@montegue.lit sent a message to juliet@capulet.lit, it would be an outgoing message in the context of archiving for Romeo, and an incoming message in the context of archiving for Juliet.</p>
<p>A server MAY impose limits on the size of an individual archive. For example a server might begin
<p>While this document talks about 'clients' and 'servers', as these are the common cases, the querying entity (referred to as a 'client') need not be an XMPP client as defined by RFC6120, but could potentially be any type of entity, and the queried entity (referred to as a 'server') need not be an XMPP server as defined by RFC6120, although access controls might prohibit any given entity from being able to access an archive.</p>
<p>To ensure that the client knows when the results are complete, the server MUST send a <fin> message. The client can optionally include a 'queryid' attribute in their query, which allows the client to match results to their initiating query, and if present in the client's query the server MUST include it in the <fin> response.</p>
<p>When querying a pubsub node's archive, the 'node' attribute is added to the <query> element.</p>
<examplecaption="A user queries a pubsub node's archive for messages"><![CDATA[
<p>If a 'with' field is present in the form, it contains a JID against which to match messages. The
server MUST only return messages if they match the supplied JID. A message in a user's archive matches if the JID matches either the to or from of the message. An item in a pubsub or MUC archive matches if the publisher of the item matches the JID; note that this should only be available to entities that would already have been allowed to know the publisher of the events (e.g. this could not be used by a visitor to a semi-anonymous MUC).</p>
<p>If the 'with' field's value is the bare JID of the archive, the server must only return results where both the 'to' and 'from' match the bare JID (either as bare or by ignoring the resource), as otherwise every message in the archive would match</p>
<p>If 'with' is omitted, the server MUST match all messages in the selected timespan with the query,
<p>When the results returned by the server are complete (that is: when they are the last page of the result set), the server MUST include a 'complete' attribute on the <fin> element, with a value of 'true'. If it is not the last page of the result set, the server MUST either omit the 'complete' attribute, or give it a value of 'false'.</p>
<examplecaption='Server completes a result with the last page of messages'><![CDATA[
<iqtype='result'id='u29303'/>
<!-- result messages -->
<message>
<finxmlns='urn:xmpp:mam:0'complete='true'>
<setxmlns='http://jabber.org/protocol/rsm'>
<firstindex='0'>23452-4534-1</first>
<last>390-2342-22</last>
<count>16</count>
</set>
</fin>
</message>
]]></example>
<p>Sometimes (e.g. due to network or storage partitioning, or other transient errors) the server might return results to a client that are unstable (e.g. they might later change in sequence or content). In such a situation the server MUST stamp the <fin> element with a 'stable' attribute with a value of 'false'. If the server knows that the data it's serving are stable it MUST either stamp a 'stable' attribute with a value of 'true', or no such attribute. An example of when unstable might legitimately be returned is if the MAM service uses a clustered data store and a query covers a time period for which the data store has not yet converged; it the server could return best-guess results and tell the client that they may be unstable. A client SHOULD NOT cache unstable results long-term without later confirming (by reissuing appropriate queries) that they have become stable.</p>
<p>In order for the client find out about additional fields the server might support, it can send an iq stanza of type='get' addressed to the archive like this:</p>
<p>Note that as the 'with', 'start' and 'end' fields MUST be implemented by servers, clients are able to submit forms using combinations of only these fields without needing to first fetch the form from the server and the types of these fields MUST be 'jid-single', 'text-single' and 'text-single' respectively. A server MUST NOT rely on a client having first requested the form before submitting queries</p>
<p>The archive results MUST be sorted in chronological order, both within the returned results and within the ordering of RSM such that if a client were to request the first 10 stanzas in an archive, then use RSM to request the next 10 stanzas, using the 'after' attribute of the 10th stanza in the first results, the 20 received stanzas would be receiving in chronological order.
<p>Different entities will have different requirements for which data are stored, as might different deployments. This section provides general rules within which a server will act. While there may be local policy restrictions that prevent archiving of some aspects discussed here, this is a RECOMMENDED baseline. A server MAY implement any subset of possible archives for JIDs it controls (although it MUST advertise support only for those JIDs that support it).</p>
<p>No requirements are placed on how a server implements its storage beyond that it has to store data sufficient to be able to comply with this document. When this document describes storage requirements (e.g. MUST NOT store more than one copy...), it refers to what would appear to have been stored in order to satisfy the query.</p>
<p>If an entity (user's server, MUC room, pubsub node, ...) rejects an incoming message (such as from an occupant not allowed to send messages to the room, a user not authorized to publish to a pubsub node, a contact blocked by the user etc.) that message should not appear in the archive for the entity that rejected it - the archive should represent what logical entities (MUC occupants, users, pubsub subscribers...) would have received, and so only contain messages accepted for delivery to such entities.</p>
<p>A user archive is anticipated to provide the user with the ability to access their prior conversations. To this end, a server SHOULD include in a user archive all of the messages a user sends or receives of type 'normal' or 'chat' that contain a <body> element. A server SHOULD also include messages of type 'groupchat' that have a <body>, but where such history is accessible through another method (e.g. through an archive on the MUC JID), a server MAY exclude these from the archive. A server MAY include additional non-conversation messages. A server MAY include messages of type 'headline', but this is not generally suggested.</p>
<p>At a minimum, the server MUST store the <body> elements of a stanza. It is suggested that other elements that are used in a given deployment to supplement conversations (e.g. XHTML-IM payloads) are also stored. Other elements MAY be stored.</p>
<p>If a server supports mechanisms that multiply copies of a stanza (e.g. Carbons, or forking a stanza to a bare JID), it MUST store such a staza within a given archive only once, irrespective of multiple connected clients receiving copies</p>
<p>A MUC archives allows a user to view the conversation within a room. All messages sent to the room that contain a <body> element SHOULD be stored, as should subject change stanzas, apart from those messages that the room rejects.</p>
<p>A MUC archive MUST store each message only once (not, for example, every copy sent out to an occupant).</p>
<p>When sending out the archives to a requesting client, the 'to' of the forwarded stanza MUST be empty, and the 'from' MUST be the occupant JID of the sender of the archived message.</p>
<p>A MUC archive MUST NOT include 'private message' results (those sent directly between occupants, not shared in the room) in the results</p>
</section3>
<section3topic="Pubsub Archives">
<p>A PubSub service offering MAM SHOULD store each of the items published to each node. When responding to MAM requests it MUST construct the message stanza within the <forwarded> element in the same manner as the notifications sent to subscribers for the item, except that specifying the 'from' 'to' and 'id' attributes are OPTIONAL.</p>
<examplecaption='Server returns a pubsub messages'><![CDATA[
The IDs used within an archive MUST be unique per item stored and MUST NOT be reused, even if the original item with a given ID has since been removed from the archive. If a server provides multiple archives (e.g. many user archives, or many MUC archives), the IDs do not need to be unique across all of these archives unless the server also allows a single query to be run across multiple archives (e.g. searching of all MUC rooms), discussion of which is beyond the scope of this document. These IDs are strings that servers may construct in any manner, and clients must treat as opaque strings (e.g. is no requirement for them to be numeric, sequenced or GUIDs).