OMEMO Encryption This specification defines a protocol for end-to-end encryption in one-on-one chats that may have multiple clients per account. &LEGALNOTICE; 0384 Deferred Standards Track Standards Council XMPP Core XEP-0163 OMEMO Andreas Straub andy@strb.org andy@strb.org 0.3.0 2018-07-31 egp

Make examples show items published to the id "current", as per XEP-0060 §12.20.

0.2.2 2018-11-03 pep

Fix a bunch of typos, batch-style.

0.2.1 2018-05-21 mb

Fix attribute names in schema

0.2 2017-06-02 dg

Depend on SignalProtocol instead of Olm.

Changed to eu.siacs.conversations.axolotl Namespace which is currently used in the wild

0.1 2016-12-07 XEP Editor: ssw

Initial version approved by the council.

0.0.2 2016-09-22 ssw, dg

Depend on Olm instead of Axolotl.

0.0.1 2015-10-25 as

First draft.

There are two main end-to-end encryption schemes in common use in the XMPP ecosystem: Off-the-Record (OTR) messaging (&xep0364;) and OpenPGP (&xep0027;). OTR has significant usability drawbacks for inter-client mobility. As OTR sessions exist between exactly two clients, the chat history will not be synchronized across other clients of the involved parties. Furthermore, OTR chats are only possible if both participants are currently online, due to how the rolling key agreement scheme of OTR works. OpenPGP, while not suffering from these mobility issues, does not provide any kind of forward secrecy and is vulnerable to replay attacks. Additionally, PGP over XMPP uses a custom wire format which is defined by convention rather than standardization, and involves quite a bit of external complexity.

This XEP defines a protocol that leverages SignalProtocol encryption to provide multi-end to multi-end encryption, allowing messages to be synchronized securely across multiple clients, even if some of them are offline. The SignalProtocol is a cryptographic double ratchet protocol based on work by Trevor Perrin and Moxie Marlinspike, first published as the Axolotl protocol. While the protocol itself has specifications in the public domain, the protobuf-based wire format of the SignalProtocol is not fully documented. The SignalProtocol currently only exists in GPLv3-licensed implementations maintained by OpenWhisperSystems.

The general idea behind this protocol is to maintain separate, long-standing SignalProtocol-encrypted sessions with each device of each contact (as well as with each of our other devices), which are used as secure key transport channels. In this scheme, each message is encrypted with a fresh, randomly generated encryption key. An encrypted header is added to the message for each device that is supposed to receive it. These headers simply contain the key that the payload message is encrypted with, and they are separately encrypted using the session corresponding to the counterpart device. The encrypted payload is sent together with the headers as a <message> stanza. Individual recipient devices can decrypt the header item intended for them, and use the contained payload key to decrypt the payload message.

As the encrypted payload is common to all recipients, it only has to be included once, reducing overhead. Furthermore, SignalProtocol's transparent handling of messages that were lost or received out of order, as well as those sent while the recipient was offline, is maintained by this protocol. As a result, in combination with &xep0280; and &xep0313;, the desired property of inter-client history synchronization is achieved.

OMEMO currently uses version 3 of the SignalProtocol. Instead of a Signal key server, &xep0163; (PEP) is used to publish key data.

Device
A communication end point, i.e. a specific client instance
OMEMO element
An <encrypted> element in the urn:xmpp:omemo:1 namespace. Can be either a MessageElement or a KeyTransportElement
MessageElement
An OMEMO element that contains a chat message. Its <payload>, when decrypted, corresponds to a <message>'s <body>.
KeyTransportElement
An OMEMO element that does not have a <payload>. It contains a fresh encryption key, which can be used for purposes external to this XEP.
Bundle
A collection of publicly accessible data that can be used to build a session with a device, namely its public IdentityKey, a signed PreKey with corresponding signature, and a list of (single use) PreKeys.
rid
The device id of the intended recipient of the containing <key>
sid
The device id of the sender of the containing OMEMO element
IdentityKey
Per-device public/private key pair used to authenticate communications
PreKey
A Diffie-Hellman public key, published in bulk and ahead of time
PreKeySignalMessage
An encrypted message that includes the initial key exchange. This is used to transparently build sessions with the first exchanged message.
SignalMessage
An encrypted message

This protocol uses the DoubleRatchet encryption mechanism in conjunction with the X3DH key exchange. The following section provides detailed technical information about the protocol that should be sufficient to build a compatible OMEMO implementation. Readers who do not intend to build an OMEMO-compatible library can safely skip this section; relevant details are repeated where needed.

The X3DH key exchange is specified here and placed under the public domain. OMEMO uses this key exchange mechanism with the following parameters/settings:

curve
X25519
hash function
SHA-256
info string
"OMEMO X3DH"
byte-encoding of the public keys
the default as used by most crypto libraries TODO
signed pre-key rotation period
Signed pre-keys SHOULD be rotated periodically, at an interval between once a week and once a month. A faster or slower rotation period should not be required.
time to keep the private key of the old signed pre-key after rotating it
The private key of the old signed pre-key SHOULD be kept for another rotation period as defined above, to account for delayed messages using the old signed pre-key.
number of pre-keys to provide in the bundle
The bundle SHOULD always contain around 100 pre-keys.
minimum number of pre-keys to provide in the bundle
The bundle MUST always contain at least 25 pre-keys.
associated data
The associated data is created by concatenating the identity keys of Alice and Bob: AD = Encode(IK_A) || Encode(IK_B)
XEdDSA
To reduce the amount of bytes that have to be transferred, the key exchange uses XEdDSA on curves X25519/Ed25519 (aka XEd25519) to derive signing keys from encryption keys.

NOTE: OMEMOMessage.proto and OMEMOAuthenticatedMessage.proto refer to the protobuf structures as defined here.

The DoubleRatchet protocol is specified here and placed under the public domain. OMEMO uses this protocol with the following parameters/settings:

ratchet initialization
The double ratchet is initialized using the shared secret, ad and public keys as yielded by the X3DH key exchange, as explained in the double ratchet specification.
MAX_SKIP
It is RECOMMENDED to keep around 1000 skipped message keys.
deletion policy for skipped message keys
Skipped message keys MUST be stored until MAX_SKIP message keys are stored. At that point, keys are discarded on an LRU basis to make space for new message keys. Implementations SHOULD NOT keep skipped message keys around forever, but discard old keys on a different implementation-defined policy. It is RECOMMENDED to base this policy on deterministic events rather than time.
authentication tag truncation
Authentication tags are truncated to 16 bytes/128 bits.
CONCAT(ad, header)
CONCAT(ad, header) = ad || OMEMOMessage.proto(header) NOTE: the OMEMOMessage.proto is initialized without the ciphertext, which is optional. NOTE: Implementations are not strictly required to return a parseable byte array here, as the unpacked/parsed data is required later in the protocol.
KDF_RK(rk, dh_out)
HKDF-SHA-256 using the rk as HKDF salt, dh_out as HKDF input material and "OMEMO Root Chain" as HKDF info.
KDF_CK(ck)
HMAC-SHA-256 using ck as the HMAC key, a single byte constant 0x01 as HMAC input to produce the next message key and a single byte constant 0x02 as HMAC input to produce the next chain key.
ENCRYPT(mk, plaintext, associated_data)
The encryption step uses authenticated encryption consisting of AES-256-CBC with HMAC-SHA-256.
  1. Use HKDF-SHA-256 to generate 80 bytes of output from the message key by providing mk as HKDF input, 256 zero-bits as HKDF salt and "OMEMO Message Key Material" as HKDF info.
  2. Divide the HKDF output into a 32-byte encryption key, a 32-byte authentication key and a 16-byte IV.
  3. Encrypt the plaintext (which is a 16-byte key as specified here) using AES-256-CBC with PKCS#7 padding, using the encryption key and IV derived in the previous step.
  4. Split the associated data as returned by CONCAT into the original ad and the OMEMOMessage.proto structure.
  5. Add the ciphertext to the OMEMOMessage.proto structure.
  6. Serialize the ad and the OMEMOMessage.proto structure into a parseable byte array by concatenating ad and the serialized protobuf structure.
  7. Calculate the HMAC-SHA-256 using the authentication key and the input material as derived in the steps above.
  8. Put the OMEMOMessage.proto structure and the HMAC into a new OMEMOAuthenticatedMessage.proto structure.
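
The symmetric-key ratchet step KDF_CK listed above can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the function name `kdf_ck` and the order of the returned tuple are assumptions, and only the HMAC-SHA-256 construction with the constants 0x01 and 0x02 is taken from the text.

```python
import hashlib
import hmac
from typing import Tuple


def kdf_ck(ck: bytes) -> Tuple[bytes, bytes]:
    """Return (next_chain_key, message_key) derived from the chain key ck.

    The name and the return order are illustrative assumptions.
    """
    # A single byte 0x01 as HMAC input produces the next message key.
    mk = hmac.new(ck, b"\x01", hashlib.sha256).digest()
    # A single byte 0x02 as HMAC input produces the next chain key.
    next_ck = hmac.new(ck, b"\x02", hashlib.sha256).digest()
    return next_ck, mk


# Advance a chain three times, starting from an all-zero demo key.
# A real chain key comes out of KDF_RK, never from a constant.
ck = bytes(32)
message_keys = []
for _ in range(3):
    ck, mk = kdf_ck(ck)
    message_keys.append(mk)
```

Each call consumes the current chain key and yields one message key, so a receiver that skips ahead can derive and cache the intermediate message keys one by one.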

The contents are encrypted and authenticated using a combination of AES-256-CBC and HMAC-SHA-256.

  1. Generate 16 bytes of cryptographically secure random data, called key in the remainder of this algorithm.
  2. Encrypt this key using the double ratchet as specified above, once for each intended recipient.
  3. Use HKDF-SHA-256 to generate 80 bytes of output from the key by providing the key as HKDF input, 256 zero-bits as HKDF salt and "OMEMO Payload" as HKDF info.
  4. Divide the HKDF output into a 32-byte encryption key, a 32-byte authentication key and a 16-byte IV.
  5. Encrypt the plaintext using AES-256-CBC with PKCS#7 padding, using the encryption key and IV derived in the previous step.
  6. Calculate the HMAC-SHA-256 using the authentication key and the ciphertext from the previous steps.
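
The key derivation and authentication steps of the list above (steps 3, 4 and 6) can be sketched as follows. This is a hedged sketch under stated assumptions: HKDF is written out by hand following RFC 5869 because the Python standard library has no HKDF function, the helper names are hypothetical, and the split order (encryption key, authentication key, IV) is assumed to follow the order in which step 4 names them. The AES-256-CBC encryption of step 5 is deliberately left out, since it requires a third-party library.

```python
import hashlib
import hmac
from typing import Tuple


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_payload_keys(key: bytes) -> Tuple[bytes, bytes, bytes]:
    """Steps 3 and 4: expand the 16-byte key to 80 bytes and split it."""
    # 256 zero-bits as salt, "OMEMO Payload" as info, 80 bytes of output.
    okm = hkdf_sha256(key, bytes(32), b"OMEMO Payload", 80)
    # Assumed order: encryption key, authentication key, IV.
    return okm[:32], okm[32:64], okm[64:80]


def authenticate(auth_key: bytes, ciphertext: bytes) -> bytes:
    """Step 6: HMAC-SHA-256 over the ciphertext."""
    return hmac.new(auth_key, ciphertext, hashlib.sha256).digest()


key = bytes(range(16))  # demo value only; step 1 requires secure random data
enc_key, auth_key, iv = derive_payload_keys(key)
tag = authenticate(auth_key, b"placeholder for the AES-256-CBC ciphertext")
```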

Before a client can start using OMEMO, it needs to generate an IdentityKey and a Device ID. The IdentityKey is a &curve25519; public/private key pair. The Device ID is a randomly generated integer between 1 and 2^31 - 1.
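
Generating a Device ID in the stated range can be sketched as below. The function name and the optional `existing_ids` parameter are illustrative assumptions; the parameter reflects the requirement, stated later in this document, that a freshly generated Device ID must not collide with one that already exists.

```python
import secrets
from typing import AbstractSet

MAX_DEVICE_ID = 2**31 - 1


def generate_device_id(existing_ids: AbstractSet[int] = frozenset()) -> int:
    """Return a random Device ID in [1, 2^31 - 1] not found in existing_ids."""
    while True:
        # randbelow(n) is uniform over [0, n), so adding 1 gives [1, n].
        candidate = secrets.randbelow(MAX_DEVICE_ID) + 1
        if candidate not in existing_ids:
            return candidate


device_id = generate_device_id({12345, 4223})
```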

In order to determine whether a given contact has devices that support OMEMO, the devicelist node in PEP is consulted. Devices MUST subscribe to 'urn:xmpp:omemo:1:devices' via PEP, so that they are informed whenever their contacts add a new device. They MUST cache the most up-to-date version of the devicelist.


In order for other devices to be able to initiate a session with a given device, it first has to announce itself by adding its device ID to the devicelist PEP node.

It is RECOMMENDED to set the access model of the ‘urn:xmpp:omemo:1:devices’ node to ‘open’ to give entities without presence subscription read access to the devices and allow them to establish an OMEMO session. Not having presence subscription is a common occurrence on the first few messages between two contacts and can also happen fairly frequently in group chats as not every participant had prior communication with every other participant.

The access model can be changed efficiently by using publish-options.

[Example: publish-options form with FORM_TYPE http://jabber.org/protocol/pubsub#publish-options and the access model set to open]
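
A publish-options form of the kind referred to above can be assembled with the standard library's ElementTree, as in the following sketch. The helper name `build_publish_options` is hypothetical. The field name `pubsub#access_model` and the form layout follow XEP-0060 and XEP-0004; how the resulting element is embedded in a publish request depends on the XMPP library in use and is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
DATA_NS = "jabber:x:data"
FORM_TYPE = "http://jabber.org/protocol/pubsub#publish-options"


def build_publish_options(options: Dict[str, str]) -> ET.Element:
    """Build a <publish-options/> element holding a data form of type submit."""
    root = ET.Element(f"{{{PUBSUB_NS}}}publish-options")
    form = ET.SubElement(root, f"{{{DATA_NS}}}x", {"type": "submit"})
    # The hidden FORM_TYPE field identifies the form as publish-options.
    form_type = ET.SubElement(
        form, f"{{{DATA_NS}}}field", {"var": "FORM_TYPE", "type": "hidden"})
    ET.SubElement(form_type, f"{{{DATA_NS}}}value").text = FORM_TYPE
    for var, value in options.items():
        field = ET.SubElement(form, f"{{{DATA_NS}}}field", {"var": var})
        ET.SubElement(field, f"{{{DATA_NS}}}value").text = value
    return root


open_access = build_publish_options({"pubsub#access_model": "open"})
xml_text = ET.tostring(open_access, encoding="unicode")
```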

NOTE: as per XEP-0060 §12.20, it is RECOMMENDED for the publisher to specify an ItemID of "current" to ensure that the publication of a new item will overwrite the existing item.

This step presents the risk of introducing a race condition: Two devices might simultaneously try to announce themselves, unaware of the other's existence. The second device would overwrite the first one. To mitigate this, devices MUST check that their own device ID is contained in the list whenever they receive a PEP update from their own account. If they have been removed, they MUST reannounce themselves.
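
The mitigation described above amounts to a membership check on every device list update received from the own account. A minimal sketch, in which the function name and the choice to return a sorted list are assumptions:

```python
from typing import Iterable, List, Optional


def devices_to_republish(own_id: int,
                         received_ids: Iterable[int]) -> Optional[List[int]]:
    """Return the device list to publish again, or None if nothing is needed."""
    ids = set(received_ids)
    if own_id in ids:
        return None  # still announced, no action required
    # The own ID was dropped (e.g. overwritten by a concurrent publish):
    # announce again, keeping every device that is currently listed.
    return sorted(ids | {own_id})


still_listed = devices_to_republish(4223, [12345, 4223])
was_removed = devices_to_republish(4223, [12345])
```

Keeping the other IDs when republishing matters: publishing a list that contains only the own device would recreate the same overwrite problem for the other devices.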

Furthermore, a device MUST publish its IdentityKey, a signed PreKey, and a list of PreKeys. This tuple is called a bundle. Bundles are maintained as multiple items in a PEP node called ‘urn:xmpp:omemo:1:bundles’. Each bundle MUST be stored in a separate item. The item id MUST be set to the device id.

A bundle is an element called 'bundle' in the 'urn:xmpp:omemo:1' namespace. It has a child element called ‘spk’ that contains the signed PreKey as base64 encoded data, a child element called ‘spks’ that contains the signed PreKey signature as base64 encoded data and a child element called ‘ik’ that contains the identity key as base64 encoded data. PreKeys are multiple elements called ‘pk’ that each contain one PreKey as base64 encoded data. PreKeys are wrapped in an element called ‘prekeys’ which is a child of the bundle element.

The bundle element MAY contain an attribute called label, which is a user defined string describing the device that published that bundle.
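
The bundle layout described above can be sketched with ElementTree as follows. The helper names are hypothetical, the namespace is assumed to be urn:xmpp:omemo:1 as given in the glossary, and only the elements and the label attribute named in the text are produced; any further attributes the schema may define are not covered here. The check for at least 25 PreKeys reflects the minimum stated in the X3DH parameters.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List, Optional

NS = "urn:xmpp:omemo:1"
MIN_PREKEYS = 25


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_bundle(ik: bytes, spk: bytes, spk_signature: bytes,
                 prekeys: List[bytes],
                 label: Optional[str] = None) -> ET.Element:
    """Build a <bundle/> element from raw public key bytes."""
    if len(prekeys) < MIN_PREKEYS:
        raise ValueError("a bundle must contain at least 25 PreKeys")
    bundle = ET.Element(f"{{{NS}}}bundle")
    if label is not None:
        bundle.set("label", label)  # optional user-defined device description
    ET.SubElement(bundle, f"{{{NS}}}spk").text = b64(spk)
    ET.SubElement(bundle, f"{{{NS}}}spks").text = b64(spk_signature)
    ET.SubElement(bundle, f"{{{NS}}}ik").text = b64(ik)
    wrapper = ET.SubElement(bundle, f"{{{NS}}}prekeys")
    for prekey in prekeys:
        ET.SubElement(wrapper, f"{{{NS}}}pk").text = b64(prekey)
    return bundle


# Dummy byte strings stand in for real keys and signatures.
demo_prekeys = [bytes([i]) * 32 for i in range(25)]
demo_bundle = build_bundle(b"\x01" * 32, b"\x02" * 32, b"\x03" * 64,
                           demo_prekeys, label="Laptop")
```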

When publishing bundles a client MUST make sure that the 'urn:xmpp:omemo:1:bundles' node is configured to store multiple items. This is not the default with &xep0163;. If the node doesn’t exist yet it can be configured on the fly by using publish-options as described in XEP-0060 §7.1.5. The value for 'pubsub#max_items' in publish-options MUST be set to 'max'. If the node did exist and was configured differently the bundle publication will fail. Clients MUST then reconfigure the node as described in XEP-0060 §8.2.

[Example: bundle publication with BASE64ENCODED key values and a publish-options form with FORM_TYPE http://jabber.org/protocol/pubsub#publish-options and pubsub#max_items set to max]

As with the 'urn:xmpp:omemo:1:devices' node it is RECOMMENDED to set the access model of the 'urn:xmpp:omemo:1:bundles' to open. Clients that do not adhere to the recommended access model (and for example want to stick to the default 'presence') are highly encouraged to configure the same access model for 'urn:xmpp:omemo:1:devices' and 'urn:xmpp:omemo:1:bundles', otherwise remote entities might end up in a situation where they are able to retrieve the device list but not the bundle or vice versa.

The access model can be changed efficiently by using publish-options.

[Example: publish-options form with FORM_TYPE http://jabber.org/protocol/pubsub#publish-options, pubsub#max_items set to max and the access model set to open]

In order to build a session with a device, their bundle information is fetched.


A random preKeyPublic entry is selected, and used to build a SignalProtocol session.
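
Selecting a random PreKey from a fetched bundle can be sketched as below. The sample bundle is a hypothetical, heavily shortened stand-in that contains only the 'prekeys' wrapper and two 'pk' children as described in the bundle section; the function name is an assumption as well.

```python
import base64
import secrets
import xml.etree.ElementTree as ET

NS = "urn:xmpp:omemo:1"

# Hypothetical, shortened bundle; a real one also carries ik, spk and spks.
SAMPLE_BUNDLE = (
    "<bundle xmlns='urn:xmpp:omemo:1'>"
    "<prekeys><pk>AQID</pk><pk>BAUG</pk></prekeys>"
    "</bundle>"
)


def pick_random_prekey(bundle: ET.Element) -> bytes:
    """Return the decoded bytes of one randomly chosen PreKey."""
    entries = bundle.findall(f"{{{NS}}}prekeys/{{{NS}}}pk")
    if not entries:
        raise ValueError("bundle contains no PreKeys")
    # secrets.choice draws from the operating system's CSPRNG.
    return base64.b64decode(secrets.choice(entries).text)


chosen = pick_random_prekey(ET.fromstring(SAMPLE_BUNDLE))
```

Picking at random rather than always taking the first entry lowers the chance that two contacts building sessions at the same time use the same PreKey.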

To signal to a contact that you would like to terminate a session, your device MUST send a <terminate> element to all intended recipient devices inside an encrypted stanza. A user or client MAY tag the element with a reason. If a device receives a stanza containing a <terminate> element, it MUST inform the user that the peer has ended the session. To prevent the user from accidentally sending plaintext messages, the client MUST block all outgoing messages until the user has switched to plaintext.

In order to send a chat message, its <body> first has to be encrypted. The client MUST use a fresh, randomly generated key with AES-256. The 16-byte key and the GCM authentication tag (the tag SHOULD have at least 128 bits) are concatenated, and for each intended recipient device, i.e. both own devices as well as devices associated with the contact, the result of this concatenation is encrypted using the corresponding long-standing SignalProtocol session. Each encrypted payload key/authentication tag tuple is tagged with the recipient device's ID. The key element MUST be tagged with a prekey attribute set to true if a PreKeySignalMessage is being used. This is all serialized into a MessageElement, which is transmitted in a <message> as follows:
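
The concatenation of payload key and authentication tag described above, and its inverse on the receiving side, can be sketched as follows. The function names are hypothetical, and the order (key first, tag second) is an assumption based on the order in which the paragraph names the two values. Encrypting the resulting blob with the SignalProtocol session is not shown.

```python
import secrets
from typing import Tuple

KEY_LENGTH = 16  # the payload key is a 16-byte value


def pack_key_and_tag(key: bytes, tag: bytes) -> bytes:
    """Concatenate key and tag; the result is what gets session-encrypted."""
    if len(key) != KEY_LENGTH:
        raise ValueError("payload key must be exactly 16 bytes")
    return key + tag


def unpack_key_and_tag(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a decrypted blob back into (key, tag)."""
    if len(blob) <= KEY_LENGTH:
        raise ValueError("blob too short to contain key and tag")
    return blob[:KEY_LENGTH], blob[KEY_LENGTH:]


demo_key = secrets.token_bytes(KEY_LENGTH)
demo_tag = bytes(16)  # stand-in for a real 128-bit authentication tag
blob = pack_key_and_tag(demo_key, demo_tag)
```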

[Example: MessageElement with BASE64ENCODED <key> values and a BASE64ENCODED <payload>]

The client may wish to transmit keying material to the contact. This first has to be generated. The client MUST generate a fresh, random key. The 16-byte key and the GCM authentication tag (the tag SHOULD have at least 128 bits) are concatenated, and for each intended recipient device, i.e. both own devices as well as devices associated with the contact, this key is encrypted using the corresponding long-standing SignalProtocol session. Each encrypted payload key/authentication tag tuple is tagged with the recipient device's ID. The key element MUST be tagged with a prekey attribute set to true if a PreKeySignalMessage is being used. This is all serialized into a KeyTransportElement, omitting the <payload>, as follows:

[Example: KeyTransportElement with BASE64ENCODED <key> values and no <payload>]

This KeyTransportElement can then be sent over any applicable transport mechanism.

When an OMEMO element is received, the client MUST check whether there is a <key> element with an rid attribute matching its own device ID. If this is not the case, the element MUST be silently discarded. If such an element exists, the client checks whether the element's contents are a PreKeySignalMessage.
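
The first receiving step, locating the <key> element addressed to the own device, can be sketched with ElementTree. The sample stanza below is hypothetical: the wrapper element around the <key> children and all attribute values are illustrative assumptions, which is why the lookup searches all descendants instead of relying on a fixed nesting. Treating only the literal string "true" as a set prekey attribute is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = "urn:xmpp:omemo:1"

# Hypothetical sample; element nesting and values are for illustration only.
SAMPLE = (
    "<encrypted xmlns='urn:xmpp:omemo:1'>"
    "<header sid='27183'>"
    "<key rid='31415'>AQID</key>"
    "<key rid='12321' prekey='true'>BAUG</key>"
    "</header>"
    "</encrypted>"
)


def find_own_key(encrypted: ET.Element,
                 own_device_id: int) -> Optional[ET.Element]:
    """Return the <key> whose rid matches own_device_id, else None."""
    for key in encrypted.iter(f"{{{NS}}}key"):
        if key.get("rid") == str(own_device_id):
            return key
    return None  # caller silently discards the OMEMO element


def is_prekey_message(key: ET.Element) -> bool:
    return key.get("prekey") == "true"


element = ET.fromstring(SAMPLE)
own = find_own_key(element, 12321)
missing = find_own_key(element, 99999)
```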

If this is the case, a new session is built from this received element. The client SHOULD then republish their bundle information, replacing the used PreKey, such that it won't be used again by a different client. If the client already has a session with the sender's device, it MUST replace this session with the newly built session. The client MUST delete the private key belonging to the PreKey after use.

If the element's contents are a SignalMessage, and the client has a session with the sender's device, it tries to decrypt the SignalMessage using this session. If the decryption fails or if the element's contents are not a SignalMessage either, the OMEMO element MUST be silently discarded.

If the OMEMO element contains a <payload>, it is an OMEMO message element. The client tries to decrypt the base64 encoded contents using the key and the authentication tag extracted from the <key> element. If the decryption fails, the client MUST silently discard the OMEMO message. If it succeeds, the decrypted contents are treated as the <body> of the received message.

If the OMEMO element does not contain a <payload>, the client has received a KeyTransportElement. The key extracted from the <key> element can then be used for other purposes (e.g. encrypted file transfer).

Before publishing a freshly generated Device ID for the first time, a device MUST check whether that Device ID already exists, and if so, generate a new one.

Clients SHOULD NOT immediately fetch the bundle and build a session as soon as a new device is announced. Before the first message is exchanged, the contact does not know which PreKey has been used (or, in fact, that any PreKey was used at all). As they have not had a chance to remove the used PreKey from their bundle announcement, this could lead to collisions where both Alice and Bob pick the same PreKey to build a session with a specific device. As each PreKey SHOULD only be used once, the party that sends their initial PreKeySignalMessage later loses this race condition. This means that they think they have a valid session with the contact, when in reality their messages MAY be ignored by the other end. By postponing building sessions, the chance of such issues occurring can be drastically reduced. It is RECOMMENDED to construct sessions only immediately before sending a message.

As there are no explicit error messages in this protocol, if a client does receive a PreKeySignalMessage using an invalid PreKey, they SHOULD respond with a KeyTransportElement, sent in a <message> using a PreKeySignalMessage. By building a new session with the original sender this way, the invalid session of the original sender will get overwritten with this newly created, valid session.

If a PreKeySignalMessage is received as part of a &xep0313; catch-up and used to establish a new session with the sender, the client SHOULD postpone deletion of the private key corresponding to the used PreKey until after MAM catch-up is completed. If this is done, the client MUST then also send a KeyTransportElement using a PreKeySignalMessage before sending any payloads using this session, to trigger re-keying (as above). This practice can mitigate the previously mentioned race condition by preventing message loss.

As the asynchronous nature of OMEMO allows currently offline devices to decrypt messages at a later time, clients SHOULD include a &xep0334; <store /> hint in their OMEMO messages. Otherwise, server implementations of &xep0313; will generally not retain OMEMO messages, since they do not contain a <body />.

When a client receives the first message for a given ratchet key with a counter of 53 or higher, it MUST send a heartbeat message. Heartbeat messages are normal OMEMO encrypted messages where the SCE payload does not include any elements. These heartbeat messages cause the ratchet to advance, thus subsequent messages will have the counter restarted from 0.
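
The heartbeat rule above triggers once per ratchet key, on the first received message whose counter reaches the threshold. A minimal sketch of that bookkeeping, in which the class and method names are hypothetical and the ratchet key is represented as raw bytes:

```python
from typing import Set

HEARTBEAT_THRESHOLD = 53


class HeartbeatTracker:
    """Remembers for which ratchet keys a heartbeat was already triggered."""

    def __init__(self) -> None:
        self._triggered: Set[bytes] = set()

    def should_send_heartbeat(self, ratchet_key: bytes, counter: int) -> bool:
        if counter < HEARTBEAT_THRESHOLD:
            return False
        if ratchet_key in self._triggered:
            return False  # only the first such message triggers a heartbeat
        self._triggered.add(ratchet_key)
        return True


tracker = HeartbeatTracker()
results = [
    tracker.should_send_heartbeat(b"ratchet-A", 52),  # below the threshold
    tracker.should_send_heartbeat(b"ratchet-A", 53),  # first at or above it
    tracker.should_send_heartbeat(b"ratchet-A", 54),  # already handled
    tracker.should_send_heartbeat(b"ratchet-B", 60),  # new ratchet key
]
```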

The SignalProtocol-library uses a trust model that doesn't work very well with OMEMO. For this reason it may be desirable to have the library consider all keys trusted, effectively disabling its trust management. This makes it necessary to implement trust handling oneself.

Clients MUST NOT use a newly built session to transmit data without user intervention. If a client were to opportunistically start using sessions for sending without asking the user whether to trust a device first, an attacker could publish a fake device for this user, which would then receive copies of all messages sent by/to this user. A client MAY use such "not (yet) trusted" sessions for decryption of received messages, but in that case it SHOULD indicate the untrusted nature of such messages to the user.

When prompting the user for a trust decision regarding a key, the client SHOULD present the user with a fingerprint in the form of a hex string, QR code, or other unique representation, such that it can be compared by the user.
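
Rendering a key as a hex string for manual comparison can be sketched as below. The grouping into blocks of eight hex digits is purely an illustrative choice to ease reading; the text above only asks for a unique representation, and the function name is an assumption.

```python
def format_fingerprint(public_key: bytes, group: int = 8) -> str:
    """Return the key as lowercase hex, split into space-separated groups."""
    hex_string = public_key.hex()
    return " ".join(hex_string[i:i + group]
                    for i in range(0, len(hex_string), group))


# Dummy 8-byte value; a real IdentityKey public key is longer.
demo_fingerprint = format_fingerprint(bytes(range(8)))
```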

While it is RECOMMENDED that clients postpone private key deletion until after MAM catch-up and this standard mandates that clients MUST NOT use duplicate-PreKey sessions for sending, clients MAY delete such keys immediately for security reasons. For additional information on potential security impacts of this decision, refer to Menezes, Alfred, and Berkant Ustaoglu. "On reusing ephemeral keys in Diffie-Hellman key agreement protocols." International Journal of Applied Cryptography 2, no. 2 (2010): 154-158.

In order to be able to handle out-of-order messages, the SignalProtocol stack has to cache the keys belonging to "skipped" messages that have not been seen yet. It is up to the implementor to decide how long and how many of such keys to keep around.
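
A bounded cache for skipped message keys can be sketched with an ordered dictionary. This is one possible policy under stated assumptions, not the mandated one: the class name is hypothetical, the limit of 1000 follows the MAX_SKIP recommendation given earlier, and evicting the entry that was stored first is an interpretation of the discard rule, chosen because each message key is used at most once.

```python
from collections import OrderedDict
from typing import Optional, Tuple

MAX_SKIP = 1000


class SkippedKeyStore:
    """Maps (ratchet public key, message number) to a message key."""

    def __init__(self, max_skip: int = MAX_SKIP) -> None:
        self._keys: "OrderedDict[Tuple[bytes, int], bytes]" = OrderedDict()
        self._max_skip = max_skip

    def put(self, ratchet_key: bytes, number: int, message_key: bytes) -> None:
        self._keys[(ratchet_key, number)] = message_key
        while len(self._keys) > self._max_skip:
            self._keys.popitem(last=False)  # drop the oldest stored key

    def take(self, ratchet_key: bytes, number: int) -> Optional[bytes]:
        """Return and delete the key, so it cannot be used twice."""
        return self._keys.pop((ratchet_key, number), None)

    def __len__(self) -> int:
        return len(self._keys)


store = SkippedKeyStore(max_skip=2)
store.put(b"rk", 0, b"mk0")
store.put(b"rk", 1, b"mk1")
store.put(b"rk", 2, b"mk2")  # exceeds the limit, so the key for 0 is evicted
```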

This document requires no interaction with the Internet Assigned Numbers Authority (IANA).

This specification defines the following XMPP namespaces:

  • urn:xmpp:omemo:1
&NSVER;

Big thanks to Daniel Gultsch for mentoring me during the development of this protocol. Thanks to Thijs Alkemade and Cornelius Aschermann for talking through some of the finer points of the protocol with me. And lastly I would also like to thank Sam Whited, Holger Weiss, and Florian Schmaus for their input on the standard.