From e5ec8c061e9111041aa89fa161a45b7ff14eac43 Mon Sep 17 00:00:00 2001
From: Peter Saint-Andre
Made small clarifications and corrections throughout; added section on Jingle Actions.

There exists no widely-adopted standard for initiating and managing peer-to-peer (p2p) interactions (such as voice, video, or file sharing exchanges) from within Jabber/XMPP clients. Although several large service providers and Jabber/XMPP clients have written and implemented their own proprietary XMPP extensions for p2p signalling (usually only for voice), those technologies are not open and do not always take into account requirements to interoperate with the Public Switched Telephone Network (PSTN) or Voice over Internet Protocol (VoIP) networks based on the IETF's Session Initiation Protocol (SIP) as specified in &rfc3261; and its various extensions. By contrast, the only existing open protocol has been &xep0111;, which made it possible to initiate and manage p2p sessions, but which did not provide enough of the key signalling semantics to be easily implemented in Jabber/XMPP clients. The result has been an unfortunate fragmentation within the XMPP community regarding signalling protocols.
There are, essentially, two approaches to solving the problem:

Implementation experience indicates that a dual-stack approach may not be feasible on all the computing platforms for which Jabber clients have been written, or even desirable on platforms where it is feasible. Therefore, it seems reasonable to define an XMPP signalling protocol that can provide the necessary signalling semantics while also making it relatively straightforward to interoperate with existing Internet standards.

As a result of feedback received on XEP-0111, the original authors of this document (Joe Hildebrand and Peter Saint-Andre) began to define such a signalling protocol, code-named Jingle. Upon communication with members of the Google Talk team,

The purpose of Jingle is not to supplant or replace SIP. Because dual-stack XMPP+SIP clients are difficult to build, given that they essentially have two centers of program control,

The protocol defined herein is designed to meet the following requirements:

This document defines the signalling protocol only. Additional documents specify the following:
Various content description formats (audio, video, etc.) and, where possible, mapping those types to the Session Description Protocol (SDP; see &rfc4566;); examples include &xep0167; and &xep0180;.
Various content transport methods; examples include &xep0176; and &xep0177;.
Procedures for mapping the Jingle signalling protocol to existing signalling standards such as the IETF's Session Initiation Protocol (SIP; see &rfc3261;) and the ITU's H.323 protocol (see &h323;); these documents are not yet written.

Jingle consists of three parts, each with its own syntax, semantics, and state machine:

This document defines the semantics and syntax for overall session management. It also provides pluggable "slots" for content description formats and content transport methods, which are specified in separate documents; however, for the sake of completeness, this document also includes examples for all of the actions related to description formats and transport methods.

The state machine for overall session management (i.e., the state per Session ID) is as follows:

The actions related to management of the overall Jingle session are as follows:

These actions are defined in greater detail under Jingle Actions.

The content type of a session is made up of two aspects:

These actions are defined in greater detail under Jingle Actions.
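The session-level state machine can be sketched in Python. This is a minimal, non-normative sketch. It assumes three states (PENDING, ACTIVE, ENDED) and the transitions visible in this document: the session is PENDING once session-initiate is acknowledged, session-accept moves it to ACTIVE, the content-level actions loop within PENDING and ACTIVE, and content-add is only permitted once ACTIVE. The class and method names are hypothetical. The text calls the ending action "terminate", and the "session-terminate" value used here is an assumption.

```python
from enum import Enum


class SessionState(Enum):
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    ENDED = "ENDED"


# Content-level actions that loop back to the current state in both
# PENDING and ACTIVE, as shown in the state diagram fragments.
CONTENT_LOOP_ACTIONS = {
    "content-remove",
    "content-modify",
    "content-accept",
    "content-decline",
}

# ASSUMPTION: the prose refers to a "terminate" action; this exact
# attribute value is a guess for illustration only.
TERMINATE = "session-terminate"


class JingleSession:
    """Session-level state for one session, keyed by (initiator, sid)."""

    def __init__(self, initiator: str, sid: str) -> None:
        self.key = (initiator, sid)
        # The session is PENDING once the receiver acknowledges the
        # session-initiate request.
        self.state = SessionState.PENDING

    def apply(self, action: str) -> SessionState:
        """Apply an action and return the resulting state."""
        if self.state is SessionState.ENDED:
            raise ValueError("session has already ended")
        if action == "session-accept" and self.state is SessionState.PENDING:
            self.state = SessionState.ACTIVE
        elif action == TERMINATE:
            # Either party may end the session from PENDING or ACTIVE.
            self.state = SessionState.ENDED
        elif action == "content-add" and self.state is SessionState.ACTIVE:
            pass  # content-add MUST NOT be sent while PENDING
        elif action in CONTENT_LOOP_ACTIONS:
            pass  # content negotiation does not change the session state
        else:
            raise ValueError(f"{action!r} not allowed in {self.state.name}")
        return self.state
```

The sketch keys each session on (initiator, sid) because the document states that those two attributes uniquely identify a session.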
As with the content description formats, the content transport methods are defined in separate specifications. Possible content transport methods include Real-time Transport Protocol (RTP) with Interactive Connectivity Establishment (ICE) and raw UDP. The relevant specifications also define the state chart for the content transport method in question.

The generic state machine for any given content transport method is as follows:

These actions are defined in greater detail under Jingle Actions.

In order to initiate a Jingle session, the initiating entity must determine which of the receiver's XMPP resources is best for the desired content description format. If a contact has only one XMPP resource, this task MUST be completed using &xep0030; or the presence-based profile of service discovery specified in &xep0115;.

Naturally, instead of sending service discovery requests to every contact in a user's roster, it is more efficient to use Entity Capabilities, whereby support for Jingle and various Jingle content description formats and content transport methods is determined for a client version in general (rather than on a per-JID basis) and then cached. Refer to XEP-0115 for details.

If a contact has more than one XMPP resource, it may be that only one of the resources supports Jingle and the desired content description format, in which case the user MUST initiate the Jingle signalling with that resource.
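The resource-selection rules can be sketched as a small function. This is a minimal sketch with the following assumptions:

* Each resource's feature set has already been obtained through service discovery or entity capabilities.
* Jingle support appears as the 'http://jabber.org/protocol/jingle' feature.
* The multi-resource case is delegated to a caller-supplied function standing in for the XEP-0168 mechanism.

The function name and its parameters are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

# ASSUMPTION: Jingle support is advertised as this namespace feature.
JINGLE_NS = "http://jabber.org/protocol/jingle"


def pick_resource(
    features_by_resource: Dict[str, Set[str]],
    description_ns: str,
    choose_among: Callable[[List[str]], str],
) -> Optional[str]:
    """Pick the contact resource with which to initiate a Jingle session.

    features_by_resource maps each resource to the feature set learned via
    service discovery or entity capabilities (hypothetical input shape).
    """
    candidates = sorted(
        resource
        for resource, features in features_by_resource.items()
        if JINGLE_NS in features and description_ns in features
    )
    if not candidates:
        return None  # no resource can handle the desired content
    if len(candidates) == 1:
        return candidates[0]  # the only capable resource MUST be used
    # Several capable resources: defer to a priority mechanism such as
    # XEP-0168, represented here by a caller-supplied function.
    return choose_among(candidates)
```

Sorting the candidates makes the list passed to `choose_among` deterministic, so the priority logic does not depend on dictionary ordering.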
If a contact has more than one XMPP resource that supports Jingle and the desired content description format, it is RECOMMENDED for a client to use &xep0168; in order to determine which is the best resource with which to initiate the desired Jingle session.

Once the initiating entity has discovered which of the receiver's XMPP resources is ideal for the desired content description format, it sends a session initiation request to the receiver. This request is an IQ-set containing a &JINGLE; element qualified by the 'http://jabber.org/protocol/jingle' namespace. The &JINGLE; element MUST possess the 'action', 'initiator', and 'sid' attributes (the latter two uniquely identify the session).
For initiation, the 'action' attribute MUST have a value of "session-initiate" and the &JINGLE; element MUST contain one or more &CONTENT; elements, each of which defines a content type to be transferred during the session; each &CONTENT; element in turn contains one &DESCRIPTION; child element that specifies a desired content description format and one or more &TRANSPORT; child elements that specify potential content transport methods.

The following example shows a Jingle session initiation request for a session that contains both audio and video content:

Unless an error occurs, the receiver MUST acknowledge receipt of the initiation request:

If the receiver acknowledges receipt of the initiation request, both parties must consider the session to be in the PENDING state.

There are several reasons why the receiver might return an error instead of acknowledging receipt of the initiation request:
If the initiating entity is unknown to the receiver (as determined, e.g., via presence subscription) and the receiver has a policy of not communicating via Jingle with unknown entities, it MUST return a &unavailable; error.
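An initiating client might build the session-initiate request, and a receiver might apply the MUST-level checks before acknowledging it. The sketch below uses Python's standard-library `xml.etree.ElementTree`. It is minimal and non-normative:

* The function names, the stanza id, the JIDs, and the example namespaces are hypothetical.
* Error conditions are given by their XMPP names: service-unavailable for &unavailable; and bad-request for &badrequest;.
* The receiver check covers only the malformed-request and unknown-initiator cases.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Set, Tuple

JINGLE_NS = "http://jabber.org/protocol/jingle"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def build_session_initiate(
    to: str,
    initiator: str,
    sid: str,
    stanza_id: str,
    contents: List[Tuple[str, List[str]]],
) -> ET.Element:
    """Build an IQ-set carrying a session-initiate request.

    contents holds (description namespace, [transport namespaces]) pairs,
    one pair per content type (e.g. audio, video).
    """
    iq = ET.Element("iq", {"type": "set", "to": to, "id": stanza_id})
    jingle = ET.SubElement(
        iq,
        f"{{{JINGLE_NS}}}jingle",
        {"action": "session-initiate", "initiator": initiator, "sid": sid},
    )
    for description_ns, transport_nss in contents:
        content = ET.SubElement(jingle, f"{{{JINGLE_NS}}}content")
        # Exactly one description, then one or more candidate transports.
        ET.SubElement(content, f"{{{description_ns}}}description")
        for transport_ns in transport_nss:
            ET.SubElement(content, f"{{{transport_ns}}}transport")
    return iq


def check_session_initiate(
    jingle: ET.Element, known_bare_jids: Set[str], accept_unknown: bool = True
) -> Optional[str]:
    """Return the error condition the receiver would send, or None to ack."""
    for attr in ("action", "initiator", "sid"):
        if not jingle.get(attr):
            return "bad-request"
    contents = [c for c in jingle if local_name(c.tag) == "content"]
    if jingle.get("action") != "session-initiate" or not contents:
        return "bad-request"
    for content in contents:
        names = [local_name(child.tag) for child in content]
        if names.count("description") != 1 or names.count("transport") < 1:
            return "bad-request"
    # Compare the initiator's bare JID against contacts known to the receiver.
    bare = jingle.get("initiator").split("/", 1)[0]
    if bare not in known_bare_jids and not accept_unknown:
        return "service-unavailable"
    return None
```

When serialized with `ET.tostring`, ElementTree assigns generated prefixes (ns0, ns1, ...) to the namespaced elements. The result is namespace-equivalent to the unprefixed form, though it is less readable.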
-
-
-
@@ -180,15 +192,15 @@
Session
- A number of pairs of negotiated content transport methods and content description formats connecting two entities. It is delimited in time by an initiate request and session ending events. During the lifetime of a session, pairs of content descriptions and content transport methods can be added or removed.
+ A number of pairs of negotiated content transport methods and content description formats connecting two entities. It is delimited in time by an initiate request and session ending events. During the lifetime of a session, pairs of content descriptions and content transport methods can be added or removed. A session consists of at least one active negotiated content type at a time.
Content Type
- A formal declaration of the purpose(s) of the session. Common sessions might include types such as "voice", both "voice" and "video", and "file sharing". A session consists of at least one active negotiated content type at a time. Depending on the content type and the content description, one content description may require multiple components to be communicated by the transport. This is the 'what' of the session. In Jingle XML syntax the content type is the namespace of the &DESCRIPTION; element.
+ The combination of a description with a transport method.
Content Description
- The details of the content type being established. For instance, this might describe the acceptable codecs when establishing a voice conversation. The XML elements for the content description are qualified by the namespace of the content type. The content description defines the bits to be transferred.
+ The format of the content type being established, which formally declares one purpose of the session (e.g., "voice" or "video"). This is the 'what' of the session (i.e., the bits to be transferred), such as the acceptable codecs when establishing a voice conversation. In Jingle XML syntax the content type is the namespace of the &DESCRIPTION; element.
Transport Method
@@ -196,10 +208,11 @@
Component
- A component is a numbered stream of data which needs to be transmitted between the endpoints for a given content type in the context of a given session. It is up to the transport to negotiate the details of each component. For instance, the voice content type might use two components, one to transmit an RTP stream, and another to transmit RTCP timing information.
+ A component is a numbered stream of data which needs to be transmitted between the endpoints for a given content type in the context of a given session. It is up to the transport to negotiate the details of each component. Depending on the content type and the content description, one content description may require multiple components to be communicated (e.g., the audio content type might use two components: one to transmit an RTP stream and another to transmit RTCP timing information).
@@ -207,7 +220,7 @@
-
@@ -222,7 +235,7 @@ PENDING o---------------------+ |
| | content-remove, | |
| | content-modify, | |
| | content-accept, | |
- | | content-decline, | |
+ | | content-decline | |
| +------------------+ |
| |
| session-accept |
@@ -233,7 +246,7 @@ PENDING o---------------------+ |
| | content-remove, | |
| | content-modify, | |
| | content-accept, | |
- | | content-decline, | |
+ | | content-decline | |
| +------------------+ |
| |
+-------------------------+
@@ -249,19 +262,20 @@ PENDING o---------------------+ |
-
+
+
START
@@ -361,18 +376,20 @@ PENDING o---------------------+ |
-
-
If the target entity does not support Jingle, it MUST return a &unavailable; error.
-If the target entity does not support any of the specified content description formats, it MUST return a &feature; error with a Jingle-specific error condition of <unsupported-content/>.
-If the target entity does not support any of the specified content transport methods, it MUST return a &feature; error with a Jingle-specific error condition of <unsupported-transports/>.
-If the initiation request was malformed, the target entity MUST return a &badrequest; error.
+If the initiation request was malformed, the receiver MUST return a &badrequest; error.
After acknowledging receipt of the initiation request, the target entity MAY redirect the session to another address (e.g., because the principal is not answering at the original resource). This is done by sending a Jingle redirect action to the initiating entity:
-In order to decline the session initiation request, the target entity MUST acknowledge receipt of the session initiation request, then terminate the session as described in the Termination section of this document.
+In order to decline the session initiation request, the receiver MUST acknowledge receipt of the session initiation request, then terminate the session as described under Termination.
In general, negotiation will be necessary before the parties can agree on an acceptable set of content types, content description formats, and content transport methods. The potential combinations of parameters to be negotiated are many, and not all are shown herein (some are shown in the relevant specifications for various content description formats and content transport methods).
One session-level negotiation is to remove a content type. For example, let us imagine that Juliet is having a bad hair day. She certainly does not want to include video in her Jingle session with Romeo, so she sends a "content-remove" request to Romeo:
If (after negotiation of content transport methods and content description formats) the target entity determines that it will be able to establish a connection, it sends a definitive acceptance to the initiating entity:
-The &JINGLE; element in the accept stanza MUST contain one or more <content/> elements, each of which MUST contain only one <description/> element and one or more <transport/> elements. The &JINGLE; element SHOULD possess a 'responder' attribute that explicitly specifies the full JID of the responding entity, and the initiating entity SHOULD send all future commmunications about this Jingle session to the JID provided in the 'responder' attribute.
-The initiating entity then acknowledges the target entity's definitive acceptance:
+The initiating entity then acknowledges the receiver's definitive acceptance:
-Now the initiating entity and target entity can begin sending content over the negotiated connection.
+Now the initiating entity and receiver can begin sending content over the negotiated connection.
If one of the parties cannot find a suitable content transport method, it SHOULD terminate the session as described below.
If both parties send modify messages at the same time, the modify message from the session initiator MUST trump the modify message from the recipient and the initiator SHOULD return an &unexpected; error to the other party.
One example of modifying an active session is to add a content type. For example, let us imagine that Juliet gets her hair in order and now wants to add video. She now sends a "content-add" request to Romeo:
In order to gracefully end the session (which MAY be done at any point after acknowledging receipt of the initiation request, including immediately thereafter in order to decline the request), either the target entity or the initiating entity MUST send a "terminate" action to the other party:
-In particular, one party MUST consider the session to be in the ENDED state if it receives presence of type "unavailable" from the other party:
-Naturally, in this case there is nothing for the initiating entity to acknowledge.
@@ -651,6 +668,64 @@ PENDING o---------------------+ |This section provides more detailed descriptions for each Jingle-related action.
+This action enables a party to accept a content-add, content-modify, or content-remove action received from another party. Implicitly this action also serves as a description-accept and transport-accept.
+This action enables a party to add one or more new content types to the session. This action MUST NOT be sent while the session is in the PENDING state.
+This action enables a party to reject a content-add or content-modify action received from another party.
+This action enables a party to change an existing content type. This is mainly used to modify the directionality of the session.
+This action enables a party to remove one or more content types from the session.
+This action enables a party to accept a description-modify action received from another party.
+This action enables a party to decline a description-modify action received from another party.
+This action enables a party to send description-level information / messages.
+This action enables a party to request a change to a content description format.
+This action enables a party to definitively accept a session negotiation. Implicitly this action also serves as a content-accept (which in turn serves as a description-accept and transport-accept).
+This action enables a party to request negotiation of a new Jingle session.
+This action enables a party to send session-level information / messages.
+This action enables a party to redirect an initiate request to another address.
+This action enables a party to end an existing session.
+This action enables a party to accept a transport-modify action received from another party.
+This action enables a party to decline a transport-modify action received from another party.
+This action enables a party to send transport-level information / messages.
+This action enables a party to request a change to the content transport methods.
+The Jingle-specific error conditions are as follows.
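Several constraints attached to these actions can be captured in a few small helpers. This is a minimal sketch that assumes actions are handled as plain strings, and the helper names are hypothetical. It covers three rules:

* The implicit-acceptance chain, taken from the descriptions above: session-accept serves as content-accept, which in turn serves as description-accept and transport-accept.
* The ban on content-add while the session is PENDING.
* The crossed-modify rule stated earlier in the document: the initiator's modify trumps the other party's, and the initiator SHOULD answer with &unexpected;, i.e. unexpected-request.

```python
from typing import Dict, Optional, Set

# Implicit acceptances stated in the action descriptions.
IMPLIES: Dict[str, Set[str]] = {
    "session-accept": {"content-accept"},
    "content-accept": {"description-accept", "transport-accept"},
}


def implied_acceptances(action: str) -> Set[str]:
    """Return every acceptance implicitly conveyed by `action` (transitively)."""
    result: Set[str] = set()
    stack = [action]
    while stack:
        for implied in IMPLIES.get(stack.pop(), set()):
            if implied not in result:
                result.add(implied)
                stack.append(implied)
    return result


def may_send(action: str, session_state: str) -> bool:
    """content-add MUST NOT be sent while the session is PENDING."""
    return not (action == "content-add" and session_state == "PENDING")


def on_crossed_modify(local_is_initiator: bool) -> Optional[str]:
    """Resolve modify messages sent by both parties at the same time.

    The initiator's modify wins: the initiator answers the other party's
    modify with an unexpected-request error. The other party gets None,
    meaning it yields and processes the initiator's modify instead.
    """
    return "unexpected-request" if local_is_initiator else None
```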