Moved codec recommendations to a separate specification; harmonized session flows with XEP-0166; modified flow for combined audio/video scenario to use content-modify with senders attribute set to none for media pause and set to both for media resumption; clarified handling of description-info message.
+ 0.27 2009-02-17
@@ -280,7 +286,7 @@
The &DESCRIPTION; element MAY possess a 'ssrc' attribute that specifies the 32-bit synchronization source for this media stream, as defined in RFC 3550.
After inclusion of one or more &PAYLOADTYPE; child elements, the &DESCRIPTION; element MAY also contain a <bandwidth/> element that specifies the allowable or preferred bandwidth for use by this application type. The 'type' attribute of the <bandwidth/> element SHOULD be a value for the SDP "bwtype" parameter as listed in the &ianasdp;. For RTP sessions, often the <bandwidth/> element will specify the "session bandwidth" as described in Section 6.2 of RFC 3550, measured in kilobits per second as described in Section 5.2 of RFC 4566.
The encodings SHOULD be provided in order of preference by placing the most-preferred payload type as the first &PAYLOADTYPE; child of the &DESCRIPTION; element and the least-preferred payload type as the last child.
-
The allowable attributes of the &PAYLOADTYPE; element are as follows:
+
The attributes of the &PAYLOADTYPE; element are as follows:
When the initiator sends a session-initiate stanza to the responder, the &DESCRIPTION; element includes all of the payload types that the initiator can send and/or receive for Jingle RTP, each one encapsulated in a separate &PAYLOADTYPE; element (the rules specified in &rfc3264; SHOULD be followed regarding inclusion of payload types).
+
When the initiator sends a session-initiate message to the responder, the &DESCRIPTION; element includes all of the payload types that the initiator can send and/or receive for Jingle RTP, each one encapsulated in a separate &PAYLOADTYPE; element (the rules specified in &rfc3264; SHOULD be followed regarding inclusion of payload types).
-
@@ -406,18 +413,18 @@ Initiator Responder
If there is no immediate error, the responder acknowledges the session initiation request.
]]>
-
Depending on user preferences or client configuration, a user agent controlled by a human user might need to wait for the user to affirm a desire to proceed with the session before continuing. When the user agent has received such affirmation (or if the user agent can automatically proceed for any reason, e.g. because no human intervention is expected or because a human user has configured the user agent to automatically accept sessions with a given entity), it returns a Jingle session-accept message. The session-accept SHOULD include a subset of the payload types sent by the initiator, i.e., a list of the offered payload types that the responder can send and/or receive. The list that the responder sends SHOULD retain the ID numbers specified by the initiator. The order of the &PAYLOADTYPE; elements indicates the responder's preferences, with the most-preferred type first.
-
In the following example, we imagine that the responder supports Speex at clockrate of 8000 but not 16000, G729, and PCMA but not PMCU. Therefore the responder returns only two payload types (since PMCA was not offered).
+
Depending on user preferences or client configuration, a user agent controlled by a human user might need to wait for the user to affirm a desire to proceed with the session before continuing. When the user agent has received such affirmation (or if the user agent can automatically proceed for any reason, e.g. because no human intervention is expected or because a human user has configured the user agent to automatically accept sessions with a given entity), it returns a Jingle session-accept message. The session-accept message SHOULD include a subset of the payload types sent by the initiator, i.e., a list of the offered payload types that the responder can send and/or receive. The list that the responder sends SHOULD retain the ID numbers specified by the initiator. The order of the &PAYLOADTYPE; elements indicates the responder's preferences, with the most-preferred type first.
+
In the following example, we imagine that the responder supports Speex at a clockrate of 8000 but not 16000, G729, and PCMA but not PCMU. Therefore the responder returns only two payload types (since PCMA was not offered).
-
-
+
]]>
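The selection rule described above (answer with a subset of the offer, keep the initiator's IDs, order by the responder's preference) can be sketched as follows. This is a minimal illustration, not part of the specification: the `PayloadType` class, the `answer_payload_types` helper, and the sample IDs and clockrates are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PayloadType:
    """Hypothetical stand-in for one offered payload-type element."""
    id: int
    name: str
    clockrate: Optional[int] = None


def answer_payload_types(
    offered: List[PayloadType],
    supported: List[Tuple[str, Optional[int]]],
) -> List[PayloadType]:
    """Return the offered payload types the responder supports.

    `supported` lists (name, clockrate) pairs, most preferred first, so the
    result is ordered by the responder's preference. Each entry is taken
    from the offer unchanged, which retains the initiator's ID numbers.
    """
    answer = []
    for name, clockrate in supported:
        for pt in offered:
            if pt.name == name and pt.clockrate == clockrate:
                answer.append(pt)
    return answer


# Illustrative offer: Speex at two clockrates, G729, and PCMU (no PCMA).
offered = [
    PayloadType(96, "speex", 16000),
    PayloadType(97, "speex", 8000),
    PayloadType(18, "G729"),
    PayloadType(0, "PCMU", 8000),
]
# The responder supports Speex at 8000, G729, and PCMA, in that order.
supported = [("speex", 8000), ("G729", None), ("PCMA", 8000)]

answer = answer_payload_types(offered, supported)
print([pt.id for pt in answer])
```

An empty result corresponds to the case where the responder supports none of the offered payload types; in this sketch the caller would be expected to handle that case separately rather than send an empty session-accept.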
-
And the initiator acknowledges session acceptance:
+
Note: If the responder supports none of the payload-types offered by the initiator, the responder SHOULD terminate the session and include a Jingle reason of <failed-application/>.
+
If the responder accepts the session, the initiator acknowledges the session-accept message:
]]>
-
The initiator and responder would then exchange media using any of the codecs that meet the following criteria:
+
The initiator and responder would then attempt to establish connectivity for the data channel. Once they do, they would exchange media using any of the codecs that meet the following criteria:
If the value of the 'senders' attribute is "initiator" then the initiator MAY use any codec that it can send and the responder can receive.
If the value of the 'senders' attribute is "responder" then the responder MAY use any codec that it can send and the initiator can receive.
@@ -461,11 +471,11 @@ Initiator Responder
The SDP media type for Jingle RTP is "audio" (see Section 8.2.1 of RFC 4566) for audio media, "video" (see Section 8.2.1 of RFC 4566) for video media, etc. The media type is reflected in the Jingle 'media' attribute.
The Jingle <bandwidth/> element SHALL be mapped to an SDP b= line; in particular, the value of the 'type' attribute shall be mapped to the SDP <bwtype> parameter and the XML character data of the Jingle <bandwidth/> element shall be mapped to the SDP <bandwidth> parameter.
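As a rough sketch of the bandwidth mapping just described, the helper below builds an SDP line of the form `b=<bwtype>:<bandwidth>` from the 'type' attribute and the character data of the Jingle element. The function name and the sample values are assumptions made for illustration only.

```python
def bandwidth_to_sdp(bw_type: str, value: str) -> str:
    """Map a Jingle bandwidth element to an SDP b= line.

    bw_type: value of the 'type' attribute (the SDP bwtype, e.g. "AS")
    value:   XML character data of the element (the SDP bandwidth)
    """
    # Whitespace around the character data is not significant here.
    return f"b={bw_type}:{value.strip()}"


print(bandwidth_to_sdp("AS", "128"))
```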
-
If the payload type is static (payload-type IDs 0 through 95 inclusive), it MUST be mapped to a media field defined in RFC 4566. The generic format for the media field is as follows:
+
If the payload type is static (payload-type IDs 0 through 95 inclusive), it MUST be mapped to an m= line as defined in RFC 4566. The generic format for this line is as follows:
]]>
-
In the context of Jingle audio sessions, the <media> parameter is "audio" or "video" or some other media type as specified by the 'media' attribute, the <port> parameter is the preferred port for such communications (which might be determined dynamically), and the <fmt list> parameter is the payload-type ID.
+
The SDP <media> parameter is "audio" or "video" or some other media type as specified by the Jingle 'media' attribute, the <port> parameter is the preferred port for such communications (which might be determined dynamically), and the <fmt list> parameter is the payload-type ID.
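The mapping of a static payload type to an m= line can be sketched as below. The helper name, the port number, and the default transport protocol of "RTP/AVP" are assumptions chosen for illustration; only the 0 through 95 range and the meaning of the parameters come from the text above.

```python
def static_payload_to_sdp(
    media: str, port: int, payload_id: int, proto: str = "RTP/AVP"
) -> str:
    """Build an SDP m= line for a static payload type.

    media:      value of the Jingle 'media' attribute ("audio", "video", ...)
    port:       preferred port (may be determined dynamically)
    payload_id: static payload-type ID, 0 through 95 inclusive
    proto:      SDP transport protocol (assumed to be RTP/AVP in this sketch)
    """
    if not 0 <= payload_id <= 95:
        raise ValueError("not a static payload-type ID: %d" % payload_id)
    return f"m={media} {port} {proto} {payload_id}"


# Hypothetical port 9999, static payload-type ID 0.
print(static_payload_to_sdp("audio", 9999, 0))
```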
For example, consider the following static payload-type:
The term "early media" refers to media that is exchanged before a responder has definitively accepted a session request generated by an initiator. Early media is typically used to send ringing tones and announcements, using either audio streams or Dual Tone Multi-Frequency (DTMF) events.
+
The term "early media" refers to media that is exchanged before a responder has definitively accepted a session request generated by an initiator or before end-to-end connectivity has been established (e.g., the media could be generated by an intermediate call manager or media relay). Early media is typically used to send ringing tones and announcements, using either audio streams or Dual Tone Multi-Frequency (DTMF) events.
In Jingle, the exchange of early media is established through use of the "content-add" action. In order to match the usage specified in &rfc3959; and &rfc3960;, when adding a content definition for early media the value of the &CONTENT; element's 'disposition' attribute MUST be "early-session" for mapping to a SIP Content-Disposition header value of "early-session". This enables endpoints or intermediate gateways to apply the application server model described in RFC 3960.
-
An entity that generates a content-add for early media SHOULD specify the same codecs for both session media and early media (however, it is possible that the entity that generates the early media does not generate the session media, for example in the case of an intermediate gateway or application server; in this case the entity MUST use one of the codecs advertised by the initiator).
-
Upon receiving a content-add action specifying the use of early media, the initiator's client SHOULD acknowledge the content-add, complete any required transport negotiation, and then send a content-accept (or content-reject) to the sender. When the responder subsequently sends a session-accept action, the acceptance MUST NOT be construed to include the content definition whose disposition is "early-session".
+
An entity that generates a content-add message for early media SHOULD specify the same codecs for both session media and early media (however, it is possible that the entity that generates the early media does not generate the session media, for example in the case of an intermediate gateway or application server; in this case the entity MUST use one of the codecs advertised by the initiator).
+
Upon receiving a content-add message specifying the use of early media, the initiator's client SHOULD acknowledge the content-add, complete any required transport negotiation, and then send a content-accept (or content-reject) to the sender. When the responder subsequently sends a session-accept message, the acceptance MUST NOT be construed to include the content definition whose disposition is "early-session".
In handling early media and deciding whether to generate local ringing or to play early media received from the responder or an intermediate gateway, the initiator's client SHOULD proceed as follows:
If no ringing notification is received via a session-info event containing a <ringing/> condition, do not generate local ringing.
When the responder receives a session-initiate action containing an <encryption/> element, the responder MUST either (1) accept the offer by denoting one of the <crypto/> elements as acceptable (it does this by mirroring that <crypto/> element in its session acceptance) or (2) reject the offer by sending a session-terminate action with a Jingle reason of <security-error/> and an RTP-specific condition of <invalid-crypto/>.
+
When the responder receives a session-initiate message containing an <encryption/> element, the responder MUST either (1) accept the offer by denoting one of the <crypto/> elements as acceptable (it does this by mirroring that <crypto/> element in its session acceptance) or (2) reject the offer by sending a session-terminate message with a Jingle reason of <security-error/> and an RTP-specific condition of <invalid-crypto/>.
-
-
+
]]>
-
If the responder requires encryption but the initiator did not include an <encryption/> element in its offer, the responder MUST reject the offer by sending a session-terminate action with a Jingle reason of <security-error/> and an RTP-specific condition of <crypto-required/>.
+
If the responder requires encryption but the initiator did not include an <encryption/> element in its offer, the responder MUST reject the offer by sending a session-terminate message with a Jingle reason of <security-error/> and an RTP-specific condition of <crypto-required/>.
If the initiator requires encryption but the responder does not include an <encryption/> element in its session acceptance, the initiator MUST terminate the session with a Jingle reason of <security-error/> and an RTP-specific condition of <crypto-required/>.
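The responder-side rules above can be summarized in a short sketch. It is a simplification under stated assumptions: each offered crypto element is represented only by a crypto-suite string, `None` stands for an offer with no encryption element, and the function name and return shape are hypothetical.

```python
from typing import Dict, List, Optional


def responder_decision(
    offered_suites: Optional[List[str]],
    acceptable_suites: List[str],
    requires_encryption: bool,
) -> Dict[str, Optional[str]]:
    """Decide how a responder reacts to the SRTP part of an offer."""
    if offered_suites is None:
        # No encryption element in the offer.
        if requires_encryption:
            return {"action": "session-terminate",
                    "reason": "security-error",
                    "condition": "crypto-required"}
        return {"action": "session-accept", "crypto": None}
    for suite in offered_suites:
        if suite in acceptable_suites:
            # Accept by mirroring the chosen crypto element.
            return {"action": "session-accept", "crypto": suite}
    # Keying material was offered but none of it is acceptable.
    return {"action": "session-terminate",
            "reason": "security-error",
            "condition": "invalid-crypto"}


suite = "AES_CM_128_HMAC_SHA1_80"
print(responder_decision([suite], [suite], requires_encryption=True))
print(responder_decision(None, [suite], requires_encryption=True))
```

The initiator-side check (terminating when it requires encryption but the session acceptance carries no encryption element) would be the mirror image of the first branch.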
Informational messages can be sent by either party within the context of Jingle to communicate the status of a Jingle RTP session, device, or principal. The informational message MUST be an IQ-set containing a &JINGLE; element of type "session-info", where the informational message is a payload element qualified by the 'urn:xmpp:jingle:apps:rtp:info:1' namespace; the following payload elements are defined: A <trying/> element (equivalent to the SIP 100 Trying response code) is not necessary, since each session-level action is acknowledged via XMPP IQ semantics.
+
Informational messages can be sent by either party within the context of Jingle to communicate the status of a Jingle RTP session, device, or principal. The informational message MUST be an IQ-set containing a &JINGLE; element of type "session-info", where the informational message is a payload element qualified by the 'urn:xmpp:jingle:apps:rtp:info:1' namespace; the following payload elements are defined: A <trying/> element (equivalent to the SIP 100 Trying response code) is not necessary, since each session-level message is acknowledged via XMPP IQ semantics.
The principal is temporarily pausing the chat (i.e., putting the other party on hold).
+
The principal is temporarily pausing the chat (i.e., putting the other party on hold). This message is purely informational; to ensure that no media will be exchanged, it is necessary to change the value of the 'senders' attribute to "none" via a content-modify message.
Before or during an RTP session, either party can share suggested application parameters with the other party by sending a Jingle stanza with an action of "description-info". The stanza shall contain only a &DESCRIPTION; element, which specifies suggested parameters for a given application type (e.g., a change to the height and width for display of a video stream). An example follows.
The description-info stanza SHOULD include only the suggested or modified information, not the complete set of application parameters (if those parameters have not changed). Furthermore, the data provided is purely advisory; the session SHOULD NOT fail if the receving party cannot adjust its parameters accordingly.
+
The description-info message SHOULD include only the modified codecs, not the complete set of codecs (if those codecs have not changed). Their order is NOT meaningful. Furthermore, the data provided is purely advisory; the session SHOULD NOT fail if the receiving party cannot adjust its parameters accordingly.
Now the responder immediately terminates the session.
-
Note: It might be wondered why the responder does not accept the session and then terminate. That order would be acceptable, too, but here we assume that the responder's client has immediate information about the responder's free/busy status (e.g., because the responder is on the phone) and therefore returns an automated busy signal without requiring user interaction.
-
-
-
-
-
-
-
- ]]>
-
- ]]>
-
-
-
-
In this scenario, Romeo initiates a voice chat with Juliet using a transport method of ICE-UDP. The parties also exchange informational messages.
Because the parties have chosen the Jingle ICE-UDP Transport Method, the initiator and responder exchange an open-ended number of possible candidate transports, perform connectivity checks, and agree upon a candidate transport as explained in XEP-0176. Once ICE negotiation is completed, the responder sends a session-accept action to the initiator.
- Now the responder immediately terminates the session.
+
Note: It might be wondered why the responder does not accept the session and then terminate. That order would be acceptable, too, but here we assume that the responder's client has immediate information about the responder's free/busy status (e.g., because the responder is on the phone) and therefore returns an automated busy signal without requiring user interaction.
+
-
+
+
+
+
+
+ ]]>
+
+ ]]>
+
+
+
+
In this scenario, Romeo initiates a voice chat with Juliet using a transport method of ICE-UDP. The parties also exchange informational messages.
As soon as possible, the responder's client sends a session-accept message to the initiator.
+
+
-
+
]]>
-
If the payload types and transport candidate can be successfully used by both parties, the initiator acknowledges the session-accept action.
+
The initiator acknowledges the session-accept message.
]]>
-
The parties now begin to exchange media. In this case they would use RTP to exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).
+
Once connectivity is established (which might necessitate the exchange of additional candidates via transport-info messages), the parties begin to exchange media. In this case they would use RTP to exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).
The parties can continue the session as long as desired.
Eventually, one of the parties terminates the session.
-
@@ -1047,7 +1061,7 @@ Romeo Juliet
The other party then acknowledges termination of the session:
The responder immediately acknowledges the session initiation request.
]]>
If the keying material is acceptable, the responder's client continues with the negotiation. If the keying material is not acceptable, the responder's client terminates the session as described under Negotiation of SRTP.
-
@@ -1164,17 +1178,17 @@ Romeo Juliet
]]>
]]>
-
Because the parties have chosen the Jingle ICE-UDP Transport Method, the initiator and responder exchange an open-ended number of possible candidate transports, perform connectivity checks, and agree upon a candidate transport as explained in XEP-0176. Once ICE negotiation is completed, the responder sends a session-accept action to the initiator.
+
As soon as possible, the responder's client sends a session-accept message to the initiator. In this case, the session-accept message includes a <crypto/> element to indicate that the responder finds the offered keying material acceptable.
-
-
+
]]>
-
If the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept action.
+
The initiator acknowledges the session-accept action.
]]>
-
The parties now begin to exchange media. In this case they would use SRTP to exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).
+
Once connectivity is established (which might necessitate the exchange of additional candidates via transport-info messages), the parties begin to exchange media. In this case they would use SRTP to exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).
The parties can continue the session as long as desired.
Eventually, one of the parties terminates the session.
-
@@ -1236,7 +1252,7 @@ Romeo Juliet
The other party then acknowledges termination of the session:
]]>
@@ -1287,10 +1303,10 @@ Romeo Gateway Juliet
The protocol flow is as follows, showing only the stanzas sent between Romeo and the gateway (acting on Juliet's behalf).
Now the gateway sends a content-add action to Romeo while waiting for Juliet to pay attention to her telephony interface.
+
Now the gateway sends a content-add message to Romeo while waiting for Juliet to pay attention to her telephony interface.
-
@@ -1355,7 +1371,9 @@ Romeo Gateway Juliet
-
+ Romeo then acknowledges the content-add action.
]]>
-
Because the gateway (on behalf of the responder) specified a transport method of Raw UDP for the early session data, the initiator then would send a Raw UDP candidate to the gateway (see XEP-0177 for details).
+
Because the gateway (on behalf of the responder) specified a transport method of Raw UDP for the early session data, the initiator then might also send a Raw UDP candidate to the gateway in a transport-info message (see XEP-0177 for details).
Eventually the initiator would send a content-accept to the gateway.
-
@@ -1399,7 +1417,7 @@ Romeo Gateway Juliet
The gateway then acknowledges the acceptance on behalf of Juliet.
]]>
@@ -1407,10 +1425,10 @@ Romeo Gateway Juliet
Eventually, the responder sends a session-accept.
-
]]>
-
The endpoints now begin to exchange session media; as a result, Romeo and the gateway terminate the exchange of early media.
+
Once end-to-end connectivity is established (which might necessitate the exchange of additional candidates via transport-info messages), the parties begin to exchange media; as a result, Romeo and the gateway terminate the exchange of early media (this does not necessitate exchange of a content-remove message, since the endpoint and the gateway can simply stop sending media).
The endpoints can continue the session as long as desired.
Eventually, one of the endpoints terminates the session.
-
@@ -1464,72 +1482,86 @@ Romeo Gateway Juliet
The other party then acknowledges termination of the session:
]]>
-
In this scenario, Romeo initiates a combined audio and video chat with Juliet using a transport method of ICE-UDP. Juliet at first refuses the video portion, then later offers to add video, which Romeo accepts. The parties also exchange various informational messages
-
The session flow is as follows:
+
In this scenario, Romeo initiates an audio chat with Juliet using a transport method of ICE-UDP. Romeo wants to add video but Juliet refuses; later she offers to add video, which Romeo accepts. The parties also exchange various informational messages.
+
The session flow is as follows (some of these messages are sent in parallel):
However, Juliet doesn't want to do video because she is having a bad hair day, so she sends a "content-remove" request to Romeo.
-
-
-
-
-
- ]]>
-
Romeo then acknowledges the content-remove request:
-
- ]]>
-
Because the parties have chosen the Jingle ICE-UDP Transport Method, the initiator and responder exchange an open-ended number of possible candidate transports, perform connectivity checks, and agree upon a candidate transport as explained in XEP-0176. Once ICE negotiation is completed, the responder sends a session-accept action to the initiator.
+
The responder sends a session-accept message to the initiator.
-
]]>
-
As above, if the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept action.
+
The initiator acknowledges the session-accept action.
]]>
-
The parties now begin to exchange media. In this case they would use RTP to exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).
-
The parties chat for a while. Eventually Juliet wants to get her hair in order so she puts Romeo on hold.
- Once end-to-end connectivity is established (which might necessitate the exchange of additional candidates via transport-info messages), the parties begin to exchange media. In this case they would use RTP to exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).
+
Romeo, being an amorous young man, requests to add video to the audio chat.
Juliet returns so she informs Romeo that she is actively engaged in the call again.
- To ensure that the session is truly paused, Juliet's client sends a content-modify message, setting the 'senders' attribute to "none".
+
-
+
+
+
+ ]]>
+
+ ]]>
+
After a few minutes, Juliet returns and informs Romeo that she is actively engaged in the call again.
The media session proceeds. Now they would exchange both audio and video, where the audio is exchanged via the Speex codec at a clockrate of 8000 and the video is exchanged using the Theora codec with a height of 600 pixels, a width of 800 pixels, and so on.
+
The media session proceeds. Now they would exchange both audio and video, where the audio is exchanged via the Speex codec at a clockrate of 8000 and the video is exchanged using the Theora codec with a height of 600 pixels, a width of 800 pixels, a bandwidth limit of 128,000 kilobits per second, etc.
The parties can continue the session as long as desired.
Other events might occur throughout the life of the session. For example, one of the parties might want to tweak the video parameters using a description-info action.
For the sake of interoperability with a wide variety of free and open-source voice systems as well as deployment of patent-free technologies, support for the Speex codec is RECOMMENDED.
-
-
-
For the sake of interoperability with the public switched telephone network (PSTN) and most VoIP providers, support for the Pulse Code Modulation (PCM) codec defined in &ITU; recommendation G.711 is RECOMMENDED, including both the μ-law ("U-law") version deployed in North America and in Japan, and the A-law version deployed in the rest of the world.
-
-
-
-
XMPP applications that use Jingle RTP sessions for voice chat MUST support and prefer native RTP methods of communicating DTMF information, in particular the "audio/telephone-event" and "audio/tone" media types. It is NOT RECOMMENDED to use the protocol described in &xep0181; for communicating DTMF information with RTP-aware endpoints.
-
-
-
When the Jingle RTP content type is accepted via a session-accept action, both initiator and responder SHOULD start listening for audio as defined by the negotiated transport method and audio application format. For interoperability with telephony systems, after the responder acknowledges the session initiation request, the responder SHOULD send a "ringing" message and both parties SHOULD play any audio received. For more detailed suggestions in the context of early media, see under Early Media.
-
+
+
XMPP applications that use Jingle RTP sessions for voice chat MUST support and prefer native RTP methods of communicating DTMF information, in particular the "audio/telephone-event" and "audio/tone" media types. It is NOT RECOMMENDED to use the protocol described in &xep0181; for communicating DTMF information with RTP-aware endpoints.
-
-
-
Support for the Theora codec is RECOMMENDED.
-
+
+
When the Jingle RTP content type is accepted via a session-accept action, both initiator and responder SHOULD start listening for audio as defined by the negotiated transport method and audio application format. For interoperability with telephony systems, after the responder acknowledges the session initiation request, the responder SHOULD send a "ringing" message and both parties SHOULD play any audio received. For more detailed suggestions in the context of early media, see under Early Media.