diff --git a/xep-0167.xml b/xep-0167.xml
index 708e389b..73746ff4 100644
--- a/xep-0167.xml
+++ b/xep-0167.xml
@@ -25,6 +25,12 @@ &stpeter; &seanegan; &robmcqueen;
+ + 0.12 + 2007-11-27 + psa +

Further editorial review.

+
0.11 2007-11-15 @@ -110,9 +116,11 @@

First draft.

+

&xep0166; can be used to initiate and negotiate a wide range of peer-to-peer sessions. One session type of interest is audio chat. This document specifies an application format for negotiating Jingle audio sessions, where the media is exchanged over the Realtime Transport Protocol (RTP; see &rfc3550;).

+

The Jingle application format defined herein is designed to meet the following requirements:

    @@ -121,6 +129,7 @@
  1. Define informational messages related to audio chat (e.g., ringing, on hold, on mute).
+

In accordance with Section 8 of XEP-0166, this document specifies the following information related to the Jingle Audio via RTP application type:

    @@ -137,6 +146,7 @@
+

A Jingle audio session is described by a content type that contains one application format and one transport method. The application format consists of one or more encodings contained within a wrapper <description/> element qualified by the 'http://www.xmpp.org/extensions/xep-0167.html#ns' namespace &NSNOTE;. In the language of RFC 4566 each encoding is a payload-type; therefore, each <payload-type/> element specifies an encoding that can be used for the audio stream, as illustrated in the following example.

Note: The parameter names are effectively guaranteed to be unique, since &IANA; maintains a registry of SDP parameters (see <http://www.iana.org/assignments/sdp-parameters>).
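The structure described above (one wrapper description element holding one payload-type child per encoding) can be sketched in Python. This is a minimal illustration, not part of the specification: the function name `build_description` and the attribute names `id`, `name`, and `clockrate` are assumptions made for the example; only the namespace string is taken from the text.

```python
import xml.etree.ElementTree as ET

# Namespace quoted in the text above.
NS_AUDIO = "http://www.xmpp.org/extensions/xep-0167.html#ns"


def build_description(payload_types):
    """Build a <description/> wrapper with one <payload-type/> per encoding.

    payload_types is a list of dicts with 'id' and 'name' keys and an
    optional 'clockrate' key (attribute names are assumed for this sketch).
    """
    desc = ET.Element(f"{{{NS_AUDIO}}}description")
    for pt in payload_types:
        attrs = {"id": str(pt["id"]), "name": pt["name"]}
        if "clockrate" in pt:
            attrs["clockrate"] = str(pt["clockrate"])
        ET.SubElement(desc, f"{{{NS_AUDIO}}}payload-type", attrs)
    return desc


if __name__ == "__main__":
    offer = build_description([
        {"id": 96, "name": "speex", "clockrate": 16000},
        {"id": 0, "name": "PCMU"},
    ])
    print(ET.tostring(offer, encoding="unicode"))
```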

+

When the initiator sends a session-initiate stanza to the responder, the &DESCRIPTION; element includes all of the payload types that the initiator can receive for Jingle audio (each one encapsulated in a separate &PAYLOADTYPE; element):

action='session-initiate' @@ -234,9 +245,9 @@ ]]>

Upon receiving the session-initiate stanza, the responder determines whether it can proceed with the negotiation. The general Jingle error cases are specified in XEP-0166 and illustrated in the Scenarios section of this document. In addition, the responder must determine if it supports any of the payload types advertised by the initiator; if it supports none of the offered payload types, it must reject the session by returning a &notacceptable; error with a Jingle-Audio-specific condition of <unsupported-codecs/>:
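The responder-side check described in this paragraph might look roughly like the following Python sketch. The helper name `check_offer`, the dict layout of a payload type, and the case-insensitive matching on codec name are all assumptions for illustration; the two condition names come from the paragraph itself.

```python
def check_offer(offered, supported_names):
    """Decide whether any offered payload type is usable by the responder.

    Returns (usable, error). error is None on success; otherwise it is a
    (stanza_error, jingle_condition) pair naming the conditions in the text.
    Matching on lower-cased codec name is an assumption of this sketch.
    """
    supported = {name.lower() for name in supported_names}
    usable = [pt for pt in offered if pt["name"].lower() in supported]
    if not usable:
        # None of the offered payload types is supported: reject the session.
        return [], ("not-acceptable", "unsupported-codecs")
    return usable, None


if __name__ == "__main__":
    offer = [{"id": 96, "name": "speex"}, {"id": 0, "name": "PCMU"}]
    print(check_offer(offer, ["PCMU"]))
    print(check_offer(offer, ["G729"]))
```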

@@ -244,18 +255,18 @@ ]]> -

If there is no error, the responder acknowledges the session-initiation request.

+

If there is no error, the responder acknowledges the session initiation request.

]]>

The responder then should send a list of the payload types that it can receive via a Jingle "content-accept" (or "session-accept") action. The list that the responder sends MAY include any payload types (it need not be a subset of the payload types sent by the initiator) but SHOULD retain the ID numbers specified by the initiator. The order of the &PAYLOADTYPE; elements indicates the responder's preferences, with the most-preferred types first.
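One way to build the responder's list is sketched below in Python, under stated assumptions: the function name `build_answer` and the dict layout are hypothetical, codecs are matched by lower-cased name, and possible ID collisions between reused and responder-chosen IDs are ignored for brevity.

```python
def build_answer(offered, preferred):
    """Return the responder's payload types, most-preferred first.

    offered:   payload types from the initiator, each {'id', 'name', ...}
    preferred: the responder's own payload types, already sorted by
               preference, each with a fallback 'id' of its own choosing.
    Where a codec also appears in the offer, the initiator's ID is retained.
    """
    offered_ids = {pt["name"].lower(): pt["id"] for pt in offered}
    answer = []
    for pt in preferred:
        reused_id = offered_ids.get(pt["name"].lower(), pt["id"])
        answer.append({**pt, "id": reused_id})
    return answer


if __name__ == "__main__":
    offer = [{"id": 97, "name": "speex"}, {"id": 0, "name": "PCMU"}]
    mine = [{"id": 96, "name": "speex"}, {"id": 98, "name": "x-ISAC"}]
    print(build_answer(offer, mine))
```

The output order is the preference order of the second argument, which is what a receiver would read as the responder's priorities.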

action='content-accept' @@ -270,7 +281,7 @@ - + @@ -278,16 +289,16 @@ ]]>

The initiator acknowledges the 'content-accept' with an empty IQ result:

]]>

After successful transport negotiation (not shown here), the responder then accepts the session:

And the initiator acknowledges session acceptance:

]]>

Note: Because a "session-accept" action implicitly indicates acceptance of the application format (i.e., "content-accept"), it is not necessary to send a separate "content-accept" action. This flow is shown for completeness only.

- -

The following sections show a number of Jingle audio scenarios, in relative order of complexity.

- -

In this scenario, Romeo initiates a voice chat with Juliet but she is otherwise engaged.

-

The session flow is as follows:

- | - | error | - | (recipient-unavailable) | - |<----------------------------| - ]]> -

The protocol flow is as follows.

- - - - - - - - - - - - - - - ]]> - - - - - - - ]]> -
- -

In this scenario, Romeo initiates a voice chat with Juliet using a transport method of ICE-UDP. The parties also exchange informational messages.

-

The session flow is as follows:

- | - | ack | - |<----------------------------| - | session-info (ringing) | - |<----------------------------| - | ack | - |---------------------------->| - | transport-info (X times) | - | (with acks) | - |<--------------------------->| - | session-accept | - |<----------------------------| - | ack | - |---------------------------->| - | AUDIO (RTP) | - |<===========================>| - | session-terminate | - |<----------------------------| - | ack | - |---------------------------->| - | | - ]]> -

The protocol flow is as follows.

- - - - - - - - - - - - - - - ]]> - - ]]> - - - - - - ]]> - - ]]> - - - - - - - - - - ]]> - - - - - - - - - - ]]> - - - - - - - - - - ]]> -

For each candidate received, the other party acknowledges receipt or returns an error:

- - - - - - ]]> -

At the same time (i.e., immediately after acknowledging the session-initiation request, not waiting for the initiator to begin or finish sending candidates), the responder also begins sending candidates that may work for it. As above, the initiator acknowledges receipt of the candidates.

-

As the initiator and responder receive candidates, they probe the various candidate transports for connectivity. In performing these connectivity checks, the parties follow the procedure specified in Section 7 of draft-ietf-mmusic-ice.

-

If one of the candidate transports is found to work, the responder accepts the session.

- - - - - - - - - - - - - - - - - - - ]]> -

If the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept.

- - ]]> -

The parties now begin to exchange media. In this case they would exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).

-

The parties may continue the session as long as desired.

-

Eventually, one of the parties terminates the session.

- - - - ]]> -

The other party then acknowledges termination of the session:

- - ]]> -
- -

In this scenario, Romeo initiates a combined audio and video chat with Juliet using a transport method of ICE-UDP. Juliet at first refuses the video portion, then later offers to add video, which Romeo accepts. The parties also exchange various informational messages.

-

The session flow is as follows:

- | - | ack | - |<----------------------------| - | session-info (ringing) | - |<----------------------------| - | ack | - |---------------------------->| - | content-remove | - |<----------------------------| - | ack | - |---------------------------->| - | content-accept | - |---------------------------->| - | ack | - |<----------------------------| - | transport-info (X times) | - | (with acks) | - |<--------------------------->| - | session-accept | - |<----------------------------| - | ack | - |---------------------------->| - | AUDIO (RTP) | - |<===========================>| - | session-info (hold) | - |<----------------------------| - | ack | - |---------------------------->| - | session-info (active) | - |<----------------------------| - | ack | - |---------------------------->| - | content-add | - |<----------------------------| - | ack | - |---------------------------->| - | content-accept | - |---------------------------->| - | ack | - |<----------------------------| - | AUDIO + VIDEO (RTP) | - |<===========================>| - | session-terminate | - |<----------------------------| - | ack | - |---------------------------->| - | | - ]]> -

The protocol flow is as follows.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - ]]> - - ]]> - - - - - - ]]> - - ]]> -

However, Juliet doesn't want to do video because she is having a bad hair day, so she sends a "content-remove" request to Romeo.

- - - - - - ]]> -

Romeo then acknowledges the content-remove request and, if it is acceptable, returns a content-accept:

- - ]]> - - - - - - ]]> -

The other party then acknowledges the acceptance.

- - ]]> -

As in the previous scenario, the parties exchange ICE candidates (see above for examples).

-

Once the parties find candidate transports that work, the responder accepts the session.

- - - - - - - - - - - - - - - - - - - ]]> -

As above, if the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept.

- - ]]> -

The parties now begin to exchange media. In this case they would exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).

-

The parties chat for a while. Eventually Juliet wants to get her hair in order so she puts Romeo on hold.

- - - - - - ]]> - - ]]> -

Juliet returns so she informs Romeo that she is actively engaged in the call again.

- - - - - - ]]> - - ]]> -

The parties now continue the audio chat.

-

Finally Juliet decides that she is presentable for a video chat so she sends a content-add request to Romeo.

- - - - - - - - - - - - - - - - - ]]> -

The entity receiving the content-add request then acknowledges the request and, if it is acceptable, returns a content-accept:

- - ]]> - - - - - - - - - - - - - - - - - ]]> -

The other party then acknowledges the acceptance.

- - ]]> -

The media session proceeds. Now they would exchange both audio and video, where the audio is exchanged using the Speex codec at a clockrate of 8000 and the video is exchanged using the Theora codec with a height of 720 pixels, a width of 1280 pixels, and so on.

-

The parties may continue the session as long as desired.

-

Eventually, one of the parties terminates the session.

- - - - ]]> - - ]]> -
- -

In this scenario, Romeo initiates a voice chat with Juliet using a transport method of ICE-UDP and an unencrypted profile of "RTP/AVP", but Juliet wants to chat securely so she requests the use of a secure transport as specified in &sdpdtls; (via a profile of "UDP/TLS/RTP/AVP").

-

The session flow is as follows:

- | - | ack | - |<----------------------------| - | session-info (ringing) | - |<----------------------------| - | ack | - |---------------------------->| - | content-modify | - |<----------------------------| - | ack | - |---------------------------->| - | content-accept | - |---------------------------->| - | ack | - |<----------------------------| - | transport-info (X times) | - | (with acks) | - |<--------------------------->| - | session-accept | - |<----------------------------| - | ack | - |---------------------------->| - | AUDIO (RTP) | - |<===========================>| - | session-terminate | - |<----------------------------| - | ack | - |---------------------------->| - | | - ]]> -

The protocol flow is as follows.

- - - - - - - - - - - - - - - ]]> - - ]]> - - - - - - ]]> - - ]]> -

However, Juliet wants to make sure that the communications are encrypted, so she sends a "content-modify" request to Romeo.

- - - - - - ]]> -

Romeo then acknowledges the content-modify request and, if it is acceptable, returns a content-accept:

- - ]]> - - - - - - ]]> -

The other party then acknowledges the acceptance.

- - ]]> -

As in the previous scenario, the parties exchange ICE candidates (see above for examples).

-

If one of the candidate transports is found to work, the responder accepts the session.

- - - - - - - - - - - - - - - - - - - ]]> -

If the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept.

- - ]]> -

The parties now begin to exchange media. In this case they would exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).

-

The parties may continue the session as long as desired.

-

Eventually, one of the parties terminates the session.

- - - - ]]> -

The other party then acknowledges termination of the session:

- - ]]> -
-

If the payload type is static (payload-type IDs 0 through 95 inclusive), it MUST be mapped to a media field defined in RFC 4566. The generic format for the media field is as follows:

@@ -1127,6 +352,7 @@ m= ]]> +

That Jingle-formatted information would be mapped to SDP as follows:

@@ -1135,6 +361,7 @@ m=audio 9999 RTP/AVP 13 ]]> +
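The static mapping can be sketched as a small Python helper that formats the SDP media field. The function name `static_media_line` and its defaults are assumptions for illustration; the 0 through 95 range check follows the definition of static payload types given above, and the sample values reproduce the media line shown in this section.

```python
def static_media_line(port, payload_ids, media="audio", profile="RTP/AVP"):
    """Format an SDP media field for static payload types.

    Raises ValueError if any ID falls outside the static range 0..95.
    """
    for pt_id in payload_ids:
        if not 0 <= pt_id <= 95:
            raise ValueError(f"payload-type {pt_id} is not static")
    fmt_list = " ".join(str(pt_id) for pt_id in payload_ids)
    return f"m={media} {port} {profile} {fmt_list}"


if __name__ == "__main__":
    print(static_media_line(9999, [13]))  # m=audio 9999 RTP/AVP 13
```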

That Jingle-formatted information would be mapped to SDP as follows:

]]> +

That Jingle-formatted information would be mapped to SDP as follows:

+
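For dynamic payload types, SDP as defined in RFC 4566 carries the codec name and clock rate in an `a=rtpmap` attribute and codec-specific parameters in an `a=fmtp` attribute. The Python sketch below formats both lines; the function name `dynamic_attribute_lines` and the parameter dict are assumptions for illustration, and joining parameters with semicolons follows the `a=fmtp:96 vbr=on;cng=on` line that appears in this document.

```python
def dynamic_attribute_lines(pt_id, name, clockrate, params=None):
    """Format the SDP attribute lines for one dynamic payload type.

    params is an optional dict of codec-specific parameters; insertion
    order is preserved in the resulting a=fmtp line.
    """
    lines = [f"a=rtpmap:{pt_id} {name}/{clockrate}"]
    if params:
        joined = ";".join(f"{key}={value}" for key, value in params.items())
        lines.append(f"a=fmtp:{pt_id} {joined}")
    return lines


if __name__ == "__main__":
    for line in dynamic_attribute_lines(96, "speex", 16000,
                                        {"vbr": "on", "cng": "on"}):
        print(line)
```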

Informational messages may be sent by either party within the context of Jingle to communicate the status of a Jingle audio session, device, or principal. The informational message MUST be an IQ-set containing a &JINGLE; element of type "session-info", where the informational message is a payload element qualified by the 'http://www.xmpp.org/extensions/xep-0167.html#ns-info' namespace; the following payload elements are defined: A <trying/> element (equivalent to the SIP 100 Trying response code) is not necessary, since each session-level action is acknowledged via XMPP IQ semantics.
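A rough Python sketch of such an informational message follows. Only the `#ns-info` namespace and the "session-info" value come from the text; the Jingle namespace string, the `sid` attribute, the addresses, and the function name `build_info_message` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed Jingle namespace for this sketch; check XEP-0166 for the real value.
NS_JINGLE = "http://www.xmpp.org/extensions/xep-0166.html#ns"
# Namespace for informational payloads, quoted in the text above.
NS_INFO = "http://www.xmpp.org/extensions/xep-0167.html#ns-info"


def build_info_message(to, iq_id, sid, payload):
    """Build an IQ-set carrying a session-info action with one payload element."""
    iq = ET.Element("iq", {"type": "set", "to": to, "id": iq_id})
    jingle = ET.SubElement(
        iq,
        f"{{{NS_JINGLE}}}jingle",
        {"action": "session-info", "sid": sid},  # 'sid' is an assumed attribute
    )
    ET.SubElement(jingle, f"{{{NS_INFO}}}{payload}")
    return iq


if __name__ == "__main__":
    stanza = build_info_message("romeo@montague.example/orchard",
                                "ringing1", "a73sjjvkla37jfea", "ringing")
    print(ET.tostring(stanza, encoding="unicode"))
```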

@@ -1275,7 +504,800 @@ a=fmtp:96 vbr=on;cng=on ]]> -

Naturally, support may also be discovered by the dynamic, presence-based profile of service discovery defined in &xep0115;.

+

Naturally, support MAY also be determined via the dynamic, presence-based profile of Service Discovery defined in &xep0115;.

+
+ + +

The following sections show a number of Jingle audio scenarios, in relative order of complexity.

+ +

In this scenario, Romeo initiates a voice chat with Juliet but she is otherwise engaged.

+

The session flow is as follows:

+ | + | error | + | (recipient-unavailable) | + |<----------------------------| + ]]> +

The protocol flow is as follows.

+ + + + + + + + + + + + + + + ]]> + + + + + + + ]]> +
+ +

In this scenario, Romeo initiates a voice chat with Juliet using a transport method of ICE-UDP. The parties also exchange informational messages.

+

The session flow is as follows:

+ | + | ack | + |<----------------------------| + | session-info (ringing) | + |<----------------------------| + | ack | + |---------------------------->| + | transport-info (X times) | + | (with acks) | + |<--------------------------->| + | session-accept | + |<----------------------------| + | ack | + |---------------------------->| + | AUDIO (RTP) | + |<===========================>| + | session-terminate | + |<----------------------------| + | ack | + |---------------------------->| + | | + ]]> +

The protocol flow is as follows.

+ + + + + + + + + + + + + + + ]]> + + ]]> + + + + + + ]]> + + ]]> + + + + + + + + + + ]]> + + + + + + + + + + ]]> + + + + + + + + + + ]]> +

For each candidate received, the other party acknowledges receipt or returns an error:

+ + + + + + ]]> +

At the same time (i.e., immediately after acknowledging the session-initiation request, not waiting for the initiator to begin or finish sending candidates), the responder also begins sending candidates that may work for it. As above, the initiator acknowledges receipt of the candidates.

+

As the initiator and responder receive candidates, they probe the various candidate transports for connectivity. In performing these connectivity checks, the parties follow the procedure specified in Section 7 of draft-ietf-mmusic-ice.

+

If one of the candidate transports is found to work, the responder accepts the session.

+ + + + + + + + + + + + + + + + + + + ]]> +

If the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept action.

+ + ]]> +

The parties now begin to exchange media. In this case they would exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).
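Reading the responder's preference from XML order can be sketched in Python as below. The function name `preferred_payload`, the attribute names, and the sample description are assumptions for illustration; the rule that the first payload-type child is the highest-priority codec comes from the paragraph above.

```python
import xml.etree.ElementTree as ET

NS_AUDIO = "http://www.xmpp.org/extensions/xep-0167.html#ns"


def preferred_payload(description_xml):
    """Return (name, clockrate) of the first payload-type child, or None."""
    desc = ET.fromstring(description_xml)
    # find() returns the first matching child in document order.
    first = desc.find(f"{{{NS_AUDIO}}}payload-type")
    if first is None:
        return None
    return first.get("name"), first.get("clockrate")


if __name__ == "__main__":
    accepted = (
        f'<description xmlns="{NS_AUDIO}">'
        '<payload-type id="97" name="speex" clockrate="8000"/>'
        '<payload-type id="0" name="PCMU"/>'
        "</description>"
    )
    print(preferred_payload(accepted))  # ('speex', '8000')
```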

+

The parties may continue the session as long as desired.

+

Eventually, one of the parties terminates the session.

+ + + + ]]> +

The other party then acknowledges termination of the session:

+ + ]]> +
+ +

In this scenario, Romeo initiates a combined audio and video chat with Juliet using a transport method of ICE-UDP. Juliet at first refuses the video portion, then later offers to add video, which Romeo accepts. The parties also exchange various informational messages.

+

The session flow is as follows:

+ | + | ack | + |<----------------------------| + | session-info (ringing) | + |<----------------------------| + | ack | + |---------------------------->| + | content-remove | + |<----------------------------| + | ack | + |---------------------------->| + | content-accept | + |---------------------------->| + | ack | + |<----------------------------| + | transport-info (X times) | + | (with acks) | + |<--------------------------->| + | session-accept | + |<----------------------------| + | ack | + |---------------------------->| + | AUDIO (RTP) | + |<===========================>| + | session-info (hold) | + |<----------------------------| + | ack | + |---------------------------->| + | session-info (active) | + |<----------------------------| + | ack | + |---------------------------->| + | content-add | + |<----------------------------| + | ack | + |---------------------------->| + | content-accept | + |---------------------------->| + | ack | + |<----------------------------| + | AUDIO + VIDEO (RTP) | + |<===========================>| + | session-terminate | + |<----------------------------| + | ack | + |---------------------------->| + | | + ]]> +

The protocol flow is as follows.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + ]]> + + ]]> + + + + + + ]]> + + ]]> +

However, Juliet doesn't want to do video because she is having a bad hair day, so she sends a "content-remove" request to Romeo.

+ + + + + + ]]> +

Romeo then acknowledges the content-remove request and, if it is acceptable, returns a content-accept:

+ + ]]> + + + + + + ]]> +

The other party then acknowledges the acceptance.

+ + ]]> +

As in the previous scenario, the parties exchange ICE candidates (see above for examples).

+

Once the parties find candidate transports that work, the responder accepts the session.

+ + + + + + + + + + + + + + + + + + + ]]> +

As above, if the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept action.

+ + ]]> +

The parties now begin to exchange media. In this case they would exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).

+

The parties chat for a while. Eventually Juliet wants to get her hair in order so she puts Romeo on hold.

+ + + + + + ]]> + + ]]> +

Juliet returns so she informs Romeo that she is actively engaged in the call again.

+ + + + + + ]]> + + ]]> +

The parties now continue the audio chat.

+

Finally Juliet decides that she is presentable for a video chat so she sends a content-add request to Romeo.

+ + + + + + + + + + + + + + + + + ]]> +

The entity receiving the content-add request then acknowledges the request and, if it is acceptable, returns a content-accept:

+ + ]]> + + + + + + + + + + + + + + + + + ]]> +

The other party then acknowledges the acceptance.

+ + ]]> +

The media session proceeds. Now they would exchange both audio and video, where the audio is exchanged using the Speex codec at a clockrate of 8000 and the video is exchanged using the Theora codec with a height of 720 pixels, a width of 1280 pixels, and so on.

+

The parties may continue the session as long as desired.

+

Eventually, one of the parties terminates the session.

+ + + + ]]> + + ]]> +
+ +

In this scenario, Romeo initiates a voice chat with Juliet using a transport method of ICE-UDP and an unencrypted profile of "RTP/AVP", but Juliet wants to chat securely so she requests the use of a secure transport as specified in &sdpdtls; (via a profile of "UDP/TLS/RTP/AVP").

+

The session flow is as follows:

+ | + | ack | + |<----------------------------| + | session-info (ringing) | + |<----------------------------| + | ack | + |---------------------------->| + | content-modify | + |<----------------------------| + | ack | + |---------------------------->| + | content-accept | + |---------------------------->| + | ack | + |<----------------------------| + | transport-info (X times) | + | (with acks) | + |<--------------------------->| + | session-accept | + |<----------------------------| + | ack | + |---------------------------->| + | AUDIO (RTP) | + |<===========================>| + | session-terminate | + |<----------------------------| + | ack | + |---------------------------->| + | | + ]]> +

The protocol flow is as follows.

+ + + + + + + + + + + + + + + ]]> + + ]]> + + + + + + ]]> + + ]]> +

However, Juliet wants to make sure that the communications are encrypted, so she sends a "content-modify" request to Romeo.

+ + + + + + ]]> +

Romeo then acknowledges the content-modify request and, if it is acceptable, returns a content-accept:

+ + ]]> + + + + + + ]]> +

The other party then acknowledges the acceptance.

+ + ]]> +

As in the previous scenario, the parties exchange ICE candidates (see above for examples).

+

If one of the candidate transports is found to work, the responder accepts the session.

+ + + + + + + + + + + + + + + + + + + ]]> +

If the payload types and transport candidate can be successfully used by both parties, then the initiator acknowledges the session-accept action.

+ + ]]> +

The parties now begin to exchange media. In this case they would exchange audio using the Speex codec at a clockrate of 8000 since that is the highest-priority codec for the responder (as determined by the XML order of the &PAYLOADTYPE; children).

+

The parties may continue the session as long as desired.

+

Eventually, one of the parties terminates the session.

+ + + + ]]> +

The other party then acknowledges termination of the session:

+ + ]]> +
@@ -1286,7 +1308,7 @@ a=fmtp:96 vbr=on;cng=on

If it is necessary to send Dual Tone Multi-Frequency (DTMF) tones, it is REQUIRED to use the XML format specified in &xep0181;.

-

When the Jingle Audio content type is accepted via a "content-accept" action, both initiator and responder SHOULD start listening for audio as defined by the negotiated transport method and audio application format. For interoperability with telephony systems, after the responder acknowledges the session-initiate request, the responder SHOULD send a "ringing" message and both parties SHOULD play any audio received.

+

When the Jingle Audio content type is accepted via a "content-accept" action, both initiator and responder SHOULD start listening for audio as defined by the negotiated transport method and audio application format. For interoperability with telephony systems, after the responder acknowledges the session initiation request, the responder SHOULD send a "ringing" message and both parties SHOULD play any audio received.