Jingle Audio via RTP

This document defines methods for negotiating Jingle audio sessions that use the Real-time Transport Protocol (RTP) for media exchange.

Number: 0167
Status: Experimental
Type: Standards Track
SIG: Standards
Approver: Council
Dependencies: XMPP Core, XEP-0166
Short name: TO BE ASSIGNED
Authors: Scott Ludwig, Peter Saint-Andre, Sean Egan, Robert McQueen

Revision history

0.9 2007-04-17 psa

Specified Jingle conformance, including the preference for lossy transports over reliable transports and the process of sending and receiving audio content over each transport type.

0.8 2007-03-23 psa/ram

Renamed to mention RTP as the associated transport; corrected negotiation flow to be consistent with SIP/SDP (each party specifies a list of the payload types it can receive); added profile attribute to content element in order to specify RTP profile in use.

0.7 2006-12-21 psa

Modified spec to use provisional namespace before advancement to Draft (per XEP-0053).

0.6 2006-10-31 psa/se

Specified how to include SDP parameters and codec-specific parameters; clarified negotiation process; added Speex examples; removed queued info message.

0.5 2006-08-23 psa

Modified namespace to track XEP-0166.

0.4 2006-07-12 se/psa

Specified when to play received audio (early media); specified that DTMF must use in-band signalling (XEP-0181).

0.3 2006-03-20 psa

Defined info messages for hold and mute.

0.2 2006-02-13 psa

Defined info message for busy; added info message examples; recommended use of Speex; updated schema and XMPP Registrar considerations.

0.1 2005-12-15 psa

Initial version.

0.0.3 2005-12-05 psa

Described service discovery usage; defined initial informational messages.

0.0.2 2005-10-27 psa

Added SDP mapping, security considerations, IANA considerations, XMPP Registrar considerations, and XML schema.

0.0.1 2005-10-21 psa/sl

First draft.

Jingle (XEP-0166) can be used to initiate and negotiate a wide range of peer-to-peer sessions. One session type of interest is audio chat. This document specifies a format for negotiating Jingle audio sessions that use the Real-time Transport Protocol (RTP; see RFC 3550) for media exchange.

The Jingle content description format defined herein is designed to meet the following requirements:

  1. Enable negotiation of the parameters necessary for audio chat over RTP.
  2. Map these parameters to the Session Description Protocol (SDP; see RFC 4566) to enable interoperability.
  3. Define informational messages related to audio chat (e.g., busy and ringing).

In accordance with Section 8 of XEP-0166, this document specifies the following information related to the Jingle Audio via RTP application type:

  1. The content negotiation process is defined in the Negotiating a Jingle Audio Session section of this document.

  2. The semantics of the <description/> element are defined in the Content Description Format section of this document.

  3. A mapping of Jingle semantics to the Session Description Protocol is provided in the Mapping to Session Description Protocol section of this document.

  4. A Jingle audio session SHOULD use a lossy transport method such as the raw UDP method of XEP-0177 or the "ice-udp" method specified in XEP-0176, but MAY use a reliable transport such as "ice-tcp".

  5. Content is to be sent and received as follows:

    • For lossy transports, outbound audio content shall be encoded into RTP packets and each packet shall be sent individually over the transport. Each inbound packet received over the transport is an RTP packet.

    • For reliable transports, outbound audio content shall be encoded into RTP packets and the packets shall be sent in succession over the transport. Incoming data received over the transport shall be processed as a stream of RTP packets, where the end of each RTP packet marks the start of the next.
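The two sending rules above can be sketched in Python. The sketch builds the 12-byte RTP fixed header from RFC 3550 and uses no padding, header extensions, or CSRC entries. RTP headers carry no length field, and this document does not say how a receiver on a reliable transport finds packet boundaries. The stream helpers therefore assume the 2-byte length prefix of RFC 4571. That framing is an assumption and is not part of this specification. All function names are hypothetical.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a minimal RTP fixed header (RFC 3550, Section 5.1) to payload."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    first = RTP_VERSION << 6                             # V=2, P=0, X=0, CC=0
    second = (0x80 if marker else 0x00) | payload_type   # M bit + PT
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


# Lossy transport: each packet is sent on its own (e.g., one UDP datagram each).

# Reliable transport. ASSUMPTION: RFC 4571-style 16-bit length prefix.
def frame_for_stream(packet):
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length prefix")
    return struct.pack("!H", len(packet)) + packet


def split_stream(buffer):
    """Return (complete RTP packets, leftover bytes to keep for the next read)."""
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        start = offset + 2
        if len(buffer) - start < length:
            break  # incomplete packet: wait for more data
        packets.append(buffer[start:start + length])
        offset = start + length
    return packets, buffer[offset:]
```

On a datagram socket, each `build_rtp_packet(...)` result would be passed to its own `sock.sendto(...)` call. On a stream socket, successive `frame_for_stream(...)` results are written back to back, and the receiver passes whatever it reads to `split_stream`, along with any leftover bytes from the previous read.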

A Jingle audio session is described by one or more encodings contained within a wrapper <description/> element. In the terminology of RFC 4566 these encodings are payload types, so each <payload-type/> element specifies an encoding that can be used for the audio stream. In Jingle Audio, these encodings are used in the context of RTP. The most common encodings for the Audio/Video Profile (AVP) of RTP are listed in RFC 3551; these "static" types use payload IDs 0 through 95. Other encodings are also allowed; these "dynamic" types use payload IDs 96 through 127 and are assigned in accordance with the dynamic assignment rules described in Section 3 of RFC 3551.
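The ID ranges above determine how a payload type is interpreted. The following Python sketch classifies an ID and lists a few static audio types. The table is a small subset of the static assignments in RFC 3551, chosen for illustration, and the names `payload_kind` and `STATIC_AUDIO_TYPES` are hypothetical.

```python
# A few static audio payload types from RFC 3551: id -> (name, clockrate, channels).
STATIC_AUDIO_TYPES = {
    0: ("PCMU", 8000, 1),
    3: ("GSM", 8000, 1),
    8: ("PCMA", 8000, 1),
    9: ("G722", 8000, 1),  # RTP clock rate is 8000 even though G.722 samples at 16 kHz
    18: ("G729", 8000, 1),
}


def payload_kind(pt_id):
    """Classify an RTP payload ID using the ranges given in this document."""
    if 0 <= pt_id <= 95:
        return "static"
    if 96 <= pt_id <= 127:
        return "dynamic"
    raise ValueError(f"{pt_id} is outside the 7-bit RTP payload type range")
```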

The allowable attributes are as follows:

Attribute | Description | Datatype | Inclusion
channels | The number of audio channels; if omitted, it MUST be assumed to be one channel | positiveInteger (defaults to 1) | RECOMMENDED
clockrate | The sampling frequency in hertz | positiveInteger | RECOMMENDED
id | The payload identifier | unsignedByte (0 through 127) | REQUIRED
maxptime | Maximum packet time as specified in RFC 4566 | positiveInteger | OPTIONAL
name | The appropriate subtype of the audio MIME type | string | RECOMMENDED for static payload types, REQUIRED for dynamic payload types
ptime | Packet time as specified in RFC 4566 | positiveInteger | OPTIONAL

The encodings SHOULD be provided in order of preference.
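The attribute rules and the preference ordering can be sketched with Python's ElementTree. This is an illustration under stated assumptions, not a reference implementation. The function name is hypothetical, and the namespace is the provisional one named elsewhere in this document. For simpler serialization, `xmlns` is written as a literal attribute.

```python
import xml.etree.ElementTree as ET

# Provisional namespace named in this document.
AUDIO_NS = "http://www.xmpp.org/extensions/xep-0167.html#ns"
OPTIONAL_ATTRS = ("name", "clockrate", "channels", "ptime", "maxptime")


def build_description(payload_types):
    """Build <description/> from dicts listed most-preferred first."""
    desc = ET.Element("description", {"xmlns": AUDIO_NS})
    for pt in payload_types:
        pt_id = int(pt["id"])  # 'id' is REQUIRED
        if not 0 <= pt_id <= 127:
            raise ValueError(f"invalid payload id {pt_id}")
        if pt_id >= 96 and "name" not in pt:
            raise ValueError("'name' is REQUIRED for dynamic payload types")
        attrs = {"id": str(pt_id)}
        for key in OPTIONAL_ATTRS:
            if key in pt:
                attrs[key] = str(pt[key])
        # Child order mirrors the preference order of the input list.
        ET.SubElement(desc, "payload-type", attrs)
    return desc


desc = build_description([{"id": 96, "name": "speex", "clockrate": 16000},
                          {"id": 0, "name": "PCMU", "clockrate": 8000}])
print(ET.tostring(desc, encoding="unicode"))
```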


The <description/> element is intended to be a child of a <jingle/> element as specified in XEP-0166.

Each <payload-type/> element MAY contain one or more child elements that specify particular parameters related to the payload. For example, as described in the RTP payload format for Speex, the "cng", "mode", and "vbr" parameters may be specified in relation to usage of the Speex codec (see <http://www.speex.org/>). Where such parameters are encoded via the "fmtp" SDP attribute, they shall be represented in Jingle as <parameter/> child elements of the <payload-type/> element.


Note: The parameter names are effectively guaranteed to be unique, since the Internet Assigned Numbers Authority (IANA) maintains a registry of SDP parameters (see <http://www.iana.org/assignments/sdp-parameters>).
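The Python sketch below converts an SDP fmtp parameter string into Jingle <parameter/> children. This document does not list the attributes of the <parameter/> element, so the sketch assumes a name/value pair (`name='...' value='...'`). The helper name and the Speex values shown are illustrative.

```python
import xml.etree.ElementTree as ET


def add_fmtp_parameters(payload_type, fmtp):
    """Turn an fmtp string such as "vbr=on;cng=on" into <parameter/> children.

    ASSUMPTION: each parameter is carried as name/value attributes.
    """
    for item in fmtp.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments, e.g. a trailing ';'
        name, _, value = item.partition("=")
        ET.SubElement(payload_type, "parameter",
                      {"name": name.strip(), "value": value.strip()})
    return payload_type


speex = ET.Element("payload-type", {"id": "97", "name": "speex", "clockrate": "8000"})
add_fmtp_parameters(speex, "vbr=on;cng=on;")
print(ET.tostring(speex, encoding="unicode"))
```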

When the initiator sends a session-initiate stanza to the receiver, the <description/> element includes all of the payload types that the initiator can receive for Jingle audio, each one encapsulated in a separate <payload-type/> element.
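The following Python sketch assembles such a session-initiate request with ElementTree. The initiator JID and session ID come from this document's examples. The following are assumptions for illustration: the responder JID, the IQ id, the <content/> attribute values, and the Jingle namespace (taken to be XEP-0166's provisional namespace).

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "http://www.xmpp.org/extensions/xep-0166.html#ns"  # assumed
AUDIO_NS = "http://www.xmpp.org/extensions/xep-0167.html#ns"


def session_initiate(initiator, responder, sid, receivable, iq_id):
    """Build an IQ-set offering every payload type the initiator can receive."""
    iq = ET.Element("iq", {"from": initiator, "to": responder,
                           "id": iq_id, "type": "set"})
    jingle = ET.SubElement(iq, "jingle", {"xmlns": JINGLE_NS,
                                          "action": "session-initiate",
                                          "initiator": initiator, "sid": sid})
    # Hypothetical content attributes; 'profile' names the RTP profile in use.
    content = ET.SubElement(jingle, "content", {"creator": "initiator",
                                                "name": "audio",
                                                "profile": "RTP/AVP"})
    description = ET.SubElement(content, "description", {"xmlns": AUDIO_NS})
    for attrs in receivable:  # one <payload-type/> per receivable encoding
        ET.SubElement(description, "payload-type",
                      {key: str(value) for key, value in attrs.items()})
    return iq


offer = session_initiate(
    "romeo@montague.net/orchard",
    "juliet@capulet.com/balcony",  # hypothetical responder
    "a73sjjvkla37jfea",
    [{"id": 96, "name": "speex", "clockrate": 16000},
     {"id": 0, "name": "PCMU", "clockrate": 8000}],
    iq_id="jingle1")
print(ET.tostring(offer, encoding="unicode"))
```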


Upon receiving the session-initiate stanza, the receiver determines whether it can provisionally accept the session and proceed with the negotiation. The general Jingle error cases are specified in XEP-0166. In addition, the receiver must determine whether it supports any of the payload types advertised by the initiator; if it does not, it MUST reject the session by sending an <unsupported-codecs/> error.
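Here is one way to make that check, sketched in Python. This document does not specify a matching rule, so the one used here is an assumption. Static payload types are compared by ID. Dynamic types are compared by encoding name (case-insensitively), clock rate, and channel count, because each party chooses dynamic IDs independently. The function names are hypothetical.

```python
def payload_key(pt):
    """ASSUMED identity of a payload type for matching purposes."""
    pt_id = int(pt["id"])
    if pt_id <= 95:  # static: the ID alone identifies the encoding
        return ("static", pt_id)
    return ("dynamic", str(pt["name"]).lower(),
            int(pt.get("clockrate", 0)), int(pt.get("channels", 1)))


def check_offer(offered, supported):
    """Return (error_condition, usable_types) for a session-initiate offer."""
    known = {payload_key(pt) for pt in supported}
    usable = [pt for pt in offered if payload_key(pt) in known]
    if not usable:
        # No common encoding: reject with <unsupported-codecs/> (<not-acceptable/>).
        return "unsupported-codecs", []
    return None, usable
```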


If there is no error, the receiver provisionally accepts the session.


The receiver then should send a list of the payload types that it can receive via a Jingle "content-accept" (or "session-accept") action. This list MAY include any payload types (it need not be a subset of the payload types sent by the initiator) but SHOULD retain the ID numbers and order specified by the initiator for any payload types that the initiator also offered.
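Here is a possible way for the receiver to build that list, sketched in Python. Types that the initiator also offered keep the initiator's IDs and order. Any further types the receiver can decode are appended afterwards, and a dynamic ID is moved to a free slot if it would clash with an ID already in the list. The matching rule and helper names are assumptions for illustration, and the input is assumed to contain no duplicate entries.

```python
def payload_key(pt):
    """ASSUMED matching rule: static by ID, dynamic by name/clockrate/channels."""
    pt_id = int(pt["id"])
    if pt_id <= 95:
        return ("static", pt_id)
    return ("dynamic", str(pt["name"]).lower(),
            int(pt.get("clockrate", 0)), int(pt.get("channels", 1)))


def build_answer(offered, receivable):
    remaining = list(receivable)
    answer = []
    # 1. Shared types, in the initiator's order, using the initiator's IDs.
    for offered_pt in offered:
        for index, ours in enumerate(remaining):
            if payload_key(ours) == payload_key(offered_pt):
                answer.append(dict(ours, id=int(offered_pt["id"])))
                del remaining[index]
                break
    # 2. Extra types the initiator did not offer (permitted by the MAY above).
    used = {int(pt["id"]) for pt in answer}
    for pt in remaining:
        pt_id = int(pt["id"])
        if pt_id in used:  # with no duplicates, only a dynamic ID can clash here
            pt_id = next(i for i in range(96, 128) if i not in used)
            pt = dict(pt, id=pt_id)
        used.add(pt_id)
        answer.append(pt)
    return answer
```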


The initiator acknowledges the 'content-accept' with an empty IQ result.


After successful transport negotiation (not shown here), the receiver accepts the session.


The initiator then acknowledges the session acceptance.


If the payload type is static (payload-type IDs 0 through 95 inclusive), it MUST be mapped to a media field as defined in RFC 4566. The generic format for the media field is as follows:

m=<media> <port> <transport> <fmt list>

In the context of Jingle audio sessions, the <media> is "audio", the <port> is the preferred port for such communications (which may be determined dynamically), the <transport> is the RTP profile negotiated via Jingle (e.g., "RTP/AVP"), and the <fmt list> is the payload-type ID.

For example, a static payload type such as PCMU (payload ID 0 in RFC 3551) is represented in the <fmt list> of the media field simply as "0".
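The static mapping can be shown as a small Python function. The function name is hypothetical, the port value is arbitrary, and the "RTP/AVP" default is only an example profile.

```python
def static_media_field(payload_ids, port, profile="RTP/AVP"):
    """m=<media> <port> <transport> <fmt list> for static payload types."""
    for pt_id in payload_ids:
        if not 0 <= pt_id <= 95:
            raise ValueError(f"{pt_id} is not a static payload type")
    fmt_list = " ".join(str(pt_id) for pt_id in payload_ids)
    return f"m=audio {port} {profile} {fmt_list}"


print(static_media_field([0], 9000))  # m=audio 9000 RTP/AVP 0
```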


If the payload type is dynamic (payload-type IDs 96 through 127 inclusive), it SHOULD be mapped to an SDP media field plus an SDP attribute field named "rtpmap".

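For example, a payload of 16-bit linear-encoded stereo audio sampled at 16 kHz and associated with dynamic payload type 96 is listed as "96" in the <fmt list> of the media field, together with the attribute field "a=rtpmap:96 L16/16000/2".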
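The dynamic mapping, sketched in Python. The rtpmap value follows the RFC 4566 form `<payload type> <encoding name>/<clock rate>[/<encoding parameters>]`, where the optional parameter carries the channel count for audio. The function name and port are hypothetical.

```python
def dynamic_sdp_lines(pt, port, profile="RTP/AVP"):
    """Media field plus 'rtpmap' attribute for one dynamic payload type."""
    pt_id = int(pt["id"])
    if not 96 <= pt_id <= 127:
        raise ValueError(f"{pt_id} is not a dynamic payload type")
    encoding = f"{pt['name']}/{pt['clockrate']}"
    channels = int(pt.get("channels", 1))
    if channels != 1:  # channel count may be omitted when it is one
        encoding += f"/{channels}"
    return [f"m=audio {port} {profile} {pt_id}",
            f"a=rtpmap:{pt_id} {encoding}"]


for line in dynamic_sdp_lines({"id": 96, "name": "L16", "clockrate": 16000, "channels": 2}, 9000):
    print(line)
# m=audio 9000 RTP/AVP 96
# a=rtpmap:96 L16/16000/2
```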


As noted, any additional parameters shall be represented as attributes of a <parameter/> child of the <payload-type/> element; when mapped to SDP, such parameters are carried in an "fmtp" attribute field.
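Going the other way, this Python sketch turns a <payload-type/> element into its SDP attribute lines: "rtpmap" for a dynamic type and "fmtp" for any parameters. It assumes that <parameter/> carries `name` and `value` attributes. The function name and the Speex example values are illustrative.

```python
import xml.etree.ElementTree as ET


def attribute_lines(payload_type):
    """SDP 'rtpmap' and 'fmtp' lines for one <payload-type/> element."""
    pt_id = payload_type.get("id")
    lines = []
    if int(pt_id) >= 96:  # rtpmap is expected for dynamic types
        encoding = f"{payload_type.get('name')}/{payload_type.get('clockrate')}"
        channels = payload_type.get("channels", "1")
        if channels != "1":
            encoding += f"/{channels}"
        lines.append(f"a=rtpmap:{pt_id} {encoding}")
    # ASSUMPTION: <parameter/> carries name/value attributes.
    params = [(p.get("name"), p.get("value")) for p in payload_type.findall("parameter")]
    if params:
        lines.append(f"a=fmtp:{pt_id} " + ";".join(f"{n}={v}" for n, v in params))
    return lines


speex = ET.fromstring(
    '<payload-type id="97" name="speex" clockrate="8000">'
    '<parameter name="vbr" value="on"/><parameter name="cng" value="on"/>'
    '</payload-type>')
print(attribute_lines(speex))  # ['a=rtpmap:97 speex/8000', 'a=fmtp:97 vbr=on;cng=on']
```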


Informational messages may be sent by either party within the context of Jingle to communicate the status of a Jingle audio session, device, or principal. The informational message MUST be an IQ-set containing a <jingle/> element whose action is "session-info", where the informational message is a payload element qualified by the 'http://www.xmpp.org/extensions/xep-0167.html#ns-info' namespace. (A <trying/> element, equivalent to the SIP 100 Trying response code, is not necessary, since each session-level action is acknowledged via XMPP IQ semantics.) The following payload elements are defined:

Element | Meaning
<busy/> | The principal or device is currently unavailable for a session because it is busy with another session (audio or otherwise).
<hold/> | The principal is temporarily pausing the chat (i.e., putting the other party on hold).
<mute/> | The principal is temporarily stopping audio output but continues to accept audio input.
<ringing/> | The device is ringing but the principal has not yet interacted with it to answer (maps to the SIP 180 response code).

Note: Because the informational message is sent in an IQ-set, the receiving party MUST return either an IQ-result or an IQ-error (normally only an IQ-result to acknowledge receipt; no error flows are defined or envisioned at this time).
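The Python sketch below builds an informational message and the empty IQ-result that normally acknowledges it. The info namespace comes from this document. The Jingle namespace, the JIDs other than the initiator, the IQ id, and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "http://www.xmpp.org/extensions/xep-0166.html#ns"  # assumed
INFO_NS = "http://www.xmpp.org/extensions/xep-0167.html#ns-info"
INFO_ELEMENTS = {"busy", "hold", "mute", "ringing"}


def session_info(sender, recipient, initiator, sid, info, iq_id):
    """IQ-set carrying one informational payload (busy, hold, mute, ringing)."""
    if info not in INFO_ELEMENTS:
        raise ValueError(f"undefined informational message: {info}")
    iq = ET.Element("iq", {"from": sender, "to": recipient,
                           "id": iq_id, "type": "set"})
    jingle = ET.SubElement(iq, "jingle", {"xmlns": JINGLE_NS,
                                          "action": "session-info",
                                          "initiator": initiator, "sid": sid})
    ET.SubElement(jingle, info, {"xmlns": INFO_NS})
    return iq


def acknowledge(request):
    """Empty IQ-result: the normal reply to a session-info IQ-set."""
    return ET.Element("iq", {"from": request.get("to"), "to": request.get("from"),
                             "id": request.get("id"), "type": "result"})


ring = session_info("juliet@capulet.com/balcony",  # hypothetical receiver JID
                    "romeo@montague.net/orchard",
                    "romeo@montague.net/orchard",
                    "a73sjjvkla37jfea", "ringing", "info1")
reply = acknowledge(ring)
```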


The Jingle Audio-specific error conditions are as follows:

Jingle Condition | XMPP Condition | Description
<unsupported-codecs/> | <not-acceptable/> | The recipient does not support any of the offered audio encodings.

If an entity supports Jingle audio exchanges via RTP, it MUST advertise that fact by returning a feature of "http://www.xmpp.org/extensions/xep-0167.html#ns" in response to Service Discovery (XEP-0030) information requests.
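A requesting client could check a disco#info result for this feature as sketched below. The function name is hypothetical. The disco#info namespace is the standard Service Discovery namespace, and the feature string is the provisional one given above.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
AUDIO_FEATURE = "http://www.xmpp.org/extensions/xep-0167.html#ns"


def supports_jingle_audio(disco_result):
    """True if a disco#info result advertises the Jingle Audio via RTP feature."""
    root = ET.fromstring(disco_result)
    query = root.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    return any(feature.get("var") == AUDIO_FEATURE
               for feature in query.findall(f"{{{DISCO_INFO_NS}}}feature"))
```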


Support for the Speex codec is RECOMMENDED.

If it is necessary to send Dual Tone Multi-Frequency (DTMF) tones, it is REQUIRED to use the XML format specified in Jingle DTMF (XEP-0181).

When the Jingle Audio content is accepted via a 'content-accept' action, both initiator and receiver SHOULD start listening for audio as defined by the negotiated transport method and audio description. For interoperability with telephony systems, each entity SHOULD both play any audio received and send a ringing tone at this time (i.e., before the receiver sends a 'session-accept' action).
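The timing rule above can be modeled as a small state machine. In this Python sketch, the state and class names are hypothetical. Listening begins at 'content-accept', and the early-media phase (play received audio, send a ringing tone) lasts until 'session-accept'.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"                    # offer sent or received
    CONTENT_ACCEPTED = "content-accepted"  # early-media phase
    ACTIVE = "active"                      # after session-accept


class AudioSession:
    """Tracks when to listen for, play, and send audio."""

    def __init__(self):
        self.state = State.PENDING

    def on_action(self, action):
        if action == "content-accept" and self.state is State.PENDING:
            self.state = State.CONTENT_ACCEPTED
        elif action == "session-accept" and self.state is not State.ACTIVE:
            self.state = State.ACTIVE

    @property
    def should_listen(self):
        # Listening (and playing received audio) starts at content-accept.
        return self.state is not State.PENDING

    @property
    def early_media(self):
        # Before session-accept: play received audio and send a ringing tone.
        return self.state is State.CONTENT_ACCEPTED
```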

In order to secure the data stream, implementations SHOULD use encryption methods appropriate to the transport method and media being exchanged; for example, in the case of UDP, that would include Datagram Transport Layer Security (DTLS) as specified in RFC 4347. An IETF specification for using DTLS with SDP defines such methods for the Session Description Protocol; the relevant RTP profile (e.g., "UDP/TLS/RTP/AVP" for transporting the RTP stream over DTLS with UDP) shall be specified as the value of the <content/> element's 'profile' attribute.

This document requires no interaction with the Internet Assigned Numbers Authority (IANA).

Until this specification advances to a status of Draft, its associated namespaces shall be "http://www.xmpp.org/extensions/xep-0167.html#ns" and "http://www.xmpp.org/extensions/xep-0167.html#ns-info"; upon advancement of this specification, the XMPP Registrar shall issue permanent namespaces in accordance with the process defined in Section 4 of XMPP Registrar Function (XEP-0053).

The XMPP Registrar shall include "audio-rtp" in its registry of Jingle content description formats. The registry submission is as follows:

<content>
  <name>audio-rtp</name>
  <desc>Jingle sessions that support audio exchange via the Real-time Transport Protocol</desc>
  <transport>lossy</transport>
  <doc>XEP-0167</doc>
</content>