diff --git a/inbox/colibri.xml b/inbox/colibri.xml
new file mode 100644
index 00000000..f1715676
--- /dev/null
+++ b/inbox/colibri.xml
@@ -0,0 +1,668 @@
+ + + %ents; + ]> + +
+ COnferences with LIghtweight BRIdging (COLIBRI) + This specification defines an XMPP extension that allows + real-time communications clients to discover and interact with + conference bridges that provide conference mixing or relaying + capabilities. + + &LEGALNOTICE; + xxxx + ProtoXEP + Standards Track + Standards + Council + + XEP-0167 + + + + colibri + + Emil + Ivov + emcho@jitsi.org + emcho@sip-communicator.org + jitsi.org + + + Lyubomir + Marinov + lubo@jitsi.org + lubo@sip-communicator.org + jitsi.org + + &fippo; + jingle + + 0.0.1 + 2013-12-04 + ei/lm + +

First draft.

+
+
+ + 0.0.2 + 2013-12-04 + ei/ph + +

Added use cases.

+
+
+
+ +

+ &xep0298; defines a way for XMPP agents to establish and + participate in tightly coupled conference calls. Such conference + calls would typically involve a number of regular participants + that establish direct one-to-one sessions with a single entity, + often referred to as a focus agent. Focus agents are generally + responsible for making sure that media sent from one call + participant would be distributed to all others so that everyone + would effectively hear or see everyone else. In other words they + often act as media mixers. +

+ +

+ Depending on the mixing technology used by media mixers, they may + require significant bandwidth, processing resources or both. It is + hence common for mixers to be hosted on dedicated servers that can + provide such resources. They are then made reachable as + rendezvous points and conference call participants are required + to call in, in order to join a conference call. This requires a + certain amount of pre-call configuration to be completed by the + service maintainers in order to create conference rooms and grant + proper access to the expected participants. The authorization + credentials are then often relayed to the participants in + preparation for the call by other means, such as IM or mail. +

+ +

+ In certain situations, such pre-call preparations are + inconvenient and it is important for users to be able to establish + ad-hoc conference calls. One way to achieve this is for user + agents themselves to act as focus agents and media mixers. + Everyone else just calls the user at the focus agent, who can then + decide whether to accept or reject the calls as they arrive. This + works particularly well for audio-only calls as the amount of + bandwidth and processing resources that they require is generally + within reach for end-user devices. +

+ +

+ The situation is quite different for video calls. Media decoding + and especially encoding require considerably more resources with + video than they do with audio. Today, encoding a single video flow + with an acceptable quality is often the maximum that can be + expected from an end-user device. The advantages that come with + Moore's law will likely be insufficient to improve this, given the + massive shift toward mobile devices and the ever-increasing user + expectations toward video quality. +

+ +

+ Therefore, this specification (COLIBRI) aims to provide a means + for user agents to interact with conference mixers. Such + interaction allows user agents to allocate mixing channels, + indicate what conferences they should be attached to, what + integers the various payload types map to, etc. Using COLIBRI + would hence allow any user agent to organize conference calls and + act as a signalling focus by outsourcing the actual media mixing + to a dedicated mixer. +

+ +
+ +
+ +
+ Focus or Focus Agent +
+
+ The terms apply to XMPP agents that, in terms of signalling, + stand at the center of a tightly-coupled conference call. In + other words, all conference participants establish a &xep0166; + session with and only with that agent. Focus Agents are not + necessarily performing media mixing themselves. In fact, the + very purpose of this specification is to provide them with a + means of handling this elsewhere. +
+
+ +
+ Mixer or Bridge +
+
+ Throughout this document the terms denote an + entity that is responsible for mixing and delivering to + conference participants all media exchanged in a conference + call. Mixers or bridges can provide their service by + performing Content Mixing, RTP Translation, or both. +
+
+ +
+ Content Mixing +
+
+ The term refers to a kind of media processing where the + content of multiple input RTP streams is "mixed" into a single + output stream. In conference calls, audio mixing + implementations generally just add and adjust all source + audio streams to produce their single output stream. Video + content mixing, on the other hand, is often implemented by + creating composite images containing individual frames from + the input streams. Another common implementation consists of + producing an output that is identical to one of the input + streams, often the one belonging to a currently active + speaker. +
+
+ +
+ RTP Translation +
+
+ RFC 3550 defines a translator as "an intermediate + system that forwards RTP packets with their synchronization + source identifier intact." This specification respects that + definition but it also uses "RTP Translation" in contrast + to "Content Mixing". Conference bridges that perform RTP + translation simply redirect each incoming RTP packet to all + participants, often excluding the one where it came from. + Contrary to content mixing, RTP translation generally requires + fewer processing resources since it does not involve media + manipulation. Bandwidth requirements, on the other hand, could + be significantly higher with RTP translation than with content + mixing. +
+
+
+
+ +

+ The extension defined herein is designed to meet the following + requirements: +

+
  1. Provide a means for conference focus agents to interact with conference mixers in order to configure payload type mappings and allocate ports or other resources that they can then advertise in Jingle sessions so that all media traverses the bridge.
  2. Impose no COLIBRI-specific requirements on non-focus participants so that any client that supports Jingle is able to participate.
  3. [TODO] Anything else?
+
+ +

+ This section provides a friendly introduction to COLIBRI. +

+

+ In essence, the goal of COLIBRI is to provide focus agents with a + way of using remote mixers as if they were available locally. + The most important part of that is the possibility to allocate + ports on the mixer interfaces and then use these ports when + establishing Jingle sessions with the various participants. +

+

+ Every participant in the conference call is assigned one port for + RTP data and one for RTCP. An RTP/RTCP port pair is called a + channel. Each participant uses one channel per media type. + That is, a client participating with audio only would get one + channel, while another one that joins with both audio and video + would get two: one for audio and one for video. +
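The channel arithmetic described above can be sketched as follows. This is only an illustration: the `Channel` type, the `allocate_channels` helper, the starting port 10000 and the choice of consecutive port numbers are all assumptions made for the example, not part of the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Channel:
    """Hypothetical model of a channel: one RTP port and one RTCP port."""
    media: str
    rtp_port: int
    rtcp_port: int


def allocate_channels(media_types: List[str], first_port: int = 10000) -> List[Channel]:
    """Return one channel per media type for a single participant.

    Consecutive port numbers are an assumption of this sketch; a real
    bridge is free to pick any ports it likes.
    """
    channels = []
    port = first_port
    for media in media_types:
        channels.append(Channel(media, port, port + 1))
        port += 2
    return channels


audio_only = allocate_channels(["audio"])
audio_video = allocate_channels(["audio", "video"])
```

Here `audio_only` holds a single channel while `audio_video` holds two, one for audio and one for video.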

+

+ Channels are used for streams from the bridge to participants and + from participants to the bridge. Typically a channel would contain + one stream from a participant to a bridge, for example their + webcam or desktop, and one or more streams in the opposite + direction (e.g. webcam or desktop streams from other participants + to the one using this channel). This is not a requirement though + and a channel can certainly be used for transportation of multiple + streams in both directions in cases where one bridge is connected + to another. +

+

+ Typically channels would be created by the entity controlling a + conference call. This could either be a conferencing server or a + smart client capable of handling conferences. We would refer to + both of these as the "focus". In either case, the important part + is that the focus terminates all signalling. It is a signalling + endpoint and it is responsible for all aspects of call signalling + including offer/answer. +

+

+ In other words, when setting up a conference, a focus would first + allocate the necessary channels, then directly initiate sessions + with (that is, invite) the other participants. The only difference is that, when sending the + invitations to these participants, the focus would use the + transport information (addresses and ports) that it + received from the COLIBRI bridge, rather than its own. +

+ +
+ + +

+ The most important thing about setting up a conference is the + creation of channels for every participant. Conference setup is + not the only chance an organiser gets to declare all + participants, but typically when a conference call is set up it + is because there are at least some known participants + and there would be no point in delaying channel creation for + them. +

+

+ The following example shows how Romeo creates an audio/video + conference at a bridge, requesting that three participant + channels be created. +

+ + + + + [optional payload and transport description] + + + ... + + + + + ... + + + + +]]> +
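As a rough illustration of how a focus might assemble such a request programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The namespace is the one this document advertises in service discovery and 'initiator' is the attribute discussed in the text. Everything else is an assumption made for the example: the element names `conference`, `content` and `channel`, the `name` attribute, the IQ `type` and `id` values, and the bridge JID `bridge.example.org`.

```python
import xml.etree.ElementTree as ET

COLIBRI_NS = "http://jitsi.org/protocol/colibri"


def build_conference_request(bridge_jid, media_types, participants, initiator=True):
    """Build an IQ asking the bridge for `participants` channels per media type.

    Element names, the 'name' attribute and the IQ type/id are assumptions
    of this sketch, not values confirmed by the text.
    """
    iq = ET.Element("iq", {"type": "set", "to": bridge_jid, "id": "create1"})
    conference = ET.SubElement(iq, f"{{{COLIBRI_NS}}}conference")
    for media in media_types:
        content = ET.SubElement(conference, f"{{{COLIBRI_NS}}}content", {"name": media})
        for _ in range(participants):
            ET.SubElement(
                content,
                f"{{{COLIBRI_NS}}}channel",
                {"initiator": "true" if initiator else "false"},
            )
    return iq


request = build_conference_request("bridge.example.org", ["audio", "video"], 3)
channels = request.findall(f".//{{{COLIBRI_NS}}}channel")
```

For an audio/video conference with three participants, `channels` contains six entries: three under the audio content and three under the video content.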

+ Notice how the 'initiator' attribute of the channels above is set to 'true'. The + setting determines ICE and DTLS/SRTP behaviour for the bridge. + By setting 'initiator' to 'true', Romeo + requests that the bridge behave as the initiator of the + session, which means that it would try to be the controlling ICE + agent and also assume the 'dtls-actpass' role for DTLS/SRTP + negotiation. A value of 'false' would have meant that the bridge + would behave as the controlled ICE agent and assume the + 'dtls-active' role. +
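The mapping from the 'initiator' value to bridge behaviour can be written down as a small lookup. The function name `bridge_roles` and the dictionary keys are hypothetical; the role values are the ones named in the paragraph above.

```python
def bridge_roles(initiator: bool) -> dict:
    """Return the ICE and DTLS/SRTP roles the bridge takes for a channel."""
    if initiator:
        # Bridge acts as session initiator.
        return {"ice": "controlling", "dtls": "dtls-actpass"}
    # Bridge acts as session responder.
    return {"ice": "controlled", "dtls": "dtls-active"}


roles_when_true = bridge_roles(True)
roles_when_false = bridge_roles(False)
```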

+

+ When sending its result back, the bridge confirms creation of + the requested channels and it also delivers transport + information that would be necessary for participants to + transport media to the bridge. These would most often include + ICE candidates, ufrag and pwd parameters, and DTLS fingerprints. +

+

+ Note that ICE is not mandatory for use and COLIBRI bridges can + just as well perform Hosted NAT Traversal using latching and a + RAW-UDP transport. +

+ + + + + + + + 99:...:F6 + + + + + + ... + + + + ... + + ... + + + +]]> +

+ The above "result" also contains the following elements of + interest for every channel: +

+

+ - an ID that is necessary for any further modification + that the focus wants to make to a channel. [FIXME: clients should be + able to specify these id-s so as not to rely on ordering to + identify channels and get complexes thinking they are SDP + parsers] +

+

+ - an rtp-level-relay-type attribute with possible values of + 'mixer' and 'translator' indicating how the bridge is going to + deliver data on a specific channel [FIXME: this would definitely + need to be specifiable from the client]. +

+

+ - mixer channels would also include SSRCs for the channels in + question. This is particularly necessary when SSRCs need to be + announced to participants (because people never learned how RTP + works and are afraid of anything that wasn't explicitly + announced with an Offer/Answer exchange). Generally such + announcements would be possible by simply propagating SSRCs that + other participants announce. In a mixed flow however the SSRC + would belong to the mixer (or COLIBRI bridge) so it would need + to be known in advance. +

+

+ - the initiator value is echoed +

+

+ - expire describes how many seconds the bridge will keep the + channel open without media activity +
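A focus could pull the per-channel values listed above out of a result along the following lines. The attribute names 'rtp-level-relay-type', 'initiator' and 'expire' come from the list above; the element names, the attribute name `id`, and the sample values `ch1`, `conf1` and `60` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

COLIBRI_NS = "http://jitsi.org/protocol/colibri"

# Hypothetical result fragment; element names and values are illustrative.
SAMPLE_RESULT = """
<conference xmlns='http://jitsi.org/protocol/colibri' id='conf1'>
  <content name='audio'>
    <channel id='ch1' rtp-level-relay-type='mixer' initiator='true' expire='60'/>
  </content>
</conference>
"""


def read_channels(xml_text):
    """Collect id, relay type, initiator flag and expiry for every channel."""
    root = ET.fromstring(xml_text)
    found = []
    for channel in root.iter(f"{{{COLIBRI_NS}}}channel"):
        expire = channel.get("expire")
        found.append({
            "id": channel.get("id"),
            "relay": channel.get("rtp-level-relay-type"),
            "initiator": channel.get("initiator") == "true",
            # Seconds without media activity before the bridge drops the channel.
            "expire": int(expire) if expire is not None else None,
        })
    return found


parsed = read_channels(SAMPLE_RESULT)
```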

+
+ +

+ Channel updates can happen for various reasons. The following + examples illustrate two of them: +

+

+ - specifying payload types. While payload types in RTP are + sometimes static (e.g. for older codecs such as G.711), this is + not always the case for more recent types, which need to be + assigned dynamically during session establishment. The tricky + part here is that dynamic means dynamic so every participant in + a conference call may end up expecting different payload types. + As a result, a COLIBRI bridge SHOULD know about everyone's + expectations, which is why channels are updated with payload + types. Note that if a bridge does see unknown payload types it + MUST still relay them to other participants as they might have + used some other mechanism to make sure they know what they + mean. +

+

+ - DTLS/SRTP fingerprints. +
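One possible reading of the payload type rule above is sketched below: a bridge that knows both sides' mappings can translate a payload type number by codec name, and it passes unknown numbers through untouched so that they are still relayed. The text does not say that a bridge rewrites payload types, so the translation step, the function name `map_payload_type`, and the sample numbers and codec names are all assumptions of this sketch.

```python
def map_payload_type(pt, sender_types, receiver_types):
    """Translate a payload type number from the sender's map to the receiver's.

    Both maps are dicts of payload type number -> codec name. Unknown
    payload types are returned unchanged so that they are still relayed.
    """
    codec = sender_types.get(pt)
    if codec is None:
        return pt  # unknown to the bridge: relay as-is
    for number, name in receiver_types.items():
        if name == codec:
            return number
    return pt  # receiver did not declare this codec: relay as-is


sender = {111: "opus"}    # hypothetical mapping for the sending channel
receiver = {96: "opus"}   # hypothetical mapping for the receiving channel
known = map_payload_type(111, sender, receiver)
unknown = map_payload_type(127, sender, receiver)
```

With these sample maps, `known` is 96 and `unknown` stays 127.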

+ + + + + + + + + + + 08:...:C7 + + + + + + + + + + + + 08:...:C7 + + + + + + +]]> +

+ Note that while the result in this case is essentially an + acknowledgement, it still carries a full representation of the + bridge. +

+ + + + + + + 99:...:F6 + + + + + + ... + + + + ... + + ... + + + +]]> +
+ + + + + + + + + + + + +]]> + + + + + + + + 99:..:F6 + + + + + + + + +]]> +

Essentially that information is the transport description from + the bridge. +

+
+ +

+ ICE candidates are another reason why a focus might want to + update a channel. Earlier examples indicated how conference + setup could be completed without providing any transport + information whatsoever. Whenever that is the case, such + information would need to be provided through channel + modification. +

+ + + + + + + + + + +]]> + + + + + + + + A9:...:2F + + + + + + + + + + + D7:...:C2 + + + + + + + + +]]> +
+
+ + +

If an entity supports COLIBRI, it SHOULD advertise that fact by + returning + a feature of "http://jitsi.org/protocol/colibri" in response to + a &xep0030; + information request. +

+ + + + ]]> + + + + + + ]]> +

In order for an application to determine whether an entity + supports this + protocol, where possible it SHOULD use the dynamic, presence-based + profile + of service discovery defined in &xep0115;. However, if an + application has + not received entity capabilities information from an entity, it + SHOULD use + explicit service discovery instead. +
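A client-side check for the advertised feature might look like the sketch below. The feature string is the one given above and the `disco#info` namespace is the one defined by service discovery; the helper name `supports_colibri`, the sample stanza, and the JID and id values in it are hypothetical.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
COLIBRI_FEATURE = "http://jitsi.org/protocol/colibri"

# Hypothetical disco#info result from a bridge.
SAMPLE_DISCO_RESULT = """
<iq type='result' from='bridge.example.org' id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <feature var='http://jitsi.org/protocol/colibri'/>
  </query>
</iq>
"""


def supports_colibri(disco_result_xml):
    """Return True if the disco#info result lists the COLIBRI feature."""
    root = ET.fromstring(disco_result_xml)
    features = root.iter(f"{{{DISCO_INFO_NS}}}feature")
    return any(f.get("var") == COLIBRI_FEATURE for f in features)


bridge_supported = supports_colibri(SAMPLE_DISCO_RESULT)
```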

+
+ +

PENDING

+
+ +

PENDING

+
+ +

PENDING

+
+ +

Jitsi's participation in this specification is funded by the + NLnet + Foundation. +

+
+