mirror of https://github.com/moparisthebest/xeps synced 2024-11-21 08:45:04 -05:00
This commit is contained in:
stpeter 2012-03-19 11:43:02 -06:00
parent a48417c5c8
commit 047b5b9446


@ -29,6 +29,12 @@
<org>RealJabber.org and Rejhon Technologies Inc.</org>
<uri>http://www.realjabber.com</uri>
</author>
<revision>
<version>0.2</version>
<date>2012-03-19</date>
<initials>MDR</initials>
<remark><p>Lots of edits. Simplifications, improvements and corrections. Forward and backward compatible with version 0.1.</p></remark>
</revision>
<revision>
<version>0.1</version>
<date>2011-06-29</date>
@ -37,9 +43,9 @@
</revision>
<revision>
<version>0.0.3</version>
<date>2011-06-24</date>
<date>2011-06-25</date>
<initials>MDR</initials>
<remark><p>Third draft, minor edits.</p></remark>
<remark><p>Third draft, recommended edits.</p></remark>
</revision>
<revision>
<version>0.0.2</version>
@ -58,67 +64,67 @@
<section1 topic="Introduction" anchor="introduction">
<p>This document defines a specification for real-time text transmitted in-band over an XMPP session.</p>
<p>Real-time text is text that is sent as it is created. The recipient can watch the sender type "as written words are typed" similar to a telephone conversation where one listens to a conversation "as words are spoken". It provides a sense of contact in conversation, eliminates waiting times found in messaging, and is also favored by the deaf who prefer text conversation. For a visual animation of real-time text, see <span class="ref"><strong><link url="http://www.realjabber.org">RealJabber.org</link></strong></span>
<p><strong>Real-time text is text transmitted live while it is being typed or created.</strong> The recipient can immediately read the sender's typing, without waiting before reading. This is similar to a telephone conversation where one listens "as words are spoken". This allows text to be used conversationally, provides a sense of contact, eliminates waiting times found in messaging, and is favored by deaf individuals who prefer text conversation. For a visual animation of real-time text, see <span class="ref"><strong><link url="http://www.realjabber.org">RealJabber.org</link></strong></span>
<note>RealJabber.org is the author's web site containing work related to this specification, including animation examples of what real time text looks like. &lt;<link url="http://www.realjabber.org">http://www.realjabber.org</link>&gt;.</note>.</p>
<p>Real-time text has been around for decades in various implementations:</p>
<ul>
<li>The 'talk' command on UNIX systems since the 1970's.</li>
<li>ICQ had a split screen mode from 1996-1999 before this feature was removed.</li>
<li>ICQ had a peer-to-peer split screen mode from 1996-1999.</li>
<li>TTY and text telephones for the deaf.</li>
<li>For SIP, real-time text is sent using <span class="ref"><strong><link url="http://tools.ietf.org/html/rfc4103">IETF RFC 4103</link></strong></span> <note>IETF RFC 4103: RTP Payload for Text Conversation. &lt;<link url="http://tools.ietf.org/html/rfc4103">http://tools.ietf.org/html/rfc4103</link>&gt;.</note> with <span class="ref"><strong><link url="http://www.itu.int/rec/T-REC-T.140">ITU-T T.140</link></strong></span> <note>ITU-T T.140: Protocol for multimedia application text conversation. &lt;<link url="http://www.itu.int/rec/T-REC-T.140">http://www.itu.int/rec/T-REC-T.140</link>&gt;.</note> presentation coding.</li>
<li>In 2008, AOL AIM 6.8 gained the <span class="ref"><strong><link url="http://help.aol.com/help/microsites/microsite.do?cmd=displayKC&amp;externalId=223568">AOL Real-Time IM</link></strong></span> <note>AOL AIM Real Time Text: &lt;<link url="http://help.aol.com/help/microsites/microsite.do?cmd=displayKC&amp;externalId=223568">http://help.aol.com/help/microsites/microsite.do?cmd=displayKC&amp;externalId=223568</link>&gt;.</note> feature.</li>
<li>Hobby BBS chat programs from the 1990's.</li>
<li>TDD, TTY and text telephones for the deaf.</li>
<li>In SIP calls, real-time text is sent using an <span class="ref"><strong><link url="http://tools.ietf.org/html/rfc4103">IETF RFC 4103</link></strong></span> <note>IETF RFC 4103: RTP Payload for Text Conversation. &lt;<link url="http://tools.ietf.org/html/rfc4103">http://tools.ietf.org/html/rfc4103</link>&gt;.</note> transport, with <span class="ref"><strong><link url="http://www.itu.int/rec/T-REC-T.140">ITU-T T.140</link></strong></span> <note>ITU-T T.140: Protocol for multimedia application text conversation. &lt;<link url="http://www.itu.int/rec/T-REC-T.140">http://www.itu.int/rec/T-REC-T.140</link>&gt;.</note> presentation coding.</li>
<li>Deployment of <span class="ref"><strong><link url="http://www.reach112.eu">Reach112</link></strong></span> <note>Reach112: European emergency service with real-time text. &lt;<link url="http://www.reach112.eu">http://www.reach112.eu</link>&gt;.</note> in Europe, an accessible emergency service with real-time text.</li>
</ul>
<p>Real-time text is suitable for smooth and rapid mainstream communication in text, as an all-inclusive technology to complement instant messaging. At the same time, real-time text has special usefulness to many audiences including the deaf and other people who cannot use speech on the telephone. This document defines a specification for real-time text transmitted in-band over an XMPP network.</p>
</section1>
<section1 topic="Requirements" anchor="requirements">
<section2 topic="Fluid Real-Time Text" anchor="fluid_realtime_text">
<ol>
<li>Allow transmission of real-time text with a low latency.</li>
<li>Support real-time message editing, including text insertions, deletions and cursor movements.</li>
<li>Support transmission of the original intervals between key presses, to preserve look-and-feel of typing independently of transmission intervals.</li>
</ol>
</section2>
<section2 topic="Interoperable" anchor="interoperable">
<ol>
<li>Balance low latencies versus system, network and server limitations.</li>
<li>Be backwards compatible with XMPP clients that do not support real-time text.</li>
<li>Be interoperable with other real-time text protocols via gateways, including RFC 4103 and other standards.</li>
<li>Support message editing in real-time, including text insertions and deletions.</li>
<li>Support transmission of the original intervals between key presses, to preserve look-and-feel of typing independently of transmission intervals.</li>
</ol>
</section2>
<section2 topic="In-Band Transmission" anchor="inband_transmission">
<ol>
<li>Reliable real-time text delivery.</li>
<li>Provide a high level XML mechanism of transmitting real-time text.</li>
<li>Be backwards compatible with XMPP clients that do not support real-time text.</li>
<li>Minimize reliance on knowledge of network traversal protocols and/or out-of-band transmission protocols.</li>
<li>Compatible with multi-user chat (MUC) and simultaneous logins.</li>
</ol>
</section2>
<section2 topic="Flexible" anchor="flexible">
<section2 topic="Flexible and Interoperable" anchor="flexible_and_interoperable">
<ol>
<li>Allow use within existing instant-messaging user interfaces, with minimal modifications.</li>
<li>Allow alternate presentations of real-time text, including split screen and/or other layouts.</li>
<li>Protocol design extensible for new features.</li>
<li>Protocol recovery from lost/missing messages.</li>
<li>Allow use within existing instant-messaging user interfaces, with minimal UI modifications.</li>
<li>Allow alternate optional presentations of real-time text, including split screen and/or other layouts.</li>
<li>Protocol design allows error recovery, and allows extensions for new features.</li>
<li>Be interoperable with other real-time text protocols via gateways, including RFC 4103 and other standards.</li>
</ol>
</section2>
<section2 topic="Accessible" anchor="accessible">
<ol>
<li>Allow XMPP to follow the <span class="ref"><strong><link url="http://www.itu.int/rec/T-REC-F.703">ITU-T Rec. F.703</link></strong></span> <note>ITU-T Rec. F.703: Multimedia conversational services. &lt;<link url="http://www.itu.int/rec/T-REC-F.703">http://www.itu.int/rec/T-REC-F.703</link>&gt;.</note> Total Conversation accessibility standard for simultaneous voice, video, and real-time text.</li>
<li>Be a candidate technology for use with Next Generation 9-1-1 / 1-1-2 emergency services.</li>
<li>Be suitable for transcription services and (when coupled with voice at user's choice) for TTY/text telephone alternatives, relay services, and captioned telephone systems.</li>
<li>Be an accessible enhancement for mobile phone text messaging and mainstream instant messaging.</li>
</ol>
</section2>
</section1>
<section1 topic="Glossary" anchor="glossary">
<p><strong>real-time text</strong> Text transmitted and displayed in real-time as it is typed or entered.</p>
<p><strong>real-time message</strong> A chat message that changes in real-time, via real-time text, and as it is edited by the remote sender.</p>
<p><strong>real-time action</strong> An action done to a real-time message, such as an edit action or a presentation action.</p>
<p><strong>real-time chat session</strong> A chat session that supports real-time messages.</p>
<p><strong>RTT</strong> Acronym for real-time text. This is also the name of the main XML element used by this standard.</p>
<p><strong>action element</strong> An XML element that indicates a single edit action, or presentation action.</p>
<p><strong>edit action</strong> A text modification of any kind, including text insertion or deletion. This may be as small as a single key press.</p>
<p><strong>presentation action</strong> A presentation behavior such as the movement of a visible cursor, a pause, or a flash.</p>
<p><strong>real-time text</strong> Text transmitted live while it is being typed or created.</p>
<p><strong>real-time message</strong> Recipient's real-time live view of the sender's message still being typed or created.</p>
<p><strong>real-time message edit</strong> An edit operation done by the remote sender, that is transmitted in real-time to the recipient.</p>
<p><strong>action element</strong> An XML element that represents a single real-time message edit, such as text insertion or deletion.</p>
<p><strong>RTT</strong> Acronym for real-time text.</p>
</section1>
<section1 topic="Protocol" anchor="protocol">
<section2 topic="RTT Element" anchor="rtt_element">
<p>Real-time text is transmitted via an &lt;rtt&gt; child element of a &lt;message&gt; stanza. The &lt;rtt&gt; element is transmitted at regular intervals by the sender while a chat message is being composed, to allow the recipient to watch the sender type (and edit) the message before the full message is sent.</p>
<p>This is a basic example of a <em><strong>real-time message</strong></em> "Hello, my Juliet!", transmitted in three real-time text fragments, followed by a final message delivery:</p>
<p>Real-time text is transmitted via an &lt;rtt/&gt; child element of a &lt;message/&gt; stanza. The &lt;rtt/&gt; element is transmitted at regular intervals by the sender while a chat message is being composed, to allow the recipient to watch the sender type (and edit) the message before the full message is sent in a &lt;body/&gt; element.</p>
<p>This is a basic example of a <em><strong>real-time message</strong></em> "Hello, my Juliet!", transmitted live while it is being typed, before a final message delivery:</p>
<p><strong>Example 1: Introductory Example</strong></p>
<p><code><![CDATA[<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a01'>
<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
@ -144,60 +150,49 @@
<p>The &lt;rtt&gt; element contains a series of one or more child elements representing <em><strong>real-time actions</strong></em> including <em><strong>edit actions</strong></em> and/or <em><strong>presentation actions</strong></em>. Example 1 illustrates only a single edit action, the &lt;t&gt; <em><strong>action element</strong></em>, which simply adds text to the end of a message. For more information about action elements, see <link url="#realtime_actions">Real-Time Actions</link>.</p>
<p>If the recipient client does not support this real-time text standard, the sender SHOULD NOT transmit the &lt;rtt&gt; element. For more information, see <link url="#determining_support">Determining Support</link>.</p>
<p>The &lt;rtt/&gt; element contains a series of one or more child elements called <em><strong>action elements</strong></em> that represent <em><strong>real-time message edits</strong></em> such as text being appended, inserted, or deleted. Example 1 illustrates only the &lt;t/&gt; action element, which appends text to the end of a message. For more information, see <link url="#realtime_message_editing">Real-Time Message Editing</link>.</p>
<p>Transmission of &lt;rtt/&gt; occurs at regular intervals whenever the sender is actively composing a message. If there are no changes to the message since the last transmission, no transmission occurs. For more information, see <link url="#transmission_interval">Transmission Interval</link>.</p>
<p>The namespace of the &lt;rtt/&gt; element is “urn:xmpp:rtt:0”.</p>
</section2>
<section2 topic="RTT Attributes" anchor="rtt_attributes">
<section3 topic="xmlns" anchor="xmlns">
<p>This REQUIRED attribute MUST be <strong>urn:xmpp:rtt:0</strong></p>
</section3>
<section3 topic="seq" anchor="seq">
<p>This REQUIRED attribute indicates the sequence number, and MUST begin at 0 in the first &lt;rtt&gt; element sent at the start of a <em><strong>real-time chat session</strong></em>. This attribute MUST increment by 1 for every &lt;rtt&gt; element sent until the end of the session. This value may be used by the receiver to detect lost &lt;message&gt; elements, which affects the integrity of real-time text. For more information, see <link url="#error_recovery_of_realtime_text">Error Recovery of Real-Time Text</link>.</p>
<section3 topic="seq" anchor="seq">
<p>This REQUIRED attribute is a counter to maintain the integrity of a real-time message. Senders MUST increment the <strong>seq</strong> attribute by 1 for each subsequent &lt;rtt/&gt; transmitted. Recipients MUST monitor the <strong>seq</strong> value to verify that it is incrementing. For more information, see <link url="#automatic_recovery_of_realtime_text">Automatic Recovery of Real-Time Text</link>.</p>
<p>The <strong>seq</strong> value is bounded to 31 bits, the range of positive values of a signed integer. The exception to the incrementing rule is an &lt;rtt/&gt; element with an 'event' attribute. In this case, senders MAY use any <strong>seq</strong> value as the new starting value. For best integrity, <strong>seq</strong> SHOULD be randomized. The new starting value SHOULD be less than 1 million to allow plenty of incrementing room, and to keep &lt;rtt/&gt; compact.</p>
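<p>As a hedged illustration of the incrementing rule, three consecutive transmissions might look like the sketch below. The starting value 48213, the message ids and the typed text are arbitrary choices made for this sketch; the addresses are borrowed from Example 1.</p>
<p><code><![CDATA[<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='b01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='48213' event='new'>
    <t>Hel</t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='48214'>
    <t>lo, m</t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='b03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='48215'>
    <t>y Juliet!</t>
  </rtt>
</message>]]></code></p>
<p>In this sketch, a recipient that receives 48213 followed directly by 48215 can tell that the increment rule was broken, and therefore that the real-time message may no longer be intact.</p>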
</section3>
<section3 topic="event" anchor="event">
<p>This attribute signals session events for real-time messages, such as the start of a new real-time message. The <strong>event</strong> attribute is omitted from the &lt;rtt&gt; element, when it is not needed.</p>
<p>This attribute signals events for real-time messages, such as the start of a new real-time message. The <strong>event</strong> attribute is omitted from the &lt;rtt/&gt; element, when it is not needed, except in the following situations:</p>
<ol>
<li><p><strong>event='new'</strong><br />
The sender MUST use this value on the first &lt;rtt&gt; element of a new real-time message. The recipient MUST initialize a blank real-time message for display, before processing the &lt;rtt&gt; payload, if any is provided.</p></li>
Senders MUST use this value on the first &lt;rtt/&gt; element of a new message, which also delivers the first character(s) being typed in a message. Recipients MUST initialize a new real-time message for display, and then process action elements within this &lt;rtt/&gt;. A new <strong>seq</strong> value MAY be used.</p></li>
<li><p><strong>event='reset'</strong><br />
The sender MAY use this value to retransmit a real-time message. The recipient MUST clear the existing real-time message, before processing the &lt;rtt&gt; payload. One use case is for error recovery. Another use case is where the recipient logs off and on, all while the sender is still composing a message. This allows the recipient to re-display the real-time message.</p></li>
<li><p><strong>event='start'</strong><br />
The sender MAY use this value to indicate the start of a real-time session. The &lt;rtt&gt; payload MUST be empty, with no real-time actions.</p></li>
Identical to event='new', except it replaces the existing real-time message. Senders MAY use this attribute during <link url="#automatic_recovery_of_realtime_text">Automatic Recovery of Real-Time Text</link>. Recipients MUST support this attribute, and process action elements within this &lt;rtt/&gt; to replace the existing real-time message.</p></li>
<li><p><strong>event='cancel'</strong><br />
The sender MAY use this value to indicate the end of a real-time session. The recipient SHOULD clear or close the current real-time message, if any is still displayed. An example case is closing a chat window before a message is delivered. The &lt;rtt&gt; payload MUST be empty, with no real-time actions.</p></li>
Senders MAY use this value to signal the recipient to stop transmitting real-time text. Recipients SHOULD clear the real-time message, and discontinue sending back &lt;rtt/&gt; for the remainder of the current chat session until the sender sends another &lt;rtt/&gt; to resume real-time text. No action elements should be included within &lt;rtt/&gt;.</p></li>
</ol>
<p>Only one &lt;rtt&gt; element is allowed per &lt;message&gt;. Therefore, to transmit multiple events, use multiple consecutive &lt;message&gt;'s.</p>
<p>The first &lt;rtt/&gt; element in a chat session signals the start of real-time text. The &lt;rtt event='cancel'/&gt; element signals the end of real-time text in a chat session. There MUST NOT be more than one &lt;rtt/&gt; element per &lt;message/&gt;.</p>
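<p>A minimal sketch of the cancel event, assuming the same addresses as Example 1: the &lt;rtt/&gt; element is sent empty, with no action elements. The <strong>seq</strong> value and message id shown here are arbitrary.</p>
<p><code><![CDATA[<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='c01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='7301' event='cancel'/>
</message>]]></code></p>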
</section3>
</section2>
<section2 topic="Body Element" anchor="body_element">
<p>To turn a real-time message into a permanent delivered message, the sender MUST transmit the whole message as a standard &lt;body&gt; child element within the &lt;message&gt; stanza.</p>
<p>Upon receipt of &lt;body/&gt;, the message becomes permanent and cannot be edited any further. The delivered message is displayed instead of the real-time message. In the ideal case, the message from &lt;body/&gt; is redundant since this delivered message is identical to the final contents of the real-time message. When the sender begins composing a new message after a &lt;body/&gt; is sent, the next &lt;rtt/&gt; transmitted by the sender MUST contain the <strong>event='new'</strong> attribute.</p>
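<p>The hand-over from one message to the next can be sketched as follows. This is an illustration only: the message ids, the <strong>seq</strong> value and the text of the second message are assumptions made for the sketch, and the addresses are borrowed from Example 1. The first stanza delivers the finished message; the second begins a new real-time message and therefore carries <strong>event='new'</strong>.</p>
<p><code><![CDATA[<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='d01'>
  <body>Hello, my Juliet!</body>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='d02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='5120' event='new'>
    <t>Are y</t>
  </rtt>
</message>]]></code></p>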
<section3 topic="Backwards Compatible" anchor="backwards_compatible">
<p>The &lt;body&gt; element continues to follow the &xmppcore; standard. This keeps backwards compatibility with XMPP clients that do not support this specification. Such clients will continue to behave normally, displaying complete lines of messages as they are delivered.</p>
</section3>
<section3 topic="Behavior in Clients Supporting This Specification" anchor="behavior_in_clients_supporting_this_specification">
<p>Upon receipt of &lt;body&gt;, the recipient MUST replace the real-time message with the final delivered message from &lt;body&gt;. The message thus becomes permanent and cannot be edited any further.</p>
<p>In the ideal case, the message in a &lt;body&gt; is redundant since it simply repeats the entire contents of the real-time message. In the event that there are lost &lt;messages&gt;, the delivery of the &lt;body&gt; permits <link url="#error_recovery_of_realtime_text">Error Recovery of Real-Time Text</link>.</p>
<p>After sending a completed message as a &lt;body&gt;, the sender may begin a real-time message, using the <strong>event='new'</strong> attribute.</p>
<p>The real-time text standard simply provides early delivery of text before the &lt;body/&gt; element. The &lt;body/&gt; element continues to follow the &xmppcore; standard. Clients that do not support real-time text will continue to behave normally, displaying complete lines of messages as they are delivered.</p>
</section3>
</section2>
<section2 topic="Transmission Interval" anchor="transmission_interval">
<p>For the best balance between interoperability and usability, the interval of transmission of &lt;rtt&gt; for a continuously-changing real-time message SHOULD be once every <strong>1 second</strong>. If there have been no changes to the real-time message, no transmission should take place.</p>
<p>A much shorter interval may more frequently trigger the flooding protection algorithms in XMPP servers, leading to dropped &lt;message&gt; elements and/or <link url="#congestion_considerations">Congestion Considerations</link>. A longer interval will lead to a less optimal user experience. One second is a balance that meets the requirements of real-time text. This interval is mentioned in other real-time text standards, including section 5.4 of IETF RFC 4103 and section 5.2.2 of <span class="ref"><strong><link url="http://tools.ietf.org/html/rfc5194">IETF RFC 5194</link></strong></span> <note>IETF RFC 5194: Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP). &lt;<link url="http://tools.ietf.org/html/rfc5194">http://tools.ietf.org/html/rfc5194</link>&gt;.</note>, used for SIP.</p>
<p>To smooth the output of text, this specification supports transmission of the sender's original <link url="#key_press_intervals">Key Press Intervals</link>. This allows the recipient software to display the sender's typing at the original speed, regardless of the transmission interval.</p>
</section2>
<section2 topic="Real-Time Actions" anchor="realtime_actions">
<p>The &lt;rtt&gt; element is used to transmit a series of one or more real-time actions, including edit actions and presentation actions.</p>
<p>Most chat clients allow a sender to edit their message before sending (i.e. via a Send button, or hitting Enter). The inclusion of real-time functionality to existing chat client software must not degrade the sender's existing expectation of being able to edit their messages before sending. Thus, in a real-time chat session, the recipient can watch the sender compose and edit their message before it is delivered.</p>
<p>Edit actions include typing of text, backspacing, and blocks of text being inserted or deleted. In addition, a real-time chat session may also include presentation actions, including:</p>
<p>For the best balance between interoperability and usability, the transmission interval of &lt;rtt/&gt; for a continuously-changing message SHOULD be approximately <strong>0.7 second</strong>. This interval meets <span class="ref"><strong><link url="http://www.itu.int/rec/T-REC-F.700">ITU-T Rec. F.700</link></strong></span> <note>ITU-T Rec. F.700: Framework Recommendation for multimedia services &lt;<link url="http://www.itu.int/rec/T-REC-F.700">http://www.itu.int/rec/T-REC-F.700</link>&gt;.</note> for good real-time text. If a different transmission interval needs to be used, the interval SHOULD be <strong>between 0.3 second and 1 second</strong>.</p>
<p>A longer interval will lead to a less optimal user experience. Conversely, a much shorter interval may more frequently trigger throttling or flooding protection algorithms in public XMPP servers, leading to dropped &lt;message/&gt; elements and/or <link url="#congestion_considerations">Congestion Considerations</link>.</p>
<p>To provide fluid real-time text, one or more of the following methods can be used:</p>
<ul>
<li>Key press intervals, to smooth the flow of real-time text independently of the transmission interval;</li>
<li>Cursor movements, to make it easier for the recipient to watch edits being made by the sender;</li>
<li>Visual flash (beep) to allow the sender to catch the attention of the recipient.</li>
<li><link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link> for natural typing display, independently of the transmission interval.</li>
<li>Use of <link url="#time_critical_and_low_latency_methods">Time Critical And Low Latency Methods</link>, for real-time captioning/transcription.</li>
<li>For other options or reduced-precision options, see <link url="#reduced_precision_text_smoothing_methods">Reduced Precision Text Smoothing Methods</link>.</li>
</ul>
<p>Each real-time action is represented by an <em><strong>action element</strong></em><em>.</em> Examples can be found in <link url="#use_cases">Use Cases</link>.</p>
</section2>
<section2 topic="Real-Time Message Editing" anchor="realtime_message_editing">
<p>The &lt;rtt/&gt; element MAY contain one or more <em><strong>action elements</strong></em> representing real-time message editing operations, including text being appended, inserted, or deleted.</p>
<p>Most chat clients allow a sender to edit their message before sending (i.e. via a Send button, or hitting Enter). The inclusion of real-time functionality to existing chat client software must not degrade the sender's existing expectation of being able to edit their messages before sending. Thus, in a chat session with real-time text, the recipient can watch the sender compose and edit their message before it is delivered.</p>
<section3 topic="Summary of Action Elements" anchor="summary_of_action_elements">
<p>The following is a short summary. For more detailed information, see <link url="#action_elements">Action Elements</link>.</p>
<section4 topic="Edit Actions (Tier 1)" anchor="edit_actions_tier_1">
<p>This is a short summary of action elements that operate on a real-time message. For detailed information, see <link url="#action_elements">Action Elements</link>.</p>
<table>
<tr>
<th>Action</th>
@ -212,147 +207,126 @@
<tr>
<td>Backspace</td>
<td>&lt;e&nbsp;p='#'&nbsp;n='#'/&gt;</td>
<td>REQUIRED. Remove <em><strong>n</strong></em> characters to the left of position <em><strong>p</strong></em> in message<em>.</em></td>
<td>REQUIRED. Remove <em><strong>n</strong></em> characters before position <em><strong>p</strong></em> in message<em>.</em></td>
</tr>
<tr>
<td>Forward&nbsp;Delete</td>
<td>&lt;d&nbsp;p='#'&nbsp;n='#'/&gt;</td>
<td>REQUIRED. Remove <em><strong>n</strong></em> characters starting at position <em><strong>p</strong></em> in message<em>.</em></td>
</tr>
</table>
</section4>
<section4 topic="Presentation Actions (Tier 2)" anchor="presentation_actions_tier_2">
<table>
<tr>
<th>Action</th>
<th>Element</th>
<th>Description</th>
<td>REQUIRED. Remove <em><strong>n</strong></em> characters starting at position <em><strong>p</strong></em> in message.</td>
</tr>
<tr>
<td>Interval</td>
<td>&lt;w&nbsp;n='#'/&gt;</td>
<td>RECOMMENDED. Execute a pause of <em><strong>n</strong></em> thousandths of a second.</td>
</tr>
<tr>
<td>Cursor&nbsp;Position</td>
<td>&lt;c&nbsp;p='#'/&gt;</td>
<td>OPTIONAL. Move cursor to position <em><strong>p</strong></em> in message.</td>
</tr>
<tr>
<td>Flash</td>
<td>&lt;g/&gt;</td>
<td>OPTIONAL. Execute a visual flash, beep, or buzz.</td>
</tr>
</table>
</section4>
<section4 topic="Rules for Attribute Values" anchor="rules_for_attribute_values">
</section3>
<section3 topic="Rules for Attribute Values" anchor="rules_for_attribute_values">
<ul>
<li>The <em><strong>n</strong></em> and <em><strong>p</strong></em> attributes are unsigned 32-bit integers, represented as a string.</li>
<li>If the <em><strong>n</strong></em> attribute is omitted, the default value for <em><strong>n</strong></em> is 1.</li>
<li>If the <em><strong>p</strong></em> attribute is omitted, the default value for <em><strong>p</strong></em> is the length of the current real-time message.</li>
<li>A <em><strong>p</strong></em> value of 0 represents the start of the message.</li>
<li><em><strong>n</strong></em> and <em><strong>p</strong></em> values are counts of individual Unicode code points.</li>
<li><p>The <em><strong>n</strong></em> attribute represents a length value. If the <em><strong>n</strong></em> attribute is omitted, the default value for <em><strong>n</strong></em> MUST be 1.</p></li>
<li><p>The <em><strong>p</strong></em> attribute represents an absolute position value. This is a 0-based index, where 0 represents the first character of the real-time message. If <em><strong>p</strong></em> is omitted, <em><strong>p</strong></em> MUST be treated as the length of the message (points to end of the real-time message).</p></li>
<li><p>For text modifications, both <em><strong>n</strong></em> and <em><strong>p</strong></em> attributes are based on <link url="#unicode_character_counting">Unicode Character Counting</link>. Also see <link url="#ensuring_accuracy_of_attribute_values">Ensuring Accuracy Of Attribute Values</link>.</p></li>
</ul>
<p>For interoperability of <em><strong>p</strong></em> and <em><strong>n</strong></em> values, processing MUST be done on the original Unicode real-time message. For both senders and receivers, this is the version of the Unicode message text without Unicode normalization, emoticon graphics images, display text formatting, processing of Unicode combining marks, etc. For recipients obtaining text from the &lt;t&gt; element, this is the Unicode text immediately after XML processing, and before any further processing. From the perspective of <em><strong>p</strong></em> and <em><strong>n</strong></em> values, a real-time message is treated as an editable array of Unicode code points.</p>
<p>Regardless of the original format of line breaks during XMPP transmission, line breaks are treated as a single code unit (LINE FEED U+000A) for the purposes of real-time message processing. Conversion of line breaks into a single U+000A character is REQUIRED for XML processors, according to section 2.11 of <span class="ref"><strong><link url="http://www.w3.org/TR/xml/">XML</link></strong></span> <note>XML: Extensible Markup Language 1.0 (Fifth Edition). &lt;<link url="http://www.w3.org/TR/xml/">http://www.w3.org/TR/xml/</link>&gt;.</note>, so a compliant XML processor already does this automatically and already provides the correct original Unicode text for interoperability.</p>
<p><strong>NOTE WELL</strong>: Extreme care MUST be taken to correctly calculate n and p values based on Unicode code points, to avoid corruption of the real-time message during real-time editing. For more information, see <link url="#internationalization_considerations">Internationalization Considerations</link>.</p>
</section4>
</section3>
</section3>
<section3 topic="Action Elements" anchor="action_elements">
<section4 topic="Element &lt;t&gt; Insert Text" anchor="element_t_insert_text">
<p>(Tier 1) REQUIRED. Supports the transmission of key presses, text block inserts, and text being pasted.<br />
<em>Note:</em> <em>Any text permitted in the &lt;body&gt; element of a &lt;message&gt; may be used, subject to the rules in XMPP Core. More examples can be found in <link url="#use_cases">Use Cases</link>.</em></p>
<p>Recipients are REQUIRED to support &lt;t/&gt;, &lt;e/&gt; and &lt;d/&gt; action elements for incoming &lt;rtt/&gt; transmissions, even if not all elements are used for outgoing &lt;rtt/&gt; transmissions. Support for &lt;w/&gt; is RECOMMENDED for both senders and recipients in order to accommodate <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link>. Recipients MUST ignore unexpected or unsupported elements within &lt;rtt/&gt;, while continuing to process subsequent action elements. Action elements are immediate child elements of the &lt;rtt/&gt; element, and are never nested. Examples can be found in <link url="#use_cases">Use Cases</link>.</p>
<section4 topic="Element &lt;t/&gt; Insert Text" anchor="element_t_insert_text">
<p>REQUIRED. Supports the transmission of key presses, text block inserts, and text being pasted.<br />
<em>Note:</em> <em>Any text normally used in the &lt;body/&gt; element of a &lt;message/&gt; may be used. If the &lt;t/&gt; element is empty, no text modification takes place.</em></p>
<p><code><![CDATA[<t p='#'>text</t>]]></code></p>
<p>Inserts specified <em><strong>text</strong></em> at position <em><strong>p</strong></em> in the message text.</p>
<p><code><![CDATA[<t>text</t>]]></code></p>
<p>Appends specified <em><strong>text</strong></em> at the end of message. (<em><strong>p</strong></em> defaults to message length)</p>
</section4>
<section4 topic="Element &lt;e&gt; Backspace" anchor="element_e_backspace">
<p>(Tier 1) REQUIRED. Supports the behavior of Backspace key presses.<br />
<em>Note: Direction 'left' represents the numeric direction. Thus, for right-to-left text (i.e. Arabic), numeric 'left' represents visible 'right'.</em></p>
<section4 topic="Element &lt;e/&gt; Backspace" anchor="element_e_backspace">
<p>REQUIRED. Supports the behavior of Backspace key presses.<br />
<em>Note: Excess backspaces, at the start of the message, MUST be ignored.</em></p>
<p><code><![CDATA[<e n='#' p='#'/>]]></code></p>
<p>Remove <em><strong>n</strong></em> characters to the left of position <em><strong>p</strong></em> in message.</p>
<p>Remove <em><strong>n</strong></em> characters before position <em><strong>p</strong></em> in message.</p>
<p><code><![CDATA[<e p='#'/>]]></code></p>
<p>Remove 1 character to the left of position <em><strong>p</strong></em> in message. (<em><strong>n</strong></em> defaults to 1)</p>
<p>Remove 1 character before position <em><strong>p</strong></em> in message. (<em><strong>n</strong></em> defaults to 1)</p>
<p><code><![CDATA[<e n='#'/>]]></code></p>
<p>Remove <em><strong>n</strong></em> characters from end of message. (<em><strong>p</strong></em> defaults to message length)</p>
<p><code><![CDATA[<e/>]]></code></p>
<p>Remove 1 character from end of message. (Both <em><strong>n</strong></em> and <em><strong>p</strong></em> at default values)</p>
</section4>
<section4 topic="Element &lt;d&gt; Forward Delete" anchor="element_d_forward_delete">
<p>(Tier 1) REQUIRED. Supports the behavior of Delete key presses, text block deletes, and text being cut.<br />
<em>Note: Direction 'right' represents the numeric direction. Thus, for right-to-left text (i.e. Arabic), numeric 'right' represents visible 'left'.</em></p>
<section4 topic="Element &lt;d/&gt; Forward Delete" anchor="element_d_forward_delete">
<p>REQUIRED. Supports the behavior of Delete key presses, text block deletes, and text being cut.<br />
<em>Note: Excess deletes, beyond end of message, MUST be ignored.</em></p>
<p><code><![CDATA[<d p='#' n='#'/>]]></code></p>
<p>Remove <em><strong>n</strong></em> characters to the right of position <em><strong>p</strong></em> in message.</p>
<p>Remove <em><strong>n</strong></em> characters beginning at position <em><strong>p</strong></em> in message.</p>
<p><code><![CDATA[<d p='#'/>]]></code></p>
<p>Remove 1 character to the right of position <em><strong>p</strong></em> in message. (<em><strong>n</strong></em> defaults to 1)</p>
<p>Remove 1 character beginning at position <em><strong>p</strong></em> in message. (<em><strong>n</strong></em> defaults to 1)</p>
</section4>
<section4 topic="Element &lt;w&gt; Interval" anchor="element_w_interval">
<p>(Tier 2) RECOMMENDED. Allows the transmission of the original intervals between real-time actions, including the pauses between key presses. For more information, see <link url="#key_press_intervals">Key Press Intervals</link>.</p>
<section4 topic="Element &lt;w/&gt; Interval" anchor="element_w_interval">
<p>RECOMMENDED. Allows the transmission of intervals between real-time message edits, such as the pauses between key presses. For more information, see <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link>.</p>
<p><code><![CDATA[<w n='#'/>]]></code></p>
<p>Executes a pause of <em><strong>n</strong></em> thousandths of a second. The <em><strong>n</strong></em> value SHOULD NOT exceed the <link url="#transmission_interval">Transmission Interval</link>. Also, if a <link url="#body_element">Body Element</link> arrives, pauses SHOULD be interrupted to prevent message delivery delay.</p>
</section4>
<section4 topic="Element &lt;c&gt; Cursor Position" anchor="element_c_cursor_position">
<p>(Tier 2) OPTIONAL. Allows the transmission of cursor positions. This allows the recipient to see the sender's cursor in their real-time message, and makes it easier to track the sender's message edits. For more information, see <link url="#remote_cursor">Remote Cursor</link>.</p>
<p><code><![CDATA[<c p='#'/>]]></code></p>
<p>Moves cursor (caret) to the character position <em><strong>p</strong></em> in message.</p>
</section4>
<section4 topic="Element &lt;g&gt; Flash" anchor="element_g_flash">
<p>(Tier 2) OPTIONAL. Allows a flash/beep/buzz feature. This feature is the real-time version of &xep0224;, and MAY execute the same alerting method.<br />
<em>Note: This supports real-time text interoperability with similar features in text telephones for the deaf (TTY / TDD), ITU-T T.140 implementations, and Control-G beep at consoles.</em></p>
<p><code><![CDATA[<g/>]]></code></p>
<p>Executes a brief flash, sound, vibration, etc.</p>
<p>Executes a pause of <em><strong>n</strong></em> thousandths of a second. This pause may be approximate, and need not be of millisecond precision. The <em><strong>n</strong></em> value SHOULD NOT exceed the <link url="#transmission_interval">Transmission Interval</link>. Also, if a <link url="#body_element">Body Element</link> arrives, pauses SHOULD be interrupted to prevent a delay in message delivery.</p>
</section4>
</section3>
<section3 topic="Processing Rules" anchor="processing_rules">
<section3 topic="Ensuring Accuracy Of Attribute Values" anchor="ensuring_accuracy_of_attribute_values">
<p>Real-time message edits work only within the boundaries of the current real-time message, and do not affect previous messages. Senders MUST NOT use negative values for any attribute, nor use <em><strong>p</strong></em> values beyond the current message length. However, recipients receiving such values MUST clip negative values to 0, and clip excessively high <em><strong>p</strong></em> values to the current message length.</p>
<p>For senders, <em><strong>p</strong></em> and <em><strong>n</strong></em> values are calculated relative to the plain text version of the message. This is the message otherwise normally transmitted in a &lt;body/&gt; element after all processing is complete, including emoticon graphics as plain text. For recipients, <em><strong>p</strong></em> and <em><strong>n</strong></em> are calculated relative to the message text immediately after XML processing, and before any further processing.</p>
<p>Regardless of the original format of line breaks during XMPP transmission, line breaks are treated as a single code point (LINE FEED U+000A). Conversion of line breaks into a single line feed is REQUIRED for XML processors, according to section 2.11 of <span class="ref"><strong><link url="http://www.w3.org/TR/xml/">XML</link></strong></span> <note>XML: Extensible Markup Language 1.0 (Fifth Edition). &lt;<link url="http://www.w3.org/TR/xml/">http://www.w3.org/TR/xml/</link>&gt;.</note>, so a compliant XML processor already does this automatically, and already provides the correct original Unicode text for interoperability.</p>
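<p>The clipping rules in this section can be illustrated with a minimal Python sketch. It is not a normative implementation: the function name <code>apply_action</code> and its parameters are hypothetical, the message is held as a list of single code points, and the default of <em><strong>p</strong></em> for &lt;d/&gt; is assumed to be the message length.</p>

```python
def apply_action(message, name, p=None, n=None, text=""):
    """Apply one action element to `message`, a list of single code points.

    Hypothetical helper: `name` is "t", "e" or "d"; `p` and `n` are the
    already-parsed integer attributes, or None when the attribute is absent.
    Returns a new list and leaves the input untouched.
    """
    length = len(message)
    # p defaults to the message length; out-of-range values are clipped.
    p = length if p is None else max(0, min(p, length))
    # n defaults to 1; negative values are clipped to 0.
    n = 1 if n is None else max(n, 0)

    if name == "t":   # Insert Text at position p
        return message[:p] + list(text) + message[p:]
    if name == "e":   # Backspace: remove n code points before p
        start = max(p - n, 0)   # excess backspaces are ignored
        return message[:start] + message[p:]
    if name == "d":   # Forward Delete: remove n code points beginning at p
        return message[:p] + message[p + n:]   # excess deletes are ignored
    return list(message)   # unsupported elements are ignored
```

<p>For example, <code>apply_action(list("abc"), "e", p=1, n=5)</code> removes only the single code point before position 1, because the excess backspaces are ignored.</p>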
</section3>
<section3 topic="Unicode Character Counting" anchor="unicode_character_counting">
<p>For platform-independent interoperability, calculations of <em><strong>p</strong></em> and <em><strong>n</strong></em> values MUST be based on Unicode code points. Different platforms use different internal Unicode encodings, which may be different from the transmission encoding (UTF-8) for XMPP. Consider these factors:</p>
<ul>
<li>There MUST NOT be more than one &lt;rtt&gt; child element per &lt;message&gt;.</li>
<li>The &lt;rtt&gt; element MAY be an empty element, or contain one or more action elements.</li>
<li>Action elements MUST be immediate children of &lt;rtt&gt;. Nesting is not allowed.</li>
<li>Support of all Tier 1 edit actions is REQUIRED.</li>
<li>Tier 2 presentation actions MUST NOT affect the contents of the real-time message string.</li>
<li>Excess Backspaces and Forward Deletes beyond start/end of the message, MUST be ignored.</li>
<li>Recipients MUST process supported action elements in the same order as received within the &lt;rtt&gt; element. Unrecognized action elements MUST be ignored.</li>
<li>Real-time actions work only within the boundaries of the current real-time message, and MUST NOT affect previous messages.</li>
<li>Values for <em><strong>p</strong></em> and <em><strong>n</strong></em> attributes MUST be calculated according to <link url="#rules_for_attribute_values">Rules for Attribute Values</link>.</li>
<li><p>Multiple Unicode code points may represent one displayable Unicode glyph (e.g. combining marks).<br />
<em>Action elements operate on Unicode code points, not on displayable character glyphs.</em></p></li>
<li><p>Characters U+10000 through U+10FFFF are single code points, but are represented as a pair of surrogate code units in certain Unicode encodings (e.g. UTF-16).<br />
<em>Action elements operate on Unicode code points, not on individual surrogate code units.</em></p></li>
<li><p>Some Unicode encodings use a variable number of bytes per Unicode character (e.g. UTF-8).<br /><em>Action elements operate on Unicode code points, not on individual bytes.</em></p></li>
</ul>
<p>Incorrectly calculated <em><strong>p</strong></em> and <em><strong>n</strong></em> values may cause scrambled text during real-time message editing for many languages. This scrambled text persists until full message delivery, or <link url="#message_retransmission">Message Retransmission</link>. From the perspective of <em><strong>p</strong></em> and <em><strong>n</strong></em> values, a real-time message is treated equivalent to an editable array of Unicode code points, even if not necessarily stored as such.</p>
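<p>As an illustration of the counting rules above, the following Python sketch converts an index measured in UTF-16 code units (as reported by text fields on some platforms) into a code point index suitable for <em><strong>p</strong></em> and <em><strong>n</strong></em>. The function name is hypothetical, and the choice to round an index that falls inside a surrogate pair up to the next code point is an assumption of this sketch, not something this specification defines.</p>

```python
def utf16_index_to_code_point_index(text, utf16_index):
    """Convert an index counted in UTF-16 code units to a code point index.

    `text` is a Python str, which is already indexed by code point.
    An index that falls inside a surrogate pair is rounded up (assumption).
    """
    units = 0
    for cp_index, ch in enumerate(text):
        if units >= utf16_index:
            return cp_index
        # Code points above U+FFFF need two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


sample = "a\U0001F600b"   # 3 code points, 4 UTF-16 code units, 6 UTF-8 bytes
combining = "e\u0301"     # one displayed glyph, but 2 code points
```

<p>Here <code>len(sample)</code> is 3 even though the same text occupies 4 code units in UTF-16 and 6 bytes in UTF-8, and <code>len(combining)</code> is 2 even though one glyph is displayed. Action elements use the code point counts.</p>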
</section3>
</section2>
<section2 topic="Error Recovery of Real-Time Text" anchor="error_recovery_of_realtime_text">
<p>In a real-time chat session, it is critical that the real-time message is identical on both the sender and recipient ends. The loss of a single &lt;rtt&gt; transmission can represent missing text, or a missing edit. This leads to the real-time message getting out of sync, the message becoming different on the sender versus the recipient ends.</p>
<p>Transmissions of &lt;message&gt; elements may be lost for several reasons. One reason is that a recipient may disconnect and reconnect while a sender is still typing a message. Another reason is some XMPP servers may drop &lt;message&gt; elements automatically (i.e. flooding protection).</p>
<section2 topic="Automatic Recovery of Real-Time Text" anchor="automatic_recovery_of_realtime_text">
<p>In a chat session with real-time text, it is critical that the real-time message is identical on both the sender and recipient ends. The loss of a single &lt;rtt/&gt; transmission can represent missing text, or a missing edit. This leads to the real-time message getting out of sync. Recovery of an in-progress real-time message is useful in several situations:</p>
<ul>
<li>Disconnection and reconnection (e.g. intentional, unintentional, wireless reception, servers).</li>
<li>XMPP servers may drop &lt;message/&gt; elements automatically (e.g. flooding protection).</li>
<li>Multiple <link url="#simultaneous_logins">Simultaneous Logins</link> (e.g. additional clients logging in, recipient switching computers).</li>
</ul>
<section3 topic="Staying In Sync" anchor="staying_in_sync">
<p>To stay synchronized:</p>
<p>To stay synchronized, for &lt;rtt/&gt; elements that do not contain an 'event' attribute:</p>
<ol>
<li>The sender MUST increment the <strong>seq</strong> attribute for each consecutive &lt;rtt&gt; element sent.</li>
<li>The recipients MUST monitor the <strong>seq</strong> attribute value of received &lt;rtt&gt; elements, to verify that it is incrementing.</li>
<li>The seq values for the sender, and for the recipient, are independent and kept track of separately.</li>
<li>The sender MUST increment the <strong>seq</strong> attribute for each consecutive &lt;rtt/&gt; element.</li>
<li>The recipient MUST monitor the <strong>seq</strong> attribute value of received &lt;rtt/&gt; elements, to verify that it is incrementing.</li>
<li>The seq values for incoming messages, versus outgoing messages, are independent and kept track of separately.</li>
</ol>
</section3>
<section3 topic="Detecting Loss of Sync" anchor="detecting_loss_of_sync">
<p>The sync is considered lost if the <strong>seq</strong> attribute of the &lt;rtt&gt; element does not increment as expected. Trying to process certain real-time edit actions after loss of sync, will result in scrambled text. Therefore, to avoid this situation:</p>
<p>The sync is considered lost if the <strong>seq</strong> attribute of the &lt;rtt/&gt; element does not increment as expected. Trying to process certain action elements, after loss of sync, can result in scrambled text. Therefore, to avoid this situation:</p>
<ol>
<li>The recipient MUST stop processing all subsequent real-time action elements, and freeze the current real-time message.</li>
<li>An indicator (i.e. reception bars, color code, missing text indicator) or a chat state message (i.e. “Typing Frozen...”) MAY be used by the recipient to indicate the loss of sync.</li>
<li>The recipient MUST stop processing all subsequent action elements, and pause the current real-time message.</li>
<li>An indicator MAY be used by the recipient to indicate the loss of sync (e.g. reception bars, color code, missing text indicator, chat state message).</li>
</ol>
</section3>
<section3 topic="Recovery From Loss of Sync" anchor="recovery_from_loss_of_sync">
<p>Recovery occurs when any of the following happens:</p>
<ol>
<li>A message &lt;body&gt; is delivered. The frozen real-time message MUST be replaced with this delivered message.</li>
<li>The <strong>event</strong> attribute of &lt;rtt&gt; has a value of <strong>new</strong> or <strong>reset</strong>. Processing of real-time text MUST resume, with the new correct <strong>seq</strong> value obtained from this &lt;rtt&gt; element.</li>
<li>A message &lt;body/&gt; is delivered. The <link url="#body_element">Body Element</link> replaces the real-time message.</li>
<li>The <strong>event</strong> attribute of &lt;rtt/&gt; has a value of <strong>new</strong> or <strong>reset</strong>. Processing of real-time text MUST restart, with the new starting <strong>seq</strong> value obtained from this &lt;rtt/&gt; element.</li>
</ol>
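<p>The sync rules above (monitoring <strong>seq</strong>, pausing on loss of sync, and restarting on <strong>new</strong> or <strong>reset</strong>) can be sketched for a recipient as follows. The class and method names are hypothetical. The sketch assumes that <strong>seq</strong> increments by exactly 1, and that a message following a delivered &lt;body/&gt; begins with event='new'; neither assumption is stated in this section.</p>

```python
class SyncTracker:
    """Tracks the seq attribute of incoming <rtt/> elements (illustrative only)."""

    def __init__(self):
        self.expected_seq = None
        self.in_sync = False

    def accept(self, seq, event=None):
        """Return True if the action elements of this <rtt/> should be processed."""
        if event in ("new", "reset"):
            # Recovery: restart with the seq value carried by this element.
            self.in_sync = True
            self.expected_seq = seq + 1
            return True
        if self.in_sync and seq == self.expected_seq:
            self.expected_seq = seq + 1
            return True
        # seq did not increment as expected: pause until recovery.
        self.in_sync = False
        return False

    def on_body(self):
        """A <body/> replaced the real-time message; wait for the next 'new'."""
        self.in_sync = False
        self.expected_seq = None
```

<p>While <code>accept</code> returns False, the recipient keeps the paused real-time message unchanged and MAY show a loss-of-sync indicator.</p>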
</section3>
<section3 topic="Helping The Recipient Stay In Sync" anchor="helping_the_recipient_stay_in_sync">
<p>The sender MAY help the recipient stay in sync by automatically retransmitting the real-time message whenever the recipient status changes from offline to online. The entire contents of the real-time message, may be retransmitted using the &lt;rtt&gt; attribute <strong>event='reset'</strong> with a single Insert Text action.</p>
<p><code><![CDATA[<rtt event='reset' seq='#'>
<section3 topic="Message Retransmission" anchor="message_retransmission">
<p>In order to prevent recipients from waiting for <link url="#recovery_from_loss_of_sync">Recovery From Loss of Sync</link>, senders SHOULD retransmit the contents of a partially-composed message, in the following situations:</p>
<ul>
<li>When the recipient's presence changes (e.g. offline to online).</li>
<li>When the recipient sends a &lt;message/&gt; from a different full JID than before (e.g. <link url="#simultaneous_logins">Simultaneous Logins</link>).</li>
<li>At regular intervals, to allow recovery from unexpected situations such as lost &lt;message/&gt; stanzas.</li>
</ul>
<p>A message retransmit is done using the &lt;rtt/&gt; attribute <strong>event='reset'</strong> (see <link url="#rtt_attributes">RTT Attributes</link>).</p>
<p><code><![CDATA[<rtt event='reset' seq='#' xmlns='urn:xmpp:rtt:0'>
<t>This is a retransmission of the entire real-time message.</t>
</rtt>]]></code></p>
<p>Retransmission SHOULD be done at a regular interval of 10 seconds, unless there are no message changes. This interval is frequent enough to minimize user waiting time, while being infrequent enough to reduce bandwidth overhead. This interval MAY vary in order to reduce average bandwidth requirements for minor message changes and/or for long messages.</p>
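<p>A retransmission can be sketched in Python with the standard <code>xml.etree.ElementTree</code> module. The helper names <code>build_reset</code> and <code>should_retransmit</code> are hypothetical, and the timer logic is only one possible reading of the 10 second guideline above.</p>

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"
# Serialize the namespace as a default xmlns instead of an ns0: prefix.
ET.register_namespace("", RTT_NS)


def build_reset(seq, full_text):
    """Build <rtt event='reset'> carrying the whole message in one <t/>."""
    rtt = ET.Element("{%s}rtt" % RTT_NS, {"event": "reset", "seq": str(seq)})
    t = ET.SubElement(rtt, "{%s}t" % RTT_NS)
    t.text = full_text   # ElementTree escapes markup characters on output
    return rtt


def should_retransmit(now, last_retransmit, changed, interval=10.0):
    """True when the message changed and `interval` seconds have elapsed."""
    return changed and (now - last_retransmit) >= interval
```

<p>The resulting element would be placed inside a &lt;message/&gt; stanza by the client's XMPP library, and can be serialized with <code>ET.tostring(element, encoding="unicode")</code>.</p>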
</section3>
</section2>
</section1>
<section1 topic="Determining Support" anchor="determining_support">
<p>If a client supports the Real Time Text protocol, it MUST advertise that fact in its responses via &xep0030; information ("disco#info") requests by returning a feature of <strong>urn:xmpp:rtt:0</strong></p>
<p>If a client supports this real-time text protocol, it MUST advertise that fact in its responses to &xep0030; information ("disco#info") requests by returning a feature of <strong>urn:xmpp:rtt:0</strong>.</p>
<p><strong>Example 1. A disco#info query</strong></p>
<p><code><![CDATA[<iq from='romeo@montague.lit/orchard'
id='disco1'
@ -373,23 +347,29 @@
</iq>
]]></code></p>
<p>If this successful response of <strong>&lt;feature var='urn:xmpp:rtt:0'/&gt;</strong> is not received, the client SHOULD NOT transmit any outgoing &lt;rtt&gt; elements in &lt;message&gt; transmissions. This avoids unnecessary consumption of bandwidth to clients that do not support this protocol.</p>
<p>If this successful response of <strong>&lt;feature var='urn:xmpp:rtt:0'/&gt;</strong> is not received, the client SHOULD NOT transmit any outgoing &lt;rtt/&gt; elements in &lt;message/&gt; transmissions. This avoids unnecessary consumption of bandwidth to clients that do not support this protocol.</p>
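<p>A client could check a disco#info result for the feature as sketched below. The function name <code>supports_rtt</code> is hypothetical, and the sample stanza in the usage note is illustrative rather than copied from the examples in this section. In practice the client's XMPP library would normally expose the parsed stanza directly.</p>

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
RTT_FEATURE = "urn:xmpp:rtt:0"


def supports_rtt(iq_xml):
    """Return True if a disco#info result advertises urn:xmpp:rtt:0."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") != "result":
        return False
    query = iq.find("{%s}query" % DISCO_INFO_NS)
    if query is None:
        return False
    features = query.findall("{%s}feature" % DISCO_INFO_NS)
    return any(f.get("var") == RTT_FEATURE for f in features)


sample_result = (
    "<iq type='result' id='disco1'>"
    "<query xmlns='http://jabber.org/protocol/disco#info'>"
    "<feature var='urn:xmpp:rtt:0'/>"
    "</query></iq>"
)
```

<p>A sender would transmit &lt;rtt/&gt; elements only when <code>supports_rtt(sample_result)</code> is true for the recipient's response.</p>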
</section1>
<section1 topic="Implementation Notes" anchor="implementation_notes">
<section2 topic="Key Press Intervals" anchor="key_press_intervals">
<section2 topic="Text Presentation" anchor="text_presentation">
<section3 topic="Avoid Bursty Text Presentation" anchor="avoid_bursty_text_presentation">
<p>To prevent flooding the public XMPP network, transmissions of messages containing real-time text is rate-limited to the recommended &lt;rtt&gt; message transmission interval, usually 1 second according to <link url="#transmission_interval">Transmission Interval</link>. If the display of text is not smoothed, text will appear in intermittent bursts. This hurts usability of real-time text.</p>
<p>If a long <link url="#transmission_interval">Transmission Interval</link> is used without <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link>, then text will appear in intermittent bursts if the display of text is not smoothed. This hurts user experience of real-time text.</p>
</section3>
<section3 topic="Preserving Key Press Intervals" anchor="preserving_key_press_intervals">
<p>Through the use of the RECOMMENDED <link url="#element_w_interval">Element &lt;w&gt; Interval</link>, the original look-and-feel of typing can be preserved, despite the long transmission interval. Using the &lt;w&gt; element, the sender can record multiple key presses including key press intervals, and transmit them over the XMPP network in a single &lt;message&gt;. The recipient can then play back the sender's typing in real-time at original typing speed including the intervals between key presses. The text is displayed exactly as it was typed.</p>
<p>Much like VoIP is a packetization of sound, this spec enables packetization of typing including the original key press intervals. This enables the real-time feel of typing over virtually any Internet connection, and without requiring shorter transmission intervals. Look and feel of typing is also preserved over polled XMPP including &xep0206;, as well as over satellite and long international connections with heavy packet-bursting tendencies and variable latencies.</p>
<p>The recipient can watch the sender fluidly compose/edit their message in real-time without any “bursting” effects. This is “Natural Typing”, and appears indistinguishable from local typing. Since all key press intervals are preserved at a high precision, all subtleties of typing are preserved, including the 'mood' (calm typing versus panicked or emphatic typing, etc).</p>
<p>For an example transmission of key intervals, see <link url="#real_world_message_with_key_press_intervals">Real World Message With Key Press Intervals</link>.</p>
<p>For the highest quality display of text being typed, using <link url="#element_w_interval">Element &lt;w/&gt; Interval</link> allows the original look-and-feel of typing to be preserved, independently of the transmission interval. Using the &lt;w/&gt; element, the sender can record multiple key presses including key press intervals, and transmit them over the XMPP network in a single &lt;message/&gt;. The recipient can then play back the sender's typing in real-time at original typing speed including the intervals between key presses.</p>
<p>Much like VoIP is a packetization of sound, this spec enables packetization of typing including the original key press intervals. This enables the real-time feel of typing over virtually any network connection, without requiring frequent transmission intervals. Look and feel of typing is also preserved over variable latency connections including &xep0206;, mobile phone, satellite and long international connections with heavy packet-bursting tendencies.</p>
<p>The recipient can watch the sender fluidly compose/edit their message in real-time without any “bursting” effects. This is “Natural Typing”, and appears indistinguishable from local typing. When key press intervals are preserved at high precision, all subtleties of typing are preserved, including the 'mood' (calm typing versus panicked or emphatic typing, etc). For an example transmission of key intervals, see <link url="#full_message_including_key_press_intervals">Full Message Including Key Press Intervals</link>.</p>
</section3>
<section3 topic="Time Critical And Low Latency Methods" anchor="time_critical_and_low_latency_methods">
<p>There are specialized situations such as live transcriptions and captioning (e.g. transcription service, closed captioning provider, captioned telephone, relay services, Remote CART) that demand low latency transmission. Such systems typically use voice recognition and/or stenotype machines, which output text in word bursts rather than a character at a time. Senders with bursty output MAY immediately transmit word bursts of text without buffering. This eliminates any lag caused by the <link url="#transmission_interval">Transmission Interval</link>. It is NOT REQUIRED to monitor or transmit <link url="#element_w_interval">Element &lt;w/&gt; Interval</link> for transcription. If additional accuracy is required, it is also possible to timecode the &lt;rtt/&gt; elements.</p>
</section3>
<section3 topic="Reduced Precision Text Smoothing Methods" anchor="reduced_precision_text_smoothing_methods">
<p>Some software platforms (i.e. JavaScript, BOSH, mobile devices, etc.) may have low-precision timers that impact <link url="#transmission_interval">Transmission Interval</link> and/or <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link>. Clients MAY optimize for bandwidth, performance and/or screen repaints by eliminating, merging, or ignoring <link url="#element_w_interval">Element &lt;w/&gt; Interval</link> selectively, especially those containing shorter intervals. The transmission interval of &lt;rtt/&gt; MAY also vary, either intentionally for optimizations, or due to precision limitation.</p>
<p>Clients MAY choose to implement alternate text-smoothing methods, such as adaptive-rate character-at-a-time output, and/or word buffering for incoming real-time text. Word buffering prevents most typing mistakes from being displayed, which can be a useful mode of operation for certain recipients who may dislike watching the sender's typing mistakes.</p>
</section3>
</section2>
<section2 topic="Real-time Transmission" anchor="realtime_transmission">
<section2 topic="Real-Time Transmission" anchor="realtime_transmission">
<section3 topic="Monitoring Message Edits" anchor="monitoring_message_edits">
<p>For sending clients, there are several methods of capturing typing and message edits, in order to generate action elements for an &lt;rtt&gt; transmission. The most reliable and practical method is to monitor the <strong>text change event</strong> of a text box field (rather than monitoring key press events) since:</p>
<p>For sending clients, there are several potential methods of capturing typing and message edits, in order to generate action elements for an &lt;rtt/&gt; transmission. However, instead of monitoring key presses directly, the most reliable and practical method is to monitor the <strong>text changes</strong> to the local message text field:</p>
<ul>
<li>it captures all typing, including edits and deletes.</li>
<li>it captures cut &amp; paste operations, as well as edits made via a pointing device.</li>
@ -398,50 +378,51 @@
<li>it makes no assumptions about different keyboards or input entry methods.</li>
<li>text change events are more cross-platform portable, including on mobile phones.</li>
</ul>
<p>In the text change event, the current message string can be compared to the previous message string in order to calculate what text changes took place. For more information, see <link url="#rules_for_attribute_values">Rules for Attribute Values</link>. The appropriate action elements are then generated, to represent text insertions and deletions. The key press interval can be measured as the time elapsed in milliseconds between text change events.</p>
<p>In a text change event, the current message string can be compared to the previous message string in order to calculate what text changes took place. The appropriate action elements are then generated, to represent text insertions and deletions. If <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link> is supported, then the interval is measured as the time elapsed between text change events. For additional information, see <link url="#action_elements">Action Elements</link> and <link url="#rules_for_attribute_values">Rules for Attribute Values</link>. The following guidelines are for clients that use keyboard input.</p>
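<p>The string comparison described above can be sketched in Python by trimming the common prefix and suffix of the two strings. This is one simple approach, not the only one. The function name and the tuple representation of action elements are hypothetical, and the sketch always emits an explicit <em><strong>p</strong></em> rather than relying on defaults.</p>

```python
def diff_to_actions(old, new):
    """Compare two message strings and return a list of edit actions.

    Each action is a tuple: ("d", p, n) for Forward Delete, or
    ("t", p, text) for Insert Text. Python str is indexed by code point,
    so p and n are already code point counts.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    removed = len(old) - prefix - suffix
    inserted = new[prefix:len(new) - suffix]

    actions = []
    if removed:
        actions.append(("d", prefix, removed))
    if inserted:
        actions.append(("t", prefix, inserted))
    return actions
```

<p>For instance, changing "cat" to "cut" yields a one-character delete at position 1 followed by an insert of "u" at position 1. The delete is emitted first so that the insert position refers to the text after deletion.</p>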
</section3>
<section3 topic="Guidelines for Senders" anchor="guidelines_for_senders">
<ul>
<li><p>Monitor typing via the technique in <link url="#monitoring_message_edits">Monitoring Message Edits</link> to generate action elements, and add these action elements to a buffer. This is equivalent to recording a small sequence of typing.</p></li>
<li><p>During every <link url="#transmission_interval">Transmission Interval</link>, all buffered action elements are transmitted in &lt;rtt&gt; element of a &lt;message&gt;. This is equivalent to transmitting a small sequence of typing at a time.</p></li>
<li><p>If there are no changes to the real-time message, and no cursor movements, no unnecessary &lt;rtt&gt; transmission takes place.</p></li>
<li><p>During every <link url="#transmission_interval">Transmission Interval</link>, all buffered action elements are transmitted in an &lt;rtt/&gt; element of a &lt;message/&gt;. This is equivalent to transmitting a small sequence of typing at a time.</p></li>
<li><p>If there are no changes to the real-time message, then no unnecessary &lt;rtt/&gt; transmission takes place.</p></li>
</ul>
</section3>
<section3 topic="Guidelines for Receivers" anchor="guidelines_for_receivers">
<ul>
<li><p>Upon receipt of a &lt;message&gt; containing an &lt;rtt&gt; element, the action elements in &lt;rtt&gt; should be added to a queue, in the order that they are received. This provides immunity to variable network conditions, since the buffering action smooths out the latency fluctuations of &lt;message&gt; delivery.</p></li>
<li><p>The recipient software should interpret the action elements in the playback queue in sequential order, including &lt;w&gt; elements (intervals), into a real-time message. This is equivalent to playing back the sender's original typing, including key press intervals.</p></li>
<li><p>Processing of intervals (&lt;w&gt; elements) SHOULD be done via non-blocking programming techniques.</p></li>
<li><p>Upon receiving a &lt;message&gt; containing &lt;body&gt; indicating a completed message, the full message SHOULD be displayed immediately in place of the real-time message, and unprocessed action elements cleared from the playback queue. This ensures final message delivery is not delayed by late processing of action elements.</p></li>
<li><p>If support for the &lt;w&gt; element is not possible, receiving software SHOULD use an alternate text-smoothing method, such as time-smoothed progressive output of received text.</p></li>
<li><p>Upon receipt of a &lt;message/&gt; containing an &lt;rtt/&gt; element, the action elements in &lt;rtt/&gt; are added to a queue, in the order that they are received. This provides immunity to variable network conditions, since the buffering action smooths out the latency fluctuations of &lt;message/&gt; delivery.</p></li>
<li><p>The recipient software should interpret the action elements in the playback queue in sequential order, including &lt;w/&gt; elements (intervals), into a real-time message. This is equivalent to playing back the sender's original typing, including key press intervals.</p></li>
<li><p>Processing of intervals (&lt;w/&gt; elements) SHOULD be done via non-blocking programming techniques.</p></li>
<li><p>Upon receiving a &lt;message/&gt; containing &lt;body/&gt; indicating a completed message, the full message SHOULD be displayed immediately in place of the real-time message, and unprocessed action elements cleared from the playback queue. This ensures final message delivery is not delayed by late processing of action elements.</p></li>
<li><p>If support for the &lt;w/&gt; element is not possible, receiving software SHOULD use an alternate text-smoothing method. See <link url="#reduced_precision_text_smoothing_methods">Reduced Precision Text Smoothing Methods</link> for more info.</p></li>
<li><p>If the playback queue contains too much delay in &lt;w/&gt; elements (i.e. &lt;w/&gt; elements from two &lt;rtt/&gt; transmissions ago), the recipient client MAY ignore or shorten the intervals of &lt;w/&gt; elements, to allow lagged real-time text to "catch up" more quickly.</p></li>
</ul>
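<p>The receiver guidelines above can be sketched as a playback queue that is polled from a timer or event loop, so that &lt;w/&gt; intervals never block. The class name, the tuple representation of action elements, and the use of caller-supplied timestamps in seconds are all assumptions of this sketch.</p>

```python
from collections import deque


class PlaybackQueue:
    """Non-blocking playback of queued action elements (illustrative only)."""

    def __init__(self):
        self.queue = deque()
        self.resume_at = 0.0   # time (seconds) when the current pause ends

    def enqueue(self, actions):
        """Add the action elements of one <rtt/>, in the order received."""
        self.queue.extend(actions)

    def poll(self, now):
        """Return the edit actions that are due at time `now`, without waiting."""
        due = []
        while self.queue and now >= self.resume_at:
            action = self.queue.popleft()
            if action[0] == "w":
                # ("w", n): pause for n thousandths of a second.
                self.resume_at = now + action[1] / 1000.0
            else:
                due.append(action)
        return due

    def on_body(self):
        """A completed message arrived: drop pending actions and pauses."""
        self.queue.clear()
        self.resume_at = 0.0
```

<p>A client would call <code>poll</code> with the current time on each timer tick and apply the returned actions to the real-time message. A client that falls behind could shorten or skip the pauses, as the last guideline permits.</p>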
</section3>
</section2>
<section2 topic="Remote Cursor" anchor="remote_cursor">
<p>Senders MAY choose to transmit changes to cursor positions. Recipient clients MAY choose to display a cursor (or caret) within incoming real-time messages. This enhances usability of real-time text further, since it becomes easier for a recipient to observe the sender's real-time message edits.</p>
<section2 topic="Optional Remote Cursor" anchor="optional_remote_cursor">
<p>Recipient clients MAY choose to display a cursor (or caret) within incoming real-time messages. This enhances usability of real-time text further, since it becomes easier for a recipient to observe the sender's real-time message edits. Recipient clients that do not support a remote cursor can skip calculating a cursor position and ignore this section.</p>
<section3 topic="Calculating Cursor Position" anchor="calculating_cursor_position">
<p>The &lt;c&gt; element (Cursor Position) MAY be used to specify an exact cursor position, as a zero-based index into the real-time message string. While between &lt;c&gt; elements, the current cursor position MAY be calculated from the last edit action element as follows:</p>
<p>All action elements always have absolute cursor positioning. When &lt;t/&gt;, &lt;e/&gt;, or &lt;d/&gt; action elements are processed in incoming real-time text, the beginning value for the cursor position calculation is the absolute position value of the <em><strong>p</strong></em> attribute, according to <link url="#rules_for_attribute_values">Rules for Attribute Values</link>. The cursor position immediately after an action element, is calculated as follows:</p>
<ul>
<li>For &lt;t&gt; element (Insert Text), the cursor position is moved to the end of the inserted text.</li>
<li>For &lt;e&gt; element (Backspace), the cursor position is moved left as text is deleted.</li>
<li>For &lt;d&gt; element (Forward Delete) and all other action elements, the cursor position is unaffected.</li>
<li>The current cursor position does not affect the behavior of subsequent edit elements.</li>
<li><p>After <link url="#element_t_insert_text">Element &lt;t/&gt; Insert Text</link>, the cursor position is the <em><strong>p</strong></em> attribute plus the length of the text being inserted. The cursor position is put at the end of inserted text.<br />
<em>This mimics normal forward cursor movement during text insertion.</em></p></li>
<li><p>After <link url="#element_e_backspace">Element &lt;e/&gt; Backspace</link>, the cursor position is the <em><strong>p</strong></em> attribute minus the <em><strong>n</strong></em> attribute. If the <em><strong>n</strong></em> value is greater than <em><strong>p</strong></em>, then cursor position becomes 0.<br />
<em>This mimics normal cursor response to a Backspace key.</em></p></li>
<li><p>After <link url="#element_d_forward_delete">Element &lt;d/&gt; Forward Delete</link>, the cursor position is the <em><strong>p</strong></em> attribute, unaffected by the <em><strong>n</strong></em> attribute.<br />
<em>This mimics normal cursor response to a Delete key.</em></p></li>
<li><p>After an empty <link url="#element_t_insert_text">Element &lt;t/&gt; Insert Text</link> (in the format of <strong>&lt;t p='#'/&gt;</strong> with no text to insert), the cursor position is the <strong>p</strong> attribute, and no text modification is done.<br /><em>This allows cursor response to arrow keys and/or mouse repositioning the cursor.</em></p></li>
</ul>
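<p>The rules above can be sketched in Python as follows. The function name and its signature are hypothetical; the caller is assumed to have already resolved <em><strong>p</strong></em> to an absolute position and <em><strong>n</strong></em> to a count.</p>

```python
def cursor_after(element, p, n=1, text=""):
    """Return the remote cursor position right after one action element.

    element: 't' (Insert Text), 'e' (Backspace) or 'd' (Forward Delete).
    p: absolute position the action applies to (resolved by the caller).
    n: number of characters affected by 'e' and 'd'.
    text: inserted text for 't'; len() of a Python str counts code points.
    """
    if element == "t":
        return p + len(text)   # empty text: the cursor simply moves to p
    if element == "e":
        return max(p - n, 0)   # n greater than p clamps the cursor to 0
    if element == "d":
        return p               # forward delete leaves the cursor in place
    raise ValueError("unknown action element: " + element)
```

<p>For example, <code>cursor_after("t", 5, text=" Bob")</code> gives 9, and <code>cursor_after("e", 2, n=5)</code> gives 0.</p>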
<p>The remote cursor SHOULD be clearly distinguishable from the sender's real local cursor. One example is to use a non-blinking cursor, easily emulated with a Unicode character or the vertical bar character '|'.</p>
</section3>
<section3 topic="Guidelines for Senders" anchor="_guidelines_for_senders">
<p>The sender MAY choose to transmit cursor positions via &lt;c&gt; elements, only whenever the actual cursor position is different from the calculated position according to <link url="#calculating_cursor_position">Calculating Cursor Position</link> above.</p>
<p>Monitoring the actual cursor position may need to be done via a “selection changed” event of a text box field in many programming platforms. This event typically monitors the start/end indexes of a text marking operation, and usually doubles as the event for monitoring the cursor position. In this case, the start index should be used, since the transmission of the visual appearance of text marking operations is not supported in this current specification.</p>
</section3>
<section3 topic="Guidelines for Receivers" anchor="_guidelines_for_receivers">
<p>Recipient software MAY choose to display a remote cursor within received real-time messages. The remote cursor SHOULD be clearly distinguishable from the sender's local cursor. One example is to use a non-blinking cursor, easily emulated with the vertical bar character '|'.</p>
<p>While waiting for the next &lt;c&gt; element (if any), the cursor position MAY be calculated from the last edit action element according to <link url="#calculating_cursor_position">Calculating Cursor Position</link>.</p>
<p>Whenever the cursor is moving without any text modifications (via arrow keys or mouse), the sender MAY transmit extra <link url="#element_t_insert_text">Element &lt;t/&gt; Insert Text</link> with an empty string to update the remote cursor position via attribute <em><strong>p</strong></em>. This maintains accurate positioning for the remote cursor in recipients that support a remote cursor. These extra elements are ignored by recipients that do not support a remote cursor.</p>
<p>Monitoring the actual cursor position may need to be done via a “selection changed” event of a text box field in many programming platforms. This event typically monitors text marking/selection operations, and doubles as the event for monitoring the cursor position.</p>
</section3>
</section2>
<section2 topic="Other Guidelines" anchor="other_guidelines">
<p>Implementors also need to take the following considerations into account for real-time message transmissions.</p>
<section3 topic="Message Length Limit" anchor="message_length_limit">
<p>A large sequence of rapid message changes may generate a large series of action elements in an &lt;rtt&gt; element, resulting in the &lt;message&gt; exceeding the XMPP server's maximum allowed length of a &lt;message&gt; stanza. This may result in dropped messages. It is acceptable to simply retransmit the whole real-time message using &lt;rtt event='reset'&gt; if the length of the &lt;rtt&gt; element would otherwise exceed the application's maximum chat message length. The process of retransmitting the whole real-time message, has the disadvantage of discarding <link url="#key_press_intervals">Key Press Intervals</link> for one &lt;rtt&gt; element.</p>
<p>For long messages, the final &lt;rtt&gt; transmission may be made in a separate &lt;message&gt; than the &lt;message&gt; containing the &lt;body&gt;. For example:</p>
<p>A large sequence of rapid message changes may generate a large series of action elements in an &lt;rtt/&gt; element, resulting in the &lt;message/&gt; exceeding the XMPP server's maximum allowed length of a &lt;message/&gt; stanza. This may result in dropped messages. It is acceptable to simply retransmit the whole real-time message using &lt;rtt event='reset'/&gt; if the length of the &lt;rtt/&gt; element would otherwise exceed the application's maximum chat message length.</p>
<p>For long messages, the final &lt;rtt/&gt; transmission may be made in a &lt;message/&gt; separate from the &lt;message/&gt; containing the &lt;body/&gt;. For example:</p>
<p><code><![CDATA[<message to='alice@example.com' from='bob@example.com/home' type='chat' id='dda'>
<rtt xmlns='urn:xmpp:rtt:0' seq='95'>
<t>Hello World...In a Super Long Message! [etc]</t>
@ -459,20 +440,38 @@
</message>
<message to='alice@example.com' from='bob@example.com/home' type='chat' id='ddb'>
<rtt xmlns='urn:xmpp:rtt:0' seq='96' />
<body>Hello World...In a Super Long Message! [etc]</body>
</message>]]></code></p>
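<p>The fallback to a reset can be sketched in Python as below. This is only an illustration: the function name, the <code>max_len</code> parameter, and the choice to carry the whole message in a single &lt;t/&gt; element inside the reset are assumptions of this sketch.</p>

```python
from xml.sax.saxutils import escape

RTT_NS = "urn:xmpp:rtt:0"

def build_rtt(actions_xml, full_text, seq, max_len):
    """Return an <rtt/> element string, falling back to event='reset'.

    actions_xml: already-serialized action elements for this transmission.
    full_text: the complete current real-time message.
    max_len: hypothetical application limit on the <rtt/> element length.
    """
    normal = "<rtt xmlns='%s' seq='%d'>%s</rtt>" % (RTT_NS, seq, actions_xml)
    if len(normal) <= max_len:
        return normal
    # Too long: retransmit the whole message instead of the queued edits.
    return "<rtt xmlns='%s' seq='%d' event='reset'><t>%s</t></rtt>" % (
        RTT_NS, seq, escape(full_text))
```

<p>The sketch does not handle the case where the reset itself exceeds the limit; an application would still need its own policy for messages longer than the server allows.</p>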
</section3>
<section3 topic="Performance" anchor="performance">
<p>The user interface display of the real-time text output, is usually the main performance bottleneck. Care should be taken not to do inefficient entire-screen repainting during every single key press, since fast typists may type over 10 key presses per second. This is especially important for slower platforms. To improve performance, the display of real-time messages may need to be implemented as a separate display element, rather than as a string concatenated to the current message history, so that the display can efficiently be refreshed every key press.</p>
<section3 topic="Usage With Chat States" anchor="usage_with_chat_states">
<p>Real-time text MAY be accompanied by &xep0085;. These are simple guidelines for &lt;message/&gt; stanzas that include an &lt;rtt/&gt; element:</p>
<ul>
<li>For &lt;rtt/&gt; transmitted without an accompanying &lt;body/&gt;, include the &lt;composing/&gt; chat state.</li>
<li>For &lt;rtt/&gt; transmitted with an accompanying &lt;body/&gt;, include the &lt;active/&gt; chat state.</li>
<li>Other chat states are handled as specified by XEP-0085 Chat States.</li>
</ul>
</section3>
<section3 topic="Battery Life" anchor="battery_life">
<p>Battery life considerations are closely related to Performance. The addition of real-time text to a mobile device, will typically significantly impact battery life due mostly to more frequent screen refreshes. The specific implementation of interval action elements (play back of key press intervals) may play a factor, and this should be programmed efficiently. Also, in cases where screen updates are the primary inefficient bottleneck on a specific mobile device, and the code cannot be sufficiently optimized, the number of repaints per second may need to be throttled in order to prolong battery life, at the slight expense of the look-and-feel of typing transmissions. Also see &xep0286;.</p>
<section3 topic="Usage With Multi-User Chat and Simultaneous Logins" anchor="usage_with_multiuser_chat_and_simultaneous_logins">
<p>The in-band nature of this real-time text standard allows one-to-many situations. Thus, real-time text is compatible with &xep0045; (MUC), as well as concurrent simultaneous logins. Support for real-time text in MUC is OPTIONAL, and is fully <link url="#backwards_compatible">Backwards Compatible</link> with group chat participants that do not support real-time text.</p>
<section4 topic="Multi-User Chat" anchor="multiuser_chat">
<p>For MUC, the <link url="#rtt_element">RTT Element</link> <link url="#event">event</link> attribute value of 'cancel' SHOULD NOT be used. This prevents one participant from suppressing real-time text for all participants in a group chat. Participants that turn off real-time text for themselves can simply ignore incoming &lt;rtt/&gt; and not transmit outgoing &lt;rtt/&gt;. Participant clients without real-time text (whether unsupported or turned off) will simply see group chat function normally on a line-by-line basis. Participants that enable real-time text during group chat need to keep track of separate real-time messages on a per-participant basis, via full JID. As a result, participants with real-time text will see real-time text coming from each participant that has real-time text enabled. Software MAY hide idle real-time messages to minimize on-screen clutter when more than one person is typing. Congestion control MAY also be used, via automatic adjustment of <link url="#transmission_interval">Transmission Interval</link>; see <link url="#congestion_considerations">Congestion Considerations</link>.</p>
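<p>Per-participant tracking can be as simple as a dictionary keyed by full JID. The following Python sketch uses a hypothetical class and hypothetical method names; it only stores the current text of each sender's real-time message.</p>

```python
class RealTimeMessages:
    """Keeps one in-progress real-time message per sender, keyed by full JID."""

    def __init__(self):
        self._by_jid = {}

    def update(self, full_jid, text):
        # Called after incoming <rtt/> action elements have been applied.
        self._by_jid[full_jid] = text

    def finish(self, full_jid):
        # Called when the sender's <body/> arrives; returns the dropped text.
        return self._by_jid.pop(full_jid, None)

    def active(self):
        # Snapshot of every participant currently composing in real time.
        return dict(self._by_jid)
```

<p>Because each sender has a separate entry, one participant finishing or resetting a message does not disturb the real-time messages of the others.</p>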
</section4>
<section4 topic="Simultaneous Logins" anchor="simultaneous_logins">
<p>In simultaneous login situations, transmitting of &lt;rtt/&gt; works in one-to-many situations without any special software support. For many-to-one situations where there is incoming &lt;rtt/&gt; from more than one simultaneous login, the existing <link url="#automatic_recovery_of_realtime_text">Automatic Recovery of Real-Time Text</link> already catches this situation until there is only one typist. A good implementation of <link url="#message_retransmission">Message Retransmission</link> will improve user experience, regardless of whether or not XEP-0296 is used (&xep0296;). Alternatively, clients MAY choose to improve on this behavior, by keeping track of multiple separate real-time messages per full JID, similar to <link url="#multiuser_chat">Multi-User Chat</link>.</p>
</section4>
</section3>
<section3 topic="Performance &amp; Efficiency" anchor="performance_efficiency">
<p>With real-time text, frequent screen updates may occur. Screen updates are a potential performance bottleneck, because fast typists type many key presses per second. Optimizing screen updates becomes especially important for slower platforms. Real-time messages should be updated efficiently in a flicker-free manner. Alternatively, to improve performance, the display of real-time messages may be implemented as a separate window or separate display element.</p>
<p>Battery life considerations are closely related to performance, as the addition of real-time text may impact battery life. If <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link> is supported, then <link url="#element_w_interval">Element &lt;w/&gt; Interval</link> should be implemented in a battery-efficient manner. The <link url="#transmission_interval">Transmission Interval</link> may vary dynamically to optimize for battery life and wireless reception. For devices where screen updates are an unavoidable bottleneck, see <link url="#reduced_precision_text_smoothing_methods">Reduced Precision Text Smoothing Methods</link> to reduce the number of screen updates per second. Also see &xep0286;.</p>
</section3>
<section3 topic="Total Conversation Combination With Audio And Video" anchor="total_conversation_combination_with_audio_and_video">
<p>According to ITU-T Rec. F.703, the “Total Conversation” accessibility standard defines the simultaneous use of audio, video, and real-time text. For convenience, chat applications may be designed to have automatic negotiation of as many as possible of the three media preferred by the users.</p>
<p>In the XMPP session environment, the Jingle protocol (&xep0166;) is available for negotiation and transport of the more time-critical, real-time audio and video media. Any combination of audio, video, and real-time text MAY be used together simultaneously.</p>
</section3>
</section2>
</section1>
<section1 topic="Use Cases" anchor="use_cases">
<p>The first examples are deliberately kept simple instead of real-world, and are designed to educate in a progressively more difficult manner. For simplicity, most examples do not include the RECOMMENDED <link url="#key_press_intervals">Key Press Intervals</link> except for the last Use Case example. For real-world communications in software implementations supporting key press intervals, most transmissions will tend to resemble the last Use Case example, <link url="#real_world_message_with_key_press_intervals">Real World Message With Key Press Intervals</link>.</p>
<p>Most of these examples are deliberately kept simple. In software implementations supporting key press intervals, transmissions will most resemble the last example, <link url="#full_message_including_key_press_intervals">Full Message Including Key Press Intervals</link>.</p>
<section2 topic="Three Backspaces" anchor="three_backspaces">
<p><code><![CDATA[<message to='bob@example.com' from='alice@example.com/home' id='a01' type='chat'>
<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
@ -482,8 +481,8 @@
<p>Resulting real-time message: "Hello back"<br />
This code sends the misspelled "Hello bcak", then &lt;e/&gt;&lt;e/&gt;&lt;e/&gt; backspaces 3 times, then sends "ack".</p>
<p>Resulting real-time message: "<strong>Hello back</strong>"<br />
This code sends the misspelled "<strong>Hello bcak</strong>", then <strong>&lt;e/&gt;&lt;e/&gt;&lt;e/&gt;</strong> backspaces 3 times, then sends "<strong>ack</strong>".</p>
</section2>
<section2 topic="Three Backspaces In One Action Element" anchor="three_backspaces_in_one_action_element">
<p><code><![CDATA[<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
@ -494,8 +493,8 @@
<p>Resulting real-time message: "Hello back"<br />
This code is the same as the previous example, demonstrating that &lt;e n='3'/&gt; does the same thing as &lt;e/&gt;&lt;e/&gt;&lt;e/&gt;.</p>
<p>Resulting real-time message: "<strong>Hello back</strong>"<br />
This code is the same as the previous example, demonstrating that <strong>&lt;e n='3'/&gt;</strong> does the same thing as <strong>&lt;e/&gt;&lt;e/&gt;&lt;e/&gt;</strong>.</p>
</section2>
<section2 topic="Message Edits Split Into Multiple Transmissions" anchor="message_edits_split_into_multiple_transmissions">
<p><code><![CDATA[<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
@ -524,7 +523,7 @@
<p>Resulting real-time message: "Hello back"<br />
<p>Resulting real-time message: "<strong>Hello back</strong>"<br />
This code results in the same final text as the previous two examples, segmented into four separate messages.</p>
</section2>
<section2 topic="Deleting Text From Message" anchor="deleting_text_from_message">
@ -536,9 +535,9 @@
<p>Resulting real-time message: "Hello, this is Alice!"<br />
This code outputs "Hello Bob, this is Alice!" then &lt;d n='4' p='5'/&gt; deletes 4 characters from position 5.<br />
(This erases the text " Bob" including the preceding space character).</p>
<p>Resulting real-time message: "<strong>Hello, this is Alice!</strong>"<br />
This code outputs "<strong>Hello Bob, this is Alice!</strong>" then <strong>&lt;d n='4' p='5'/&gt;</strong> deletes 4 characters from position 5.<br />
(This erases the text " <strong>Bob</strong>" including the preceding space character).</p>
</section2>
<section2 topic="Inserting Text Into Message" anchor="inserting_text_into_message">
<p><code><![CDATA[<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
@ -549,8 +548,8 @@
<p>Resulting real-time message: "Hello Bob, this is Alice!"<br />
This is because the code outputs "Hello, this is Alice!" then the &lt;t p='5'&gt; inserts the specified text " Bob" at position 5.</p>
<p>Resulting real-time message: "<strong>Hello Bob, this is Alice!</strong>"<br />
This is because the code outputs "<strong>Hello, this is Alice!</strong>" then the <strong>&lt;t p='5'&gt;</strong> inserts the specified text " <strong>Bob</strong>" at position 5.</p>
</section2>
<section2 topic="Deleting And Replacing Text In Message" anchor="deleting_and_replacing_text_in_message">
<p><code><![CDATA[<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
@ -563,11 +562,11 @@
<p>Resulting real-time message: "Hello Bob, this is Alice!"<br />
This code outputs "Hello Bob, tihsd is Alice!", then &lt;d p='11' n='5'/&gt; deletes 5 characters at position 11 in the string of text. (erases the mistyped word "tihsd"). Finally, &lt;t p='11'&gt;this&lt;/t&gt; inserts the text "this" place of the original misspelled word.</p>
<p>Resulting real-time message: "<strong>Hello Bob, this is Alice!</strong>"<br />
This code outputs "<strong>Hello Bob, tihsd is Alice!</strong>", then <strong>&lt;d p='11' n='5'/&gt;</strong> deletes 5 characters at position 11 in the string of text (erasing the mistyped word "<strong>tihsd</strong>"). Finally, <strong>&lt;t p='11'&gt;this&lt;/t&gt;</strong> inserts the text "<strong>this</strong>" in place of the original misspelled word.</p>
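<p>The edits in this and the preceding examples can be reproduced with a small Python sketch. The function name and signature are hypothetical; following the examples, <em><strong>p</strong></em> is assumed to default to the end of the message and <em><strong>n</strong></em> to 1.</p>

```python
def apply_action(message, element, text="", p=None, n=1):
    """Apply one action element to the real-time message; return the result.

    Positions and counts are in Unicode code points, which is what Python
    str indexing uses.
    """
    if p is None:
        p = len(message)                     # default: end of the message
    if element == "t":                       # Insert Text at position p
        return message[:p] + text + message[p:]
    if element == "e":                       # Backspace: n chars before p
        start = max(p - n, 0)
        return message[:start] + message[p:]
    if element == "d":                       # Forward Delete: n chars from p
        return message[:p] + message[p + n:]
    raise ValueError("unknown action element: " + element)

# Replays the "Deleting And Replacing Text In Message" example.
msg = apply_action("", "t", "Hello Bob, tihsd is Alice!")
msg = apply_action(msg, "d", p=11, n=5)
msg = apply_action(msg, "t", "this", p=11)
```

<p>After these three calls, <code>msg</code> holds the resulting real-time message shown above.</p>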
</section2>
<section2 topic="Multiple Message Edits" anchor="multiple_message_edits">
<p>This is an example message containing multiple consecutive edit actions.</p>
<p>This is an example message containing multiple consecutive real-time message edits.</p>
<p><code><![CDATA[<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
<t>Helo</t>
@ -577,13 +576,12 @@
<t> World</t>
<d n='3' p='5'/>
<t p='5'> there,</t>
<c p='18'/>
</rtt>
</message>]]></code></p>
<p>Resulting real-time message: "Hello there, World", completed in the following series of steps:</p>
<p>Resulting real-time message: "<strong>Hello there, World</strong>", completed in the following series of steps:</p>
<table>
<tr>
<th>Element</th>
@ -605,7 +603,7 @@
</tr>
<tr>
<td>&lt;t&gt;lo...planet&lt;/t&gt;</td>
<td>Output&nbsp;"lo...planet"</td>
<td>Output&nbsp;"lo...planet"&nbsp;at&nbsp;end&nbsp;of&nbsp;line.</td>
<td>Hello...planet</td>
<td>14</td>
</tr>
@ -617,7 +615,7 @@
</tr>
<tr>
<td>&lt;t&gt;&nbsp;World&lt;/t&gt;</td>
<td>Output&nbsp;"&nbsp;World"</td>
<td>Output&nbsp;"&nbsp;World"&nbsp;at&nbsp;end&nbsp;of&nbsp;line.</td>
<td>Hello...&nbsp;World</td>
<td>14</td>
</tr>
@ -633,14 +631,8 @@
<td>Hello&nbsp;there,&nbsp;World</td>
<td>12</td>
</tr>
<tr>
<td>&lt;c&nbsp;p='18'/&gt;</td>
<td>Move&nbsp;cursor&nbsp;to&nbsp;end</td>
<td>Hello&nbsp;there,&nbsp;World</td>
<td>18</td>
</tr>
</table>
<p>Normally, the action elements are split into multiple separate transmissions, with <link url="#key_press_intervals">Key Press Intervals</link> added.</p>
<p>Normally, the action elements are split into multiple separate transmissions. This example also does not illustrate <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link>. The Cursor Pos column is only relevant if the <link url="#optional_remote_cursor">Optional Remote Cursor</link> is implemented.</p>
</section2>
<section2 topic="Three Consecutive Messages" anchor="three_consecutive_messages">
<p>Representing a short chat session of three separate messages:<br />
@ -700,23 +692,23 @@
<li>The <strong>seq</strong> attribute always increments.</li>
</ul>
</section2>
<section2 topic="Real World Message With Key Press Intervals" anchor="real_world_message_with_key_press_intervals">
<p>This is the most important example. It is a transmission of “Hello there!” with <link url="#key_press_intervals">Key Press Intervals</link>. It illustrates a four-second typing sequence:</p>
<section2 topic="Full Message Including Key Press Intervals" anchor="full_message_including_key_press_intervals">
<p>This example is a transmission of “Hello there!” while <link url="#preserving_key_press_intervals">Preserving Key Press Intervals</link>. It illustrates a four-second typing sequence:</p>
<ul>
<li>The misspelled phrase “Hello tehre!” is typed;</li>
<li>Three cursor-left movements back to the mis-typed letters;</li>
<li>Two backspaces to delete the two mis-typed letters;</li>
<li>Three cursor-left movements back to the typing mistake;</li>
<li>Two backspaces to delete the typing mistake;</li>
<li>Two correct key presses to correctly spell the word “there”.</li>
</ul>
<p>In between each key press, is <link url="#element_w_interval">Element &lt;w&gt; Interval</link> to allow the receiving client execute a small pause between action elements, which allows the play back of the typing at its original look-and-feel.</p>
<p>In between each key press, is <link url="#element_w_interval">Element &lt;w/&gt; Interval</link> to allow the receiving client execute a small pause between action elements, which allows the playback of the typing at its original look-and-feel.</p>
<p><code><![CDATA[<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
<t>H</t>
<w n='215'/><t>e</t>
<w n='115'/><t>e</t>
<w n='154'/><t>l</t>
<w n='251'/><t>l</t>
<w n='151'/><t>l</t>
<w n='115'/><t>o</t>
<w n='265'/>
<w n='165'/>
</rtt>
</message>
@ -724,94 +716,79 @@
<rtt xmlns='urn:xmpp:rtt:0' seq='1'>
<w n='40'/><t> </t>
<w n='161'/><t>t</t>
<w n='237'/><t>e</t>
<w n='137'/><t>e</t>
<w n='135'/><t>h</t>
<w n='234'/><t>r</t>
<w n='193'/>
<w n='134'/><t>r</t>
<w n='93'/>
</rtt>
</message>
<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
<rtt xmlns='urn:xmpp:rtt:0' seq='2'>
<w n='109'/><t>e</t>
<w n='215'/><t>!</t>
<w n='530'/><c p='11'/>
<w n='108'/><c p='10'/>
<w n='115'/><t>!</t>
<w n='330'/><t p='11'/>
<w n='108'/><t p='10'/>
<w n='38'/>
</rtt>
</message>
<message to='bob@example.com' from='alice@example.com/home' type='chat' id='d04'>
<rtt xmlns='urn:xmpp:rtt:0' seq='3'>
<w n='109'/><c p='9'>
<w n='161'/><e p='9'/>
<w n='150'/><e p='8'/>
<w n='144'/><t>h</t>
<w n='209'/><t>e</t>
<w n='227'/>
<w n='109'/><t p='9'/>
<w n='111'/><e p='9'/>
<w n='106'/><e p='8'/>
<w n='138'/><t p='7'>h</t>
<w n='209'/><t p='8'>e</t>
<w n='27'/>
</rtt>
</message>
<message to='bob@example.com' from='alice@example.com/home' type='chat' id='e05'>
<rtt xmlns='urn:xmpp:rtt:0' seq='4'>
<w n='445'/><c p='12'>
<w n='445'/><t p='12'/>
</rtt>
<body>Hello there!</body>
</message>]]></code></p>
<p>This real-world example also illustrate the following:</p>
<p>This example also illustrates the following:</p>
<ul>
<li>Typing is done via <link url="#element_t_insert_text">Element &lt;t&gt; Insert Text</link>.</li>
<li>Cursor movements are done via <link url="#element_c_cursor_position">Element &lt;c&gt; Cursor Position</link>.</li>
<li>Backspaces are done via <link url="#element_e_backspace">Element &lt;e&gt; Backspace</link>.</li>
<li>Intervals between key presses are done via <link url="#element_w_interval">Element &lt;w&gt; Interval</link>.</li>
<li>Each &lt;message&gt; is delivered every one second, the default <link url="#transmission_interval">Transmission Interval</link>.</li>
<li>To achieve the smoothest playback of typing, the total sum of all values in &lt;w&gt; elements in one &lt;message&gt; equal the <link url="#transmission_interval">Transmission Interval</link> during periods of continuous typing.</li>
<li>Some &lt;w&gt; interval elements are split between consecutive messages, since key presses are not timed with transmission intervals.</li>
<li>The &lt;w&gt; interval elements also apply to cursor movements of the <link url="#remote_cursor">Remote Cursor</link>.</li>
<li>There is a final transmission with a <link url="#body_element">Body Element</link>, transmitted immediately when the message is finished.</li>
<li>These action elements are generated automatically via <link url="#monitoring_message_edits">Monitoring Message Edits</link>.</li>
<li>Typing is done via the REQUIRED <link url="#element_t_insert_text">Element &lt;t/&gt; Insert Text</link>.</li>
<li>Backspaces are done via the REQUIRED <link url="#element_e_backspace">Element &lt;e/&gt; Backspace</link>.</li>
<li>There is a final transmission with a <link url="#body_element">Body Element</link>, when the message is finished.</li>
<li>Intervals between key presses are done via the RECOMMENDED <link url="#element_w_interval">Element &lt;w/&gt; Interval</link>.</li>
<li>Each &lt;message/&gt; is delivered every 0.7 seconds, the default RECOMMENDED <link url="#transmission_interval">Transmission Interval</link>.</li>
<li>These action elements are generated automatically via the RECOMMENDED <link url="#monitoring_message_edits">Monitoring Message Edits</link> method.</li>
<li>Cursor movements are done via empty &lt;t/&gt; elements, for an <link url="#optional_remote_cursor">Optional Remote Cursor</link>.</li>
<li>In order to maximize precision and achieve the smoothest playback of typing, the sum of all <link url="#element_w_interval">Element &lt;w/&gt; Interval</link> values in one &lt;message/&gt; equals the <link url="#transmission_interval">Transmission Interval</link> during periods of continuous typing. This also results in some &lt;w/&gt; interval elements being split between consecutive messages.</li>
</ul>
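<p>The last point can be checked mechanically. The Python sketch below parses the first &lt;rtt/&gt; element of the example with the standard library and sums its &lt;w/&gt; values; the helper name is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

RTT_NS = "{urn:xmpp:rtt:0}"

# First transmission of the example above (seq='0').
FIRST_RTT = """<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
  <t>H</t>
  <w n='115'/><t>e</t>
  <w n='154'/><t>l</t>
  <w n='151'/><t>l</t>
  <w n='115'/><t>o</t>
  <w n='165'/>
</rtt>"""

def total_interval_ms(rtt_xml):
    """Sum the n attributes of all <w/> elements in one <rtt/> element."""
    root = ET.fromstring(rtt_xml)
    return sum(int(w.get("n")) for w in root.iter(RTT_NS + "w"))

total = total_interval_ms(FIRST_RTT)  # 700, i.e. 0.7 seconds
```

<p>The same check gives 700 for the second, third and fourth transmissions of the example.</p>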
</section2>
</section1>
<section1 topic="Interoperability Considerations" anchor="interoperability_considerations">
<p>There are other real-time text formats with interoperability considerations relating to the session setup level, the media transport level, and presentation level. For each environment where interoperability is supported, an interoperability specification should be documented that covers addressing, session control, media negotiation and media transcoding.</p>
<section2 topic="RFC 4103 and T.140" anchor="rfc_4103_and_t140">
<p>One environment for such interoperability considerations is SIP with real-time text (also called Text over IP, or ToIP) as specified in ITU-T T.140 and IETF RFC 4103. One reason for its importance is that this protocol combination is specified by IETF and by regional emergency service organizations to be the protocols supported for IP based real-time emergency calls that support real-time text. Another reason is that SIP is the currently dominating peering protocol between services, and many implementations of real-time text in SIP exist.</p>
<p>Interoperability implies addressing translation, media negotiation and translation, and media transcoding. For media transcoding between this specification and T.140/RFC 4103, the real-time text transcoding is straight forward, except the editing feature of this specification. Backwards positioning and insertion or deletion far back in the message can cause a large number of erase operations in T.140, that takes time and bandwidth to convey.</p>
<p>It should be noted that T.140 specifies use of ISO 6429 control codes for presentation characteristics such as text color etc, that are not possible to represent in plain text according to this specification. All control codes from both sides that cannot be presented on the other side of the conversion, must be filtered off in order to not disturb the presentation of text.</p>
<p>Note that a future version of this specification may support real-time transmission of XHTML-IM formatting, in order to support transcoding of ISO 6429 formatting codes.</p>
<section2 topic="Other Real-Time Text Standards" anchor="other_realtime_text_standards">
<p>It is noted there is also another real-time text standard (RFC 4103, <span class="ref"><strong><link url="http://tools.ietf.org/html/rfc5194">IETF RFC 5194</link></strong></span> <note>IETF RFC 5194: Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP). &lt;<link url="http://tools.ietf.org/html/rfc5194">http://tools.ietf.org/html/rfc5194</link>&gt;.</note>), used for SIP sessions with real-time text. In the situation where an implementor needs to decide which real-time text standard to use, it is generally recommended to use the real-time text specification of the specific session control standard in use for that particular session. This varies from implementation to implementation. For example, Google Talk network uses XMPP messaging for instant messages sent during audio/video conversations. Therefore, in this situation, it is recommended to use this XEP-0301 specification to add real-time text functionality. However, there are other situations where it is necessary to support multiple real-time-text standards, and to interoperate between the multiple real-time text standards.</p>
</section2>
<section2 topic="RFC 4103 and T.140" anchor="rfc_4103_and_t140">
<p>One environment for such interoperability considerations is SIP with real-time text (also called Text over IP, or ToIP) as specified in ITU-T T.140 and IETF RFC 4103. This protocol combination is specified by IETF, and by regional emergency service organizations, to be one of the protocols supported for IP based real-time emergency calls that support real-time text. Another reason is that SIP is the currently dominating peering protocol between services, and many implementations of real-time text in SIP exist.</p>
<p>Interoperability implies addressing translation, media negotiation and translation, and media transcoding. For media transcoding between this specification and T.140/RFC 4103, the real-time text transcoding is straightforward, except for the editing feature of this specification. Backwards positioning and insertion or deletion far back in the message can cause a large number of erase operations in T.140, which take time and bandwidth to convey.</p>
<p>It should be noted that T.140 specifies use of ISO 6429 control codes for presentation characteristics such as text color, which are not covered in this version of this specification. All control codes from either side that cannot be presented on the other side of the conversion must be filtered out so that they do not disturb the presentation of text.</p>
<p>Also, see <link url="#total_conversation_combination_with_audio_and_video">Total Conversation Combination With Audio And Video</link>.</p>
</section2>
<section2 topic="Combination With Other Real-Time Media" anchor="combination_with_other_realtime_media">
<p>In some cases, it may be beneficial in a real-time conversation situation to have simultaneous availability of multiple real-time media.</p>
<p>In the XMPP session environment, the Jingle protocol (&xep0166;) is available for negotiation and transport of the more time-critical, real-time audio and video media. For clients that already support audio and/or video, it is RECOMMENDED to continue providing real-time text according to this specification, regardless of whether audio and/or video is negotiated.</p>
<p>It is noted there is also another real-time text standard (RFC 4103, RFC 5194), used for SIP sessions with real-time text. In the situation where an implementor needs to decide which real-time text standard to use, it is generally recommended to use the real-time text specification of the specific session control standard in use for that particular session. This varies from implementation to implementation. For example, Google Talk network uses XMPP messaging for instant messages sent during audio/video conversations. Therefore, in this situation, it is recommended to use this XMPP extension document to add real-time text functionality. However, there are other situations where it is necessary to support multiple real-time-text standards, and to interoperate between the multiple real-time text standards. For more information, see the next section.</p>
<section3 topic="Total Conversation" anchor="total_conversation">
<p>According to <span class="ref"><strong><link url="http://www.itu.int/rec/T-REC-F.703">ITU-T F.703</link></strong></span> <note>ITU-T F.703: Multimedia conversational services. &lt;<link url="http://www.itu.int/rec/T-REC-F.703">http://www.itu.int/rec/T-REC-F.703</link>&gt;.</note>, "Total Conversation" defines the simultaneous use of audio, video, and real-time text. For convenience, some chat applications may be designed to have automatic negotiation of as many as possible of the three media preferred by the users.</p>
</section3>
</section2>
</section1>
<section1 topic="Internationalization Considerations" anchor="internationalization_considerations">
<p>There are special internationalization considerations involving real-time editing of international text, due to the character positioning and length values used by <link url="#action_elements">Action Elements</link>, in the form of <em><strong>p</strong></em> and <em><strong>n</strong></em> attributes. Different programming platforms use different internal Unicode encodings, which may be different from the transmission encoding (UTF-8) for XMPP. To achieve universally correct calculations for <em><strong>p</strong></em> and <em><strong>n</strong></em> attributes, consider these factors:</p>
<ul>
<li><p>Multiple Unicode code points may represent one displayable Unicode character (e.g. a base letter followed by combining marks).<br />
<em>Action elements operate on Unicode code points, not on displayable characters.</em></p></li>
<li><p>Characters U+10000 through U+10FFFF are single code points, but are represented as multiple surrogate code units in certain Unicode encodings (e.g. UTF-16).<br />
<em>Action elements operate on Unicode code points, not on individual surrogate code units.</em></p></li>
<li><p>Some Unicode encodings use a variable number of bytes per Unicode code point (e.g. UTF-8).<br /><em>Action elements operate on Unicode code points, not on individual bytes.</em></p></li>
</ul>
<p>Failure to correctly calculate <em><strong>p</strong></em> and <em><strong>n</strong></em> values by counting individual Unicode code points will result in interoperability problems in the form of scrambled text during real-time editing. In some cases, this problem does not become visible until a chat session occurs in a different language for the first time. It is critical to follow <link url="#rules_for_attribute_values">Rules for Attribute Values</link> in order to maintain world-wide interoperability of international text.</p>
<p>The main internationalization consideration involves real-time message editing of international and mixed-language text. Correct calculations for <link url="#action_elements">Action Elements</link> based on <link url="#unicode_character_counting">Unicode Character Counting</link> are necessary to avoid scrambled text in many languages.</p>
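<p>The difference between counting code points, UTF-16 code units, and UTF-8 bytes can be illustrated with a minimal sketch. It is written in Python, whose strings are already indexed by code point. The helper names below are hypothetical and are not defined by this specification. The last helper assumes a platform that reports cursor positions as UTF-16 code unit offsets, and shows one possible way to turn such an offset into a code point position suitable for a <em><strong>p</strong></em> attribute.</p>

```python
def count_code_points(text: str) -> int:
    """Number of Unicode code points (the unit used by p and n)."""
    # A Python str is a sequence of code points, so len() is sufficient.
    return len(text)


def count_utf16_code_units(text: str) -> int:
    """Number of UTF-16 code units (what a UTF-16 platform would report)."""
    # Each code unit is two bytes; the "-le" variant avoids emitting a BOM.
    return len(text.encode("utf-16-le")) // 2


def count_utf8_bytes(text: str) -> int:
    """Number of bytes in the UTF-8 transmission encoding."""
    return len(text.encode("utf-8"))


def utf16_offset_to_code_point_index(text: str, utf16_offset: int) -> int:
    """Convert a UTF-16 code unit offset into a code point index.

    Hypothetical helper: an offset that falls inside a surrogate pair is
    rounded up to the position after that character.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        # Code points above U+FFFF occupy two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


# "a", an emoji outside the BMP, then "e" followed by a combining acute accent.
sample = "a\U0001F600e\u0301"
print(count_code_points(sample))       # 4
print(count_utf16_code_units(sample))  # 5
print(count_utf8_bytes(sample))        # 8
print(utf16_offset_to_code_point_index(sample, 3))  # 2
```

<p>In this sample, three displayable characters correspond to four code points, five UTF-16 code units, and eight UTF-8 bytes. Only the code point count is appropriate for action element attributes.</p>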
</section1>
<section1 topic="Security Considerations" anchor="security_considerations">
<section2 topic="Privacy" anchor="privacy">
<p>It is important for implementors of real-time text to educate users about real-time text. Users of real-time text should be aware that their typing in the local input buffer is now visible to everyone in the current chat conversation. This may have security implications if users copy &amp; paste private information into their chat entry buffer (e.g. a shopping invoice), intending to edit out the private parts of the pasted text (e.g. a credit card number) before they send the message. With real-time editing, recipients can watch all text changes that occur in the sender's text, before the sender sends the final message.</p>
<p>It is important for implementors of real-time text to educate users about real-time text. Users of real-time text should be aware that their typing in the local input buffer is now visible to everyone in the current chat conversation. This may have security implications if users copy &amp; paste private information into their chat entry buffer (e.g. a shopping invoice), intending to edit out the private parts of the pasted text (e.g. a credit card number) before they send the message. With real-time message editing, recipients can watch all text changes that occur in the sender's text, before the sender sends the final message.</p>
</section2>
<section2 topic="Congestion Considerations" anchor="congestion_considerations">
<p>The nature of real-time text results in more frequent transmission of &lt;message&gt; elements than would otherwise happen in a non-real-time text conversation. This may lead to increased network and server loading of XMPP networks. Care SHOULD be taken to use a reasonable <link url="#transmission_interval">Transmission Interval</link> and to avoid transmitting messages at an excessive rate, so as not to create unnecessary congestion on public XMPP networks. Also, see &xep0205;.</p>
<p>Network monitoring mechanisms (e.g. &xep0184; and/or &xep0199;) MAY be used to monitor reliability and latency, in order to temporarily adjust the interval to prevent failure of real-time text transmissions during extreme network conditions.</p>
<p>That said, participants using this specification in the recommended way will cause a load that is only marginally higher than that of users communicating without this specification. This is very low compared to many other activities possible on XMPP networks, including VoIP and file transfers.</p>
<p>The nature of real-time text results in more frequent transmission of &lt;message/&gt; elements than would otherwise happen in a non-real-time text conversation. This may lead to increased network and server loading of XMPP networks. Care SHOULD be taken to use a reasonable <link url="#transmission_interval">Transmission Interval</link> and to avoid transmitting messages at an excessive rate, so as not to create unnecessary congestion on public XMPP networks. Also, see &xep0205;.</p>
<p>Network monitoring mechanisms (e.g. &xep0184; and/or &xep0199;) MAY be used to monitor reliability and latency, in order to temporarily adjust the interval to prevent failure of real-time text transmissions during extreme network conditions. This is also useful for mission-critical applications such as Next Generation 9-1-1 emergency services.</p>
<p>Participants using this specification in the recommended way will cause a load that is only marginally higher than that of users communicating without this specification. The bandwidth overhead of real-time text is very low compared to many other activities possible on XMPP networks, including VoIP and file transfers.</p>
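<p>One possible way to adjust the interval from a latency measurement can be sketched as follows. This is only an illustration under stated assumptions: the function name, its parameters, and the upper bound are hypothetical and are not defined by this specification, and the round-trip time is assumed to come from whichever monitoring mechanism the client already uses.</p>

```python
def adjust_interval(base_ms: int, measured_rtt_ms: float, max_ms: int = 5000) -> int:
    """Return a transmission interval in milliseconds.

    Hypothetical policy: never send faster than the configured base
    interval, stretch the interval to the measured round-trip time when
    the network is slower than that, and cap the result at max_ms.
    """
    stretched = max(base_ms, measured_rtt_ms)
    return int(min(stretched, max_ms))


# Example values are arbitrary and chosen only for illustration.
print(adjust_interval(1000, 200))    # 1000: healthy network, keep the base interval
print(adjust_interval(1000, 2500))   # 2500: slow network, lengthen the interval
print(adjust_interval(1000, 60000))  # 5000: extreme latency, capped at max_ms
```

<p>A client following this approach would return to the base interval once measured latency recovers, so that the adjustment remains temporary.</p>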
</section2>
</section1>
<section1 topic="IANA Considerations" anchor="iana_considerations">
@ -837,7 +814,7 @@
<xs:annotation>
<xs:documentation>
The protocol documented by this schema is defined in
XEP-0xxx: http://www.xmpp.org/extensions/xep-0xxx.html
XEP-0301: http://www.xmpp.org/extensions/xep-0301.html
</xs:documentation>
</xs:annotation>
@ -849,9 +826,7 @@
<xs:element ref='t' minOccurs='0' maxOccurs='unbounded'/>
<xs:element ref='e' minOccurs='0' maxOccurs='unbounded'/>
<xs:element ref='d' minOccurs='0' maxOccurs='unbounded'/>
<xs:element ref='c' minOccurs='0' maxOccurs='unbounded'/>
<xs:element ref='w' minOccurs='0' maxOccurs='unbounded'/>
<xs:element ref='g' minOccurs='0' maxOccurs='unbounded'/>
</xs:sequence>
</xs:complexType>
</xs:element>
@ -876,20 +851,12 @@
</xs:complexType>
</xs:element>
<xs:element name='c' type='empty'>
<xs:complexType>
<xs:attribute name='p' type='xs:unsignedInteger' use='required'/>
</xs:complexType>
</xs:element>
<xs:element name='w' type='empty'>
<xs:complexType>
<xs:attribute name='n' type='xs:unsignedInteger' use='required'/>
</xs:complexType>
</xs:element>
<xs:element name='g' type='empty'/>
<xs:simpleType name='empty'>
<xs:restriction base='xs:string'>
<xs:enumeration value=''/>
@ -898,8 +865,8 @@
</xs:schema>]]></code></p>
</section1>
<h1>Acknowledgements</h1>
<p>The author would like to thank Real-Time Text Taskforce (R3TF) at <link class="western" href="http://www.realtimetext.org/">www.realtimetext.org</link> for their contribution to the technology documented in this specification. Members of R3TF who have contributed to this document, including corrections and edits, include Gunnar Helstrom, Barry Dingle, Paul E. Jones, Anoud van Wijk, and Gregg Vanderheiden.</p>
<h1>Acknowledgments</h1>
<p>The author would like to thank the Real-Time Text Taskforce (R3TF) at <link class="western" href="http://www.realtimetext.org/">www.realtimetext.org</link> for their contribution to the technology documented in this specification. Members of R3TF who have contributed to this document, including corrections and edits, include Gunnar Hellstrom, Barry Dingle, Paul E. Jones, Arnoud van Wijk, and Gregg Vanderheiden.</p>
<p>“Natural Typing”, the technique of preserving key press intervals, is acknowledged as an invention by Mark Rejhon, who is deaf. This technology is provided to XMPP.org as part of this specification in compliance with the XSF's Intellectual Property Rights Policy at <link class="western" href="http://xmpp.org/extensions/ipr-policy.shtml">http://xmpp.org/extensions/ipr-policy.shtml</link>. For more information, see Appendix C: Legal Notices.</p>