<?xml version='1.0' encoding='UTF-8'?> <!DOCTYPE xep SYSTEM 'xep.dtd' [ <!ENTITY % ents SYSTEM 'xep.ent'> %ents; ]> <?xml-stylesheet type='text/xsl' href='xep.xsl'?> <xep> <header> <title>Language Translation</title> <abstract>This document defines an XMPP protocol extension for providing language translation facilities over XMPP. It supports human, machine, client-based, and server-based translations.</abstract> &LEGALNOTICE; <number>0171</number> <status>Deferred</status> <type>Standards Track</type> <sig>Standards</sig> <approver>Council</approver> <dependencies> <spec>XMPP Core</spec> <spec>XMPP IM</spec> <spec>XEP-0030</spec> </dependencies> <supersedes/> <supersededby/> <shortname>langtrans</shortname> <author> <firstname>Boyd</firstname> <surname>Fletcher</surname> <email>boyd@spawar.navy.mil</email> </author> <author> <firstname>Daniel</firstname> <surname>LaPrade</surname> <email>daniel.laprade@je.jfcom.mil</email> </author> <author> <firstname>Keith</firstname> <surname>Lirette</surname> <email>keith.lirette@je.jfcom.mil</email> </author> <author> <firstname>Brian</firstname> <surname>Raymond</surname> <email>brian.raymond@je.jfcom.mil</email> </author> <revision> <version>0.2</version> <date>2006-04-24</date> <initials>psa</initials> <remark>Added text about use of Thread IDs.</remark> </revision> <revision> <version>0.1</version> <date>2006-01-24</date> <initials>psa</initials> <remark>Initial version.</remark> </revision> <revision> <version>0.0.4</version> <date>2006-01-17</date> <initials>psa</initials> <remark>Converted to XML format, cleaned up text, modified examples, changed pivotable and reviewed attributes to xs:boolean, corrected schema.</remark> </revision> <revision> <version>0.0.3</version> <date>2006-01-16</date> <initials>bf</initials> <remark>Changed xml:lang to destination, derived to derived_from; added service discovery identity.</remark> </revision> <revision> <version>0.0.2</version> <date>2005-12-28</date> <initials>bf</initials> 
<remark>Miscellaneous edits.</remark> </revision> <revision> <version>0.0.1</version> <date>2005-12-21</date> <initials>bf</initials> <remark>First draft.</remark> </revision> </header> <section1 topic='Introduction' anchor='intro'> <p>There is currently no standard for describing language translations over a text chat protocol. While numerous products and services provide translation of text, there is no standardized protocol extension for requesting a translation and expressing the details of the translation over XMPP (see &rfc3920;). This document describes how to express a translation and its components in an XMPP message, as well as a method for requesting a translation.</p> <p>Direct translation can be realized either by client-side translation before sending or by transparent components that translate messages on the fly. Discovering XMPP entities capable of translation allows clients to request translation from them based on their capabilities. The remote XMPP entity could be either an automated translation service or a human providing translation.</p> </section1> <section1 topic='Glossary' anchor='glossary'> <table caption='Glossary'> <tr><th>Term</th><th>Definition</th></tr> <tr><td>Original Text</td><td>This is the message text that was originally created by the sender. This is the text that is translated.</td></tr> <tr><td>Translated Text</td><td>This is the message text that has been translated by the language translation engines. This is also called the destination text. For any given message there can be multiple destination text message bodies.</td></tr> <tr><td>Pivot Language</td><td>Pivoting is the process of using one or more intermediate languages to translate from a given source language to a specific destination language. 
For example, if you needed to translate from English to Russian but only had translators that went from English to French and from French to Russian, then you could use French as a pivot language.</td></tr> <tr><td>Pivot Text</td><td>This is the translated text of the original message in a pivot language. For any given destination language, there can be zero or more pivot text bodies. The ordering of pivoting is required to be specified for the destination language.</td></tr> <tr><td>Language Translation Engine</td><td>Since not all language translation engines are of the same quality, it is important to some classes of users that they know which translation engine was used. It is equally important to be able to select a specific translation engine for a given language pairing if more than one engine is available.</td></tr> <tr><td>Language Translation Character Set</td><td>Some language translation engines can only translate text between languages if certain character sets (or code pages) are used.</td></tr> <tr><td>Language Translation Dictionary</td><td>In order to enhance the accuracy of translation, most translation engines support the concept of mission-specific dictionaries.</td></tr> </table> </section1> <section1 topic='Requirements' anchor='reqs'> <p>The protocol defined herein addresses the following requirements:</p> <ol> <li><p>Enable an XMPP entity to request a translation from a remote XMPP entity.</p></li> <li> <p>Enable an XMPP entity to express the following mandatory elements of a translation for any receiving entities.</p> <ol style='list-style-type: lower-alpha'> <li>Identification of Original Text</li> <li>Identification of Translated Text</li> <li>Identification of any Pivot Language(s) and Text</li> <li>Identification of the method and order of translation</li> </ol> </li> <li> <p>Enable an XMPP entity to express the following optional elements of a translation for any receiving entities.</p> <ol style='list-style-type: lower-alpha'> <li>Identification of 
Language Translation Engines used.</li> <li>Identification of Location of the translation.</li> <li>Identification of Sender and Destination language of choice.</li> <li>Discovery of Server support for translation, including which language pairs, dictionaries, and engines are available.</li> <li>Identification of the desire for text to be translated or not to be translated.</li> </ol> </li> </ol> <p>The following methods of translation are supported:</p> <ol> <li>Manual or Human</li> <li>Machine or Automated</li> <li>Machine with Human Review</li> </ol> </section1> <section1 topic='Use Cases' anchor='usecases'> <p>The following use cases illustrate simple scenarios for expressing translations as well as requesting them from remote entities.</p> <section2 topic='Message Delivery' anchor='delivery'> <p>In the simplest case, a message is translated directly by the originating XMPP entity or by a transparent XMPP entity and delivered to a remote entity with only the required elements of source and destination language. The source language is known because there is no &lt;translation/&gt; tag describing it. The three translation methods are distinguished as follows:</p> <ol> <li>If no 'engine' attribute is present, then manual (or human) translation was performed.</li> <li>If an 'engine' attribute is present then machine (or automated) translation was performed, where the translation engine is identified by the value of the 'engine' attribute. 
If the 'engine' attribute is present but its value is an empty string, then the name of the translation engine was not available.</li> <li>If the 'engine' attribute and the 'reviewed' attribute are both present, then machine translation was performed but the message text was reviewed and possibly modified by a human.</li> </ol> <section3 topic='Direct Translation' anchor='message-direct'> <example caption='Entity sends a message translated from English to French'><![CDATA[ <message from='bard@shakespeare.lit/globe' to='playwright@marlowe.lit/theatre'> <subject xml:lang='en'>Hello</subject> <subject xml:lang='fr'>Bonjour</subject> <body xml:lang='en'>How are you?</body> <body xml:lang='fr'>comment allez-vous?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='fr' derived_from='en'/> </x> </message> ]]></example> </section3> <section3 topic='Translation With Pivot' anchor='message-pivot'> <p>A message is translated by the originating XMPP entity or by a transparent XMPP entity and delivered to a remote entity together with the pivot languages used to accomplish the translation. The source language is known because there is no &lt;x/&gt; translation tag describing it. 
When a translation is done via a pivot language, the pivot languages and their order of use MUST be specified.</p> <example caption='Entity sends a message translated from French to Russian via English using human translators'><![CDATA[ <message from='bard@shakespeare.lit/globe' to='playwright@marlowe.lit/theatre'> <subject xml:lang='fr'>Bonjour</subject> <subject xml:lang='en'>Hello</subject> <subject xml:lang='ru'>Здравствуйте</subject> <body xml:lang='fr'>comment allez-vous?</body> <body xml:lang='en'>How are you?</body> <body xml:lang='ru'>Как вы?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='en' derived_from='fr'/> <translation destination='ru' derived_from='en'/> </x> </message> ]]></example> </section3> <section3 topic='Translation With Pivot Specifying Details' anchor='message-pivot-details'> <p>A message is translated by the originating XMPP entity or by a transparent XMPP entity, using pivot languages and machine translation, and delivered to a remote entity. 
The source language is known because there is no &X; translation tag describing it.</p> <example caption='Entity sends a message translated from French to Russian via English using a machine translation engine'><![CDATA[ <message from='bard@shakespeare.lit/globe' to='playwright@marlowe.lit/theatre'> <subject xml:lang='fr'>Bonjour</subject> <subject xml:lang='en'>Hello</subject> <subject xml:lang='ru'>Здравствуйте</subject> <body xml:lang='fr'>comment allez-vous?</body> <body xml:lang='en'>How are you?</body> <body xml:lang='ru'>Как вы?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='en' derived_from='fr' engine='SYSTRANS'/> <translation destination='ru' derived_from='en' engine='SYSTRANS'/> </x> </message> ]]></example> </section3> </section2> <section2 topic='Discovering Translation Providers' anchor='disco'> <section3 topic='Discovering Translation Providers On a Server' anchor='disco-items'> <p>When connected to a server, an XMPP entity can locate translation providers by asking the server which translation providers are attached to it; this MUST be done using &xep0030;. The server SHOULD return the availability of translation providers and the language pairings that the user has rights to use.</p> <example caption='Entity sends discovery request to server'><![CDATA[ <iq type='get' id='disco1' to='shakespeare.lit'> <query xmlns='http://jabber.org/protocol/disco#items'/> </iq> ]]></example> <example caption='Server returns items, including translation providers'><![CDATA[ <iq type='result' id='disco1' from='shakespeare.lit' to='bard@shakespeare.lit/globe'> <query xmlns='http://jabber.org/protocol/disco#items'> ... <item jid='towerofbabel@shakespeare.lit' name='Tower of Babel Translation Bot'/> <item jid='translation.shakespeare.lit' name='Translation Provider Service'/> ... 
</query> </iq> ]]></example> </section3> <section3 topic='Discovering Identity of Providers' anchor='disco-identity'> <p>Service Discovery is used to determine whether a JID provides translation services. The JID can be a bot (e.g., &lt;towerofbabel@shakespeare.lit&gt;) or a server component (e.g., &lt;translation.shakespeare.lit&gt;).</p> <example caption='Entity queries service regarding identity'><![CDATA[ <iq type='get' to='translation.shakespeare.lit' from='bard@shakespeare.lit/globe'> <query xmlns='http://jabber.org/protocol/disco#info'/> </iq> ]]></example> <example caption='Service reports identity'><![CDATA[ <iq type='result' to='bard@shakespeare.lit/globe' from='translation.shakespeare.lit'> <query xmlns='http://jabber.org/protocol/disco#info'> ... <identity category='automation' type='translation'/> <feature var='http://jabber.org/protocol/langtrans'/> ... </query> </iq> ]]></example> </section3> <section3 topic='Discovering Language Support' anchor='disco-lang'> <p>The supported languages and other details of a service must be known in order to use it. It is permissible for a translation service to provide multiple translation engines for the same language pairing; if this is done, then a separate &lt;item/&gt; tag MUST be used for each pairing. A 'dictionary' attribute MAY be used to specify the dictionary for a specific &lt;item/&gt;. 
To specify more than one dictionary for a given language pairing, a separate &lt;item/&gt; tag MUST be used for each dictionary specification for that language pairing.</p> <example caption='Entity queries service for further information'><![CDATA[ <iq type='get' to='translation.shakespeare.lit' from='bard@shakespeare.lit/globe'> <query xmlns='http://jabber.org/protocol/langtrans#items'/> </iq> ]]></example> <example caption='Service replies with language details'><![CDATA[ <iq type='result' to='bard@shakespeare.lit/globe' from='translation.shakespeare.lit'> <query xmlns='http://jabber.org/protocol/langtrans#items'> <item src_lang='en' jid='translation.shakespeare.lit' dst_lang='fr' engine='SYSTRANS 2005 Release 2' pivotable='true'/> <item src_lang='en' jid='translation.shakespeare.lit' dst_lang='ko' engine='SYSTRANS 2005 Release 2' pivotable='true'/> <item src_lang='en' jid='translation.shakespeare.lit' dst_lang='ru' engine='SYSTRANS 2005 Release 2' pivotable='true'/> <item src_lang='en' jid='translation.shakespeare.lit' dst_lang='ru' engine='SYSTRANS 2005 Release 2' pivotable='true' dictionary='medical'/> <item src_lang='fr' jid='translation.shakespeare.lit' dst_lang='en' engine='SYSTRANS 2005 Release 2' pivotable='true' dictionary='standard'/> <item src_lang='ru' jid='translation.shakespeare.lit' dst_lang='en' engine='SYSTRANS 2005 Release 2' pivotable='true' dictionary='Medical 1.0'/> <item src_lang='ko' jid='translation.shakespeare.lit' dst_lang='en' engine='SYSTRANS 2005 Release 2' pivotable='true'/> </query> </iq> ]]></example> </section3> </section2> <section2 topic='Requesting a Translation from a Service' anchor='request'> <section3 topic='Requesting a Basic Translation' anchor='request-basic'> <p>To request a translation, an entity sends a message to a translation provider. The lack of a 'derived_from' attribute in the &lt;translation/&gt; element indicates a request for a translation. 
The request SHOULD include an XMPP &THREAD; element for tracking purposes.</p> <example caption='Entity requests a translation from English to French'><![CDATA[ <message from='bard@shakespeare.lit/globe' to='translation.shakespeare.lit'> <thread>5f3ea6f710337db2388e965e837fcc96334361e4</thread> <subject xml:lang='en'>Hello</subject> <body xml:lang='en'>How are you?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='fr'/> </x> </message> ]]></example> <p>If the translation request included a Thread ID, the translation provider MUST echo that thread in its response so that the requesting entity can associate the response with the request.</p> <example caption='Translation is returned from translation provider'><![CDATA[ <message from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <thread>5f3ea6f710337db2388e965e837fcc96334361e4</thread> <subject xml:lang='fr'>Bonjour</subject> <subject xml:lang='en'>Hello</subject> <body xml:lang='fr'>comment allez-vous?</body> <body xml:lang='en'>How are you?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='fr' derived_from='en'/> </x> </message> ]]></example> </section3> <section3 topic='Requesting a Translation With Multiple Destination Languages' anchor='request-multiple'> <example caption='Entity requests a translation from English to French and Russian'><![CDATA[ <message from='bard@shakespeare.lit/globe' to='translation.shakespeare.lit'> <thread>5f3ea6f710337db2388e965e837fcc96334361e4</thread> <subject xml:lang='en'>Hello</subject> <body xml:lang='en'>How are you?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='fr'/> <translation destination='ru'/> </x> </message> ]]></example> <example caption='Translation is returned from translation provider'><![CDATA[ <message from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <thread>5f3ea6f710337db2388e965e837fcc96334361e4</thread> <subject 
xml:lang='en'>Hello</subject> <subject xml:lang='fr'>Bonjour</subject> <subject xml:lang='ru'>Здравствуйте</subject> <body xml:lang='fr'>comment allez-vous?</body> <body xml:lang='en'>How are you?</body> <body xml:lang='ru'>Как вы?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='fr' derived_from='en'/> <translation destination='ru' derived_from='en'/> </x> </message> ]]></example> </section3> <section3 topic='Requesting a Translation With a Specific Dictionary' anchor='request-dictionary'> <p>If a specific dictionary is required, the requesting entity MAY specify it in the request. The dictionary SHOULD be one that was returned when discovering the language support of the service, although a dictionary that was not returned MAY also be requested. Dictionary names are specific to the translation engine and are free-form text.</p> <example caption='Requests a translation from English to French using the &apos;medical&apos; dictionary'><![CDATA[ <message from='bard@shakespeare.lit/globe' to='translation.shakespeare.lit'> <thread>5f3ea6f710337db2388e965e837fcc96334361e4</thread> <subject xml:lang='en'>Hello</subject> <body xml:lang='en'>How are you?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='fr' dictionary='medical'/> </x> </message> ]]></example> <example caption='Translation provider returns translation with dictionary details'><![CDATA[ <message from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <thread>5f3ea6f710337db2388e965e837fcc96334361e4</thread> <subject xml:lang='fr'>Bonjour</subject> <subject xml:lang='en'>Hello</subject> <body xml:lang='fr'>comment allez-vous?</body> <body xml:lang='en'>How are you?</body> <x xmlns='http://jabber.org/protocol/langtrans'> <translation destination='fr' derived_from='en' dictionary='medical'/> </x> </message> ]]></example> <p>If the translation service cannot complete the translation, it SHOULD return an &lt;item-not-found/&gt; error indicating that some part of the translation request was problematic, unless doing so would violate the privacy and 
security considerations in XMPP Core and XMPP IM, or local security and privacy policies.</p> <example caption='Translation could not be completed'><![CDATA[ <message from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe' type='error'> <error code='404' type='cancel'> <item-not-found xmlns='urn:ietf:xml:params:ns:xmpp-stanzas'/> </error> </message> ]]></example> <p>If privacy or security considerations make returning an &lt;item-not-found/&gt; error infeasible, the service SHOULD return a &lt;service-unavailable/&gt; error instead.</p> <example caption='Service unavailable'><![CDATA[ <message from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe' type='error'> <error code='503' type='cancel'> <service-unavailable xmlns='urn:ietf:xml:params:ns:xmpp-stanzas'/> </error> </message> ]]></example> </section3> </section2> </section1> <section1 topic='Implementation Notes' anchor='impl'> <p>In order to reduce user confusion and misunderstanding of a translated message body, it is RECOMMENDED that implementations of langtrans provide the following user interface features.</p> <ol> <li>Translated messages should be clearly identified as being a translation.</li> <li>The display of a translated message should clearly show how (automated, manual, automated with human review) the message was translated.</li> <li>The display of a message should clearly show whether the text is in the destination, original, or pivot language.</li> <li>If pivoting is used, the destination message text should be marked in such a way as to indicate that it was translated via one or more pivot languages, what those languages are, and in what order they were used; the actual pivot language text should be accessible to the user.</li> <li>It is recommended that only one level of pivoting be used, as the quality of the destination translation degrades significantly after each pivot.</li> </ol> <p>Note: The 'reviewed' and 'pivotable' attributes are of type "boolean" and MUST be handled accordingly. 
&BOOLEANNOTE;</p> </section1> <section1 topic='Internationalization Considerations' anchor='i18n'> <p>In order to properly process multi-language messages, clients MUST implement support for multiple message bodies differentiated by the 'xml:lang' attribute as described in <cite>RFC 3920</cite>.</p> </section1> <section1 topic='Security Considerations' anchor='security'> <p>Attacks may be easier against services that implement translation because of the potential disclosure of information regarding the language pairings, engines, and dictionaries used; however, no specific vulnerabilities are introduced.</p> <p>This possible weakness can be mitigated by not returning specifics to requesting entities. In addition, the responding entity MAY perform authorization checks in order to determine how to respond.</p> </section1> <section1 topic='IANA Considerations' anchor='iana'> <p>This document requires no interaction with &IANA;.</p> </section1> <section1 topic='XMPP Registrar Considerations' anchor='registrar'> <section2 topic='Protocol Namespaces' anchor='registrar-ns'> <p>The &REGISTRAR; shall include 'http://jabber.org/protocol/langtrans' and 'http://jabber.org/protocol/langtrans#items' in its registry of protocol namespaces.</p> </section2> <section2 topic='Service Discovery Identities' anchor='registrar-identity'> <p>The XMPP Registrar shall add a type of "translation" to the "automation" category in its registry of service discovery identities.</p> </section2> </section1> <section1 topic='XML Schema' anchor='schema'> <section2 topic='langtrans' anchor='schema-langtrans'> <code><![CDATA[ <?xml version='1.0' encoding='UTF-8'?> <xs:schema xmlns='http://jabber.org/protocol/langtrans' xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='http://jabber.org/protocol/langtrans' elementFormDefault='qualified'> <xs:element name='x'> <xs:complexType> <xs:sequence> <xs:element ref='translation' maxOccurs='unbounded'/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name='translation'> 
<xs:complexType> <xs:simpleContent> <xs:extension base='empty'> <xs:attribute name='charset' type='xs:string' use='optional'/> <xs:attribute name='derived_from' type='xs:language' use='optional'/> <xs:attribute name='destination' type='xs:string' use='required'/> <xs:attribute name='dictionary' type='xs:string' use='optional'/> <xs:attribute name='engine' type='xs:string' use='optional'/> <xs:attribute name='reviewed' type='xs:boolean' use='optional' default='false'/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:simpleType name='empty'> <xs:restriction base='xs:string'> <xs:enumeration value=''/> </xs:restriction> </xs:simpleType> </xs:schema> ]]></code> </section2> <section2 topic='langtrans#items' anchor='schema-langtrans-items'> <code><![CDATA[ <?xml version='1.0' encoding='UTF-8'?> <xs:schema xmlns='http://jabber.org/protocol/langtrans#items' xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='http://jabber.org/protocol/langtrans#items' elementFormDefault='qualified'> <xs:element name='query'> <xs:complexType> <xs:sequence> <xs:element ref='item' minOccurs='0' maxOccurs='unbounded'/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name='item'> <xs:complexType> <xs:simpleContent> <xs:extension base='empty'> <xs:attribute name='dictionary' type='xs:string'/> <xs:attribute name='dst_lang' type='xs:language'/> <xs:attribute name='engine' type='xs:string' use='optional'/> <xs:attribute name='jid' type='xs:string' use='required'/> <xs:attribute name='name' type='xs:string' use='optional'/> <xs:attribute name='pivotable' type='xs:boolean' use='optional' default='false'/> <xs:attribute name='src_lang' type='xs:language' use='required'/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:simpleType name='empty'> <xs:restriction base='xs:string'> <xs:enumeration value=''/> </xs:restriction> </xs:simpleType> </xs:schema> ]]></code> </section2> </section1> </xep>