1
0
mirror of https://github.com/moparisthebest/xeps synced 2025-01-04 10:28:00 -05:00
xeps/inbox/markup.xml
2017-11-07 19:32:19 +01:00

214 lines
10 KiB
XML

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE xep SYSTEM 'xep.dtd' [
<!ENTITY % ents SYSTEM 'xep.ent'>
%ents;
]>
<?xml-stylesheet type='text/xsl' href='xep.xsl'?>
<xep>
<header>
<title>Message Markup</title>
<abstract>This specification provides an alternative to XHTML-IM with rigid separation of content and markup information, improving the resilience against spoofing and injection attacks.</abstract>
&LEGALNOTICE;
<number>xxxx</number>
<status>ProtoXEP</status>
<type>Standards Track</type>
<sig>Standards</sig>
<approver>Council</approver>
<dependencies/>
<supersedes/>
<supersededby/>
<shortname>NOT_YET_ASSIGNED</shortname>
<author>
<firstname>Jonas</firstname>
<surname>Wielicki</surname>
<email>jonas@wielicki.name</email>
<jid>jonas@wielicki.name</jid>
</author>
<revision>
<version>0.0.1</version>
<date>2017-11-07</date>
<initials>jwi</initials>
<remark><p>First draft.</p></remark>
</revision>
</header>
<section1 topic='Introduction' anchor='intro'>
<p>Currently, &xep0071; or ad-hoc text-based formats are used to provide styling and semantic information in messages sent over the XMPP network.</p>
<p>These approaches have several drawbacks, including but not limited to:</p>
<ul>
<li>Lack of standardisation (ad-hoc text-based formats), and thus interoperability.</li>
<li>Lack of extensibility (ad-hoc text-based formats).</li>
<li>Pollution of &lt;body/&gt; with markup information (ad-hoc text-based formats), possibly reducing accessiblity.</li>
<li>Possibility of sending different textual content in the marked-up version vs. the plain-text version (XHTML-IM), allowing for spoofing attacks.</li>
<li>Difficult to sanitize potentially malicious input (XHTML-IM mostly) (see e.g. <span class='ref'><link url='https://mail.jabber.org/pipermail/standards/2017-October/033546.html'>Security Issues with XHTML-IM (again)</link></span> <note>Security Issues with XHTML-IM (again) &lt;<link url='https://mail.jabber.org/pipermail/standards/2017-October/033546.html'>https://mail.jabber.org/pipermail/standards/2017-October/033546.html</link>&gt;.</note>), leading to injection attacks.</li>
</ul>
</section1>
<section1 topic='Requirements' anchor='reqs'>
<ul>
<li>Textual data and markup metadata MUST be separated strictly.</li>
<li>There MUST only be a single source of truth for the text associated with each content language in a message.</li>
<li>The markup specification MUST be extensible in order to support more complex use-cases in the futurue.</li>
<li>The markup SHOULD convey semantic information, if possible, as opposed to stylistic information.</li>
<li>Entities SHOULD be able to cherry-pick a subset of the markup which is suitable for their presentation (for example, a terminal-based client may support inline emphasis and strike through, but no block-level markup).</li>
<li>The specification MUST NOT require server support.</li>
<li>Messages using this markup MUST NOT reduce readability for text-to-speech engines and other accessibility technologies.</li>
<li>Messages using this markup MUST NOT reduce readability for people with color vision deficiencies.</li>
<li>Requirements on the contents of the &lt;body/&gt; MUST NOT be imposed.</li>
</ul>
</section1>
<section1 topic='Glossary' anchor='glossary'>
<p>OPTIONAL.</p>
</section1>
<section1 topic='Use Cases' anchor='usecases'>
<section2 topic='Transport inline markup' anchor='usecases-inline'>
<p>Inline markup is declared with the &lt;span/&gt; element.</p>
<example><![CDATA[
<message>
<body>There is really no reason to worry.</body>
<markup xmlns="urn:xmpp:markup:0">
<span start="9" end="15">
<emphasis/>
</span>
</markup>
</message>
]]></example>
<p>The following child elements are defined for &lt;span/&gt;:</p>
<ul>
<li>&lt;emphasis/&gt;: The spanned range is emphasized. Suggested rendering: italics or bold.</li>
<li>&lt;code/&gt;: The spanned range is some kind of machine code. Suggested rendering: monospaced.</li>
<li>&lt;deleted/&gt;: The spanned range has been deleted. Suggested rendering: striked through.</li>
</ul>
<p>The start and end attributes define the range at which the span is applied. They are in units of unicode code points in the character data if the body element. The first affected codepoint is the one at start (where the first codepoint of a message has index 0) and the last affected codepoint is the one just before end. The above example could render in HTML as:</p>
<div class="example">
<p>There is <em>really</em> no reason to worry.</p>
</div>
</section2>
<section2 topic='Transport code blocks' anchor='usecases-code'>
<p>Code blocks are declared with the &lt;bcode/&gt; element:</p>
<example><![CDATA[
<message>
<body>Just run this command:
$ cowsay XMPP is awesome.</body>
<markup xmlns="urn:xmpp:markup:0">
<bcode start="23" end="48"/>
</markup>
</message>
]]></example>
<p>The start and end attributes work just like for &lt;span/&gt;.</p>
<p>The suggested rendering of code blocks is as block-level elements with monospaced font. The above example could render in HTML as:</p>
<div class="example">
<p>Just run this command:</p>
<p style="font-family: monospace;">$ cowsay XMPP is awesome.</p>
</div>
</section2>
<section2 topic='Transport itemized lists' anchor='usecases-itemized'>
<p>Itemized lists are declared with the &lt;list/&gt; and &lt;li/&gt; elements:</p>
<example><![CDATA[
<message>
<body>This XEP supports many things:
* inline markup
* code blocks
* lists
* and possibly more!</body>
<markup xmlns="urn:xmpp:markup:0">
<list start="31" end="89">
<li start="31"/>
<li start="47"/>
<li start="61"/>
<li start="69"/>
</list>
</markup>
</message>
]]></example>
<p>The start and end attributes of &lt;list/&gt; define the scope of the list. The start of the &lt;li/&gt; elements denote the start of a new list item. A list item continues until the end of the list or the start of the next list item. The first &lt;li/&gt; in a &lt;list/&gt; MUST have a start value equal to the start value of the &lt;list/&gt;.</p>
<p>The above example could render in HTML as:</p>
<div class="example">
<p>This XEP supports many things:</p>
<ul>
<li>* inline markup</li>
<li>* code blocks</li>
<li>* lists</li>
<li>* and possibly more!</li>
</ul>
</div>
</section2>
<section2 topic='Transport blockquotes' anchor='usecases-blockquote'>
<p>A block quote is declared with a &lt;bquote/&gt; element:</p>
<example><![CDATA[
<message>
<body>He said:
&gt; Thou shalt not pass!
and raised his hand.</body>
<markup xmlns="urn:xmpp:markup:0">
<bquote start="9" end="32"/>
</markup>
</message>
]]></example>
<p>In addition, &xep0372; or a similar mechanism MAY be used to attribute the origin of the quote. The above example could render in HTML as:</p>
<div class="example">
<p>He said:</p>
<blockquote>&gt; Thou shalt not pass!</blockquote>
<p>and raised his hand.</p>
</div>
<p>A nested quotation can be created by adding two &lt;bquote/&gt; elements where the start/end range is nested. If plain text quotation markers are used, the start of the blockquote MUST be placed at the first quotation marker of the <em>outer</em> quote.</p>
<example><![CDATA[
<message>
<body>&gt; He said:
&gt;&gt; Thou shalt not pass!
&gt; and raised his hand.
Isn't this from some famous movie?</body>
<markup xmlns="urn:xmpp:markup:0">
<bquote start="0" end="57"/>
<bquote start="11" end="34"/>
</markup>
</message>
]]></example>
<p>The above example could render in HTML as:</p>
<div class="example">
<blockquote>&gt; He said:
<blockquote>&gt;&gt; Thou shalt not pass!</blockquote>
&gt; and raised his hand.
</blockquote>
<p>Isn't this from some famous movie?</p>
</div>
</section2>
</section1>
<section1 topic='Business Rules' anchor='rules'>
<ul>
<li>Spans MUST NOT overlap with each other.</li>
<li>Spans MUST NOT overlap with the boundaries of a block-level markup element, but MAY be fully contained within a block-level markup element.</li>
<li>Block level markup elements MUST NOT overlap with each others boundaries.</li>
<li>There MUST NOT be a &lt;markup/&gt; element in a &lt;message/&gt; without corresponding &lt;body/&gt;. Note that there may be one &lt;markup/&gt; elements with appropriate xml:lang attribute value for each &lt;body/&gt;, if the message contains multiple &lt;body/&gt; elements.</li>
<li>The start and end attributes operate on unicode code points in the XML character data of the corresponding &lt;body/&gt; element.</li>
<li>Entities MUST silently ignore elements and attributes (arbitrarly deep) in &lt;markup/&gt; which they do not understand; this allows for future extensions of the markup without breaking existing implementations.</li>
</ul>
</section1>
<section1 topic='Implementation Notes' anchor='impl'>
<p>OPTIONAL.</p>
</section1>
<section1 topic='Accessibility Considerations' anchor='access'>
<p>Entities are encouraged use the semantic information to make the presentation of the textual content more precise, for example by applying spoken emphasis to passages marked with an &lt;emphasis/&gt; &lt;span/&gt;.</p>
</section1>
<section1 topic='Internationalization Considerations' anchor='i18n'>
<p>Since a message may have multiple &lt;body/&gt; elements in different languages, there MAY be multiple &lt;markup/&gt; elements, one for each of the &lt;body/&gt; elements. There is no requirement to include a &lt;markup/&gt; element for each language.</p>
</section1>
<section1 topic='Security Considerations' anchor='security'>
<p>REQUIRED.</p>
</section1>
<section1 topic='IANA Considerations' anchor='iana'>
<p>This document requires no interaction with &IANA;. </p>
</section1>
<section1 topic='XMPP Registrar Considerations' anchor='registrar'>
<p>This specification defines the following XML namespaces:</p>
<ul>
<li>urn:xmpp:markup:0</li>
</ul>
</section1>
<section1 topic='XML Schema' anchor='schema'>
<p>REQUIRED for protocol specifications.</p>
</section1>
<section1 topic='Acknowledgements' anchor='acknowledgements'>
<p>Thanks to Georg Lukas and Emmanuel Gil Peyrot for feedback on the initial idea of this XEP.</p>
</section1>
</xep>