<abstract>This specification provides an alternative to XHTML-IM with rigid separation of content and markup information, improving the resilience against spoofing and injection attacks.</abstract>
<p>Currently, &xep0071; or ad-hoc text-based formats are used to provide styling and semantic information in messages sent over the XMPP network.</p>
<p>These approaches have several drawbacks, including but not limited to:</p>
<ul>
<li>Lack of standardisation (ad-hoc text-based formats), and thus interoperability.</li>
<li>Lack of extensibility (ad-hoc text-based formats).</li>
<li>Pollution of <body/> with markup information (ad-hoc text-based formats), possibly reducing accessibility.</li>
<li>Possibility of sending different textual content in the marked-up version vs. the plain-text version (XHTML-IM), allowing for spoofing attacks.</li>
<li>Difficulty of sanitizing potentially malicious input (mostly XHTML-IM) (see e.g. <span class='ref'><link url='https://mail.jabber.org/pipermail/standards/2017-October/033546.html'>Security Issues with XHTML-IM (again)</link></span><note>Security Issues with XHTML-IM (again) &lt;<link url='https://mail.jabber.org/pipermail/standards/2017-October/033546.html'>https://mail.jabber.org/pipermail/standards/2017-October/033546.html</link>&gt;.</note>), leading to injection attacks.</li>
</ul>
</section1>
<section1 topic='Requirements' anchor='reqs'>
<ul>
<li>Textual data and markup metadata MUST be separated strictly.</li>
<li>There MUST only be a single source of truth for the text associated with each content language in a message.</li>
<li>The markup specification MUST be extensible in order to support more complex use-cases in the future.</li>
<li>The markup SHOULD convey semantic information, if possible, as opposed to stylistic information.</li>
<li>Entities SHOULD be able to cherry-pick a subset of the markup which is suitable for their presentation (for example, a terminal-based client may support inline emphasis and strike through, but no block-level markup).</li>
<li>The specification MUST NOT require server support.</li>
<li>Messages using this markup MUST NOT reduce readability for text-to-speech engines and other accessibility technologies.</li>
<li>Messages using this markup MUST NOT reduce readability for people with color vision deficiencies.</li>
<li>Requirements on the contents of the <body/> MUST NOT be imposed.</li>
</ul>
<p>Inline markup is declared with the <span/> element.</p>
<example><![CDATA[
<message>
<body>There is really no reason to worry.</body>
<markup xmlns="urn:xmpp:markup:0">
<span start="9" end="15">
<emphasis/>
</span>
</markup>
</message>
]]></example>
<p>The following child elements are defined for <span/>:</p>
<ul>
<li><emphasis/>: The spanned range is emphasized. Suggested rendering: italics or bold.</li>
<li><code/>: The spanned range is some kind of machine code. Suggested rendering: monospaced.</li>
<li><deleted/>: The spanned range has been deleted. Suggested rendering: struck through.</li>
</ul>
<p>The start and end attributes define the range to which the span is applied. They are in units of Unicode code points in the character data of the body element. The first affected code point is the one at start (where the first code point of a message has index 0) and the last affected code point is the one just before end. The above example could render in HTML as:</p>
<div class="example">
<p>There is <em>really</em> no reason to worry.</p>
</div>
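<p>The rendering above can be sketched in a few lines of Python. This is only an illustration, not part of the protocol: the function name render_span and the mapping from child elements to HTML tags are assumptions made for the example. Python strings are indexed by Unicode code point, so start and end can be used directly as slice bounds; in languages whose strings are indexed by UTF-16 code units, the offsets would need to be converted first.</p>

```python
from html import escape

# Hypothetical mapping from <span/> child elements to HTML tags.
# The specification only suggests renderings; this table is an assumption.
HTML_TAGS = {"emphasis": "em", "code": "code", "deleted": "del"}


def render_span(body, start, end, kinds):
    """Render body as HTML with body[start:end] wrapped per the span's children.

    start and end are code point offsets; end is exclusive.
    """
    inner = escape(body[start:end], quote=False)
    # Wrap from the last child inwards so the first child ends up outermost.
    for kind in reversed(kinds):
        tag = HTML_TAGS.get(kind)
        if tag is not None:  # children that are not understood are ignored
            inner = f"<{tag}>{inner}</{tag}>"
    before = escape(body[:start], quote=False)
    after = escape(body[end:], quote=False)
    return before + inner + after


print(render_span("There is really no reason to worry.", 9, 15, ["emphasis"]))
```

<p>Escaping the body text before inserting tags keeps the character data and the markup strictly separate, which is the property this specification aims for.</p>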
<p>Itemized lists are declared with the <list/> and <li/> elements:</p>
<example><![CDATA[
<message>
<body>This XEP supports many things:
* inline markup
* code blocks
* lists
* and possibly more!</body>
<markup xmlns="urn:xmpp:markup:0">
<list start="31" end="89">
<li start="31"/>
<li start="47"/>
<li start="61"/>
<li start="69"/>
</list>
</markup>
</message>
]]></example>
<p>The start and end attributes of <list/> define the scope of the list. The start attribute of each <li/> element denotes the start of a new list item. A list item continues until the end of the list or the start of the next list item. The first <li/> in a <list/> MUST have a start value equal to the start value of the <list/>.</p>
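<p>As a minimal sketch of how a receiving entity might derive the individual items from these offsets, the following Python function slices the body at each <li/> start. The function name split_list_items is hypothetical, and the body is assumed to be the exact character data of the example above, with no indentation.</p>

```python
def split_list_items(body, list_start, list_end, li_starts):
    """Return the text of each list item as a slice of body.

    All offsets are code point offsets; list_end is exclusive.
    """
    if not li_starts or li_starts[0] != list_start:
        raise ValueError("the first <li/> must start where the <list/> starts")
    # Each item runs until the next item starts or the list ends.
    bounds = list(li_starts) + [list_end]
    return [body[a:b] for a, b in zip(bounds, bounds[1:])]


body = (
    "This XEP supports many things:\n"
    "* inline markup\n"
    "* code blocks\n"
    "* lists\n"
    "* and possibly more!"
)
for item in split_list_items(body, 31, 89, [31, 47, 61, 69]):
    print(repr(item))
```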
<p>A block quote is declared with a <bquote/> element:</p>
<example><![CDATA[
<message>
<body>He said:
> Thou shalt not pass!
and raised his hand.</body>
<markup xmlns="urn:xmpp:markup:0">
<bquote start="9" end="32"/>
</markup>
</message>
]]></example>
<p>In addition, &xep0372; or a similar mechanism MAY be used to attribute the origin of the quote. The above example could render in HTML as:</p>
<div class="example">
<p>He said:</p>
<blockquote>> Thou shalt not pass!</blockquote>
<p>and raised his hand.</p>
</div>
<p>A nested quotation can be created by adding two <bquote/> elements whose start/end ranges are nested. If plain text quotation markers are used, the start of the block quote MUST be placed at the first quotation marker of the <em>outer</em> quote.</p>
<example><![CDATA[
<message>
<body>> He said:
>> Thou shalt not pass!
> and raised his hand.
Isn't this from some famous movie?</body>
<markup xmlns="urn:xmpp:markup:0">
<bquote start="0" end="57"/>
<bquote start="11" end="34"/>
</markup>
</message>
]]></example>
<p>The above example could render in HTML as:</p>
<div class="example">
<blockquote>> He said:
<blockquote>>> Thou shalt not pass!</blockquote>
> and raised his hand.
</blockquote>
<p>Isn't this from some famous movie?</p>
</div>
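<p>One possible way to produce such nested output is to turn every <bquote/> range into an opening and a closing event and emit them in offset order. The sketch below assumes the ranges are properly nested and only handles block quotes; the function name render_quotes is hypothetical, and paragraphs outside the quotes are not wrapped.</p>

```python
from html import escape


def render_quotes(body, quotes):
    """Render nested block quotes given (start, end) code point ranges."""
    events = []
    for start, end in quotes:
        # Sort key: offset, then closes (0) before opens (1).
        # Ties among opens: the range ending later opens first (outer first).
        # Ties among closes: the range starting later closes first (inner first).
        events.append((start, 1, -end, "<blockquote>"))
        events.append((end, 0, -start, "</blockquote>"))
    events.sort()

    out = []
    pos = 0
    for offset, _kind, _tiebreak, tag in events:
        out.append(escape(body[pos:offset], quote=False))
        out.append(tag)
        pos = offset
    out.append(escape(body[pos:], quote=False))
    return "".join(out)


body = (
    "> He said:\n"
    ">> Thou shalt not pass!\n"
    "> and raised his hand.\n"
    "Isn't this from some famous movie?"
)
print(render_quotes(body, [(0, 57), (11, 34)]))
```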
</section2>
</section1>
<section1 topic='Business Rules' anchor='rules'>
<ul>
<li>Spans MUST NOT overlap with each other.</li>
<li>Spans MUST NOT overlap with the boundaries of a block-level markup element, but MAY be fully contained within a block-level markup element.</li>
<li>Block-level markup elements MUST NOT overlap with each other's boundaries.</li>
<li>There MUST NOT be a <markup/> element in a <message/> without a corresponding <body/>. Note that there may be one <markup/> element with the appropriate xml:lang attribute value for each <body/>, if the message contains multiple <body/> elements.</li>
<li>The start and end attributes operate on Unicode code points in the XML character data of the corresponding <body/> element.</li>
<li>Entities MUST silently ignore elements and attributes (arbitrarily deep) in <markup/> which they do not understand; this allows for future extensions of the markup without breaking existing implementations.</li>
</ul>
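<p>The overlap rules above could be checked along the following lines. This is a sketch under stated assumptions, not a normative algorithm: ranges are (start, end) pairs with an exclusive end, and "overlapping boundaries" is read as a partial overlap, so that fully nested block-level elements (as in the nested quotation example) are accepted. The function names are hypothetical.</p>

```python
from itertools import combinations


def intersects(a, b):
    """True if the half-open ranges a and b share at least one code point."""
    return a[0] < b[1] and b[0] < a[1]


def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def check_markup(spans, blocks):
    """Return a list of human-readable rule violations (empty if none)."""
    errors = []
    # Spans must not overlap with each other at all.
    for a, b in combinations(spans, 2):
        if intersects(a, b):
            errors.append(f"spans {a} and {b} overlap")
    # A span may sit inside a block, but must not cross its boundary.
    for span in spans:
        for block in blocks:
            if intersects(span, block) and not contains(block, span):
                errors.append(f"span {span} crosses the boundary of block {block}")
    # Blocks may nest (assumption), but must not partially overlap.
    for a, b in combinations(blocks, 2):
        if intersects(a, b) and not (contains(a, b) or contains(b, a)):
            errors.append(f"blocks {a} and {b} partially overlap")
    return errors


print(check_markup(spans=[(5, 15)], blocks=[(9, 32)]))
```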
<p>Entities are encouraged to use the semantic information to make the presentation of the textual content more precise, for example by applying spoken emphasis to passages marked with an <emphasis/> <span/>.</p>
<p>Since a message may have multiple <body/> elements in different languages, there MAY be multiple <markup/> elements, one for each of the <body/> elements. There is no requirement to include a <markup/> element for each language.</p>
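<p>A receiving entity then needs to pair each <markup/> element with the <body/> of the same language. The following sketch does this with Python's standard xml.etree.ElementTree module. It makes several simplifying assumptions: the stanza is unqualified as in the examples of this document (on the wire, the message and body elements would normally be in a namespace such as jabber:client), xml:lang is set directly on each element rather than inherited, and the function name pair_markup_with_bodies is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MARKUP = "{urn:xmpp:markup:0}markup"


def pair_markup_with_bodies(message_xml):
    """Map each language to (body text, markup element) where both exist."""
    message = ET.fromstring(message_xml)
    bodies = {b.get(XML_LANG): b.text or "" for b in message.findall("body")}
    pairs = {}
    for markup in message.findall(MARKUP):
        lang = markup.get(XML_LANG)
        if lang in bodies:  # markup without a matching body is skipped
            pairs[lang] = (bodies[lang], markup)
    return pairs


stanza = """<message>
<body xml:lang="en">Hello world</body>
<body xml:lang="de">Hallo Welt</body>
<markup xmlns="urn:xmpp:markup:0" xml:lang="en">
<span start="0" end="5"><emphasis/></span>
</markup>
</message>"""

for lang, (text, markup) in pair_markup_with_bodies(stanza).items():
    print(lang, repr(text), len(markup))
```

<p>In this example only the English body has markup; the German body is displayed as plain text.</p>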