RFC 7764: Guidance on Markdown: Design Philosophies, Stability Strategies, and Select Registrations <http://tools.ietf.org/html/rfc7764>.
Unicode Standard Annex #9, "Unicode Bidirectional Algorithm", edited by Mark Davis, Aharon Lanin, and Andrew Glass. An integral part of The Unicode Standard, <http://unicode.org/reports/tr9/>.
Message Styling

This specification defines a formatted text syntax for use in instant messages with simple text styling.

&LEGALNOTICE; xxxx ProtoXEP Standards Track Standards Council XMPP Core XEP-0001 styling &sam;

0.0.1 2017-10-28 ssw

First draft.

Historically, XMPP has had no system for simple text styling. Instead, specifications like &xep0071; that require full layout engines have been used, leading to numerous security issues with implementations. Some entities have also performed their own styling based on identifiers in the body. While this has worked well in the past, it is not interoperable and leads to entities each supporting their own informal styling languages.

This specification aims to provide a single, interoperable formatted text syntax that can be used by entities that do not require full layout engines.

Many important terms used in this document are defined in &unicode;. The terms "left-to-right" (LTR) and "right-to-left" (RTL) are defined in &uax9;. The term "formatted text" is defined in &rfc7764;.

Formal markup language
A structured markup language such as LaTeX, SGML, HTML, or XML that is formally defined and may include metadata unrelated to formatting or text style.
Plain text
Text that does not convey any particular formatting or interpretation of the text by computer programs.
Whitespace character
Any Unicode scalar value which has the property "White_Space" or is in category Z in the Unicode Character Database.
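
The whitespace definition above can be checked mechanically. The following is a minimal sketch in Python; the function name is_whitespace and the hard-coded White_Space table are assumptions made for illustration and should be checked against the version of the Unicode Character Database an implementation targets.

```python
import unicodedata

# Code points carrying the Unicode "White_Space" property.
# Assumption: this table mirrors PropList.txt in recent Unicode versions.
_WHITE_SPACE = (
    set(range(0x0009, 0x000E))        # TAB, LF, VT, FF, CR
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200B))      # EN QUAD through HAIR SPACE
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)


def is_whitespace(ch: str) -> bool:
    """Return True if the single character ch has White_Space or is in category Z."""
    return ord(ch) in _WHITE_SPACE or unicodedata.category(ch).startswith("Z")
```

Under this sketch a space, a newline, and a no-break space are whitespace characters, while a zero width space (U+200B, category Cf) is not.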

A block is any chunk of text that can be parsed unambiguously in one pass. The following are all blocks:

  • A single line of text containing only inline spans
  • A block quotation comprising one or more lines
  • A preformatted code block

Spans are groups of text that do not result in a line break when rendered (they are rendered inline) and where the entire group is rendered in the same manner and in the same block. Spans may be either plain text with no formatting applied, or formatted text that is enclosed by two styling directives. The following are all single spans:

  • plain span
  • *emphasized span*

A span between two styling directives MUST contain some text between the two directives. The opening styling directive MUST be located at the beginning of the line or after a whitespace character, and MUST NOT be followed by a whitespace character. The closing styling directive MUST NOT be preceded by a whitespace character. Spans are always parsed from the beginning of the byte stream to the end and are lazily matched. Characters that would be styling directives but do not follow these rules are not considered when matching and thus may be present between two other styling directives.

For example, each of the following would be emphasized as indicated:

  • *emphasized*
  • foo *emphasized* bar
  • *emphasized* foo *emphasized*
  • *emphasized*foo*
  • * foo *emphasized*
  • *emphasized *foo*

Nothing would be styled in the following messages (where \n represents a new line):

  • not emphasized*
  • *not emphasized
  • *not \n emphasized*
  • *foo *bar
  • **
  • ****
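
The matching rules above can be sketched for a single styling directive as follows. This is an illustrative reading of the rules, not a reference implementation: the names find_spans and styled_text are hypothetical, Python's str.isspace() is used as an approximation of the whitespace definition given earlier, and the function expects one line at a time because spans do not cross line breaks. It treats a directive that fails the closing rule as ordinary text, and it rejects a match whose first valid closing directive would leave no text in between.

```python
def find_spans(line: str, directive: str = "*") -> list[tuple[int, int]]:
    """Return (start, end) pairs such that line[start:end] is styled text."""
    spans = []
    i = 0
    while i < len(line):
        # An opener sits at the start of the line or after whitespace,
        # and is not followed by whitespace.
        is_opener = (
            line[i] == directive
            and (i == 0 or line[i - 1].isspace())
            and i + 1 < len(line)
            and not line[i + 1].isspace()
        )
        if not is_opener:
            i += 1
            continue
        # Lazy match: stop at the first directive not preceded by whitespace.
        j = i + 1
        while j < len(line) and not (
            line[j] == directive and not line[j - 1].isspace()
        ):
            j += 1
        if j < len(line) and j > i + 1:
            spans.append((i + 1, j))  # some text lies between the directives
            i = j + 1
        else:
            i += 1  # no closer, or an empty span: treat as plain text
    return spans


def styled_text(line: str, directive: str = "*") -> list[str]:
    """Convenience helper: the styled substrings of a line."""
    return [line[s:e] for s, e in find_spans(line, directive)]
```

With this sketch, styled_text("* foo *emphasized*") yields only "emphasized", and "**" and "****" yield nothing. A message is split on line breaks before each line is passed in.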

Text enclosed by '*' (U+002A ASTERISK) SHOULD be displayed with a greater weight than the surrounding text (bold face).

The full title is *Twelfth Night, or What You Will* but *most* people shorten it.

Text enclosed by '_' (U+005F LOW LINE) SHOULD be displayed in italics.

The full title is _Twelfth Night, or What You Will_ but _most_ people shorten it.

Text enclosed by '~' (U+007E TILDE) SHOULD be displayed with a horizontal line through the middle (strike through).

Everyone ~dis~likes cake.

Text enclosed by a '`' (U+0060 GRAVE ACCENT) SHOULD be displayed inline in a monospace font. Inline formatting directives inside the inline preformatted text are not rendered. For example, in the following the word "monospace" is valid pre-formatted inline text:

  • This is `monospace`
  • This is `*monospace*`
  • This is *`monospace and bold`*

Wow, I can write in `monospace`!

A block of text surrounded by lines consisting of a sequence of three backticks, "```" (U+0060 GRAVE ACCENT), is preformatted text and should be displayed exactly as it was entered including whitespace. If no closing "```" sequence exists, the preformatted block extends to the end of the input stream or the end of the parent block (whichever comes first). No other formatting described in this document should be rendered inside a preformatted text block.

```
(println "Hello, world!")
```

This should show up as monospace, preformatted text ⤴

> ```
> (println "Hello, world!")

The entire blockquote is a preformatted text block, but this line is plaintext!

A quotation is indicated by one or more consecutive lines that each begin with a '>' (U+003E GREATER-THAN SIGN). Block quotes may contain any child block, including other quotations. Lines inside the block quote MUST have leading spaces trimmed before parsing the child block.

> That that is, is.

Said the old hermit of Prague.

>> That that is, is.
> Said the old hermit of Prague.

Who?

This document does not define a regular grammar and thus styling cannot be matched by a regular expression. Instead, a predictive recursive descent or LALR parser may be constructed. For instance, a simple parser can be constructed by first parsing all text into blocks and then recursively parsing the child-blocks inside block quotations, the spans inside plain lines, and by returning the text inside preformatted blocks without modification.
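
The simple parser described above might look like the following sketch. It is one possible reading, not a normative algorithm: the function name parse_blocks and the tuple-based tree are hypothetical, a fence is assumed to be a line equal to three backticks once trailing whitespace is ignored, and plain lines are returned as text so that a separate span parser can process them.

```python
def parse_blocks(lines: list[str]) -> list[tuple]:
    """Parse lines into ("line", str), ("pre", [str]) and ("quote", [block]) nodes."""
    blocks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.rstrip() == "```":
            # Preformatted block: runs to the closing fence, or to the end
            # of the input or parent block if no closing fence exists.
            i += 1
            body = []
            while i < len(lines) and lines[i].rstrip() != "```":
                body.append(lines[i])  # returned without modification
                i += 1
            i += 1  # step past the closing fence when there is one
            blocks.append(("pre", body))
        elif line.startswith(">"):
            # Block quote: collect consecutive '>' lines, drop the marker
            # and leading spaces, then parse the children recursively.
            inner = []
            while i < len(lines) and lines[i].startswith(">"):
                inner.append(lines[i][1:].lstrip(" "))
                i += 1
            blocks.append(("quote", parse_blocks(inner)))
        else:
            blocks.append(("line", line))
            i += 1
    return blocks
```

Calling parse_blocks(body.split("\n")) on a message body returns the top-level blocks. Because quoted lines are re-parsed after trimming, an unclosed preformatted block inside a quotation ends where the quotation ends.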

It is RECOMMENDED that formatting characters be displayed and formatted in the same manner as the text they apply to. For example, the string "*emphasis*" would be rendered with both asterisks still visible and the whole string, asterisks included, in bold face.
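
As an illustration of this recommendation, the sketch below renders one line to HTML and keeps each styling directive inside the element it produces. Several things here are assumptions rather than part of this specification: the HTML output format and tag choices, the names render_line and _find_close, the use of str.isspace() to approximate the whitespace definition, and the decision to treat the start of a parent span as the start of a line so that nested spans such as *`monospace and bold`* work.

```python
import html

# Assumed mapping from styling directive to HTML element.
TAGS = {"*": "strong", "_": "em", "~": "del", "`": "code"}


def _find_close(line: str, i: int) -> int:
    """Return the index closing the directive at line[i], or -1 if none."""
    d = line[i]
    if i > 0 and not line[i - 1].isspace():
        return -1  # opener must start the line or follow whitespace
    if i + 1 >= len(line) or line[i + 1].isspace():
        return -1  # opener must not be followed by whitespace
    for j in range(i + 1, len(line)):
        if line[j] == d and not line[j - 1].isspace():
            return j if j > i + 1 else -1  # reject empty spans
    return -1


def render_line(line: str) -> str:
    """Render a single line, keeping directives visible inside the styling."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        j = _find_close(line, i) if ch in TAGS else -1
        if j == -1:
            out.append(html.escape(ch))
            i += 1
            continue
        inner = line[i + 1 : j]
        # Inline preformatted text is opaque; other spans may nest.
        body = html.escape(inner) if ch == "`" else render_line(inner)
        out.append(f"<{TAGS[ch]}>{ch}{body}{ch}</{TAGS[ch]}>")
        i = j + 1
    return "".join(out)
```

For instance, render_line("*emphasis*") produces <strong>*emphasis*</strong>, with the asterisks placed inside the strong element so that they take the same weight as the word.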

When displaying text with formatting, developers should take care to ensure sufficient contrast exists between styled and unstyled text so that users with vision deficiencies are able to distinguish between the two.

Formatted text may also be rendered poorly by screen readers. When applying formatting it may be desirable to include directives to exclude formatting characters from being read.

OPTIONAL.

REQUIRED.

This document requires no interaction with &IANA;.

This specification requires no interaction with the &REGISTRAR;.

This document does not define any new XML structure requiring a schema.