First draft.
Historically, XMPP has had no system for simple text styling. Instead, specifications like &xep0071; that require full layout engines have been used, leading to numerous security issues with implementations. Some entities have also performed their own styling based on identifiers in the body. While this has worked well in the past, it is not interoperable and leads to entities each supporting their own informal styling languages.
This specification aims to provide a single, interoperable formatted text syntax that can be used by entities that do not require full layout engines.
Many important terms used in this document are defined in &unicode;. The terms "left-to-right" (LTR) and "right-to-left" (RTL) are defined in &uax9;. The term "formatted text" is defined in &rfc7764;.
A block is any chunk of text that can be parsed unambiguously in one pass.
A span is a group of text that is rendered inline and where the entire group is rendered in the same manner. Spans may be either plain text with no formatting applied, or may be formatted text that is enclosed by two styling directives. Spans may not escape from their containing block. The following all contain spans marked by parentheses:
Matches of spans between two styling directives MUST contain some text between the two styling directives. The opening styling directive MUST be located at the beginning of the line or after a whitespace character, and MUST NOT be followed by a whitespace character. The closing styling directive MUST NOT be preceded by a whitespace character. Spans are always parsed from the beginning of the byte stream to the end and are lazily matched. Characters that would be styling directives but do not follow these rules are not considered when matching and thus may be present between two other styling directives.
For example, each of the following would be styled as indicated:
Nothing would be styled in the following messages (where "\n" represents a new line):
Text enclosed by '*' (U+002A ASTERISK) is strong and SHOULD be displayed as bold.
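The span-matching rules described earlier, applied to the '*' directive, can be illustrated with a short Python sketch. This is not a reference implementation. The function names find_strong_spans and strong_text are hypothetical, each input line is assumed to be its own block, and Python's str.isspace is assumed to be an acceptable definition of "whitespace character".

```python
from typing import List, Tuple


def find_strong_spans(line: str, directive: str = "*") -> List[Tuple[int, int]]:
    """Return (start, end) slices of the text inside each styled span.

    Assumption: `line` is a single block, so spans never cross a newline.
    """
    spans = []
    i, n = 0, len(line)
    while i < n:
        is_opener = (
            line[i] == directive
            # Opener must be at the start of the line or follow whitespace.
            and (i == 0 or line[i - 1].isspace())
            # Opener must be followed by a non-whitespace character.
            and i + 1 < n
            and not line[i + 1].isspace()
        )
        if is_opener:
            # Lazy match: take the first valid closer, scanning left to right.
            # Starting at i + 2 guarantees some text between the directives.
            j = i + 2
            while j < n:
                # Closer must not be preceded by whitespace.
                if line[j] == directive and not line[j - 1].isspace():
                    break
                j += 1
            if j < n:
                spans.append((i + 1, j))
                i = j + 1
                continue
        i += 1
    return spans


def strong_text(line: str) -> List[str]:
    """Convenience helper: the text of each strong span in `line`."""
    return [line[start:end] for start, end in find_strong_spans(line)]
```

Under these assumptions, strong_text("hello *world* and *more*") yields the two spans "world" and "more", while "not * strong*" and "not *strong *" yield no spans because the opening directive is followed by whitespace or the closing directive is preceded by it. A directive that fails the rules, such as the second asterisk in "*a *b*", is simply treated as text inside the enclosing span.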