mirror of https://github.com/moparisthebest/xeps synced 2024-11-24 18:22:24 -05:00

XEP-0372: Determine substring indexing convention

The `begin` index is inclusive and the `end` index is exclusive.
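
As a rough illustration only (not part of the XEP text), the convention can be sketched in Python, whose slice syntax is also begin-inclusive and end-exclusive. The helper name `referenced_substring` and the sample body are hypothetical. The sketch also assumes that a "character" is a Unicode code point, which is how Python indexes `str`; the XEP itself still leaves that definition as a TODO.

```python
def referenced_substring(body: str, begin: int, end: int) -> str:
    """Return the span of `body` covered by a reference.

    `begin` is inclusive and `end` is exclusive. Indexes count Unicode
    code points here, which is an assumption of this sketch.
    """
    if not 0 <= begin <= end <= len(body):
        raise ValueError("begin/end outside of message body")
    return body[begin:end]


# Hypothetical message body and reference ranges.
body = "Ask Juliet and Romeo"
juliet = (4, 10)   # covers "Juliet"
joiner = (10, 14)  # covers " and", directly adjacent to the first range

# 'end' minus 'begin' equals the length of the substring.
assert len(referenced_substring(body, *juliet)) == juliet[1] - juliet[0]

# For adjacent substrings, the first 'end' equals the second 'begin'.
assert juliet[1] == joiner[0]
```

With these values, `referenced_substring(body, 4, 10)` yields `"Juliet"`, and concatenating the two adjacent ranges gives the same text as the single range 4 to 14.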

Standards list discussion: https://mail.jabber.org/pipermail/standards/2020-December/thread.html#37954
JC Brand 2020-12-07 10:20:09 +01:00
parent 07b3993048
commit 7034575225

@@ -92,7 +92,11 @@
<section1 topic='Use Cases' anchor='usecases'>
<section2 topic='Generics' anchor='usecase_generics'>
-<p>References are provided in a 'reference' element of a message, with a namespace of 'urn:xmpp:reference:0'. The element MUST contain a type attribute denoting the type of the reference and a uri attribute of the thing that is referenced. It MAY contain begin, end and anchor elements. A begin attribute is used to mark the index in the body of the referring message of the first character (TODO: define character appropriately) in the reference, with 0 being the index of the first character. An end attribute is similarly used for the index of the last character of the reference. Where the reference is not a substring of the message body in the referring stanza, begin and end are not used. An anchor attribute is used when the referring message is not the one containing the reference element, and points to the previous message containing the reference (the referring message).</p>
+<p>References are provided in a 'reference' element of a message, with a namespace of 'urn:xmpp:reference:0'. The element MUST contain a 'type' attribute denoting the type of the reference and a 'uri' attribute of the thing that is referenced. It MAY contain 'begin', 'end' and 'anchor' elements.</p>
+<p>The 'begin' and 'end' attributes are indexes denoting the beginning and end of the referenced substring in the message body. The Dijkstra convention of ranges<note>Dijkstra convention of ranges &lt;<link url='https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html'>https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html</link>&gt;</note> is used, which means that 'begin' is inclusive and 'end' is exclusive. In other words, the 'begin' attribute is the index of the first character (TODO: define character appropriately) in the referenced substring, with 0 being the index of the first character in the body, and the 'end' attribute is one higher than the index of the last character in the substring.
+This convention has three main advantages. It matches subsequence indexing in various programming languages, 'end' minus 'begin' equals the length of the substring, and when two substrings are adjacent, the 'end' attribute of the first one matches the 'begin' attribute of the second one.
+Where the reference is not a substring of the message body in the referring stanza, 'begin' and 'end' are not used.</p>
+<p>An 'anchor' attribute is used when the referring message is not the one containing the reference element, and points to the previous message containing the reference (the referring message).</p>
<p>Note that the URIs of the reference and anchor do not need to refer to the same mechanism as that in which the reference was received. E.g., a service could listen for mentions in a MIX channels of users outside that channel, and send them messages containing a reference to let them know that they've been mentioned.</p>
</section2>
<section2 topic='Mentions' anchor='usecase_mention'>