Use codepoints for references

Kevin Smith 2020-12-09 09:23:01 +00:00
parent 2efb37a0ee
commit 1eb63f596d
1 changed file with 8 additions and 2 deletions

@@ -25,11 +25,17 @@
<supersededby/>
<shortname>Refs</shortname>
&ksmithisode;
<revision>
<version>0.5.0</version>
<date>2020-12-09</date>
<initials>kis</initials>
<remark>Specify counting should be of code points.</remark>
</revision>
<revision>
<version>0.4.0</version>
<date>2020-12-08</date>
<initials>gh/jcbrand</initials>
<remark>Specify that begin is inclusive, starts counting at zero, and that end is exclusive (Dijkstra-based convention)</remark>
<remark>Specify that begin is inclusive, starts counting at zero, and that end is exclusive (Dijkstra-based convention).</remark>
</revision>
<revision>
<version>0.3</version>
@@ -99,7 +105,7 @@
<section1 topic='Use Cases' anchor='usecases'>
<section2 topic='Generics' anchor='usecase_generics'>
<p>References are provided in a 'reference' element of a message, with a namespace of 'urn:xmpp:reference:0'. The element MUST contain a 'type' attribute denoting the type of the reference and a 'uri' attribute of the thing that is referenced. It MAY contain 'begin', 'end' and 'anchor' elements.</p>
<p>The 'begin' and 'end' attributes are indexes denoting the beginning and end of the referenced substring in the message body. The Dijkstra convention of ranges<note>Dijkstra convention of ranges &lt;<link url='https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html'>https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html</link>&gt;</note> is used, which means that 'begin' is inclusive and 'end' is exclusive. In other words, the 'begin' attribute is the index of the first character (TODO: define character appropriately) in the referenced substring, with 0 being the index of the first character in the body, and the 'end' attribute is one higher than the index of the last character in the substring.
<p>The 'begin' and 'end' attributes are indexes denoting the beginning and end of the referenced substring in the message body. The Dijkstra convention of ranges<note>Dijkstra convention of ranges &lt;<link url='https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html'>https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html</link>&gt;</note> is used, which means that 'begin' is inclusive and 'end' is exclusive. In other words, the 'begin' attribute is the index of the first unicode code point in the referenced substring, with 0 being the index of the first code point in the body, and the 'end' attribute is one higher than the index of the last code point in the substring.
This convention has three main advantages. It matches subsequence indexing in various programming languages, 'end' minus 'begin' equals the length of the substring, and when two substrings are adjacent, the 'end' attribute of the first one matches the 'begin' attribute of the second one.
Where the reference is not a substring of the message body in the referring stanza, 'begin' and 'end' are not used.</p>
<p>An 'anchor' attribute is used when the referring message is not the one containing the reference element, and points to the previous message containing the reference (the referring message).</p>
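
The practical effect of counting in code points can be sketched in a few lines. This is a minimal illustration, not part of the specification: the helper names `referenced_substring` and `utf16_offset` and the sample body are assumptions made up for the example. Python strings are indexed by code point, so 'begin' and 'end' map directly onto a slice. A client whose native strings count UTF-16 code units would need a conversion along the lines of the second helper.

```python
def referenced_substring(body: str, begin: int, end: int) -> str:
    """Return the code points of body in the half-open range [begin, end).

    Hypothetical helper: 'begin' is inclusive, 'end' is exclusive,
    and both count Unicode code points starting at 0.
    """
    if not 0 <= begin <= end <= len(body):
        raise ValueError("begin/end out of range for body")
    # Python str slicing already works on code points.
    return body[begin:end]


def utf16_offset(body: str, index: int) -> int:
    """Convert a code point index into a UTF-16 code unit offset.

    Hypothetical helper for clients whose string type counts
    UTF-16 code units rather than code points.
    """
    return len(body[:index].encode("utf-16-le")) // 2


# Sample body (an assumption for this sketch). The emoji is a single
# code point but occupies two UTF-16 code units.
body = "\U0001F600 hi @romeo"
begin, end = 5, 11

mention = referenced_substring(body, begin, end)
print(mention)                    # the referenced substring
print(end - begin)                # its length in code points
print(utf16_offset(body, begin))  # where it starts in UTF-16 code units
```

Here 'end' minus 'begin' equals the length of the referenced substring, as the convention promises, while the UTF-16 offset of the same position is one higher because of the emoji that precedes it.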