Use codepoints for references

master
Kevin Smith 2 years ago
parent
commit
1eb63f596d
xep-0372.xml (10 lines changed)

@@ -25,11 +25,17 @@
 <supersededby/>
 <shortname>Refs</shortname>
 &ksmithisode;
+<revision>
+<version>0.5.0</version>
+<date>2020-12-09</date>
+<initials>kis</initials>
+<remark>Specify counting should be of code points.</remark>
+</revision>
 <revision>
 <version>0.4.0</version>
 <date>2020-12-08</date>
 <initials>gh/jcbrand</initials>
-<remark>Specify that begin is inclusive, starts counting at zero, and that end is exclusive (Dijkstra-based convention)</remark>
+<remark>Specify that begin is inclusive, starts counting at zero, and that end is exclusive (Dijkstra-based convention).</remark>
 </revision>
 <revision>
 <version>0.3</version>
@@ -99,7 +105,7 @@
 <section1 topic='Use Cases' anchor='usecases'>
 <section2 topic='Generics' anchor='usecase_generics'>
 <p>References are provided in a 'reference' element of a message, with a namespace of 'urn:xmpp:reference:0'. The element MUST contain a 'type' attribute denoting the type of the reference and a 'uri' attribute of the thing that is referenced. It MAY contain 'begin', 'end' and 'anchor' elements.</p>
-<p>The 'begin' and 'end' attributes are indexes denoting the beginning and end of the referenced substring in the message body. The Dijkstra convention of ranges<note>Dijkstra convention of ranges &lt;<link url='https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html'>https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html</link>&gt;</note> is used, which means that 'begin' is inclusive and 'end' is exclusive. In other words, the 'begin' attribute is the index of the first character (TODO: define character appropriately) in the referenced substring, with 0 being the index of the first character in the body, and the 'end' attribute is one higher than the index of the last character in the substring.
+<p>The 'begin' and 'end' attributes are indexes denoting the beginning and end of the referenced substring in the message body. The Dijkstra convention of ranges<note>Dijkstra convention of ranges &lt;<link url='https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html'>https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html</link>&gt;</note> is used, which means that 'begin' is inclusive and 'end' is exclusive. In other words, the 'begin' attribute is the index of the first unicode code point in the referenced substring, with 0 being the index of the first code point in the body, and the 'end' attribute is one higher than the index of the last code point in the substring.
 This convention has three main advantages. It matches subsequence indexing in various programming languages, 'end' minus 'begin' equals the length of the substring, and when two substrings are adjacent, the 'end' attribute of the first one matches the 'begin' attribute of the second one.
 Where the reference is not a substring of the message body in the referring stanza, 'begin' and 'end' are not used.</p>
 <p>An 'anchor' attribute is used when the referring message is not the one containing the reference element, and points to the previous message containing the reference (the referring message).</p>
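
Note on the change above: a minimal sketch of what counting in code points with an inclusive 'begin' and exclusive 'end' might look like in practice. It assumes Python, whose str indexes are already code point offsets. The helper names reference_range and utf16_offset and the sample body are hypothetical illustrations, not part of XEP-0372.

```python
from typing import Tuple


def reference_range(body: str, mention: str) -> Tuple[int, int]:
    """Return (begin, end) for the first occurrence of `mention` in `body`.

    Offsets are counted in Unicode code points, starting at 0.
    'begin' is inclusive and 'end' is exclusive.
    """
    begin = body.index(mention)   # Python str offsets are code point offsets
    end = begin + len(mention)    # one past the last code point of the mention
    return begin, end


def utf16_offset(body: str, code_point_index: int) -> int:
    """Count the UTF-16 code units that precede the given code point index.

    Shown only for contrast: platforms that index strings by UTF-16 code
    units see a different number once the body contains a code point
    outside the Basic Multilingual Plane.
    """
    return len(body[:code_point_index].encode("utf-16-le")) // 2


# Hypothetical message body; U+1F600 is one code point but two UTF-16 units.
body = "\U0001F600 hi @romeo and @juliet"

begin, end = reference_range(body, "@romeo")
print(begin, end)               # 5 11
print(body[begin:end])          # @romeo
print(end - begin)              # 6, the length of the substring
print(utf16_offset(body, begin))  # 6, not 5, because of the emoji
```

The same slice notation covers the three advantages the paragraph lists: body[begin:end] matches the language's own subsequence indexing, end minus begin is the substring length, and an adjacent substring starts at the previous 'end'. An implementation whose native strings are indexed by UTF-16 code units would presumably need a conversion step like utf16_offset before slicing.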
