From 1eb63f596d5b109558d9c5562a18a8cdb2885ed8 Mon Sep 17 00:00:00 2001
From: Kevin Smith
Date: Wed, 9 Dec 2020 09:23:01 +0000
Subject: [PATCH] Use codepoints for references

---
 xep-0372.xml | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/xep-0372.xml b/xep-0372.xml
index 9192d98a..5e632eae 100644
--- a/xep-0372.xml
+++ b/xep-0372.xml
@@ -25,11 +25,17 @@
 Refs &ksmithisode;
+
+ 0.5.0
+ 2020-12-09
+ kis
+ Specify counting should be of code points.
+
 0.4.0
 2020-12-08
 gh/jcbrand
- Specify that begin is inclusive, starts counting at zero, and that end is exclusive (Dijkstra-based convention)
+ Specify that begin is inclusive, starts counting at zero, and that end is exclusive (Dijkstra-based convention).
 0.3
@@ -99,7 +105,7 @@

References are provided in a 'reference' element of a message, with a namespace of 'urn:xmpp:reference:0'. The element MUST contain a 'type' attribute denoting the type of the reference and a 'uri' attribute of the thing that is referenced. It MAY contain 'begin', 'end' and 'anchor' attributes.

-

The 'begin' and 'end' attributes are indexes denoting the beginning and end of the referenced substring in the message body. The Dijkstra convention of ranges <https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html> is used, which means that 'begin' is inclusive and 'end' is exclusive. In other words, the 'begin' attribute is the index of the first character (TODO: define character appropriately) in the referenced substring, with 0 being the index of the first character in the body, and the 'end' attribute is one higher than the index of the last character in the substring. +

The 'begin' and 'end' attributes are indexes denoting the beginning and end of the referenced substring in the message body. The Dijkstra convention of ranges <https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html> is used, which means that 'begin' is inclusive and 'end' is exclusive. In other words, the 'begin' attribute is the index of the first Unicode code point in the referenced substring, with 0 being the index of the first code point in the body, and the 'end' attribute is one higher than the index of the last code point in the substring. This convention has three main advantages: it matches subsequence indexing in various programming languages, 'end' minus 'begin' equals the length of the substring, and when two substrings are adjacent, the 'end' attribute of the first one matches the 'begin' attribute of the second one. Where the reference is not a substring of the message body in the referring stanza, 'begin' and 'end' are not used.

An 'anchor' attribute is used when the referring message is not the one containing the reference element, and points to the previous message containing the reference (the referring message).
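
The code point counting that this patch specifies for 'begin' and 'end' can be illustrated with a minimal Python sketch. It is not part of the XEP. The helper names `reference_span` and `utf16_offset` and the sample body are assumptions made up for illustration. Python strings are indexed by code point, so the slice `body[begin:end]` follows the same inclusive/exclusive convention directly.

```python
from typing import Tuple


def reference_span(body: str, substring: str) -> Tuple[int, int]:
    """Return (begin, end) for the first occurrence of substring in body.

    Hypothetical helper: indexes count Unicode code points, 'begin' is
    inclusive and 'end' is exclusive.
    """
    begin = body.find(substring)  # str.find works in code points
    if begin < 0:
        raise ValueError("substring not found in body")
    return begin, begin + len(substring)


def utf16_offset(body: str, index: int) -> int:
    """Convert a code point index into a UTF-16 code unit offset.

    Hypothetical helper for runtimes whose native strings are UTF-16;
    code points outside the Basic Multilingual Plane take two code units.
    """
    return len(body[:index].encode("utf-16-le")) // 2


# Sample body (an assumption): the emoji is one code point but two UTF-16 units.
body = "😀 Hi @romeo"
begin, end = reference_span(body, "@romeo")

assert body[begin:end] == "@romeo"        # slice matches the referenced text
assert end - begin == len("@romeo")       # 'end' minus 'begin' is the length
print(begin, end, utf16_offset(body, begin))
```

In this sample the code point index and the UTF-16 offset of the mention differ by one because of the leading emoji, which is the kind of mismatch that counting in code points is meant to rule out between implementations.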