From 60581844da3f5a488ff74cff244f5c120c3766c6 Mon Sep 17 00:00:00 2001
From: Marvin W
Date: Thu, 2 Jan 2020 14:05:07 +0100
Subject: [PATCH] charcount 0.2: Include feedback/clarifications from list

---
 xep-0426.xml | 44 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/xep-0426.xml b/xep-0426.xml
index 886490b9..90ab2b38 100644
--- a/xep-0426.xml
+++ b/xep-0426.xml
@@ -26,6 +26,12 @@
 xsf@larma.de
 jabber@larma.de
+
+ 0.2.0
+ 2020-01-02
+ mw
+ Include feedback/clarifications from list.
+
 0.1.0
 2019-12-26
@@ -43,12 +49,13 @@

 Various use-cases require the possibility to reference a part of the message
- body. This was realized by providing the offset of the beginning and end of
- the referenced region as offset from the beginning of the message. XEPs
- doing so include &xep0372; (and thereof derived &xep0385;) and &xep0394;.
+ body or a specific position in it. This was realized by providing offsets
+ from the beginning of the message (when referencing a region, those offsets
+ would define begin and end of a region). XEPs doing so include &xep0301;,
+ &xep0372; (and thereof derived &xep0385;) and &xep0394;.

- For this method, it is highly relevant to decide how to count "characters"
+ For these use-cases, it is highly relevant to decide how to count "characters"
 in a message body. While it at first sounds trivial, there are various ways of doing so in modern font systems. The purpose of this XEP is to define how characters shall be counted for the purpose of the aforementioned XEPs and
@@ -59,8 +66,17 @@

 When counting characters in a body, they shall be counted by their
- number of Unicode code points. Message bodies must not be
- normalized when counting code points.
+ number of Unicode code points. Message bodies must be used
+ as strings of XML characters (as defined in §2.2 of &w3xml;). This means
+ that no Unicode normalization may be performed before determining
+ offsets when receiving or after determining offsets when sending.
+ Any kind of further body processing shall be performed after counting (e.g.
+ /me· [the middle dot represents a space character and is not meant
+ to be taken verbatim] as described in &xep0245; is
+ always counted as 4 characters without considering the sending user's name).
+ All references (as defined in §4.1 of &w3xml;) must be counted by their
+ referenced character(s) and not by the characters of the reference itself
+ (e.g. the encoded &amp; is counted as one decoded character &).
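The counting rules in the hunk above can be illustrated with a minimal Python sketch. It assumes the receiver holds the serialized `<body/>` element and lets the standard library XML parser resolve references; `body_length` is a hypothetical helper, not something defined by the XEP. Python's `len()` on a `str` already counts code points, so the only real requirement is to skip normalization.

```python
import unicodedata
import xml.etree.ElementTree as ET


def body_length(body_xml: str) -> int:
    """Count the characters of a serialized <body/> as Unicode code points.

    Parsing resolves entity and character references (e.g. &amp;), so each
    one counts as the single character it stands for. No Unicode
    normalization is applied to the parsed text.
    """
    element = ET.fromstring(body_xml)
    text = element.text or ""
    # len() of a Python str is its number of code points.
    return len(text)


# The encoded "&amp;" counts as one decoded "&".
print(body_length("<body>You &amp; Me</body>"))  # 8

# "/me " counts as 4 characters, whatever the sender's name is.
print(body_length("<body>/me waves</body>"))  # 9

# "e" + COMBINING ACUTE ACCENT is 2 code points; NFC would merge them into 1,
# which is why normalizing before counting would shift every later offset.
print(body_length("<body>e\u0301</body>"))  # 2
print(len(unicodedata.normalize("NFC", "e\u0301")))  # 1
```

Counting on the parsed text rather than on the raw serialization is what makes a reference such as `&amp;` count as one character.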

@@ -77,6 +93,13 @@
+
+
+
+
+
+
+
@@ -180,6 +203,8 @@
 any usable output. Counting code points is widely supported in programming languages and can easily be implemented for encoded strings when not.
+ The &w3xml; standard also defines a character as a Unicode code point, so
+ counting code points is equivalent to counting XML characters.
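The remark that code point counting "can easily be implemented for encoded strings" can be made concrete with a short Python sketch that counts code points directly on UTF-8 bytes and UTF-16 code units, without decoding. It assumes well-formed input (neither counter validates anything); the function names are hypothetical, and `utf16_units` only exists to produce sample data.

```python
def count_code_points_utf8(data: bytes) -> int:
    """Count code points in well-formed UTF-8 without decoding.

    Each code point has exactly one byte that is not a continuation
    byte (continuation bytes have the bit pattern 10xxxxxx).
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def count_code_points_utf16(units: list[int]) -> int:
    """Count code points in well-formed UTF-16 code units.

    A code point outside the BMP is a surrogate pair; skipping the low
    (trailing) surrogates 0xDC00-0xDFFF leaves one unit per code point.
    """
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


def utf16_units(text: str) -> list[int]:
    """Split a string into its UTF-16 code units (sample-data helper)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


sample = "こんにちは世界 \U0001F600"
print(len(sample))                                     # 9
print(count_code_points_utf8(sample.encode("utf-8")))  # 9
print(len(utf16_units(sample)))                        # 10
print(count_code_points_utf16(utf16_units(sample)))    # 9
```

The emoji is one code point but two UTF-16 code units, which is exactly where a naive UTF-16 length diverges from the count this XEP requires.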

@@ -198,4 +223,11 @@

 This document requires no interaction with &REGISTRAR;.

+
+

+ The author would like to thank Guus der Kinderen, Ralph Meijer, Jonas
+ Schäfer, Lance Stout and others that provided feedback.

+
+
13 13
You & Me 8 8 8 8
こんにちは世界 7
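To see why the counting unit matters for the offsets described in the introduction (a single position, or the begin and end of a region), the Python sketch below resolves a region by code-point offsets and converts the same position into a UTF-16 offset, which is what a client counting UTF-16 code units would produce. Treating `end` as exclusive is an assumption made for illustration only, and `region` and `utf16_offset` are hypothetical helpers.

```python
def region(body: str, begin: int, end: int) -> str:
    """Return the part of `body` between two code-point offsets.

    Assumption for illustration: `end` is exclusive. Python strings are
    indexed by code point, so plain slicing matches code-point counting.
    """
    if not 0 <= begin <= end <= len(body):
        raise ValueError("offsets out of range")
    return body[begin:end]


def utf16_offset(body: str, code_point_offset: int) -> int:
    """Translate a code-point offset into a UTF-16 code-unit offset."""
    prefix = body[:code_point_offset]
    return len(prefix.encode("utf-16-le")) // 2


body = "\U0001F600 hello こんにちは世界"
start = body.index("hello")             # 2 (code points)
print(region(body, start, start + 5))   # hello
print(utf16_offset(body, start))        # 3: the emoji is a surrogate pair
```

With the emoji at the start, the code-point offset of "hello" is 2 while the UTF-16 offset is 3, so a sender and receiver using different units would disagree about where the referenced region begins.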