mirror of https://github.com/moparisthebest/xeps (synced 2024-11-26 19:22:15 -05:00)
charcount 0.2: Include feedback/clarifications from list
parent f97bc934d2
commit 60581844da
xep-0426.xml: 44 changed lines
@@ -26,6 +26,12 @@
     <email>xsf@larma.de</email>
     <jid>jabber@larma.de</jid>
   </author>
+  <revision>
+    <version>0.2.0</version>
+    <date>2020-01-02</date>
+    <initials>mw</initials>
+    <remark>Include feedback/clarifications from list.</remark>
+  </revision>
   <revision>
     <version>0.1.0</version>
     <date>2019-12-26</date>
@@ -43,12 +49,13 @@
 <section1 topic='Introduction' anchor='intro'>
   <p>
     Various use-cases require the possibility to reference a part of the message
-    body. This was realized by providing the offset of the beginning and end of
-    the referenced region as offset from the beginning of the message. XEPs
-    doing so include &xep0372; (and thereof derived &xep0385;) and &xep0394;.
+    body or a specific position in it. This was realized by providing offsets
+    from the beginning of the message (when referencing a region, those offsets
+    would define begin and end of a region). XEPs doing so include &xep0301;,
+    &xep0372; (and thereof derived &xep0385;) and &xep0394;.
   </p>
   <p>
-    For this method, it is highly relevant to decide how to count "characters"
+    For these use-cases, it is highly relevant to decide how to count "characters"
     in a message body. While it at first sounds trivial, there are various ways
     of doing so in modern font systems. The purpose of this XEP is to define how
     characters shall be counted for the purpose of the aforementioned XEPs and
@@ -59,8 +66,17 @@
 <section1 topic='Character counting' anchor='counting'>
   <p>
     When counting characters in a body, they shall be counted by their
-    <strong>number of Unicode code points</strong>. Message bodies must not be
-    normalized when counting code points.
+    <strong>number of Unicode code points</strong>. Message bodies must be used
+    as strings of the XML characters (as defined in §2.2 of &w3xml;). This means
+    that, i.e. no Unicode normalization may be performed before determining
+    offsets when receiving or after determining offsets when sending.
+    Any kind of further body processing shall be performed after counting (e.g.
+    <tt>/me·</tt><note>The middle dot is used to represent a space character
+    and is not meant to be taken verbatim.</note> as described in &xep0245; is
+    always counted as 4 characters without considering the sending user's name).
+    All references (as defined in §4.1 of &w3xml;) must be counted by their
+    referenced character(s) and not the reference characters (e.g. the encoded
+    <tt>&amp;amp;</tt> is counted as one decoded character <tt>&amp;</tt>).
   </p>
   <table caption='Example strings and their counted length'>
     <tr>
@@ -77,6 +93,13 @@
       <td>13</td>
       <td>13</td>
     </tr>
+    <tr>
+      <td>You &amp; Me</td>
+      <td>8</td>
+      <td>8</td>
+      <td>8</td>
+      <td>8</td>
+    </tr>
     <tr>
       <td>こんにちは世界</td>
       <td>7</td>
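Illustration only, not part of the commit: a minimal Python sketch of the counting rule introduced in the hunks above. It assumes the body arrives inside a `jabber:client` message stanza and relies on Python's `str` being a sequence of Unicode code points. The helper names `body_text` and `counted_length` are hypothetical and do not come from the XEP.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Assumption: the stanza uses the default client namespace.
BODY_TAG = "{jabber:client}body"


def body_text(stanza_xml: str) -> str:
    """Return the body as a string of XML characters.

    The XML parser resolves references, so '&amp;' in the stanza
    becomes the single character '&' in the returned text.
    """
    body = ET.fromstring(stanza_xml).find(BODY_TAG)
    return body.text if body is not None and body.text else ""


def counted_length(body: str) -> int:
    """Count Unicode code points, without any Unicode normalization."""
    return len(body)


stanza = "<message xmlns='jabber:client'><body>You &amp; Me</body></message>"
print(counted_length(body_text(stanza)))  # 8: the reference counts as one character
print(counted_length("こんにちは世界"))      # 7
print(counted_length("/me "))             # 4: counted before any /me processing

# 'e' followed by a combining acute accent is two code points as sent;
# normalizing to NFC first would change the count, so offsets must be
# computed on the unnormalized body.
decomposed = "e\u0301"
print(counted_length(decomposed))                                # 2
print(counted_length(unicodedata.normalize("NFC", decomposed)))  # 1
```

The last two lines show why normalizing before counting is ruled out: sender and receiver would disagree about offsets if only one of them normalized the body.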
@@ -180,6 +203,8 @@
     any usable output.
     Counting code points is widely supported in programming languages and can
     easily be implemented for encoded strings when not.
+    The &w3xml; standard also defines a character as a unicode code point, thus
+    counting code points is equivalent to counting XML characters.
   </p>
 </section1>
 
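Illustration only, not part of the commit: the hunk above states that counting code points can easily be implemented for encoded strings. A minimal Python sketch of that idea follows, assuming well-formed UTF-8 bytes or UTF-16 code units as input. The function names are hypothetical.

```python
import struct


def count_code_points_utf8(data: bytes) -> int:
    """Count code points in well-formed UTF-8.

    Every code point starts with exactly one byte that is not a
    continuation byte (continuation bytes look like 0b10xxxxxx).
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def count_code_points_utf16(units) -> int:
    """Count code points in well-formed UTF-16 code units.

    A surrogate pair encodes one code point, so low surrogates
    (0xDC00..0xDFFF) are skipped.
    """
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


def utf16_units(text: str):
    """Helper for the demo: encode a str into a tuple of UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return struct.unpack(f"<{len(raw) // 2}H", raw)


print(count_code_points_utf8("こんにちは世界".encode("utf-8")))  # 7 (21 bytes)
print(count_code_points_utf8("You & Me".encode("utf-8")))      # 8
print(count_code_points_utf16(utf16_units("\U0001F44D")))      # 1 (two code units)
```

Both functions make a single pass over the encoded data and never decode it, which suits environments whose native string type is byte-based or UTF-16-based.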
@@ -198,4 +223,11 @@
   <p>This document requires no interaction with &REGISTRAR;.</p>
 </section1>
 
+<section1 topic='Acknowledgements' anchor='acknowledgements'>
+  <p>
+    The author would like to thank Guus der Kinderen, Ralph Meijer, Jonas
+    Schäfer, Lance Stout and others that provided feedback.
+  </p>
+</section1>
+
 </xep>