mirror of
https://github.com/moparisthebest/xeps
synced 2024-11-23 01:32:22 -05:00
charcount 0.2: Include feedback/clarifications from list
parent f97bc934d2
commit 60581844da
xep-0426.xml: 44 lines changed (38 additions, 6 deletions)
@@ -26,6 +26,12 @@
 <email>xsf@larma.de</email>
 <jid>jabber@larma.de</jid>
 </author>
+<revision>
+<version>0.2.0</version>
+<date>2020-01-02</date>
+<initials>mw</initials>
+<remark>Include feedback/clarifications from list.</remark>
+</revision>
 <revision>
 <version>0.1.0</version>
 <date>2019-12-26</date>
@@ -43,12 +49,13 @@
 <section1 topic='Introduction' anchor='intro'>
 <p>
 Various use-cases require the possibility to reference a part of the message
-body. This was realized by providing the offset of the beginning and end of
-the referenced region as offset from the beginning of the message. XEPs
-doing so include &xep0372; (and thereof derived &xep0385;) and &xep0394;.
+body or a specific position in it. This was realized by providing offsets
+from the beginning of the message (when referencing a region, those offsets
+would define begin and end of a region). XEPs doing so include &xep0301;,
+&xep0372; (and thereof derived &xep0385;) and &xep0394;.
 </p>
 <p>
-For this method, it is highly relevant to decide how to count "characters"
+For these use-cases, it is highly relevant to decide how to count "characters"
 in a message body. While it at first sounds trivial, there are various ways
 of doing so in modern font systems. The purpose of this XEP is to define how
 characters shall be counted for the purpose of the aforementioned XEPs and
@@ -59,8 +66,17 @@
 <section1 topic='Character counting' anchor='counting'>
 <p>
 When counting characters in a body, they shall be counted by their
-<strong>number of Unicode code points</strong>. Message bodies must not be
-normalized when counting code points.
+<strong>number of Unicode code points</strong>. Message bodies must be used
+as strings of the XML characters (as defined in §2.2 of &w3xml;). This means
+that, i.e. no Unicode normalization may be performed before determining
+offsets when receiving or after determining offsets when sending.
+Any kind of further body processing shall be performed after counting (e.g.
+<tt>/me·</tt><note>The middle dot is used to represent a space character
+and is not meant to be taken verbatim.</note> as described in &xep0245; is
+always counted as 4 characters without considering the sending user's name).
+All references (as defined in §4.1 of &w3xml;) must be counted by their
+referenced character(s) and not the reference characters (e.g. the encoded
+<tt>&amp;amp;</tt> is counted as one decoded character <tt>&amp;</tt>).
 </p>
 <table caption='Example strings and their counted length'>
 <tr>
@@ -77,6 +93,13 @@
 <td>13</td>
 <td>13</td>
 </tr>
+<tr>
+<td>You &amp; Me</td>
+<td>8</td>
+<td>8</td>
+<td>8</td>
+<td>8</td>
+</tr>
 <tr>
 <td>こんにちは世界</td>
 <td>7</td>
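The counting rules introduced in this revision can be illustrated with a minimal Python sketch. It assumes the body is available as serialized XML, so that an XML parser (here the standard library's xml.etree.ElementTree) resolves references such as &amp;amp; before counting. The helper counted_length is hypothetical and not defined by the XEP. A real client would normally receive the already-decoded body text from its XMPP library.

```python
import unicodedata
import xml.etree.ElementTree as ET


def counted_length(body_xml: str) -> int:
    """Hypothetical helper: length of a <body> element's text in code points.

    Parsing the XML resolves references such as &amp; or &#233; to the
    character they stand for, so each counts as a single character.
    The decoded text is used as-is: no Unicode normalization is applied.
    """
    text = ET.fromstring(body_xml).text or ""
    # A Python 3 str is a sequence of code points, so len() counts them.
    return len(text)


# Row added to the example table: "You &amp; Me" decodes to "You & Me".
print(counted_length("<body>You &amp; Me</body>"))  # 8

# A numeric character reference also counts as one character.
print(counted_length("<body>caf&#233;</body>"))  # 4

# Normalizing would change the count (and shift every offset after it):
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
print(len(decomposed))                                # 5
print(len(unicodedata.normalize("NFC", decomposed)))  # 4

# "/me " (XEP-0245) stays part of the counted body: a reference to "waves"
# starts at offset 4, whatever name a client later displays instead.
body = ET.fromstring("<body>/me waves</body>").text
print(body.index("waves"))  # 4
```

The normalization example shows why the rule matters. If one side composes "e" plus U+0301 into a single precomposed "é", every offset after that point shifts by one relative to the other side.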
@@ -180,6 +203,8 @@
 any usable output.
 Counting code points is widely supported in programming languages and can
 easily be implemented for encoded strings when not.
+The &w3xml; standard also defines a character as a unicode code point, thus
+counting code points is equivalent to counting XML characters.
 </p>
 </section1>
 
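The hunk above states that counting code points "can easily be implemented for encoded strings" where a language has no built-in support. A minimal Python sketch of one way to do this, assuming well-formed UTF-8 or UTF-16 input, is to count only the bytes or code units that begin a code point. The function names are hypothetical and not part of the XEP.

```python
from typing import List


def code_points_in_utf8(data: bytes) -> int:
    """Count code points in well-formed UTF-8 bytes without decoding them.

    Each code point has exactly one lead byte; continuation bytes always
    match the bit pattern 10xxxxxx, so they are skipped.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def code_points_in_utf16(units: List[int]) -> int:
    """Count code points in well-formed UTF-16 code units.

    A code point above U+FFFF is a surrogate pair; skipping the trailing
    (low) surrogate, 0xDC00..0xDFFF, makes each pair count once.
    """
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


def utf16_units(text: str) -> List[int]:
    """Split a string into its UTF-16 code units (for demonstration only)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# 7 code points stored in 21 UTF-8 bytes.
hello = "こんにちは世界"
print(len(hello.encode("utf-8")), code_points_in_utf8(hello.encode("utf-8")))  # 21 7

# Thumbs-up plus skin-tone modifier: usually rendered as a single emoji,
# but 2 code points in 4 UTF-16 code units (two surrogate pairs).
thumbs = "\U0001F44D\U0001F3FD"
print(len(utf16_units(thumbs)), code_points_in_utf16(utf16_units(thumbs)))  # 4 2
```

Both functions trust their input. Malformed sequences, such as stray continuation bytes or unpaired surrogates, would have to be rejected or replaced before these counts are meaningful.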
@@ -198,4 +223,11 @@
 <p>This document requires no interaction with &REGISTRAR;.</p>
 </section1>
 
+<section1 topic='Acknowledgements' anchor='acknowledgements'>
+<p>
+The author would like to thank Guus der Kinderen, Ralph Meijer, Jonas
+Schäfer, Lance Stout and others that provided feedback.
+</p>
+</section1>
+
 </xep>
|