mirror of
https://github.com/moparisthebest/xeps
synced 2024-11-21 08:45:04 -05:00
XEP-0426: Character Counting 0.3.0
Added section about subsequences.
parent d79c8fafb6
commit 7a54054335
55
xep-0426.xml
@@ -13,19 +13,21 @@
</abstract>
&LEGALNOTICE;
<number>0426</number>
<status>Deferred</status>
<status>Experimental</status>
<type>Informational</type>
<sig>Standards</sig>
<approver>Council</approver>
<dependencies/>
<supersedes/>
<supersededby/>
<shortname>charcount</shortname>
<author>
<firstname>Marvin</firstname>
<surname>Wissfeld</surname>
<email>xsf@larma.de</email>
<jid>jabber@larma.de</jid>
</author>
&larma;
<revision>
<version>0.3.0</version>
<date>2022-12-27</date>
<initials>lmw</initials>
<remark>Added section about subsequences.</remark>
</revision>
<revision>
<version>0.2.0</version>
<date>2020-01-02</date>
@@ -165,9 +167,7 @@
across platforms and as such should be used with care.
</p>
</section2>
</section1>

<section1 topic='Rationale' anchor='rationale'>
<section2 topic='Rationale' anchor='rationale'>
<p>
The most obvious way of counting characters is to count them the way humans
would. This sounds easy when only having Western scripts in mind but becomes
@@ -206,6 +206,41 @@
The &w3xml; standard also defines a character as a Unicode code point, thus
counting code points is equivalent to counting XML characters.
</p>
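To make the difference between code points and code units concrete, here is a minimal Python sketch, assuming Python 3.3 or later (where `str` is a sequence of code points, per PEP 393). The helper names `count_characters` and `count_utf16_code_units` are hypothetical. The first counts characters as described above; the second, for contrast only, counts UTF-16 code units, which is what UTF-16 based string types such as Java's `String` or JavaScript strings report as their length.

```python
def count_characters(body: str) -> int:
    # A Python 3.3+ str is a sequence of Unicode code points, so len()
    # counts code points, i.e. XML characters.
    return len(body)


def count_utf16_code_units(body: str) -> int:
    # For contrast only: the length reported by UTF-16 based string types.
    # Code points above U+FFFF are encoded as two code units.
    return len(body.encode("utf-16-le")) // 2


body = "Hi \U0001F44B"  # U+1F44B lies outside the Basic Multilingual Plane
print(count_characters(body))        # 4
print(count_utf16_code_units(body))  # 5
```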
</section2>
</section1>

<section1 topic='Subsequences' anchor='subsequence'>
<p>
When referencing a subsequence of the characters of a message body, the
begin and end of the subsequence should be given as two numbers, denoting
the number of characters (counted as described above) before the beginning of
the subsequence and before the end of the subsequence, respectively. In other
words, the begin is the index of the first character in the subsequence and
the end is the index following the last character in the subsequence. This
means that if a subsequence covers the full body, its begin should be given as
0 and its end should be given as the number of characters in the body.
</p>
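A minimal Python sketch of how a client might compute begin and end for a piece of text it wants to reference, assuming Python 3.3 or later so that `str` indices are code points. `find_subsequence` is a hypothetical helper; it simply takes the first match reported by `str.find`.

```python
from typing import Optional, Tuple


def find_subsequence(body: str, text: str) -> Optional[Tuple[int, int]]:
    # begin: number of characters (code points) before the first character
    # of the match.
    begin = body.find(text)
    if begin < 0:
        return None
    # end: number of characters before the end of the match, i.e. the index
    # following its last character.
    end = begin + len(text)
    return begin, end


body = "Hello \U0001F30D world"
print(find_subsequence(body, "world"))  # (8, 13)
print(find_subsequence(body, body))     # (0, 13)
```

Referencing the whole body yields a begin of 0 and an end equal to the number of characters in the body, as the convention requires.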

<section2 topic='Developer notes' anchor='subsequence-developer-notes'>
<p>
Subsequence indexing in various programming languages matches the convention
described here. When using Python, the subsequence created by
<tt>body[begin:end]</tt> matches all requirements of this document.
</p>
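Python slicing only lines up with this document when begin and end are counted in code points. Indices produced on platforms whose strings are indexed by UTF-16 code units (for example Java or JavaScript) would need converting first. A minimal sketch of such a conversion follows; `utf16_index_to_code_point_index` is a hypothetical helper, and rounding an index that falls inside a surrogate pair up to the next character is an arbitrary choice made here, not something this document specifies.

```python
def utf16_index_to_code_point_index(body: str, utf16_index: int) -> int:
    """Convert an index counted in UTF-16 code units into an index counted
    in code points (the characters used by this document)."""
    units = 0
    for cp_index, ch in enumerate(body):
        if units >= utf16_index:
            return cp_index
        # Code points above U+FFFF are encoded as a surrogate pair (two units).
        units += 2 if ord(ch) > 0xFFFF else 1
    # Index at (or past) the end of the body.
    return len(body)


body = "Hello \U0001F30D world"
# In UTF-16, "world" spans units 9 to 14; in code points it spans 8 to 13.
begin = utf16_index_to_code_point_index(body, 9)
end = utf16_index_to_code_point_index(body, 14)
print(begin, end, body[begin:end])  # 8 13 world
```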
<p>
Some programming languages define subsequences by offset and length. In
this case, begin matches the offset while end minus begin matches the length.
</p>
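Converting between begin/end and offset/length is plain arithmetic. A minimal Python sketch with hypothetical helper names:

```python
from typing import Tuple


def to_offset_length(begin: int, end: int) -> Tuple[int, int]:
    # The offset is the begin; the length is end minus begin.
    return begin, end - begin


def from_offset_length(offset: int, length: int) -> Tuple[int, int]:
    # The begin is the offset; the end is the index following the last character.
    return offset, offset + length


body = "Hello \U0001F30D world"
offset, length = to_offset_length(8, 13)
print(offset, length)                      # 8 5
print(body[offset:offset + length])        # world
print(from_offset_length(offset, length))  # (8, 13)
```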
</section2>
<section2 topic='Rationale' anchor='subsequence-rationale'>
<p>
The convention for subsequences was chosen because it has three main
advantages: it matches subsequence indexing in various programming
languages, end minus begin of a subsequence equals the length of the
subsequence, and the end of the first of two adjacent subsequences equals the
begin of the second one.
</p>
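The second and third advantages can be checked directly: splitting a body at any index k yields two adjacent subsequences whose lengths add up to the body's length and whose slices concatenate back to the body. A small Python sketch; `split_at` is a hypothetical helper.

```python
from typing import Tuple


def split_at(body: str, k: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    # Two adjacent subsequences: the end of the first equals the begin of the second.
    first = (0, k)
    second = (k, len(body))
    return first, second


body = "Hello \U0001F30D world"
first, second = split_at(body, 7)
print(first, second)  # (0, 7) (7, 13)
# End minus begin equals the length of each subsequence.
assert first[1] - first[0] == len(body[first[0]:first[1]])
# Concatenating the adjacent subsequences yields the whole body.
assert body[first[0]:first[1]] + body[second[0]:second[1]] == body
```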
</section2>
</section1>

<section1 topic='Glossary' anchor='glossary'>