mirror of
https://github.com/moparisthebest/xeps
synced 2024-11-24 10:12:19 -05:00
XEP-0426: Character Counting 0.3.0
Added section about subsequences.
parent
d79c8fafb6
commit
7a54054335
55
xep-0426.xml
@ -13,19 +13,21 @@
|
|||||||
</abstract>
|
</abstract>
|
||||||
&LEGALNOTICE;
|
&LEGALNOTICE;
|
||||||
<number>0426</number>
|
<number>0426</number>
|
||||||
<status>Deferred</status>
|
<status>Experimental</status>
|
||||||
<type>Informational</type>
|
<type>Informational</type>
|
||||||
<sig>Standards</sig>
|
<sig>Standards</sig>
|
||||||
|
<approver>Council</approver>
|
||||||
<dependencies/>
|
<dependencies/>
|
||||||
<supersedes/>
|
<supersedes/>
|
||||||
<supersededby/>
|
<supersededby/>
|
||||||
<shortname>charcount</shortname>
|
<shortname>charcount</shortname>
|
||||||
<author>
|
&larma;
|
||||||
<firstname>Marvin</firstname>
|
<revision>
|
||||||
<surname>Wissfeld</surname>
|
<version>0.3.0</version>
|
||||||
<email>xsf@larma.de</email>
|
<date>2022-12-27</date>
|
||||||
<jid>jabber@larma.de</jid>
|
<initials>lmw</initials>
|
||||||
</author>
|
<remark>Added section about subsequences.</remark>
|
||||||
|
</revision>
|
||||||
<revision>
|
<revision>
|
||||||
<version>0.2.0</version>
|
<version>0.2.0</version>
|
||||||
<date>2020-01-02</date>
|
<date>2020-01-02</date>
|
||||||
@ -165,9 +167,7 @@
|
|||||||
across platforms and as such should be used with care.
|
across platforms and as such should be used with care.
|
||||||
</p>
|
</p>
|
||||||
</section2>
|
</section2>
|
||||||
</section1>
|
<section2 topic='Rationale' anchor='rationale'>
|
||||||
|
|
||||||
<section1 topic='Rationale' anchor='rationale'>
|
|
||||||
<p>
|
<p>
|
||||||
The most obvious way of counting characters is to count them how humans
|
The most obvious way of counting characters is to count them how humans
|
||||||
would. This sounds easy when only having western scripts in mind but becomes
|
would. This sounds easy when only having western scripts in mind but becomes
|
||||||
@ -206,6 +206,41 @@
|
|||||||
The &w3xml; standard also defines a character as a unicode code point, thus
|
The &w3xml; standard also defines a character as a unicode code point, thus
|
||||||
counting code points is equivalent to counting XML characters.
|
counting code points is equivalent to counting XML characters.
|
||||||
</p>
|
</p>
|
||||||
|
</section2>
|
||||||
|
</section1>
|
||||||
|
|
||||||
|
<section1 topic='Subsequences' anchor='subsequence'>
|
||||||
|
<p>
|
||||||
|
When referencing a subsequence of the characters of a message body, the
|
||||||
|
begin and end of the subsequence should be provided by two numbers, denoting
|
||||||
|
the number of characters (counted as described above) before the begin of the
|
||||||
|
subsequence or before the end of the subsequence, respectively. In other
|
||||||
|
words, the begin is the index of the first character in the subsequence and
|
||||||
|
the end is the index following the last character in the subsequence. That
|
||||||
|
means, if a subsequence covers the full body, its begin should be given as
|
||||||
|
0 and its end should be given as the number of characters in the body.
|
||||||
|
</p>
|
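The begin/end convention in the paragraph above can be sketched in Python, whose `str` type is indexed by Unicode code points. This is a minimal illustration, assuming characters are counted as code points (as the Rationale text indicates); the helper name `subsequence` and the sample body are hypothetical and not part of the XEP.

```python
def subsequence(body: str, begin: int, end: int) -> str:
    """Return the characters of body from index begin up to, but not including, end.

    begin is the number of characters before the first character of the
    subsequence; end is the number of characters before its end.
    """
    if not 0 <= begin <= end <= len(body):
        raise ValueError("begin and end must satisfy 0 <= begin <= end <= len(body)")
    return body[begin:end]


# Sample body: "Hi ", a waving hand (U+1F44B), a skin tone modifier (U+1F3FD), "!".
# The emoji is a single glyph for a human reader but two code points.
body = "Hi \U0001F44B\U0001F3FD!"

full = subsequence(body, 0, len(body))  # begin 0, end = number of characters
emoji = subsequence(body, 3, 5)         # the two code points that form the emoji
```

A subsequence covering the full body uses begin 0 and end equal to the character count, and two adjacent subsequences such as (0, 3) and (3, 6) concatenate back to the original body.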
||||||
|
|
||||||
|
<section2 topic='Developer notes' anchor='subsequence-developer-notes'>
|
||||||
|
<p>
|
||||||
|
Subsequence indexing in various programming languages matches the convention
|
||||||
|
described here. When using Python, the subsequence created by
|
||||||
|
<tt>body[begin:end]</tt> matches all requirements of this document.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Some programming languages define subsequences by offset and length. In
|
||||||
|
this case, begin matches the offset while end-begin matches the length.
|
||||||
|
</p>
|
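For APIs that take an offset and a length, the conversion described above is a one-line calculation in each direction. The following sketch uses hypothetical helper names (`to_offset_length`, `from_offset_length`) that do not come from the XEP.

```python
def to_offset_length(begin: int, end: int) -> tuple[int, int]:
    """Convert a (begin, end) pair into (offset, length)."""
    if not 0 <= begin <= end:
        raise ValueError("expected 0 <= begin <= end")
    return begin, end - begin


def from_offset_length(offset: int, length: int) -> tuple[int, int]:
    """Convert an (offset, length) pair back into (begin, end)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    return offset, offset + length
```

Both values are character counts in the sense of this document, so the conversion is only valid when the target API counts characters the same way.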
||||||
|
</section2>
|
||||||
|
<section2 topic='Rationale' anchor='subsequence-rationale'>
|
||||||
|
<p>
|
||||||
|
The convention for subsequences was chosen because it has three main
|
||||||
|
advantages: It matches subsequence indexing in various programming
|
||||||
|
languages, end minus begin of a subsequence equals the length of the
|
||||||
|
subsequence, and the end of the first of two adjacent subsequences matches the
|
||||||
|
begin of the second one.
|
||||||
|
</p>
|
||||||
|
</section2>
|
||||||
</section1>
|
</section1>
|
||||||
|
|
||||||
<section1 topic='Glossary' anchor='glossary'>
|
<section1 topic='Glossary' anchor='glossary'>
|
||||||
|