mirror of https://github.com/moparisthebest/xeps synced 2024-11-24 10:12:19 -05:00

XEP-0426: Character Counting 0.3.0

Added section about subsequences.
Marvin W 2022-12-27 22:12:17 +01:00
parent d79c8fafb6
commit 7a54054335
GPG Key ID: 072E9235DB996F2A


@ -13,19 +13,21 @@
</abstract> </abstract>
&LEGALNOTICE; &LEGALNOTICE;
<number>0426</number> <number>0426</number>
<status>Deferred</status> <status>Experimental</status>
<type>Informational</type> <type>Informational</type>
<sig>Standards</sig> <sig>Standards</sig>
<approver>Council</approver>
<dependencies/> <dependencies/>
<supersedes/> <supersedes/>
<supersededby/> <supersededby/>
<shortname>charcount</shortname> <shortname>charcount</shortname>
<author> &larma;
<firstname>Marvin</firstname> <revision>
<surname>Wissfeld</surname> <version>0.3.0</version>
<email>xsf@larma.de</email> <date>2022-12-27</date>
<jid>jabber@larma.de</jid> <initials>lmw</initials>
</author> <remark>Added section about subsequences.</remark>
</revision>
<revision> <revision>
<version>0.2.0</version> <version>0.2.0</version>
<date>2020-01-02</date> <date>2020-01-02</date>
@ -165,9 +167,7 @@
across platforms and as such should be used with care. across platforms and as such should be used with care.
</p> </p>
</section2> </section2>
</section1> <section2 topic='Rationale' anchor='rationale'>
<section1 topic='Rationale' anchor='rationale'>
<p> <p>
The most obvious way of counting characters is to count them how humans The most obvious way of counting characters is to count them how humans
would. This sounds easy when only having western scripts in mind but becomes would. This sounds easy when only having western scripts in mind but becomes
@ -206,6 +206,41 @@
The &w3xml; standard also defines a character as a unicode code point, thus The &w3xml; standard also defines a character as a unicode code point, thus
counting code points is equivalent to counting XML characters. counting code points is equivalent to counting XML characters.
</p> </p>
</section2>
</section1>
<section1 topic='Subsequences' anchor='subsequence'>
<p>
When referencing a subsequence of the characters of a message body, the
begin and end of the subsequence should be provided by two numbers, denoting
the number of characters (counted as described above) before the begin of the
subsequence or before the end of the subsequence, respectively. In other
words, the begin is the index of the first character in the subsequence and
the end is the index following the last character in the subsequence. That
means that if a subsequence covers the full body, its begin should be given as
0 and its end should be given as the number of characters in the body.
</p>
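The following Python sketch illustrates the convention above. It relies on the fact that Python 3 <tt>str</tt> indices and <tt>len()</tt> count Unicode code points, which is how this document counts characters. The helper names <tt>subsequence_bounds</tt> and <tt>full_body_bounds</tt> are hypothetical and not defined by this specification.

```python
from typing import Tuple


def subsequence_bounds(body: str, fragment: str) -> Tuple[int, int]:
    """Return (begin, end) of the first occurrence of fragment in body.

    Python str indices count Unicode code points, matching the character
    counting used in this document. Raises ValueError if fragment is absent.
    """
    begin = body.index(fragment)  # number of characters before the subsequence
    end = begin + len(fragment)   # index following its last character
    return begin, end


def full_body_bounds(body: str) -> Tuple[int, int]:
    # A subsequence covering the whole body begins at 0 and ends at
    # the number of characters in the body.
    return 0, len(body)


body = "Hi \U0001F44B there"  # U+1F44B WAVING HAND SIGN is a single code point
print(subsequence_bounds(body, "there"))   # (5, 10)
print(full_body_bounds(body))              # (0, 10)
print(len(body.encode("utf-16-le")) // 2)  # 11 UTF-16 code units, not characters
```

The last line shows why the counting unit matters: the emoji occupies two UTF-16 code units but counts as one character here.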
<section2 topic='Developer notes' anchor='subsequence-developer-notes'>
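<p>
Subsequence indexing in various programming languages matches the convention
described here, although the unit being indexed may differ: Java and
JavaScript, for example, index strings by UTF-16 code units rather than by
code points. When using Python, the subsequence created by
<tt>body[begin:end]</tt> matches all requirements of this document.
</p>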
<p>
Some programming languages define subsequences by offset and length. In
this case, the offset equals begin and the length equals end minus begin.
</p>
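A minimal Python sketch of the conversion between begin/end and offset/length follows. The function names <tt>to_offset_length</tt> and <tt>from_offset_length</tt> are hypothetical, chosen only for illustration.

```python
from typing import Tuple


def to_offset_length(begin: int, end: int) -> Tuple[int, int]:
    # The offset is the begin; the length is end minus begin.
    if not 0 <= begin <= end:
        raise ValueError("expected 0 <= begin <= end")
    return begin, end - begin


def from_offset_length(offset: int, length: int) -> Tuple[int, int]:
    # Inverse conversion: the end is the offset plus the length.
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    return offset, offset + length


body = "Hi \U0001F44B there"
offset, length = to_offset_length(5, 10)
print(offset, length)                      # 5 5
print(body[offset:offset + length])        # there
print(from_offset_length(offset, length))  # (5, 10)
```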
</section2>
<section2 topic='Rationale' anchor='subsequence-rationale'>
<p>
The convention for subsequences was chosen because it has three main
advantages: it matches subsequence indexing in various programming
languages, end minus begin of a subsequence equals the length of the
subsequence, and the end of the first of two adjacent subsequences equals the
begin of the second one.
</p>
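The properties listed in the rationale can be checked with a short Python sketch. The helper <tt>split_at</tt> is hypothetical and only serves to produce two adjacent subsequences that together cover the body.

```python
from typing import Tuple

Bounds = Tuple[int, int]


def split_at(body: str, index: int) -> Tuple[Bounds, Bounds]:
    """Split the full body into two adjacent subsequences at index."""
    return (0, index), (index, len(body))


body = "Hello, world"
first, second = split_at(body, 5)
print(first, second)  # (0, 5) (5, 12)

# End minus begin equals the length of each subsequence.
assert first[1] - first[0] == len(body[first[0]:first[1]])
assert second[1] - second[0] == len(body[second[0]:second[1]])
# The end of the first equals the begin of the second, so the two
# subsequences neither overlap nor leave a gap.
assert first[1] == second[0]
assert body[first[0]:first[1]] + body[second[0]:second[1]] == body
```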
</section2>
</section1> </section1>
<section1 topic='Glossary' anchor='glossary'> <section1 topic='Glossary' anchor='glossary'>