The input to this algorithm is a &xep0030; disco#info <query/> response. The output is an octet string which can be used as input to a hash function or an error.
-General remarks: The algorithm strongly distinguishes between character data (sequences of Unicode code points) and octet strings (sequences of 8-bit bytes). Whenever character data is encoded to octet strings in the following algorithm, the UTF-8 encoding as specified in &rfc3629; is used. Whenever octet strings are sorted in the following algorithm, the i;octet collation as specified in &rfc4790; is used.
+General remarks:
+The algorithm strongly distinguishes between character data (sequences of Unicode code points) and octet strings (sequences of 8-bit bytes). Whenever character data is encoded to octet strings in the following algorithm, the UTF-8 as specified in &rfc3629; encoding is used. Whenever octet strings are sorted in the following algorithm, the i;octet collation as specified in &rfc4790; is used.
The algorithm uses the xml:lang attribute. Implementations must take implicit values for the xml:lang attribute into account, for example those inherited from the disco#info or the IQ element.
Instead of issuing a &xep0030; disco#info <query/> with absent 'node' attribute to a target entity, an entity MAY use a &hashcache; to obtain the response. To look up the disco#info response in the &hashcache;, an entity MUST use a hash from the &hashset; which was most recently received from the entity to which the <query/> would have been sent otherwise. If none of the most recently received &hashes; are found in the &hashcache;, the entity MUST fall back to sending the request.
An entity MUST NOT use &hashes; which were not included in the most recent &hashset; received from the target entity.
An entity MAY use external data sources to fill the &hashcache;.
+An entity MUST ensure that implicit values for xml:lang attributes is preserved when disco#info data is cached. This can for example happen by making the implicit values explicit in the storage.
This was an issue with &xep0115; and has been addressed with a new algorithm for generating the hash function input which keeps the structural information of the disco#info input.
An entity MUST NOT ever use disco#info which has not been verified to belong to a &hash; obtained from a cache using that &hash;. Using cache contents from a trusted source (at the discretion of the entity) counts as verifying.
A malicious entity could send a large amount of &hashsets; in short intervals, while making sure that it provides matching disco#info responses. If a &procent; uses caching, this can overflow or thrash the caches. &procents; should be aware of this risk and apply proper rate-limiting for processing &hashsets;. To reduce the attack surface, an entity MAY choose to not cache &hashes; obtained from entities not in its roster.
+As mentioned earlier, when storing disco#info data in a cache for later retrieval, implementations MUST ensure that implicit values for xml:lang attributes are reconstructed correctly when the disco#info is restored.