From b82a5030da588251f9237d9aac155ccbe096437d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jonas=20Sch=C3=A4fer?=
Date: Fri, 21 Feb 2020 16:11:25 +0100
Subject: [PATCH] ProtoXEP: Extended Channel Search

---
 inbox/extended-channel-search.xml | 535 ++++++++++++++++++++++++++++++
 1 file changed, 535 insertions(+)
 create mode 100644 inbox/extended-channel-search.xml

diff --git a/inbox/extended-channel-search.xml b/inbox/extended-channel-search.xml
new file mode 100644
index 00000000..0d2ea18f
--- /dev/null
+++ b/inbox/extended-channel-search.xml
@@ -0,0 +1,535 @@
+ + +%ents; + + + + + + + + + + + + + + +]> + + + + + +
+ Extended Channel Search + This specification provides a standardised protocol to search for public group chats. In contrast to XEP-0030 (Service Discovery), it works across multiple domains and in contrast to XEP-0055 (Jabber Search) it more clearly handles extensibility. + &LEGALNOTICE; + xxxx + ProtoXEP + Standards Track + Standards + Council + + XMPP Core + XEP-0004 + XEP-0030 + XEP-0059 + XEP-0068 + + + + ECS + &jonaswielicki; + + 0.0.1 + 2020-02-19 + jsc +

First draft.

+
+
+ + + +

The XMPP instant messaging ecosystem is a federated one. This leads to many different group chat service providers existing and interesting public group chats being spread out across them. In order to provide users with a way to find public group chats (henceforth called channels) of interest to them, there needs to be a way to execute a cross-domain search based on keywords.

+

The protocol in this document provides a general and extensible search for channels across different domains and service types (e.g. MUC vs. MIX). It provides meta-information right in the result set, which allows searching entities to skip additional &xep0030; queries against the channels themselves.

+

The protocol is not only useful for cross-domain search, but also as an alternative to using a &xep0030; disco#items request followed by many disco#info requests on a group chat service.

+
+ + + +

The protocol:

+
    +
  • must work without state on the server side. This is to allow stateless proxies to be used for pseudonymisation or anonymisation.
  • must allow searching the list using a free-text keyword-based search.
  • must allow future extensions to the search query and the result.
  • must allow retrieving the entire data set (although, for clarification, an operator may choose to turn this off).
  • must use completely machine-readable and machine-writable data.
+
+ + + +
+ +
Channel
+
A public group chat hosted on a &gcs;. This can either be a &xep0045; room, a &xep0369; channel or something else entirely.
+
+ +
&gcs;
+
An entity or deployment which offers multi-user chat relay, such as by &xep0045; or &xep0369;.
+
+ +
&searchservice;
+
An entity which offers the service described in this specification.
+
+ +
&searcher;
+
An entity which requests information from the &searchservice;.
+
+
+
+ + + + +

An entity announces that it supports serving search queries by publishing the &searchns; feature via &xep0030;:

+ + + + + + + +]]> +
+ + +

To execute a keyword search, the &searcher; MAY first request the search form from the &searchservice;. Alternatively, the &searcher; MAY use the form specified in this document with only the fields which must be implemented by the &searchservice;.

+

After obtaining the search form, the &searcher; completes the form and sends it back to the &searchservice;. The &searchservice; replies with a &xep0059; paginated list of results.

+

The search form is a form conforming to &xep0068;.

+ +

To request the search form, an entity sends an empty search element qualified by the &searchns; namespace:

+ + + +]]> +

The &searchservice; replies with the form as in the following example:

+ + + + + ]]>&paramsns; + + + + + true + + + true + + + true + + + 1 + + + xep-0045 + + + + + ]]>&ordernusers; + + + + + + +]]> +

Note: Not all of the fields shown above are mandatory to implement. See Search Form Fields for a list of fields and their implementation status.

+
+ + +

To request the result list for a given search query, a &searcher; submits a form with the &paramsns; FORM_TYPE. The &searcher; MAY include a &xep0059; <set/> element inside the <search/> element. In either case, the &searchservice; may reply with an RSM-paginated result and the &searcher; MUST be able to process that.

+

If a &searcher; composes a search request using a search form template obtained from the &searchservice;, it MAY omit all fields it does not know or whose value, as supplied by the &searchservice;, it does not change.

+ + + + 5 + + + + ]]>&paramsns; + + + xmpp.org + + + {]]>&orderns; + + + + + + +]]> +

The &searchservice; calculates the result, paginates it according to its own policy (possibly taking into account the pagination request from the client) and returns a single result page in the response IQ.

+ + + + commteam + 10 + + + + + XMPP Service Operators + Discussion venue for operators of federated XMPP services + 43 + + + + opaque-string-1 + opaque-string-2 + 5 + + + +]]> +

The result items are &itemel; elements wrapped in a &resultel; element qualified by the &searchns; namespace. The schema, along with extension rules, is described in Result Item Format.

+

To obtain further results, the &searcher; re-submits the identical form with an appropriate &xep0059; pagination request, using the information provided by the &searchservice; in the result <set/> element.
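The re-submission loop described above can be sketched as follows. This is a minimal illustration, not part of the protocol: `fetch_page` is a hypothetical callable standing in for "send the form plus an RSM request and return the item addresses and the opaque last value", and `fake_service` is an in-memory stand-in used only to make the sketch runnable.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: (item addresses, last). "last" is the opaque
# <last/> value from the RSM <set/> in the response, or None if absent.
Page = tuple[list[str], Optional[str]]


def iter_results(fetch_page: Callable[[dict, Optional[str]], Page],
                 form: dict, max_pages: int = 100) -> Iterator[str]:
    """Yield result addresses, re-submitting the identical form per page."""
    after: Optional[str] = None
    for _ in range(max_pages):  # hard stop in case a service never ends
        items, last = fetch_page(form, after)
        yield from items
        if not items or last is None:
            return
        after = last  # used as the RSM <after/> value of the next request


def fake_service(form: dict, after: Optional[str]) -> Page:
    """Stand-in for a Search Service: five channels, two per page."""
    data = [f"room{i}@muc.example" for i in range(5)]
    start = 0 if after is None else data.index(after) + 1
    page = data[start:start + 2]
    return page, (page[-1] if page else None)


addresses = list(iter_results(fake_service, {"q": "xmpp"}))
```

The form itself is never modified between requests; only the pagination cursor changes, which is what allows the service to remain stateless.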

+
+ +

If the sort key requested by the &searcher; is not supported by the &searchservice;, the &searchservice; MUST reply with a <feature-not-implemented/> error of type modify and include the <invalid-sort-key/> application-defined condition:

+ + + + + + +]]> +
+ +

If the q field was supplied by the &searcher; and the contents of the q field did not yield any term suitable for search, the &searchservice; MUST reply with a <bad-request/> error and the <invalid-search-terms/> application-defined condition. The error type MUST be modify.

+

The server SHOULD include a human-readable description of the constraints for search terms which were not met in the <text/> element of the error.

+ + + + + Search terms must have at least three characters. + + + + +]]> +
+ +

If the &searchservice; cannot or (by policy) does not want to process the request because of an excessive number of requests (whether counted per requesting entity, per domain or by any other criterion), it MUST reply with a <resource-constraint/> error with type wait.

+

The application-defined error condition <rate-limit/> MUST be included. This error condition has a RECOMMENDED attribute, retry-after, which provides the number of seconds after which the &searcher; MAY retry the request.

+

The &searchservice; MAY include a human-readable description of the rate limit and when to retry in the <text/> element.

+ + + + + + +]]> +

Note: See also the rate-limiting related business rules for &searcher; entities.

+
+ +

If the &searchservice; cannot or (by policy) does not want to allow a &searcher; to retrieve the entire database of channels, it MUST reject queries which set the all field to true with an error as follows:

+
    +
  • If the feature is generally disabled: <not-allowed/> with type cancel
  • If the feature is not offered to the &searcher; based on its identity: <forbidden/> with type auth
+

In all cases, the application defined condition <full-set-retrieval-rejected/> MUST be included.

+

The &searchservice; MAY include a human-readable description of the restrictions around full-list retrieval.

+

For example, if the full set retrieval had been disabled service-wide by configuration, the &searchservice; would reply with the following error:

+ + + + + Retrieval of the full database is not allowed. + + + + +]]> +
+ +

If the &searcher; provides form fields which are conflicting, the &searchservice; MUST reply with a <bad-request/> error of type modify. In addition, the <conflicting-fields/> application specific condition MUST be included.

+

Conflicting field values are those which fundamentally cannot be used in the same query in such a way that the definition of their function is still adhered to. For example, q restricts the results by keywords, but all specifies that all entries are returned.

+

The &searchservice; SHOULD include a human-readable description of the conflicting fields, referring to the label values of the involved fields.

+

The <conflicting-fields/> element MAY have one or more <var/> child elements which refer to var values of the submitted fields. At least one of the referenced fields must be changed in order for a follow-up query to succeed.

+

For example, if the &searcher; has set all to true and provided a query in q, the &searchservice; would reply with an error similar to the following:

+ + + + + Cannot both return all results and search by keywords. + + + all + q + + + +]]> +
+ +

If no field which would define a result set and which is understood by the &searchservice; is present, it MUST reply with a <bad-request/> error of type cancel.

+

In addition, the <no-search-conditions/> application defined condition MUST be included.

+ + + + + + +]]> +

An example of this situation would be a form where neither q nor all is given.
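How a service might tell the last two error cases apart can be sketched as below. This is an illustration under assumptions: the submitted form is reduced to a plain dictionary of var to value, boolean fields follow the &xep0004; convention of "1"/"true" for true, and the helper names `is_true` and `check_conditions` are hypothetical. The separate <invalid-search-terms/> case (a q value that yields no usable terms) is not covered here.

```python
from typing import Optional


def is_true(value: Optional[str]) -> bool:
    """Data form boolean fields carry "1" or "true" for true."""
    return value in ("1", "true")


def check_conditions(fields: dict[str, str]) -> Optional[str]:
    """Return the application-defined condition to report, or None if OK."""
    has_q = "q" in fields
    wants_all = is_true(fields.get("all"))
    if has_q and wants_all:
        # q restricts by keyword while all asks for everything
        return "conflicting-fields"
    if not has_q and not wants_all:
        # nothing in the form defines a result set
        return "no-search-conditions"
    return None
```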

+
+
+
+
+ + + +
    +
  • When sending the form template, the &searchservice; MUST include all fields it supports with their respective default values.
  • When submitting a form to the &searchservice;, a &searcher; MAY omit all fields it either does not understand or has left unchanged.
  • When submitting a form to the &searchservice;, a &searcher; MAY omit the <option/> elements.
  • When receiving a search form, the &searchservice; MUST ignore fields with a var value it does not understand.
  • When executing a keyword search, the service may process the keyword string in implementation-defined ways. This may include interpreting quotes and other "special" characters, removing keywords which do not fit internal criteria for suitability, and similar processing.
  • If the &searcher; receives a <rate-limit/> error, the behaviour of the &searcher; depends on the retry-after attribute:
    • If the retry-after attribute is present, the &searcher; MUST NOT send another search request before the number of seconds indicated in the retry-after attribute has elapsed. There is no guarantee that the request will be accepted at that time.
    • If the retry-after attribute is not present, the &searcher; should wait for an implementation-defined amount of time and SHOULD back off exponentially on each subsequent <rate-limit/> error.
  • If a search request does not yield any results, the &searchservice; MUST reply with a &resultel; without any &itemel; children in a type='result' IQ. Specifically, it MUST NOT reply with an <item-not-found/> error.
  • If the all field is set to true and the &searchservice; allows this operation, all results MUST be included in the result set (and then paginated using &xep0059;).
+
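The <rate-limit/> rule above can be sketched as a small delay calculation. This is only an illustration: the function name `retry_delay` and the `base` and `cap` defaults are assumptions chosen for the example, since the specification leaves the wait time without retry-after to the implementation.

```python
from typing import Optional


def retry_delay(retry_after: Optional[int], attempt: int,
                base: float = 5.0, cap: float = 3600.0) -> float:
    """Seconds to wait before retrying after a <rate-limit/> error.

    retry_after: value of the retry-after attribute, or None if absent.
    attempt: number of consecutive rate-limit errors seen before this one.
    base, cap: illustrative defaults, not taken from the specification.
    """
    if retry_after is not None:
        # Never retry earlier than the service asked for.
        return float(retry_after)
    # No hint from the service: exponential backoff with an upper bound.
    return min(cap, base * 2 ** attempt)
```

Even after waiting, the request may be rejected again, so callers would increment `attempt` and call the function once more.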
+ + + + +

The search form is extensible as per &xep0068;. Implementations are free to add fields on both sides of the exchange, as long as they are properly namespaced using Clark Notation.

+

The following fields are specified by this document:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
var | type | Support level | Description
q | text-single | RECOMMENDED | Input for the keyword-based search. Conflicts with all.
all | boolean | OPTIONAL | Return all results, ignoring text search terms. This does not influence the restrictions imposed by the types field. Conflicts with q.
sinaddress | boolean | RECOMMENDED if q is supported | Control whether the keyword search searches in the address of the channel.
sinname | boolean | REQUIRED if q is supported | Control whether the keyword search searches in the name of the channel.
sindescription | boolean | REQUIRED if q is supported | Control whether the keyword search searches in the textual description of the channel.
types | list-multi | RECOMMENDED | Constrain the service types of channels to return. If not supported, the search MUST only cover &xep0045; group chats.
key | list-single | REQUIRED | Select how the results are ordered.
+

The sort keys specified by this document are the following:

+ + + + + + + + + + + + + +
Value | Description
&orderaddress; | Order the results by the address of the channel. This ordering mode guarantees that the &searcher; gets a duplicate-free view without omissions when paginating.
&ordernusers; | Order the results by the number of users, in descending order. This mode does not guarantee that all channels in the database are returned, nor does it guarantee that no duplicates occur across multiple pages.
+ +

&searchservice; implementations may offer custom values for the key field, provided Clark Notation is used to namespace the values.
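A &searcher; that wants to distinguish custom sort keys from plain ones needs to split Clark Notation values of the form {namespace}local. A minimal sketch follows; the function name `split_clark` and the example namespace `urn:example:sort` are hypothetical.

```python
from typing import Optional


def split_clark(value: str) -> tuple[Optional[str], str]:
    """Split '{namespace}local' into (namespace, local).

    Values without a leading '{' are returned with namespace None.
    """
    if value.startswith("{"):
        namespace, sep, local = value[1:].partition("}")
        if not sep or not namespace or not local:
            raise ValueError(f"malformed Clark notation: {value!r}")
        return namespace, local
    return None, value


custom_key = split_clark("{urn:example:sort}activity")
plain_value = split_clark("xep-0045")
```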

+
+
+ +

The result items are &itemel; elements qualified by the &searchns; namespace.

+

Each &itemel; element MUST have an address attribute whose value is a proper JID (as per either &rfc6122; or &rfc7622;). It identifies the channel uniquely.

+

The following child elements of &itemel; are defined by this specification. They are all qualified by the same namespace as &itemel; itself.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Element name | Content model | Occurrences | Description
name | text character data | 1 | The human-readable name of the channel.
description | text character data | 1 | The human-readable description of the channel.
language | text character data | 1 | A valid xml:lang code which indicates the primary language of the channel.
nusers | non-negative integer character data | 1 | Number of occupants
service-type | enumeration character data | 1 | The type of the service which hosts the channel. See below for values and semantics.
is-open | boolean character data | 1 | If set to true, it indicates that the channel can be joined without extra credentials.
anonymity-mode | enumeration character data | 1 | Anonymity level of participation. See below for values and semantics.
+

Notes:

+
    +
  1. Any child element may be omitted by a &searchservice; if the data is not available for any or all rooms.
  2. The number of occupants may be stale by an undefined amount of time.
  3. A service MAY return future versions of those elements alongside past versions. Entities need to treat elements with the same name, but different namespace, as entirely different elements.
+ + + + + + + + + + + + + +
Value | Description
{&anonns;}none | The bare JID of the account or the full JID of one or more devices of each occupant is visible to every other occupant.
muc_semianonymous | As specified in &xep0045;
+ + + + + + + + + + + + + +
Value | Description
xep-0045 | A &xep0045; service.
xep-0369 | A &xep0369; service.
+

If a &searchservice; would return entries with the same address with different service types, it SHOULD prefer &xep0369; over &xep0045;. Note that a &searchservice; MUST NOT return service types the client has not asked for.

+ +

&searchservice; implementations are free to add custom child elements to &itemel; elements. &searcher; implementations MUST be prepared to handle any unknown elements in &itemel;, for example by ignoring them.

+

Additional values for the <anonymity-mode/> element may be specified by future extensions. If an implementation encounters an unknown value in this element, it is RECOMMENDED to either treat it as synonymous with {&anonns;}none or request the anonymity mode from the address using a protocol appropriate for the channel's service.
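A tolerant reader for &itemel; elements could look like the following sketch. It is an illustration only: the namespace `urn:example:channel-search` is a placeholder for whatever &searchns; expands to, the address `operators@muc.example` is invented, and `parse_item` is a hypothetical helper. Children in other namespaces are skipped even when they share a local name with a known element, in line with the notes on the result item format.

```python
import xml.etree.ElementTree as ET

# Placeholder: stands in for the namespace that &searchns; expands to.
NS = "urn:example:channel-search"
KNOWN = ("name", "description", "language", "nusers",
         "service-type", "is-open", "anonymity-mode")


def parse_item(item: ET.Element) -> dict:
    """Collect the known children of an item element; skip everything else."""
    result = {"address": item.get("address")}
    prefix = "{" + NS + "}"
    for child in item:
        if not child.tag.startswith(prefix):
            continue  # same local name in another namespace is a different element
        local = child.tag[len(prefix):]
        if local in KNOWN:
            result[local] = (child.text or "").strip()
    if "nusers" in result:
        result["nusers"] = int(result["nusers"])
    return result


sample = f"""<item xmlns="{NS}" address="operators@muc.example">
  <name>XMPP Service Operators</name>
  <nusers>43</nusers>
  <custom xmlns="urn:example:other">ignored</custom>
  <name xmlns="urn:example:v2">ignored too</name>
</item>"""
parsed = parse_item(ET.fromstring(sample))
```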

+
+
+
+ + + +

When sending a search form with a q field, the &searcher; transmits potentially sensitive information to a third party.

+
+ + + +

This specification does not require any interaction with the IANA.

+
+ + + +

This specification should probably create registries for the various fields it defines, as well as register a form type.

+
+ + + +

To be done.

+
+ + + +

Instead of rolling a custom protocol for the result items, &xep0055; could have been used.

+

While the result format of &xep0055; allows for some generality, it does so in a rather restricted way. It is limited by the data formats and types expressible in &xep0004;. Structured data, beyond lists of text and JIDs, cannot be represented with &xep0004; at all. Machine-readable data would also have to be human-readable at the same time to provide a fallback view for human users. Internationalization of such human-readable data in field values is not possible with &xep0004;.

+

The advantage of entities being able to process unknown fields in a degraded manner is, principally, still present in the current proposal (although with a different kind of degradation).

+

Given the complexity of fully and correctly processing &xep0004; report data, the slim benefits did, in the eyes of the authors, not outweigh the costs.

+
+ + + +

The basis for this protocol was developed for the search.jabber.network public group chat search service. It has been cleaned up for publication as a Standards Track XEP by the author and modified to support more use-cases.

+
+ + +