mirror of https://github.com/moparisthebest/xeps synced 2024-11-28 12:12:22 -05:00

ProtoXEP raft v0.0.2: Updates based on list and Council feedback

Matthew A. Miller 2015-07-23 16:17:35 +02:00
commit 87e19dd441


@ -28,6 +28,12 @@
<email>peter@membrey.hk</email> <email>peter@membrey.hk</email>
<jid>peter@membrey.hk</jid> <jid>peter@membrey.hk</jid>
</author> </author>
<revision>
<version>0.0.2</version>
<date>2015-07-22</date>
<initials>pm</initials>
<remark><p>Updates based on list and council feedback.</p></remark>
</revision>
<revision> <revision>
<version>0.0.1</version> <version>0.0.1</version>
<date>2015-07-20</date> <date>2015-07-20</date>
@ -52,6 +58,15 @@
<p>These are all things that would traditionally have to be re-engineered each time somebody wanted to use Raft across the public Internet. By supporting Raft in XMPP, developers looking to use Raft would have a transport layer that's as easy to use and understand as the Raft protocol itself. As Raft does not offer its own transport protocol and has deliberately left that to the developer, there is no conflict in standardizing an XMPP based transport layer.</p> <p>These are all things that would traditionally have to be re-engineered each time somebody wanted to use Raft across the public Internet. By supporting Raft in XMPP, developers looking to use Raft would have a transport layer that's as easy to use and understand as the Raft protocol itself. As Raft does not offer its own transport protocol and has deliberately left that to the developer, there is no conflict in standardizing an XMPP based transport layer.</p>
<section2 topic='Why &MESSAGE; and not &IQ;'>
<p>The Raft algorithm can be categorized as a request-response protocol. Normally this would make it a prime candidate for using &IQ; stanzas to handle the communication. However because Raft is designed to cope with message loss, it intrinsically supports automatic recovery. There is no need for the transport layer to report errors as even if the transport layer provided them (such as an &IQ; 'error' response), the Raft implementation cannot use it.</p>
<p>This has a number of benefits. First, it makes Raft adaptable to lossy transport layers where packets can (and do) get lost. Raft is able to automatically recover in this scenario because the next message the Leader sends will allow a Follower to detect that it has missed a message and ask for it to be sent again. The Leader has no way to deal with an error condition caused by sending a message to a Follower.</p>
<p>Second, when it comes to implementing Raft over XMPP, using &MESSAGE; instead of &IQ; greatly simplifies the implementation. As &IQ; stanzas require a reply, the implementation would need to handle detecting and reporting error conditions back to the sender. This could mean adding arbitrary timers to try to determine if a Follower has 'timed out'. This adds complexity and uncertainty to the system, and given that Raft itself cannot make use of this information, using &IQ; does not add any value to the Raft over XMPP protocol.</p>
</section2>
<section2 topic='Example Usecase'> <section2 topic='Example Usecase'>
<p>Making databases or datastores available over the Internet has been the norm for many years. Databases containing PGP keys, certificates or other information can be found hosted by many different organizations. The problem with these systems is that as they become more critical to users, the impact of a server failing increases dramatically. For example, a server that provides a spam database that clients can verify email against, must be operational for those clients to be able to filter spam. Having a single server in this scenario is not acceptable; to provide redundancy there must be multiple servers.</p> <p>Making databases or datastores available over the Internet has been the norm for many years. Databases containing PGP keys, certificates or other information can be found hosted by many different organizations. The problem with these systems is that as they become more critical to users, the impact of a server failing increases dramatically. For example, a server that provides a spam database that clients can verify email against, must be operational for those clients to be able to filter spam. Having a single server in this scenario is not acceptable; to provide redundancy there must be multiple servers.</p>
@ -96,59 +111,56 @@
</section1> </section1>
<section1 topic='Protocol' anchor='protocol'> <section1 topic='Protocol' anchor='protocol'>
<p>This XEP defines a transport layer for Raft and not an actual implementation. That is, it does not seek to implement the Raft consensus algorithm within XMPP, but instead to simply define the means for Raft messages to be transported over XMPP.</p> <p>This XEP defines a transport layer for Raft and not an actual implementation. That is, it does not seek to implement the Raft consensus algorithm within XMPP, but instead to simply define the means for Raft messages to be transported over XMPP. To facilitate this, both the message name used in the Raft spec (shown in camel case) and the corresponding element name are mentioned together where appropriate.</p>
<p>Node to Node communication is the back-bone of a Raft cluster. In operation, only the Leader or a Candidate will send messages. In all other cases, nodes will only reply to messages received. The two messages are AppendEntries and RequestVote.</p> <p>Node to Node communication is the back-bone of a Raft cluster. In operation, only the Leader or a Candidate will send messages. In all other cases, nodes will only reply to messages received. The two messages are AppendEntries and RequestVote.</p>
<p>Note: These messages are Request/Response in nature but use the &MESSAGE; stanza rather than &IQ; because the handling of missing, delayed or repeated messages is handled by the Raft implementation and not by the transport layer.</p>
<section2 topic="RequestVote"> <section2 topic="RequestVote">
<p>When a Follower has not received a heartbeat from the Leader for a given period of time, it will determine that the Leader has failed and will seek to replace it. To do this it needs the support of the majority of nodes in the cluster. It can solicit support from other nodes by declaring itself a Candidate and sending a RequestVote message to all nodes in the cluster:</p> <p>When a Follower has not received a heartbeat from the Leader for a given period of time, it will determine that the Leader has failed and will seek to replace it. To do this it needs the support of the majority of nodes in the cluster. It can solicit support from other nodes by declaring itself a Candidate and sending a 'request-vote' (RequestVote) message to all nodes in the cluster:</p>
<example caption="Duncan is soliciting votes to become leader of the cluster"><![CDATA[ <example caption="Duncan is soliciting votes to become leader of the cluster"><![CDATA[
<message from="duncan@inverness.lit/castle" to="macbeth@cawdor.lit/castle"> <message from="duncan@inverness.lit/castle" to="macbeth@cawdor.lit/castle">
<RequestVote xmlns="urn:xmpp:raft" term="1" lastLogTerm="1" lastLogIndex="1" cluster="scotland"/> <request-vote xmlns="urn:xmpp:raft" term="1" last-log-term="1" last-log-index="1" cluster="scotland"/>
</message> </message>
]]></example> ]]></example>
<p>A node will respond with a RequestVoteResponse:</p> <p>A node will respond with a 'vote' (RequestVoteResponse) message:</p>
<example caption="Macbeth votes for Duncan to become the next leader"><![CDATA[ <example caption="Macbeth votes for Duncan to become the next leader"><![CDATA[
<message from="macbeth@cawdor.lit/castle" to="duncan@inverness.lit/castle"> <message from="macbeth@cawdor.lit/castle" to="duncan@inverness.lit/castle">
<RequestVoteResponse xmlns="urn:xmpp:raft" term="1" voteGranted="true" cluster="scotland"/> <vote xmlns="urn:xmpp:raft" term="1" vote-granted="true" cluster="scotland"/>
</message> </message>
]]></example> ]]></example>
<p> A node can either vote for a given Candidate (voteGranted="true") or against a Candidate voteGranted="false".</p> <p> A node can either vote for a given Candidate (vote-granted="true") or against a Candidate (vote-granted="false").</p>
<p>If a node does not receive a reply, no special handling is required.</p> <p>If a node does not receive a reply, no special handling is required.</p>
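As a rough illustration of how a Candidate might read the 'vote' reply shown above, here is a minimal Python sketch. It assumes the element and attribute names from the revised examples ('vote', 'term', 'vote-granted'); the function name parse_vote and the choice to return None for unrelated stanzas are hypothetical and not part of the XEP.

```python
import xml.etree.ElementTree as ET

RAFT_NS = "urn:xmpp:raft"  # namespace used in the examples above


def parse_vote(stanza_xml):
    """Return (term, granted) from a vote message, or None if it has no vote child."""
    message = ET.fromstring(stanza_xml.strip())
    vote = message.find(f"{{{RAFT_NS}}}vote")
    if vote is None:
        # Not a vote reply; Raft itself tolerates missing replies.
        return None
    # Assumption: anything other than the literal "true" counts as a refusal.
    granted = vote.get("vote-granted") == "true"
    return int(vote.get("term")), granted


REPLY = """
<message from="macbeth@cawdor.lit/castle" to="duncan@inverness.lit/castle">
  <vote xmlns="urn:xmpp:raft" term="1" vote-granted="true" cluster="scotland"/>
</message>
"""
```

Counting the granted votes and deciding whether a majority has been reached stays in the Raft implementation, in line with the statement that this protocol is only a transport.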
</section2> </section2>
<section2 topic="AppendEntries"> <section2 topic="AppendEntries">
<p>The AppendEntries message is used by the Leader to tell Followers that they should append a new entry (or entries) to their logs. It contains additional information to allow a Follower to determine which log entries have been executed and committed on the Leader and also if it has dropped any messages. These features are implemented in Raft directly.</p> <p>The 'append' (AppendEntries) message is used by the Leader to tell Followers that they should append a new entry (or entries) to their logs. It contains additional information to allow a Follower to determine which log entries have been executed and committed on the Leader and also if it has dropped any messages. These features are implemented in Raft directly.</p>
<example caption="Duncan sends an AppendEntries message to his followers"><![CDATA[ <example caption="Duncan sends an append message to his followers"><![CDATA[
<message from="duncan@inverness.lit/castle" to="macbeth@cawdor.lit/castle"> <message from="duncan@inverness.lit/castle" to="macbeth@cawdor.lit/castle">
<AppendEntries xmlns="urn:xmpp:raft" term="1" prevLogIndex="1" leaderCommit="1" cluster="scotland"> <append xmlns="urn:xmpp:raft" term="1" prev-log-index="1" leader-commit="1" cluster="scotland">
<Entry xmlns="urn:xmpp:raft" encoded="false"> <entry xmlns="urn:xmpp:raft" encoded="false">
SET X = 1 SET X = 1
</Entry> </entry>
<Entry xmlns="urn:xmpp:raft" encoded="true"> <entry xmlns="urn:xmpp:raft" encoded="true">
U0VUIFggPSAx U0VUIFggPSAx
</Entry> </entry>
</AppendEntries> </append>
</message> </message>
]]></example> ]]></example>
<p>The AppendEntries message is described as a simple array in the <link url="https://ramcloud.stanford.edu/raft.pdf">Raft paper</link> and this has been expanded on in XMPP to take advantage of structured XML. In addition, Raft is designed to be able to replicate any form of command and this could be binary data rather than textual data. To accommodate this, an attribute has been added to the AppendEntries element to allow a sender to flag when the receiver needs to decode the Entry before passing it to the Raft implementation. The data is encoded using base64.</p> <p>The AppendEntries message is described as a simple array in the <link url="https://ramcloud.stanford.edu/raft.pdf">Raft paper</link> and this has been expanded on in XMPP to take advantage of structured XML. In addition, Raft is designed to be able to replicate any form of command and this could be binary data rather than textual data. To accommodate this, an 'encoded' attribute has been added to the 'entry' element to allow a sender to flag when the receiver needs to decode the Entry before passing it to the Raft implementation. The data is encoded using base64.</p>
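The decoding step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the 'append' and 'entry' element names and the 'encoded' flag from the revised example, the function name decode_entries is hypothetical, and stripping surrounding whitespace from each entry is an assumption, since the text does not say whether whitespace is significant.

```python
import base64
import xml.etree.ElementTree as ET

RAFT_NS = "urn:xmpp:raft"  # namespace used in the examples above


def decode_entries(stanza_xml):
    """Return the payload of each entry in an append message as bytes."""
    message = ET.fromstring(stanza_xml.strip())
    append = message.find(f"{{{RAFT_NS}}}append")
    if append is None:
        return []
    payloads = []
    for entry in append.findall(f"{{{RAFT_NS}}}entry"):
        # Assumption: leading and trailing whitespace is not part of the command.
        text = (entry.text or "").strip()
        if entry.get("encoded") == "true":
            # Flagged entries carry base64 and are decoded before use.
            payloads.append(base64.b64decode(text))
        else:
            payloads.append(text.encode("utf-8"))
    return payloads


STANZA = """
<message from="duncan@inverness.lit/castle" to="macbeth@cawdor.lit/castle">
  <append xmlns="urn:xmpp:raft" term="1" prev-log-index="1" leader-commit="1" cluster="scotland">
    <entry xmlns="urn:xmpp:raft" encoded="false">
      SET X = 1
    </entry>
    <entry xmlns="urn:xmpp:raft" encoded="true">
      U0VUIFggPSAx
    </entry>
  </append>
</message>
"""
```

With the example stanza, both entries yield the same command bytes, since the second entry is simply the base64 form of the first.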
<p>When followers receive this message, they send a single AppendEntriesResponse in reply as follows:</p> <p>When followers receive this message, they send a single 'append-response' (AppendEntriesResponse) in reply as follows:</p>
<example caption="Macbeth sends an AppendEntriesResponse message to Duncan"><![CDATA[ <example caption="Macbeth sends an append-response message to Duncan"><![CDATA[
<message from="macbeth@cawdor.lit/castle" to="duncan@inverness.lit/castle"> <message from="macbeth@cawdor.lit/castle" to="duncan@inverness.lit/castle">
<AppendEntriesResponse xmlns="urn:xmpp:raft" term="1" success="true" cluster="scotland"/> <append-response xmlns="urn:xmpp:raft" term="1" success="true" cluster="scotland"/>
</message> </message>
]]></example> ]]></example>