admin管理员组

文章数量:1534200

基础概念

业界一般将 Lamport 论文里最初提出的分布式算法称之为 Basic Paxos,这是 Paxos 最基础的算法思想。Basic Paxos 算法的最终目标是通过严谨和可靠的流程来使得集群基于某个提案(Proposal)达到最终的共识。以下是该论文中涉及的一些概念:

  • value:提案值,是一个抽象的概念,这里不能把它简单的理解为数值。而应该理解为对某一数据或数据库某一行的某一列的一系列操作。
  • number:提案编号,全局唯一,单调递增。
  • proposal:集群需要达成共识的提案,拥有 number 和 value。

proposal 中的 value 就是在 Paxos 算法完成之后需要达成共识的值。

Paxos 算法中有三个核心角色:

  • Proposer:生成提案编号 n 和 value v,然后向 Acceptors 广播该提案,接收 Acceptors 的回复,如果有超过半数的 Acceptors 同意该提案,则选定该提案,否则放弃此次提案并生成更新的提案重新发起流程,提案被选定之后则通知所有 Learners 获取该最终选定的提案值(也可以由 Acceptor 来通知,看具体实现)。Basic Paxos 中允许有多个 Proposers。
  • Acceptor:接收 Proposer 的提案并参与提案的表决过程,把各自的决定回复给 Proposer 进行统计。Acceptor 可以接受来自多个 Proposers 的多个提案。
  • Learner:不参与决策过程,只获取最终选定的提案 value。

Paxos Made Simple

Leslie Lamport
01 Nov 2001

Abstract
The Paxos algorithm, when presented in plain English, is very simple.

1 Introduction

The Paxos algorithm for implementing a fault-tolerant distributed system has been regarded as difficult to understand, perhaps because the original presentation was Greek to many readers [5]. In fact, it is among the simplest and most obvious of distributed algorithms. At its heart is a consensus algorithm—the “synod” algorithm of [5]. The next section shows that this consensus algorithm follows almost unavoidably from the properties we want it to satisfy. The last section explains the complete Paxos algorithm, which is obtained by the straightforward application of consensus to the state machine approach for building a distributed system an approach that should be well-known, since it is the subject of what is probably the most often-cited article on the theory of distributed systems.

实现容错分布式系统的Paxos算法一直被认为难以理解,这可能是因为最初的表述对许多读者来说是希腊故事。事实上,它是最简单和最明显的分布式算法之一。其核心是一个共识算法——“synod”算法。下一节将展示这个共识算法几乎不可避免地遵循我们希望它满足的特性。最后一节解释了完整的Paxos算法,由简单的应用程序获得共识,状态机的方法构建分布式系统的方法,应该是众所周知的,因为它的主题可能是最常被提到的文章理论的分布式系统。

2 The Consensus Algorithm(共识算法)

2.1 The Problem(问题描述)

Assume a collection of processes that can propose values. A consensus algorithm ensures that a single one among the proposed values is chosen. If no value is proposed, then no value should be chosen. If a value has been chosen, then processes should be able to learn the chosen value. The safety requirements for consensus are:
假设一组进程可以提出值。共识算法确保在提议的值中选择一个。如果没有提出任何值,则不应该选择任何值。一旦一个值被选定,那么所有进程应该能够学习所选择的值。共识算法的要求是:

  • Only a value that has been proposed may be chosen,
  • Only a single value is chosen, and
  • A process never learns that a value has been chosen unless it actually has been.
  1. 只有被提议的 value 才能被选定,
  2. 只能选择一个 value,
  3. 只有一个 value 真的被确定选定,进程才能获取这个 value。

We won’t try to specify precise liveness requirements. However, the goal is to ensure that some proposed value is eventually chosen and, if a value has been chosen, then a process can eventually learn the value.
我们不会尝试去明确精准的活性要求。但是我们的目标是要确保总有一些被提出的值会被选中,如果一个值最终被选中了,那么其他进程最终要能够获取该值。

We let the three roles in the consensus algorithm be performed by three classes of agents: proposers, acceptors, and learners. In an implementation, a single process may act as more than one agent, but the mapping from agents to processes does not concern us here.
我们让三类代理(agent)来执行共识性算法中的三个角色:提议者(proposers)、接受者(acceptors)以及学习者(learners)。在实际实现中,一个独立的进程可以充当不止一个代理,但是从代理到进程之间的映射我们在这里并不关心。

Assume that agents can communicate with one another by sending messages. We use the customary asynchronous, non-Byzantine model, in which:
设想代理之间可以通过发送消息的方式相互通信。我们使用传统的异步(模型),而不是拜占庭问题模型,也就是说:

  • Agents operate at arbitrary speed, may fail by stopping, and may restart. Since all agents may fail after a value is chosen and then restart, a solution is impossible unless some information can be remembered by an agent that has failed and restarted.
  • Messages can take arbitrarily long to be delivered, can be duplicated, and can be lost, but they are not corrupted.
  1. 代理以任意速度运行,可能因停止而失效(指不能正常工作),也可能重启。由于所有代理都有可能在一个值被选定之后失效再接着重启,除非失效或者重启的代理能够记住一些关键信息,否则没有任何解决方案。
  2. 消息传递的时间可以任意长,消息可以重复或者丢失,但消息不会被篡改。
2.2 Choosing a Value(值的选定)

The easiest way to choose a value is to have a single acceptor agent. A proposer sends a proposal to the acceptor, who chooses the first proposed value that it receives. Although simple, this solution is unsatisfactory because the failure of the acceptor makes any further progress impossible.
选定一个值的最简单的方式就是只有一个接受者的代理。提议者向接受者发送一个提议,接受者选择它接收到的第一个提议值。虽然简单,但这个解决方案并不令人满意,因为这个(唯一的)接受者一旦失效,将导致后续的操作无法继续。

So, let’s try another way of choosing a value. Instead of a single acceptor, let’s use multiple acceptor agents. A proposer sends a proposed value to a set of acceptors. An acceptor may accept the proposed value. The value is chosen when a large enough set of acceptors have accepted it. How large is large enough? To ensure that only a single value is chosen, we can let a large enough set consist of any majority of the agents. Because any two majorities have at least one acceptor in common, this works if an acceptor can accept at most one value. (There is an obvious generalization of a majority that has been observed in numerous papers, apparently starting with.)
所以,让我们来尝试选定值的另一种方法吧。不再是单一的接受者,我们现在尝试使用多个接受者代理的方式。一个提议者将一个提议的值发送给一群接受者。一个接受者可能接受(accept)这个被提议的值。一旦一个足够大数量的接受者的集合都接受了一个值,那么这个值就可以算是被选定了。多大的数量才算足够大?为了确保有且只有一个值被选定,我们可以让一个所谓足够大的集合等同于这些代理中的“大多数”组成的集合。因为任意两个“大多数”的集合必然拥有至少一个共同的接受者,并且假如一个接受者最多只能接受一个值,这个方法就是行得通的。(在很多的论文中都有对于“大多数”的浅显的概括)

In the absence of failure or message loss, we want a value to be chosen even if only one value is proposed by a single proposer. This suggests the requirement:
在没有失败或消息丢失的情况下,即使单个提议者只提出一个值,我们也希望选择这个值。这就要求:

P1. An acceptor must accept the first proposal that it receives.

P1. 接受者(acceptor)必须接受它收到的第一个提案。

But this requirement raises a problem. Several values could be proposed by different proposers at about the same time, leading to a situation in which every acceptor has accepted a value, but no single value is accepted by a majority of them. Even with just two proposed values, if each is accepted by about half the acceptors, failure of a single acceptor could make it impossible to learn which of the values was chosen.
但这一要求引发了一个问题。不同的提议者可能同时提出多个值,导致每个接受者都接受一个值,但没有一个值被大多数接受。即使只有两个建议值,如果每一个都被大约一半的接受者接受,单个接受者的失败可能会使我们不可能知道选择了哪个提案。

P1 and the requirement that a value is chosen only when it is accepted by a majority of acceptors imply that an acceptor must be allowed to accept more than one proposal. We keep track of the different proposals that an acceptor may accept by assigning a (natural) number to each proposal, so a proposal consists of a proposal number and a value. To prevent confusion, we require that different proposals have different numbers. How this is achieved depends on the implementation, so for now we just assume it. A value is chosen when a single proposal with that value has been accepted by a majority of the acceptors. In that case, we say that the proposal (as well as its value) has been chosen.
P1 和 value 只有被大多数的 Acceptor(接受者) 接受才算被选中的要求,意味着必须允许 Acceptor 接受一个以上的 proposal(提案)。我们通过为每个 proposal

本文标签: 共识算法论文SimplePaxos