The Lamport, Pease, and Shostak Algorithm
In 1982, Lamport, Pease, and Shostak published a straightforward solution to this problem (research.microsoft.com/users/lamport/pubs/byz.pdf). The algorithm assumes n processes, of which m are faulty, where n > 3m. Thus, for a scenario such as that in Figures 1 and 2, with one faulty process, the system needs at least four processes to reach agreement. (Throughout, n refers to the total number of processes and m to the number of faulty processes.)
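As a quick check of the n > 3m requirement, the arithmetic can be written out in a few lines of Python. The function names here are made up for illustration and do not come from the paper.

```python
def min_processes(m):
    """Smallest n that satisfies n > 3m for m faulty processes."""
    return 3 * m + 1


def can_reach_agreement(n, m):
    """True when n processes are enough to tolerate m faulty ones."""
    return n > 3 * m


print(min_processes(1))           # 4
print(can_reach_agreement(3, 1))  # False
print(can_reach_agreement(7, 2))  # True
```

With one faulty process, three processes are not enough; the fourth is what lets the loyal processes outvote the faulty one.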
The definition of the algorithm in the original paper is succinct, but it can be confusing for programmers who don't have experience with distributed algorithms.
Lamport's algorithm is a recursive definition, with a base case for m=0, and a recursive step for m>0:
- Algorithm OM(0)
  1. The general sends his value to every lieutenant.
  2. Each lieutenant uses the value he receives from the general.
- Algorithm OM(m), m>0
  1. The general sends his value to each lieutenant.
  2. For each i, let v_i be the value lieutenant i receives from the general. Lieutenant i acts as the general in Algorithm OM(m-1) to send the value v_i to each of the n-2 other lieutenants.
  3. For each i, and each j ≠ i, let v_j be the value lieutenant i received from lieutenant j in step 2 (using Algorithm OM(m-1)). Lieutenant i uses the value majority(v_1, v_2, ..., v_{n-1}).
Lamport's Algorithm Definition
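The definition above can be transcribed almost literally into a small single-process simulation. The following Python is only a sketch under stated assumptions: the names om, send, and majority, the RETREAT default for a missing majority, and the fault model (a faulty sender flips the value it sends to odd-numbered receivers) are all illustrative choices, not part of the paper.

```python
from collections import Counter

DEFAULT = "RETREAT"  # assumed fallback when no strict majority exists


def majority(values):
    """Return the value held by a strict majority, else DEFAULT."""
    if not values:
        return DEFAULT
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else DEFAULT


def send(sender, receiver, value, faulty):
    """Assumed fault model: a faulty sender lies to odd-numbered receivers."""
    if sender in faulty and receiver % 2 == 1:
        return "RETREAT" if value == "ATTACK" else "ATTACK"
    return value


def om(m, general, lieutenants, value, faulty):
    """Simulate OM(m); return {lieutenant: value it settles on}."""
    # Step 1: the general sends his value to each lieutenant.
    received = {i: send(general, i, value, faulty) for i in lieutenants}
    if m == 0:
        # OM(0): each lieutenant simply uses the value it received.
        return received
    # Step 2: each lieutenant j acts as general in OM(m-1) for the others.
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], faulty)
    # Step 3: lieutenant i takes the majority of its own value and
    # the values relayed to it by every other lieutenant.
    return {
        i: majority([received[i]]
                    + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


# n=4, m=1: process 0 is the general, lieutenant 3 is faulty.
result = om(1, 0, [1, 2, 3], "ATTACK", faulty={3})
print(result[1], result[2])  # ATTACK ATTACK
```

In this run the faulty lieutenant tells lieutenant 1 to retreat, but lieutenant 1 still sees two ATTACK values out of three, so both loyal lieutenants settle on ATTACK. A real implementation would exchange messages between separate processes rather than call a function recursively.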
To most programmers, this looks like a conventional recursive function definition. However, it doesn't quite fit the mold you learned when studying examples such as factorial(n).
Lamport's algorithm actually works in two stages. In the first stage, the processes exchange m+1 rounds of messages. In the second stage, each process takes all the information it has been given and uses it to come up with its decision.
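To get a feel for how much traffic the m+1 rounds generate, the message count per round can be derived from the recursive definition: the general sends n-1 messages, each of those n-1 lieutenants then sends n-2, and so on. The helper below is a hypothetical illustration of that arithmetic, not code from the paper.

```python
def messages_per_round(n, m):
    """Messages sent in each of the m+1 rounds of OM(m) with n processes."""
    counts = []
    total = 1
    for k in range(1, m + 2):
        # Round k: every message from round k-1 is relayed to n-k others.
        total *= n - k
        counts.append(total)
    return counts


print(messages_per_round(4, 1))  # [3, 6]
print(messages_per_round(7, 2))  # [6, 30, 120]
```

The counts grow quickly with m, which is why the second stage has a sizable pile of received values to sort through before each process can decide.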