Leader election
In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers. Before the task is begun, all network nodes are either unaware which node will serve as the "leader" of the task, or unable to communicate with the current coordinator. After a leader election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task leader.
The network nodes communicate among themselves in order to decide which of them will get into the "leader" state. For that, they need some method to break the symmetry among them. For example, if each node has a unique and comparable identity, then the nodes can compare their identities and decide that the node with the highest identity is the leader.
The definition of this problem is often attributed to LeLann, who formalized it as a method to create a new token in a token ring network in which the token has been lost.
Leader election algorithms are designed to be economical in terms of total bytes transmitted, and time. The algorithm suggested by Gallager, Humblet, and Spira for general undirected graphs has had a strong impact on the design of distributed algorithms in general, and won the Dijkstra Prize for an influential paper in distributed computing.
Many other algorithms have been suggested for different kinds of network graphs, such as undirected rings, unidirectional rings, complete graphs, grids, directed Euler graphs, and others. A general method that decouples the issue of the graph family from the design of the leader election algorithm was suggested by Korach, Kutten, and Moran.
Definition
The problem of leader election is for each processor eventually to decide whether it is a leader or not, subject to the constraint that exactly one processor decides that it is the leader. An algorithm solves the leader election problem if:
- States of processors are divided into elected and not-elected states. Once elected, a processor remains elected.
- In every execution, exactly one processor becomes elected and the rest determine that they are not elected.
A valid leader election algorithm must meet the following conditions:
- Termination: the algorithm should finish within a finite time once the leader is selected. In randomized approaches this condition is sometimes weakened.
- Uniqueness: there is exactly one processor that considers itself as leader.
- Agreement: all other processors know who the leader is.
An algorithm for leader election may vary in the following aspects:
- Communication mechanism: the processors are either synchronous, in which case processes are synchronized by a clock signal, or asynchronous, in which case processes run at arbitrary speeds.
- Process names: whether processes have a unique identity or are indistinguishable.
- Network topology: for instance, ring, acyclic graph or complete graph.
- Size of the network: the algorithm may or may not use knowledge of the number of processes in the system.
Algorithms
Leader election in rings
A ring network is a connected-graph topology in which each node is connected to exactly two other nodes, i.e., for a graph with n nodes, there are exactly n edges connecting the nodes. A ring can be unidirectional, which means processors only communicate in one direction, or bidirectional, meaning processors may transmit and receive messages in both directions.
Anonymous rings
A ring is said to be anonymous if every processor is identical. More formally, the system has the same state machine for every processor. There is no deterministic algorithm to elect a leader in anonymous rings, even when the size of the network is known to the processes. This is due to the fact that there is no possibility of breaking symmetry in an anonymous ring if all processes run at the same speed. The state of a processor after some steps depends only on the initial state of neighbouring nodes. So, because the processors start in identical states and execute the same procedure, in every round the same messages are sent by each processor. Therefore, each processor state also changes identically, and as a result, if one processor is elected as a leader, so are all the others.
For simplicity, the proof is given for anonymous synchronous rings and proceeds by contradiction. Consider an anonymous ring R of size n>1. Assume there exists an algorithm A that solves leader election in this anonymous ring R.
Lemma: after round k of an admissible execution of A in R, all the processes are in the same state.
Proof. By induction on k.
Base case: k=0. All the processes are in the initial state, so all the processes are identical.
Induction hypothesis: assume the lemma is true for k-1 rounds.
Inductive step: since all the processes are in the same state after round k-1, in round k every process sends the same message m_r to the right and the same message m_l to the left. Hence, in round k, every process receives the message m_r from its left edge and the message m_l from its right edge. Since all processes receive the same messages in round k, they are in the same state after round k.
The above lemma contradicts the fact that after some finite number of rounds in an execution of A, one process enters the elected state and the other processes enter the non-elected state.
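The symmetry argument can be observed in a small simulation. The sketch below is only an illustration: the function names and the two transition rules (`example_send`, `example_update`) are hypothetical, chosen arbitrarily, and any other deterministic rules shared by all processes would behave the same way.

```python
def run_anonymous_ring(n, rounds, init_state, send, update):
    """Simulate a deterministic synchronous algorithm on an anonymous ring.

    send(state) -> (message_to_left, message_to_right)
    update(state, from_left, from_right) -> new state
    Both functions are shared by all processes (anonymity).
    """
    states = [init_state] * n  # round 0: identical initial states
    for _ in range(rounds):
        outgoing = [send(s) for s in states]
        states = [
            update(
                states[i],
                outgoing[(i - 1) % n][1],  # right-going message of the left neighbour
                outgoing[(i + 1) % n][0],  # left-going message of the right neighbour
            )
            for i in range(n)
        ]
        # The lemma: after every round all processes share one state.
        assert len(set(states)) == 1
    return states


# Hypothetical transition rules; they are not taken from any published algorithm.
def example_send(state):
    return (3 * state + 1, 5 * state + 2)


def example_update(state, from_left, from_right):
    return (state + 2 * from_left + from_right) % 1009


final_states = run_anonymous_ring(6, 20, 0, example_send, example_update)
print(len(set(final_states)))  # number of distinct states among the 6 processes
```

Because no round ever produces two distinct states, no process can enter an "elected" state without all the others entering it as well.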
Randomized (probabilistic) leader election
A common approach to solve the problem of leader election in anonymous rings is the use of probabilistic algorithms. In such approaches, processors generally assume some identities based on a probabilistic function and communicate them to the rest of the network. At the end, through the application of an algorithm, a leader is selected.
Asynchronous ring
Since there is no deterministic algorithm for anonymous rings, asynchronous rings are considered here as asynchronous non-anonymous rings. In non-anonymous rings, each process has a unique id, and the processes do not know the size of the ring. Leader election in asynchronous rings can be solved by an algorithm using O(n^2) messages or by one using O(n log n) messages.
In the O(n^2) algorithm, every process sends a message with its id to its left edge, then waits for a message from its right edge. If the id in the message is greater than its own id, it forwards the message to the left edge; if it is smaller, it ignores the message and does nothing. If the id in the message is equal to its own id, it sends a message to the left announcing that it is elected. The other processes forward the announcement to the left and turn themselves to non-elected. Each of the n ids is forwarded at most n times, so the upper bound for this algorithm is O(n^2) messages.
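A minimal sketch of the O(n^2) algorithm follows, assuming unique ids and modelling the asynchronous ring with a single FIFO queue of in-flight messages. The function name `lcr_election` and the convention that the left neighbour of position i is position i-1 are choices made for this illustration only.

```python
from collections import deque


def lcr_election(ids):
    """Return (leader id, total messages) for a ring given as a list of unique ids."""
    n = len(ids)
    messages = 0
    in_flight = deque()  # entries are (destination position, id carried)
    for i in range(n):
        in_flight.append(((i - 1) % n, ids[i]))  # every process sends its id to the left
        messages += 1
    leader = None
    while in_flight:
        dest, carried = in_flight.popleft()
        if carried > ids[dest]:
            in_flight.append(((dest - 1) % n, carried))  # forward a larger id
            messages += 1
        elif carried == ids[dest]:
            leader = dest  # the id travelled all the way around the ring
        # a smaller id is ignored
    messages += n  # the announcement is sent once and forwarded n-1 times
    return ids[leader], messages


print(lcr_election([1, 2, 3, 4]))  # (4, 14): ids increase against the direction of travel
print(lcr_election([4, 3, 2, 1]))  # (4, 11)
```

In the first ring the id k is forwarded k times before it is swallowed, which is the arrangement that produces the quadratic worst case.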
The O(n log n) algorithm runs in phases. In the kth phase, a process determines whether it is the winner among its 2^k neighbors on the left and its 2^k neighbors on the right. If it is a winner, the process goes on to the next phase. In phase 0, each process P determines whether it is a winner by sending a message with its id to its left and right neighbors. A neighbor replies with an ACK only if the id in the message is larger than the neighbor's own id; otherwise it replies with an ACK_fault. If P receives two ACKs, one from the left and one from the right, then P is a winner in phase 0.
In phase k, the winners of phase k-1 send a message with their id towards their 2^k left and 2^k right neighbors. If a neighbor on the path receives an id larger than its own, it forwards the message to the next neighbor; otherwise it replies with an ACK_fault. If the 2^k-th neighbor receives an id larger than its own, it sends back an ACK; otherwise it replies with an ACK_fault. If the process receives two ACKs, it is a winner in phase k. In the last phase, the final winner receives its own id in the message, then terminates and sends a termination message to the other processes.
In the worst case, there are at most n/(2^k + 1) winners in each phase, where k is the phase number. There are O(log n) phases in total. Each winner sends on the order of 2^k messages in each phase, so the message complexity is O(n log n).
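The phase structure can be sketched as a centralized simulation. This is an illustration under stated assumptions, not a distributed implementation: ids are unique, probes are processed one after another, every hop of a probe or of a reply counts as one message, and the final termination messages are not counted. The name `hs_election` is hypothetical.

```python
def hs_election(ids):
    """Return (leader id, final phase number, messages counted)."""
    n = len(ids)
    assert len(set(ids)) == n, "ids must be unique"
    active = list(range(n))  # positions still competing
    messages = 0
    phase = 0
    while True:
        reach = 2 ** phase
        winners = []
        for i in active:
            acks = 0
            elected = False
            for step in (-1, 1):  # probe to the left, then to the right
                pos, hops = i, 0
                while True:
                    pos = (pos + step) % n
                    hops += 1
                    messages += 1  # the probe moves one hop
                    if pos == i:  # own id came back: all other ids are smaller
                        elected = True
                        break
                    if ids[pos] > ids[i]:  # blocked: a negative reply travels back
                        messages += hops
                        break
                    if hops == reach:  # end of the neighbourhood: an ACK travels back
                        messages += hops
                        acks += 1
                        break
            if elected:
                return ids[i], phase, messages
            if acks == 2:
                winners.append(i)
        active = winners
        phase += 1


print(hs_election([1, 2]))  # (2, 1, 12)
```

The election can only finish in a phase whose reach 2^k is at least n, which is why the number of phases grows logarithmically with the ring size.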
Synchronous ring
In their book Distributed Computing, Attiya and Welch describe a non-uniform algorithm that uses O(n) messages in a synchronous ring with known ring size n. The algorithm operates in phases; each phase has n rounds, and each round is one time unit. In phase 0, if there is a process with id = 0, then that process sends a termination message to the other processes. Otherwise, the algorithm goes to the next phase. In each later phase, the algorithm checks whether there is a process whose id equals the phase number and, if so, performs the same steps as in phase 0. At the end of the execution, the process with the minimal id is elected as the leader. The algorithm uses exactly n messages and n(m + 1) rounds, where m is the minimal id.
Itai and Rodeh introduced an algorithm for a unidirectional ring with synchronized processes. They assume the size of the ring is known to the processes. For a ring of size n, a≤n processors are active. Each active processor decides with probability a^(-1) whether to become a candidate. At the end of each phase, each processor calculates the number of candidates c, and if c is equal to 1, the sole candidate becomes the leader.
To determine the value of c, each candidate sends a token (pebble) at the start of the phase, which is passed around the ring and returns to its sender after exactly n time units. Every processor determines c by counting the number of pebbles that pass through it. This algorithm achieves leader election with an expected message complexity of O(n log n). A similar approach is also used in which a time-out mechanism is employed to detect deadlocks in the system. There are also algorithms for rings of special sizes, such as prime size and odd size.
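The candidate-counting phases of the Itai and Rodeh scheme can be sketched as follows. The text above does not say what happens when c is not 1, so the rules used here are assumptions made for the illustration: all n processors start active, when c > 1 only the candidates stay active, and when c = 0 the phase is repeated with the same active set. The name `itai_rodeh` and the `rng` parameter are hypothetical.

```python
import random


def itai_rodeh(n, rng):
    """Return (leader position, phases used, token hops) on a ring of size n."""
    active = list(range(n))  # assumption: every processor starts active
    phases = 0
    hops = 0
    while True:
        phases += 1
        a = len(active)
        # each active processor becomes a candidate with probability 1/a
        candidates = [p for p in active if rng.random() < 1 / a]
        c = len(candidates)
        hops += c * n  # every token travels once around the whole ring
        if c == 1:
            return candidates[0], phases, hops
        if c > 1:
            active = candidates  # assumption: only candidates remain active
        # c == 0: assumption: repeat the phase with the same active set


leader, phases, hops = itai_rodeh(8, random.Random(0))
print(leader, phases, hops)
```

Every processor sees the same c tokens pass by, so all of them agree on c and on the size of the next active set without any extra communication.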
Uniform algorithm
In typical approaches to leader election, the size of the ring is assumed to be known to the processes. In the case of anonymous rings, it is not possible to elect a leader without using an external entity. Even assuming an algorithm exists, the leader could not estimate the size of the ring; that is, in any anonymous ring there is a positive probability that an algorithm computes a wrong ring size. To overcome this problem, Fischer and Jiang used a so-called leader oracle Ω? that each processor can ask whether there is a unique leader. They show that, from some point onward, it is guaranteed to return the same answer to all processes.
Rings with unique IDs
In one of the early works, Chang and Roberts proposed a uniform algorithm in which the processor with the highest ID is selected as the leader. Each processor sends its ID in a clockwise direction. A process that receives a message compares the ID in it with its own. If the received ID is bigger, the process passes it on; otherwise it discards the message. They show that this algorithm uses at most O(n^2) messages, and O(n log n) messages in the average case.
Hirschberg and Sinclair improved this algorithm to O(n log n) message complexity by introducing a bidirectional message-passing scheme that allows the processors to send messages in both directions.
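The gap between the worst case and the average case of the Chang and Roberts scheme can be checked by counting how far each ID travels. The sketch below is an illustration with hypothetical function names; it assumes unique IDs, treats increasing list index as the clockwise direction, and counts only messages that carry IDs (no final announcement).

```python
from fractions import Fraction
from itertools import permutations


def chang_roberts_messages(ids):
    """Count ID-carrying messages: each ID travels until it reaches a larger one."""
    n = len(ids)
    total = 0
    for i, own in enumerate(ids):
        hops = 1  # the first message always leaves the sender
        while hops < n and ids[(i + hops) % n] < own:
            hops += 1  # a smaller neighbour passes the ID on
        total += hops
    return total


def average_messages(n):
    """Exact average over all arrangements of the IDs 1..n."""
    rings = list(permutations(range(1, n + 1)))
    return Fraction(sum(chang_roberts_messages(r) for r in rings), len(rings))


print(chang_roberts_messages([4, 3, 2, 1]))  # 10, i.e. n(n+1)/2 for n = 4
print(chang_roberts_messages([1, 2, 3, 4]))  # 7, i.e. 2n-1 for n = 4
print(average_messages(4))                   # 25/3
```

The average equals n times the n-th harmonic number (4 × 25/12 here), which grows as n log n, while the decreasing arrangement gives the quadratic worst case.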
Leader election in a mesh
The mesh is another popular form of network topology, especially in parallel systems, redundant memory systems and interconnection networks.
In a mesh structure, nodes are either corner (two neighbours), border (three neighbours) or interior (four neighbours). A mesh of size a × b has n=ab nodes and m=2ab-a-b edges.
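The edge count and the three node classes can be verified by enumerating a small mesh. This is a sketch with hypothetical helper names, assuming a and b are both at least 2 and nodes are addressed by (row, column) pairs.

```python
from collections import Counter


def mesh_edges(a, b):
    """List the edges of an a x b mesh, each edge once."""
    edges = []
    for r in range(a):
        for c in range(b):
            if c + 1 < b:
                edges.append(((r, c), (r, c + 1)))  # horizontal edge
            if r + 1 < a:
                edges.append(((r, c), (r + 1, c)))  # vertical edge
    return edges


def node_kind(r, c, a, b):
    """Classify a node by its number of neighbours."""
    degree = (r > 0) + (r < a - 1) + (c > 0) + (c < b - 1)
    return {2: "corner", 3: "border", 4: "interior"}[degree]


def count_kinds(a, b):
    return Counter(node_kind(r, c, a, b) for r in range(a) for c in range(b))


a, b = 3, 4
print(len(mesh_edges(a, b)), 2 * a * b - a - b)  # 17 17
print(count_kinds(a, b))  # 4 corners, 6 border nodes, 2 interior nodes
```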
Unoriented mesh
A typical algorithm to solve leader election in an unoriented mesh is to elect only one of the four corner nodes as the leader. Since the corner nodes might not be aware of the state of other processes, the algorithm should first wake up the corner nodes. A leader can be elected as follows.
- Wake-up process: k nodes initiate the election process. Each initiator sends a wake-up message to all its neighbouring nodes. A node that is not an initiator simply forwards the messages to the other nodes. In this stage, at most 3n+k messages are sent.
- Election process: the election on the outer ring takes at most two stages, with 6(a+b)-16 messages.
- Termination: leader sends a terminating message to all nodes. This requires at most 2n messages.
Oriented mesh
An oriented mesh is a special case where port numbers are compass labels, i.e. north, south, east and west. Leader election in an oriented mesh is trivial. We only need to nominate a corner, e.g. the one identified by “north” and “east”, and make sure that node knows it is the leader.
Torus
A special case of mesh architecture is a torus, which is a mesh with “wrap-around”. In this structure, every node has exactly 4 connecting edges.
One approach to elect a leader in such a structure is known as electoral stages. Similar to the procedures in ring structures, this method eliminates potential candidates in each stage until eventually one candidate node is left. This node becomes the leader and then notifies all other processes of termination. This approach can be used to achieve a complexity of O(n). There are also more practical approaches for dealing with the presence of faulty links in the network.
Election in hypercubes
A hypercube H_k is a network consisting of n=2^k nodes, each with degree k, and O(n log n) edges.
Electoral stages similar to those described before can be used to solve the problem of leader election. In each stage two nodes (called duelists) compete, and the winner is promoted to the next stage. This means that in each stage only half of the duelists enter the next stage. This procedure continues until only one duelist is left, and it becomes the leader. Once selected, it notifies all other processes. This algorithm requires O(n) messages. In the case of unoriented hypercubes, a similar approach can be used, but with a higher message complexity of O(n log log n).
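The halving of duelists can be sketched on node labels written as k-bit integers. The code below is a simplified illustration with hypothetical names: it assumes the duelist with the larger id wins each duel, pairs the duelists of the two halves of each subcube directly, and leaves out the message routing that lets duelists find each other in the actual protocol.

```python
def hypercube_edges(k):
    """Edges of the k-dimensional hypercube: labels differing in exactly one bit."""
    n = 2 ** k
    return [(u, u ^ (1 << d)) for u in range(n) for d in range(k) if u < u ^ (1 << d)]


def hypercube_tournament(ids):
    """ids[u] is the id held by the node labelled u; len(ids) must be a power of two.

    Returns (label of the leader, number of duelists left after each stage).
    """
    n = len(ids)
    k = n.bit_length() - 1
    assert n == 2 ** k, "the number of nodes must be a power of two"
    duelists = list(range(n))
    remaining = []
    for d in range(k):
        survivors = {}
        for u in duelists:
            subcube = u >> (d + 1)  # (d+1)-dimensional subcube containing u
            best = survivors.get(subcube)
            if best is None or ids[u] > ids[best]:
                survivors[subcube] = u  # assumption: the larger id wins the duel
        duelists = list(survivors.values())
        remaining.append(len(duelists))
    return duelists[0], remaining


print(len(hypercube_edges(3)))                         # 12 edges, i.e. n*k/2
print(hypercube_tournament([5, 3, 8, 1, 9, 2, 7, 4]))  # (4, [4, 2, 1])
```

After stage d exactly one duelist is left in every subcube of dimension d+1, so k stages are enough for n=2^k nodes.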
Election in complete networks
Complete networks are structures in which all processes are connected to one another, i.e., the degree of each node is n-1, n being the size of the network. An optimal solution with O(n) message and space complexity is known. In this algorithm, processes have the following states:
- Dummy: nodes that do not participate in the leader election algorithm.
- Passive: the initial state of processes before start.
- Candidate: the status of nodes after waking up. The candidate nodes will be considered to become the leader.
Universal leader election techniques
As the name implies, these algorithms are designed to be used in every form of process network without any prior knowledge of the topology of the network or its properties, such as its size.
Shout
Shout builds a spanning tree on a generic graph and elects its root as leader. The algorithm has a total cost linear in the number of edges.
Mega-Merger
This technique is in essence similar to finding a minimum spanning tree in which the root of the tree becomes the leader. The basic idea of this method is that individual nodes merge with each other to form bigger structures. The result of this algorithm is a tree whose root is the leader of the entire system. The cost of the mega-merger method is O(m + n log n), where m is the number of edges and n is the number of nodes.
Yo-yo
Yo-yo is a minimum-finding algorithm consisting of two parts: a preprocessing phase and a series of iterations. In the first phase, or setup, each node exchanges its id with all its neighbours and orients its incident edges based on the values. For instance, if node x has a smaller id than y, the edge is oriented from x towards y. A node with a smaller id than all its neighbours becomes a source. In contrast, a node with all inward edges is a sink. All other nodes are internal nodes.
Once all the edges are oriented, the iteration phase starts. Each iteration is an electoral stage in which some candidates are removed. Each iteration has two phases: YO- and -YO. In the YO- phase, sources start the process that propagates to each sink the smallest value among the sources connected to that sink.
Yo-
- A source transmits its value to all its out-neighbours
- An internal node waits to receive a value from all its in-neighbours. It calculates the minimum and sends it to its out-neighbours.
- A sink receives all the values and computes their minimum.
-Yo
- A sink sends YES to the in-neighbours from which it received the smallest value and NO to the others.
- An internal node sends YES to all in-neighbours from which it received the smallest value and NO to the others. If it receives even a single NO, it sends NO to all its in-neighbours.
- A source waits until it receives all votes. If all are YES, it survives; if not, it is no longer a candidate.
- When a node x sends NO to an in-neighbour y, the logical direction of that edge is reversed.
- When a node y receives NO from an out-neighbour, it flips the direction of that link.
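The setup and the iterations listed above can be sketched as a centralized simulation. This is an illustration under stated assumptions rather than a distributed implementation: node ids are unique and comparable, the graph is connected, pruning is left out, and the loop stops as soon as a global check finds a single remaining source (a real node cannot perform that check on its own). The function names are hypothetical.

```python
def topological_order(ids, ins, outs):
    """Order the nodes so that every edge goes from an earlier to a later node."""
    pending = {x: len(ins[x]) for x in ids}
    ready = [x for x in ids if pending[x] == 0]
    order = []
    while ready:
        x = ready.pop()
        order.append(x)
        for z in outs[x]:
            pending[z] -= 1
            if pending[z] == 0:
                ready.append(z)
    return order


def yoyo_election(ids, links):
    """ids: list of unique node ids; links: undirected pairs. Returns (leader, iterations)."""
    # Setup: orient every link from the smaller id to the larger id.
    edges = {(min(u, v), max(u, v)) for u, v in links}
    iterations = 0
    while True:
        ins = {x: [] for x in ids}
        outs = {x: [] for x in ids}
        for u, v in edges:
            outs[u].append(v)
            ins[v].append(u)
        sources = [x for x in ids if not ins[x]]
        if len(sources) == 1:
            return sources[0], iterations
        iterations += 1
        order = topological_order(ids, ins, outs)
        # YO-: a source sends its own id, other nodes pass on the minimum received.
        value = {}
        for x in order:
            value[x] = x if not ins[x] else min(value[y] for y in ins[x])
        # -YO: vote[(y, x)] is the answer node x sends back to its in-neighbour y.
        vote = {}
        for x in reversed(order):
            all_yes = all(vote[(x, z)] for z in outs[x])
            for y in ins[x]:
                vote[(y, x)] = all_yes and value[y] == value[x]
        # Every edge that carried a NO is reversed.
        edges = {(y, x) if yes else (x, y) for (y, x), yes in vote.items()}


# Path 2 - 5 - 3 - 4 - 1: three sources at the start, one after two iterations.
print(yoyo_election([1, 2, 3, 4, 5], [(2, 5), (5, 3), (3, 4), (4, 1)]))  # (1, 2)
```

A source that receives a NO gains an incoming edge when that edge is reversed, which is how it stops being a candidate in this representation.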
An additional stage, pruning, is also introduced to remove the nodes that are useless, i.e. whose existence has no impact on the next iterations.
- If a sink is a leaf, then it is useless and is therefore removed.
- If, in the YO- phase, the same value is received by a node from more than one in-neighbour, it asks all but one of them to remove the link connecting them.
Applications
Radio networks
In radio network protocols, leader election is often used as a first step towards more advanced communication primitives, such as message gathering or broadcasts. The very nature of wireless networks induces collisions when adjacent nodes transmit at the same time; electing a leader makes it possible to better coordinate this process. While the diameter D of a network is a natural lower bound for the time needed to elect a leader, upper and lower bounds for the leader election problem depend on the specific radio model studied.
Models and runtime
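In radio networks, the n nodes may in every round choose to either transmit or receive a message. If no collision detection is available, then a node cannot distinguish between silence and receiving more than one message at a time. Should collision detection be available, then a node may detect more than one incoming message at the same time, even though the messages themselves cannot be decoded in that case. In the beeping model, nodes can only distinguish between silence and at least one message via carrier sensing.
Known runtimes for single-hop networks range from a constant (expected, with collision detection) to O(n log n) rounds (deterministic, without collision detection). In multi-hop networks, known runtimes range from roughly O((D + log n) log^2 log n) rounds (with high probability in the beeping model), through O(D log n) (deterministic in the beeping model) and O(n) (deterministic with collision detection), to O(n log^(3/2) n (log log n)^(1/2)) rounds (deterministic, without collision detection).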