Marzullo's algorithm


Marzullo's algorithm, invented by Keith Marzullo for his Ph.D. dissertation in 1984, is an agreement algorithm used to select sources for estimating accurate time from a number of noisy time sources. A refined version of it, renamed the "intersection algorithm", forms part of the modern Network Time Protocol.
Marzullo's algorithm is also used to compute the relaxed intersection of n boxes, as required by several robust set estimation methods.

Purpose

Marzullo's algorithm efficiently produces an optimal value from a set of estimates with confidence intervals, where the actual value may lie outside the confidence interval for some sources. In this case the best estimate is taken to be the smallest interval consistent with the largest number of sources.
If we have the estimates 10 ± 2, 12 ± 1 and 11 ± 1 then these intervals are [8, 12], [11, 13] and [10, 12], which intersect to form [11, 12], or 11.5 ± 0.5, as consistent with all three values.
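The arithmetic for this first example can be checked with a short sketch. The helper names `to_interval` and `intersect_all` are illustrative assumptions, and the code covers only the simple case in which all sources agree; it is not the full algorithm.

```python
def to_interval(center, radius):
    """Convert an estimate center ± radius to a closed interval (lo, hi)."""
    return (center - radius, center + radius)


def intersect_all(intervals):
    """Return the common intersection of closed intervals, or None if it is empty."""
    lo = max(start for start, _ in intervals)
    hi = min(end for _, end in intervals)
    return (lo, hi) if lo <= hi else None


estimates = [(10, 2), (12, 1), (11, 1)]
intervals = [to_interval(c, r) for c, r in estimates]
common = intersect_all(intervals)
print(intervals)  # [(8, 12), (11, 13), (10, 12)]
print(common)     # (11, 12)
```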



If instead the ranges are [8, 12], [11, 13] and [14, 15] then there is no interval consistent with all these values, but [11, 12] is consistent with the largest number of sources, namely two of them.



Finally, if the ranges are [8, 9], [8, 12] and [10, 12] then both the intervals [8, 9] and [10, 12] are consistent with the largest number of sources.



This procedure determines an interval. If the desired result is a best value from that interval then a naive approach would be to take the center of the interval as the value, which is what was specified in the original Marzullo algorithm. A more sophisticated approach would recognize that this could be throwing away useful information from the confidence intervals of the sources and that a probabilistic model of the sources could return a value other than the center.
Note that the computed value is probably better described as "optimistic" rather than "optimal". For example, consider three intervals [10, 12], [11, 13] and [11.99, 13]. The algorithm described below computes [11.99, 12], or 11.995 ± 0.005, which is a very precise value. If we suspect that one of the estimates might be incorrect, then at least two of the estimates must be correct. Under this condition, the best estimate is [11, 13], since this is the largest interval that always intersects at least two estimates. The algorithm described below is easily parameterized with the maximum number of incorrect estimates.
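One possible reading of this parameterization is sketched here: given a hypothetical parameter `max_faulty`, sweep the sorted endpoints and report the span from the first to the last point covered by at least n − max_faulty sources. The function name, the parameter, and this interpretation are assumptions for illustration, not the original specification; the sample intervals are chosen only so that their full intersection matches the 11.995 ± 0.005 result quoted above.

```python
def tolerant_interval(intervals, max_faulty):
    """Span of all points covered by at least len(intervals) - max_faulty intervals.

    The result is the hull of the qualifying region, so it may include gaps.
    Returns None if no point is covered by that many intervals.
    """
    needed = len(intervals) - max_faulty
    if needed < 1:
        raise ValueError("max_faulty must be smaller than the number of intervals")
    # Type -1 marks a start and +1 an end; tuple sorting puts starts first on ties.
    table = sorted([(lo, -1) for lo, _ in intervals] +
                   [(hi, +1) for _, hi in intervals])
    count = 0
    first = last = None
    for offset, kind in table:
        was_enough = count >= needed
        count -= kind
        if count >= needed and first is None:
            first = offset  # coverage first reaches the threshold here
        if was_enough and count < needed:
            last = offset   # coverage most recently dropped below it here
    return None if first is None else (first, last)


sample = [(10, 12), (11, 13), (11.99, 13)]
print(tolerant_interval(sample, 0))  # (11.99, 12)
print(tolerant_interval(sample, 1))  # (11, 13)
```

With `max_faulty=0` the sketch reproduces the optimistic answer; allowing one faulty source widens it to the more cautious interval.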

Method

Marzullo's algorithm begins by preparing a table of the sources, sorting it and then searching for the intersections of intervals. For each source there is a range [c − r, c + r] defined by c ± r. For each range the table will have two tuples of the form ⟨offset, type⟩. One tuple will represent the beginning of the range, marked with type −1 as ⟨c − r, −1⟩, and the other will represent the end with type +1 as ⟨c + r, +1⟩.
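As a small illustration of the table, the sketch below builds and sorts the tuples for the estimates 10 ± 2, 12 ± 1 and 11 ± 1 used earlier. The function name `build_table` is an assumption. Python's tuple ordering sorts by offset first and, on equal offsets, places type −1 before type +1, which is one possible tie-breaking choice.

```python
def build_table(estimates):
    """Return the sorted list of (offset, type) tuples for (c, r) estimates."""
    table = []
    for c, r in estimates:
        table.append((c - r, -1))  # beginning of the range
        table.append((c + r, +1))  # end of the range
    table.sort()  # by offset, then by type (-1 before +1 on ties)
    return table


table = build_table([(10, 2), (12, 1), (11, 1)])
print(table)
# [(8, -1), (10, -1), (11, -1), (12, 1), (12, 1), (13, 1)]
```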
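The description of the algorithm uses the following variables: best (largest number of overlapping intervals found), cnt (current number of overlapping intervals), beststart and bestend (the beginning and end of the best interval found so far), i (an index), and the table of tuples.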
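  1. Build the table of tuples.
  2. Sort the table by the offset.
  3. best=0 cnt=0
  4. go through each tuple in the table in ascending order
  5. cnt=cnt−type[i]
  6. if cnt>best then best=cnt beststart=offset[i] bestend=offset[i+1]
  7. after the loop, return [beststart, bestend] as optimal interval. The number of false sources is the number of sources minus the value of best.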
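
The steps above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: the function name `marzullo` and the choice to take intervals as (lo, hi) pairs are assumptions. The loop updates the running count by subtracting each tuple's type and records a new best interval whenever the count exceeds the best so far. Ties in the sort are broken by placing starts before ends, so that intervals touching at a single point count as overlapping.

```python
def marzullo(intervals):
    """Return (best, beststart, bestend) for a list of closed (lo, hi) intervals."""
    if not intervals:
        raise ValueError("at least one interval is required")
    # Build the table of (offset, type) tuples and sort it by offset.
    table = sorted([(lo, -1) for lo, _ in intervals] +
                   [(hi, +1) for _, hi in intervals])
    best = cnt = 0
    beststart = bestend = None
    for i, (offset, kind) in enumerate(table):
        cnt -= kind  # a start (type -1) raises the count, an end lowers it
        if cnt > best:
            best = cnt
            beststart = offset
            # The count can only rise at a start, which is never the last tuple,
            # so table[i + 1] always exists here.
            bestend = table[i + 1][0]
    return best, beststart, bestend


sources = [(8, 12), (11, 13), (14, 15)]
best, start, end = marzullo(sources)
print(best, start, end)     # 2 11 12
print(len(sources) - best)  # 1 false source
```

When several intervals tie for the largest count, this sketch returns only the first one encountered.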

Efficiency

Marzullo's algorithm is efficient in both space and time. The asymptotic space usage is O(n), where n is the number of sources. In considering the asymptotic time requirement the algorithm can be considered to consist of building the table, sorting it and searching it. Sorting can be done in O(n log n) time, and this dominates the building and searching phases, which can be performed in linear time. Therefore, the time efficiency of Marzullo's algorithm is O(n log n).
Once the table has been built and sorted it is possible to update the interval for one source in linear time. Therefore, updating data for one source and finding the best interval can be done in O(n) time.
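
The linear-time update can be sketched by keeping the sorted table between calls: removing the two old tuples and inserting the two new ones each cost at most linear time on a Python list, and the rescan is linear as well. The helper names `scan` and `update_source` are hypothetical, and the sketch assumes the caller remembers each source's previous interval.

```python
import bisect


def scan(table):
    """Linear search of a sorted (offset, type) table for the best interval."""
    best = cnt = 0
    beststart = bestend = None
    for i, (offset, kind) in enumerate(table):
        cnt -= kind
        if cnt > best:
            best, beststart, bestend = cnt, offset, table[i + 1][0]
    return best, beststart, bestend


def update_source(table, old, new):
    """Replace interval `old` with `new` in the sorted table, in place."""
    for offset, kind in ((old[0], -1), (old[1], +1)):
        # Binary search locates the tuple; deleting from a list is O(n).
        idx = bisect.bisect_left(table, (offset, kind))
        if idx == len(table) or table[idx] != (offset, kind):
            raise ValueError("old interval not in table")
        del table[idx]
    bisect.insort(table, (new[0], -1))  # O(n) insertion keeps the table sorted
    bisect.insort(table, (new[1], +1))


table = sorted([(8, -1), (12, +1), (11, -1), (13, +1), (14, -1), (15, +1)])
print(scan(table))  # (2, 11, 12)
update_source(table, (14, 15), (10, 12))
print(scan(table))  # (3, 11, 12)
```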