Property testing


In computer science, a property testing algorithm for a decision problem is an algorithm whose query complexity to its input is much smaller than the instance size of the problem. Typically property testing algorithms are used to decide if some mathematical object has a "global" property, or is "far" from having this property, using only a small number of "local" queries to the object.
For example, the following promise problem admits an algorithm whose query complexity is independent of the instance size :
Property testing algorithms are central to the definition of probabilistically checkable proofs, as a probabilistically checkable proof is essentially a proof that can be verified by a property testing algorithm.

Definition and variants

Formally, a property testing algorithm with query complexity q and proximity parameter ε for a decision problem L is a randomized algorithm that, on input x makes at most q queries to x and behaves as follows:
Here, "x is ε-far from L" means that the Hamming distance between x and any string in L is at least ε|x|.
A property testing algorithm is said to have one-sided error if it satisfies the stronger condition that the accepting probability for instances x ∈ L is 1 instead of ⅔.
A property testing algorithm is said be non-adaptive if it performs all its queries before it "observes" any answers to previous queries. Such an algorithm can be viewed as operating in the following manner. First the algorithm receives its input. Before looking at the input, using its internal randomness, the algorithm decides which symbols of the input are to be queried. Next, the algorithm observes these symbols. Finally, without making any additional queries, the algorithm decides whether to accept or reject the input.

Features and limitations

The main efficiency parameter of a property testing algorithm is its query complexity, which is the maximum number of input symbols inspected over all inputs of a given length. One is interested in designing algorithms whose query complexity is as small as possible. In many cases the running time of property testing algorithms is sublinear in the instance length. Typically, the goal is first to make the query complexity as small as possible as a function of the instance size n, and then study the dependency on the proximity parameter ε.
Unlike other complexity-theoretic settings, the asymptotic query complexity of property testing algorithms is affected dramatically by the representation of instances. For example, when ε = 0.01, the problem of testing bipartiteness of dense graphs admits an algorithm of constant query complexity. In contrast, sparse graphs on n vertices require property testing algorithms of query complexity.
The query complexity of property testing algorithms grows as the proximity parameter ε becomes smaller for all non-trivial properties. This dependence on ε is necessary as a change of fewer than ε symbols in the input cannot be detected with constant probability using fewer than O queries. Many interesting properties of dense graphs can be tested using query complexity that depends only on ε and not on the graph size n. However, the query complexity can grow enormously fast as a function of ε. For example, for a long time the best known algorithm for testing if a graph does not contain any triangle had a query complexity which is a tower function of poly, and only in 2010 this has been improved to a tower function of log. One of the reasons for this enormous growth in bounds is that many of the positive results for property testing of graphs are established using the Szemerédi regularity lemma, which also has tower-type bounds in its conclusions. The connection of property testing to the Szemerédi regularity lemma and related graph removal lemmas is elaborated on below.

Property testing of graphs

For graphs with n vertices, the notion of distance we will use is the edit distance. That is, we say that the distance between two graphs is the smallest ε such that one can add and/or delete edges and get from the first graph to the second. Under a reasonable representation of graphs, this is equivalent to the earlier Hamming distance definition. For graphs, there is a characterization of which properties are easy to test. Namely, the properties that are easy to test are precisely those properties that are hereditary. These statements will be made clearer below. First of all, by a property being easy to test, we mean that it has an oblivious tester.

Oblivious testers

Informally, an oblivious tester for a graph property P is an algorithm that takes as input a parameter ε and graph G, and then runs as a property testing algorithm on G for the property P with proximity parameter ε that makes exactly q queries to G. Crucially, the number of queries an oblivious tester makes is a constant only dependent on ε. The formal definition is that an oblivious tester is an algorithm that takes as input a parameter ε. It computes an integer q and then asks an oracle for an induced subgraph H on exactly q vertices from G chosen uniformly at random. It then accepts or rejects according to ε and H. As before, we say it tests for the property P if it accepts with probability at least ⅔ for G that has property P, and rejects with probability at least ⅔ or G that is ε-far from having property P. In complete analogy with property testing algorithms, we can talk about oblivious testers with one-sided error.

Testing hereditary properties

A hereditary property is a property that is preserved under deletion of vertices. A few important hereditary properties are H-freeness, k-colorability, and planarity. All hereditary properties are testable, and there is a proof of this fact using a version of the graph removal lemma for infinite families of induced subgraphs. In fact, a rough converse of this is also true -- the properties that have oblivious testers with one-sided error are almost hereditary, in a sense which will not be made precise here.

Example: testing triangle-freeness

In this section, we will sketch a proof of the existence of an oblivious tester for triangle-freeness; this proof is an application of the triangle removal lemma.
Sketch of proof: If a graph G is ε-far from being triangle-free, then by the triangle removal lemma, there is a constant so that G has at least triangles. The oblivious tester samples triples of vertices independently at random from the graph. It accepts if no triple of vertices induces a triangle, and rejects otherwise.
Therefore, the algorithm above is an oblivious tester with one-sided error for triangle-freeness.