Test generation

Test generation is the process of creating a set of test data or test cases for testing the adequacy of new or revised software applications. Test Generation is seen to be a complex problem and though a lot of solutions have come forth most of them are limited to toy programs. Test Generation is one aspect of software testing.
Since testing is labor-intensive, accounting for nearly one third of the cost of the system development, the problem of generating quality test data quickly, efficiently and accurately is seen to be important.

Basic concepts

Mathematical modelling

A program could be considered as a function, P:S → R, where is the set of all possible inputs and the set of all possible outputs.
An input variable of function P is mapped to an input parameter of P. P denotes execution of program for certain input x.

Control flow graph

A Control Flow Graph of a program is a directed graph G = consisting of a set of nodes N and a set of edges E = connecting the nodes.
Each node denotes a basic block which itself is a sequence of instructions. It is important to note that in every basic block the control enters through the entry node and leaves at the end without stopping or branching except at the end. Basically, a block is always executed as a whole. The entry and exit nodes are two special nodes denoted by s and e respectively.
An edge in a control flow graph represents possible transfer of control. All edges have associated with them a condition or a branch predicate. The branch predicate might be the empty predicate which is always true. In order to traverse the edge the condition of the edge must hold. If a node has more than one outgoing edge the node is a condition and the edges are called branches.

A model

A Test Data Generator follows the following steps

Program Control Flow Graph Construction
Path Selection
Generating Test Data

The basis of the Generator is simple. The path selector identifies the paths. Once a set of test paths is determined the test generator derives input data for every path that results in the execution of the selected path. Essentially, our aim is to find an input data set that will traverse the path chosen by the path selector. This is done in two steps:

Find the path predicate for the path
Solve the path predicate

The solution will ideally be a system of equations which will describe the nature of input data so as to traverse the path. In some cases the generator provides the selector with feedback concerning paths which are infeasible etc.

Test data generators

Based on the Mathematical Modelling above we can simply state the Test Data Generator Problem as:
Given a program and a path, generate input x ∈ S, so that traverses path.

Random test data generators

data generation is probably the simplest method for generation of test data. The advantage of this is that it can be used to generate input for any type of program. Thus to generate test data we can randomly generate a bit stream and let it represent the data type needed. However, random test data generation does not generate quality test data as it does not perform well in terms of coverage. Since the data generated is based solely on probability it cannot accomplish high coverage as the chances of it finding semantically small faults is quite low.
If a fault is only revealed by a small percentage of the program input it is said to be a semantically small fault. For an example of a semantically small fault consider the following code:

void test

It is easy to see that the probability of execution of the first statement is significantly lesser than that of the second statement. As the structures in it grow complex so does the probability of its execution. Thus, such semantically small faults are hard to find using random test data generation.
However, Random Test Data Generation is usually used as a benchmark as it has the lowest acceptable rate of generating test data.

Goal-oriented test data generators

The Goal-Oriented approach provides a guidance towards a certain set of paths. The Test Data Generators in this approach generate an input for any path u instead of the usual approach of generating input from the entry to the exit of a block of code. Thus, the generator can find any input for any path which is a subset of the path. This drastically reduces the risk of generating relatively infeasible paths and provides a way to direct the search.
Two methods follow this technique:

The Chaining approach
Assertion-oriented approach.
Chaining approach

The chaining approach is an extension of the goal-oriented approach. It is seen that the main limitation of the test data generation methods is that only the control flow graph is used to generate the test data. This limited knowledge may make our selection harder. Thus, it is seen that the path-oriented approach usually has to generate a large number of paths before it finds the "right" path. This is because the path selection is blind.
The chaining approach tries to identify a chain of nodes that are vital to the execution of the goal node. The chaining approach starts by executing for any arbitrary input x. The search program, during the execution of each branch decides whether to continue execution through this branch or if an alternative branch be taken because the current branch does not lead to the goal node. If it is observed that execution flow is undesirable then search algorithms are used to automatically find new input to change the flow execution. However, if for this point also the search process cannot find input X to change the flow of execution then the chaining approach attempts to alter the flow at node p due to which an alternative branch at p can be executed.

Assertion-oriented approach

The assertion-oriented approach is an extension of the chaining approach. In this approach assertions - that is constraint conditions are inserted. This can be done either manually or automatically. If the program doesn't hold on execution there is an error in the program or the assertion.
When an assertion is executed it must hold, otherwise there is an error either in the
program or in the assertion. Suppose we have a code as follows:

void test

In the above code, the program should hold at the assertion statement. If the assertion does not hold it means that the path followed leads to an error. Thus, the goal of this approach is to find any path to an assertion that does not hold. The other major advantage of this approach is that all the other methods expects the value of an execution of the generated test data to be calculated from some other source than the code. However, in this approach it is not necessary since expected value is provided with the assertion.

Pathwise test data generators

Pathwise Test Data Generation is considered to be one of the best approaches to Test Data Generation. This approach does not give the generator the choice of selecting between multiple paths but just gives it one specific path for it to work on. Hence, the name Pathwise Test Data Generator. Thus, except for the fact that this method uses specific paths it is quite similar to Goal-Oriented test data generation. The use of specific paths leads to a better knowledge and prediction of coverage. However, this also makes it harder to generate the needed test data.
Pathwise test data generators require two inputs from the user:

The program to be tested
Testing criterion

If systems are solely based on the control flow graph to select specific paths it more often than not leads to the selection of
infeasible paths. In view of this mechanisms have been proposed for a constraint based test data generation. These mechanisms focuses on fault-based testing introducing deliberate changes in the code. These deliberate changes are called as "mutants" and this type of testing called as Mutation Testing.

Intelligent test data generators

Intelligent Test Data Generators depend on sophisticated analysis of the code to guide the search of the test data. Intelligent Test Data Generators are essentially utilize one of the test data generation method coupled with the detailed analysis of the code. This approach may generate test data quicker than the other approaches but the analysis required for the utilization of this approach over a wide variety of programs is quite complex and requires a great deal of insight to anticipate the different situations that may arise. There are open-source packages for this such as DataGenerator.

Chaotic data generators

Chaotic Data Generators generate data from a chaotic attractor. The chaotic attractor generates non-repeating data, and a small change in initial conditions in the attractor may cause a large change later in data generated.

Hypermedia generators

Hypermedia Generators generate hypertext, hypervolumes, and hypermovies.

Quantum data generators

A Quantum Data Generator generates qubits according to some other data generator. In this case, qubits are the data.

Test assertion generators

Once we have some test data, one needs a test oracle to assess the behavior of the program under test for the given test input. There are different kinds of oracles that can be used.
One can also improve the assertions for existing test data. For instance, one can generate and add new assertions in existing test cases. This is what the DSpot system does in the context of the Java programming language: it does dynamic analysis of JUnit test cases and generate missing assertions.

Test case generators

While test data generators generate only inputs, test case generators synthesize full test cases. For example, in an object-oriented language, a test case contains object creations and method calls. When one maximizes coverage, the generated test cases have no assertion.

// Example of generated test case that covers the constructor and two methods.
@Test
void generatedJunitTest

To add assertions in the generated test cases, one must have a test oracle. For instance, one can use a reference implementation has oracle. Then, the expected value of the generated assertions is the actual value returned by the reference implementation.

@Test
void generatedJunitTest

EvoSuite is an example of such a test case generator for Java.

Problems of test generation

Test generation is highly complex.
The use of dynamic memory allocation in most of the code written in industry is the most severe problem that the Test Data Generators face as the usage of the software then becomes highly unpredictable, due to this it becomes harder to anticipate the paths that the program could take making it nearly impossible for the Test Data Generators to generate exhaustive Test Data. However, in the past decade significant progress has been made in tackling this problem better by the use of genetic algorithms and other analysis algorithms. The following are problem areas that are encountered while implementing the test data generation techniques for actual industry used code.

Arrays and pointers

and Pointers can be considered to have similar constructs and also suffer from the same kind of problems. Arrays and pointers create problems during symbolic execution as it complicates the substitution since their values are not known. Also, in order to generate input for arrays and pointers there are multiple problems like the index of the array, or the structure of the input that needs to be given to the pointer. This is further compounded by the possibility of dynamic allocation of arrays and pointers.

Objects

due to its dynamic nature pose a problem for generation. This is further compounded by the use of other object oriented features. All of this makes it hard to determine which code will be called at runtime. An attempt has been made to address the problem of Object Oriented Code by use of mutation.

Loops

Loops that vary their behaviour depending on the input variables are potentially problematic as it is difficult to anticipate the path that could be taken. However, if the given path is specific, that is it doesn't change behaviour the loops cause no problem. There are a few techniques that have been suggested to solve this potential problem with loops.

Modules

A program usually consists of modules which then itself consists of functions. Two solutions have been proposed for generating test data for such functions:

Brute Force Solution This is done by inlining the called functions into the target
Analyzing the Called Functions Analyze the called functions first and generate path predicates for those functions.

However, often source code of the modules is not accessible and hence a complete static analysis is not always possible.

Infeasible paths

To generate test data so as to traverse a path involves solving a system of equations. If there are no solutions then the path given is infeasible. However, in this we are limited by the problem of undecidable nature of the system of equations. The most common method adopted is to set a highest number of iterations to be done before declaring the path as in-feasible.

Constraint satisfaction

as the name suggests is the process of finding a solution that conforms to a set of constraints that the variables must satisfy. A solution is therefore a vector of variables that satisfies all constraints. Constraint satisfaction is a difficult problem to solve and hence is not usually properly implemented. All the programs need to satisfy some constraint in some way or the other. There have been many methods like iterative relaxation, genetic algorithms etc. which allow to solve for constraints.

Readability of generated tests

One of challenges of test generation is readability: when the generated tests are meant to be committed to the test suite, developers have been able to easily understand the generated test cases. However, generated test cases often contain obscure generated variable names. One way to overcome this problem is that, instead of generating new tests, to improve existing tests already written by humans, this is known as test amplification.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...