Surrogate model


A surrogate model is an engineering method used when an outcome of interest cannot easily be measured or computed directly, so a model of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate the design objective and constraint functions as functions of the design variables. For example, to find the optimal airfoil shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables. For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and what-if analysis become impractical, since they require thousands or even millions of simulation evaluations.
One way of alleviating this burden is to construct approximation models, known as surrogate models, response surface models, metamodels or emulators, that mimic the behavior of the simulation model as closely as possible while being computationally cheap to evaluate. Surrogate models are constructed using a data-driven, bottom-up approach. The exact inner workings of the simulation code are not assumed to be known; only the input-output behavior matters. A model is constructed from the response of the simulator at a limited number of intelligently chosen data points. This approach is also known as behavioral modeling or black-box modeling, though the terminology is not always consistent. When only a single design variable is involved, the process is known as curve fitting.
Although surrogate models are most commonly used in lieu of experiments and simulations in engineering design, surrogate modeling may be used in many other areas of science that involve expensive experiments and/or function evaluations.
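As a minimal illustration of the black-box idea described above, the following sketch fits a one-variable surrogate by polynomial (Lagrange) interpolation through five samples. The function expensive_simulation is a hypothetical stand-in for a costly simulator, and the sample locations are arbitrary choices; neither comes from a particular application.

```python
import math


def expensive_simulation(x):
    """Hypothetical stand-in for a costly simulator (assumed for illustration)."""
    return math.sin(3.0 * x) + 0.5 * x * x


def lagrange_surrogate(xs, ys):
    """Return a cheap predictor: the polynomial that passes through all samples."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


# Only the input-output behavior of the simulator is used: five samples in [0, 1].
sample_x = [0.0, 0.25, 0.5, 0.75, 1.0]
sample_y = [expensive_simulation(x) for x in sample_x]
surrogate = lagrange_surrogate(sample_x, sample_y)

# The surrogate can now be queried as often as needed at negligible cost.
worst_error = max(
    abs(surrogate(k / 100) - expensive_simulation(k / 100)) for k in range(101)
)
```

Here worst_error measures how far the surrogate strays from the simulator on a dense grid. In practice that comparison is exactly what one cannot afford, which is why the accuracy of a surrogate usually has to be estimated from the samples themselves.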

Goals

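The scientific challenge of surrogate modeling is the generation of a surrogate that is as accurate as possible, using as few simulation evaluations as possible. The process comprises three major steps, which may be interleaved iteratively: selecting the samples, constructing the surrogate model and tuning its parameters, and appraising the accuracy of the surrogate.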
The accuracy of the surrogate depends on the number and location of samples in the design space. Various design of experiments techniques cater to different sources of errors, in particular, errors due to noise in the data or errors due to an improper surrogate model.
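One widely used space-filling design of experiments is Latin hypercube sampling. The sketch below is a minimal version of it, assuming box-shaped bounds on each design variable; the function name latin_hypercube and the example bounds are illustrative choices, not part of any particular library.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, for every variable, each of the
    n_samples equal-width strata of its range contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across variables
        width = (high - low) / n_samples
        # One uniformly random position inside each stratum.
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Five samples of two design variables with (assumed) ranges [0, 1] and [-2, 2].
design = latin_hypercube(5, [(0.0, 1.0), (-2.0, 2.0)])
```

Compared with a full grid, such a design covers the range of every variable with a number of samples that does not grow with the number of variables. It is only one of many possible sampling strategies.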

Types of surrogate models

Popular surrogate modeling approaches are: polynomial response surfaces; kriging; gradient-enhanced kriging; radial basis functions; support vector machines; space mapping; artificial neural networks; and Bayesian networks.
More recently explored methods include Fourier surrogate modeling and random forests.
For some problems, the nature of the true function is not known a priori, so it is not clear which surrogate model will be most accurate. In addition, there is no consensus on how to obtain the most reliable estimates of the accuracy of a given surrogate.
Many other problems have known physical properties. In these cases, physics-based surrogates, such as space-mapping-based models, are often the most efficient.
Surveys of surrogate-assisted evolutionary optimization techniques are available in the literature.
Spanning two decades of development and engineering applications, Rayas-Sanchez reviews aggressive space mapping exploiting surrogate models. Razavi et al. have published a state-of-the-art review of surrogate models used in the field of water resources management.
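Since the most accurate surrogate type is often unknown in advance, candidate models are commonly compared on the available samples. The sketch below uses leave-one-out cross-validation, which is one possible accuracy estimate among several and not a consensus choice. The two candidate surrogates (a least-squares line and a piecewise-linear interpolant) and the test function are assumptions made for illustration.

```python
import math
from bisect import bisect_left


def fit_line(xs, ys):
    """Least-squares straight line through the samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_piecewise_linear(xs, ys):
    """Linear interpolation between neighbors; end segments are extrapolated."""
    points = sorted(zip(xs, ys))
    px = [p[0] for p in points]
    py = [p[1] for p in points]

    def predict(x):
        i = min(max(bisect_left(px, x), 1), len(px) - 1)
        t = (x - px[i - 1]) / (px[i] - px[i - 1])
        return py[i - 1] + t * (py[i] - py[i - 1])
    return predict


def leave_one_out_rmse(fit, xs, ys):
    """Refit without each sample in turn and score the prediction at that sample."""
    squared_errors = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((model(xs[i]) - ys[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical samples of an expensive response (assumed test function).
xs = [k / 7 for k in range(8)]
ys = [math.sin(3.0 * x) + 0.5 * x * x for x in xs]

scores = {
    "line": leave_one_out_rmse(fit_line, xs, ys),
    "piecewise linear": leave_one_out_rmse(fit_piecewise_linear, xs, ys),
}
preferred = min(scores, key=scores.get)
```

The estimate needs no additional simulator runs, but it requires one refit per sample and can be misleading when the samples are few or clustered.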

Invariance properties

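Recently proposed comparison-based surrogate models for evolutionary algorithms, such as CMA-ES, make it possible to preserve some invariance properties of surrogate-assisted optimizers, notably invariance with respect to monotonic transformations of the objective function and invariance with respect to orthogonal transformations of the search space.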
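A comparison-based surrogate learns only from the ranking of candidate solutions, not from their objective values. The sketch below illustrates why this preserves invariance under a strictly increasing transformation of the objective: the ranking, and hence the information available to the surrogate, is unchanged. The sphere objective, the candidate points and the transformation are arbitrary choices made for illustration.

```python
import math


def rank_order(values):
    """Indices of the candidates sorted from best (smallest value) to worst."""
    return sorted(range(len(values)), key=lambda i: values[i])


def sphere(point):
    """Simple objective used only for this demonstration."""
    return sum(coordinate * coordinate for coordinate in point)


candidates = [(0.5, -1.0), (2.0, 0.1), (-0.3, 0.2), (1.5, 1.5)]
raw_values = [sphere(c) for c in candidates]

# A strictly increasing transformation changes the values but not their order.
transformed_values = [math.exp(v) - 3.0 for v in raw_values]

same_ranking = rank_order(raw_values) == rank_order(transformed_values)
```

A surrogate fitted to the objective values themselves would, in general, produce a different model for raw_values than for transformed_values, and could therefore steer the optimizer differently.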

Applications

An important distinction can be made between two different applications of surrogate models: design optimization and design space approximation.
In surrogate-model-based optimization, an initial surrogate is constructed using part of the available budget of expensive experiments and/or simulations. The remaining experiments or simulations are run for designs that the surrogate model predicts may perform well. The process usually takes the form of an iterative search/update procedure: the surrogate is searched for promising designs, the experiment or simulation is run at those designs, the results are added to the sample, and the surrogate is rebuilt.
Depending on the type of surrogate used and the complexity of the problem, the process may converge on a local or global optimum, or perhaps none at all.
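The search/update idea can be sketched as follows for a single design variable. This is a minimal sketch under stated assumptions, not a reference implementation: the objective expensive_simulation is hypothetical, the surrogate is a Gaussian radial basis function interpolant with an arbitrarily chosen width and a small regularization term (nugget), and the search is an exhaustive scan of a candidate grid.

```python
import math


def expensive_simulation(x):
    """Hypothetical costly objective, assumed here purely for illustration."""
    return (x - 0.7) ** 2 + 0.1 * math.sin(12.0 * x)


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_rbf(xs, ys, width=0.2, nugget=1e-6):
    """Gaussian RBF interpolant; the nugget keeps the system well-posed."""
    def kernel(r):
        return math.exp(-((r / width) ** 2))
    n = len(xs)
    gram = [[kernel(xs[i] - xs[j]) + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    weights = solve_linear_system(gram, ys)
    return lambda x: sum(w * kernel(x - c) for w, c in zip(weights, xs))


def surrogate_optimize(objective, lower, upper, n_initial=5, budget=12, n_grid=201):
    grid = [lower + (upper - lower) * k / (n_grid - 1) for k in range(n_grid)]
    # Initial sample: evenly spaced points evaluated with the expensive objective.
    first = [round(i * (n_grid - 1) / (n_initial - 1)) for i in range(n_initial)]
    evaluated = {k: objective(grid[k]) for k in first}
    while len(evaluated) < budget:
        indices = list(evaluated)
        model = fit_rbf([grid[k] for k in indices], [evaluated[k] for k in indices])
        # Search: scan the cheap surrogate over all not-yet-evaluated candidates.
        untried = [k for k in range(n_grid) if k not in evaluated]
        chosen = min(untried, key=lambda k: model(grid[k]))
        # Update: spend one expensive evaluation and add it to the sample.
        evaluated[chosen] = objective(grid[chosen])
    best = min(evaluated, key=evaluated.get)
    return grid[best], evaluated[best], evaluated


best_x, best_value, history = surrogate_optimize(expensive_simulation, 0.0, 1.0)
```

This loop always samples where the current surrogate predicts the lowest value, so it can stall near a local optimum. Practical methods usually also direct some evaluations to regions where the surrogate is uncertain.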
In design space approximation, one is not interested in finding the optimal parameter vector but rather in the global behavior of the system. Here the surrogate is tuned to mimic the underlying model as closely as needed over the complete design space. Such surrogates are a useful, cheap way to gain insight into the global behavior of the system. Optimization can still occur as a post-processing step, although with no update procedure the optimum found cannot be validated.

Surrogate modeling software