Find Better Fits with Gaussian Process Modeling

Martin Bezener on Feb. 9, 2024

Once the data in a DOE is collected, it is analyzed, and a statistical model is constructed. This model tells us how the factors affect the responses and allows predictions at factor combinations that were not run in the DOE (interpolation). In most cases, linear regression and ANOVA are used to build the statistical model. This approach has many advantages: it’s widely available, easy to understand, and generally works well. Stat-Ease has a huge library of case studies, tutorials, and videos that illustrate this technique.
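As a rough illustration of the regression approach, here is a minimal Python/NumPy sketch (not Stat-Ease’s implementation) that fits a full quadratic model to a hypothetical 3×3 factorial in two coded factors and then predicts at a combination that was not run. The factor settings, the "true" coefficients, and the noise level are all made up for the example.

```python
import numpy as np

# Hypothetical 3x3 full factorial in two coded factors (-1, 0, +1): 9 runs.
levels = np.array([-1.0, 0.0, 1.0])
A, B = np.meshgrid(levels, levels)
A, B = A.ravel(), B.ravel()


def quad_terms(a, b):
    """Model matrix for a full quadratic model: 1, A, B, AB, A^2, B^2."""
    a, b = np.atleast_1d(a).astype(float), np.atleast_1d(b).astype(float)
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])


# Hypothetical "true" coefficients (b0, bA, bB, bAB, bAA, bBB) plus small noise.
true_coefs = np.array([10.0, 2.0, -1.5, 0.8, -3.0, 1.2])
rng = np.random.default_rng(1)
y = quad_terms(A, B) @ true_coefs + rng.normal(0.0, 0.01, size=A.size)

# Ordinary least-squares fit of the quadratic model.
X = quad_terms(A, B)
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict at a factor combination that was not run (interpolation).
y_new = quad_terms(0.5, -0.5) @ coefs
print(np.round(coefs, 2), y_new)
```

The same least-squares machinery underlies the ANOVA: the fitted coefficients, residuals, and their variances are all computed from this model matrix.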

However, there will be cases where linear regression doesn’t work well. One classic example is a computer, or simulation, experiment. Physical experiments have a noise component: if a combination of factors is run repeatedly, the response won’t be exactly the same each time because of measurement error, lot-to-lot differences in raw material, operator differences, and so on. In a computer experiment, however, software generates the responses, and repeating the simulation with the same factor settings produces identical output. In such a case, methods that assume noisy data, such as linear regression, are inappropriate.
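A tiny sketch makes the distinction concrete. Both functions below are hypothetical stand-ins: `simulator` plays the role of a deterministic computer model, and `physical_run` adds assumed Gaussian measurement noise to the same process. Replicating one factor setting shows zero spread for the simulator and nonzero spread for the physical runs, so for the simulator there is no pure error for a regression model to estimate.

```python
import numpy as np


def simulator(x):
    """Hypothetical deterministic computer model: same input, same output."""
    return np.sin(3.0 * x) + 0.5 * x**2


def physical_run(x, rng):
    """Hypothetical physical run: the same process plus measurement noise."""
    return simulator(x) + rng.normal(0.0, 0.1)


rng = np.random.default_rng(0)
x0 = 0.4  # one factor setting, replicated five times

sim_reps = np.array([simulator(x0) for _ in range(5)])
phys_reps = np.array([physical_run(x0, rng) for _ in range(5)])

# Spread (max - min) of the replicates: exactly zero for the simulator,
# nonzero for the noisy physical runs.
print(np.ptp(sim_reps), np.ptp(phys_reps))
```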

Gaussian Process Models (GPMs) are an appropriate alternative in this case. Loosely speaking, a GPM interpolates between design points using a correlation function with user-defined settings, and it passes through every data point. This may look like overfitting, but remember that there is no noise in the data: the responses are exact, so the only uncertainty lies between the runs. In a perfect world, we would simulate the response everywhere in the design space, but simulations can be time-consuming, sometimes taking days or even weeks for a single run, so a DOE and a statistical model are used instead.
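Here is a minimal NumPy sketch of a noise-free GP, assuming a squared-exponential correlation function, a zero prior mean, unit prior variance, and a fixed length scale of 0.2; Stat-Ease’s actual correlation functions and defaults may differ. `gp_predict` is a hypothetical helper, and its tiny `jitter` term exists only to keep the Cholesky factorization numerically stable. The predictions reproduce the design-point responses, and the predictive variance is essentially zero at the runs and positive between them.

```python
import numpy as np


def sq_exp_kernel(x1, x2, length_scale):
    """Squared-exponential correlation between two sets of 1-D inputs (assumed kernel)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict(x_train, y_train, x_new, length_scale=0.2, jitter=1e-10):
    """Noise-free GP posterior mean and variance (zero prior mean, unit prior variance)."""
    K = sq_exp_kernel(x_train, x_train, length_scale)
    K = K + jitter * np.eye(len(x_train))  # numerical stability only
    k_star = sq_exp_kernel(x_new, x_train, length_scale)
    L = np.linalg.cholesky(K)  # K = L L'
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = k_star @ alpha
    v = np.linalg.solve(L, k_star.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance minus explained part
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


# Hypothetical deterministic simulator evaluated at six design points.
x_design = np.linspace(0.0, 1.0, 6)
y_design = np.sin(6.0 * x_design)

mean_at_design, var_at_design = gp_predict(x_design, y_design, x_design)
mean_between, var_between = gp_predict(x_design, y_design, np.array([0.1, 0.5]))
print(var_at_design.max(), var_between)
```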

Figure 1: A quadratic model fit to simulation data shows uncertainty where there isn’t any (at the red design points) and severely overestimates the uncertainty between design points.
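The caption’s first point can be checked numerically. This sketch fits a quadratic by least squares to a hypothetical deterministic simulator, sin(6x), sampled at six design points. The residual standard deviation comes out well above zero, so the regression reports "noise" that is really just lack of fit.

```python
import numpy as np

# Hypothetical deterministic simulator sampled at six design points.
x = np.linspace(0.0, 1.0, 6)
y = np.sin(6.0 * x)

# Least-squares quadratic fit: model matrix columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coefs

# Residual standard deviation on n - p degrees of freedom. Regression treats this
# lack of fit as random noise, although rerunning the simulator reproduces y exactly.
n, p = X.shape
sigma_hat = np.sqrt(residuals @ residuals / (n - p))
print(sigma_hat)
```

Because `sigma_hat` feeds every standard error and prediction interval the regression produces, the spurious "noise" shows up as uncertainty at the design points themselves.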

GPMs, however, can also be extended beyond the zero-noise situation. This is especially useful when the response is non-linear, with intermittent spikes and valleys, and simply may not be modeled adequately by linear regression. Capturing such a response often requires a polynomial of order higher than quartic, which is generally not recommended.
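A common way to relax the zero-noise assumption is to add a noise (or "nugget") variance to the diagonal of the GP’s covariance matrix. The sketch below assumes a squared-exponential correlation with a fixed length scale and hypothetical noisy data; `gp_mean` and its `noise_var` argument are illustrative names, and whether Stat-Ease’s smoothing parameter corresponds exactly to this term is an assumption. With `noise_var=0` the mean passes through every run; with `noise_var > 0` it no longer does.

```python
import numpy as np


def sq_exp_kernel(x1, x2, length_scale):
    """Squared-exponential correlation between two sets of 1-D inputs (assumed kernel)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_mean(x_train, y_train, x_new, length_scale=0.1, noise_var=0.0):
    """GP posterior mean with a noise ("nugget") variance added to the diagonal.

    noise_var=0 interpolates the data; larger noise_var gives a smoother fit.
    A 1e-10 jitter is always added for numerical stability.
    """
    K = sq_exp_kernel(x_train, x_train, length_scale)
    K = K + (noise_var + 1e-10) * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return sq_exp_kernel(x_new, x_train, length_scale) @ alpha


# Hypothetical noisy responses from a wavy process at 11 runs.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 11)
y = np.sin(6.0 * x) + rng.normal(0.0, 0.2, size=x.size)

fit_exact = gp_mean(x, y, x, noise_var=0.0)    # passes through every run
fit_smooth = gp_mean(x, y, x, noise_var=0.04)  # misses the runs, smoothing the noise
print(np.abs(fit_exact - y).max(), np.abs(fit_smooth - y).max())
```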

Let’s look at the same data. Clearly, a quadratic model doesn’t do a good job of describing the data. A high-order polynomial does a better job of capturing the trends, but a polynomial like this one will have huge error bars and will be very sensitive to outliers and minor perturbations in the data.
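One way to quantify the "huge error bars" is the least-squares prediction-variance factor x0'(X'X)^-1 x0; multiplied by the error variance, it gives the variance of a prediction at x0. At a design point this factor equals the point’s leverage, which is also how much that fitted value moves per unit change in its own observation, so it measures sensitivity to perturbations too. This sketch compares a quadratic with a degree-8 polynomial on 10 hypothetical, evenly spaced runs; only the design matters here, not the responses.

```python
import numpy as np


def leverage_and_pred_var(x, x_grid, degree):
    """Least-squares leverage at the design points and the prediction-variance
    factor x0'(X'X)^-1 x0 on a grid, for a polynomial of the given degree."""
    X = np.vander(x, degree + 1)
    Q, R = np.linalg.qr(X)              # X = QR, so X'X = R'R
    leverage = np.sum(Q**2, axis=1)     # diagonal of the hat matrix QQ'
    G = np.vander(x_grid, degree + 1)
    W = np.linalg.solve(R.T, G.T)       # columns are R'^-1 x0
    pred_var = np.sum(W**2, axis=0)     # x0'(R'R)^-1 x0 = ||R'^-1 x0||^2
    return leverage, pred_var


# Ten hypothetical, evenly spaced runs on a coded factor; responses are not needed.
x = np.linspace(-1.0, 1.0, 10)
x_grid = np.linspace(-1.0, 1.0, 201)

lev2, var2 = leverage_and_pred_var(x, x_grid, degree=2)
lev8, var8 = leverage_and_pred_var(x, x_grid, degree=8)

# Multiply by the error variance sigma^2 to get Var(y_hat): bigger factor, wider error bars.
print("max leverage:", lev2.max(), lev8.max())
print("max prediction-variance factor:", var2.max(), var8.max())
```

With nine coefficients and only ten runs, the degree-8 fit leaves a single residual degree of freedom, so its fitted values track the individual observations, perturbations included, almost one for one.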

Stat-Ease 360 can now fit generalized Gaussian Process Models to noisy data, which extends their use beyond computer experiments. Here’s what the model looks like when fit to the above data:

Figure 2: Gaussian Process Model fit to the noisy data.

Notice that this model captures the peaks and valleys of the data (unlike the quadratic model) and doesn’t go through all the points (unlike the zero-noise GPM). This model is incredibly flexible and can be tuned with a smoothing parameter. SE360 offers two ways to fit these models automatically: maximum likelihood and cross-validation. The fitted models can then be used like any other model in the program’s optimization and other features.
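The two automatic fitting strategies can be sketched with a simple grid search. Below, `neg_log_marginal_likelihood` scores hyperparameters by the Gaussian marginal likelihood, and `loo_cv_error` uses the standard closed-form leave-one-out residuals for a GP, [K^-1 y]_i / [K^-1]_ii. The squared-exponential correlation, the fixed signal variance of 1, the candidate grids, and the data are all assumptions for illustration; Stat-Ease 360’s actual optimizer and parameterization are not described here and may differ.

```python
import numpy as np


def sq_exp_kernel(x1, x2, length_scale):
    """Squared-exponential correlation between two sets of 1-D inputs (assumed kernel)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def cov_matrix(x, length_scale, noise_var, signal_var=1.0):
    """Covariance of the noisy observations: signal part plus noise on the diagonal."""
    return signal_var * sq_exp_kernel(x, x, length_scale) + noise_var * np.eye(len(x))


def neg_log_marginal_likelihood(x, y, length_scale, noise_var):
    """Negative log marginal likelihood of a zero-mean GP with Gaussian noise."""
    L = np.linalg.cholesky(cov_matrix(x, length_scale, noise_var))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y
    # 0.5 y'K^-1 y + 0.5 log|K| + (n/2) log(2 pi), with log|K| = 2 sum(log diag(L))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2 * np.pi)


def loo_cv_error(x, y, length_scale, noise_var):
    """Mean squared leave-one-out residual via the closed form [K^-1 y]_i / [K^-1]_ii."""
    K_inv = np.linalg.inv(cov_matrix(x, length_scale, noise_var))
    loo_resid = (K_inv @ y) / np.diag(K_inv)
    return np.mean(loo_resid**2)


# Hypothetical noisy data and candidate (length scale, noise variance) grid.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(6.0 * x) + rng.normal(0.0, 0.2, size=x.size)
grid = [(ls, nv) for ls in (0.05, 0.1, 0.2, 0.4) for nv in (1e-4, 1e-3, 1e-2, 1e-1)]

best_ml = min(grid, key=lambda p: neg_log_marginal_likelihood(x, y, *p))
best_cv = min(grid, key=lambda p: loo_cv_error(x, y, *p))
print("maximum likelihood (length scale, noise var):", best_ml)
print("cross-validation   (length scale, noise var):", best_cv)
```

The closed-form leave-one-out residuals avoid refitting the model n times; a real implementation would typically replace the coarse grid with a numerical optimizer over the same criteria.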

Try out GPM today with a trial of Stat-Ease 360 software.
