Note
Screenshots may differ slightly depending on software version.
Gaussian process models are used in computer experiments, where the runs are evaluated with a simulation, often a very time-consuming one, rather than with a physical or chemical process. A Gaussian process model can serve as a surrogate for the simulation to make predictions or find optimal operating parameters. For a more in-depth study of computer experiments, see Santner et al. (2003) [SWN03].
Note
Stat-Ease software does not offer Gaussian process models for blocked data or split plots, because these designs are not useful for experiments on deterministic computer simulations.
A convenience store chain wants to open a new gas station and is trying to determine the number of pumps and the capacity of the underground storage tank required for the new store. Using a computer simulation, they are able to calculate the average wait time of vehicles throughout a typical day. However, the simulation is very computationally expensive to run, so it cannot be used directly with an optimizer.
The outputs of the simulator are the average wait time over the course of a day and the number of vehicles that abandon the queue due to an overly long wait.
Fewer pumps and a smaller tank are desirable to reduce the initial cost, as well as ongoing maintenance. The number of pumps directly impacts the number of vehicles that can be serviced at any given time. If the underground tank is too small it may need to be refilled during peak demand. This causes all pumping to pause while it is filled.
Open Stat-Ease, and click on the Help, Tutorial Data menu and select Gas Station.
The vehicle simulator is deterministic. It will always result in the exact same average wait time and queue abandon rate given the same inputs. As such, a traditional designed experiment with replicates is inappropriate. Instead we will use a Latin Hypercube Design (LHD). This design has several properties that are desirable for a computer experiment:
It is a space-filling design.
It has no replicates.
The design resulting from removing a factor will also have no replicates (projection).
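The properties listed above can be illustrated with a small sketch. This is not how Stat-Ease builds its design; it is a minimal Latin hypercube generator on the unit square, and the function name `latin_hypercube`, the run count, and the seed are all assumptions made for the example.

```python
import random


def latin_hypercube(n_runs, n_factors, seed=None):
    """Return n_runs points in [0, 1)^n_factors forming a Latin hypercube."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        # Split [0, 1) into n_runs equal strata and place exactly one point
        # in each stratum, visiting the strata in a random order.
        strata = list(range(n_runs))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_runs for s in strata])
    # Transpose so that each row is one run of the design.
    return [list(row) for row in zip(*columns)]


design = latin_hypercube(n_runs=10, n_factors=2, seed=1)

for j in range(2):
    # Projection: every single factor hits each of the 10 strata exactly once,
    # so dropping the other factor still leaves 10 distinct levels.
    occupied = sorted(int(point[j] * 10) for point in design)
    print(f"factor {j}: strata occupied = {occupied}")
```

Because each factor takes one value per stratum, no two runs share a level on any axis, which is the projection property described above.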
You’ll notice there is an additional response column for the cost of the station, which is an Equation Only response that we’ll use in optimization.
Click on the Graph Columns node, then graph Pumps vs Tank Size. This plot demonstrates the space-filling properties of the LHD.
You may notice that there are replicates along the X axis. For example, there are two tank sizes at 4 pumps. This is because a fraction of a fuel pump is nonsensical, so the Pumps factor was rounded and converted to a Discrete Numeric factor type. We’ve lost the projection property of the LHD, but this will allow us to restrict the predictions and optimization algorithm to integers.
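The effect of rounding on the projection property can be sketched as follows. The pump range of 2 to 8 and the run count of 12 are hypothetical values chosen for illustration; they are not taken from the Gas Station data set.

```python
import random

rng = random.Random(0)
n_runs = 12

# One Latin hypercube column: a single point in each of n_runs strata,
# scaled to a HYPOTHETICAL range of 2 to 8 pumps.
strata = list(range(n_runs))
rng.shuffle(strata)
pumps_continuous = [2 + 6 * (s + rng.random()) / n_runs for s in strata]

# Rounding to whole pumps collapses several strata onto the same integer.
pumps_rounded = [round(p) for p in pumps_continuous]

print("distinct levels before rounding:", len(set(pumps_continuous)))
print("distinct levels after rounding: ", len(set(pumps_rounded)))
```

With more runs than integer levels, repeated pump counts are unavoidable, which is why several tank sizes end up paired with the same number of pumps.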
Click on the avg wait time response to begin the analysis. Switch to Gaussian Process in the Special Models dropdown and click Start Analysis.
The Factors tab allows you to remove a factor column from the analysis. In this case we’ll want to use both factors, so leave them alone. The Gaussian process kernel has a smoothing parameter that can be adjusted on this tab. For now we’ll use the one suggested by a maximum likelihood estimate, by clicking Calculate.
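To give a rough idea of what a maximum likelihood estimate of a kernel parameter involves, here is a sketch using NumPy. It assumes a zero-mean Gaussian process with a squared-exponential kernel and a single length scale, and it uses a toy function in place of the simulator. The kernel form, the grid of candidate values, and all names are assumptions; the kernel and estimation method Stat-Ease actually uses may differ.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale):
    """Squared-exponential kernel (an assumed form, not necessarily Stat-Ease's)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def log_marginal_likelihood(x, y, length_scale, jitter=1e-6):
    """Log likelihood of y under a zero-mean GP with unit signal variance."""
    n = len(y)
    # A small jitter on the diagonal keeps the Cholesky factorization stable.
    k = sq_exp_kernel(x, x, length_scale) + jitter * np.eye(n)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    # -0.5 * y' K^-1 y  -  0.5 * log|K|  -  (n / 2) * log(2 * pi)
    return -0.5 * y @ alpha - np.log(np.diag(chol)).sum() - 0.5 * n * np.log(2 * np.pi)


rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(15, 2))          # toy design on the unit square
y = np.sin(3 * x[:, 0]) + x[:, 1] ** 2       # toy deterministic "simulator"
y = (y - y.mean()) / y.std()                 # standardize the response

# Evaluate the likelihood on a grid of candidate length scales and keep the best.
grid = np.logspace(-1, 0.5, 31)
scores = [log_marginal_likelihood(x, y, ls) for ls in grid]
best_length_scale = grid[int(np.argmax(scores))]
print("length scale with the highest likelihood:", best_length_scale)
```

A grid search is used here only for transparency; a numerical optimizer would normally replace it.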
Continue to the Model Graphs tab to see how the model predicts. Notice that at the design points the Gaussian process model prediction always matches the observed value, because the model interpolates the deterministic simulator output.
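The interpolation behavior can be checked numerically with a small sketch. It assumes a squared-exponential kernel with a fixed length scale of 0.2, a constant mean, and no noise term, and it uses a toy function in place of the simulator; none of these choices come from Stat-Ease.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with an assumed, fixed length scale."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_predict(x_train, y_train, x_new):
    """Noise-free GP predictive mean with a constant (sample) mean."""
    mean = y_train.mean()
    k = sq_exp_kernel(x_train, x_train)
    weights = np.linalg.solve(k, y_train - mean)
    return mean + sq_exp_kernel(x_new, x_train) @ weights


rng = np.random.default_rng(1)
x_train = rng.uniform(0, 1, size=(12, 2))                   # toy design points
y_train = np.sin(3 * x_train[:, 0]) + x_train[:, 1] ** 2    # toy simulator output

# Predict back at the design points and compare with the observed values.
fitted = gp_predict(x_train, y_train, x_train)
max_gap = np.max(np.abs(fitted - y_train))
print("largest |prediction - observation| at design points:", max_gap)
```

The gap is at the level of numerical round-off, reflecting that a noise-free Gaussian process passes through every observed point.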
Repeat the process for the fraction abandoned response.
Now we are ready to find the optimal setup for the gas station. Go to the Numerical Optimization node. We’ll want to minimize the avg wait time and fraction abandoned responses, as well as the cost (which has the effect of minimizing both factors).
The optimizer finds a solution that is midway between the least and most expensive options. The average wait time of 17 seconds is acceptable, and relatively few vehicles abandon the queue.
It would be reasonable to treat retaining as many vehicles as possible as a greater priority than the wait time or cost. Stat-Ease has two ways to tweak the optimizer to better handle this important response. Go back to the Criteria tab, and you will see an Importance dropdown. Select the fraction abandoned criteria and set its importance to +++++ to see how the results change.
Even with the higher importance we still get >5% abandoned.
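One way to picture what the importance setting does is as an exponent in a weighted geometric mean of the individual desirabilities. The sketch below assumes that form, assumes that + through +++++ map to the values 1 through 5 with +++ as the default, and uses made-up desirability values for two candidate stations.

```python
import math


def overall_desirability(desirabilities, importances):
    """Importance-weighted geometric mean of individual desirabilities."""
    if any(d == 0 for d in desirabilities):
        return 0.0  # any completely unacceptable response vetoes the solution
    log_sum = sum(r * math.log(d) for d, r in zip(desirabilities, importances))
    return math.exp(log_sum / sum(importances))


# HYPOTHETICAL individual desirabilities, ordered as
# (avg wait time, fraction abandoned, cost).
cheap_station = [0.70, 0.50, 0.90]
large_station = [0.80, 0.90, 0.40]

equal = [3, 3, 3]          # assumed default importance (+++) for every goal
abandon_first = [3, 5, 3]  # fraction abandoned raised to +++++

for label, importances in (("equal", equal), ("abandon first", abandon_first)):
    cheap = overall_desirability(cheap_station, importances)
    large = overall_desirability(large_station, importances)
    print(f"{label}: cheap = {cheap:.3f}, large = {large:.3f}")
```

In this made-up case the higher importance is enough to change which candidate scores best. Whether it changes the outcome in practice depends on how far apart the individual desirabilities are, as the result above shows.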
Go back to the fraction abandoned criteria again and this time, grab the box in the middle of the criteria diagram and drag it to the bottom. This modifies the criteria so that the desirability stays low over most of the range and rises steeply only as you approach 0%. The same result can be achieved by increasing the values in the Weights fields.
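The effect of a weight on a minimize goal can be sketched with a ramp raised to a power. The function below assumes this form, with hypothetical limits of 0% and 10% abandoned and a weight of 10; the limits and weight in your own criteria may differ.

```python
def minimize_desirability(y, low, high, weight=1.0):
    """Desirability for a 'minimize' goal: 1 at or below low, 0 at or above high."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    # A weight of 1 gives a straight ramp; larger weights bend the curve so
    # that desirability stays near 0 until y gets close to the low limit.
    return ((high - y) / (high - low)) ** weight


# HYPOTHETICAL limits: 0% abandoned is ideal, 10% is unacceptable.
for fraction in (0.002, 0.05):
    linear = minimize_desirability(fraction, 0.0, 0.10, weight=1)
    steep = minimize_desirability(fraction, 0.0, 0.10, weight=10)
    print(f"fraction {fraction:.3f}: weight 1 -> {linear:.3f}, weight 10 -> {steep:.3f}")
```

With the larger weight, a 5% abandon rate scores close to zero while 0.2% still scores well, so the optimizer is pushed much harder toward the low end of the range.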
By increasing the number of pumps we’ve dropped the predicted abandon rate to 0.2%. This corresponds to roughly 4 cars per day as opposed to 100, which will probably pay for the increased cost of the station over a period of time.
You’ll notice that there is a column for the number of abandoned cars predicted by the simulator. This count response can be analyzed using Poisson regression as an alternate model to the fraction response. Try it out and see how the predictions differ. The simulation uses 2030 vehicles per day, so multiply the fraction response predictions by that to compare with the count.
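The conversion from a predicted fraction to a vehicle count is a single multiplication. The sketch below uses the 2030 vehicles per day stated above, together with the 0.2% and 5% abandon rates from the optimization results; the function name is an assumption.

```python
VEHICLES_PER_DAY = 2030  # daily traffic used by the simulation (from the tutorial)


def expected_abandoned(fraction):
    """Convert a predicted abandon fraction into vehicles per day."""
    return fraction * VEHICLES_PER_DAY


for fraction in (0.002, 0.05):
    count = expected_abandoned(fraction)
    print(f"{fraction:.1%} abandoned -> about {count:.1f} vehicles per day")
```

These values, roughly 4 and 100 vehicles per day, are the figures to compare against the predictions from the count model.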
References
[SWN03] Thomas Santner, Brian Williams, and William Notz. The Design and Analysis of Computer Experiments. Springer, 2003. ISBN 978-1-4419-2992-1. doi:10.1007/978-1-4757-3799-8.