The final stage of analyzing designed experiments data is determining the optimal set of process conditions that works for all responses. Stat-Ease software does this via a numerical optimization algorithm. This routine simultaneously optimizes all responses at once, based on goals set by the experimenter. This is achieved by deploying the Derringer-Suich(1) desirability criteria in conjunction with the Nelder-Mead(2) variable-sized simplex search algorithm. This optimization function balances competing response goals to find the “sweet spot” that produces the best of all worlds. Without getting deep into the mathematical weeds of these tools, I would like to provide some basic concepts and discuss how to use this method to optimize DOE results.

**Starting point: minimum model requirements**

Numerical optimization uses prediction models created by the analysis of each measured response. The stronger the prediction models, the more accurate the optimization results. If the analysis does not show a strong relationship between the factors and the response, then optimization will not work well. At a minimum, the model p-value should be less than 0.05, and the model should only include terms that are statistically significant plus those needed to maintain model hierarchy. If the DOE data included replicates, then there should be an insignificant lack of fit test (p-value >0.10). Key summary statistics for modeling include adjusted R-squared and predicted R-squared. Higher is better for each of these, meaning that more variation in the data and in the predictions is explained by the model. There is not a particular “cut-off” for these values but models that explain more than 50% of the variation are going to perform better than those that do not. In summary, start optimization with response models that explain the data and produce reliable predictions.

**Desirability at a specific point**

Numerical optimization is driven by a mathematical calculation called desirability. Points within the design space are evaluated via the desirability function that is defined by the user-specified goals for each response. The overall (multi-response) desirability (D) is the geometric mean of the individual desirability (di) for each response.

*Figure 1: Desirability function*

An individual desirability “little d” (range of 0 to 1) is defined by how closely the evaluated point meets the response goal. Typical response goals are maximize, minimize or target a specific value. In addition to the goal, upper and lower “acceptable” limits on the response values must be set.

Illustration: The experimenters study a process that has 3 input factors and 2 output responses. In this example, the first response (% Conversion) measurements has an observed range of 51-97 percent. The goal for conversion is maximize. Considering business expectations, the minimum acceptable conversion is determined to be 80%, so that is defined as the lower limit. The upper limit is set to the theoretical maximum of 100%. These limits, along with the goal, define the desirability function for the conversion response. When evaluating a particular point in the design space, if the measured conversion is less than 80% (defined lower limit), desirability = 0. If conversion is 80-100%, desirability equals the proportion of the way towards the upper limit (100). Therefore, a conversion of 90 gives d=.5 and a conversion of 95 gives d=0.75. Any point that gives % conversion at 100% or higher will result in d=1.

*Figure 2: Response 1 goal: Maximize with an acceptable range of 80-100%.*

Response 2 is Activity and the goal is a Target of 63 and a range of 60-66 (Figure 3). Desirability will be 1 only at the exact value of 63. Evaluated points that result in activity levels between 60-63 and 63-66 are rated with desirability values that are proportional to the distance from the target. Activity levels that are either below 60 or above 66 are assigned a desirability of 0.

*Figure 3: Response 2 goal: Target 63, with acceptable range 60-66.*

**The optimization algorithm at work**

Once the goals and limits for each response are defined, the search algorithm can start. Stat-Ease software begins with a set of starting points (locations in the design space). For a single starting point, overall desirability (D) is calculated. Then the simplex search starts evaluating desirability (D) in the nearby area and takes “steps” that increase desirability. Steps are taken across the design space until desirability is maximized. All the starting points follow this process, resulting in a set of final “solutions” which are process conditions that at least minimally meet the requirements for all responses (individual desirability is greater than 0).

**Optimization solutions**

If the process is easy to optimize (the responses don’t compete with each other too much), there may be a large robust space that meets the response goals. In this case a very large number of solutions (process conditions) may be found. These solutions are sorted by the desirability value. Common practice is to focus on the top solution(s). Remember however, all the solutions meet the goals set by the experimenter. Optimization does not mean there is a single set of conditions that is best. If the area is very large (many solutions found) then tightening up the upper or lower limits may be merited. There may also be other external criteria to consider such as cost of the solution, manufacturability, ease of implementation, etc. The experimenter should review all the solutions presented and consider which ones make sense from a business perspective.

Figure 4 shows the optimal conditions for the illustration. The red dots show the location of the optimal settings for the factors, within their range. In this case time is set mid-way in the range (47 min), while temperature is maximized at 90 degrees and catalyst is approximately 2.7%. These process conditions are predicted to result in a conversion of 91% and activity level of 63. Confirmation runs should be completed to verify these results.

*Figure 4: Numerical solution “ramps view” for illustration*

*A side note:* Desirability is only a mathematical evaluation tool to compare solutions. Although it ranges from 0 to 1, it is a relative measure within a set of solutions, and not a statistic that needs to be as high as possible. Within a specific DOE, higher desirability means that the solution (set of conditions) met the stated goals more closely than a solution with lower desirability.

**Summary**

The success of numerical optimization starts with strong prediction models from the DOE analysis. Once models are established, the experimenter specifies each response goal, as well as upper and lower limits around that goal. The numerical search algorithm evaluates areas within the design space, searching for areas that simultaneously meet the goals for all the responses. This optimization function balances competing response goals to find the “sweet spot” that produces the best of all worlds.

References:

- G.C. Derringer and R. Suich in “Simultaneous Optimization of Several Response Variables,”
*Journal of Quality Technology*, October 1980, pp. 214-219 *Numerical Recipes in Pascal*by William H. Press et. al., p.326

A central composite design (CCD) is a type of response surface design that will give you very good predictions in the middle of the design space. Many people ask how many center points (CPs) they need to put into a CCD. The number of CPs chosen (typically 5 or 6) influences how the design functions.

Two things need to be considered when choosing the number of CPs in a central composite design:

**1)** Replicated center points are used to estimate pure error for the lack of fit test. Lack of fit indicates how well the model you have chosen fits the data. With fewer than five or six replicates, the lack of fit test has very low power. You can compare the critical F-values (with a 5% risk level) for a three-factor CCD with 6 center points, versus a design with 3 center points. The 6 center point design will require a critical F-value for lack of fit of 5.05, while the 3 center point design uses a critical F-value of 19.30. This means that the design with only 3 center points is less likely to show a significant lack of fit, even if it is there, making the test almost meaningless.

TIP: True “replicates” are runs that are performed at random intervals during the experiment. It is very important that they capture the true normal process variation! Do not run all the center points grouped together as then most likely their variation will underestimate the real process variation.

**2)** The default number of center points provides near uniform precision designs. This means that the prediction error inside a sphere that has a radius equal to the ±1 levels is nearly uniform. **Thus, your predictions in this region (±1) are equally good.** Too few center points inflate the error in the region you are most interested in. This effect (a “bump” in the middle of the graph) can be seen by viewing the standard error plot, as shown in Figures 1 & 2 below. (To see this graph, click on Design Evaluation, Graph and then View, 3D Surface after setting up a design.)

Figure 1 (left): CCD with the 6 center points (5-6 recommended). Figure 2 (right): CCD with only 3 center points. Notice the jump in standard error at the center of figure 2.

Ask yourself this—where do you want the best predictions? Most likely at the middle of the design space. Reducing the number of center points away from the default will substantially damage the prediction capability here! Although it can seem tedious to run all of these replicates, the number of center points does ensure that the analysis of the design can be done well, and that the design is statistically sound.

There are a couple features in the latest release of Design-Expert and Stat-Ease 360 software programs (version 22.0) that I really love, and wanted to draw your attention to. These features are accessible to everyone, no matter if you are a novice or an expert in design of experiments.

First, the **Analysis Summary** in the Post Analysis section: This provides a quick view of all response analyses in a set of tables, making it easy to compare model terms, statistics such as R-squared values, equations and more. We are pleased to now have this feature that has been requested many times! When you have a large number of responses, understanding the similarities and differences between the model may lead to additional insights to your product or process.

Second, the **Custom Graphs** (previously Graph Columns): Functionality and flexibility have been greatly expanded so that you can now plot analysis or diagnostic values, as well as design column information. Customize the colors, shapes and sizes of the points to tell your story in the way that makes sense to your audience.

Figure 1 (left) shows the layout of points in a central composite design, where the points are colored by the their space point type (factorial, axial or center points) and then sized by the response value. We can visualize where in the design space the responses are smaller versus larger.

In Figure 2 (right), I had a set of existing runs that I wanted to visualize in the design space. Then I augmented the design with new runs. I set the Color By option to Block to clearly see the new (green) runs that were added to the design space.

These new features offer many new ways to visualize your design, response data, and other pieces of the analysis. What stories will you tell?

I am often asked if the results from one-factor-at-a-time (OFAT) studies can be used as a basis for a designed experiment. They can! This augmentation starts by picturing how the current data is laid out, and then adding runs to fill out either a factorial or response surface design space.

One way of testing multiple factors is to choose a starting point and then change the factor level in the direction of interest (Figure 1 – green dots). This is often done one variable at a time “to keep things simple”. This data can confirm an improvement in the response when any of the factors are changed individually. However, it does not tell you if making changes to multiple factors at the same time will improve the response due to synergistic interactions. With today’s complex processes, the one-factor-at-a-time experiment is likely to provide insufficient information.

The experimenter can augment the existing data by extending a factorial box/cube from the OFAT runs and completing the design by running the corner combinations of the factor levels (Figure 2 – blue dots). When analyzing this data together, the interactions become clear, and the design space is more fully explored.

In other cases, OFAT studies may be done by taking a standard process condition as a starting point and then testing factors at new levels both lower and higher than the standard condition (see Figure 3). This data can estimate linear and nonlinear effects of changing each factor individually. Again, it cannot estimate any interactions between the factors. This means that if the process optimum is anywhere other than exactly on the lines, it cannot be predicted. Data that more fully covers the design space is required.

A face-centered central composite design (CCD)—a response surface method (RSM)—has factorial (corner) points that define the region of interest (see Figure 4 – added blue dots). These points are used to estimate the linear and the interaction effects for the factors. The center point and mid points of the edges are used to estimate nonlinear (squared) terms.

If an experimenter has completed the OFAT portion of the design, they can augment the existing data by adding the corner points and then analyzing as a full response surface design. This set of data can now estimate up to the full quadratic polynomial. There will likely be extra points from the original OFAT runs, which although not needed for model estimation, do help reduce the standard error of the predictions.

Running a statistically designed experiment from the start will reduce the overall experimental resources. But it is good to recognize that existing data can be augmented to gain valuable insights!

Learn more about design augmentation at the January webinar: The Art of Augmentation – Adding Runs to Existing Designs.

At the outset of my chemical engineering career, I spent 2 years working with various R&D groups for a petroleum company in Southern California. One of my rotations brought me to their tertiary oil-recovery lab, which featured a wall of shelves filled to the brim with hundreds of surfactants. It amazed me how the chemist would seemingly know just the right combination of anionic, nonionic, cationic and amphoteric varieties to blend for the desired performance. I often wondered, though, whether empirical screening might have paid off by revealing a few surprisingly better ingredients. Then after settling in on the vital few components doing an in-depth experiment may very well have led to discovery of previously unknown synergisms. However, this was before the advent of personal computers and software for mixture design of experiments (DOE), and, thus, extremely daunting for non-statisticians.

Nowadays I help many formulators make the most from mixture DOE via Stat-Ease softwares’ easy-to-use statistical tools. I was very encouraged to see this 2021 meta-analysis that found 200 or so recent publications (2016-2020) demonstrating the successful application of mixture DOE for food, beverage and pharmaceutical formulation development. I believe that this number can be multiplied many-fold to extrapolate these findings to other process industries—chemicals, coatings, cosmetics, plastics, and so forth. Also, keep in mind that most successes never get published—kept confidential until patented.

However, though I am very heartened by the widespread adoption of mixture DOE, screening remains underutilized based on my experience and a very meager yield of publications from 2016 to present from a Google-Scholar search. I believe the main reasons to be:

- Formulators prefer to rely on their profound knowledge of the chemistry for selection of ingredients (refer to my story about surfactants for tertiary oil recovery)
- The number of possibilities get overwhelming; for example, this 2016
*Nature*publication reports that experimenters on a pear cell suspension culture got thrown off by the 65 blends they believed were required for simplex screening of 20 components (too bad, as shown in the Stat-Ease software screenshot below, by cutting out the optional check blends and constraint-plane-centroids, this could be cut back to substantially.)

- Misapplying factorial screening to mixtures, which, unfortunately happens a lot due to these process-focused experiments being simpler and more commonly used. This is really a shame as pointed out in this Stat-Ease blog post

I feel sure that it pays to screen down many components to a vital few before doing an in-depth optimization study. Stat-Ease software provides some great options for doing so. Give screening a try!!

For more details on mixture screening designs and a solid strategy of experiments for optimizing formulations, see my webinar on Strategy of Experiments for Optimal Formulation. If you would like to speak with our team about putting mixture DOE to good use for your R&D, please contact us.