# Stat-Ease Blog

## Adding Intervals to Optimization Graphs

posted by Heidi on Oct. 18, 2017

Design-Expert® software provides powerful features to add confidence, prediction, or tolerance intervals to its graphical optimization plots. All users can benefit by seeing how this provides a more conservative ‘sweet spot’. However, this innovative enhancement is of particular value for those in the pharmaceutical industry who hope to satisfy the US FDA’s QbD (quality by design) requirements.

Here are the definitions:

Confidence Interval (CI): an interval that covers a population parameter (like a mean) with a pre-determined confidence level (such as 95%).

Prediction Interval (PI): an interval that covers a future outcome from the same population with a pre-determined confidence level.

Tolerance Interval (TI): an interval that covers a fixed proportion of outcomes from the population with a pre-determined confidence level, accounting for the uncertainty in estimating the population mean and standard deviation. (For example, 99% of the product will be in spec with 95% confidence.)

Note that a confidence interval contains a parameter (σ, μ, ρ, etc.) with “1-alpha” confidence, while a tolerance interval contains a fixed proportion of a population with “1-alpha” confidence.
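
To make the three definitions concrete, here is a minimal sketch for the simplest case: a single normally distributed sample. It is not how Design-Expert computes its model-based intervals. The sample values are hypothetical, the t and chi-square quantiles are standard table values for 9 degrees of freedom, and the tolerance factor uses Howe's approximation (an assumption made here for illustration).

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample of n = 10 measurements (illustrative values only).
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.5]
n = len(sample)
xbar, s = mean(sample), stdev(sample)

# Standard table values for n - 1 = 9 degrees of freedom, alpha = 0.05.
T_975_DF9 = 2.262    # upper 2.5% point of Student's t
CHI2_05_DF9 = 3.325  # lower 5% point of chi-square

# CI: where the population mean is likely to be.
ci_half = T_975_DF9 * s / math.sqrt(n)

# PI: where one future observation is likely to fall.
pi_half = T_975_DF9 * s * math.sqrt(1 + 1 / n)

# TI: range covering 99% of the population with 95% confidence,
# using Howe's approximation for the two-sided tolerance factor k.
proportion = 0.99
z = NormalDist().inv_cdf((1 + proportion) / 2)
k = z * math.sqrt((n - 1) * (1 + 1 / n) / CHI2_05_DF9)
ti_half = k * s

for name, half in [("CI", ci_half), ("PI", pi_half), ("TI", ti_half)]:
    print(f"{name}: {xbar - half:.2f} to {xbar + half:.2f}")
```

The intervals widen from CI to PI to TI, which is why adding a PI or TI band to an optimization plot shrinks the sweet spot to a more conservative region.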

These intervals are displayed numerically under Point Prediction as shown in Figure 1. They can be added as interval bands in graphical optimization, as shown in Figure 2. (Data is taken from our microwave popcorn DOE case, available upon request.) This pictorial representation is great for QbD purposes because it helps focus the experimenter on the region where they are most likely to get consistent production results. The confidence levels (alpha value) and population proportion can be changed under the Edit Preferences option.

## Choosing the Best Design for Process Optimization

posted by Shari on Aug. 29, 2017

Ever wonder what the difference is between the various response surface method (RSM) optimization design options? To help you choose the best design for your experiment, I’ve put together a list of things you should know about each of the three primary response surface designs—Central Composite, Box-Behnken, and Optimal.

Central Composite Design (CCD)

• Developed for estimating a quadratic model
• Created from a two-level factorial design, and augmented with center points and axial points
• Relatively insensitive to missing data
• Features five levels for each factor (Note: The number of levels can be reduced by choosing alpha=1.0, a face-centered CCD which has only three levels for each factor.)
• Provides excellent prediction capability near the center (bullseye) of the design space
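
The construction described above can be sketched in a few lines of Python. This is an illustration in coded units, not Design-Expert's own build routine. The function names are hypothetical, and the default alpha is assumed to be the textbook rotatable value, the fourth root of the number of factorial points.

```python
from itertools import product

def central_composite(k, alpha=None, n_center=1):
    """Return CCD runs in coded units for k factors.

    alpha defaults to the rotatable value (2**k) ** 0.25;
    alpha=1.0 gives a face-centered design.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    # Two-level factorial core.
    factorial = [tuple(float(v) for v in run)
                 for run in product((-1, 1), repeat=k)]
    # Axial points: one factor at +/- alpha, the others at the center.
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + axial + center

def levels_per_factor(design):
    """Count the distinct settings used for each factor."""
    return [len({run[i] for run in design}) for i in range(len(design[0]))]

rotatable = central_composite(2)
face_centered = central_composite(2, alpha=1.0)
print(len(rotatable), levels_per_factor(rotatable))          # 9 [5, 5]
print(len(face_centered), levels_per_factor(face_centered))  # 9 [3, 3]
```

Setting alpha to 1.0 pulls the axial points onto the faces of the cube, which is what reduces each factor from five levels to three.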

Box-Behnken Design (BBD)

• Created for estimating a quadratic model
• Requires only three levels for each factor
• Requires specific positioning of design points
• Provides strong coefficient estimates near the center of the design space, but falls short at the corners of the cube (no design points there)
• BBD vs CCD: If you end up missing any runs, the accuracy of the remaining runs in the BBD becomes critical to the dependability of the model, so go with the more robust CCD if you often lose runs or mismeasure responses.
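
For comparison, here is a sketch of the three-factor Box-Behnken layout in coded units. It covers only the three-factor case, where each pair of factors is run as a 2x2 factorial while the third factor is held at its center. The function name and the default of three center points are assumptions for illustration.

```python
from itertools import combinations, product

def box_behnken_3(n_center=3):
    """Return the three-factor Box-Behnken runs in coded units."""
    runs = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0, 0, 0]          # third factor stays at its center
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs.extend([(0, 0, 0)] * n_center)
    return runs

design = box_behnken_3()
corners = [run for run in design if all(abs(v) == 1 for v in run)]
levels = sorted({v for run in design for v in run})
print(len(design), len(corners), levels)  # 15 0 [-1, 0, 1]
```

The empty `corners` list shows why prediction is weaker at the corners of the cube: no run ever sets all three factors to an extreme at once.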

Optimal Design

• Customize for fitting a linear, quadratic or cubic model (Note: In Design-Expert® software you can change the user preferences to get up to a 6th order model.)
• Produce many levels when augmented as suggested by Design-Expert, but these can be limited by choosing the discrete factor option
• Design points are positioned mathematically according to the number of factors and the desired model. The points are therefore not at any specific positions; they are simply spread out in the design space to meet the optimality criteria, particularly when using the coordinate exchange algorithm.
• The default optimality criterion, “I”, chooses points to minimize the integral of the prediction variance across the design space, thus providing good response estimation throughout the experimental region.
• Other comments: If you have knowledge of the subject matter, you can edit the desired model by removing the terms that you know are not significant or can't exist. This will decrease the required number of runs. You can also add constraints to your design space, for instance to exclude particular factor combinations that must be avoided, e.g., high temperature combined with a long cooking time.
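
The I-criterion mentioned in the list above is usually written as an average prediction variance. The form below is the standard textbook definition, not an expression taken from the Design-Expert documentation. Here R is the design region, X is the model matrix for the candidate design, and f(x) is the vector of model terms evaluated at a point x (all symbols introduced for illustration).

```latex
I = \frac{1}{\int_R d\mathbf{x}}
    \int_R \mathbf{f}(\mathbf{x})^{\mathsf T}
    \left(\mathbf{X}^{\mathsf T}\mathbf{X}\right)^{-1}
    \mathbf{f}(\mathbf{x}) \, d\mathbf{x}
```

An I-optimal design picks the runs (the rows of X) that make this average as small as possible, in units of the error variance.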

For an in-depth exploration of both factorial and response surface methods, attend Stat-Ease’s Modern DOE for Process Optimization workshop.

Shari Kraber

## Tips and Tricks for Navigating Design-Expert® Software

posted by Shari on June 15, 2017

We’ve designed Design-Expert® software to be flexible and user-friendly. For those of you who haven’t had a chance to fully explore its capabilities, here are some tips to help you navigate the software and find options that are useful for you:

• The all-new Design Wizard asks you a series of questions, and then directs you to a starting design! You may want to modify things from here, but it’s a great starting point.
• To access guidance specific to the screen that you are on (Screen Tips), click on the light bulb on the toolbar. The question mark brings up more general help.
• Use the Tab key when entering factor information. The tab will flow left to right across the factor information for the first factor, and then move to the second factor. No mouse is needed!
• You can change factor names or levels by right-clicking on either a factor or response column header and choosing Edit Info. This is a convenient way of editing design information rather than rebuilding the design.
• Insert or Delete a response (or factor) by right-clicking on a response or factor column header.
• On the Design Layout view (spreadsheet) right-click on the gray square to the left of any row to access features like Inserting/Deleting/Duplicating rows, or changing the Row Status to Verification/Highlight/Ignore.
• On a graph, you can change axis settings, number formats, graph features, colors, and more, by right-clicking on the graph and choosing Graph Preferences. On a contour plot, you can set the increments of the contour lines to specific values.

## Don’t Let R² Fool You

posted by Pat on Dec. 13, 2016

Has a low R² ever disappointed you during the analysis of your experimental results? Is this really the kiss of death? Is all lost? Let’s examine R² as it relates to factorial design of experiments (DOE) and find out.

R² is calculated from the change in the response (Δy) relative to the total variation of the response (Δy + σ) over the range of the independent factor. Schematically, it compares the signal with the signal plus noise: Δy / (Δy + σ).

Let’s look at an example. Response y is dependent on factor x in a linear fashion: y = β0 + β1·x + ε, where ε is random noise with standard deviation σ.

We run a DOE using levels x1 and x2 in Figure 1 (below) to estimate beta1 (β1). Having the independent factor levels far apart generates a large signal-to-noise ratio (Δ12) and it is relatively easy to estimate β1. Because the signal (Δy) is large relative to the noise (σ), R² approaches one.

What if we had run a DOE using levels x3 and x4 in Figure 1 to estimate β1? Having the independent factor levels closer together generates a smaller signal-to-noise ratio (Δ34) and it is more difficult to estimate β1. We can overcome this difficulty by running more replicates of the experiments. If enough replicates are run, β1 can be estimated with the same precision as in the first DOE using levels x1 and x2. But, because the signal (Δy) is smaller relative to the noise (σ), R² will be smaller, no matter how many replicates are run!

In factorial design of experiments our goal is to identify the active factors and measure their effects. Experiments can be designed with replication so active factors can be found even in the absence of a huge signal-to-noise ratio. Power allows us to determine how many replicates are needed. The delta (Δ) and sigma (σ) used in the power calculation also give us an estimate of the expected R² (see the formula above). In many real DOEs we intentionally limit a factor’s range to avoid problems. Success is measured with the ANOVA (analysis of variance) and the t-tests on the model coefficients. A significant p-value indicates an active factor and a reasonable estimate of its effects. A significant p-value, along with a low R², may mean that the experiment was designed properly, rather than a problem!
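
The argument can be checked with a little arithmetic. The sketch below assumes a balanced two-level design (half the runs at each level) and uses the large-sample variance ratio for R², which is one standard way to make the schematic signal-versus-noise idea precise and is not necessarily the exact formula from the original figure. The slope, noise, ranges, and run counts are hypothetical.

```python
import math

def expected_r2(delta_y, sigma):
    """Large-sample R² for a balanced two-level design.

    Half the runs sit at each level, so the model explains a variance of
    (delta_y / 2) ** 2 and the noise adds sigma ** 2.
    """
    signal_var = (delta_y / 2) ** 2
    return signal_var / (signal_var + sigma ** 2)

def slope_std_error(delta_x, sigma, n_runs):
    """Standard error of the slope, n_runs split evenly between two levels."""
    return 2 * sigma / (delta_x * math.sqrt(n_runs))

beta1, sigma = 2.0, 1.0                    # hypothetical true slope and noise
wide = {"delta_x": 4.0, "n_runs": 4}
narrow = {"delta_x": 1.0, "n_runs": 64}    # 16x the runs for 1/4 the range

for name, d in (("wide", wide), ("narrow", narrow)):
    r2 = expected_r2(beta1 * d["delta_x"], sigma)
    se = slope_std_error(d["delta_x"], sigma, d["n_runs"])
    print(f"{name}: expected R² = {r2:.2f}, SE(slope) = {se:.2f}")
# wide: expected R² = 0.94, SE(slope) = 0.25
# narrow: expected R² = 0.50, SE(slope) = 0.25
```

Both designs estimate the slope equally well, yet the narrow-range design is stuck with a much lower R², because the run count never enters the expected R² at all.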

R² is an interesting statistic, but not of primary importance in factorial DOE. Don’t be fooled by R²!

## DOE Simplified, 3rd Edition Now Available

posted by Heidi on July 2, 2015

The third edition of DOE Simplified: Practical Tools for Effective Experimentation is now available. This comprehensive introductory text is geared towards readers with a minimal statistical background. In it, the authors take a fresh and lively approach to learning the fundamentals of experiment design and analysis. This edition includes a major revision of the software that accompanies the book (via trial download) and sets the stage for introducing experiment designs where the randomization of one or more hard-to-change factors can be restricted. It also includes a new chapter on split plots and adds coverage of a number of recent developments in the design and analysis of experiments.

P.S. There are still some copies of DOE Simplified, 2nd Edition available on clearance if you would like to learn the fundamentals of DOE while saving money.