Interval Estimates

Interval estimates provide more information about the precision of the analysis. The lower and upper bounds indicate how well the true trends in the response surface have been modeled. Intervals can be shown on model graphs and are included in the post-analysis reports.

By default, all intervals use a confidence level (1-α) of 95%. The default proportion of individuals for tolerance intervals (P) is 99%. These values can be changed under the Edit menu, Preferences, Math, Math Analysis section.

Confidence Interval

The “confidence” in “confidence level” is the percentage of intervals, generated from repeated samples (or designs) drawn from the same population, that contain the true population quantity being bounded, such as the mean or individual outcomes.
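The repeated-sampling meaning of the confidence level can be illustrated with a small simulation. The following is a minimal sketch, assuming a hypothetical normal population (mean 50, standard deviation 2) and a plain t-interval for a sample mean rather than a response-surface model; all names and values are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)      # fixed seed so the sketch is repeatable
true_mean, true_sd = 50.0, 2.0      # hypothetical population (assumption)
n_obs, n_repeats, alpha = 10, 2000, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, n_obs - 1)
hits = 0
for _ in range(n_repeats):
    sample = rng.normal(true_mean, true_sd, size=n_obs)
    half_width = t_crit * sample.std(ddof=1) / np.sqrt(n_obs)
    # Count the interval if it captures the true population mean.
    if abs(sample.mean() - true_mean) <= half_width:
        hits += 1

coverage = hits / n_repeats
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to the 95% confidence level, although any single run of the simulation will differ slightly from it.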

The confidence interval (CI) bounds the true mean (average) of the population. It is part of the Point Prediction output.

Prediction Interval

The prediction interval (PI) bounds the average of a future sample of a given size. If the sample size is one, then the sample can be thought of as the next observation. It is part of the Confirmation output.

The prediction interval will be wider than the confidence interval because it must include the unknown variation of a sample. The prediction interval is for the average of a small sample whereas the confidence interval is for the average of the population (an infinite sample).

Tolerance Interval

The tolerance interval (TI) contains at least a given proportion (P) of all individual outcomes.

The tolerance interval is the widest because it attempts to account for the variability observed in the experiment and the unobserved variation from all individual outcomes drawn from the same population.


Math Details

(1-α)*100% Confidence Interval

\(\hat{y}\pm t_{(1-\frac{\alpha}{2}, n-p)}\cdot SE\)

where the standard error of the design at \(x_0\) is,

\(SE = s \cdot \sqrt{x_0 (X^T X)^{-1}x_0^T}\)
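As an illustration of the confidence interval formula, here is a minimal sketch using NumPy and SciPy. The one-factor design, the response values, and the prediction point are made up for the example, and an ordinary least-squares fit is assumed.

```python
import numpy as np
from scipy import stats

# Hypothetical one-factor design in coded units: columns are intercept and A.
X = np.array([[1, -1], [1, -1], [1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
y = np.array([8.1, 7.9, 10.2, 9.8, 12.1, 11.9])   # made-up responses
x0 = np.array([1.0, 0.5])                          # expanded point vector
alpha = 0.05

n, p = X.shape
xtx_inv = np.linalg.inv(X.T @ X)
beta = xtx_inv @ X.T @ y                       # least-squares coefficients
residuals = y - X @ beta
s = np.sqrt(residuals @ residuals / (n - p))   # estimated standard deviation

y_hat = x0 @ beta
se = s * np.sqrt(x0 @ xtx_inv @ x0)            # SE = s * sqrt(x0 (X'X)^-1 x0')
t_crit = stats.t.ppf(1 - alpha / 2, n - p)     # two-sided critical value
ci_low, ci_high = y_hat - t_crit * se, y_hat + t_crit * se
print(f"prediction {y_hat:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
```

Because `x0` is stored as a one-dimensional array, `x0 @ xtx_inv @ x0` evaluates the quadratic form \(x_0 (X^T X)^{-1}x_0^T\) directly as a scalar.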

(1-α)*100% Prediction Interval

\(\hat{y}\pm t_{(1-\frac{\alpha}{2},n-p)}\cdot SE_{pred}\)

where the prediction error for one future response measurement at point \(x_0\) is,

\(SE_{pred} = s \cdot \sqrt{1 + x_0(X^T X)^{-1}x_0^T}\)

These are the prediction intervals available on the graphs. See the Confirmation node entry for an example of how this formula changes for multiple confirmation observations at \(x_0\).
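To compare the two interval widths side by side, the sketch below computes both half-widths for one future observation. It assumes an ordinary least-squares fit; the helper name `interval_half_widths` and the data are hypothetical.

```python
import numpy as np
from scipy import stats

def interval_half_widths(X, y, x0, alpha=0.05):
    """Return (y_hat, CI half-width, PI half-width) at the point vector x0."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (n - p))       # estimated standard deviation
    leverage = x0 @ xtx_inv @ x0               # x0 (X'X)^-1 x0'
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    ci_half = t_crit * s * np.sqrt(leverage)
    pi_half = t_crit * s * np.sqrt(1 + leverage)   # extra 1 for the new observation
    return x0 @ beta, ci_half, pi_half

# Made-up one-factor design (intercept, A) and responses.
X = np.array([[1, -1], [1, -1], [1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
y = np.array([8.1, 7.9, 10.2, 9.8, 12.1, 11.9])
y_hat, ci_half, pi_half = interval_half_widths(X, y, np.array([1.0, 0.5]))
print(f"CI half-width {ci_half:.3f}, PI half-width {pi_half:.3f}")
```

The only difference between the two half-widths is the added 1 under the square root, which is why the prediction interval is always the wider of the two.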

(1-α)*100% Tolerance Interval for P * 100% of the population

\(\hat{y}\pm s \cdot (\mathrm{TI~Multiplier})\)

where the multiplier is,

\(\mathrm{TI~Multiplier} = t_{(1-\alpha, n-p)} \cdot \sqrt{x_0 (X^TX)^{-1}x_0^T} + \Phi^{-1}\left(0.5 + P/2\right) \cdot\sqrt{\frac{n-p}{\chi^2_{(\alpha, n-p)}}}\)

The TI uses only \(\scriptstyle \alpha\) rather than \(\scriptstyle \alpha/2\) to compute the two-tailed interval.
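The tolerance interval multiplier can be sketched in the same way. This example assumes that the subscripts on \(t\) and \(\chi^2\) are lower-tail (cumulative) probabilities, so \(\chi^2_{(\alpha, n-p)}\) is taken as the lower α quantile; the function name and the input values are hypothetical.

```python
import numpy as np
from scipy import stats

def ti_multiplier(leverage, df, alpha=0.05, P=0.99):
    """TI multiplier for leverage = x0 (X'X)^-1 x0' and df = n - p.

    Assumes quantile subscripts are lower-tail probabilities.
    """
    # Uncertainty in the predicted mean (alpha, not alpha/2, as noted above).
    mean_part = stats.t.ppf(1 - alpha, df) * np.sqrt(leverage)
    # Spread of individual outcomes, inflated for the uncertainty in s.
    spread_part = stats.norm.ppf(0.5 + P / 2) * np.sqrt(df / stats.chi2.ppf(alpha, df))
    return mean_part + spread_part

# Illustrative inputs: leverage 11/48, 4 residual df, s = sqrt(0.03).
multiplier = ti_multiplier(leverage=11 / 48, df=4)
s = np.sqrt(0.03)
ti_half = s * multiplier
print(f"TI multiplier {multiplier:.3f}, TI half-width {ti_half:.3f}")
```

With so few residual degrees of freedom, the chi-square term inflates the multiplier considerably, which is consistent with the tolerance interval being the widest of the three.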

Symbols:

\(\hat{y}\) predicted value at \(x_0\)
\(s\) estimated standard deviation
\(t\) Student’s t critical value
\(\alpha\) acceptable type I error rate (1 - confidence level)
\(n\) number of runs in the design
\(p\) number of terms in the model, including the intercept
\(P\) proportion of the population contained in the tolerance interval
\(X\) expanded model matrix *
\(x_0\) expanded point vector
\(\Phi^{-1}\) inverse of the standard normal cumulative distribution function, used to convert the proportion to a normal score
\(\chi^2\) chi-square critical value
n-p is also the residual degrees of freedom (df) from the ANOVA.
The superscript T indicates the previous matrix is transposed.
The superscript -1 indicates the previous matrix is inverted.
* The expanded model matrix (X) has one row for each run in the design and one column for each term in the model. The values in the X matrix are assumed to be coded values. The first column is typically all 1s to represent the intercept term of the model. For mixture designs there is no intercept term in the Scheffé polynomial, so this column is not present.

The expanded point vector is a way to represent the settings of the factors for a particular location for purposes of prediction. Think of it as a matrix with one row. It has a similar structure to a row of the expanded model matrix; one element for each term in the model. The order of the terms represented by the model matrix’s columns and the point vector’s elements must match.
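The relationship between the expanded model matrix and the expanded point vector can be sketched as follows. The example assumes a hypothetical two-factor model with terms intercept, A, B, and AB in coded units; the helper `expand` is illustrative only.

```python
import numpy as np

def expand(a, b):
    """Expand coded factor settings into model terms: intercept, A, B, AB."""
    return [1.0, a, b, a * b]

# Hypothetical design: a 2x2 factorial plus one center point, in coded units.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0)]

# One row per run, one column per model term.
X = np.array([expand(a, b) for a, b in runs])

# The point vector uses the same expansion, so its term order matches X's columns.
x0 = np.array(expand(0.5, -1.0))

print(X)
print(x0)
```

Building both objects with the same expansion function is one simple way to guarantee that the order of the terms matches.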

The residual (error) degrees of freedom for a split-plot design are estimated using the procedure outlined by Kenward and Roger. This value is shown on the point prediction output.

References

  • DeGryze, Langhans, and Vandebroek. Using the correct intervals for prediction: a tutorial on tolerance intervals of ordinary least-squares regression. Chemometrics and Intelligent Laboratory Systems, 87(2):147–154, 2007.

  • Hahn and Meeker. Statistical Intervals, A Guide for Practitioners. 1991.

  • Kenward and Roger. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3):983–997, 1997.