The DOE FAQ Alert Vol. 9 No. 2

Issue: Volume 9, Number 2
Date: February 2009
From: Mark J. Anderson, Stat-Ease, Inc., Statistics Made Easy® Blog

Dear Experimenter,

Here's another set of frequently asked questions (FAQs) about doing design of experiments (DOE), plus alerts to timely information and free software updates. If you missed the previous DOE FAQ Alert, see below.

==> Tip: Get immediate answers to questions about DOE via the Search feature on the main menu of the Stat-Ease® web site. This pores over not only previous alerts, but also the wealth of technical publications posted throughout the site.

Feel free to forward this newsletter to your colleagues. They can subscribe by going to http://www.statease.com/doealertreg.html. If this newsletter prompts you to ask your own questions about DOE, please address them via e-mail to: [email protected].

For an assortment of appetizers to get this Alert off to a good start, see these new blogs at http://statsmadeeasy.net* (beginning with the most recent one):

— Decimal place makes all the difference
— Paperwork reduction?
— Which of these is the winter weather outlier?**
— The MAD statistics for overkill
— Number smiths gain top three spots for having the best occupations

* Need a feed from StatsMadeEasy to Microsoft's Outlook? See http://office.microsoft.com/en-us/outlook/HA101595391033.aspx.
** (See the comment to this blog and follow its link to an amazing collection of stats and graphs on record temps across the USA)

Also, Stat-Ease offers an interactive web site — its Support Forum for Experiment Design at http://forum.statease.com. Whereas this monthly e-zine shares one-on-one communications with Stat-Ease StatHelp, anyone can post questions and answers to the Forum, which is open for everyone to see (with moderation). Check it out and weigh in!

Topics in the body text of this DOE FAQ Alert are headlined below (the "Expert" ones, if any, delve into statistical details).

1. FAQ: Why is it a bad idea to simply re-measure a response, but then treat it as a replicate run?
2. Expert FAQ: How do you compute the externally Studentized residual?
3. Book Giveaway: A range of DOE texts from simple to sublime
4. Info Alert: DOE leads to 65% increase in product yield
5. Reader Response: Continued debate on reducing mixture models
6. Events Alert: 2nd notice for talk on — "Friend or Foe? How to Use Graphical Diagnostics for Scoping Out Discrepant Data"
7. Workshop Alert: Learn response surface methods (RSM) for process optimization to gain a vital edge in performance

P.S. Quote for the month: A political science professor on the statistical significance of the recount for Minnesota's Senate race between Coleman and Franken.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. FAQ: Why is it a bad idea to simply re-measure a response, but then treat it as a replicate run?

-----Original Question-----
From: A process industry researcher well versed on DOE
"Have you covered in a back issue of this publication the problems with running one experiment, analyzing it three times for a response, and then entering these three values back into a DOE program as if one experiment had been run three independent times and the three measurements are the responses? If so, please let me know the reference; I'm having trouble convincing a colleague that this method will understate the variance & artificially inflate effects.

Naturally, when my colleague duplicates a run line and adds the additional responses (really just measuring technique precision), what was previously an insignificant effect now becomes significant, making his case. Thus, he cannot be convinced of the problems with this. You must have seen this type of behavior before — proof is the result he wants so it must be right."

Answer (from Consultant Wayne Adams):
"The best proof material we have can be found in the postings of our free webinars, in particular one that I presented last May: http://www.statease.com/webinars/08-May_Replicates_vs_Repeats.pdf.

Proper replication requires that the same factor combination be run at different times using the same set-up procedure. From what you describe that did not happen. Instead, your colleague simply took repeat measurements on a given run. This only takes the measurement variation into account — missing variations due to the set-up and process itself. In other words, the two runs can be significantly different because of set-up and process variation — not a true factor effect."

PS. I agree wholeheartedly that only re-measuring is a bad idea that creates false positive effects due to its underestimation of overall error from one experimental setup to another. For example, I once did a fun experiment on paper flyers in which the design called for replicates at random intervals. It was tempting just to re-fly the same aircraft. However, I am not very good at cutting and folding (flunked those lessons in kindergarten!) so I knew I'd better re-cut and re-fold an 'identical' flyer so my ineptitude in manufacturing would be accounted for.
— Mark

A further suggestion from Consultant Pat Whitcomb:

"Re-measuring and putting in the replicate measures as replicate runs can bias the regression coefficients. Each run varies about its 'true' value: sometimes high and sometimes low, more often near the 'true' value and less often farther away. Replicating the run gives a distribution of observations that are unbiased and are centered on the 'true' value. Replicating only the measurement means that any offset due to process setup, charging, operation, etc. is fixed high or low and not averaged out; therefore this can introduce bias in the estimated coefficients."
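A small simulation can help make this case to a skeptical colleague. The sketch below is only an illustration under stated assumptions, not material from the webinar: it supposes one two-level factor with no true effect at all, a set-up standard deviation of 1.0, a measurement standard deviation of 0.2, and three observations per level. Every name and number in it is hypothetical. It counts how often a pooled two-sample t-test declares a "significant" effect when the three observations are true replicates versus repeat measurements of a single run.

```python
import random
import statistics

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (3 observations at each of 2 levels).
T_CRIT_DF4 = 2.776


def t_statistic(low, high):
    """Pooled two-sample t statistic for the difference high minus low."""
    n_low, n_high = len(low), len(high)
    pooled_var = (
        (n_low - 1) * statistics.variance(low)
        + (n_high - 1) * statistics.variance(high)
    ) / (n_low + n_high - 2)
    std_err = (pooled_var * (1 / n_low + 1 / n_high)) ** 0.5
    return (statistics.mean(high) - statistics.mean(low)) / std_err


def one_run(rng, setup_sd, meas_sd, n_meas):
    """Set up the process once, then measure that single run n_meas times."""
    run_value = rng.gauss(0.0, setup_sd)  # set-up and process variation
    return [run_value + rng.gauss(0.0, meas_sd) for _ in range(n_meas)]


def false_positive_rates(n_sim=2000, setup_sd=1.0, meas_sd=0.2, seed=1):
    """Return (rate with true replicates, rate with repeats); no real effect exists."""
    rng = random.Random(seed)
    hits_replicates = 0
    hits_repeats = 0
    for _ in range(n_sim):
        # True replicates: three separate set-ups per level, one measurement each.
        low = [one_run(rng, setup_sd, meas_sd, 1)[0] for _ in range(3)]
        high = [one_run(rng, setup_sd, meas_sd, 1)[0] for _ in range(3)]
        if abs(t_statistic(low, high)) > T_CRIT_DF4:
            hits_replicates += 1
        # Repeats: one set-up per level, measured three times.
        low = one_run(rng, setup_sd, meas_sd, 3)
        high = one_run(rng, setup_sd, meas_sd, 3)
        if abs(t_statistic(low, high)) > T_CRIT_DF4:
            hits_repeats += 1
    return hits_replicates / n_sim, hits_repeats / n_sim


if __name__ == "__main__":
    replicate_rate, repeat_rate = false_positive_rates()
    print("False-positive rate, true replicates:", replicate_rate)
    print("False-positive rate, repeats entered as replicates:", repeat_rate)
```

Under these assumed settings the true replicates should flag a phantom effect at roughly the nominal 5% rate, whereas the repeats flag one far more often, because their error estimate sees only the measurement noise.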

(Learn more about replication by attending the three-day computer-intensive workshop "Experiment Design Made Easy." See http://www.statease.com/clas_edme.html for a description of this class and link from this page to the course outline and schedule. Then, if you like, enroll online.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. Expert FAQ: How do you compute the externally Studentized residual?

-----Original Question-----
From: French semiconductor engineer
"I do not really understand how the externally Studentized residual is computed. In the Design-Expert® software Help, I read:
>Externally Studentized Residual (Outlier-t value, R-Student):
Calculated by leaving the run in question out of the analysis and estimating the response from the remaining runs...The t value is the number of standard deviations difference between this predicted value and the actual response. This tests whether the run in question follows the model with coefficients estimated from the rest of the runs, that is, whether this run is consistent with the rest of the data for this model. Runs with large t values should be investigated.<

The formula provided along with this text includes a term "s-i": How is this computed and what does it mean? How is it possible that the externally Studentized residual is often larger (in absolute value) than the one that is 'internally' Studentized?"

Answer (from Consultant Wayne Adams):
"The s-i is the square root of the MSE for the model fit without the deleted run. Externally Studentized residuals (ESR) can be larger or smaller than internally Studentized residuals (ISR). ESR can be larger if the deleted point is on a slightly different curve than the rest of the observations. ESR can become smaller if the removal of the run causes the overall noise estimate to increase. In other words, if a point was well represented by the complete data set, removing it and refitting the model can actually increase the estimated error, which shrinks that statistic."
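For readers who want the algebra, here is the standard textbook form of these two residuals. The Help formula itself is not reproduced in this Alert, so the symbols are assumptions: e_i is the ordinary residual for run i, h_ii its leverage, s the root MSE from all n runs, p the number of model coefficients, r_i the internally Studentized residual and t_i the externally Studentized one.

```latex
r_i = \frac{e_i}{s\sqrt{1 - h_{ii}}},
\qquad
t_i = \frac{e_i}{s_{-i}\sqrt{1 - h_{ii}}},
\qquad
s_{-i}^2 = \frac{(n - p)\,s^2 - \dfrac{e_i^2}{1 - h_{ii}}}{n - p - 1}

t_i = r_i \sqrt{\frac{n - p - 1}{n - p - r_i^2}}
```

The last line shows when each case arises: |t_i| exceeds |r_i| exactly when |r_i| is greater than 1, and falls below it when |r_i| is less than 1.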

PS. It helps me to read the "-i" subscript as "minus one individual point." I like this 'deletion diagnostic' very much because it makes so much sense: if you think a given experimental run went awry (for example, due to a mechanical breakdown), then take it out before assessing it as a possible outlier.
— Mark
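To see the 'deletion' at work, the following minimal sketch literally removes each run in turn, refits, and compares the left-out response with its prediction. It is not Design-Expert's code: it assumes a straight-line model (two coefficients), and the eight data points are made up, with the sixth run deliberately set too high.

```python
import math


def fit_line(xs, ys):
    """Least-squares line; returns intercept, slope, MSE, mean of x, and Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse / (n - 2), x_bar, sxx


def internally_studentized(xs, ys):
    """Residual divided by its standard error, using s from ALL the runs."""
    b0, b1, mse, x_bar, sxx = fit_line(xs, ys)
    n = len(xs)
    result = []
    for x, y in zip(xs, ys):
        leverage = 1 / n + (x - x_bar) ** 2 / sxx
        result.append((y - (b0 + b1 * x)) / math.sqrt(mse * (1 - leverage)))
    return result


def externally_studentized(xs, ys):
    """Leave each run out, refit, and scale its prediction error by s-i."""
    result = []
    for i, (x_i, y_i) in enumerate(zip(xs, ys)):
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        b0, b1, mse_rest, x_bar, sxx = fit_line(xs_rest, ys_rest)
        s_minus_i = math.sqrt(mse_rest)  # the "s-i" term
        # Variance factor for predicting a new observation at x_i.
        pred_factor = 1 + 1 / len(xs_rest) + (x_i - x_bar) ** 2 / sxx
        result.append((y_i - (b0 + b1 * x_i)) / (s_minus_i * math.sqrt(pred_factor)))
    return result


# Hypothetical data: roughly y = 2x, except the run at x = 6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0, 14.2, 15.9]

isr = internally_studentized(xs, ys)
esr = externally_studentized(xs, ys)

if __name__ == "__main__":
    for x, r, t in zip(xs, isr, esr):
        print(f"x={x:.0f}  internal={r:8.3f}  external={t:8.3f}")
```

For the suspect run the external value comes out much larger than the internal one, because the remaining seven runs fit the line tightly and s-i is small. For well-behaved runs the external value is slightly smaller.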

PPS. Under the topic "Externally Studentized Residuals (Outlier t)", I see this very practical Help for users of Design-Expert or Design-Ease® software:
>This plot helps the experimenter detect outliers in the data. Look for points that are outside the red lines. These are data points that are not fit well by the current model. Either the value is wrong or the model is wrong. If there is such a point, click on it to determine which point it is. Check the value that was typed into the Design Layout screen - typos are the most common cause of outliers! If the point was typed in correctly, investigate whether a special cause can be assigned to that point. Did something unusual happen during that run? Also, check your model - could a transformation help the analysis by fitting all the data points better? Finally, does the presence of the outlier affect the conclusions? If not, then it doesn't make sense to spend time on the issue.

TIP: Either draw a window around the point or right-click and choose Highlight Point. This will highlight the run in the Design Layout screen.<
I've found this last feature very useful for pin-pointing a suspicious run.
— Mark

A further suggestion from Consultant Pat Whitcomb:

"Use the DFFITS (the difference in fitted values with and without the run in question) to judge the effect of the suspected outlier."
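As a rough illustration of what DFFITS measures, the sketch below refits a straight line without each run and scales the resulting shift in that run's fitted value by s-i times the square root of its leverage, which is the usual textbook definition. The data are hypothetical (the run at x = 6 is deliberately off) and the helper names are made up for this example; it is not how Design-Expert computes the statistic internally.

```python
import math


def fit_line(xs, ys):
    """Least-squares line; returns intercept, slope, and MSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse / (n - 2)


def dffits(xs, ys):
    """Scaled change in each fitted value when that run is left out."""
    n = len(xs)
    b0, b1, _ = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    values = []
    for i in range(n):
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        c0, c1, mse_rest = fit_line(xs_rest, ys_rest)
        leverage = 1 / n + (xs[i] - x_bar) ** 2 / sxx
        # Fitted value with the run included minus fitted value without it.
        shift = (b0 + b1 * xs[i]) - (c0 + c1 * xs[i])
        values.append(shift / math.sqrt(mse_rest * leverage))
    return values


# Hypothetical data: roughly y = 2x, except the run at x = 6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0, 14.2, 15.9]
influence = dffits(xs, ys)

if __name__ == "__main__":
    for x, d in zip(xs, influence):
        print(f"x={x:.0f}  DFFITS={d:7.3f}")
```

A large absolute value means the run is pulling its own prediction a long way, so leaving it in or out would change the conclusions.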

(Learn more about diagnostics by attending the three-day computer-intensive workshop "Response Surface Methods for Process Optimization." For a complete description of this class, see http://www.statease.com/clas_rsm.html. Link from this page to the course outline and schedule. Then, if you like, enroll online.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3. Book Giveaway: A range of DOE texts from simple to sublime

(Sorry, due to the high cost of shipping, this offer applies only to residents of the United States and Canada.) Simply reply to this e-mail by February 13 if you'd like one of these blemished, used or surplus books on design of experiments (free!):

— Second edition of "DOE Simplified" by Anderson & Whitcomb. (This is a return due to getting bent out of shape when shipped. The paperback book is not as pretty as original but OK for reading. However, the less flexible CD-ROM did not survive the experience!)

— Second edition of "Statistics for Experimenters" by Box, Hunter & Hunter. (This is surplus — never used and in mint condition.)

— Second edition of "Response Surface Methodology" by Myers & Montgomery. (I replaced this book, still in good condition, with the newer edition — just out this year.*)

— "RSM Simplified" by Anderson & Whitcomb. (This is a return due to getting wrinkled a bit when shipped. The CD-ROM of Design-Expert V6 software, a 180-day limited but fully functional version, remains intact.)

I will forward your e-mail entries to my assistant Karen. Do not expect to hear from either of us unless your name is drawn as a winner. However, we do appreciate your participation in these giveaways. Watch for more of these in future DOE FAQ Alerts. Your odds of winning a free book increase by entering each time around!

Reminder: If you reside outside the US or Canada, you are NOT eligible for the drawing because it costs too much to ship the books.

PS. The two-part series of "Simplified" books on DOE and RSM can be ordered online from http://www.statease.com/prodbook.html. There you will also find available the new edition (3rd) of "Response Surface Methodology" by Myers, Montgomery and Anderson-Cook.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4. Info Alert: DOE leads to 65% increase in product yield

A "Quality" magazine web exclusive, posted December 30, details how a Willamette Valley Company (WVCO) chemist designed a two-level factorial experiment that revealed substantial interactions in their polyurethane process. Knowing this, WVCO implemented changes that increased first-pass yields 65 percent and overall plant yields by 20 percent. For details and the inspirational story, see http://preview.tinyurl.com/7achjf.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

5. Reader Response: Continued debate on reducing mixture models

-----Original Submission-----
From: Greg Piepel, MIXSOFT, Richland, WA
"After focusing heavily on mixture experiment methods for over 30 years, I thought I'd contribute to the discussion on using variable selection methods with mixture experiment models. This topic was addressed in the December 2008 FAQ #2, and Norman Draper provided his viewpoint in the January 2009 FAQ #4. Prof. Draper said he thought "applying a selection procedure to a mixture model was a doubtful technique, because of the dependency between variables." He concluded by saying "in general it is better to retain the model that has all the mixture terms of a specific order, e.g. first or second." I think this last sentence goes too far. My viewpoint is more along the lines of Pat Whitcomb's reply in the December issue. He began "As the models get larger there is more of a chance that there are significant terms hidden in the next higher order model." I concur. Over the years I have addressed mixture experiments having up to 22 components. Often there are some significant quadratic terms that correspond to subject-area knowledge. However, as Pat noted, the majority of quadratic terms do not correspond to significant effects.

Also, as the number of components increases beyond a certain point, it becomes infeasible to design an experiment large enough to fit a full quadratic mixture model (QMM). Even if there were enough data to fit the full QMM when there is a larger number of components, the collinearity could be so large as to cause problems. Hence, there is no question in my mind that it can be very useful to apply variable selection methods to identify statistically significant quadratic terms (especially as the number of mixture components increases).

My recommendation is to use the partial quadratic mixture (PQM) modeling approach discussed in the article "Augmenting Scheffe Linear Mixture Models with Squared and/or Cross product Terms", which appeared in the July 2002 issue of the "Journal of Quality Technology." Wendell Smith's book "Experimental Design for Formulation" also discusses and illustrates the PQM modeling approach for several examples. This approach has the advantage of considering model forms equivalent to the one obtained by variable selection, which addresses one of the concerns Prof. Draper had."
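To put a number on the model-size point, here is the standard Scheffé quadratic mixture model for q components and its term count. The notation is an assumption, since none appears in the submission, and the figure for 22 components is simple arithmetic applied to the largest case mentioned above.

```latex
E(y) = \sum_{i=1}^{q} \beta_i x_i + \sum_{i<j} \beta_{ij}\, x_i x_j,
\qquad \sum_{i=1}^{q} x_i = 1

\text{number of terms} = q + \frac{q(q-1)}{2} = \frac{q(q+1)}{2},
\qquad q = 22:\; 22 + 231 = 253
```

An experiment would need at least as many distinct blends as terms just to estimate the full model, before allowing any runs for estimating error.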

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

6. Events Alert: 2nd notice for talk on "Friend or Foe? How to Use Graphical Diagnostics for Scoping Out Discrepant Data"

I will be exhibiting for Stat-Ease at the 2009 ASQ Lean Six Sigma Conference in Phoenix on March 2-3 and giving a talk titled "Friend or Foe? How to Use Graphical Diagnostics for Scoping Out Discrepant Data." See my abstract and learning outcomes at http://www.asq.org/conferences/six-sigma/program/session-g3.html. To sign up for this conference by the American Society for Quality, go to http://www.asq.org/conferences/six-sigma/.

Click http://www.statease.com/events.html for a list of upcoming appearances by Stat-Ease professionals. We hope to see you sometime in the near future!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

7. Workshop Alert: Learn response surface methods (RSM) for process optimization to gain a vital edge in performance

Seats are filling fast for the following DOE classes. If possible, enroll at least 4 weeks prior to the date so your place can be assured. However, do not hesitate to ask whether seats remain in classes that are fast approaching!

—> Experiment Design Made Easy (EDME)
(Detailed at http://www.statease.com/clas_edme.html)
> February 24-26 (Minneapolis, MN)

—> Mixture Design for Optimal Formulations (MIX)
(http://www.statease.com/clas_mix.html)
> April 28-30 (Minneapolis)
> June 23-25 (Edison, NJ)

—> Response Surface Methods for Process Optimization (RSM)
(http://www.statease.com/clas_rsm.html)
> March 10-12 (Minneapolis)

—> DOE for DFSS: Variation by Design (DDFSS)
(http://www.statease.com/clas_ddfss.html)
> May 5-6 (Minneapolis)

—> Designed Experiments for Life Sciences (DELS)
(http://www.statease.com/clas_dels.html)
> July 28-29 (Minneapolis)

See http://www.statease.com/clas_pub.html for complete schedule and site information on all Stat-Ease workshops open to the public. To enroll, click the "register online" link on our web site or call Elicia at 612.746.2038. If spots remain available, bring along several colleagues and take advantage of quantity discounts in tuition. Or, consider bringing in an expert from Stat-Ease to teach a private class at your site.*

*Once you achieve a critical mass of about 6 students, it becomes very economical to sponsor a private workshop, which is most convenient and effective for your staff. For a quote, e-mail [email protected].

I hope you learned something from this issue. Address your general questions and comments to me at: [email protected].

PLEASE DO NOT SEND ME REQUESTS TO SUBSCRIBE OR UNSUBSCRIBE — FOLLOW THE INSTRUCTIONS AT THE END OF THIS MESSAGE.

Sincerely,

Mark

Mark J. Anderson, PE, CQE
Principal, Stat-Ease, Inc. (http://www.statease.com)
2021 East Hennepin Avenue, Suite 480
Minneapolis, Minnesota 55413 USA

PS. Quote for the month — A political science professor on the statistical significance of the recount for Minnesota's Senate race between Coleman and Franken:

"The margin of error that Minnesota's election system provides is simply larger than that margin of victory in the 2008 Minnesota Senate race. That means the winner of the race is certainly the product of chance error."

—Steven E. Schier, Professor of Political Science, Carleton College. (Note that Senator Coleman won the seat of Paul Wellstone, a former professor of political science at Carleton College who tragically died in a plane crash just prior to the last election. The current election now favors Franken by ~200 votes — less than 0.0075% of the over 2.8 million votes cast!).

Trademarks: Stat-Ease, Design-Ease, Design-Expert and Statistics Made Easy are registered trademarks of Stat-Ease, Inc.

Acknowledgements to contributors:
—Students of Stat-Ease training and users of Stat-Ease software
—Stat-Ease consultants Pat Whitcomb, Shari Kraber and Wayne Adams (see http://www.statease.com/consult.html for resumes)
—Statistical advisor to Stat-Ease: Dr. Gary Oehlert (http://www.statease.com/garyoehl.html)
—Stat-Ease programmers, especially Tryg Helseth and Neal Vaughn (http://www.statease.com/pgmstaff.html)
—Heidi Hansel Wolfe, Stat-Ease sales and marketing director, and all the remaining staff

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Interested in previous DOE FAQ Alert e-mail newsletters?
To view a past issue, choose it below.

#1 Mar 01, #2 Apr 01, #3 May 01, #4 Jun 01, #5 Jul 01, #6 Aug 01, #7 Sep 01, #8 Oct 01, #9 Nov 01, #10 Dec 01, #2-1 Jan 02, #2-2 Feb 02, #2-3 Mar 02, #2-4 Apr 02, #2-5 May 02, #2-6 Jun 02, #2-7 Jul 02, #2-8 Aug 02, #2-9 Sep 02, #2-10 Oct 02, #2-11 Nov 02, #2-12 Dec 02, #3-1 Jan 03, #3-2 Feb 03, #3-3 Mar 03, #3-4 Apr 03, #3-5 May 03, #3-6 Jun 03, #3-7 Jul 03, #3-8 Aug 03, #3-9 Sep 03, #3-10 Oct 03, #3-11 Nov 03, #3-12 Dec 03, #4-1 Jan 04, #4-2 Feb 04, #4-3 Mar 04, #4-4 Apr 04, #4-5 May 04, #4-6 Jun 04, #4-7 Jul 04, #4-8 Aug 04, #4-9 Sep 04, #4-10 Oct 04, #4-11 Nov 04, #4-12 Dec 04, #5-1 Jan 05, #5-2 Feb 05, #5-3 Mar 05, #5-4 Apr 05, #5-5 May 05, #5-6 Jun 05, #5-7 Jul 05, #5-8 Aug 05, #5-9 Sep 05, #5-10 Oct 05, #5-11 Nov 05, #5-12 Dec 05, #6-01 Jan 06, #6-02 Feb 06, #6-03 Mar 06, #6-4 Apr 06, #6-5 May 06, #6-6 Jun 06, #6-7 Jul 06, #6-8 Aug 06, #6-9 Sep 06, #6-10 Oct 06, #6-11 Nov 06, #6-12 Dec 06, #7-1 Jan 07, #7-2 Feb 07, #7-3 Mar 07, #7-4 Apr 07, #7-5 May 07, #7-6 Jun 07, #7-7 Jul 07, #7-8 Aug 07, #7-9 Sep 07, #7-10 Oct 07, #7-11 Nov 07, #7-12 Dec 07, #8-1 Jan 08, #8-2 Feb 08, #8-3 Mar 08, #8-4 Apr 08, #8-5 May 08, #8-6 Jun 08, #8-7 Jul 08, #8-8 Aug 08, #8-9 Sep 08, #8-10 Oct 08, #8-11 Nov 08, #8-12 Dec 08, #9-01 Jan 09, #9-02 Feb 09 (see above)


Stat-Ease, Inc.
2021 E. Hennepin Avenue, Suite 480
Minneapolis, MN 55413-2726
e-mail: info@statease.com
p: 612.378.9449, f: 612.746.2069