Stat-Ease
 
Vol: 12 | No: 5 | Sep/Oct'12
The DOE FAQ Alert
     
 

Heads-up (below!)
Everything you ever wanted to know about sums of squares but were afraid to ask.

Dear Experimenter,
Here’s another set of frequently asked questions (FAQs) about doing design of experiments (DOE), plus alerts to timely information and free software updates. If you missed previous DOE FAQ Alerts, click here.

TIP: Get immediate answers to questions about DOE via the Search feature on the main menu of the Stat-Ease® web site. This pores not only over previous alerts, but also the wealth of technical publications posted throughout the site.

Feel free to forward this newsletter to your colleagues. They can subscribe by going to this registration page.

Also, Stat-Ease offers an interactive web site—The Support Forum for Experiment Design. Anyone (after gaining approval for registration) can post questions and answers to the Forum, which is open for all to see (with moderation). Furthermore, the Forum provides program help for Design-Ease® and Design-Expert® software. Check it out and search for answers. If you come up empty, do not be shy: Ask your question! Also, this being a forum, we encourage you to weigh in with answers! The following Support Forum topic provides a sampling of threads that developed since my last Alert:

  • Area: Design Selection, Topic: “Repeats and degrees of freedom”, Question: “The help file in Design-Expert says that repeat measurements for a run should be processed outside of DX, and the average response used in the program. If I understand this correctly, I should be able to treat standard deviation this way as well, but how can I do this and avoid losing the degrees of freedom that are gained by running repeats?”

To open yet another avenue of communications with fellow DOE aficionados, sign up for The Stat-Ease Professional Network on LinkedIn and start or participate in discussions with other software users. A recent thread features “Have you ever used a split-plot design?” (a poll).

 
Stats Made Easy Blog

StatsMadeEasy offers wry comments weekly from an engineer with a bent for experimentation and statistics. Simply enter your e-mail in the forwarding field at www.StatsMadeEasy.net and get new StatsMadeEasy entries delivered directly to your inbox. Or, click this link to:

Subscribe with Feedburner

“Your StatsMadeEasy blogs brighten up a dreary work day...”
—Applied Statistician, Florida

Topics discussed since the last issue of the DOE FAQ Alert (latest one first):

Also see the new comments about my 7/29/12 blog on “Polysci prof asks ‘Is Algebra Necessary?’” Please do not be shy about adding your take on any news or views you see in StatsMadeEasy. Thanks for paying attention.

 

 
 


If this newsletter prompts you to ask your own questions about DOE, please address them via e-mail to: [email protected].



 
Topics in the body text of this DOE FAQ Alert are headlined below (the expert ones, if any, delve into statistical details):

1:  Newsletter alert: September issue of the Stat-Teaser reveals how to make tastier instant noodles
2:  FAQ*: Why don't the factors' sums of squares (SS) always add up exactly to the model SS in the analysis of variance (ANOVA), as they do in other software?
*(Answer appended with expert-level commentary.)
3:  Expert FAQ: Mixture design with component(s) going to zero for which one can choose differing categorical types
4:  Book giveaway: Winners announced!
5:  Webinar alert: Learn some tricks of the trade from Real-life DOE
6:  Events alert: Short course on “DOE Tools to Combine Mixture and Process Variables”
7:  Workshop alert: See when and where to learn about DOE
 
 


PS. Quote for the month: The value of jargon




1: Newsletter alert: September issue of the Stat-Teaser reveals how to make tastier instant noodles

Many of you have received (or soon will) a printed copy of the latest Stat-Teaser, but others, by choice or because you reside outside of North America, will get your only view of the September issue at this link. It features a report by Stat-Ease Consultant Brooks Henderson on results from an experiment done by him and his colleagues from our programming team on instant noodles—a staple for lunch here at our office.  See the results of their matchup of brands and flavors made with varying amounts of water over a range of cooking times.  The results turned out to be a bit counter-intuitive.

This Stat-Teaser also provides an educational article by Consultant Shari Kraber on getting a good start on a designed experiment.

Thank you for reading our newsletter.  If you get the hard copy, but find it just as convenient to read what we post to the Internet, consider contacting us to be taken off our mailing list, thus conserving resources.  (Note: You will be notified via the DOE FAQ Alert on new newsletter posts.)  In any case, we appreciate you passing along hard copies and/or the link to the posting of the Stat-Teaser to your colleagues.




2: FAQ: Why don't the factors' sums of squares (SS) always add up exactly to the model SS in the analysis of variance (ANOVA), as they do in other software?

Original Question:

From a Biostatistician:
“I would like to report an issue with Design-Expert® software regarding the ANOVA table: From time to time the sum of the factors SS is not equal to the model SS.  I have compared specific ANOVA tables with ones obtained with the same dataset analyzed by another statistical program in which the model SS is perfectly equal to the sum of the factors SS.  Could you please explain this difference?”

Answer:

From Stat-Ease Consultant Shari Kraber:

The difference is in the type of sum of squares calculation that is used.  For numeric factors, Design-Expert defaults to partial SS (also known as Type III) whereas your other statistical program defaults to sequential SS (Type I).  This is all detailed in our program Help under Contents, Analysis of Variance Details, Sum of Squares.

Screen Shot of Program Help on Sum of Squares (only the beginning part shown)

You may change the default in Design-Expert via Edit, Preferences, Math. In the example you provided, if you change to sequential SS, the output from Design-Expert will then match the calculations done in your other program.

Consultant Wayne Adams adds:
“When a design is balanced and orthogonal, Type I = Type II* = Type III SS. This is why the Type III sums sometimes add up to the model SS. But as soon as the design becomes non-orthogonal, the sums of squares no longer match.

Type III is preferred for testing whether adding a term significantly improves the model fit when fitting a response surface model. If any of the factors in the model are numeric, this is the assumed goal of the analysis.

Follow this link to a more complete reference on SS Types.  Although it refers to another statistical package (SAS), we follow the same protocol.

*PS. When the factors in the model are all categoric, Type II SS is used to test whether the means of different treatment combinations are significantly different.  Main effects are tested assuming interactions will not be significant, then the interactions are tested anyway.  If all the factors in the model are two-level factors, then Type III is used; in this case a main effect is completely a linear effect.”
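
The point about orthogonality can be checked numerically. The following is a minimal sketch in Python with NumPy, not Design-Expert's own computation: the response values are made up for illustration, and the helper names (`sse`, `ss_tables`) are hypothetical. It fits a main-effects model to a replicated two-level factorial, then drops one run so that the design is no longer orthogonal.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def ss_tables(cols, y):
    """Sequential (Type I) SS, partial (Type III) SS, and model SS.

    cols maps term name -> coded column, in the order terms enter the model.
    An intercept is always included.
    """
    ones = np.ones((len(y), 1))
    names = list(cols)
    full = np.column_stack([ones] + [cols[k] for k in names])
    sse_full = sse(full, y)
    model_ss = sse(ones, y) - sse_full

    seq, part = {}, {}
    X = ones
    for k in names:
        X_next = np.column_stack([X, cols[k]])
        seq[k] = sse(X, y) - sse(X_next, y)      # improvement when k enters
        X = X_next
        rest = np.column_stack([ones] + [cols[j] for j in names if j != k])
        part[k] = sse(rest, y) - sse_full        # k tested as the last term in
    return seq, part, model_ss

# Replicated 2x2 factorial in coded units, with made-up responses.
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10, 14, 11, 19, 9, 15, 12, 20], dtype=float)

seq_bal, part_bal, model_bal = ss_tables({"A": A, "B": B}, y)

# Drop the last run: A and B are now correlated (non-orthogonal).
seq_unb, part_unb, model_unb = ss_tables({"A": A[:-1], "B": B[:-1]}, y[:-1])

for label, seq, part, model in [
    ("all 8 runs", seq_bal, part_bal, model_bal),
    ("7 runs", seq_unb, part_unb, model_unb),
]:
    print(label,
          "| sequential sum:", round(sum(seq.values()), 3),
          "| partial sum:", round(sum(part.values()), 3),
          "| model SS:", round(model, 3))
```

With all eight runs, both sums equal the model SS. With seven runs, only the sequential sum does, which is the behavior the biostatistician reported.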


(Learn more about ANOVA by attending the two-day computer-intensive workshop Experiment Design Made Easy.  Click on the title for a description of this class and link from this page to the course outline and schedule.  Then, if you like, enroll online.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here’s a detailed contribution on this issue from Professor Gary Oehlert.  I moved his conclusion to the front for those, like me, who may reach a point part way through where the details become convincing.  I recommend you read Chapter 10, Section 1 in Dr. Oehlert’s book A First Course in Design and Analysis of Experiments, which is now available online free of charge at this site.  Also, if you send me a request, I will email you a white paper by our statistical intern Martin Bezener on “Computing Sum of Squares in Unbalanced 2^2 Factorial Design.”  It provides a concise explanation of Type I, II and III approaches to the analysis of variance (ANOVA).

Summary
“By default Design-Expert is doing Type II for categoricals and Type III for continuous predictors.  These tests do not depend on the order in which the variables were entered or the (unknown to the software at this stage) weights that a user might want for the categoricals.  If we had a meaningful sequence for the continuous terms, then Type I for continuous would save us a little time, but the only place I can think of doing that automatically is for polynomials in a single variable.”

Discussion

“There have been arguments about this for decades, although I think that much of it has settled down, or maybe folks just got tired of it.

Type I is sequential sums of squares. Type I sums of squares are "improvement" sums of squares calculated by looking at the decrease in error SS when you add each additional term into the model sequentially.  The problem with Type I sums of squares is that they depend on the sequence of the terms in your model.  If you put in A then B, or you put in B then A, then you will get different sums of squares and thus different tests.  Order matters unless everything is completely orthogonal. In this sense you cannot talk about the Type I SS or test, you can only talk about the Type I SS or test for this particular sequence of model terms.  Change the order and you change the SS and the tests. So while the SS for the Type I decomposition add up to the model SS (a nice enough property), the allocation depends on the sequence (not so nice if you only have one sequence and it's not the right sequence).
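
(An illustrative aside, not part of Professor Oehlert's text.) The order dependence described above is easy to see with two correlated predictors. This is a minimal sketch assuming made-up data and hypothetical helper names (`sse`, `sequential_ss`); it fits the same two-term model in both orders.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sequential_ss(terms, y):
    """Type I SS for (name, column) pairs, in the order given."""
    X = np.ones((len(y), 1))          # start from the intercept-only model
    out = {}
    for name, col in terms:
        X_next = np.column_stack([X, col])
        out[name] = sse(X, y) - sse(X_next, y)   # drop in error SS
        X = X_next
    return out

# Made-up data in which x1 and x2 are strongly correlated.
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 8], dtype=float)
y = np.array([3, 4, 8, 7, 12, 15], dtype=float)

x1_first = sequential_ss([("x1", x1), ("x2", x2)], y)
x2_first = sequential_ss([("x2", x2), ("x1", x1)], y)

print("x1 then x2:", {k: round(v, 2) for k, v in x1_first.items()})
print("x2 then x1:", {k: round(v, 2) for k, v in x2_first.items()})
```

Both sequences add up to the same model SS, but whichever predictor enters first receives most of the credit.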

The tests you can do with Type I are to test the last one in; if that is not needed, then you can test the next to last one in, and so on. Type I will not let you test anything earlier in the model than the last ("rightmost") needed term. In regression settings where the order of terms is well defined (for example, polynomial regression on a single predictor), Type I is very useful.  However, if there is no natural order (e.g., unrelated predictors or polynomials on multiple predictors), then Type III lets you test each variable as if it were the last one in (i.e., the first one that Type I would test).  To get the same from Type I would require lots of refitting with different sequences.  To be fair, if you drop a term with Type III, you should really refit the model without it as well.

The sequential nature is also true for Type I with categorical factors.  However, the situation with Type III versus Type I versus Type II is even more complicated in this case. Type III seems like the obvious thing to do in all situations.  When you look at what is actually tested (is beta_j = 0, or are all of gamma_1, gamma_2, ..., gamma_k equal to zero), it seems to make sense. But it's not that easy.

Here is the issue for me: with categorical factors, the coefficients (the treatment effects) are not well defined. There are infinitely many equally valid sets of coefficients. By convention, we usually use the set that arises from equally weighted averages; for example, if I want a row effect, I take equally weighted averages across columns to get the row effects. However, there is nothing to say that the convention is correct for your problem. If your columns represent different kinds of machines, and in your factory you have twice as many of machine types 1 and 2 as you do of types 3 and 4, then you may well want an unequally weighted average across machines when you go to thinking about what the row effects are. Different weightings lead to different sets of coefficients; coefficients can be zero in one weighting and nonzero in another weighting, all for the same set of data.
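
(An illustrative aside, not part of Professor Oehlert's text.) The machine example can be made concrete with a small table of cell means. The numbers below are hypothetical, chosen so that the row difference vanishes under equal weighting but not under the 2:2:1:1 weighting that matches the factory's mix of machines; `row_means` is a hypothetical helper.

```python
import numpy as np

# Hypothetical cell means: rows are two treatments, columns are machine types 1-4.
cell_means = np.array([
    [10.0, 10.0, 10.0, 10.0],
    [12.0, 12.0,  8.0,  8.0],
])

def row_means(cell_means, weights):
    """Average each row across columns using the given column weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # normalize so the weights sum to 1
    return cell_means @ w

equal = row_means(cell_means, [1, 1, 1, 1])      # the usual convention
factory = row_means(cell_means, [2, 2, 1, 1])    # twice as many type 1 and 2 machines

print("equal weights, row 2 minus row 1:  ", equal[1] - equal[0])
print("factory weights, row 2 minus row 1:", factory[1] - factory[0])
```

The same cell means give a row difference of zero under one weighting and a nonzero difference under the other, so the hypothesis "no row effect" depends on which weighting is meant.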

Type III assumes that the conventional assumption of equal weights in all averaging situations is correct in every problem.  Using Type III means that you believe in equal weighting.  A Type III test is a comparison of the full model with a (possibly non-hierarchical) model that does not contain the term of interest.

If there is no interaction between A and B, then the weighting across the A levels doesn't matter when comparing levels of B, and the weighting across the B levels does not matter when comparing levels of factor A.  More generally, if we compare a small hierarchical model to a larger hierarchical model that includes it, then ANOVA is testing the correct hypotheses regardless of the weighting that is appropriate for the problem.  Type II is a set of tests that compares nested hierarchical models, so Type II does not depend on the weighting.  Another way of thinking about this is Type II is less dependent on the parameterization.  In addition, Type II does not depend on the order of the terms in the model, just what terms are in the model.

If you really, honest to goodness, believe that equal weighting is the right thing for your problem, then Type III is OK, even for categorical factors.  Otherwise, I go Type II for categorical factors.

Type II SS can be computed from Type I SS, but you need to use a bunch of different sequences of terms. For example, the Type I SS for C in the model A + B + AB + C + AC + BC + ABC is also the Type II SS for C. (Similarly for BC and ABC.)  To get the Type II SS for A, you could use Type I SS from the model B + C + BC + A (it only depends on what comes before, not on what may come after).  In the end, Type II does not depend on the order, because the software is going to do a bunch of different Type I sequences internally and the user never sees it.”
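
The recipe in the last paragraph can be sketched for the simpler two-factor case (A, B, and AB). This is a minimal illustration in Python with NumPy under stated assumptions: the data are made up, the layout is an unbalanced 2x2, the 0/1 coding and the helper names (`sse`, `sequential_ss`) are hypothetical choices, and it is not a description of how Design-Expert computes its tables.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sequential_ss(blocks, order, y):
    """Type I SS for the named terms, entered in the given order."""
    X = np.ones((len(y), 1))
    out = {}
    for name in order:
        X_next = np.column_stack([X, blocks[name]])
        out[name] = sse(X, y) - sse(X_next, y)
        X = X_next
    return out

# Unbalanced 2x2 layout with made-up responses; cell counts are 3, 1, 2, 4.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)  # 1 = second level of A
b = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1, 1], dtype=float)  # 1 = second level of B
y = np.array([12, 14, 13, 20, 15, 17, 25, 27, 24, 26], dtype=float)
blocks = {"A": a, "B": b, "AB": a * b}

a_first = sequential_ss(blocks, ["A", "B", "AB"], y)
b_first = sequential_ss(blocks, ["B", "A", "AB"], y)

# Type II: each main effect adjusted for the other main effect,
# and the interaction adjusted for both main effects.
type2 = {"A": b_first["A"], "B": a_first["B"], "AB": a_first["AB"]}

print("Type I, A first:", {k: round(v, 3) for k, v in a_first.items()})
print("Type I, B first:", {k: round(v, 3) for k, v in b_first.items()})
print("Type II:        ", {k: round(v, 3) for k, v in type2.items()})
```

Each Type II entry is picked from whichever sequence places that term after the terms that do not contain it, which is why the result no longer depends on the order in which the user listed the terms.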




3: Expert FAQ: Mixture design with component(s) going to zero for which one can choose differing categorical types

Original Question:

From a Senior Statistician:
“I have a question that I’ve had to deal with over the years and have never figured it out to my satisfaction.  I’ve always been able to practically interpret results and get direction, but this time it’s causing me angst.  Here is my dilemma:

As an example, assume a design with one numerical and one categorical factor as follows:

  1. Ingredient Level in formulation (representing a class of ingredients):  ranging from 0 – X%
  2. Ingredient Type: A vs B

At the 0% level of Type A and Type B, the composition is the same; however, when it is entered as a response surface, you have to “label it” as either having A or B even though it has neither.  When I run the experiment, I get a different response for the 0% level of A, and 0% level of B, but they really represent variability (replicates) of the composition with either A or B.  Usually, this works out where indeed 0% of A and 0% of B give the same general response, but in my latest experiment, this was not the case and the model predicts a big difference between 0% A and 0% B.

Is there a better way to analyze this, or some tricks to use?  I suppose this is best done as a mixture with a categorical variable for ingredient A type.”

Answer:

From Stat-Ease Consultant Pat Whitcomb:
“My presentation in September to the European Network for Business and Industrial Statistics (ENBIS) on “Categoric Mixture Components Proportion Going to Zero” deals with this question very precisely.  View the slides here.”

(Learn more about advanced mixture design and analysis by attending the two-day computer-intensive workshop Advanced Formulations: Combining Mixture & Process Variables.  Click on the title for a description of this class and link from this page to the course outline and schedule.  Then, if you like, enroll online.)




4: Book giveaway: Winners announced!

Our latest book giveaway offered up:

  • Two worn, but serviceable, Design and Analysis of Experiments: 4th Edition texts by Doug Montgomery—these provide all the basics of DOE
  • A slightly used DOE Simplified book, signed by the authors (me and Pat), from exhibits at technical conferences
  • A new RSM Simplified book, signed by the two of us, from the latest printing by CRC Press (Taylor & Francis Group)

These lucky readers were selected at random from dozens of entrants*:

  • Katrina Labude, Director, Technology & Quality, ConocoPhillips, Ponca City, OK
  • Alex Nunez, Senior Process Improvement Engineer, Weber Metals, Los Angeles, CA
  • Dusty Vaughn, Project Engineer, Aerospace Testing Alliance, Arnold AFB, TN
  • Joe Mulligan, Senior Mechanical Engineer, G5 Engineering Solutions, Tallahassee, FL

Congratulations to these winners and condolences to the others who entered into this contest.  Keep watching for more great books to be given away in the future.

*(Due to the high cost of shipping, this drawing was offered only to residents of the United States and Canada.)




5: Webinar alert: Learn some tricks of the trade from Real-life DOE

On Wednesday, October 17 at 8 AM CDT* I will reveal some tricks of our trade for Real-Life DOE.  If you are just beginning with factorial screening and characterization experiments, this webinar is for you!  More advanced practitioners might glean an “aha!” as well, and/or follow up afterward with suggestions that I can share via the DOE FAQ Alert.  I will reprise this webinar on Wednesday, November 14 at 12 PM (noon).

Stat-Ease webinars vary somewhat in length depending on the presenter and the particular session—mainly due to breaks for questions: Plan for 45 minutes to 1.5 hours, with 1 hour being the target median.  When developing these one-hour educational sessions, our presenters often draw valuable material from Stat-Ease DOE workshops.

Attendance may be limited, so sign up soon by contacting our Communications Specialist, Karen Dulski, via [email protected].  If you can be accommodated, she will provide immediate confirmation and, in timely fashion, the link with instructions from our web-conferencing vendor GoToWebinar.

*(To determine the time in your zone of the world, try using this link.  We are based in Minneapolis, which appears on the city list that you must manipulate to calculate the time correctly.)




6: Events alert: Short course on “DOE Tools to Combine Mixture and Process Variables”

(Last notice) Stat-Ease Consultant Pat Whitcomb will present a one-day short course on “DOE Tools to Combine Mixture and Process Variables” on Saturday, October 6, following the 2012 Fall Technical Conference (FTC) in Saint Louis, Missouri.  Follow this link to all the details, including the time when Pat will present an encore on “How to Design Experiments when Categoric Mixture Components Go to Zero” to this American audience.

Click here for a list of upcoming appearances by Stat-Ease professionals.  We hope to see you sometime in the near future!

PS.  Do you need a speaker on DOE for a learning session within your company or technical society at regional, national, or even international levels?  If so, contact me.  It may not cost you anything if Stat-Ease has a consultant close by, or if a web conference will be suitable.  However, for presentations involving travel, we appreciate reimbursement for travel expenses.  In any case, it never hurts to ask Stat-Ease for a speaker on this topic.




7: Workshop alert: See when and where to learn about DOE

Seats are filling fast for the following DOE classes.  If possible, enroll at least 4 weeks prior to the date so your place can be assured.  However, do not hesitate to ask whether seats remain on classes that are fast approaching!  Also, take advantage of a $395 discount when you take two complementary workshops that are offered on consecutive days.

All classes listed below will be held at the Stat-Ease training center in Minneapolis unless otherwise noted.

* Take both EDME/RSM or EDME/MIX workshops in the same week to earn $395 off the combined tuition!

** Take both MIX and MIX2 in the same week to earn $395 off the combined tuition!

See this web page for complete schedule and site information on all Stat-Ease workshops open to the public.  To enroll, click the "register online" link on our web site or call Elicia at 612-746-2038.  If spots remain available, bring along several colleagues and take advantage of quantity discounts in tuition.  Or, consider bringing in an expert from Stat-Ease to teach a private class at your site.***

***Once you achieve a critical mass of about 6 students, it becomes very economical to sponsor a private workshop, which is most convenient and effective for your staff.  For a quote, e-mail [email protected].




Please do not send me requests to subscribe or unsubscribe—follow the instructions at the very end of this message.
I hope you learned something from this issue. Address your general questions and comments to me at: [email protected].

Sincerely,

Mark

Mark J. Anderson, PE, CQE
Principal, Stat-Ease, Inc.
2021 East Hennepin Avenue, Suite 480
Minneapolis, Minnesota 55413 USA


PS. Quote for the month—the value of jargon:


"Like other occult techniques of divination, the statistical method has a private jargon deliberately contrived to obscure its methods from non-practitioners."

—G. O. Ashley

Trademarks: Stat-Ease, Design-Ease, Design-Expert and Statistics Made Easy are registered trademarks of Stat-Ease, Inc.

Acknowledgements to contributors:
—Students of Stat-Ease training and users of Stat-Ease software
—Stat-Ease consultants Pat Whitcomb, Shari Kraber, Wayne Adams and Brooks Henderson
—Statistical advisor to Stat-Ease: Dr. Gary Oehlert
—Stat-Ease programmers led by Neal Vaughn
—Heidi Hansel Wolfe, Stat-Ease marketing director, Karen Dulski, and all the remaining staff that provide such supreme support!

For breaking news from Stat-Ease go to this Twitter site.

DOE FAQ Alert ©2012 Stat-Ease, Inc.
Circulation: Over 6100 worldwide
All rights reserved.

 