DOE Wizard - Screening Designs

STATGRAPHICS – Rev. 7/16/2009
Summary
The Experimental Design section of STATGRAPHICS contains a set of procedures that support
the design and analysis of many different types of experiments. These procedures enable the
analyst to construct a set of experimental runs that will yield the maximum amount of
information about a process in the smallest number of trials. In contrast to haphazard
experimentation, designed experiments are characterized by systematic manipulation of a
process in order to determine the effects attributable to different factors.
In the early stages of an investigation, the analyst is often faced with a long list of factors that
could affect the process. For example, in a typical chemical process, there could easily be
dozens of factors which have an impact on the yield of the process, such as the temperature at
which it is run, the amount of catalyst added, the mixing speed, and so on. Since it is difficult to
study many factors in detail simultaneously, screening designs have been developed to quickly
determine which factors have the greatest impact on a process.
This document describes the construction and analysis of designs that are intended to identify the
most important factors. After the critical factors are determined, a more complicated
experimental design involving a larger set of factor levels may be constructed to find the optimal
settings for those factors.
Example
As an example, a typical screening experiment will be constructed involving 5 factors and 1
response. The example, which involves a chemical reaction, is discussed in Chapter 12 of the
well-known book by Box, Hunter and Hunter (1976). The factors that will be varied are:
X1: feed rate

X2: amount of catalyst
X3: agitation rate
X4: temperature
X5: concentration
There is one response variable:
Y: percent reacted
Sample StatFolio: doewiz screening.sgp
© 2009 by StatPoint Technologies, Inc.

Design Creation
To begin the design creation process, start with an empty StatFolio. Select DOE – Experimental
Design Wizard to load the DOE Wizard’s main window. Then push each button in sequence to
create the design.
Step #1 – Define Responses
The first step of the design creation process displays a dialog box used to specify the response
variables. For the current example, there is a single response variable:
 Name: The name for the variable is reacted.
 Units: Reacted is measured as a percentage.
 Analyze: The parameter of interest is the mean percent reacted.
 Goal: The goal of the experiment is to maximize the reacted percentage.
 Impact: The relative importance of each response (not relevant if only one response).
 Sensitivity: The importance of being close to the best desired value (in this case, the
Maximum). Setting Sensitivity to Medium implies that the desirability attributed to the
response rises linearly between the Minimum and Maximum values indicated.
 Minimum and Maximum: Range of desirable values for the response.
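The linear rise in desirability described for Medium sensitivity can be sketched as follows. This is only an illustration of a linear ramp between the Minimum and Maximum values for a "maximize" goal; the function name and the limits of 50 and 100 are hypothetical and are not taken from the dialog box or from STATGRAPHICS itself.

```python
def linear_desirability(y, minimum, maximum):
    """Desirability for a 'maximize' goal: 0 at or below minimum,
    1 at or above maximum, and a straight line in between."""
    if maximum <= minimum:
        raise ValueError("maximum must exceed minimum")
    fraction = (y - minimum) / (maximum - minimum)
    return min(1.0, max(0.0, fraction))


# Hypothetical limits, chosen only for illustration.
for y in (40.0, 75.0, 100.0):
    print(y, linear_desirability(y, minimum=50.0, maximum=100.0))
```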

Step #2 – Define Experimental Factors
The second step displays a dialog box on which to specify the factors that will be varied. In the
chemical reaction example, there are 5 factors:
 Name – Each factor must be assigned a unique name.
 Units – Units are optional.
 Type – Set the type of each factor to Continuous, since each can be set at any value
within a continuous interval.
 Role – Set the role of each factor to Controllable.
 Low - the lower level Lj for the factor.
 High - the upper level Uj for the factor.
Step #3 – Select Design

The third step begins by displaying the dialog box shown below:

Since all of the factors are controllable process factors, only one Options button is enabled.
Pressing that button displays a second dialog box:
Four general classes of designs are offered:
1. Screening - designs intended to select the most important factors affecting a response.
Most of the designs involve only 2 levels of each factor. The factors may be quantitative
or categorical.
2. Response Surface - designs intended to select the optimal settings of a set of experimental
factors. The designs involve at least 3 levels of the experimental factors, which must be
quantitative.

3. Multilevel Factorial - designs involving different numbers of levels for each
experimental factor. The factors must be quantitative.
4. Orthogonal Array – a general class of designs developed by Genichi Taguchi. The
factors may be quantitative or categorical.
If Screening Designs is selected, a third dialog box will be displayed listing all of the screening
designs available for 5 experimental factors:
 Name - the design name, including an abbreviation such as 2^5 if relevant. For screening
designs, the following types may appear in the list, depending upon the number of
experimental factors:
1. Factorial - includes runs at all combinations of the low and high levels of each factor,
for a total of 2^k runs. Such designs are capable of estimating the main effects of all
factors and all interactions amongst the factors.
2. Factorial in m blocks - includes the same runs as a full factorial design. However, the
runs are divided into blocks, which are groups of runs to be done together (on the
same day, or by the same operator, or from the same batch of raw material) to
eliminate the effect of one or more nuisance factors. As the number of blocks
increases, the ability to estimate certain interactions is lost. The Alias Structure table
displayed after the design is initially created shows which interactions are confounded
with block effects.
3. Half fraction (or quarter, eighth, …) - a subset of the runs in a full factorial design,
either one-half of the full 2^k runs, one-fourth of the runs, one-eighth of the runs, or
some other regular fraction. The number of runs in the design equals 2^(k-p), where p =
1 for a half-fraction, p = 2 for a quarter fraction, p = 3 for an eighth fraction, etc. For
such designs, the Resolution field indicates important information about what order of
interactions may be estimated by such a design, as described below. As with blocked
factorials, the Alias Structure table shows the confounding pattern of the design.
4. Irregular fraction - fractional factorial designs in which the number of runs is not a
power of 2. Certain irregular fractions, although not completely orthogonal, have
attractive confounding patterns. The designs included here are those described by
Haaland (1989).
5. Mixed level fraction - in contrast to all of the other screening designs, these designs
allow one factor (factor A) to be run at 3 levels rather than 2. This allows a quadratic
effect to be estimated for that factor, which must be quantitative. For the other
factors, the runs form a standard fractional factorial design. The designs included are
those described by Haaland (1989).
6. Plackett-Burman - two-level designs intended for screening a large number of factors
in a small number of runs, where the number of runs is not a power of 2. For
example, a design is available for studying 11 factors in 12 runs. Main effects are
confounded with two-factor interactions, so the design should only be used when
interactions are either not present or known to be small.
7. Folded Plackett-Burman - similar to Plackett-Burman designs, except that the two-factor
interactions are not confounded with main effects. However, two-factor interactions
are badly confounded amongst themselves and cannot be resolved.
8. User-specified design - allows an empty experiment datasheet to be constructed so
that the analyst can enter his or her own runs. This allows the analysis procedures to
be executed using a design created elsewhere. The user should be careful, when
creating such a design, to enter the proper low and high values for each factor on the
earlier dialog boxes, since the manner in which the effects are defined during the
analysis depends on these low and high settings.
 Runs - the number of runs in the base design, before adding any additional replicates or
centerpoints.
 Resolution - an indication of the confounding pattern of the design. Designs are classified as
having one of the following resolutions:
Resolution III: designs which confound the estimates of the main effects with two-factor
interactions. Such designs can be safely interpreted only if all two-factor interactions are
small or non-existent.
Resolution IV: designs which are capable of obtaining clear estimates of all main effects.
However, some or all of the two-factor interactions are confounded with other two-factor
interactions or block effects. The Alias Structure table described below indicates where
the confounding occurs.
Resolution V: designs which are capable of obtaining clear estimates of all main effects
and all two-factor interactions. Higher order interactions, however, are confounded with
these effects. In most cases, this is not a problem since third-order and higher effects are
usually assumed to be small or non-existent. Resolution V designs are typically excellent
selections.
Resolution V+: the design has resolution greater than 5, allowing for the estimation of 3-
factor or higher order interactions if desired.
For blocked designs, an asterisk is shown next to the design resolution to indicate that the
stated resolution assumes that blocking factors do not interact with experimental factors, the
standard assumption when the analysis is performed.
 Error d.f. - the number of degrees of freedom from which the experimental error may be
estimated remaining after estimating all main effects, second-order interactions, and
quadratic effects (if relevant). This is prior to any replication or addition of centerpoints. In
general, at least 3 d.f. must be available if the statistical tests to be performed during the
analysis are to have reasonable statistical power.
 Block Size - for a design that is run in more than 1 block, the number of runs in the largest
block.
For the current example, a 16 run half-fraction will be selected. This design is resolution V,
which means it is capable of estimating all main effects and two-factor interactions. However,
there are 0 degrees of freedom remaining to estimate the experimental error. In order to do
formal statistical testing, additional runs will need to be added to the base design.
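The run counts and the zero error degrees of freedom quoted above follow from simple counting. The sketch below is an illustration only; the helper names are hypothetical, and it assumes the model contains the mean, all main effects, and all two-factor interactions, with no quadratic terms.

```python
from math import comb


def base_runs(k, p=0):
    """Runs in a 2^(k-p) design: p = 0 is the full factorial, p = 1 a half fraction."""
    return 2 ** (k - p)


def error_df(runs, k):
    """Degrees of freedom left after the mean, k main effects,
    and all k-choose-2 two-factor interactions."""
    return runs - 1 - k - comb(k, 2)


k = 5
half = base_runs(k, p=1)
print(base_runs(k), half)        # 32 16
print(error_df(half, k))         # 0: nothing left to estimate experimental error
print(error_df(half + 3, k))     # 3: each added centerpoint contributes one d.f.
```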
The final dialog box allows the analyst to add additional runs to the design and to specify the
order in which the runs will be performed:
 Centerpoints (number) - the number of centerpoints to be added to the base design, which
are additional experimental runs located at a point midway between the low and high level of
all the factors. Each additional centerpoint adds one degree of freedom from which to
estimate experimental error. If the design involves a single categorical factor, the
centerpoints will be placed at a middle level of the quantitative factors and divided equally
between the two levels of the categorical factor.
 Centerpoints (placement) - positioning of the centerpoints with respect to the runs in the
base design. They may be randomly scattered throughout the other experimental runs,
spaced evenly throughout the other runs, or placed at the beginning or end of the experiment.
The first two options are usually preferable.
 Replicate design - if a number other than 0 is entered, the entire design will be repeated the
indicated number of times.
 Randomize - check this box to randomly order the runs in the experiment. Randomization is
generally a good idea, since it can reduce the effect of lurking variables such as trends over
time. However, when replicating the examples in this documentation, do not randomize the
designs.
 Generate Button - this button displays a dialog box that allows experienced analysts to
change the design generators for fractional factorial designs:
In order to generate a 2^(k-p) fractional factorial design, a full factorial design for k - p factors is
first generated. Columns for the additional p factors are then created by multiplying together
various combinations of columns of the initial factorial. In the current example, the column
for factor E is created by multiplying column A by column B by column C by column D.
This is abbreviated as
E = ABCD (1)
Alternatively, column E could have been created by multiplying together the same four
columns and then changing the sign, i.e.,
E = -ABCD (2)
which would result in a different set of 16 runs. For details of these procedures, see Box,
Hunter and Hunter (1976).
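The generator construction can be sketched in a few lines. This is a minimal illustration in coded units (-1 for low, +1 for high), not the procedure STATGRAPHICS uses internally; the function names are hypothetical. The alias helper assumes the defining relation I = ABCDE that follows from E = ABCD and ignores signs.

```python
from itertools import product


def half_fraction(sign=+1):
    """Return the 16 coded runs (A, B, C, D, E) of a 2^(5-1) design
    with E = sign * ABCD."""
    runs = []
    # product() varies its last element fastest, so unpack in reverse
    # order to make A change fastest (standard order).
    for d, c, b, a in product((-1, +1), repeat=4):
        runs.append((a, b, c, d, sign * a * b * c * d))
    return runs


def alias(effect, word="ABCDE"):
    """Alias of an effect under I = word: letters appearing twice cancel."""
    return "".join(sorted(set(effect) ^ set(word)))


design = half_fraction(+1)        # generator (1): E = ABCD
alternate = half_fraction(-1)     # generator (2): E = -ABCD
print(design[0])                  # (-1, -1, -1, -1, 1)
print(len(set(design) & set(alternate)))   # 0: the two fractions share no runs
print(alias("A"), alias("AB"))    # BCDE CDE
```

Because main effects are aliased only with four-factor interactions and two-factor interactions only with three-factor interactions, this fraction is resolution V, which agrees with the Resolution entry for the selected design.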
Note that 3 centerpoints have been added to the base design in the current example, resulting in a
total of 19 runs and providing 3 degrees of freedom from which to estimate the experimental
error. Since the Spaced option has been selected, the three centerpoints will be positioned at the
beginning, middle, and end of the experiment. The tentatively selected design is displayed in the
Select Design dialog box:
If the design is acceptable, press OK to save it to the STATGRAPHICS DataBook and return to
the DOE Wizard’s main window, which should now contain a summary of the design:

Step #4: Specify Model
Before evaluating the properties of the design, a tentative model must be specified. Pressing the
fourth button on the DOE Wizard’s toolbar displays a dialog box to make that choice:

The default model includes main effects for each of the 5 experimental factors, together with 10
two-factor interactions (shown as two-letter combinations). Selected terms could be excluded by
double-clicking on them with the left mouse button.
Step #5: Select Runs
Since we intend to run all of the runs in the base design, this step can be omitted.

Design Properties
Step #6: Evaluate Design
Several of the selections presented when pressing button #6 are helpful in evaluating the selected
design:
Design Worksheet
The design worksheet shows the 19 runs that have been created, in the order they are to be run:
Worksheet for <untitled> - Chemical reaction screening experiment

run feed rate (liters/min) catalyst (%) agitation (rpm) temperature (degrees) concentration (%) reacted (%)
1 12.5 1.5 110.0 160.0 4.5
2 10.0 1.0 100.0 140.0 6.0
3 15.0 1.0 100.0 140.0 3.0
4 10.0 2.0 100.0 140.0 3.0
5 15.0 2.0 100.0 140.0 6.0
6 10.0 1.0 120.0 140.0 3.0
7 15.0 1.0 120.0 140.0 6.0
8 10.0 2.0 120.0 140.0 6.0
9 15.0 2.0 120.0 140.0 3.0
10 12.5 1.5 110.0 160.0 4.5
11 10.0 1.0 100.0 180.0 3.0
12 15.0 1.0 100.0 180.0 6.0
13 10.0 2.0 100.0 180.0 6.0
14 15.0 2.0 100.0 180.0 3.0
15 10.0 1.0 120.0 180.0 6.0
16 15.0 1.0 120.0 180.0 3.0
17 10.0 2.0 120.0 180.0 3.0
18 15.0 2.0 120.0 180.0 6.0
19 12.5 1.5 110.0 160.0 4.5
Note that 3 centerpoints have been added to the 16 runs in the base design, one at the beginning
of the experiment, one halfway through, and one at the end.
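The worksheet can be reproduced from the coded half fraction with E = ABCD. The sketch below is illustrative only: the `spaced` helper is a hypothetical stand-in for the Spaced placement option (it simply puts centerpoints at the start, at the end, and evenly in between), and the low and high levels are read off the worksheet above.

```python
from itertools import product

# (low, high) for feed rate, catalyst, agitation, temperature, concentration
LEVELS = [(10.0, 15.0), (1.0, 2.0), (100.0, 120.0), (140.0, 180.0), (3.0, 6.0)]


def actual(coded):
    """Map coded settings (-1, 0, +1) to factor values; 0 is the midpoint."""
    return tuple(lo + (c + 1) * (hi - lo) / 2 for c, (lo, hi) in zip(coded, LEVELS))


def spaced(base, n_center):
    """Hypothetical spacing rule: a centerpoint first, last, and evenly between."""
    if n_center < 2:
        raise ValueError("this sketch needs at least 2 centerpoints")
    center, out = (0,) * len(LEVELS), []
    gaps = n_center - 1
    for i in range(n_center):
        out.append(center)
        if i < gaps:
            out.extend(base[i * len(base) // gaps:(i + 1) * len(base) // gaps])
    return out


# 16 base runs in standard order (A changes fastest), with E = ABCD.
base = [(a, b, c, d, a * b * c * d) for d, c, b, a in product((-1, 1), repeat=4)]
worksheet = [actual(run) for run in spaced(base, 3)]
for number, row in enumerate(worksheet, start=1):
    print(number, *row)
```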
ANOVA Table
The ANOVA table shows the breakdown of the degrees of freedom in the design:
ANOVA Table
Source D.F.
Model 15
Total Error 3
Lack-of-fit 1
Pure error 2
Total (corr.) 18
15 of the 18 total degrees of freedom are used to estimate the main effects and two-factor
interactions.
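The entries in this table can be checked by counting. The sketch below assumes that pure error comes only from the 3 replicated centerpoints and that lack-of-fit is whatever remains of the total error; the variable names are illustrative.

```python
from math import comb

n_runs, k, n_center = 19, 5, 3

total_df = n_runs - 1                  # corrected total
model_df = k + comb(k, 2)              # 5 main effects + 10 two-factor interactions
error_df = total_df - model_df         # total error
pure_error_df = n_center - 1           # replicated centerpoints
lack_of_fit_df = error_df - pure_error_df

print(total_df, model_df, error_df, lack_of_fit_df, pure_error_df)  # 18 15 3 1 2
```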

Model Coefficients
The table of model coefficients is shown below:
Model Coefficients
Coefficient Standard Error VIF Ri-Squared Power at SN = 0.5 Power at SN = 1.0 Power at SN = 2.0
A 0.25 1.0 0.0 11.13% 28.88% 75.50%
B 0.25 1.0 0.0 11.13% 28.88% 75.50%
C 0.25 1.0 0.0 11.13% 28.88% 75.50%
D 0.25 1.0 0.0 11.13% 28.88% 75.50%
E 0.25 1.0 0.0 11.13% 28.88% 75.50%
AB 0.25 1.0 0.0 11.13% 28.88% 75.50%
AC 0.25 1.0 0.0 11.13% 28.88% 75.50%
AD 0.25 1.0 0.0 11.13% 28.88% 75.50%
AE 0.25 1.0 0.0 11.13% 28.88% 75.50%
BC 0.25 1.0 0.0 11.13% 28.88% 75.50%
BD 0.25 1.0 0.0 11.13% 28.88% 75.50%
BE 0.25 1.0 0.0 11.13% 28.88% 75.50%
CD 0.25 1.0 0.0 11.13% 28.88% 75.50%
CE 0.25 1.0 0.0 11.13% 28.88% 75.50%
DE 0.25 1.0 0.0 11.13% 28.88% 75.50%
alpha = 5.0%, sigma estimated from total error with 3 d.f.
Since the design is perfectly orthogonal, all of the variance inflation factors (VIF) are equal to
their ideal value of 1.0. The rightmost column shows that there is a 75.5% chance of detecting
any effects with a magnitude equal to 2 times the standard deviation of the experimental error.
Correlation Matrix
The correlation matrix has 0’s in all the off-diagonal locations, showing that the estimates of the
main effects and two-factor interactions will all be uncorrelated.
Correlation Matrix
A B C D E AB AC AD AE BC BD BE CD CE DE
A 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
B 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
C 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
D 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
E 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
AB 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
AC 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
AD 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
AE 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
BC 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
BD 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
BE 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
CD 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000
CE 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
DE 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
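The orthogonality behind the unit VIFs and the identity correlation matrix can be verified directly from the coded design. The sketch below builds the 15 model columns for the 16 factorial runs (E = ABCD) plus 3 centerpoints coded as 0. The final step assumes that the tabulated standard error of 0.25 is expressed per unit of sigma, which is an interpretation rather than something the table states.

```python
from itertools import combinations, product
from math import sqrt

factorial = [(a, b, c, d, a * b * c * d) for d, c, b, a in product((-1, 1), repeat=4)]
runs = factorial + [(0, 0, 0, 0, 0)] * 3           # 3 centerpoints, coded 0

names = "ABCDE"
columns = {n: [run[i] for run in runs] for i, n in enumerate(names)}
for m, n in combinations(names, 2):                # two-factor interaction columns
    columns[m + n] = [x * y for x, y in zip(columns[m], columns[n])]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Every column sums to zero and every pair of columns has a zero cross-product,
# so the estimates are uncorrelated and each VIF equals 1.
off_diagonal = [dot(columns[s], columns[t]) for s, t in combinations(columns, 2)]
print(len(columns), set(off_diagonal))             # 15 {0}
print(all(sum(col) == 0 for col in columns.values()))   # True

# With orthogonal columns, a coefficient's standard error is sigma / sqrt(sum of x^2).
se = {name: 1 / sqrt(dot(col, col)) for name, col in columns.items()}
print(se["A"], se["DE"])                           # 0.25 0.25
```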

Design Points
The graph of the design points shows that each pair of factors is run at all combinations of the
high and low levels:
[Figure: plot of the design points for each pair of factors: feed rate (10 to 15), catalyst (1 to 2), agitation (100 to 120), temperature (140 to 180), and concentration (3 to 6).]
Prediction Variance Plot
The prediction variance plot shows that the variance of the predicted response will be fairly
constant over most of the experimental region:
[Figure: Prediction Variance Plot. Surface of Stnd. error (vertical axis, 0 to 1) over feed rate (10 to 15) and catalyst (1 to 2), with agitation=110.0, temperature=160.0, concentration=4.5.]
The only location where the variance is relatively high is close to the vertices.

Saving the Design File
Step #7: Save experiment
Once the experiment has been created and any additional runs entered, it must be saved on disk.
Press the button labeled Step 7 and select a name for the experiment file:
Design files are extended data files and have the extension .sgx. They include the data together
with other information that was entered on the input dialog boxes.
To reopen an experiment file, select Open Data File from the File menu. The data will be loaded
into the datasheet, and the Experimental Design Wizard window will be displayed.

Analyzing the Results

After the design file has been created and saved, the experiments would be performed. At a later
date, once the results have been collected, the experimenter would return to STATGRAPHICS
and reopen the saved design file using the Open Data Source selection on the main File menu.
The results can then be typed into the response columns. The results for the example are
displayed below:
run feed rate (liters/min) catalyst (%) agitation (rpm) temperature (degrees) concentration (%) reacted (%)
1 12.5 1.5 110.0 160.0 4.5 65.0
2 10.0 1.0 100.0 140.0 6.0 56.0
3 15.0 1.0 100.0 140.0 3.0 53.0
4 10.0 2.0 100.0 140.0 3.0 63.0
5 15.0 2.0 100.0 140.0 6.0 65.0
6 10.0 1.0 120.0 140.0 3.0 53.0
7 15.0 1.0 120.0 140.0 6.0 55.0
8 10.0 2.0 120.0 140.0 6.0 67.0
9 15.0 2.0 120.0 140.0 3.0 61.0
10 12.5 1.5 110.0 160.0 4.5 67.0
11 10.0 1.0 100.0 180.0 3.0 69.0
12 15.0 1.0 100.0 180.0 6.0 45.0
13 10.0 2.0 100.0 180.0 6.0 78.0
14 15.0 2.0 100.0 180.0 3.0 93.0
15 10.0 1.0 120.0 180.0 6.0 49.0
16 15.0 1.0 120.0 180.0 3.0 60.0
17 10.0 2.0 120.0 180.0 3.0 95.0
18 15.0 2.0 120.0 180.0 6.0 82.0
19 12.5 1.5 110.0 160.0 4.5 63.0
Important Notes:
1. If more than one sample was taken at each set of experimental conditions, the data values
should be entered into data tables B through Z. The summary statistics in data table A
will then be automatically calculated from the other tables. Do not treat the samples as
replicates unless you actually reset the process between each sample.
2. If any experiments were not performed, leave the corresponding cell blank. The program
will recognize the imbalance in the design and handle it.
3. If any experimental runs were done at conditions different than originally planned,
change the entries in the experimental factor columns to correspond to the values that
were actually used.
4. If additional runs were performed, you may add them to the bottom of the datasheet.
They will be included in the fit.

Statistical Model
The statistical model upon which the analysis of screening designs is based expresses the
response variable Y as a linear function of the experimental factors, interactions between the
factors, and an error term. There are two types of models that are generally fit, illustrated below
for 5 experimental factors:
1. First-order model – contains terms representing main effects only.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon \tag{3}$$
2. Second-order model – contains terms representing main effects and second-order
interactions.
$$\begin{aligned}
Y = {} & \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 \\
& + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{14} X_1 X_4 + \beta_{15} X_1 X_5 + \beta_{23} X_2 X_3 \\
& + \beta_{24} X_2 X_4 + \beta_{25} X_2 X_5 + \beta_{34} X_3 X_4 + \beta_{35} X_3 X_5 + \beta_{45} X_4 X_5 + \varepsilon
\end{aligned} \tag{4}$$
The experimental error $\varepsilon$ is assumed to be randomly drawn from a normal distribution with a
mean of 0 and a standard deviation equal to $\sigma$. In rare cases, interactions amongst 3 or more
factors may be included by adding terms consisting of cross-products of more than 2 factors. For
the occasional screening design that has 3 levels of a factor, a term such as $\beta_{11} X_1^2$ would also be
included in the second-order model.
For quantitative variables, STATGRAPHICS represents the experimental factor Xj using its
original values as entered into the datasheet. For categorical factors, indicator variables are used
of the form
$$X_j = \begin{cases} -1 & \text{at low level of factor } j \\ +1 & \text{at high level of factor } j \end{cases} \tag{5}$$
where the “low” and “high” levels are those defined when the design was constructed.
Effects
In order to simplify the interpretation of screening designs, it is common to reexpress the above
model in terms of “effects”. The main effect of factor j is defined as the change in the response
variable Y when Xj is changed from its low level to its high level, with all other factors being
held constant midway between their lows and their highs. In a balanced two-level factorial
design, the estimated effect of factor j equals the difference between the average response at the
high level of the factor and the average response at the low level of the factor:
$$\hat{\Delta}_j = \bar{y}_{+} - \bar{y}_{-} \tag{6}$$
where
$\hat{\Delta}_j$ = estimated effect of factor j
$\bar{y}_{+}$ = average response at high level of factor j
$\bar{y}_{-}$ = average response at low level of factor j
In an unbalanced design, the effect is a more complicated function of the coefficients, but the
basic interpretation remains the same.
Two-factor interactions may also be defined. In general, a two-factor interaction may be thought
of as the additional effect of one factor over and above the main effect when the second factor is
held at its high level. In a balanced two-level factorial design, this interaction effect equals
$$\hat{\Delta}_{jk} = 2\left(\bar{y}_{++} - \bar{y}\right) - \hat{\Delta}_j - \hat{\Delta}_k \tag{7}$$
where
$\bar{y}_{++}$ = average response at the high levels of both factors j and k
and $\bar{y}$ is the grand average:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \tag{8}$$
An additional important characteristic of the effects is that they are all expressed in units of the
response variable, so that effects of factors can be compared directly, regardless of the units in
which the factors are expressed.
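As a check on these definitions, the effects can be computed by hand from the 16 factorial runs of the example (runs 2 to 9 and 11 to 18 of the results table; the centerpoints are left out so that the averages are balanced). The sketch below is illustrative, with hypothetical names. It uses the contrast form of an effect, the average response where the sign column is +1 minus the average where it is -1, and then recovers the BD interaction from the both-high average. Note the factor of 2 in that last step, which is needed for the two calculations to agree.

```python
from itertools import combinations, product
from statistics import mean

# Responses for the 16 factorial runs, in standard order (A changes fastest).
y = [56.0, 53.0, 63.0, 65.0, 53.0, 55.0, 67.0, 61.0,
     69.0, 45.0, 78.0, 93.0, 49.0, 60.0, 95.0, 82.0]
coded = [(a, b, c, d, a * b * c * d) for d, c, b, a in product((-1, 1), repeat=4)]
col = {name: [run[i] for run in coded] for i, name in enumerate("ABCDE")}


def effect(signs):
    """Average response where the contrast is +1 minus the average where it is -1."""
    high = mean(v for v, s in zip(y, signs) if s > 0)
    low = mean(v for v, s in zip(y, signs) if s < 0)
    return high - low


main = {name: effect(col[name]) for name in "ABCDE"}
inter = {j + k: effect([p * q for p, q in zip(col[j], col[k])])
         for j, k in combinations("ABCDE", 2)}
print(main)
print(inter["BD"], inter["DE"])

# BD again, from the average at both-high and the grand average of these runs.
both_high = mean(v for v, b, d in zip(y, col["B"], col["D"]) if b > 0 and d > 0)
print(2 * (both_high - mean(y)) - main["B"] - main["D"])
```

The values for B, D, BD, and DE computed this way can be compared with the estimated effects reported in the Analysis Summary for this example.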
Step #8: Analyze data
Once the data have been entered, press the button labeled Step #8 on the Experiment Design
Wizard toolbar. This will display a dialog box listing each of the response variables:

 Response: column containing the response variable to be analyzed.
 Transformation: the desired transformation to be applied before the model is fit.
 Power and addend: the transformation parameters if a Power or Box-Cox transformation is

selected.
If more than one response has been measured, you should repeat this step once for each response.

Analysis Summary
The analysis of a screening design involves estimating the average or main effect of each
experimental factor and interactions between the factors. The Analysis Summary displays
information about the estimated effects:
Analyze Experiment - reacted

File name: chemical reaction.sfx
Comment: Chemical reactor example
Estimated effects for reacted (%)

Effect Estimate Stnd. Error V.I.F.
average 65.2105 0.378313
A:feed rate -2.0 0.824515 1.0
B:catalyst 20.5 0.824515 1.0
C:agitation 0.0 0.824515 1.0
D:temperature 12.25 0.824515 1.0
E:concentration -6.25 0.824515 1.0
AB 1.5 0.824515 1.0
AC 0.5 0.824515 1.0
AD -0.75 0.824515 1.0
AE 1.25 0.824515 1.0
BC 1.5 0.824515 1.0
BD 10.75 0.824515 1.0
BE 1.25 0.824515 1.0
CD 0.25 0.824515 1.0
CE 2.25 0.824515 1.0
DE -9.5 0.824515 1.0
Standard errors are based on total error with 3 d.f..
The table shows:
 Average - the estimated response at the center of the design region. For complete data from
most orthogonal designs, this equals the grand average of all the data values, $\bar{y}$.
 Estimated main effects - the difference between the response at the high level of a factor
and the response at the low level of a factor, when all other factors are held at their central
values.
 Estimated 2-factor interactions - the additional effect of one factor when a second is held at
its high level. Interactions occur when the effect of one factor is different at different levels
of another factor.
 Other effects - defined as twice the coefficient associated with the corresponding term in the
regression model when all variables are standardized according to:
$$X_j' = \frac{X_j - \text{low}_j}{\text{high}_j - \text{low}_j} \tag{9}$$
 Standard errors - Each effect is shown by default with its estimated standard error. The
standard errors are measures of the estimation error associated with each effect. The display
can be changed to show confidence intervals for each effect using the Analysis Options
dialog box.

 V.I.F. – variance inflation factors that measure the extent to which any imbalance in the
experiment has inflated the variance of the estimated effects. For a perfectly orthogonal
design, the factors will equal 1.0. Any values of 10 or greater are usually taken to be a sign
that serious correlation exists amongst the estimated effects, which causes the estimates to be
much more variable than they would be in a well-designed experiment.
 Standard errors are based on … - an indication of how the experimental error has been
estimated, as determined by the Analysis Options dialog box.
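The two standard errors in the table can be reproduced from the total error sum of squares (8.15789 with 3 d.f.) reported in the ANOVA table for this example. The sketch below relies on the orthogonality of the design: the average is a mean of all 19 runs, while each effect is twice a coded coefficient whose variance is the mean squared error divided by the 16 factorial runs. The variable names are illustrative.

```python
from math import sqrt

s_error, df_error = 8.15789, 3       # total error sum of squares and its d.f.
n_total, n_factorial = 19, 16

mse = s_error / df_error             # estimate of the error variance
se_average = sqrt(mse / n_total)
se_effect = 2 * sqrt(mse / n_factorial)

print(se_average, se_effect)         # approximately 0.3783 and 0.8245
```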
Pareto Chart
The Pareto Chart shows a graphical depiction of each of the effects in the above table. There are
two forms of the Pareto chart: a standardized form and an unstandardized form. The
unstandardized chart displays the absolute value of the effects in decreasing order:
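[Figure: Pareto Chart for reacted. Horizontal bars of the absolute effects (axis: Effect, 0 to 24) in decreasing order: B:catalyst, D:temperature, BD, DE, E:concentration, CE, A:feed rate, BC, AB, BE, AE, AD, AC, CD, C:agitation. The legend distinguishes positive (+) and negative (-) effects.]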
The color of the bars shows whether an effect is positive or negative. From the above plot, it is
easy to see that the three most important factors in the example are catalyst, temperature, and
concentration.
To create a standardized Pareto chart, each effect is converted to a t-statistic by dividing it by its
standard error. These standardized effects are then plotted in decreasing order of absolute
magnitude:

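[Figure: Standardized Pareto Chart for reacted. Horizontal bars of the standardized effects (axis: Standardized effect, 0 to 25) in decreasing order: B:catalyst, D:temperature, BD, DE, E:concentration, CE, A:feed rate, BC, AB, BE, AE, AD, AC, CD, C:agitation. The legend distinguishes positive (+) and negative (-) effects.]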
In addition, a line is drawn on the chart beyond which an effect is statistically significant at a
specified significance level, usually 5%. In the above chart, the main effects of factors B, D, and
E are significant, as are the BD and DE interactions. Noticeably absent from the list are any
effects involving factors A and C.
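The standardized effects and the significance line can be reproduced from the estimated effects table. The sketch below divides each estimate by the common standard error of 0.824515 and compares the result with 3.182, the standard two-sided 5% critical value of Student's t with 3 degrees of freedom. That the chart's line is drawn at this t value is an assumption here; the variable names are illustrative.

```python
effects = {
    "A": -2.0, "B": 20.5, "C": 0.0, "D": 12.25, "E": -6.25,
    "AB": 1.5, "AC": 0.5, "AD": -0.75, "AE": 1.25, "BC": 1.5,
    "BD": 10.75, "BE": 1.25, "CD": 0.25, "CE": 2.25, "DE": -9.5,
}
std_error = 0.824515      # common standard error of the effects
t_critical = 3.182        # two-sided 5% point of Student's t with 3 d.f.

standardized = {name: est / std_error for name, est in effects.items()}
ranked = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)
significant = [name for name in ranked if abs(standardized[name]) > t_critical]

for name in ranked:
    print(f"{name:3s} {standardized[name]:8.2f}")
print(significant)        # ['B', 'D', 'BD', 'DE', 'E']
```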
Pane Options
 Standardize: check to plot the standardized effects rather than the absolute effects.
 Alpha: the significance level $\alpha$ corresponding to the vertical line on the chart. Bars extending
beyond the line are statistically significant at the $\alpha$ significance level.

ANOVA Table
To determine the level of significance for each effect, the ANOVA Table may be used:
Analysis of Variance for reacted - Chemical reactor example

Source Sum of Squares Df Mean Square F-Ratio P-Value
A:feed rate 16.0 1 16.0 5.88 0.0937
B:catalyst 1681.0 1 1681.0 618.17 0.0001
C:agitation 0.0 1 0.0 0.00 1.0000
D:temperature 600.25 1 600.25 220.74 0.0007
E:concentration 156.25 1 156.25 57.46 0.0048
AB 9.0 1 9.0 3.31 0.1664
AC 1.0 1 1.0 0.37 0.5870
AD 2.25 1 2.25 0.83 0.4301
AE 6.25 1 6.25 2.30 0.2268
BC 9.0 1 9.0 3.31 0.1664
BD 462.25 1 462.25 169.99 0.0010
BE 6.25 1 6.25 2.30 0.2268
CD 0.25 1 0.25 0.09 0.7815
CE 20.25 1 20.25 7.45 0.0720
DE 361.0 1 361.0 132.75 0.0014
Total error 8.15789 3 2.7193
Total (corr.) 3339.16 18
R-squared = 99.7557 percent

R-squared (adjusted for d.f.) = 98.5341 percent
Standard Error of Est. = 1.64903
Mean absolute error = 0.254848
Durbin-Watson statistic = 1.37903 (P=0.4687)
Lag 1 residual autocorrelation = 0.00827674
The ANOVA partitions the variance of the response into several components: one for each main
effect, one for each interaction, and one for the experimental error. The ANOVA table shows:
 Sum of Squares - the Type III sums of squares attributable to each term in the model. This
measures the increase in the error sum of squares that would occur if each term
was separately removed from the model. The sum of squares for total error is also included,
where
$$S_{error} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{10}$$
and $e_i$ is the i-th residual, measuring the difference between the observed response for run i and
the value predicted by the fitted model.
 Df - the degrees of freedom associated with each term.
 Mean Square - the mean square associated with each term, obtained by dividing the
associated sum of squares by its degrees of freedom. The mean squared error (MSE)
estimates the variance of the experimental error:
$$\hat{\sigma}^2 = MSE = \frac{S_{error}}{df_{error}} \tag{11}$$
 F-Ratio - an F ratio which divides the mean square of an effect by the mean squared error:
$$F = \frac{MS_{effect}}{MSE} \tag{12}$$
The F-ratios may be used to determine the statistical significance of each effect.
 P-Value - the P-Value associated with testing the null hypothesis that the coefficient for a
selected effect equals 0, implying that the effect is not present. P-Values below a critical
level (such as 0.05 if operating at the 5% significance level) indicate that the corresponding
effect is statistically significant at that significance level.
 R-squared - the percentage of the variability in the response variable that has been
accounted for by the fitted model, calculated from
$$R^2 = 100\left(1 - \frac{S_{error}}{S_{total}}\right)\% \tag{13}$$
R-squared ranges from 0% to 100% and measures how well the model fits the observed
response data.
 R-squared (adjusted for d.f.) - the adjusted R-squared, which accounts for the number of
degrees of freedom in the fitted model. In situations such as the current one where the
number of coefficients in the fitted model is large relative to the total number of runs, the
ordinary R-squared statistic may overstate the ability of the fitted model to predict the
response. The adjusted R-squared compensates for this effect by
$$R^2_{adj} = 100\left[1 - \left(\frac{n-1}{n-p}\right)\frac{S_{error}}{S_{total}}\right]\% \tag{14}$$
where p is the number of estimated coefficients in the fitted model.
 Standard error of est. - the estimated standard deviation of the experimental error, given by
  MSE (15)
This value is used when constructing prediction intervals for the response.
 Mean absolute error - the average of the absolute values of the residuals, given by
n
 |e |
i 1
i
MAE  (16)
n
This value indicates the average error in predicting the observed response using the fitted
model.
• Durbin-Watson statistic - a statistic calculated from the residuals according to
$$DW = \frac{\sum_{i=1}^{n-1} (e_{i+1} - e_i)^2}{\sum_{i=1}^{n} e_i^2} \tag{17}$$
The Durbin-Watson statistic measures serial correlation in the residuals to determine whether
there is any dependence between successive observations. In this case, it could detect drifts
over the course of the experiment. A small P-value would indicate that the analyst should
take a close look at the residuals to look for any trends, which may be done using the
Diagnostic Plots graph option.
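The summary statistics defined in equations (10) through (17) can be sketched in a few lines of Python. This is only an illustration of the formulas, not the program's own code: the function name fit_summary is hypothetical, the four observed and fitted values are made-up numbers chosen for easy arithmetic, and the error degrees of freedom are assumed to equal n - p.

```python
import math

def fit_summary(observed, fitted, p):
    """Summary statistics of equations (10)-(17) for a model with p coefficients."""
    n = len(observed)
    residuals = [y - f for y, f in zip(observed, fitted)]
    mean_y = sum(observed) / n
    s_error = sum(e * e for e in residuals)              # equation (10)
    s_total = sum((y - mean_y) ** 2 for y in observed)   # corrected total sum of squares
    mse = s_error / (n - p)                              # equation (11), assuming df_error = n - p
    successive = sum((residuals[i + 1] - residuals[i]) ** 2 for i in range(n - 1))
    return {
        "r_squared": 100.0 * (1.0 - s_error / s_total),                          # (13)
        "r_squared_adj": 100.0 * (1.0 - (n - 1) / (n - p) * s_error / s_total),  # (14)
        "std_error_est": math.sqrt(mse),                                         # (15)
        "mean_abs_error": sum(abs(e) for e in residuals) / n,                    # (16)
        "durbin_watson": successive / s_error,                                   # (17)
    }

# Hypothetical values for illustration only; they are not data from the experiment.
stats = fit_summary([10.0, 12.0, 15.0, 19.0], [11.0, 11.0, 16.0, 18.0], p=2)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Residuals that alternate in sign, as in this toy input, push the Durbin-Watson statistic toward its upper range, while a slow drift would push it toward zero.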
There are five effects in the ANOVA table with P-values below 0.05. These are the same five
effects identified as significant on the standardized Pareto chart (the two methods are
equivalent). As a whole, the model accounts for at least 98% of the observed variability in the
response. It is unnecessarily complicated, however, since many effects are not statistically
significant. A later section illustrates how to remove selected effects using Analysis Options.
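As a rough check on this figure, equations (13) and (14) can be evaluated by hand. The arithmetic below assumes n = 19 runs and p = 16 estimated coefficients, and takes the total error sum of squares to be the sum of the Lack-of-fit and Pure error lines (0.157895 + 8.0) in the ANOVA table shown under Pane Options; these readings of the table are assumptions rather than values stated in the text.

```latex
\begin{align*}
R^2 &= 100\left(1 - \frac{8.157895}{3339.16}\right)\% \approx 99.76\% \\
R^2_{adj} &= 100\left(1 - \frac{19-1}{19-16}\cdot\frac{8.157895}{3339.16}\right)\% \approx 98.53\%
\end{align*}
```

Under these assumptions, both statistics are above 98%.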
Pane Options
• Include Lack-of-Fit Test: If checked, a line will be added to the ANOVA table to determine
whether the current model adequately represents the observed data. Note: this option has no
effect unless there are replicate experimental runs at identical settings of the experimental
factors. The resulting ANOVA table is shown below:
Analysis of Variance for reacted

Source          Sum of Squares  Df  Mean Square  F-Ratio  P-Value
A:feed rate               16.0   1         16.0     4.00   0.1835
B:catalyst              1681.0   1       1681.0   420.25   0.0024
C:agitation                0.0   1          0.0     0.00   1.0000
D:temperature           600.25   1       600.25   150.06   0.0066
E:concentration         156.25   1       156.25    39.06   0.0247
AB                         9.0   1          9.0     2.25   0.2724
AC                         1.0   1          1.0     0.25   0.6667
AD                        2.25   1         2.25     0.56   0.5315
AE                        6.25   1         6.25     1.56   0.3377
BC                         9.0   1          9.0     2.25   0.2724
BD                      462.25   1       462.25   115.56   0.0085
BE                        6.25   1         6.25     1.56   0.3377
CD                        0.25   1         0.25     0.06   0.8259
CE                       20.25   1        20.25     5.06   0.1534
DE                       361.0   1        361.0    90.25   0.0109
Lack-of-fit           0.157895   1     0.157895     0.04   0.8609
Pure error                 8.0   2          4.0
Total (corr.)          3339.16  18

Note the lines labeled Lack-of-fit and Pure error, which provide two separate estimates of the
experimental error σ:
1. Pure error: an estimate calculated by pooling the variance within sets of observations at
identical levels of X. It is “pure” in the sense that it estimates the experimental error σ
whether or not the proper model has been selected.
2. Lack-of-fit: an estimate calculated from the deviation between the average response for
each group of replicate values and the values predicted by the fitted model. If the model
is not correct, this estimates σ plus a positive quantity that measures the lack-of-fit of the
selected model.
The P-Value in the lack-of-fit line may be used to test the hypothesis that the current model is
adequate. A small P-Value would indicate an inadequate model. In the current example, the P-
Value is well above 0.05, so the selected model appears to be adequate.
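The entries in the table above can be reproduced with a short calculation. The sketch below assumes, based on the tabulated values, that each F-ratio is formed against the pure error mean square. The helper p_value_f_1_2 is a hypothetical name; it relies on the closed-form distribution of Student's t with 2 degrees of freedom, so it applies only to F-ratios with 1 and 2 degrees of freedom, as in this table.

```python
import math

def p_value_f_1_2(f_ratio):
    """Upper-tail P-value for an F-ratio with 1 and 2 degrees of freedom.

    With 1 numerator d.f., F = t^2, and Student's t with 2 d.f. has the
    closed-form CDF 1/2 + t / (2*sqrt(t^2 + 2)), so P(F > f) = 1 - sqrt(f / (f + 2)).
    """
    return 1.0 - math.sqrt(f_ratio / (f_ratio + 2.0))

ms_pure_error = 8.0 / 2          # Pure error line: sum of squares / Df
ms_lack_of_fit = 0.157895 / 1    # Lack-of-fit line: sum of squares / Df
ms_catalyst = 1681.0 / 1         # B:catalyst line

f_lack_of_fit = ms_lack_of_fit / ms_pure_error
f_catalyst = ms_catalyst / ms_pure_error

print(f"Lack-of-fit: F = {f_lack_of_fit:.2f}, P = {p_value_f_1_2(f_lack_of_fit):.4f}")
print(f"B:catalyst:  F = {f_catalyst:.2f}, P = {p_value_f_1_2(f_catalyst):.4f}")
```

For other degrees of freedom a general F distribution routine would be needed in place of the closed form.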
Normal Probability Plot of Effects

When few degrees of freedom are available for estimating the experimental error, the formal
F tests conducted in the ANOVA table may not have much power, so that smaller effects will not
appear to be significant. On the other hand, testing a large number of effects, each at a 5%
significance level, may well generate more significant results than are actually present. A
somewhat less rigorous way of judging which effects are real and which are probably just
manifestations of noise is through the Normal Probability Plot of Effects:
[Figure: Normal Probability Plot for reacted. Vertical axis: percentage (0.1 to 99.9); horizontal axis: standardized effects.]
In this plot, the standardized effects are ordered from smallest to largest and plotted versus
quantiles of a normal distribution. Any estimates which are just noise will fall approximately
along a straight line. Any estimates which correspond to real signals will lie off the line to the
left or right.
Two types of normal probability plots are available, a full normal and a half-normal, which may
be chosen using Pane Options.
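The construction of a half-normal plot can be sketched as follows. The plotting positions (rank - 0.5)/m used here are one common convention and are an assumption, since the exact positions used by the program are not stated. The effects are ranked by the square root of their sums of squares from the ANOVA table, which for single-degree-of-freedom effects is proportional to the absolute effect; the function name half_normal_positions is hypothetical.

```python
from statistics import NormalDist

# Sums of squares copied from the ANOVA table for reacted.
SUM_OF_SQUARES = {
    "A:feed rate": 16.0, "B:catalyst": 1681.0, "C:agitation": 0.0,
    "D:temperature": 600.25, "E:concentration": 156.25,
    "AB": 9.0, "AC": 1.0, "AD": 2.25, "AE": 6.25, "BC": 9.0,
    "BD": 462.25, "BE": 6.25, "CD": 0.25, "CE": 20.25, "DE": 361.0,
}

def half_normal_positions(sum_of_squares):
    """Pair each effect magnitude with a half-normal score, smallest effect first."""
    ranked = sorted(sum_of_squares.items(), key=lambda item: item[1])
    m = len(ranked)
    standard_normal = NormalDist()
    positions = []
    for rank, (name, ss) in enumerate(ranked, start=1):
        # Assumed plotting position; maps rank to the upper half of the normal curve.
        probability = 0.5 + 0.5 * (rank - 0.5) / m
        positions.append((name, ss ** 0.5, standard_normal.inv_cdf(probability)))
    return positions

positions = half_normal_positions(SUM_OF_SQUARES)
for name, magnitude, score in reversed(positions):
    print(f"{name:16s} {magnitude:7.2f} {score:6.3f}")
```

Effects that are only noise should rise roughly linearly with the score, while real effects appear as magnitudes far larger than that trend would predict.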

Pane Options
• Plot Type: Select Normal to plot each effect while retaining its positive or negative sign.
Select Half-Normal to plot the absolute values of the effects.
• Direction: Select Horizontal to plot percentages on the horizontal axis or Vertical to plot
them on the vertical axis.
• Fitted Line: If checked, a reference line is added to the plot by fitting a least squares
regression line to the smallest 50% of the effects.
• Label Effects: If checked, the names of the effects are added to the plot.
Example: Half-Normal Plot with Effect Labels
[Figure: Half-Normal Plot for reacted. Vertical axis: standard deviations (0 to 2.4); horizontal axis: standardized effects (0 to 25). Effects labeled from largest to smallest: B:catalyst, D:temperature, BD, DE, E:concentration, CE, A:feed rate, BC, AB, AE, BE, AD, AC, CD, C:agitation.]
This plot has the advantage that all signals fall to the right of the noise line. The above plot
confirms the conclusion that 5 significant effects are present.

Analysis Options
The mathematical model currently being used to fit the data contains 5 main effects and 10 two-
factor interactions. As seen above, many of these terms are not statistically significant. When
building empirical models based solely on observed data, it is important to keep the models as
simple as possible, since simple models tend to be easier to interpret and have a better chance of
extrapolating to other combinations of the experimental factors.
In accordance with the principle of parsimony or K.I.S.S. (Keep It Simple Statistically),
insignificant effects should be removed from the model according to the following rules:
1. Remove any insignificant two-factor interactions (or other second-order effects).
2. Remove any insignificant main effects that are not involved in significant interactions.
Note that main effects corresponding to factors that are involved in significant interactions
should not normally be removed, since doing so would place artificial constraints on the
underlying polynomial models.
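The two rules can be expressed as a short selection routine. This is a sketch of the logic only, assuming a 5% significance level and the P-values from the ANOVA table above; the function reduce_model and the single-letter term names are illustrative and not part of STATGRAPHICS.

```python
# P-values from the ANOVA table; single letters are main effects, pairs are interactions.
P_VALUES = {
    "A": 0.1835, "B": 0.0024, "C": 1.0000, "D": 0.0066, "E": 0.0247,
    "AB": 0.2724, "AC": 0.6667, "AD": 0.5315, "AE": 0.3377, "BC": 0.2724,
    "BD": 0.0085, "BE": 0.3377, "CD": 0.8259, "CE": 0.1534, "DE": 0.0109,
}

def reduce_model(p_values, alpha=0.05):
    """Apply rule 1 (drop insignificant interactions), then rule 2 (drop
    insignificant main effects unless they appear in a retained interaction)."""
    interactions = [t for t, p in p_values.items() if len(t) == 2 and p < alpha]
    protected = {factor for term in interactions for factor in term}
    main_effects = [t for t, p in p_values.items()
                    if len(t) == 1 and (p < alpha or t in protected)]
    return main_effects + interactions

kept = reduce_model(P_VALUES)
print(kept)
```

In this example every protected main effect is also significant on its own, so the hierarchy rule does not change the outcome.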
To eliminate effects from the model, select Analysis Options:
• Maximum Order Effect - the maximum order effect to be included in the model. Set to 2
by default to request fitting of both main effects and 2-factor interactions. If set to 1, only
main effects will be estimated.
• Ignore Block Numbers - for designs that contain more than 1 block, indicates whether block
effects should be estimated or ignored. Note that column 1 of the datasheet for any
experiment file contains block numbers corresponding to each row.
• Estimate Sigma From - whether the standard deviation of the experimental error is to be
estimated from the experimental error, or whether the analyst will provide a known value. If
a known value is provided, the statistical tests and confidence intervals will use that value.
• Display - affects the output displayed for each effect after the ± on the Analysis Summary.
• Confounding Pattern - specifies how the procedure determines which effects to estimate
when fitting the model. The choices are:
1. From Original Design - examines the confounding pattern of the original design to
determine which effects can be estimated. For example, in a resolution IV design, the
program will estimate specific combinations of the two-factor interactions. This is the
default choice and is appropriate in all but very special circumstances.
2. From Data - examines the X matrix of all runs performed to determine which effects can
be estimated. This may be desirable when the analyst has added additional runs to the
base design to clear certain interactions that would otherwise be confounded. Since the
program attempts to estimate all effects when this option is chosen, the Exclude dialog
box may have to be used to tell the program exactly which effects are to be estimated.
• Exclude - when pressed, generates the dialog box shown below:
Effects can be excluded from the model by double-clicking on them one at a time. Double-
clicking on an effect in either of the two columns moves it to the other column.
In the current example, all effects other than the 5 that appeared to be statistically significant
were removed. The standardized Pareto chart for the new model shows the remaining effects:

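[Figure: Standardized Pareto Chart for reacted. Bars from largest to smallest: B:catalyst, D:temperature, BD, DE, E:concentration; bars are coded + or - by the sign of the effect. Horizontal axis: standardized effect (0 to 16).]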
Main Effects Plot

Once a suitable model has been fit and checked, the results must be displayed in a manner that is
understandable to all involved. Since it is often difficult to gain insights by looking at a
mathematical equation, various plots are provided for displaying the fitted model. The Main
Effects Plot is almost always important:
[Figure: Main Effects Plot for reacted. Vertical axis: reacted (54 to 78); one line per factor: catalyst (1.0 to 2.0), temperature (140.0 to 180.0), concentration (3.0 to 6.0).]
It shows how the predicted response Y varies when each of the factors in the model is changed
from its low level to its high level, with all other factors held at the center of the experimental
region (halfway between the low level and the high level). When all of the factors are plotted
together as in the above plot, it is easy to judge which factors have the greatest impact. When
plotted individually, the predicted response at the extremes of a selected factor is shown:
[Figure: Main Effects Plot for reacted, catalyst only. The predicted response rises from 54.9605 at catalyst = 1.0 to 75.4605 at catalyst = 2.0.]
Note: In some cases, the values shown at the endpoints of the line in the above plot will be equal
to the average response at the low and high level of the plotted factor. That is not the case in
general, however. It is important to note that STATGRAPHICS plots the predicted response
from the current model, not the observed data. This allows the plot to be used with many types
of designs other than a two-level factorial.
Pane Options
• Factors: factors to be included in the plot.

Interaction Plot
When significant interactions exist amongst the experimental factors, the main effects plots do
not tell the whole story about the factors that interact and can even be misleading. In such cases,
an Interaction Plot should be produced for each pair of factors. If more than one interaction is
plotted, the display will take the following form:
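[Figure: Interaction Plot for reacted, showing the BD and DE interactions side by side. Vertical axis: reacted (54 to 94); horizontal axes: catalyst (1 to 2) for BD and temperature (140 to 180) for DE; lines are labeled - and + for the low and high level of the second factor.]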
A pair of lines will be plotted for each interaction, corresponding to the predicted response when
one factor is varied from its low value to its high value, at each level of the other factor. All
factors not involved in the interaction are held at their central value.
The plot is usually easier to understand if Pane Options is used to plot each interaction
separately:
[Figure: Interaction Plot for reacted, catalyst by temperature. Vertical axis: reacted (54 to 94); horizontal axis: catalyst (1.0 to 2.0); one line for temperature=140.0 and one for temperature=180.0.]
The predicted response for each combination of the low and high levels of two factors is
displayed at the end of each line segment. If two factors do not interact, the effect of one factor
will not depend upon the level of the other and the two lines in the interaction plot will be
approximately parallel. If the factors interact, as in the above figure, the lines will not be parallel
and may even cross. Interpretation of interaction plots is usually highly informative. For
example, the plot above shows that temperature has little effect on the response at a low level of
catalyst. However, it has a large effect at the high level of catalyst.
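The size of this interaction can be checked by hand. The worked arithmetic below is a sketch that assumes the coefficients reported in the Regression Coefficients pane and holds concentration at its central value of 4.5; it gives the predicted change in reacted when temperature moves from 140 to 180 at each level of catalyst.

```latex
\begin{align*}
\Delta_{\text{temperature}}(\text{catalyst}) &= (0.2125 + 0.5375\,\text{catalyst} - 0.158333 \times 4.5)\,(180 - 140) \\
\Delta_{\text{temperature}}(1.0) &\approx 0.0375 \times 40 = 1.5 \\
\Delta_{\text{temperature}}(2.0) &\approx 0.575 \times 40 = 23.0
\end{align*}
```

A change of about 1.5 at low catalyst against about 23 at high catalyst is what produces the non-parallel lines.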
Pane Options
• Factors: two or more factors to include on the plot. All interactions for which both factors
have been checked will be included.
• Reverse Factors: If checked, the first factor rather than the second will be used to define the
lines on the plot.
Regression Coefficients
The underlying regression model may be displayed by selecting the Regression Coefficients
pane:
Regression coeffs. for reacted - Chemical reactor example

Coefficient Estimate
constant 9.83553
B:catalyst -65.5
D:temperature 0.2125
E:concentration 23.25
BD 0.5375
DE -0.158333
Interpretation
This pane displays the regression equation which has been fitted to the data. The equation of the fitted model is
reacted = 9.83553 - 65.5*catalyst + 0.2125*temperature + 23.25*concentration + 0.5375*catalyst*temperature -
0.158333*temperature*concentration
where the values of the variables are specified in their original units. To have STATGRAPHICS evaluate this function, select
Predictions from the list of Tabular Options. To plot the function, select Response Plots from the list of Graphical Options.

The StatAdvisor displays the equation, which corresponds to the regression model described
earlier. This is the equation that is used to predict the response at specified values of the
experimental factors.
Note: In the regression equation, all factors that were defined as continuous when the experiment
was initially created are expressed in their original units (e.g., temperature is expressed in
degrees C). Factors that were not defined as continuous use the coding of -1 for the low level
and +1 for the high level.
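The fitted equation is easy to evaluate outside the program. The sketch below transcribes the coefficients from the Regression Coefficients pane into a hypothetical function predict_reacted and evaluates it at the center of the experimental region, at the low and high levels of catalyst with the other factors centered, and at one additional combination of the factors. Small differences in the last decimal places are to be expected because the printed coefficients are rounded.

```python
def predict_reacted(catalyst, temperature, concentration):
    """Fitted model from the Regression Coefficients pane, in original units."""
    return (9.83553
            - 65.5 * catalyst
            + 0.2125 * temperature
            + 23.25 * concentration
            + 0.5375 * catalyst * temperature
            - 0.158333 * temperature * concentration)

# Center of the region: halfway between the low and high level of each factor.
center = predict_reacted(1.5, 160.0, 4.5)
# Endpoints of the main-effects line for catalyst, other factors held at center.
low_catalyst = predict_reacted(1.0, 160.0, 4.5)
high_catalyst = predict_reacted(2.0, 160.0, 4.5)
# A combination of the factors that was not run in the experiment.
new_run = predict_reacted(1.2, 165.0, 3.5)

print(f"center:          {center:.4f}")
print(f"catalyst = 1.0:  {low_catalyst:.4f}")
print(f"catalyst = 2.0:  {high_catalyst:.4f}")
print(f"new combination: {new_run:.4f}")
```

Feed rate and agitation do not appear as arguments because neither factor remains in the reduced model.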
Correlation Matrix
The correlation matrix displays the estimated correlation between the coefficients in the fitted
regression model:
Correlation Matrix for Estimated Effects
(1) (2) (3) (4) (5) (6)

(1) average 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
(2) B:catalyst 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
(3) D:temperature 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
(4) E:concentration 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000
(5) BD 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
(6) DE 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
The correlations are estimated from the variance-covariance matrix of the coefficients, given by
$$s^2(b) = MSE\,(X'X)^{-1} \tag{18}$$
A diagonal matrix such as that shown above indicates that the estimates of each of the effects are
uncorrelated with the other estimates, which stems from the orthogonality of the original design.
If any data values were missing (indicated by leaving the corresponding cells of the response
column in the spreadsheet empty), or additional runs have been added by the user, there may
well be non-zero values for the off-diagonal terms. Large correlations are likely to lead to poorly
defined effects that are difficult to interpret.
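The link between orthogonality and a diagonal correlation matrix can be illustrated numerically. The sketch below assumes that the 16 factorial runs form a half fraction with generator E = ABCD and that 3 centerpoints were added; the generator is an assumption, since it is not stated in this section. Factors are in coded units (-1, 0, +1), and the columns are those of the reduced model.

```python
from itertools import product

# Assumed design: 2^(5-1) half fraction with E = ABCD, plus 3 centerpoints.
runs = [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]
runs += [(0, 0, 0, 0, 0)] * 3

def model_row(run):
    """Columns of X for the reduced model: constant, B, D, E, BD, DE."""
    a, b, c, d, e = run
    return [1, b, d, e, b * d, d * e]

X = [model_row(run) for run in runs]
k = len(X[0])

# X'X computed directly; entry (i, j) is the dot product of columns i and j.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
for line in xtx:
    print(line)

off_diagonal = [xtx[i][j] for i in range(k) for j in range(k) if i != j]
print("all off-diagonal entries zero:", all(v == 0 for v in off_diagonal))
```

Because X'X is diagonal, its inverse is diagonal as well, so equation (18) gives zero covariance between every pair of estimates.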
Response Plots
The list of graphs available in the Analyze Data procedure contains two selections labeled
Response Plots that allow the predicted values of the response to be plotted in various ways. By
default, the first selection creates a response surface plot and the second a contour plot.
However, each is controlled by the same Pane Options dialog box, which allows the analyst to
display any of four types of plots:
1. a surface plot.
2. a contour plot.
3. a square plot.
4. a cube plot.
The surface plot displays a plot of the predicted response as a function of any two of the
experimental factors, with the other factors held at selected values. For example, the plot below
shows reacted as a function of catalyst and temperature:
[Figure: Estimated Response Surface, with feed rate=12.5, agitation=110.0, concentration=4.5. Vertical axis: reacted (54 to 94); horizontal axes: catalyst (1 to 2) and temperature (140 to 180).]
The height of the surface represents the predicted value Ŷ, which is plotted over the range of the
experimental factors.
Contour plots draw lines or colored regions based on values of the predicted response. For
example, the plot below displays the range of the predicted values for reacted using colors
extending from blue at 50 to red at 85:
[Figure: Contours of Estimated Response Surface. Horizontal axis: catalyst (1 to 2); vertical axis: temperature (140 to 180); contour levels of reacted from 50.0 to 85.0 in steps of 5.0.]
The color ramp used is controlled by the Palette tab on the Graphics Options dialog box.
The Square Plot shows the predicted response at combinations of the low and high levels for any
2 factors:

Square Plot for reacted (predicted response at each corner)

                     catalyst = 1.0  catalyst = 2.0
temperature = 180.0         55.7105         86.9605
temperature = 140.0         54.2105         63.9605
The Cube Plot shows the predicted response at combinations of the low and high levels for any 3
factors:
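Cube Plot for reacted (predicted response at each corner; feed rate=12.5, agitation=110.0)

catalyst  temperature  concentration  reacted
     1.0        140.0            3.0  52.5855
     2.0        140.0            3.0  62.3355
     1.0        180.0            3.0  63.5855
     2.0        180.0            3.0  94.8355
     1.0        140.0            6.0  55.8355
     2.0        140.0            6.0  65.5855
     1.0        180.0            6.0  47.8355
     2.0        180.0            6.0  79.0855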
In the example, the highest predicted value for reacted is obtained at catalyst = 2, temperature =
180, and concentration = 3. The strong interactions amongst the factors result in a predicted
value of nearly 95 at that combination of the factors.
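The corner values of the cube plot can be regenerated by evaluating the fitted model at every combination of the low and high levels. This is a minimal sketch that assumes the coefficients shown in the Regression Coefficients pane; the function name predict_reacted is hypothetical.

```python
from itertools import product

def predict_reacted(catalyst, temperature, concentration):
    """Fitted model from the Regression Coefficients pane, in original units."""
    return (9.83553 - 65.5 * catalyst + 0.2125 * temperature
            + 23.25 * concentration + 0.5375 * catalyst * temperature
            - 0.158333 * temperature * concentration)

# Low and high levels of the three factors that remain in the model.
levels = ((1.0, 2.0), (140.0, 180.0), (3.0, 6.0))
corners = {corner: predict_reacted(*corner) for corner in product(*levels)}

for (catalyst, temperature, concentration), value in corners.items():
    print(f"catalyst={catalyst} temperature={temperature} "
          f"concentration={concentration}: {value:.4f}")

best = max(corners, key=corners.get)
print("highest predicted corner:", best)
```

Searching only the corners is sufficient here because the model contains no quadratic terms, so its maximum over the experimental region lies at a corner.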

Pane Options
• Type: type of response plot to create.
• From: location at which the first contour line is drawn, or the start of the first region.
• To: location at which the last contour line is drawn, or the end of the last region.
• By: spacing between contour lines or regions.
• Lines: if selected, a sequence of contour lines is drawn at selected levels of the predicted
response, as on a topographical map.
• Painted Regions: if selected, a set of regions is drawn covering various ranges of the
predicted response.
• Resolution: defines the resolution m of an m-by-m grid of predicted values which is used to
draw the surface and contour lines. Increasing the resolution may improve the smoothness
and definition of the plots, at the expense of computer time and memory.
• Horizontal Divisions: the number of divisions along the first experimental axis. This
determines how many vertical lines will be drawn on the surface plot.
• Vertical Divisions: the number of divisions along the second experimental axis. This
determines how many horizontal lines will be drawn on the surface plot.
• Contours Below: requests that a contour plot, of type specified below, be drawn in the
bottom face of the 3-D plot.
• Show Points: requests that the observed data values Yi be added to the plot, with vertical
lines drawn from each point to the surface.
• Wire Frame: requests that the surface be drawn using cross-hatched lines as shown in the
figure above. This is the most effective choice for black-and-white presentation.
• Solid: requests that the surface be drawn using a solid color.
• Contoured: requests that the surface be drawn showing contour levels of the response.
• Continuous: draws contours using a continuous range of colors.
• Factors: specifies the factors to be plotted on each axis and the levels at which the other
factors will be held:
If creating a surface, contour, or square plot, two factors must be checked. If creating a cube plot,
three factors must be checked.
The current example plots predicted values versus catalyst and temperature, when feed rate =
12.5, agitation = 110, and concentration = 4.5.

Predictions
The Predictions pane may be used to generate predictions from the fitted model:
Estimation Results for reacted

Row  Observed Value  Fitted Value  Residual  Studentized Residual  Lower 95.0% CL for Mean  Upper 95.0% CL for Mean
1 65.0 65.2105 -0.210526 -0.0846423 63.8485 66.5726
2 56.0 55.8355 0.164474 0.0807762 52.248 59.4231
3 53.0 52.5855 0.414474 0.203853 48.998 56.1731
4 63.0 62.3355 0.664474 0.327704 58.748 65.9231
5 65.0 65.5855 -0.585526 -0.28848 61.998 69.1731
6 53.0 52.5855 0.414474 0.203853 48.998 56.1731
7 55.0 55.8355 -0.835526 -0.413139 52.248 59.4231
8 67.0 65.5855 1.41447 0.708878 61.998 69.1731
9 61.0 62.3355 -1.33553 -0.667797 58.748 65.9231
10 67.0 65.2105 1.78947 0.735268 63.8485 66.5726
11 69.0 63.5855 5.41447 4.1464 59.998 67.1731
12 45.0 47.8355 -2.83553 -1.52039 44.248 51.4231
13 78.0 79.0855 -1.08553 -0.539401 75.498 82.6731
14 93.0 94.8355 -1.83553 -0.933357 91.248 98.4231
15 49.0 47.8355 1.16447 0.57969 44.248 51.4231
16 60.0 63.5855 -3.58553 -2.04408 59.998 67.1731
17 95.0 94.8355 0.164474 0.0807762 91.248 98.4231
18 82.0 79.0855 2.91447 1.57129 75.498 82.6731
19 63.0 65.2105 -2.21053 -0.919228 63.8485 66.5726
20 62.6605 60.6918 64.6293
Average of 3 centerpoints = 65.0

Average of model predictions at center = 65.2105
The table may include all rows in the datasheet, or only rows for which the value of the response
variable Y has not been entered. The latter feature allows the analyst to make predictions at
combinations of X that were not included in the experiment. For example, the above table shows
the result of adding a 20th row with feed rate = 12.5, catalyst = 1.2, agitation = 105, temperature
= 165, and concentration = 3.5. The predicted value of reacted is 62.66. The 95% confidence
interval for the mean value of reacted at that same combination of the factors ranges from 60.7 to
64.6.
The table also displays the average of the experimental runs performed at the center of the
experimental region, together with the predicted response. If the assumed model is correct, the
two values should be close. If not, there may be unmodeled curvature with respect to one or
more of the experimental factors. Determining the nature of that curvature would require
performing additional runs at different levels of the factors. To add additional runs to a screening
experiment, you can use the Augment Design selection on the DOE menu.
One other noticeable entry in the above table is the Studentized residual for row #11. The
Studentized residual measures the difference between the observed response and the predicted
response, in units of its standard error, when the observation in question is not used to fit the
model. The Studentized residual for observation #11 equals 4.1. Values in excess of 3.0 are
unusual and would typically require further scrutiny. If the point in question gave a desirable
result (which it does not), a rerun of that set of experimental conditions might be necessary.

Pane Options
• Include: items to include in the table:
1. Observed Y - the observed response values Yi.
2. Fitted Y - the predicted values Ŷi calculated from the fitted model.
3. Residuals - the residuals ei.
4. Studentized Residuals - a type of standardized residual, where each residual is divided by
an estimate of its standard error. STATGRAPHICS computes Studentized deleted
residuals, in which each observation is removed one at a time and the model refit without
that data value. The deleted residual then equals the observed response minus the value
predicted from a model fit without that observation, i.e.,
$$d_i = Y_i - \hat{Y}_{(i)} \tag{19}$$
The Studentized residual is calculated from
$$e_i^* = \frac{d_i}{s(d_i)} \tag{20}$$
where
$$s^2(d_i) = MSE_{(i)}\left(1 + X_i (X'_{(i)} X_{(i)})^{-1} X'_i\right) \tag{21}$$
The Studentized deleted residuals should follow a t distribution with n - p - 1 degrees of
freedom, where p is the number of estimated coefficients in the fitted model.
5. Standard Errors for Forecasts - the standard error for new observations at a selected
combination of the experimental factors Xh, given by
$$\sqrt{MSE\left(1 + X_h (X'X)^{-1} X'_h\right)} \tag{22}$$
6. Confidence Limits for Individual Forecasts - confidence limits for new observations at a
selected combination of the experimental factors Xh, given by
$$\hat{Y}_h \pm t_{n-p}\sqrt{MSE\left(1 + X_h (X'X)^{-1} X'_h\right)} \tag{23}$$
7. Confidence Limits for Forecast Means - confidence limits for the mean response at a
selected combination of the experimental factors Xh, given by
$$\hat{Y}_h \pm t_{n-p}\sqrt{MSE\left(X_h (X'X)^{-1} X'_h\right)} \tag{24}$$
• Predict - whether forecasts are displayed for all of the runs in the experiment data file, or
only for runs that have a missing value in the response column.
• Confidence level - the confidence levels for the intervals.
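The Studentized deleted residuals in the Estimation Results table can be reproduced without refitting the model 19 times. The sketch below uses two standard regression identities: the deleted residual equals e/(1 - h), and the error sum of squares with run i removed equals the full error sum of squares minus e^2/(1 - h), where h is the leverage of the run. The leverage values are an assumption: for an orthogonal design with 16 factorial runs and 3 centerpoints, h is taken as 1/19 at the centerpoints and 1/19 + 5/16 at the factorial points. The function name studentized_deleted is hypothetical.

```python
import math

# Observed and fitted values for rows 1-19 of the Estimation Results table.
observed = [65.0, 56.0, 53.0, 63.0, 65.0, 53.0, 55.0, 67.0, 61.0, 67.0,
            69.0, 45.0, 78.0, 93.0, 49.0, 60.0, 95.0, 82.0, 63.0]
fitted = [65.2105, 55.8355, 52.5855, 62.3355, 65.5855, 52.5855, 55.8355,
          65.5855, 62.3355, 65.2105, 63.5855, 47.8355, 79.0855, 94.8355,
          47.8355, 63.5855, 94.8355, 79.0855, 65.2105]

n, p = len(observed), 6          # 19 runs, 6 coefficients in the reduced model
center_rows = {1, 10, 19}        # rows run at the center of the region
residuals = [y - f for y, f in zip(observed, fitted)]
s_error = sum(e * e for e in residuals)

def studentized_deleted(row):
    """Studentized deleted residual for a 1-based row number."""
    e = residuals[row - 1]
    # Assumed leverage for an orthogonal design with 5 model terms besides the constant.
    h = 1.0 / n if row in center_rows else 1.0 / n + 5.0 / 16.0
    mse_deleted = (s_error - e * e / (1.0 - h)) / (n - p - 1)
    return e / math.sqrt(mse_deleted * (1.0 - h))

for row in (1, 11, 16):
    print(f"row {row}: {studentized_deleted(row):.4f}")
```

Dividing e by the square root of MSE(i)(1 - h) is algebraically the same as dividing the deleted residual of equation (19) by the standard error of equation (21).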
Diagnostic Plots
Several plots are also provided under Diagnostic Plots to examine the residuals from the fitted
model. The Pane Options dialog box displays the various choices, which include the following:
Observed versus Predicted

This plot displays the observed response Yi versus the fitted values Ŷi, together with a diagonal
line:
[Figure: Plot of reacted. Vertical axis: observed (45 to 95); horizontal axis: predicted (45 to 95).]

If the model fits well, the values should lie close to the line, as in the example above. Curvature
around the line may suggest the need to transform the values of Yi using a logarithm or similar
function.
Residual versus Predicted

This plot displays the residuals ei versus the fitted values Ŷi, with a horizontal line at zero:
[Figure: Residual Plot for reacted. Vertical axis: residual (-6 to 6); horizontal axis: predicted (45 to 95).]
The residuals should vary randomly around the line. Changes in the magnitude of the residuals
from left to right may signal that the variance of the experimental error varies with the mean
level of the response. Such heteroscedasticity may frequently be eliminated by a variance-
stabilizing transformation such as a logarithm or a square root.
Residuals versus Run Order

This plot displays the residuals ei versus run number i, with a horizontal line at zero:

[Figure: Residual plot versus run order. Vertical axis: residual (-6 to 6); horizontal axis: run number (0 to 20).]
Any non-random pattern may indicate a time trend or other effect. In such cases, addition of a
factor to account for the change may improve the fit of the model. The above plot does suggest
an increase in variability during the second half of the experiment, which would be worthy of
further investigation.

Residuals versus Factor
This plot displays the residuals ei versus the observed values of a selected experimental factor:

[Figure: Residual plot versus factor. Vertical axis: residual (-6 to 6); horizontal axis: catalyst (1 to 2).]
Any curvature around the line may suggest the need for a model with quadratic effects. The
above plot suggests that the variability amongst the replicated values at the centerpoint may be
somewhat less than that of the residuals at the low and high levels of catalyst.
Normal Probability Plot of Residuals

This plot displays the residuals ei versus quantiles of a normal distribution, with an optional
fitted line as reference:
[Figure: Normal Probability Plot for Residuals. Vertical axis: percentage (0.1 to 99.9); horizontal axis: residuals (-3.6 to 6.4).]
If the experimental error follows a normal distribution, the points should lie along a straight line.
The above plot suggests that the largest residual (row #11) is somewhat higher than expected,
since it lies off the line defined by the others. This could indicate the presence of an outlier or
significant curvature.

Power Curve
The power curve shows the ability of the statistical tests to detect effects of a given magnitude:
[Figure: Power Curve for B:catalyst. Vertical axis: probability of detection (0 to 1); horizontal axis: true effect (-8 to 8).]
This plot is similar to the power curve explained in detail earlier when the design was being
constructed, except that the horizontal axis is displayed in units of the response instead of a
signal-to-noise ratio. The above plot shows that the current experiment has an excellent chance
of detecting any effects for which the change in reacted is 5 or more.
Pane Options
• Plot: the type of plot to be created.
• Plot versus: selects the experimental factor to be shown in the plot, for those plots where a
factor is needed.
• Direction: defines the orientation of the normal probability plot.
• Fitted Line: specifies whether a line should be fit to the data on the normal probability plot.
• Alpha: specifies the α-risk associated with the Power Curve.

Optimization
Step #9: Optimize responses
Once a statistical model has been developed for each response, the analyst may now determine
what combination of factors will yield the best results. Pressing the button labeled Step #9 on the
Experimental Design Wizard toolbar first displays the dialog box shown below:
Since optimization requires searching for the best conditions throughout the experimental region,
it is a good idea to begin that search at many different points in order to avoid finding only a
local optimum.
When the optimization is complete, a message similar to that shown below will be displayed:
The dialog box indicates the “Desirability” of the final result, based on a metric designed to
balance competing requirements of multiple responses (see the document titled DOE Wizard for
full details). The value displayed in this case indicates that the predicted reactivity at the
optimum factor settings is 74.18% of the distance between 80 and 100, which was the desired
range specified when the design was created.
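The calculation described here amounts to a linear rescaling of the predicted response. The sketch below assumes a linear desirability function for a response to be maximized, clipped to the range 0 to 1; the function name desirability_maximize is hypothetical, and the general metric described in the DOE Wizard document may include weights or shape parameters not shown here.

```python
def desirability_maximize(prediction, low, high):
    """Fraction of the distance from low to high, limited to the range [0, 1]."""
    fraction = (prediction - low) / (high - low)
    return min(1.0, max(0.0, fraction))

# Predicted response at the optimum and the desired range of 80 to 100.
at_optimum = desirability_maximize(94.8355, 80.0, 100.0)
print(f"desirability at optimum: {at_optimum:.6f}")

# Predictions at or beyond the upper end of the range are fully desirable.
print(f"desirability at 100.284: {desirability_maximize(100.284, 80.0, 100.0):.1f}")
```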
If you press OK, additional information will be added to the main DOE Wizard window:

Step 9: Optimize the responses

Response Values at Optimum
Response Prediction Lower 95.0% Limit Upper 95.0% Limit Desirability
reacted 94.8355 91.6295 98.0415 0.741776
Factor Settings at Optimum

Factor Setting
feed rate 10.0
catalyst 2.0
agitation 119.9
temperature 180.0
concentration 3.0
The table shows the estimated response at the optimal settings of the experimental factors. For
the chemical reaction data, it is estimated that the mean percent reacted will equal 94.84% when
the factors are set at feed rate = 10, catalyst = 2, agitation = 119.9, temperature = 180, and
concentration = 3. The 95% confidence interval for the mean ranges between 91.63% and
98.04%.
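The confidence interval for the mean follows from equation (24). The sketch below rests on several assumptions: the error sum of squares of the reduced model is obtained by pooling the removed effects with the lack-of-fit and pure error lines of the ANOVA table, the leverage at a corner of the design is 1/19 + 5/16 (orthogonal design with 16 factorial runs and 3 centerpoints), and 2.160369 is the 97.5th percentile of Student's t with 13 degrees of freedom taken from a standard table.

```python
import math

n, p = 19, 6                     # runs and coefficients in the reduced model

# Sums of squares of the 10 removed effects, plus lack-of-fit and pure error.
removed = [16.0, 0.0, 9.0, 1.0, 2.25, 6.25, 9.0, 6.25, 0.25, 20.25]
s_error = sum(removed) + 0.157895 + 8.0
mse = s_error / (n - p)          # 13 degrees of freedom for error

leverage = 1.0 / n + 5.0 / 16.0  # assumed value of X_h (X'X)^-1 X_h' at a design corner
t_critical = 2.160369            # t quantile for 95% confidence with 13 d.f. (from tables)

prediction = 94.8355             # predicted response at the optimum
half_width = t_critical * math.sqrt(mse * leverage)   # equation (24)
lower, upper = prediction - half_width, prediction + half_width

print(f"MSE = {mse:.4f}")
print(f"95% confidence interval for the mean: {lower:.4f} to {upper:.4f}")
```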
NOTE: Since feed rate and agitation have been completely eliminated from the statistical model,
the solution displayed is only one of many. Any setting of feed rate and agitation would give the
same predicted response.
If you push the Tables and Graphs button on the analysis toolbar, you can display the estimated
desirability throughout the experimental region. An interesting type of display is the 3-D Contour
plot shown below (use Pane Options and the Factors button to select the factors to plot on each
axis):
[Figure: Desirability Plot (3-D contour). Axes: catalyst (1 to 2), temperature (140 to 180), concentration (3 to 6); desirability levels from 0.0 to 1.0 in steps of 0.1.]
It is clear that the best place to operate is in the lower right back corner.

Step 10: Save results
The button labeled Step 10 allows you to save the results in a StatFolio:
Actually, the StatFolio can be saved at any point and reloaded at a later date.
IMPORTANT: When using the Experimental Design Wizard, two files are created:
1. An experiment file with the extension .sgd which stores information about the
experimental data.
2. A StatFolio with the extension .sgp that stores the results of the analysis.
If you move the experiment to another computer, be sure to transfer both files.
Step 11: Augment Design
Since the conclusions from the design are fairly clear, there is no need to augment the design.

Extrapolation
Step 12: Extrapolate
The maximum predicted reactivity within the design space is 94.84%. To use the statistical
model to predict settings of the factors outside the experimental region that might produce even
better results, press the button labeled Step 12. The following dialog box will be displayed:
• Start at: the position from which to start the search.
• Change: the factors you wish to consider changing. Since feed rate and agitation have been
completely eliminated from the model, they have been unchecked.
• Display steps of: The program will begin at the starting location and follow the path of
steepest ascent in an attempt to increase the desirability of the predicted response. Specify the
increment of increased desirability at which the results should be displayed.
• Low and high: The limits within which the factors will be changed.
In this case, we have asked the program to search from the derived optimal conditions and display
improvement of 1% in desirability.
The results of the search are shown in the following table, which will be added to the main DOE
Wizard window:

Step 12: Extrapolate model

Extrapolated Response Values
Step  Desirability  reacted
0     0.741776      94.8355
1     0.759359      95.1872
2     0.77701       95.5402
3     0.79473       95.8946
4     0.812519      96.2504
5     0.830377      96.6075
6     0.848303      96.9661
7     0.866299      97.326
8     0.884363      97.6873
9     0.902497      98.0499
10    0.920699      98.414
11    0.93897       98.7794
12    0.957311      99.1462
13    0.97572       99.5144
14    0.994198      99.884
15    1.0           100.284

Factor Settings for Extrapolation
Step  feed rate  catalyst  agitation  temperature  concentration
0     10.0       2.0       119.9      180.0        3.0
1     10.0       2.00481   119.9      180.2        2.99273
2     10.0       2.00961   119.9      180.4        2.98545
3     10.0       2.01441   119.9      180.6        2.97816
4     10.0       2.0192    119.9      180.8        2.97086
5     10.0       2.02398   119.9      181.0        2.96355
6     10.0       2.02876   119.9      181.2        2.95623
7     10.0       2.03354   119.9      181.4        2.9489
8     10.0       2.03831   119.9      181.6        2.94156
9     10.0       2.04307   119.9      181.8        2.93421
10    10.0       2.04783   119.9      182.0        2.92685
11    10.0       2.05259   119.9      182.2        2.91948
12    10.0       2.05734   119.9      182.4        2.9121
13    10.0       2.06208   119.9      182.6        2.90471
14    10.0       2.06682   119.9      182.8        2.89732
15    10.0       2.07182   119.9      183.0        2.88628
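The program suggests that the best course of action would be to increase catalyst and
temperature while decreasing concentration. Over the 15 steps shown, catalyst rises from 2.0 to
about 2.07 and temperature from 180.0 to 183.0, while concentration falls from 3.0 to about 2.89.
Feed rate and agitation remain at 10.0 and 119.9, since they were not selected for change. The
predicted value at step 15 (100.284) exceeds 100 percent, which a percentage reacted cannot do,
so predictions outside the experimental region should be confirmed by additional runs.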
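The Desirability column in the extrapolation table appears consistent with a linear desirability function for a maximize goal, d = (y - low) / (high - low), clamped to the interval from 0 to 1. The sketch below checks this against a few rows of the table. The limits low = 80 and high = 100 are an assumption inferred from the tabulated values, not quoted from the response definition, and the function name is hypothetical.

```python
def desirability_maximize(y, low=80.0, high=100.0):
    """Linear desirability for a 'maximize' goal, clamped to [0, 1].

    low and high are assumptions inferred from the extrapolation table.
    """
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


# (reacted, tabulated desirability) pairs copied from steps 0, 7, 14 and 15.
rows = [(94.8355, 0.741776), (97.326, 0.866299),
        (99.884, 0.994198), (100.284, 1.0)]

for reacted, tabulated in rows:
    computed = desirability_maximize(reacted)
    print(reacted, round(computed, 6), tabulated)
```

Under these assumed limits, the computed values agree with the tabulated ones to within rounding of the displayed response, and the final step is capped at 1.0 because its predicted response exceeds the upper limit.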
