
DECISION ANALYSIS USING MICROSOFT EXCEL

SPRING 2006

Michael R. Middleton
School of Business and Management
University of San Francisco

Copyright © 2006 by Michael R. Middleton


Detailed Contents
PART 1 MODELS AND SENSITIVITY ANALYSIS ...................... 11
Chapter 1 Introduction to Decision Modeling ............................................................. 13
1.1 Models to Aid Decision Making ............................................................................ 13
Components of a Decision Model ............................................................................ 14
1.2 Basic What-If Model.............................................................................................. 16
Influence Diagram Representation ........................................................................... 16
Decision Tree Representation .................................................................................. 18
Consequence Table Representation.......................................................................... 18

Chapter 2 Sensitivity Analysis Using SensIt ................................................................ 19


2.1 How to Install SensIt .............................................................................................. 19
2.2 How to Uninstall or Delete SensIt.......................................................................... 20
2.3 SensIt Overview ..................................................................................................... 20
2.4 Example Problem ................................................................................................... 20
2.5 One Input, One Output ........................................................................................... 21
Cells for Input Variable............................................................................................ 22
Cells for Output Variable ......................................................................................... 22
Input Values ............................................................................................................. 22
2.6 Many Inputs, One Output Tornado .................................................... 23
Ranges for Input Variables....................................................................................... 24
Cells for Output Variable ......................................................................................... 25
Ranges for Input Values........................................................................................... 25
2.7 Tornado Sorted by Downside Risk ........................................................................ 26
2.8 Tornado Sorted by Upside Potential ...................................................................... 26
2.9 Tornado Showing Major Uncertainties .................................................................. 27
2.10 Spider ................................................................................................................... 28
2.11 Tips for Many Inputs, One Output ....................................................................... 29
2.12 Eagle Airlines Problem ........................................................................................ 31

Chapter 3 Multiattribute Utility ................................................................................... 33


3.1 Applications of Multi-Attribute Utility .................................................................. 33

3.2 Multiattribute Utility Swing Weights.................................................... 34


Attribute Scores........................................................................................................ 35
Swing Weights ......................................................................................................... 36
Overall Scores .......................................................................................................... 37
3.3 Sensitivity Analysis Methods................................................................................. 38
Dominance ............................................................................................................... 39
Monetary Equivalents Assessment........................................................................... 39
Additive Utility Function ......................................................................................... 40
Weight Ratio Assessment......................................................................................... 41
Weight Ratio Sensitivity Analysis ........................................................................... 43
Swing Weight Assessment ....................................................................................... 44
Swing Weight Sensitivity Analysis .......................................................................... 46
Direct Weight Assessment and Sensitivity Analysis................................................ 49
Summary .................................................................................................................. 51
Sensitivity Analysis Examples References .............................................................. 51
Screenshots from Excel to Word.............................................................................. 52

PART 2 MONTE CARLO SIMULATION....................................... 53


Chapter 4 Introduction to Monte Carlo Simulation ................................................... 55
4.1 Introduction ............................................................................................................ 55

Chapter 5 Uncertain Quantities.................................................................................... 57


5.1 Discrete Uncertain Quantities ................................................................................ 57
5.2 Continuous Uncertain Quantities ........................................................................... 57
Case A: Uniform Density ......................................................................................... 57
Case B: Ramp Density ............................................................................................. 60
Case C: Triangular Density ...................................................................................... 62

Chapter 6 Simulation Without Add-Ins....................................................................... 65


6.1 Simulation Using Excel Functions ......................................................................... 65

Chapter 7 Monte Carlo Simulation Using RiskSim .................................................... 67


7.1 Using RiskSim Functions....................................................................................... 67
7.2 Using RiskSim Functions....................................................................................... 68
7.3 Updating Links To RiskSim Functions .................................................................. 68
7.4 Monte Carlo Simulation ......................................................................................... 70
7.5 Random Number Seed ........................................................................................... 71
7.6 One-Output Example.............................................................................................. 72
7.7 RiskSim Output for One-Output Example ............................................................. 73
7.8 Customizing RiskSim Charts ................................................................................. 75
7.9 Random Number Generator Functions................................................................... 77
RandBinomial .......................................................................................................... 77

RandBiVarNormal ................................................................................................... 78
RandCumulative....................................................................................................... 79
RandDiscrete ............................................................................................................ 80
RandExponential ...................................................................................................... 82
RandInteger .............................................................................................................. 83
RandNormal ............................................................................................................. 84
RandSample ............................................................................................................. 85
RandPoisson............................................................................................................. 85
RandTriangular ........................................................................................................ 86
RandUniform............................................................................................................ 87
7.10 RiskSim Technical Details ................................................................................... 88
7.11 Modeling Uncertain Relationships ....................................................................... 90
Base Model, Four Inputs .......................................................................................... 90
Three Inputs ............................................................................................................. 91
Two Inputs ............................................................................................................... 92
Four Inputs with Three Uncertainties....................................................................... 93
Intermediate Details ................................................................................................. 95

Chapter 8 Multiperiod What-If Modeling ................................................................... 97


8.1 Apartment Building Purchase Problem .................................................................. 97
Apartment Building Analysis Notes....................................................................... 100
8.2 Product Launch Financial Model ......................................................................... 101
8.3 Machine Simulation Model .................................................................................. 105
AJS Process 1......................................................................................................... 105
AJS Process 2......................................................................................................... 106

Chapter 9 Modeling Inventory Decisions................................................................... 113


9.1 Newsvendor Problem ........................................................................................... 113
Stationery Wholesaler Example ............................................................................. 113

Chapter 10 Modeling Waiting Lines .......................................................................... 115


10.1 Queue Simulation............................................................................................... 115

PART 3 DECISION TREES ........................................................ 121


Chapter 11 Introduction to Decision Trees................................................................ 123
11.1 Decision Tree Structure...................................................................................... 123
DriveTek Problem, Part A...................................................................................... 123
Nodes and Branches ............................................................................................... 124
11.2 Decision Tree Terminal Values.......................................................................... 126
DriveTek Problem, Part B...................................................................................... 126
11.3 Decision Tree Probabilities ................................................................................ 128
DriveTek Problem, Part C...................................................................................... 128

Chapter 12 Decision Trees Using TreePlan ............................................................... 129


12.1 TreePlan Installation .......................................................................................... 129
Occasional Use....................................................................................................... 129
Selective Use.......................................................................................................... 129
Steady Use.............................................................................................................. 130
12.2 Building a Decision Tree in TreePlan ................................................................ 130
12.3 Anatomy of a TreePlan Decision Tree ............................................................... 132
12.4 Step-by-Step TreePlan Tutorial.......................................................................... 134
DriveTek Problem .................................................................................................. 134
Nodes and Branches ............................................................................................... 135
Terminal Values ..................................................................................................... 136
Building the Tree Diagram..................................................................................... 137
Interpreting the Results .......................................................................................... 145
Formatting the Tree Diagram ................................................................................. 146
Displaying Model Inputs........................................................................................ 148
Printing the Tree Diagram...................................................................................... 150
Alternative Model .................................................................................................. 151
12.5 Decision Tree Solution....................................................................................... 151
Strategy .................................................................................................................. 151
Payoff Distribution................................................................................................. 152
DriveTek Strategies................................................................................................ 152
Strategy Choice ...................................................................................................... 156
Certainty Equivalent............................................................................................... 157
Rollback Method.................................................................................................... 159
Optimal Strategy .................................................................................................... 160
12.6 Newox Decision Tree Problem .......................................................................... 162
12.7 Brandon Decision Tree Problem ........................................................................ 163
Decision Tree Strategies......................................................................................... 163

Chapter 13 Sensitivity Analysis for Decision Trees................................................... 171


13.1 One-Variable Sensitivity Analysis ..................................................................... 171
13.2 Two-Variable Sensitivity Analysis..................................................................... 173
Setup for Data Table .............................................................................................. 174
Obtaining Results Using Data Table Command..................................................... 174
Embellishments ...................................................................................................... 175
13.3 Multiple-Outcome Sensitivity Analysis ............................................................. 176
13.4 Robin Pinelli's Sensitivity Analysis ................................................................... 177

Chapter 14 Value of Information in Decision Trees ................................................. 181


14.1 Value of Information.......................................................................................... 181
14.2 Expected Value of Perfect Information.............................................................. 181
Expected Value of Perfect Information, Reordered Tree ....................................... 182
Expected Value of Perfect Information, Payoff Table ........................................... 185
Expected Value of Perfect Information, Expected Improvement........................... 186

Expected Value of Perfect Information, Single-Season Product............................ 187


14.3 DriveTek Post-Contract-Award Problem ........................................................... 190
14.4 Sensitivity Analysis vs EVPI ............................................................................. 194

Chapter 15 Value of Imperfect Information.............................................................. 195


15.1 Technometrics Problem...................................................................................... 195
Prior Problem ......................................................................................................... 195
Imperfect Information ............................................................................................ 196
Probabilities From Relative Frequencies................................................................ 196
Revision of Probability........................................................................................... 200

Chapter 16 Modeling Attitude Toward Risk ............................................................. 201


16.1 Risk Utility Function.......................................................................................... 201
16.2 Exponential Risk Utility..................................................................................... 204
16.3 Approximate Risk Tolerance.............................................................................. 207
16.4 Exact Risk Tolerance Using Excel..................................................................... 207
16.5 Exact Risk Tolerance Using RiskTol.xla ........................................................... 211
16.6 Exponential Utility and TreePlan ....................................................................... 212
16.7 Exponential Utility and RiskSim........................................................................ 212
16.8 Risk Sensitivity for Machine Problem ............................................................... 214
16.9 Risk Utility Summary......................................................................................... 215
Concepts................................................................................................................. 215
Fundamental Property of Utility Function ............................................................. 216
Using a Utility Function To Find the CE of a Lottery............................................ 216
Exponential Utility Function .................................................................................. 216
TreePlan's Simple Form of Exponential Utility ..................................................... 216
Approximate Assessment of Risk Tolerance .......................................... 216
Exact Assessment of Risk Tolerance ...................................................... 217
Using Exponential Utility for TreePlan Rollback Values ...................................... 217
Using Exponential Utility for a Payoff Distribution .............................................. 218

PART 4 DATA ANALYSIS ......................................................... 219


Chapter 17 Introduction to Data Analysis ................................................................. 221
17.1 Levels of Measurement ...................................................................................... 221
Categorical Measure............................................................................................... 221
Numerical Measure ................................................................................................ 221
17.2 Describing Categorical Data .............................................................................. 222
17.3 Describing Numerical Data ................................................................................ 222
Frequency Distribution and Histogram .................................................................. 222
Numerical Summary Measures .............................................................................. 222
Distribution Shapes ................................................................................................ 223

Chapter 18 Univariate Numerical Data ..................................................................... 225


18.1 Analysis Tool: Descriptive Statistics.................................................................. 225
Formatting the Output Table .................................................................................. 228
Interpreting Descriptive Statistics .......................................................................... 229
Another Measure of Skewness ............................................................................... 231
18.2 Analysis Tool: Histogram .................................................................................. 233
Histogram Embellishments .................................................................................... 235
18.3 Better Histograms Using Excel .......................................................................... 237
Exercises .................................................................................................................... 238

Chapter 19 Bivariate Numerical Data........................................................................ 239


19.1 XY (Scatter) Charts............................................................................................ 240
19.2 Analysis Tool: Correlation ................................................................................. 242
19.3 Analysis Tool: Covariance ................................................................................. 244
19.4 Correlations for Several Variables ..................................................................... 245
Exercises .................................................................................................................... 247

Chapter 20 One-Sample Inference for the Mean ...................................................... 249


20.1 Normal versus t Distribution .............................................................................. 249
20.2 Hypothesis Tests ................................................................................................ 249
Left-Tail, Right-Tail, or Two-Tail ......................................................................... 250
Decision Approach or Reporting Approach ........................................................... 250

Chapter 21 Simple Linear Regression........................................................................ 253


21.1 Inserting a Linear Trendline ............................................................................... 254
Trendline Interpretation.......................................................................................... 256
Trendline Embellishments...................................................................................... 257
21.2 Regression Analysis Tool................................................................................... 257
Regression Interpretation ....................................................................................... 261
Regression Charts................................................................................................... 262
21.3 Regression Functions ......................................................................................... 264
Exercises .................................................................................................................... 267

Chapter 22 Simple Nonlinear Regression .................................................................. 269


22.1 Polynomial ......................................................................................................... 271
22.2 Logarithmic ........................................................................................................ 273
22.3 Power ................................................................................................................. 275
22.4 Exponential ........................................................................................................ 277
Exercises .................................................................................................................... 282

Chapter 23 Multiple Regression ................................................................................. 283


23.1 Interpretation of Regression Output ................................................................... 285
Significance of Coefficients ................................................................................... 285
Interpretation of the Regression Statistics.............................................................. 286

Interpretation of the Analysis of Variance ............................................................. 286


23.2 Analysis of Residuals ......................................................................................... 286
23.3 Using TREND to Make Predictions ................................................................... 288
Interpretation of the Predictions ............................................................................. 289
Exercises .................................................................................................................... 290

Chapter 24 Regression Using Categorical Variables ................................................ 293


24.1 Categories as Explanatory Variables.................................................................. 293
24.2 Interpretation of Regression Using Indicators.................................................... 296
24.3 Interpretation of Multiple Regression ................................................................ 297
24.4 Categories as the Dependent Variable................................................................ 298
Interpretation of the Classifications ....................................................................... 301
Exercises .................................................................................................................... 302

Chapter 25 Regression Models for Cross-Sectional Data......................................... 305


25.1 Cross-Sectional Regression Checklist................................................................ 305
Plot Y versus each X .............................................................................................. 305
Examine the correlation matrix .............................................................................. 305
Calculate the regression model with diagnostics.................................................... 305
Use the model......................................................................................................... 306

Chapter 26 Time Series Data and Forecasts.............................................................. 307


26.1 Time Series Patterns........................................................................................... 307

Chapter 27 Autocorrelation and Autoregression ...................................................... 311


27.1 Linear Time Trend ............................................................................................. 312
27.2 Durbin-Watson Statistic ..................................................................................... 313
27.3 Autocorrelation .................................................................................................. 314
27.4 Autoregression ................................................................................................... 316
27.5 Autocorrelation Coefficients Function ............................................................... 320
27.6 AR(2) Model ...................................................................................................... 322
Exercises .................................................................................................................... 324

Chapter 28 Time Series Smoothing ............................................................................ 325


28.1 Moving Average Using Add Trendline.............................................................. 327
28.2 Moving Average Data Analysis Tool................................................................. 329
28.3 Exponential Smoothing Tool.............................................................................. 330
Exercises .................................................................................................................... 333

Chapter 29 Time Series Seasonality ........................................................................... 335


29.1 Regression Using Indicator Variables ................................................................ 336
29.2 AR(4) Model ...................................................................................................... 342
29.3 Classical Time Series Decomposition ................................................................ 347
Exercises .................................................................................................................... 354

Chapter 30 Regression Models for Time Series Data ............................................... 357


30.1 Time Series Regression Checklist...................................................................... 357
Plot Y versus time .................................................................................................. 357
Plot Y versus each X .............................................................................................. 357
Examine the correlation matrix .............................................................................. 357
Calculate the regression model with diagnostics.................................................... 358
Use the model......................................................................................................... 358
30.2 Autocorrelation of Residuals.............................................................................. 359

PART 5 CONSTRAINED OPTIMIZATION.................................. 361


Chapter 31 Product Mix Optimization ...................................................................... 363
31.1 Linear Programming Concepts........................................................................... 363
Formulation ............................................................................................................ 363
Graphical Solution.................................................................................................. 363
Sensitivity Analysis................................................................................................ 363
31.2 Basic Product Mix Problem ............................................................................... 365
31.3 Outdoors Problem .............................................................................................. 370
Spreadsheet Model ................................................................................................. 372
Solver Reports........................................................................................................ 373

Chapter 32 Modeling Marketing Decisions ............................................................... 375


32.1 Allocating Advertising Expenditures ................................................................. 375

Chapter 33 Nonlinear Product Mix Optimization .................................................... 381


33.1 Diminishing Profit Margin ................................................................................. 381

Chapter 34 Integer-Valued Optimization Models..................................................... 383


34.1 Transportation Problem...................................................................................... 383
34.2 Modified Transportation Problem ...................................................................... 384
34.3 Scheduling Problem ........................................................................................... 386

Chapter 35 Optimization Models for Finance Decisions .......................................... 389


35.1 Working Capital Management Problem............................................................. 389
35.2 Work Cap Alternate Formulations ..................................................................... 391
35.3 Stock Portfolio Problem ..................................................................................... 393
35.4 MoneyCo Problem ............................................................................................. 395

Appendix Excel for the Macintosh.............................................................................. 397


The Shortcut Menu................................................................................................. 397
Relative and Absolute References.......................................................................... 397

References ..................................................................................................................... 399


Part 1 Models and Sensitivity Analysis

Chapter 1 introduces the terminology for decision models that is used throughout the
book. Several ways to describe a decision problem are discussed, including spreadsheet
models, influence charts, decision trees, and consequence tables.
Chapter 2 contains the documentation and examples for the SensIt sensitivity analysis
add-in for Excel.
Chapter 3 discusses multi-attribute utility, which is a useful model for decision problems
with conflicting objectives. The discussion includes extensive sensitivity analysis for
multi-attribute utility using standard Excel features.


Chapter 1 Introduction to Decision Modeling

1.1 MODELS TO AID DECISION MAKING
Decision: an irrevocable allocation of resources
Model: an abstract representation of reality

What makes a decision difficult?
  Complexity: many factors to consider; relationships among the factors
  Uncertainty
  Conflicting Objectives

How does modeling help?
  Complexity: build a model; consider each factor separately; consider relationships explicitly; avoid being overwhelmed
  Uncertainty: sensitivity analysis and probability
  Conflicting Objectives: consider each objective; consider tradeoffs explicitly

Goals of modeling: recommended solution, insight, clarity of action

Figure 1.1 Overall Model-Building Flowchart

[Flowchart: on the real-world side, a difficult problem; abstraction leads to a math model; operations on the model produce model results, which are implemented back in the real world.]

Components of a Decision Model


Controllable input variables: "What you can do"; decision variables, alternatives
Uncontrollable input variables: "What you know and don't know"; uncertainties, constraints
Relationships: how inputs are related to output, usually with intermediate variables; structure
Intermediate variables: useful for linking inputs to output
Output variable: "What you want"; performance measure, overall satisfaction

Influence chart
Rectangle for controllable inputs
Rounded rectangle or oval for other variables

Figure 1.2 Generic Influence Chart

[Generic chart: controllable input factors and uncontrollable input factors feed intermediate variables, which feed the performance measure (output).]

1.2 BASIC WHAT-IF MODEL


Influence Diagram Representation

Figure 1.3 Typical Influence Diagram

[Influence diagram: the inputs Unit Price, Units Sold, Unit Variable Cost, and Fixed Costs feed the intermediate variables Sales Revenue, Total Variable Cost, and Total Costs, which determine the output Net Cash Flow.]

Figure 1.4 Typical Spreadsheet Model



Figure 1.5 Formulas for Typical Spreadsheet Model

Figure 1.6 Defined Names for Typical Spreadsheet Model



Decision Tree Representation

Figure 1.7 Decision Fan and Event Fan

[A decision node with many possible alternatives is drawn as a decision fan; an event node with many possible outcomes is drawn as an event fan.]

Figure 1.8 Conceptual Decision Tree

[Conceptual decision tree: branches for Unit Price, Fixed Costs, Units Sold, and Unit Variable Cost lead to Net Cash Flow ($) endpoint values.]

Consequence Table Representation

Figure 1.9 Professor's Summer Decision

                               Conflicting Objectives
Alternatives       Cash Flow   Hassle-Free   Happy Deans   Professional Fame
Develop Software   $2700       Yes           Maybe         Maybe
Teach MBAs         $4300       No            Yes           No
Vacation           $0          Yes           No            No

Chapter 2 Sensitivity Analysis Using SensIt

SensIt is a sensitivity analysis add-in for Microsoft Excel (Excel 97 and later versions)
for Windows and Macintosh. The original version was written by Mike Middleton of the
University of San Francisco and Jim Smith of Duke University, and the current version
was rewritten in VBA by Mike Middleton.

2.1 HOW TO INSTALL SENSIT


There are several ways to install SensIt:
(1) Start Excel, and use Excel’s File | Open command to open the SensIt xla file from
floppy or hard drive.
(2) Copy the SensIt xla file to the Program Files | Microsoft Office | Office | Library
folder of your hard drive, in which case SensIt will automatically appear in Excel's Add-
In Manager. Start Excel, and use Excel’s Tools | Add-Ins command to load and unload
SensIt as needed by checking or unchecking the SensIt Sensitivity Analysis checkbox.
(3) Copy the SensIt xla file to your choice of a folder on the hard drive. Start Excel,
choose Tools | Add-Ins | Browse, navigate to the location of the SensIt xla file, select it,
and click OK. Subsequently, use Excel’s Tools | Add-Ins command to load and unload
SensIt as needed by checking or unchecking the SensIt Sensitivity Analysis checkbox.
(4) Copy the SensIt xla file to the Program Files | Microsoft Office | Office | XLStart
folder of your hard drive, in which case the file will be opened every time you start
Excel.
All of SensIt’s functionality, including its built-in help, is a part of the SensIt xla file.
There is no separate setup file or help file. When you use SensIt, it does not create any
Windows Registry entries (although Excel may use such entries to keep track of its add-
ins). SensIt does create a temporary worksheet for intermediate calculations, but after the
calculations are successfully completed, SensIt deletes the temporary worksheet.

2.2 HOW TO UNINSTALL OR DELETE SENSIT


(A) First, use your file manager to locate the SensIt xla file, and delete the file from your
hard drive.
(B1) If SensIt is listed under Excel's add-in manager and the box is checked, when you
start Excel you will see "Cannot find ..." Click OK. Choose Tools | Add-Ins, uncheck the
box for SensIt; you will see "Cannot find ... Delete from list?" Click Yes.
(B2) If SensIt is listed under Excel's add-in manager and the box is not checked, start
Excel and choose Tools | Add-Ins. Check the box for SensIt; you will see "Cannot find ...
Delete from list?" Click Yes.

2.3 SENSIT OVERVIEW


To run SensIt, start Excel and open the SensIt xla file. Alternatively, install SensIt using
one of the methods described above. SensIt adds a Sensitivity Analysis command to the
Tools menu. The Sensitivity Analysis command has three subcommands: One Input, One
Output; Many Inputs, One Output; and Help.
Before using the SensIt options, you must have a spreadsheet model with one or more
inputs and an output. SensIt's features make it easy for you to see how sensitive the
output is to changes in the inputs.
Use SensIt’s One Input, One Output option to see how your model’s output depends on
changes in a single input variable. This feature creates an XY (Scatter) chart type.
Use SensIt’s Many Inputs, One Output option to see how your model’s output depends
on ranges you specify for each of the model’s input variables. This feature creates a
tornado chart (a horizontal Bar chart type) and a spider chart (an XY (Scatter) chart type).

2.4 EXAMPLE PROBLEM


Eagle Airlines is deciding whether to purchase a five-seat aircraft where some proportion
of the hours flown would be charter flights and some hours would be regularly scheduled
ticketed flights with an uncertain number of seats sold (capacity). A spreadsheet model
that does not include financing costs is shown below.

Figure 2.1 Model Display


A B C
1 Spreadsheet Model For Eagle Airlines
2
3 Input Variables Input Cells
4 Charter Price/Hour $325
5 Ticket Price/Hour $100
6 Hours Flown 800
7 Capacity of Scheduled Flights 50%
8 Proportion of Chartered Flights 0.5
9 Operating Cost/Hour $245
10 Insurance $20,000
11
12 Intermediate Calculations
13 Total Revenue $230,000
14 Total Cost $216,000
15
16 Performance Measure
17 Annual Profit $14,000
18
19 Adapted from Bob Clemen's textbook,
20 Making Hard Decisions, 2nd ed., Duxbury (1996).

Figure 2.2 Model Formulas


A B
11
12 Intermediate Calculations
13 Total Revenue =(B8*B6*B4)+((1-B8)*B6*B5*B7*5)
14 Total Cost =(B6*B9)+B10
15
16 Performance Measure
17 Annual Profit =B13-B14
18
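
For readers who want to check the spreadsheet logic outside Excel, here is a minimal Python sketch of the same model; the function and argument names are ours, not part of SensIt, but the arithmetic follows the Figure 2.2 formulas.

    def eagle_profit(charter_price=325, ticket_price=100, hours=800, capacity=0.50,
                     prop_charter=0.5, op_cost=245, insurance=20_000):
        # Total Revenue: charter hours plus scheduled hours (five seats at the given capacity)
        revenue = (prop_charter * hours * charter_price
                   + (1 - prop_charter) * hours * ticket_price * capacity * 5)
        # Total Cost: hourly operating cost plus insurance
        cost = hours * op_cost + insurance
        return revenue - cost          # Annual Profit

    print(eagle_profit())              # 14000.0, matching Figure 2.1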

2.5 ONE INPUT, ONE OUTPUT


Use SensIt’s One Input, One Output option to see how your model’s output depends on
changes in a single input variable.

Figure 2.3 SensIt One Input, One Output Dialog Box

Cells for Input Variable


In the Label reference edit box, type a cell reference, or point to the cell containing a text
label and click. In the Value reference edit box, type a cell reference, or point to the cell
containing a numeric value that is an input cell of your model.

Cells for Output Variable


In the Label reference edit box, type a cell reference, or point to the cell containing a text
label and click. In the Value reference edit box, type a cell reference, or point to the cell
containing a formula that is the output of your model.

Input Values
Type numbers in the Start, Step, and Stop edit boxes to specify values to be used in the
input variable’s cell. Cell references are not allowed.
Click OK: SensIt uses the Start, Step, and Stop values to prepare a table of values. Each
value is copied to the input variable Value cell, the worksheet is recalculated, and the
value of the output variable Value cell is copied to the table. (You could do this manually
in Excel using the Edit | Fill | Series and Data | Table commands.) SensIt uses the paired
input and output values to prepare an XY (Scatter) chart. The text in the label cells you
identified is used as the chart's axis labels. (You could do this manually using the
ChartWizard.)
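
The same one-way table can be reproduced in a few lines of code; a minimal Python sketch (the profit function below simply restates the Figure 2.2 formulas) sweeps Hours Flown over the Start, Step, and Stop values used in this example.

    def profit(hours, charter_price=325, ticket_price=100, capacity=0.50,
               prop_charter=0.5, op_cost=245, insurance=20_000):
        revenue = (prop_charter * hours * charter_price
                   + (1 - prop_charter) * hours * ticket_price * capacity * 5)
        return revenue - (hours * op_cost + insurance)

    for hours in range(400, 1001, 50):     # Start = 400, Step = 50, Stop = 1000
        print(hours, profit(hours))        # 400 -> -3000.0, ..., 800 -> 14000.0, ..., 1000 -> 22500.0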

Figure 2.4 SensIt Numerical and Chart Output

SensIt 1.20 Professional, One Input, One Output
Date (current date)   Time (current time)
Workbook senssamp.xls
Input Cell Model!$B$6   Output Cell Model!$B$17

Hours Flown   Annual Profit
        400        -$3,000
        450          -$875
        500         $1,250
        550         $3,375
        600         $5,500
        650         $7,625
        700         $9,750
        750        $11,875
        800        $14,000
        850        $16,125
        900        $18,250
        950        $20,375
       1000        $22,500

[XY (Scatter) chart of Annual Profit versus Hours Flown for the values above.]

From the table and chart, we observe that Eagle must fly roughly 470 hours to achieve a
positive profit, holding all other inputs at their base-case values. The exact threshold
value for Hours Flown could be obtained using Excel's Goal Seek feature.
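
Goal Seek is essentially a root finder on one cell; a minimal Python sketch (again restating the Figure 2.2 formulas) bisects on Hours Flown and lands near 471 hours.

    def profit(hours, charter_price=325, ticket_price=100, capacity=0.50,
               prop_charter=0.5, op_cost=245, insurance=20_000):
        revenue = (prop_charter * hours * charter_price
                   + (1 - prop_charter) * hours * ticket_price * capacity * 5)
        return revenue - (hours * op_cost + insurance)

    lo, hi = 400.0, 500.0              # profit is negative at 400 and positive at 500
    while hi - lo > 0.01:              # bisect, as Goal Seek would on cell B6
        mid = (lo + hi) / 2
        if profit(mid) < 0:
            lo = mid
        else:
            hi = mid
    print(round(hi, 1))                # about 470.6 hours to break even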

2.6 MANY INPUTS, ONE OUTPUT TORNADO


Use SensIt’s Tornado option to see how your model’s output depends on ranges you
specify for each of the model’s input variables. Before using Tornado, arrange your
model input cells in adjacent cells in a single column, arrange corresponding labels in
adjacent cells in a single column, and arrange Low, Base, and High input values for each
input variable in three separate columns. Alternatively, the three columns containing
input values can be worst case, likely case, and best case. An appropriate arrangement is
shown below.

Figure 2.5 Model Display with Lower and Upper Bounds


A B C D E F
1 Spreadsheet Model For Eagle Airlines
2
3 Input Variables Input Cells Lower Bound Base Value Upper Bound
4 Charter Price/Hour $325 $300 $325 $350
5 Ticket Price/Hour $100 $95 $100 $108
6 Hours Flown 800 500 800 1000
7 Capacity of Scheduled Flights 50% 40% 50% 60%
8 Proportion of Chartered Flights 0.5 0.45 0.5 0.7
9 Operating Cost/Hour $245 $230 $245 $260
10 Insurance $20,000 $18,000 $20,000 $25,000
11
12 Intermediate Calculations
13 Total Revenue $230,000
14 Total Cost $216,000
15
16 Performance Measure
17 Annual Profit $14,000
18
19 Adapted from Bob Clemen's textbook,
20 Making Hard Decisions, 2nd ed., Duxbury (1996).

Figure 2.6 SensIt Many Inputs, One Output Dialog Box

Ranges for Input Variables


Type a range reference, or point to the range (click and drag) containing text labels and
the range containing numeric values that are inputs to your model. If the range is not
contiguous, select the first portion and then hold down the Control key while making the
remaining selections. Alternatively, type a comma between each portion.

Cells for Output Variable


Type a cell reference, or point to the cell containing a text label and the cell containing a
formula that’s the output of your model.

Ranges for Input Values


Type a range reference, or point to the range (click and drag) containing numeric values
for each of your model’s inputs. You can make non-contiguous selections similar to the
ranges for input variables. Be sure that all five range selections have the appropriate cells
in the same order.
After you click OK, for each input variable, SensIt sets all other input values at their Base
case values, copies the One Extreme input value to the input variable cell, recalculates the
worksheet, and copies the value of the output variable cell to the table; the steps are
repeated using each Other Extreme input value. For each input variable, SensIt computes
the range of the output variable values (the swing), sorts the table from largest swing
down to smallest, and prepares a bar chart.
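
The same one-at-a-time substitution is easy to replicate in code; a minimal Python sketch, using the bounds from Figure 2.5 and a profit function that restates the Figure 2.2 formulas, computes the low and high outputs and the swing for each input and sorts by swing (the names are ours, not SensIt's).

    def profit(v):
        revenue = (v['prop_charter'] * v['hours'] * v['charter_price']
                   + (1 - v['prop_charter']) * v['hours']
                   * v['ticket_price'] * v['capacity'] * 5)
        return revenue - (v['hours'] * v['op_cost'] + v['insurance'])

    base = dict(charter_price=325, ticket_price=100, hours=800, capacity=0.50,
                prop_charter=0.5, op_cost=245, insurance=20_000)
    # Bounds from Figure 2.5, ordered as (value giving the low output, value giving the high output).
    bounds = {'capacity': (0.40, 0.60), 'op_cost': (260, 230), 'hours': (500, 1000),
              'charter_price': (300, 350), 'prop_charter': (0.45, 0.70),
              'ticket_price': (95, 108), 'insurance': (25_000, 18_000)}

    rows = []
    for name, (lo, hi) in bounds.items():
        out_lo = profit({**base, name: lo})     # vary one input, hold the others at base
        out_hi = profit({**base, name: hi})
        rows.append((name, out_lo, out_hi, abs(out_hi - out_lo)))

    for row in sorted(rows, key=lambda r: r[3], reverse=True):   # largest swing first
        print(row)      # capacity tops the list with a $40,000 swing, as Figure 2.7 below shows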

Figure 2.7 SensIt Tornado Numerical and Chart Output

SensIt 1.20 Professional, Many Inputs, One Output (Single-Factor Sensitivity Analysis)
Date (current date)   Time (current time)
Workbook senssamp.xls   Output Cell Cases!$B$17

                                    Input Value                   Output Value (Annual Profit)
Input Variable                      Low      Base Case  High      Low       Base Case  High      Swing
Capacity of Scheduled Flights       40%      50%        60%       -$6,000   $14,000    $34,000   $40,000
Operating Cost/Hour                 $260     $245       $230      $2,000    $14,000    $26,000   $24,000
Hours Flown                         500      800        1000      $1,250    $14,000    $22,500   $21,250
Charter Price/Hour                  $300     $325       $350      $4,000    $14,000    $24,000   $20,000
Proportion of Chartered Flights     0.45     0.5        0.7       $11,000   $14,000    $26,000   $15,000
Ticket Price/Hour                   $95      $100       $108      $9,000    $14,000    $22,000   $13,000
Insurance                           $25,000  $20,000    $18,000   $9,000    $14,000    $16,000   $7,000

[Tornado chart: one horizontal bar per input variable, spanning its low and high Annual Profit values around the $14,000 base case, with the widest bar (Capacity of Scheduled Flights) at the top; Annual Profit axis from -$15,000 to $40,000.]

The uncertainty about Capacity of Scheduled Flights is associated with the widest swing
in Annual Profit.

2.7 TORNADO SORTED BY DOWNSIDE RISK


The tornado chart is originally sorted by Swing. To sort by downside risk, i.e., by the low
output values, select the data in cells A10:J16, choose Data | Sort, check that "No header
row" is selected, select "Sort by" column F Ascending, and click OK. The results are
shown below.

Figure 2.8 SensIt Tornado Sorted by Downside Risk

                                    Input Value                   Output Value (Annual Profit)
Input Variable                      Low      Base Case  High      Low       Base Case  High      Swing
Capacity of Scheduled Flights       40%      50%        60%       -$6,000   $14,000    $34,000   $40,000
Hours Flown                         500      800        1000      $1,250    $14,000    $22,500   $21,250
Operating Cost/Hour                 $260     $245       $230      $2,000    $14,000    $26,000   $24,000
Charter Price/Hour                  $300     $325       $350      $4,000    $14,000    $24,000   $20,000
Ticket Price/Hour                   $95      $100       $108      $9,000    $14,000    $22,000   $13,000
Insurance                           $25,000  $20,000    $18,000   $9,000    $14,000    $16,000   $7,000
Proportion of Chartered Flights     0.45     0.5        0.7       $11,000   $14,000    $26,000   $15,000

[Tornado chart with the bars reordered by the Low output value, from most negative at the top to largest at the bottom.]

2.8 TORNADO SORTED BY UPSIDE POTENTIAL


To sort by upside potential, i.e., by the high output values, select the data in cells
A10:J16, choose Data | Sort, check that "No header row" is selected, select "Sort by"
column H Descending, and click OK. The results are shown below.

Figure 2.9 SensIt Tornado Sorted by Upside Potential

                                    Input Value                   Output Value (Annual Profit)
Input Variable                      Low      Base Case  High      Low       Base Case  High      Swing
Capacity of Scheduled Flights       40%      50%        60%       -$6,000   $14,000    $34,000   $40,000
Operating Cost/Hour                 $260     $245       $230      $2,000    $14,000    $26,000   $24,000
Proportion of Chartered Flights     0.45     0.5        0.7       $11,000   $14,000    $26,000   $15,000
Charter Price/Hour                  $300     $325       $350      $4,000    $14,000    $24,000   $20,000
Hours Flown                         500      800        1000      $1,250    $14,000    $22,500   $21,250
Ticket Price/Hour                   $95      $100       $108      $9,000    $14,000    $22,000   $13,000
Insurance                           $25,000  $20,000    $18,000   $9,000    $14,000    $16,000   $7,000

[Tornado chart with the bars reordered by the High output value, from largest at the top to smallest at the bottom.]
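
Both reorderings in Sections 2.7 and 2.8 amount to sorting the tornado table on a different key column; a minimal Python sketch, with the low and high output values taken from Figure 2.7:

    rows = [('Capacity of Scheduled Flights',   -6_000, 34_000),
            ('Operating Cost/Hour',              2_000, 26_000),
            ('Hours Flown',                       1_250, 22_500),
            ('Charter Price/Hour',                4_000, 24_000),
            ('Proportion of Chartered Flights',  11_000, 26_000),
            ('Ticket Price/Hour',                 9_000, 22_000),
            ('Insurance',                         9_000, 16_000)]   # (name, low output, high output)

    by_downside_risk    = sorted(rows, key=lambda r: r[1])                 # low output ascending (Section 2.7)
    by_upside_potential = sorted(rows, key=lambda r: r[2], reverse=True)   # high output descending (Section 2.8)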

2.9 TORNADO SHOWING MAJOR UNCERTAINTIES


In some situations you may have twenty or more input variables and you wish to show
the variation of only the top five or ten. To illustrate this modification, consider showing
only the top four input variables in the example. Click one of the bars on the left side of
the vertical base case line to select Series 1 (shown at the right end of the formula bar),
and then click and drag the fill handle from A16 up to A13 and the fill handle from F16
up to F13. Click one of the bars on the right side of the vertical base case line to select
Series 2, and then click and drag the fill handle from H16 up to H13. To resize the chart,
click just inside its outer border and drag the bottom center fill handle upward. The
resulting chart is shown below.

Figure 2.10 SensIt Tornado Showing Only Major Uncertainties

[Tornado chart showing only the four widest bars: Capacity of Scheduled Flights (40% to 60%), Hours Flown (500 to 1000), Operating Cost/Hour ($260 to $230), and Charter Price/Hour ($300 to $350), plotted against Annual Profit from -$15,000 to $40,000.]

2.10 SPIDER
Use SensIt’s Spider option to see how your model’s output depends on the same
percentage changes for each of the model’s input variables.
Click OK: SensIt Spider uses the Start (%), Step (%), and Stop (%) values and the
original (base case) numeric value in each input variable cell to prepare a table of
percentage change input values. For each input variable, all other input values are set at
their base case values, each percentage change input value is copied to the input variable
cell, the worksheet is recalculated, and the value of the output variable cell is copied to
the table. SensIt prepares two XY (Scatter) charts; the horizontal axis is percentage
change of input variables; the vertical axis is model output value on one chart and
percentage change of model output value on the other; the input variables’ labels are used
for chart legends.
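
The spider calculation itself is a one-at-a-time percentage sweep; a minimal Python sketch follows, where the Start, Step, and Stop percentages are illustrative and the profit function restates the Figure 2.2 formulas.

    def profit(v):
        revenue = (v['prop_charter'] * v['hours'] * v['charter_price']
                   + (1 - v['prop_charter']) * v['hours']
                   * v['ticket_price'] * v['capacity'] * 5)
        return revenue - (v['hours'] * v['op_cost'] + v['insurance'])

    base = dict(charter_price=325, ticket_price=100, hours=800, capacity=0.50,
                prop_charter=0.5, op_cost=245, insurance=20_000)

    for name in base:                                  # one line of the spider per input
        for pct in (0.90, 0.95, 1.00, 1.05, 1.10):     # Start = 90%, Step = 5%, Stop = 110%
            value = base[name] * pct                   # scale one input; the others stay at base
            print(name, f'{pct:.0%}', profit({**base, name: value}))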

Figure 2.11 SensIt Spider Numerical and Chart Output

SensIt 1.20 Professional, Many Inputs, One Output (Single-Factor Sensitivity Analysis)
Date (current date)   Time (current time)
Workbook senssamp.xls   Output Cell Cases!$B$17

                                    Input Value                   Input Value as % of Base      Output Value (Annual Profit)
Input Variable                      Low      Base Case  High      Low      Base Case  High      Low       Base Case  High      Swing
Capacity of Scheduled Flights       40%      50%        60%       80.0%    100.0%     120.0%    -$6,000   $14,000    $34,000   $40,000
Operating Cost/Hour                 $260     $245       $230      106.1%   100.0%     93.9%     $2,000    $14,000    $26,000   $24,000
Hours Flown                         500      800        1000      62.5%    100.0%     125.0%    $1,250    $14,000    $22,500   $21,250
Charter Price/Hour                  $300     $325       $350      92.3%    100.0%     107.7%    $4,000    $14,000    $24,000   $20,000
Proportion of Chartered Flights     0.45     0.5        0.7       90.0%    100.0%     140.0%    $11,000   $14,000    $26,000   $15,000
Ticket Price/Hour                   $95      $100       $108      95.0%    100.0%     108.0%    $9,000    $14,000    $22,000   $13,000
Insurance                           $25,000  $20,000    $18,000   125.0%   100.0%     90.0%     $9,000    $14,000    $16,000   $7,000

[Spider chart: Annual Profit (vertical axis, -$15,000 to $40,000) versus Input Value as % of Base Case (horizontal axis, 50% to 150%), with one line per input variable and the input variable names as the legend.]

2.11 TIPS FOR MANY INPUTS, ONE OUTPUT


When defining the high and low cases for each variable, it is important to be consistent so
that the "high" cases are all equally high and the "low" cases are equally low. This will
ensure that the output results can be meaningfully compared.
For example, if you are thinking about the uncertainty using probability and very extreme
values are possible but with low probability of occurrence, you might take all of the base
case values to be estimates of the mean of the input variable, take low cases to be values
such that there is a 1-in-10 chance of the variable being below this amount, and take the high
cases to be values such that there is a 1-in-10 chance of the variable being above this
amount. Or, you might use the 5th and 95th percentiles for each of the input variables.

Alternatively, in some situations the values for each input variable may have lower and
upper bounds, so you may specify low and high values that are the absolute lowest and
highest possible values.
When you click OK, SensIt sets all of the input variables to their base-case values and
records the output value. Then SensIt goes through each of the input variables one at a
time, plugs the low-case value into the input cell, and records the value in the output cell.
It then repeats the process for the high case. For each substitution, all input values are
kept at their base-case values except for the single input value that is set at its low or high
value. SensIt then produces a spreadsheet that lists the numerical results as shown in
columns F, G, and H of the worksheet with the tornado chart.
In the worksheet, the variables are sorted by their "swing" -- the absolute value of the
difference between the output values in the low and high cases. "Swing" serves as a
rough measure of the impact of each input variable. The rows of numerical output are
sorted from highest swing at the top down to lowest swing at the bottom. Then SensIt
creates a bar chart of the sorted data.
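The one-at-a-time bookkeeping is easy to reproduce outside Excel if you want to check a tornado calculation by hand. The following Python sketch uses a made-up three-input profit model (it is not the senssamp.xls workbook); the pattern of substituting one low or high value at a time while holding the others at base case is the same.

# One-at-a-time sensitivity analysis for a hypothetical three-input model.
def profit(price, units, fixed_cost):          # hypothetical output formula
    return price * units - fixed_cost

base = {"price": 100, "units": 700, "fixed_cost": 12000}
ranges = {"price": (95, 108), "units": (500, 1000), "fixed_cost": (10000, 15000)}

rows = []
for name, (low, high) in ranges.items():
    inputs = dict(base)                        # all inputs at base case
    inputs[name] = low                         # substitute the low case
    out_at_low = profit(**inputs)
    inputs[name] = high                        # substitute the high case
    out_at_high = profit(**inputs)
    rows.append((name, out_at_low, out_at_high, abs(out_at_high - out_at_low)))

rows.sort(key=lambda r: r[3], reverse=True)    # largest swing at the top
for name, out_at_low, out_at_high, swing in rows:
    print(f"{name:12s}  at low: {out_at_low:>7,}  at high: {out_at_high:>7,}  swing: {swing:>7,}")

Sorting by the swing column and plotting the sorted rows as horizontal bars gives the tornado chart.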
In general, you should focus your modeling efforts on those variables with the greatest
impact on the value measure.
If your model has input variables that are discrete or categorical, you should create
multiple tornado charts using different base case values of that input variable. For
example, if your model has an input variable "Government Regulation" that has possible
values 0 (zero) or 1, the low and high values will be 0 and 1, but you should run one
tornado chart with base case = 0 and another tornado chart with base case = 1.

2.12 EAGLE AIRLINES PROBLEM


Figure 2.12 Ten-Variable Eagle Model Display
A B C D E F
1 Spreadsheet Model For Eagle Airlines
2
3 Variable Input Cells Lower Bound Base Value Upper Bound
4 Hours Flown 800 500 800 1000
5 Charter Price/Hour $325 $300 $325 $350
6 Ticket Price/Hour $100 $95 $100 $108
7 Capacity of Scheduled Flights 50% 40% 50% 60%
8 Proportion Of Chartered Flights 0.5 0.45 0.5 0.7
9 Operating Cost/Hour $245 $230 $245 $260
10 Insurance $20,000 $18,000 $20,000 $25,000
11 Proportion Financed 0.4 0.3 0.4 0.5
12 Interest Rate 11.5% 10.5% 11.5% 13.0%
13 Purchase Price $87,500 $85,000 $87,500 $90,000
14
15 Total Revenue $230,000
16 Total Cost $220,025
17
18 Annual Profit $9,975
19
20 Adapted from Bob Clemen's textbook, Making Hard Decisions

Figure 2.13 Ten-Variable Eagle Model Formulas


A B C D E F
14
15 Total Revenue =(B8*B4*B5)+((1-B8)*B4*B6*B7*5)
16 Total Cost =(B4*B9)+B10+(B13*B11*B12)
17
18 Annual Profit =B15-B16
19

Figure 2.14 Ten-Variable Worst Case and Best Case Inputs Determined by Solver
Variable Worst Case Base Case Best Case
Hours Flown 1000 800 1000
Charter Price/Hour $300 $325 $350
Ticket Price/Hour $95 $100 $108
Capacity of Scheduled Flights 40% 50% 60%
Proportion Of Chartered Flights 0.45 0.5 0.7
Operating Cost/Hour $260 $245 $230
Insurance $25,000 $20,000 $18,000
Proportion Financed 0.5 0.4 0.3
Interest Rate 13.0% 11.5% 10.5%
Purchase Price $90,000 $87,500 $85,000

Total Revenue $239,500 $230,000 $342,200


Total Cost $290,850 $220,025 $250,678

Annual Profit -$51,350 $9,975 $91,523


Chapter 3  Multiattribute Utility
3.1 APPLICATIONS OF MULTI-ATTRIBUTE UTILITY
Strategy for Dealing with Microcomputer Networking
    Impact on microcomputer users
    Productivity enhancement
    User satisfaction
    Impact on mainframe capacity
    Costs
    Upward compatibility of the network
    Impacts on organizational structure
    Risks
Purchase of manufacturing machinery
    Price
    Technical features
    Service
Choosing a manager candidate
    Education
    Management skills
    Technical skills
    Personal skills
Choosing a beverage container (soft drink industry)
    Energy to produce
    Cost
    Environmental waste
    Customer service
Selecting a best job
    Monetary compensation
    Geographical location
    Travel requirements
    Nature of work

3.2 MULTIATTRIBUTE UTILITY SWING WEIGHTS


Excel Workbook Clemen15.xls
Conflicting Objectives: Fundamental Objectives versus Means Objectives
Clemen, Making Hard Decisions, Ch. 15
Multiattribute Utility
Set of Objectives should be
1) complete
2) as small as possible
3) not redundant
4) decomposable ("independent" or unrelated)
Additive Utility Function
Overall Score of Alternative = Sum [ Weight times Attribute Score of Alternative ]

Figure 3.1 Data for Example


Attribute Red Portalo Blue Norushi Yellow Standard

Life span, in years 12 9 6

Price $17,000 $10,000 $8,000

Color Red Blue Yellow



Attribute Scores

Figure 3.2 Individual Utility for Life Span


Scores for Life Span
Years    Score
6        0
9        0.5
12       1
[Chart: Life Span Score (0.0 to 1.0) versus Life Span, in years (5 to 13).]

Figure 3.3 Individual Utility for Price


Scores for Price
Price      Score
$17,000    0
$10,000    0.78
$8,000     1
[Chart: Price Score (0.0 to 1.0) versus Price ($5,000 to $20,000).]

Figure 3.4 Individual Utility for Color


Scores for Color
Color     Score
Red       0
Blue      0.667
Yellow    1
[Chart: Color Score (0.0 to 1.0) versus Color (Red, Blue, Yellow).]

Swing Weights

Figure 3.5 Swing Weight Assessment Display

A B C D E F G
1 Swing Weights
2
3 Consequence to Compare
4 Attribute Swung from
5 Worst to Best Life span Price Color Rank Rate Weight
6 (Benchmark) 6 years $17,000 red 4 0 0.000
7 Life span 12 years $17,000 red 2 75 0.405
8 Price 6 years $8,000 red 1 100 0.541
9 Color 6 years $17,000 yellow 3 10 0.054
10 185

1) Hypothetical alternatives (number of attributes plus one)
   Benchmark alternative is worst for all attributes
   Each other hypothetical alternative has one attribute at best, all others at worst
2) Rank the hypothetical alternatives
3) Benchmark has rating zero, first ranked alternative has rating 100; assign
   level-of-satisfaction ratings to the intermediate alternatives
4) Weight equals rating divided by sum of ratings

Figure 3.6 Swing Weight Assessment Formulas


A B C D E F G
1 Swing Weights
2
3 Consequence to Compare
4 Attribute Swung from
5 Worst to Best Life span Price Color Rank Rate Weight
6 (Benchmark) 6 years $17,000 red 4 0 =F6/$F$10
7 Life span 12 years $17,000 red 2 75 =F7/$F$10
8 Price 6 years $8,000 red 1 100 =F8/$F$10
9 Color 6 years $17,000 yellow 3 10 =F9/$F$10
10 =SUM(F6:F9)

Overall Scores

Figure 3.7 Swing Weight Overall Scores Display


I J K L M N O P Q
1 Overall Scores
2
3 Red Portalo Blue Norushi Yellow Standard
4 Attribute Attribute Attribute Attribute Attribute Attribute
5 Attribute Value Score Value Score Value Score
6 Life span 12 1.000 9 0.500 6 0.000
7 Price $17,000 0.000 $10,000 0.780 $8,000 1.000
8 Color Red 0.000 Blue 0.667 Yellow 1.000
9
10 Overall Score 0.40541 0.66038 0.59459
11
12 Best Blue Norushi

Figure 3.8 Swing Weight Overall Scores Formulas


I J K L M N O P Q R
1 Overall Scores
2
3 Red Portalo Blue Norushi Yellow Standard
4 Attribute Attribute Attribute Attribute Attribute Attribute
5 Attribute Value Score Value Score Value Score
6 Life span 12 1.000 9 0.500 6 0.000
7 Price $17,000 0.000 $10,000 0.780 $8,000 1.000
8 Color Red 0.000 Blue 0.667 Yellow 1.000
9
10 Overall Score =SUMPRODUCT($G$7:$G$9,K6:K8) =SUMPRODUCT($G$7:$G$9,N6:N8) =SUMPRODUCT($G$7:$G$9,Q6:Q8)
11
12 Best =IF(K10=MAX(K10,N10,Q10),"Red Portalo",IF(N10=MAX(K10,N10,Q10),"Blue Norushi","Yellow Standard"))

Figure 3.9 Sensitivity Analysis


U V W X Y Z AA
1 Sensitivity Analysis Data Tables
2
3 Life Span Rate (10 to 100) Color Rate (0 to 75)
4
5 W9 Output Formula: =J12 Z9 Output Formula: =J12
6 Column Input Cell: F7 Column Input Cell: F9
7
8 Life Span Rate Best Color Rate Best
9
10 10 Yellow Standard 0 Blue Norushi
11 15 Yellow Standard 5 Blue Norushi
12 20 Yellow Standard 10 Blue Norushi Base Case
13 25 Yellow Standard 15 Blue Norushi
14 30 Yellow Standard 20 Blue Norushi
15 35 Yellow Standard 25 Blue Norushi
16 40 Yellow Standard 30 Blue Norushi
17 45 Yellow Standard 35 Blue Norushi
18 50 Yellow Standard 40 Blue Norushi
19 55 Blue Norushi 45 Blue Norushi
20 60 Blue Norushi 50 Yellow Standard
21 65 Blue Norushi 55 Yellow Standard
22 70 Blue Norushi 60 Yellow Standard
23 Base Case 75 Blue Norushi 65 Yellow Standard
24 80 Blue Norushi 70 Yellow Standard
25 85 Blue Norushi 75 Yellow Standard
26 90 Blue Norushi
27 95 Blue Norushi
28 100 Blue Norushi

3.3 SENSITIVITY ANALYSIS METHODS


SENSITIVITY ANALYSIS FOR MULTI-ATTRIBUTE UTILITY USING EXCEL
This paper describes several standard methods for analyzing decisions where the
outcomes have multiple attributes. The example problem concerns a large company that
is planning to purchase several hundred cars for use by the sales force. The company
wants a car that is inexpensive, safe, and lasts a long time. Figure 1 shows data for seven
cars that are being considered.

Figure 1 Attribute Data for Seven Alternatives


A B C D E F G H
1 Alternatives
2 Attribute Alta Bulldog Cruiser Delta Egret Fleet Garnett
3 Cost $20 $18 $16 $14 $12 $10 $15
4 Lifetime 10 10 8 8 6 6 8
5 Safety High Medium High Medium Medium Low Low
6
7 Cost thousands of dollars
8 Lifetime expected years
9 Safety third-party rating

Other attributes might be important, e.g., comfort and prestige. The cost attribute should
include operating costs, insurance, and salvage value, in addition to purchase price. It
might be appropriate to combine the cost and lifetime attributes into a single attribute,
e.g., cost per year. Clemen [1] suggests that a set of attributes should be complete (so that
all important objectives are included), as small as possible (to facilitate analysis), not
redundant (to avoid double-counting a common underlying characteristic), and
decomposable (so that the decision maker can think about each attribute separately).

Dominance
An alternative can be eliminated if another alternative is better on some objectives and no
worse on the others. The Garnett is more expensive than the Delta, has the same lifetime,
and has a lower safety rating. So the Garnett can be eliminated from further
consideration.
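If you want to screen a longer list of alternatives, the dominance test can be automated. The Python sketch below is illustrative only; it codes the Figure 1 data with cost negated and safety coded Low = 0, Medium = 1, High = 2 so that larger numbers are always better.

# Illustrative dominance check for the car data in Figure 1.
cars = {
    "Alta":    (-20, 10, 2),   # (-cost, lifetime, safety: Low=0, Medium=1, High=2)
    "Bulldog": (-18, 10, 1),
    "Cruiser": (-16,  8, 2),
    "Delta":   (-14,  8, 1),
    "Egret":   (-12,  6, 1),
    "Fleet":   (-10,  6, 0),
    "Garnett": (-15,  8, 0),
}

def dominates(a, b):
    """True if a is at least as good as b on every attribute and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

for name, scores in cars.items():
    for other, other_scores in cars.items():
        if other != name and dominates(other_scores, scores):
            print(f"{name} is dominated by {other}")

Running it reports only that Garnett is dominated by Delta, which matches the discussion above.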

Monetary Equivalents Assessment


One method for comparing multi-attribute alternatives is to subjectively assign monetary
values to the non-monetary attributes. For example, the decision maker may determine
that each additional year of expected lifetime is worth $500, medium safety is $4,000
better than low safety, and high safety is $6,000 better than low safety. Arbitrarily using
Fleet as the base case with total equivalent cost of $10,000, Figure 2 shows costs and
equivalent costs, in thousands of dollars, in rows 9:11. The negative entries for Lifetime
and Safety correspond to positive benefits relative to the Fleet car's base case values.
Based on this method, the Egret is chosen. Sensitivity analysis, not shown here, would
involve seeing how the choice depends on subjective equivalents different from the $500
per year lifetime and the $4,000 and $6,000 safety assessments.
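The equivalent-cost arithmetic in Figure 2 is simple enough to verify in a few lines. Here is an illustrative Python sketch using the $500-per-year and $4,000/$6,000 assessments stated above, with Fleet's 6-year lifetime and Low safety as the base case.

# Equivalent cost, in thousands of dollars, relative to the Fleet base case.
cars = {  # name: (cost $K, lifetime in years, safety rating)
    "Alta": (20, 10, "High"), "Bulldog": (18, 10, "Medium"), "Cruiser": (16, 8, "High"),
    "Delta": (14, 8, "Medium"), "Egret": (12, 6, "Medium"), "Fleet": (10, 6, "Low"),
}
safety_value = {"Low": 0, "Medium": 4, "High": 6}   # $K better than Low safety
base_lifetime = 6                                   # Fleet's lifetime

for name, (cost, lifetime, safety) in cars.items():
    lifetime_credit = 0.5 * (lifetime - base_lifetime)   # $0.5K per extra year
    equiv = cost - lifetime_credit - safety_value[safety]
    print(f"{name:8s} equivalent cost = ${equiv:,.0f}K")

The printed values match row 13 of Figure 2, with the Egret lowest at $8K.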
Hammond et al. [3] describe another method involving even swaps that could be used to
select the best alternative.

Figure 2 Monetary Equivalents for Non-Dominated Alternatives


A B C D E F G
1 Non-Dominated Alternatives
2 Attribute Alta Bulldog Cruiser Delta Egret Fleet
3 Cost $20 $18 $16 $14 $12 $10
4 Lifetime, years 10 10 8 8 6 6
5 Safety rating High Medium High Medium Medium Low
6
7 Non-Dominated Alternatives
8 Attribute Alta Bulldog Cruiser Delta Egret Fleet
9 Cost $20 $18 $16 $14 $12 $10
10 Lifetime, $ -$2 -$2 -$1 -$1 $0 $0
11 Safety, $ -$6 -$4 -$6 -$4 -$4 $0
12
13 Equiv. Cost $12 $12 $9 $9 $8 $10

Additive Utility Function


The additive multi-attribute utility function U includes individual utility functions Ui for
each attribute xi, usually scaled from 0 to 1, and weights wi that reflect the decision
maker's tradeoffs among the attributes.
U(x1, x2, x3) = w1*U1(x1) + w2*U2(x2) + w3*U3(x3), where w1 + w2 + w3 = 1     (1)
Weights may be specified directly, as ratios, or using a swing weight procedure.
Individual utility functions are assessed using the range of attribute values for the
alternatives being considered.
The individual utility values for Cost and Lifetime shown in Figure 3 are based on
proportional scores, corresponding to linear utility functions. For example, each thousand
dollar difference in cost is associated with a 0.1 difference in utility. The utility values for
Safety are subjective judgments. For example, the decision maker thinks that a change in
Safety from Low to Medium achieves only two-thirds of the satisfaction associated with
a change from Low to High.

Figure 3 Individual Utilities


A B C D E F G
1 Non-Dominated Alternatives
2 Attribute Alta Bulldog Cruiser Delta Egret Fleet
3 Cost $20 $18 $16 $14 $12 $10
4 Lifetime 10 10 8 8 6 6
5 Safety High Medium High Medium Medium Low
6
7 Assess individual utility for each attribute.
8 Cost U($20,000)=0, U($10,000)=1, linear
9 Lifetime U(6 years)=0, U(10 years)=1, linear
10 Safety U(Low)=0, U(Medium)=2/3, U(High)=1
11
12 Non-Dominated Alternatives
13 Attribute Alta Bulldog Cruiser Delta Egret Fleet
14 Cost 0.000 0.200 0.400 0.600 0.800 1.000
15 Lifetime 1.000 1.000 0.500 0.500 0.000 0.000
16 Safety 1.000 0.667 1.000 0.667 0.667 0.000

Compared to the assessments for individual utility, the assessments for tradeoffs are
usually much more difficult to make. The following sections focus on assessments of
tradeoff weights and sensitivity analysis.

Weight Ratio Assessment


One method for measuring trade-offs among the conflicting objectives is to assess weight
ratios. For example, the decision maker may judge that cost is five times as important as
lifetime, which may be interpreted to mean that the change in overall satisfaction
corresponding to a change in cost from $20,000 to $10,000 is five times the change in
overall satisfaction corresponding to a change in lifetime from 6 years to 10 years.
Similarly, the decision maker may judge that a $10,000 decrease in cost is one and a half
times as satisfying as a change from a low to a high safety rating. The assessments are
shown in cells J4:J5 in Figure 4.

Figure 4 Weight Ratio Assessment and Choice


A B C D E F G H I J
1 Non-Dominated Alternatives Assess weight ratios.
2 Attribute Alta Bulldog Cruiser Delta Egret Fleet
3 Cost 0.000 0.200 0.400 0.600 0.800 1.000 Weight Ratio Input
4 Lifetime 1.000 1.000 0.500 0.500 0.000 0.000 Cost/Lifetime 5.0
5 Safety 1.000 0.667 1.000 0.667 0.667 0.000 Cost/Safety 1.5
6
7 Overall 0.464 0.452 0.625 0.613 0.667 0.536 Weights
8 Cost 0.536
9 Max Value 0.667 Lifetime 0.107
10 Location 5 Safety 0.357
11 Choice Egret
12
13 Choice Egret

With three attributes, the two assessed weight ratios determine two equations and the
requirement that the weights sum to one determines a third equation. Using algebra, a
solution for the three unknown weights is shown in cells J8:J10 in Figure 5.
The formula for overall utility in cell B7, with a relative reference to the attribute utilities
in B3:B5 and an absolute reference to the weights in J8:J10, is copied to cells C7:G7.
The MAX worksheet function determines the maximum overall utility in B7:G7, the
MATCH function determines the location of that maximum in B7:G7, and the INDEX
function returns the alternative name located in B2:G2. The zero argument in the
MATCH function is needed to specify that an exact match is required; the zero argument
in the INDEX function is used as a placeholder and could be omitted in this application
without affecting the results. Cell B13 combines these functions into a single formula.
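The same weight-ratio algebra and choice rule can be written outside the worksheet. The Python sketch below is illustrative; the individual utilities are the values in Figure 4, and the two ratios are the assessed 5.0 and 1.5.

# Weight-ratio assessment: derive the weights, then score each alternative.
utilities = {  # name: (cost utility, lifetime utility, safety utility), from Figure 4
    "Alta": (0.0, 1.0, 1.0), "Bulldog": (0.2, 1.0, 0.667), "Cruiser": (0.4, 0.5, 1.0),
    "Delta": (0.6, 0.5, 0.667), "Egret": (0.8, 0.0, 0.667), "Fleet": (1.0, 0.0, 0.0),
}
cost_over_lifetime, cost_over_safety = 5.0, 1.5     # assessed weight ratios

w_cost = 1.0 / (1.0 + 1.0 / cost_over_lifetime + 1.0 / cost_over_safety)
w_lifetime = w_cost / cost_over_lifetime
w_safety = w_cost / cost_over_safety                # the three weights sum to one

overall = {name: w_cost * u_c + w_lifetime * u_l + w_safety * u_s
           for name, (u_c, u_l, u_s) in utilities.items()}
best = max(overall, key=overall.get)
print(f"weights: cost={w_cost:.3f}, lifetime={w_lifetime:.3f}, safety={w_safety:.3f}")
print(f"choice: {best} (overall utility {overall[best]:.3f})")

The computed weights match cells J8:J10 of Figure 4, and the reported choice matches cell B13.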

Figure 5 Formulas for Weight Ratio Assessment and Choice


A B H I J
1 Non-Dominated Alternatives Assess weight ratios.
2 Attribute Alta
3 Cost 0 Weight Ratio Input
4 Lifetime 1 Cost/Lifetime 5
5 Safety 1 Cost/Safety 1.5
6
7 Overall =SUMPRODUCT(B3:B5,$J$8:$J$10) Weights
8 Cost =1/(1/J4+1/J5+1)
9 Max Value =MAX(B7:G7) Lifetime =J8/J4
10 Location =MATCH(B9,B7:G7,0) Safety =J8/J5
11 Choice =INDEX(B2:G2,0,B10)
12
13 Choice =INDEX(B2:G2,0,MATCH(MAX(B7:G7),B7:G7,0))

After deleting cells A9:B12, the single formula is in cell B9. The arrangement shown in
Figure 6 is used for the remaining analyses.

Figure 6 Weight Ratio Choice for Sensitivity Analysis


A B C D E F G
1 Non-Dominated Alternatives
2 Attribute Alta Bulldog Cruiser Delta Egret Fleet
3 Cost 0.000 0.200 0.400 0.600 0.800 1.000
4 Lifetime 1.000 1.000 0.500 0.500 0.000 0.000
5 Safety 1.000 0.667 1.000 0.667 0.667 0.000
6
7 Overall 0.464 0.452 0.625 0.613 0.667 0.536
8
9 Choice Egret

Weight Ratio Sensitivity Analysis


The decision maker specified tradeoffs using weight ratios, so it is appropriate to see
whether the choice is sensitive to changes in those assessed values. To construct a two-
way data table for sensitivity analysis of the weight ratios as shown in Figures 7 and 8,
enter a set of values in a row, N4:R4, and another set of values in a column, M5:M13. In
the top left cell of the data table, M4, enter a formula for determining the data table's
output values, =B9. (To improve the appearance of the table, cell M4 is formatted with a
custom three-semicolon format so that the formula result is not displayed.) Select
M4:R13. Choose Data | Table. In the Data Table dialog box, specify J4 as the Row Input
Cell and J5 as the Column Input Cell. Click OK.
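The data table is simply a grid of recalculated choices, so the same display can be generated with two nested loops. The Python sketch below is illustrative (it repeats the weight-ratio calculation so that it is self-contained) and reproduces the coarse grid of Figure 7.

# Two-factor sensitivity analysis over the Cost/Lifetime and Cost/Safety ratios.
utilities = {  # individual utilities from Figure 4
    "Alta": (0.0, 1.0, 1.0), "Bulldog": (0.2, 1.0, 0.667), "Cruiser": (0.4, 0.5, 1.0),
    "Delta": (0.6, 0.5, 0.667), "Egret": (0.8, 0.0, 0.667), "Fleet": (1.0, 0.0, 0.0),
}

def choice(cost_over_lifetime, cost_over_safety):
    w_c = 1.0 / (1.0 + 1.0 / cost_over_lifetime + 1.0 / cost_over_safety)
    w_l, w_s = w_c / cost_over_lifetime, w_c / cost_over_safety
    scores = {n: w_c * u[0] + w_l * u[1] + w_s * u[2] for n, u in utilities.items()}
    return max(scores, key=scores.get)

for cs in [1.00, 1.25, 1.50, 1.75, 2.00]:                        # Cost/Safety ratio (rows)
    row = [choice(cl, cs) for cl in [3.0, 4.0, 5.0, 6.0, 7.0]]   # Cost/Lifetime ratio (columns)
    print(f"{cs:4.2f}  " + "  ".join(f"{name:8s}" for name in row))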

Figure 7 Coarse Two-Factor Sensitivity Analysis of Weight Ratios


L M N O P Q R
1 Two-Factor Sensitivity Analysis
2
3 Cost/Lifetime Weight Ratio
4 3.0 4.0 5.0 6.0 7.0
5 Cost/Safety 1.00 Cruiser Cruiser Cruiser Cruiser Cruiser
6 Weight 1.25 Cruiser Egret Egret Egret Egret
7 Ratio 1.50 Egret Egret Egret Egret Egret
8 1.75 Egret Egret Egret Egret Egret
9 2.00 Egret Egret Egret Egret Egret
10 2.25 Egret Egret Egret Egret Egret
11 2.50 Egret Egret Egret Egret Egret
12 2.75 Egret Egret Egret Egret Egret
13 3.00 Egret Egret Egret Egret Egret

Cell P7, corresponding to the original assessments, has a border. The data table is
dynamic, so the coarse view may be refined near the base-case assessments by specifying
different input values.

Figure 8 Fine Two-Factor Sensitivity Analysis of Weight Ratios


L M N O P Q R
1 Two-Factor Sensitivity Analysis
2
3 Cost/Lifetime Weight Ratio
4 4.0 4.5 5.0 5.5 6.0
5 Cost/Safety 1.00 Cruiser Cruiser Cruiser Cruiser Cruiser
6 Weight 1.10 Cruiser Cruiser Cruiser Egret Egret
7 Ratio 1.20 Cruiser Egret Egret Egret Egret
8 1.30 Egret Egret Egret Egret Egret
9 1.40 Egret Egret Egret Egret Egret
10 1.50 Egret Egret Egret Egret Egret
11 1.60 Egret Egret Egret Egret Egret
12 1.70 Egret Egret Egret Egret Egret
13 1.80 Egret Egret Egret Egret Egret

Figure 8 shows that the Cost/Safety weight ratio must be less than 1.2 to affect the
choice. If the decision maker regards 1.2 as "far away" from 1.5, then the Egret choice is
appropriate. Otherwise, the decision maker should think more carefully about the original
assessments before making a choice based on this analysis. The assessment of the
Cost/Lifetime weight ratio is not as critical, because any value between 4 and 6 yields the
same choice.

Swing Weight Assessment


Compared to weight ratio assessment, the swing weight method requires assessments that
are similar to directly assigning an overall utility to an alternative. However, the
hypothetical alternatives requiring assessment in this method are constructed so that it
should be easier for the decision maker to assign overall utilities to them instead of to the
actual alternatives.
The swing weight method involves four steps as shown in Figure 9.
1) Develop the hypothetical alternatives. The number of hypothetical alternatives
equals the number of attributes plus one. The benchmark alternative in column J
is worst for all attributes. Each other hypothetical alternative, shown in columns
K, L, and M, has one attribute at best and all others at worst.
2) Rank the hypothetical alternatives, as shown in row 7. This is an intermediate
step that facilitates assigning overall utilities.
3) Assign overall utility scores reflecting overall satisfaction for the hypothetical
alternatives. The benchmark worst case has score zero, and the first-ranked
alternative has score 100. Then assign level-of-satisfaction scores to the
intermediate alternatives, as shown in cells L9 and M9.

4) Sum the scores, as shown in cell N9. In the additive utility function, the weight
for each attribute equals the score divided by sum of the scores. (The algebra
solution, not shown here, is based on the special zero and one individual utility
values of the hypothetical alternatives.) Formulas are shown in Figure 10.
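Step 4 is just each rating divided by the total. An illustrative Python sketch using the ratings 0, 100, 20, and 70 from Figure 9:

# Swing weights from the decision maker's ratings of the hypothetical alternatives.
ratings = {"benchmark": 0, "best cost": 100, "best lifetime": 20, "best safety": 70}
total = sum(ratings.values())                       # 190
weights = {name: rate / total for name, rate in ratings.items()}
for name, w in weights.items():
    print(f"{name:14s} weight = {w:.3f}")

The printed weights match row 11 of Figure 9.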

Figure 9 Hypothetical Alternatives and Weights for Swing Weight Assessment


I J K L M N
1 Hypothetical Alternatives
2 Attribute Worst Best Cost Best Lifetime Best Safety
3 Cost $20 $10 $20 $20
4 Lifetime 6 6 10 6
5 Safety Low Low Low High
6
7 Rank 4 1 3 2
8 Total
9 Overall Score 0 100 20 70 190
10
11 Weight 0.000 0.526 0.105 0.368
12
13 Decision Maker's Inputs Underlined

Figure 10 Formulas for Swing Weight Assessment


I J K L M N
1 Hypothetical Alternatives
2 Attribute Worst Best Cost Best Lifetime Best Safety
3 Cost 20 10 20 20
4 Lifetime 6 6 10 6
5 Safety Low Low Low High
6
7 Rank 4 1 3 2
8 Total
9 Overall Score 0 100 20 70 =SUM(J9:M9)
10
11 Weight =J9/$N$9 =K9/$N$9 =L9/$N$9 =M9/$N$9
12
13 Decision Maker's Inputs Underlined

The individual utility values are in a column, and the weights are in a row. The
SUMPRODUCT function requires that the two arrays for its arguments have the same
orientation, so the TRANSPOSE function converts the weights into a column format, as
shown in Figure 11. The function in B7 must be array-entered; after typing the function,
hold down Control and Shift while you press Enter.

Figure 11 Formulas for Swing Weight Choice


A B
1 Non-Dominated Alternatives
2 Attribute Alta
3 Cost 0
4 Lifetime 1
5 Safety 1
6
7 Overall =SUMPRODUCT(B3:B5,TRANSPOSE($K$11:$M$11))
8
9 Choice =INDEX(B2:G2,0,MATCH(MAX(B7:G7),B7:G7,0))

Figure 12 Swing Weight Choice


A B C D E F G
1 Non-Dominated Alternatives
2 Attribute Alta Bulldog Cruiser Delta Egret Fleet
3 Cost 0.000 0.200 0.400 0.600 0.800 1.000
4 Lifetime 1.000 1.000 0.500 0.500 0.000 0.000
5 Safety 1.000 0.667 1.000 0.667 0.667 0.000
6
7 Overall 0.474 0.456 0.632 0.614 0.667 0.526
8
9 Choice Egret

Swing Weight Sensitivity Analysis


The decision maker specified tradeoffs using overall scores for the hypothetical
alternatives, so it is appropriate to see whether the choice is sensitive to changes in those
assessed values. Figure 13 shows the sensitivity for the Best-Lifetime score that was
specified as 20 relative to the worst-case benchmark and the highest-ranked Best-Cost
hypothetical alternative. The Best-Lifetime alternative is still ranked 3 as long as its score
is between 0 and 70.
To improve the appearance of the sensitivity analysis tables in Figure 13, the output
formula cells, R13 and T13, have a three-semicolon custom format.

Figure 13 Sensitivity Analysis of Swing Weight Best-Lifetime Score


P Q R S T U
1 Single-Factor Sensitivity Analysis
2
3 Best Lifetime Overall Score
4 Base case Score is 20
5 Rank 3 as long as Score is between 0 and 70
6
7 Output Formula in cell R13: =B9
8 Data Table Column Input Cell: M9
9
10 Detail
11 Best Lifetime Best Lifetime
12 Overall Score Choice Overall Score Choice
13
14 0 Egret 30 Egret
15 5 Egret 31 Egret
16 10 Egret 32 Egret
17 15 Egret 33 Egret
18 Base Case 20 Egret 34 Cruiser
19 25 Egret 35 Cruiser
20 30 Egret
21 35 Cruiser
22 40 Cruiser
23 45 Cruiser
24 50 Cruiser
25 55 Cruiser
26 60 Cruiser
27 65 Cruiser
28 70 Cruiser

The results in the left table of Figure 13, cells Q13:R28, indicate that the Best-Lifetime
score must be greater than 30 to affect the choice. A refined data table in cells T13:U19
shows that the score must be greater than 33 before the choice changes from Egret to
Cruiser. If the decision maker regards 33 as "far away" from 20, then the Egret choice is
appropriate.
Figure 14 shows a similar sensitivity analysis for the Best-Safety score. The assessed
score of 70 must be greater than 89 to affect the choice.

Figure 14 Sensitivity Analysis of Swing Weight Best-Safety Score


W X Y Z AA AB
1 Single-Factor Sensitivity Analysis
2
3 Best Safety Overall Score
4 Base case Score is 70
5 Rank 2 as long as Score is between 20 and 100
6
7 Output Formula in cell Y13 and cell AB13: =B9
8 Data Table Column Input Cell: N9
9
10 Detail
11 Best Safety Best Safety
12 Overall Score Choice Overall Score Choice
13
14 20 Fleet 85 Egret
15 25 Fleet 86 Egret
16 30 Fleet 87 Egret
17 35 Egret 88 Egret
18 40 Egret 89 Egret
19 45 Egret 90 Cruiser
20 50 Egret
21 55 Egret
22 60 Egret
23 65 Egret
24 Base Case 70 Egret
25 75 Egret
26 80 Egret
27 85 Egret
28 90 Cruiser
29 95 Cruiser
30 100 Cruiser

To construct a two-way data table for sensitivity analysis of the swing weight
assessments as shown in Figure 15, enter a set of values in a row, R4:V4, and another set
of values in a column, Q5:Q13. In the top left cell of the data table, Q4, enter a formula
for determining the data table's output values, =B9. (To improve the appearance of the
table, cell Q4 is formatted with a custom three-semicolon format so that the formula
result is not displayed.) Select Q4:V13. Choose Data | Table. In the Data Table dialog
box, specify L9 as the Row Input Cell and M9 as the Column Input Cell. Click OK.

Figure 15 Sensitivity Analysis of Both Swing Weight Scores


P Q R S T U V
1 Two-Way Sensitivity Analysis
2
3 Best Lifetime Overall Score
4 10 15 20 25 30
5 Best 50 Egret Egret Egret Egret Egret
6 Safety 55 Egret Egret Egret Egret Egret
7 Overall 60 Egret Egret Egret Egret Egret
8 Score 65 Egret Egret Egret Egret Egret
9 70 Egret Egret Egret Egret Egret
10 75 Egret Egret Egret Egret Cruiser
11 80 Egret Egret Egret Egret Cruiser
12 85 Egret Egret Egret Cruiser Cruiser
13 90 Egret Egret Cruiser Cruiser Cruiser

The table shows that the choice changes from Egret to Cruiser if the combination of
assessments is changed from 20 & 70 to 30 & 75. This table could be refined to examine
the exact threshold values.

Direct Weight Assessment and Sensitivity Analysis


In some situations the decision maker may be able to assign tradeoff weights directly.
Figure 16 shows results using the formulas shown in Figure 17.

Figure 16 Direct Weight Assessment


A B C D E F G H I J
1 Non-Dominated Alternatives Weights
2 Attribute Alta Bulldog Cruiser Delta Egret Fleet Cost 0.500
3 Cost 0.000 0.200 0.400 0.600 0.800 1.000 Lifetime 0.100
4 Lifetime 1.000 1.000 0.500 0.500 0.000 0.000 Safety 0.400
5 Safety 1.000 0.667 1.000 0.667 0.667 0.000
6
7 Overall 0.500 0.467 0.650 0.617 0.667 0.500
8
9 Choice Egret

The formula in cell B9 includes an IF function to verify that each weight is between 0
and 1, inclusive, and that the sum of the weights equals one. If not, the formula returns
empty text. This formula must be array-entered; after typing the function, hold down
Control and Shift while you press Enter.

Figure 17 Formulas for Direct Weight Assessment


A B H I J
1 Non-Dominated Alternatives Weights
2 Attribute Alta Cost 0.5
3 Cost 0 Lifetime 0.1
4 Lifetime 1 Safety =1-J3-J2
5 Safety 1
6
7 Overall =SUMPRODUCT(B3:B5,$J$2:$J$4)
8
9 Choice =IF(AND(SUM(J2:J4)<=1,J2:J4>=0),INDEX(B2:G2,0,MATCH(MAX(B7:G7),B7:G7,0)),"")

Figure 18 shows a two-way table for sensitivity analysis of the weights. Cell R5
corresponds to the approximate base case assessments in the weight ratio and swing
weight methods.

Figure 18 Sensitivity Analysis of Direct Weight Assessment


L M N O P Q R S T U V
1 Two-Factor Sensitivity Analysis
2
3 Cost Weight
4 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
5 Lifetime 0.1 Alta Cruiser Cruiser Cruiser Egret Egret Fleet Fleet Fleet
6 Weight 0.2 Alta Alta Cruiser Cruiser Cruiser Egret Fleet Fleet
7 0.3 Alta Alta Alta Cruiser Delta Fleet Fleet
8 0.4 Alta Alta Alta Bulldog Bulldog Fleet
9 0.5 Alta Alta Alta Bulldog Bulldog
10 0.6 Alta Alta Bulldog Bulldog
11 0.7 Alta Bulldog Bulldog
12 0.8 Alta Bulldog
13 0.9 Bulldog

Figure 19 is a more detailed view. The choice formula in cell B9 is modified by placing
the INDEX function inside the LEFT function so that only the first letter of the
alternative's name is returned.

Figure 19 Detailed Sensitivity Analysis of Direct Weight Assessment


L M N O P Q R S T U V W X Y Z AA AB AC AD AE AF AG AH
1 Two-Factor Sensitivity Analysis
2
3 Cost Weight
4 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
5 Lifetime 0.00 A C C C C C C C C C E E E E E E F F F F F
6 Weight 0.05 A A C C C C C C C C E E E E E F F F F F
7 0.10 A A A C C C C C C C E E E E F F F F F
8 0.15 A A A A C C C C C C E E E E F F F F
9 0.20 A A A A A A C C C C C E E F F F F
10 0.25 A A A A A A A C C C D D F F F F
11 0.30 A A A A A A A A C D D D F F F
12 0.35 A A A A A A A A A D D D F F
13 0.40 A A A A A A A A B B B D F
14 0.45 A A A A A A A B B B B B
15 0.50 A A A A A A A B B B B
16 0.55 A A A A A A B B B B
17 0.60 A A A A A A B B B
18 0.65 A A A A A B B B
19 0.70 A A A A B B B
20 0.75 A A A A B B
21 0.80 A A A B B
22 0.85 A A B B
23 0.90 A A B
24 0.95 A B
25 1.00 A

The results in Figure 19 show that all alternatives in this data set are candidates
depending on the tradeoffs specified by the decision maker. In general, moving left to
right, if more weight is given to cost, a less expensive alternative is chosen.

Summary
This paper considered three methods for assessing tradeoffs in the additive utility
function. For each method sensitivity analysis is useful for gaining insight into which
tradeoff assumptions are critical. Kirkwood [2] includes Excel VBA methods for
sensitivity analysis of individual utility functions in addition to weights.

Sensitivity Analysis Examples References


[1] Clemen, R.T. Making Hard Decisions: An Introduction to Decision Analysis,
2nd Edition. Duxbury Press, 1996.
[2] Kirkwood, C.W. Strategic Decision Making: Multiobjective Decision Analysis
with Spreadsheets. Duxbury Press, 1997.
[3] Hammond, J.S., Keeney, R.L., and Raiffa, H. Smart Choices: A Practical Guide
to Making Better Decisions. Harvard Business School Press, 1999.

Screenshots from Excel to Word


To copy Excel displays for the figures in this paper, choose File | Page Setup | Sheet |
Gridlines and File | Page Setup | Sheet | Row And Column Headings. Select the cell
range, hold down the Shift key, and in Excel's main menu choose Edit | Copy Picture | As
Shown When Printed. In Word, position the pointer in an empty paragraph and choose
Edit | Paste.
Part 2 Monte Carlo Simulation

Part 2 discusses Monte Carlo simulation which is useful for incorporating uncertainty
into spreadsheet what-if models.
Separate chapters describe simulation using standard Excel features and simulation using
the RiskSim simulation add-in for Excel.
Additional topics in this part include multi-period evaluation models, inventory decisions,
and queuing models.


Chapter 4  Introduction to Monte Carlo Simulation
4.1 INTRODUCTION

Figure 4.1 Conceptual Simulation as a Sample of Tree Endpoints

[Tree diagram: branches for Unit Price, Fixed Costs, Units Sold, and Unit Variable Cost leading to Net Cash Flow ($) endpoints; branch counts shown: 3 values, ~400 values, ~500 values, ~600,000 values.]

Figure 4.2 Probability Distributions for Sampling Tree Endpoints

[Tree diagram: same inputs as Figure 4.1 (Unit Price, Fixed Costs, Units Sold, Unit Variable Cost) leading to Net Cash Flow ($) endpoints, with probability distributions labeled Discrete, Normal, and Uniform for sampling the branches.]

Figure 4.3 Conceptual Simulation as Influence Chart with Repeated What-Ifs

[Influence chart: Unit Price ($29, Constant), Fixed Costs (Discrete), Units Sold (Normal), and Unit Variable Cost (Uniform) feeding into Net Cash Flow.]


Chapter 5  Uncertain Quantities
5.1 DISCRETE UNCERTAIN QUANTITIES
Discrete UQ: a few, distinct values
Assign probability mass to each value (probability mass function).
Contrast discrete UQs with continuous UQs. Continuous UQs have an infinite number of
values or so many distinct values that it is difficult to assign probability to each value.
Instead, for a continuous UQ we assign probability only to ranges of values.

5.2 CONTINUOUS UNCERTAIN QUANTITIES


Probability Density Functions and Cumulative Probability for Continuous Uncertain
Quantities
The total area under a probability density function equals one.
A portion of the area under a density function is a probability.
The height of a density function is not a probability.
The simplest probability density function is the uniform density function.

Case A: Uniform Density


The number of units of a new product that will be sold is an uncertain quantity.
What is the minimum quantity? “1000 units”
What is the maximum quantity? “5000 units”
Are any values in the range between 1000 and 5000 more likely than others?
“No”
Represent the uncertainty using a uniform density function.

Technical point: For a continuous UQ, P(X=x) = 0.


For a continuous UQ, probability is non-zero only for a range of values.
For convenience in computation and assessment, we may use a continuous UQ to
approximate a discrete UQ, and vice versa.
In Figure 5.1, the range of values is 5000 – 1000 = 4000 units, which is the base (width) of the
rectangular area under the uniform density function. The area of a rectangle is Base * Height,
and the area under the uniform density function in Figure 5.1 must equal 1. So, Height =
Area / Base = 1 / 4000 = 0.00025.

Figure 5.1 Uniform Density Function

[Chart: Probability Density, f(x), equals 0.00025 for Unit Sales, x, between 1000 and 5000 and 0 elsewhere; horizontal axis 0 to 6000.]

Figure 5.2 Cumulative Probability for Uniform Density

[Chart: Cumulative Probability, P(X<=x), rises linearly from 0 at 1000 units to 1 at 5000 units; horizontal axis Unit Sales, x, 0 to 6000.]

Both probability mass functions (for discrete UQs) and probability density functions (for
continuous UQs) have corresponding cumulative probability functions.
It is important to understand the relationship between a density function and its
cumulative probability function.
Cumulative probability can be expressed in four ways:
P(X<=x)   probability that UQ X is less than or equal to x     (inclusive left-tail)
P(X<x)    probability that UQ X is strictly less than x        (exclusive left-tail)
P(X>=x)   probability that UQ X is greater than or equal to x  (inclusive right-tail)
P(X>x)    probability that UQ X is strictly greater than x     (exclusive right-tail)
For continuous UQs the cumulative probability is the same for inclusive and exclusive.
P(X<=x) is the most common type.

Figure 5.2 is the cumulative probability function corresponding to the uniform density
function shown in Figure 5.1.
What is the probability that sales will be between 3,500 and 4,000 units?
P(3500<=X<=4000) = 0.125
P(3500<=X<=4000) = P(X<=4000) – P(X<=3500) = 0.750 – 0.625 = 0.125
Mathematical observation: The uniform density function is a constant; the corresponding
cumulative function (the integral of the constant function) is linear.
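The same probability can be computed directly from the cumulative function of the uniform density, F(x) = (x – 1000) / 4000 for x between 1000 and 5000. A short Python check (a sketch, not part of any workbook):

# Uniform density on 1000 to 5000 units: F(x) = (x - 1000) / 4000.
def F_uniform(x, low=1000.0, high=5000.0):
    return (x - low) / (high - low)

print(F_uniform(4000) - F_uniform(3500))   # 0.125, matching P(3500 <= X <= 4000) above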

Case B: Ramp Density


The number of units of a new product that will be sold is an uncertain quantity.
What is the minimum quantity? “1000 units”
What is the maximum quantity? “5000 units”
Are any values in the range between 1000 and 5000 more likely than others?
“Yes, values close to 5000 are much more likely than values close to 1000.”
Represent the uncertainty using a ramp density function.
The area of a triangle is Base * Height / 2, and the area under the ramp density function
in Figure 5.3 must equal 1. So, Height = 2 / Base. Here, the Base is 5000 – 1000 = 4000
units. Therefore, Height = 2 / 4000 = 0.0005.

Figure 5.3 Ramp Density Function

[Chart: Probability Density, f(x), rises linearly from 0 at 1000 units to 0.0005 at 5000 units; horizontal axis Unit Sales, x, 0 to 6000.]

Figure 5.4 Cumulative Probability for Ramp Density

[Chart: Cumulative Probability, P(X<=x), rises from 0 at 1000 units to 1 at 5000 units, increasingly steeply; horizontal axis Unit Sales, x, 0 to 6000.]

An important observation is that flatter portions of a cumulative probability function


correspond to ranges with low probability. Steeper portions of a cumulative probability
function correspond to ranges with high probability.
What is the probability that sales will be between 3,500 and 4,000 units?
P(3500<=X<=4000) = 0.171875
P(3500<=X<=4000) = P(X<=4000) – P(X<=3500) = 0.562500 – 0.390625 = 0.171875
The ramp density may not be appropriate for describing uncertainty in many situations,
but it is an important building block for the extremely useful triangular density function.
Mathematical observation: The ramp density function is linear; the corresponding
cumulative function (the integral of the linear function) is quadratic.
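For this ramp the cumulative probability up to x is the area of a triangle with base (x – 1000), which works out to F(x) = ((x – 1000) / 4000)^2; this reproduces the values used above. A short Python check (a sketch):

# Ramp density on 1000 to 5000 units, increasing toward 5000:
# F(x) = ((x - 1000) / 4000) ** 2 for 1000 <= x <= 5000.
def F_ramp(x, low=1000.0, high=5000.0):
    return ((x - low) / (high - low)) ** 2

print(F_ramp(4000) - F_ramp(3500))   # 0.171875, matching the calculation above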

Case C: Triangular Density


The number of units of a new product that will be sold is an uncertain quantity.
What is the minimum quantity? “1000 units”
What is the maximum quantity? “5000 units”
Are any values in the range between 1000 and 5000 more likely than others?
“Yes, values close to 4000 are more likely.”
Represent the uncertainty using a triangular density function.
The area of a triangle is Base * Height / 2, and the area under the triangular density
function in Figure 5.5 must equal 1. So, Height = 2 / Base. Here, the Base is 5000 – 1000
= 4000 units. Thus, Height = 2 / 4000 = 0.0005.

Figure 5.5 Triangular Density Function

[Chart: Probability Density, f(x), rises from 0 at 1000 units to 0.0005 at the mode of 4000 units and falls to 0 at 5000 units; horizontal axis Unit Sales, x, 0 to 6000.]

Figure 5.6 Cumulative Probability for Triangular Density

[Chart: Cumulative Probability, P(X<=x), rises from 0 at 1000 units to 1 at 5000 units, steepest near the mode of 4000 units; horizontal axis Unit Sales, x, 0 to 6000.]

Again, an important observation is that flatter portions of a cumulative probability


function correspond to ranges with low probability (the range close to 1000 and the range
close to 5000 in Figure 5.6). Steeper portions of a cumulative probability function
correspond to ranges with high probability (the range close to 4000).
What is the probability that sales will be between 3,500 and 4,000 units?
P(3500<=X<=4000) = 0.229167
P(3500<=X<=4000) = P(X<=4000) – P(X<=3500) = 0.750000 – 0.520833 = 0.229167
The triangular density function is extremely useful for describing uncertainty in many
situations. It requires only three inputs: minimum, mode (most likely value), and
maximum.
Mathematical observation: The triangular density function has two linear segments, i.e.,
piecewise linear; the corresponding cumulative function (the integral of each linear
function) is two quadratic segments, i.e., piecewise quadratic.
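Python's standard library happens to include a triangular generator, so the probability computed above can also be checked by sampling (an illustrative sketch; random.triangular takes the minimum, maximum, and mode):

# Monte Carlo check of P(3500 <= X <= 4000) for Triangular(min=1000, mode=4000, max=5000).
import random

random.seed(1)
trials = 100_000
hits = sum(3500 <= random.triangular(1000, 5000, 4000) <= 4000 for _ in range(trials))
print(hits / trials)   # approximately 0.229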
Chapter 6  Simulation Without Add-Ins
6.1 SIMULATION USING EXCEL FUNCTIONS
Figure 6.1 Display
A B C D E F G
1 Software Decision Analysis
2 RAND()
3 Unit Price $29
4 Units Sold 661 0.3502 Normal Mean = 700, StDev = 100
5 Unit Variable Cost $10.92 0.9832 Uniform Min = $6, Max = $11
6 Fixed Costs $12,000 0.7364 Discrete Value Probability Cumulative
7 $10,000 0.25 0.25
8 Net Cash Flow -$47 $12,000 0.50 0.75
9 $15,000 0.25 1.00

Figure 6.2 Formulas


A B C D E F G
1 Software Decision Analysis
2 RAND()
3 Unit Price 29
4 Units Sold =INT(NORMINV(C4,700,100)) =RAND() Normal Mean = 700, StDev = 100
5 Unit Variable Cost =6+5*C5 =RAND() Uniform Min = $6, Max = $11
6 Fixed Costs =IF(C6<0.25,10000,IF(C6<0.75,12000,15000)) =RAND() Discrete Value Probability Cumulative
7 10000 0.25 0.25
8 Net Cash Flow =B4*(B3-B5)-B6 12000 0.5 0.75
9 15000 0.25 1
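The Figure 6.2 formulas translate almost line for line into any language with a basic random number generator. Here is an illustrative Python analogue (not the workbook itself): NORMINV(RAND(),700,100) becomes a normal variate, 6+5*RAND() becomes a uniform variate on $6 to $11, and the nested IF becomes a three-way discrete choice.

# Python analogue of the Figure 6.2 formulas (one what-if trial per call).
import random

def one_trial():
    unit_price = 29                                   # constant
    units_sold = int(random.normalvariate(700, 100))  # like =INT(NORMINV(RAND(),700,100))
    unit_variable_cost = random.uniform(6, 11)        # like =6+5*RAND()
    u = random.random()                               # like =RAND() for the discrete input
    fixed_costs = 10000 if u < 0.25 else (12000 if u < 0.75 else 15000)
    return units_sold * (unit_price - unit_variable_cost) - fixed_costs

random.seed(123)
results = [one_trial() for _ in range(1000)]
print(f"mean net cash flow over 1000 trials = {sum(results) / len(results):,.0f}")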


Chapter 7  Monte Carlo Simulation Using RiskSim
7.1 RISKSIM OVERVIEW
RiskSim is a Monte Carlo Simulation add-in for Microsoft Excel (Excel 97 and later
versions) for Windows and Macintosh.
RiskSim provides random number generator functions as inputs for your model,
automates Monte Carlo simulation, and creates charts. Your spreadsheet model may
include various uncontrollable uncertainties as input assumptions (e.g., demand for a new
product, uncertain variable cost of production, competitor reaction), and you can use
simulation to determine the uncertainty associated with the model's output (e.g., annual
profit). RiskSim automates the simulation by trying hundreds of what-ifs consistent with
your assessment of the uncertainties.
To use RiskSim, you
(1) create a spreadsheet model
(2) optionally use SensIt to identify critical inputs
(3) enter one of RiskSim's eleven random number generator functions in each input cell of your model
(4) choose Tools | Risk Simulation from Excel's menu
(5) specify the model output cell and the number of what-if trials
(6) interpret RiskSim's histogram and cumulative distribution charts.
RiskSim facilitates Monte Carlo simulation by providing:
Eleven random number generator functions
Ability to set the seed for random number generation
Automatic repeated sampling for simulation
Frequency distribution of simulation results
Histogram and cumulative distribution charts

7.2 USING RISKSIM FUNCTIONS


RiskSim adds eleven random number generator functions to Excel. You can use these
functions as inputs to your model by typing in a worksheet cell or by using the Function
Wizard. From the Insert menu choose Function, or click the Function Wizard button.
RiskSim's functions are listed in a User Defined category. The eleven functions are:
RANDBINOMIAL(trials,probability_s)
RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12)
RANDCUMULATIVE(value_cumulative_table)
RANDDISCRETE(value_discrete_table)
RANDEXPONENTIAL(lambda)
RANDINTEGER(bottom,top)
RANDNORMAL(mean,standard_dev)
RANDPOISSON(mean)
RANDSAMPLE(population)
RANDTRIANGULAR(minimum,most_likely,maximum)
RANDUNIFORM(minimum,maximum)
RiskSim's RAND... functions include extensive error checking of arguments. After
verifying that the functions are working properly, you may want to substitute RiskSim's
FAST... functions which have minimal error checking and therefore run faster. From the
Edit menu choose Replace; in the Replace dialog box, type =RAND in the "Find What"
edit box, type =FAST in the "Replace with" edit box, and click the Replace All button.

7.3 UPDATING LINKS TO RISKSIM FUNCTIONS


When you insert a RiskSim random number generator function in a worksheet cell, the
function is linked to the disk location of the RiskSim xla file you are currently using.
During the current Excel session, the formula bar shows only the name of the RiskSim
function. But when you save and close the workbook, Excel saves the complete path to
the disk location of RiskSim function. For example, after closing and reopening the
workbook, the formula bar might show C:\MyAddIns\risk231p.xla\RandNormal(100,
10). This is standard behavior for Excel user defined functions like the ones contained in
the RiskSim xla file.
When you open the workbook, Excel looks for the RiskSim xla file using the saved path.
If Excel cannot find the RiskSim xla file at the saved path location (e.g., if you deleted

the RiskSim xla file from the C:\MyAddIns folder or if you opened the workbook on
another computer where the RiskSim xla file is not located at the same path), Excel
displays a dialog box like the one shown below.

Figure 7.3 Excel 2003 Warning To Update Links

If you see this dialog box or a similar warning when you open an Excel file, choose the
"Don't Update" option. The workbook will be opened, but any cell containing a reference
to a RiskSim function will display the #NAME? or similar error code.
To update the links after the workbook is open, be sure that a RiskSim xla file is open.
Then choose Edit | Links to see the dialog box shown below. (In this example the
workbook originally used functions from the RiskSim xla file located at
C:\middleton\risksim\risksim.xla.)

Figure 7.4 Excel 2003 Edit Links Dialog Box



To update the links, click the Change Source button. A file browser window will open,
where you can navigate to the RiskSim xla file that is open. After you select the file using
the file browser, click OK. Back in the Edit Links dialog box, click the Close button.
In Excel 2003 the Edit Links dialog box has a Startup Prompt button. To avoid possible
problems when Excel tries to automatically update links while a file is being opened, we
recommend the default "Let users choose to display the alert or not."

Figure 7.5 Excel 2003 Startup Prompt Dialog Box

7.4 MONTE CARLO SIMULATION


After specifying random number generator functions as inputs to your model, from the
Tools menu choose Risk Simulation | One Output.

Figure 7.6 RiskSim Dialog Box



Optionally, select the "Output Label Cell" edit box, and point or type a reference to a cell
containing the name of the model output (for example, a cell whose contents is the text
label "Net Profit").
Select the "Output Formula Cell" edit box, and point to a single cell on your worksheet or
type a cell reference. The output cell of your model must contain a formula that depends,
usually indirectly, on the model inputs determined by the random number generator
functions.
Select the "Random Number Seed" edit box, and type a number between zero and one. (If
you want to change the seed without performing a simulation, enter zero in the "Number
of iterations" edit box.)
Select the "Number Of Trials" edit box, and type an integer value (for example, 100 or
500). This value, sometimes called the sample size or number of iterations, specifies the
number of times the worksheet will be recalculated to determine output values of your
model.

7.5 RANDOM NUMBER SEED


The "Random Number Seed" edit box on the RiskSim dialog box allows you to set the
seed for RiskSim's random number generator functions. The seed must be an integer in
the range 1 through 2,147,483,647. RiskSim's random number generator functions
depend on RiskSim's own uniform random number function that is completely
independent of Excel's built-in RAND().
Random numbers generated by the computer are actually pseudo-random. The numbers
appear to be random, and they pass various statistical tests for randomness. But they are
actually calculated by an algorithm where each random number depends on the previous
random number. Such an algorithm generates a repeatable sequence. The seed specifies
where the algorithm starts in the sequence.
A Monte Carlo simulation model usually has uncontrollable inputs (uncertain quantities
using random number generator functions), controllable inputs (decision variables that
have fixed values for a particular set of simulation iterations), and an output variable (a
performance measure or operating characteristic of the system).
For example, a simple queuing system model may have an uncertain arrival pattern, a
controllable number of servers, and total cost (waiting time plus server cost) as output. To
evaluate a different number of servers, you would specify the same seed before
generating the uncertain arrivals. Then the variation in total cost should depend on the
different number of servers, not on the particular sequence of random numbers that
generates the arrivals.
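The value of reusing a seed is easy to see in a small sketch. The Python example below uses a made-up single-equation service model (it is not a RiskSim workbook or a full queuing model): because each run restarts from the same seed, all three server counts face exactly the same sequence of uncertain demands, so differences in average cost are due only to the decision variable.

# Common random numbers: reuse the seed so that the alternatives face the same demands.
import random

def simulate(num_servers, seed, trials=1000):
    random.seed(seed)                      # same seed -> same sequence of demands
    total_cost = 0.0
    for _ in range(trials):
        demand = random.normalvariate(100, 20)          # hypothetical uncertain load
        waiting = max(demand - 25 * num_servers, 0)     # hypothetical waiting measure
        total_cost += 10 * waiting + 50 * num_servers   # waiting cost plus server cost
    return total_cost / trials

for servers in (3, 4, 5):
    print(servers, round(simulate(servers, seed=12345), 1))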

7.6 ONE-OUTPUT EXAMPLE


In this example the decision maker has described his subjective uncertainty using normal,
triangular, and discrete probability distributions.

Figure 7.7 One-Output Example Model Display


A B C D E F G H
1 Software Decision Analysis
2
3 Unit Price $29 Price is controllable and constant.
4 Units Sold 739 Normal Mean = 700, StDev = 100
5 Unit Variable Cost $8.05 Triangular Min = $6, Mode = $8, Max = $11
6 Fixed Costs $12,000 Discrete Value Probability
7 $10,000 0.25
8 Net Cash Flow $3,485 $12,000 0.50
9 $15,000 0.25

Figure 7.8 One-Output Example Model Formulas


A B
1 Software Decision Analysis
2
3 Unit Price $29
4 Units Sold =INT(RANDNORMAL(700,100))
5 Unit Variable Cost =RANDTRIANGULAR(6,8,11)
6 Fixed Costs =RANDDISCRETE(E7:F9)
7
8 Net Cash Flow =B4*(B3-B5)-B6

Figure 7.9 RiskSim Dialog Box for One-Output Example



7.7 RISKSIM OUTPUT FOR ONE-OUTPUT EXAMPLE


When you click the Simulate button, RiskSim creates a new worksheet in your Excel
workbook named "RiskSim Summary 1." A summary of your inputs and the output is
shown in cells L1:R9 with the accompanying histogram and cumulative distribution
charts.

Figure 7.10 RiskSim Summary Output for One-Output Example


L M N O P Q R
1 RiskSim 2.31 Pro Mean $2,335
2 Date (current date) St. Dev. $2,800
3 Time (current time) Mean St. Error $89
4 Workbook risksamp.xls Minimum -$6,288
5 Worksheet Simulation First Quartile $523
6 Output Cell $B$8 Median $2,470
7 Output Label Net Cash Flow Third Quartile $4,157
8 Seed 1 Maximum $12,838
9 Trials 1000 Skewness -0.1133
[Histogram chart "RiskSim 2.31 Pro - Histogram": Frequency (0 to 180) versus Net Cash Flow (-$8,000 to $14,000).]
[Cumulative chart "RiskSim 2.31 Pro - Cumulative Chart": Cumulative Probability (0.0 to 1.0) versus Net Cash Flow (-$8,000 to $14,000).]

The histogram is based on the frequency distribution in columns I:J. The cumulative
distribution is based on the sorted output values in column C and the cumulative
probabilities in column D.

Figure 7.11 RiskSim Numerical Output for One-Output Example


A B C D E F G H I J
1 Trial Net Cash Flow Sorted Cumulative Percent Percentile Upper Limit Frequency
2 1 $1,594 -$6,288 0.0005 0% -$6,288 -$8,000 0
3 2 $1,593 -$6,239 0.0015 5% -$2,324 -$7,000 0
4 3 $1,533 -$5,635 0.0025 10% -$1,465 -$6,000 2
5 4 $7,480 -$5,213 0.0035 15% -$699 -$5,000 2
6 5 $5,968 -$4,831 0.0045 20% $62 -$4,000 11
7 6 $1,862 -$4,601 0.0055 25% $523 -$3,000 18
8 7 -$1,677 -$4,588 0.0065 30% $1,009 -$2,000 34
9 8 $2,727 -$4,487 0.0075 35% $1,336 -$1,000 54
10 9 $6,167 -$4,420 0.0085 40% $1,625 $0 77
11 10 $4,740 -$4,336 0.0095 45% $2,035 $1,000 101
12 11 $1,783 -$4,298 0.0105 50% $2,470 $2,000 146
13 12 $904 -$4,285 0.0115 55% $2,897 $3,000 126
14 13 $1,518 -$4,243 0.0125 60% $3,216 $4,000 155
15 14 $1,596 -$4,116 0.0135 65% $3,544 $5,000 110
16 15 $1,536 -$4,113 0.0145 70% $3,805 $6,000 73
17 16 -$701 -$3,954 0.0155 75% $4,157 $7,000 52
18 17 -$414 -$3,951 0.0165 80% $4,615 $8,000 21
19 18 $783 -$3,906 0.0175 85% $5,168 $9,000 8
20 19 $5,087 -$3,849 0.0185 90% $5,777 $10,000 9
21 20 $2,804 -$3,793 0.0195 95% $6,680 $11,000 0
22 21 $1,869 -$3,757 0.0205 100% $12,838 $12,000 0
23 22 $1,402 -$3,719 0.0215 $13,000 1
24 23 $2,120 -$3,608 0.0225 $14,000 0
25 24 $7,783 -$3,591 0.0235 0
26 25 $704 -$3,548 0.0245
27 26 $5,471 -$3,485 0.0255
28 27 $4,743 -$3,403 0.0265

The cumulative probabilities start at 1/(2*N), where N is the number of trials, and
increase by 1/N. The rationale is that the lowest ranked output value of the sampled
values is an estimate of the population's values in the range from 0 to 1/N, and the lowest
ranked value is associated with the median of that range.
Column B contains the original sampled output values.
Columns F:G show percentiles based on Excel's PERCENTILE worksheet function.
Refer to Excel's online help for the interpolation method used by the PERCENTILE
function.
The summary measures in columns Q:R are also based on Excel worksheet functions:
AVERAGE, STDEV, QUARTILE, and SKEW.
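The plotting positions and summary measures can be reproduced outside RiskSim. The Python sketch below is illustrative; the short list of output values is just the first few trials from column B of Figure 7.11, so its summary statistics will not match the full 1000-trial run.

# Cumulative plotting positions and summary measures for a list of simulated outputs.
import statistics

outputs = [1594, 1593, 1533, 7480, 5968, 1862, -1677, 2727]   # first eight trials only
n = len(outputs)
sorted_vals = sorted(outputs)
plot_positions = [(i + 0.5) / n for i in range(n)]    # starts at 1/(2N), increases by 1/N

for value, p in zip(sorted_vals, plot_positions):
    print(f"P(X <= {value:>6}) is plotted at {p:.4f}")

print("mean   ", round(statistics.mean(outputs), 1))
print("st.dev.", round(statistics.stdev(outputs), 1))
print("median ", statistics.median(outputs))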

7.8 CUSTOMIZING RISKSIM CHARTS


If the labels on the horizontal axis are numbers with many digits, some of the labels may
wrap around so that some of the digits display below the others. One way to remedy this
anomaly is to widen the chart (click just inside the outer border of the chart so that eight
chart handles are shown and then drag the middle chart handle on the left or right to

widen the chart). Another way is to select the horizontal axis (click between the labels on
the horizontal axis so that "Value (X) axis" appears in the name box in the upper left of
Excel) and change to a smaller font size using the Font Size drop-down edit box on the
Formatting toolbar.
The histogram chart is a combination chart using a column chart type for the vertical bars
and an XY (Scatter) chart type for the horizontal axis. The two chart types align properly
as long as the horizontal axis retains the same minimum and maximum values.
For example, if you want more spacing between the dollar labels on the horizontal axis,
select the horizontal axis (so that "Value (X) axis" appears in the name box in the upper
left of Excel), choose Format | Selected Axis | Scale, and change the "Major unit" from
2000 to 4000. Do not change the Minimum = –8000 or the Maximum = 14000. The
histogram appears as shown below.

Figure 7.12 Original Histogram With Modified Horizontal Axis Major Unit

[Histogram chart "RiskSim 2.31 Pro - Histogram": Frequency (0 to 160) versus Net Cash Flow, with horizontal axis labels every $4,000 from -$8,000 to $12,000.]

The cumulative chart is a standard XY (Scatter) chart type, so you can change the major
unit as described above, but you can also change the minimum and maximum without
affecting the integrity of the chart.
Another way to obtain more spacing on the horizontal axis of the histogram or
cumulative chart is to use a custom format. For example, if you want to show values in
thousands instead of the original units, select the horizontal axis (click between the labels
on the horizontal axis so that "Value (X) axis" appears in the name box in the upper left
of Excel), choose Format | Selected Axis | Number | Custom, and enter a comma at the
end of the current format shown in the "Type:" edit box. After changing the original

format "$#,##0" to "$#,##0," and modifying the horizontal axis title, the cumulative chart
appears as shown below.

Figure 7.13 Original Cumulative Chart With Horizontal Axis Custom Format

[Cumulative chart "RiskSim 2.31 Pro - Cumulative Chart": Cumulative Probability (0.0 to 1.0) versus Net Cash Flow, in thousands of dollars (-$8 to $14).]

7.9 RANDOM NUMBER GENERATOR FUNCTIONS


RandBinomial
Returns a random value from a binomial distribution. The binomial distribution can
model a process with a fixed number of trials where the outcome of each trial is a success
or failure, the trials are independent, and the probability of success is constant.
RANDBINOMIAL counts the total number of successes for the specified number of
trials. If n is the number of trials, the possible values for RANDBINOMIAL are the non-
negative integers 0,1,...,n.
RANDBINOMIAL Syntax: RANDBINOMIAL(trials,probability_s)
Trials (often denoted n) is the number of independent trials.
Probability_s (often denoted p) is the probability of success on each trial.
RANDBINOMIAL Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if an argument is text and the name is undefined.

Returns #NUM! if trials is non-integer or less than one, or probability_s is less than zero
or more than one.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDBINOMIAL Example
A salesperson makes ten unsolicited calls per day, where the probability of making a sale
on each call is 30 percent. The uncertain total number of sales in one day is
=RANDBINOMIAL(10,0.3)
RANDBINOMIAL Related Function
FASTBINOMIAL: Same as RANDBINOMIAL without any error checking of the
arguments.
CRITBINOM(trials,probability_s,RAND()): Excel's inverse of the cumulative binomial,
or CRITBINOM(trials,probability_s,RANDUNIFORM(0,1)) to use the RiskSim Seed
feature.
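The same uncertain quantity can be generated in other environments by summing Bernoulli trials. An illustrative Python sketch of the salesperson example (this is not RiskSim's implementation):

# Number of sales in ten calls, each with a 30 percent chance of success.
import random

def rand_binomial(trials, probability_s):
    return sum(random.random() < probability_s for _ in range(trials))

random.seed(7)
print(rand_binomial(10, 0.3))                                      # one day's uncertain sales count
print(sum(rand_binomial(10, 0.3) for _ in range(10000)) / 10000)   # averages near 3.0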

RandBiVarNormal
Returns two random values from a bivariate normal distribution with a specified
correlation.
To use this random number generator function, select two adjacent cells on the
worksheet. Type =RANDBIVARNORMAL followed by numerical values for the five
arguments or references to cells containing the values, separated by commas, enclosed in
starting and ending parentheses. After typing the ending parentheses, do not press Enter.
Instead, hold down the Control and Shift keys while you press Enter, thus "array
entering" the function.
Syntax:
RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12)
Returns #REF! if the array function is not entered into two adjacent cells.
Returns #NUM! if a standard deviation is negative or the correlation is outside the range
between -1 and +1.
Returns #VALUE! if an argument is not numeric.
Example: Select two adjacent cells, type
=RANDBIVARNORMAL(100,10,50,5,0.5)
Hold down Control and Shift while you press Enter.
7.9 Random Number Generator Functions 79

RandCumulative
Returns a random value from a piecewise-linear cumulative distribution. This function
can model a continuous-valued uncertain quantity, X, by specifying points on its
cumulative distribution. Each point is specified by a possible value, x, and a
corresponding left-tail cumulative probability, P(X<=x). Random values are based on
linear interpolation between the specified points.
RANDCUMULATIVE Syntax: RANDCUMULATIVE(value_cumulative_table)
Value_cumulative_table must be a reference, or the defined name of a reference, for a
two-column range, with values in the left column and corresponding cumulative
probabilities in the right column.
RANDCUMULATIVE Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if the argument is text and the name is undefined.
Returns #NUM! if the first (top) cumulative probability is not zero, if the last (bottom)
cumulative probability is not one, or if the values or cumulative probabilities are not in
ascending order.
Returns #REF! if the number of columns in the table reference is not two.
Returns #VALUE! if the argument is not a reference, if the argument is a defined name
but not for a reference, or if any cell of the table contains text or is blank.
RANDCUMULATIVE Example
A corporate planner thinks that the minimum possible market demand is 1000 units, the
median is 5000, and the maximum possible is 9000. Also, there is a ten percent chance that demand
will be less than 4000 and a ten percent chance it will exceed 7000. The values, x, and
cumulative probabilities, P(X<=x), are entered into spreadsheet cells A1:B5.

Figure 7.14 RandCumulative Example Spreadsheet Data

The function is entered into another cell: =RANDCUMULATIVE(A1:B5)
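
The interpolation idea can also be sketched in a few lines of Python (illustrative only; error checking is omitted, and the function name is just for this example). The demand_table below restates the A1:B5 data from the example:

    import random

    def rand_cumulative(table):
        # table: (value, cumulative probability) pairs, probabilities ascending
        # from 0 to 1; interpolate linearly on the bracketing segment.
        r = random.random()
        for (x0, p0), (x1, p1) in zip(table, table[1:]):
            if p0 <= r <= p1:
                return x0 + (r - p0) / (p1 - p0) * (x1 - x0)
        return table[-1][0]

    demand_table = [(1000, 0.0), (4000, 0.1), (5000, 0.5), (7000, 0.9), (9000, 1.0)]
    print(rand_cumulative(demand_table))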


RANDCUMULATIVE Related Function
FASTCUMULATIVE: Same as RANDCUMULATIVE without any error checking of the arguments.

Figure 7.15 RandCumulative Example Probability Density Function

[Chart: Probability Density, f(x), on the vertical axis (0 to 0.0005); Market Demand, x, in units, on the horizontal axis (0 to 10,000).]

Figure 7.16 RandCumulative Example Cumulative Probability Function

[Chart: Cumulative Probability, P(X<=x), on the vertical axis (0 to 1); Market Demand, x, in units, on the horizontal axis (0 to 10,000).]

RandDiscrete
Returns a random value from a discrete probability distribution. This function can model
a discrete-valued uncertain quantity, X, by specifying its probability mass function. The
7.9 Random Number Generator Functions 81

function is specified by each possible discrete value, x, and its corresponding probability,
P(X=x).
RANDDISCRETE Syntax: RANDDISCRETE(value_discrete_table)
Value_discrete_table must be a reference, or the defined name of a reference, for a two-
column range, with values in the left column and corresponding probability mass in the
right column.
RANDDISCRETE Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if the argument is text and the name is undefined.
Returns #NUM! if a probability is negative or if the probabilities do not sum to one.
Returns #REF! if the number of columns in the table reference is not two.
Returns #VALUE! if the argument is not a reference, if the argument is a defined name
but not for a reference, or if any cell of the table contains text or is blank.
RANDDISCRETE Example
A corporate planner thinks that uncertain market demand, X, can be approximated by
three possible values and their associated probabilities: P(X=3000) = 0.3, P(X=4000) =
0.6, and P(X=5000) = 0.1. The values and probabilities are entered into spreadsheet cells
A1:B3.

Figure 7.17 RandDiscrete Example Spreadsheet Data

The function is entered into another cell: =RANDDISCRETE(A1:B3)


RANDDISCRETE Related Function
FASTDISCRETE: Same as RANDDISCRETE without any error checking of the
arguments.
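
A minimal Python sketch of the same idea (illustrative only, not RiskSim's code) draws a value by accumulating the probabilities until they pass a uniform random number; the table restates the A1:B3 data from the example:

    import random

    def rand_discrete(table):
        # table: (value, probability) pairs whose probabilities sum to one.
        r, cumulative = random.random(), 0.0
        for value, probability in table:
            cumulative += probability
            if r <= cumulative:
                return value
        return table[-1][0]   # guard against floating-point round-off

    demand_table = [(3000, 0.3), (4000, 0.6), (5000, 0.1)]
    print(rand_discrete(demand_table))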
Figure 7.18 RandDiscrete Example Probability Mass Function

[Chart: probability mass function, with probabilities on the vertical axis (0 to 0.7) and Market Demand, x, in units, on the horizontal axis (0 to 7,000).]

Figure 7.19 RandDiscrete Example Cumulative Probability Function

[Chart: Cumulative Probability, P(X<=x), on the vertical axis (0 to 1); Market Demand, x, in units, on the horizontal axis (0 to 7,000).]

RandExponential
Returns a random value from an exponential distribution. This function can model the
uncertain time interval between successive arrivals at a queuing system or the uncertain
time required to serve a customer.
RANDEXPONENTIAL Syntax: RANDEXPONENTIAL(lambda)
Lambda is the mean number of occurrences per unit of time.
RANDEXPONENTIAL Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if the argument is text and the name is undefined.
Returns #NUM! if lambda is negative or zero.
Returns #VALUE! if the argument is a defined name of a cell and the cell is blank or
contains text.
RANDEXPONENTIAL Examples
Cars arrive at a toll plaza with a mean rate of 3 cars per minute. The uncertain time
between successive arrivals, measured in minutes, is =RANDEXPONENTIAL(3). The
average value returned by repeated recalculation of RANDEXPONENTIAL(3) is 0.333.
A bank teller requires an average of two minutes to serve a customer. The uncertain
customer service time, measured in minutes, is =RANDEXPONENTIAL(0.5). The
average value returned by repeated recalculation of RANDEXPONENTIAL(0.5) is 2.
RANDEXPONENTIAL Related Functions
FASTEXPONENTIAL: Same as RANDEXPONENTIAL without any error checking of
the arguments.
−LN(RAND())/lambda: Excel's inverse of the exponential, or
−LN(RANDUNIFORM(0,1))/lambda to use the RiskSim Seed feature.
RANDPOISSON: Counts number of occurrences for a Poisson process.

RandInteger
Returns a uniformly distributed random integer between two integers you specify.
RANDINTEGER Syntax: RANDINTEGER(bottom,top)
Bottom is the smallest integer RANDINTEGER will return.
Top is the largest integer RANDINTEGER will return.
RANDINTEGER Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if an argument is text and the name is undefined.
Returns #NUM! if top is less than or equal to bottom.
Returns #VALUE! if bottom or top is not an integer or if an argument is a defined name
of a cell and the cell is blank or contains text.
RANDINTEGER Example
The number of orders a particular customer will place next year is between 7 and 11, with
no number more likely than the others. The uncertain number of orders is
=RANDINTEGER(7,11).
RANDINTEGER Related Functions
FASTINTEGER: Same as RANDINTEGER without any error checking of the
arguments.
RANDBETWEEN(bottom,top): Excel’s function for uniformly distributed integers,
without RiskSim’s capability of setting the seed.

RandNormal
Returns a random value from a normal distribution. This function can model a variety of
phenomena where the values follow the familiar bell-shaped curve, and it has wide
application in statistical quality control and statistical sampling.
RANDNORMAL Syntax: RANDNORMAL(mean,standard_dev)
Mean is the arithmetic mean of the normal distribution.
Standard_dev is the standard deviation of the normal distribution.
RANDNORMAL Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if an argument is text and the name is undefined.
Returns #NUM! if standard_dev is negative.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDNORMAL Example
The total market for a product is approximately normally distributed with mean 60,000
units and standard deviation 5,000 units. The uncertain total market is
=RANDNORMAL(60000,5000).
RANDNORMAL Related Functions
FASTNORMAL: Same as RANDNORMAL without any error checking of the
arguments.
NORMINV(RAND(),mean,standard_dev): Excel's inverse of the normal, or
NORMINV(RANDUNIFORM(0,1),mean,standard_dev) to use the RiskSim Seed
feature.
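
Section 7.10 notes that RANDNORMAL uses the Box-Muller method. A minimal Python sketch of that method (illustrative only, not RiskSim's code) is:

    import math, random

    def rand_normal(mean, standard_dev):
        # Box-Muller transform: two uniform values produce one standard normal value.
        u1 = 1.0 - random.random()         # avoid log(0)
        u2 = random.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        return mean + standard_dev * z

    print(rand_normal(60000, 5000))        # the total-market example above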
RandSample
Returns a random sample without replacement from a population.
To use this random number generator function, select a number of cells equal to the
sample size, either in a single column or in a single row. Type =RANDSAMPLE
followed by a reference to the cells containing the population values, enclosed in
parentheses. After typing the closing parenthesis, do not press Enter. Instead, hold down
the Control and Shift keys while you press Enter, thus "array entering" the function.
Syntax: RANDSAMPLE(population)
The population argument is a reference to a range of values in a single column.
Returns #N/A if the population range is not part of a single column.
Returns #REF! if the function is not entered into two adjacent cells.
Example: Type population values into cells A1:A5. For a sample of size 3, select cells
B1:B3, and type =RANDSAMPLE(A1:A5) but don't press Enter. Hold down Control and
Shift while you press Enter.
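
Sampling without replacement is also easy to mimic outside Excel; Python's standard library provides it directly (illustrative only, not RiskSim's implementation):

    import random

    population = [10, 20, 30, 40, 50]          # e.g., values typed into A1:A5
    sample = random.sample(population, 3)      # three values, no repeats
    print(sample)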

RandPoisson
Returns a random value from a Poisson distribution. This function can model the
uncertain number of occurrences during a specified time interval, for example, the
number of arrivals at a service facility during an hour. The possible values of
RANDPOISSON are the non-negative integers, 0, 1, 2, ... .
RANDPOISSON Syntax: RANDPOISSON(mean)
Mean is the mean number of occurrences per unit of time.
RANDPOISSON Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if the argument is text and the name is undefined.
Returns #NUM! if mean is negative or zero.
Returns #VALUE! if mean is a defined name of a cell and the cell is blank or contains
text.
RANDPOISSON Examples
Cars arrive at a toll plaza with a mean rate of 3 cars per minute. The uncertain number of
arrivals in a minute is =RANDPOISSON(3). The average value returned by repeated
recalculation of RANDPOISSON(3) is 3.
A bank teller requires an average of two minutes to serve a customer. The uncertain
number of customers served in a minute is =RANDPOISSON(0.5). The average value
returned by repeated recalculation of RANDPOISSON(0.5) is 0.5.
RANDPOISSON Related Functions
FASTPOISSON: Same as RANDPOISSON without any error checking of the arguments.
RANDEXPONENTIAL: Describes time between occurrences for a Poisson process.
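
Section 7.10 explains that the Poisson generator walks up the cumulative Poisson probabilities until they pass a uniform random value. A minimal Python sketch of that idea (illustrative only, not RiskSim's code) is:

    import math, random

    def rand_poisson(mean):
        # Inverse-transform method using the Poisson recurrence P(k) = P(k-1)*mean/k.
        r = random.random()
        k, p = 0, math.exp(-mean)
        cumulative = p
        while cumulative < r:
            k += 1
            p *= mean / k
            cumulative += p
        return k

    # The toll-plaza example: the long-run average of rand_poisson(3) is about 3.
    print(sum(rand_poisson(3) for _ in range(100000)) / 100000)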

RandTriangular
Returns a random value from a triangular probability density function. This function can
model an uncertain quantity where the most likely value (mode) has the largest
probability of occurrence, the minimum and maximum possible values have essentially
zero probability of occurrence, and the probability density function is linear between the
minimum and the mode and between the mode and the maximum. This function can also
model a ramp density function where the minimum equals the mode or the mode equals
the maximum.
RANDTRIANGULAR Syntax:
RANDTRIANGULAR(minimum,most_likely,maximum)
Minimum is the smallest value RANDTRIANGULAR will return.
Most_likely is the most likely value RANDTRIANGULAR will return.
Maximum is the largest value RANDTRIANGULAR will return.
RANDTRIANGULAR Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if an argument is text and the name is undefined.
Returns #NUM! if minimum is greater than or equal to maximum, if most_likely is less
than minimum, or if most_likely is greater than maximum.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDTRIANGULAR Example
The minimum time required to complete a particular task that is part of a large project is
4 hours, the most likely time required is 6 hours, and the maximum time required is 10
hours.
The function returning the uncertain time required for the task is entered into a cell:
=RANDTRIANGULAR(4,6,10).
RANDTRIANGULAR Related Function


FASTTRIANGULAR: Same as RANDTRIANGULAR without any error checking of the arguments.
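
Because the triangular density has two linear segments, its cumulative distribution has two quadratic segments that can be inverted directly (see Section 7.10). A minimal Python sketch of that inversion (illustrative only; Python's built-in random.triangular does the same job) is:

    import math, random

    def rand_triangular(minimum, most_likely, maximum):
        # Invert the two quadratic segments of the triangular cumulative distribution.
        r = random.random()
        span = maximum - minimum
        cut = (most_likely - minimum) / span   # cumulative probability at the mode
        if r <= cut:
            return minimum + math.sqrt(r * span * (most_likely - minimum))
        return maximum - math.sqrt((1.0 - r) * span * (maximum - most_likely))

    print(rand_triangular(4, 6, 10))           # the task-time example above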

Figure 7.20 RandTriangular Example Probability Density Function

[Chart: Probability Density, f(x), on the vertical axis (0 to 0.6); Task Time, x, in hours, on the horizontal axis (0 to 10).]

Figure 7.21 RandTriangular Example Cumulative Probability Function

[Chart: Cumulative Probability, P(X<=x), on the vertical axis (0 to 1); Task Time, x, in hours, on the horizontal axis (0 to 10).]

RandUniform
Returns a uniformly distributed random value between two values you specify. As a
special case, RANDUNIFORM(0,1) is the same as Excel's built-in RAND() function.
RANDUNIFORM Syntax: RANDUNIFORM(minimum,maximum)
Minimum is the smallest value RANDUNIFORM will return.
Maximum is the largest value RANDUNIFORM will return.
RANDUNIFORM Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if an argument is text and the name is undefined.
Returns #NUM! if minimum is greater than or equal to maximum.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDUNIFORM Example
A corporate planner thinks that the company's product will garner between 10% and 15%
of the total market, with all possible percentages equally likely in the specified range. The
uncertain market proportion is =RANDUNIFORM(0.10,0.15).
RANDUNIFORM Related Function
FASTUNIFORM: Same as RANDUNIFORM without any error checking of the
arguments.

7.10 RISKSIM TECHNICAL DETAILS


RiskSim's random number generator functions are based on a uniformly distributed
random number function called RandSeed which is not directly accessible by the user.
RandSeed returns a random value x in the range 0<x<=1. Internally, decimal values for
RandSeed are calculated by dividing a uniformly distributed random integer by
2,147,483,647, which is RandSeed's period. Random integers in the range 1 through
2,147,483,647 are generated using the well-documented Park-Miller algorithm, where
each random integer depends on the previous random integer.
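
As an illustration, a generator of this kind can be written in a few lines of Python. The multiplier 16807 is the classic published Park-Miller constant and is an assumption here, since the text names only the algorithm, not its constants:

    def park_miller(seed, count):
        # Each integer is 16807 times the previous one, modulo 2,147,483,647;
        # dividing by the modulus gives a uniform decimal value between 0 and 1.
        x = seed
        for _ in range(count):
            x = (16807 * x) % 2147483647
            yield x / 2147483647.0

    print(list(park_miller(seed=12345, count=3)))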
When RiskSim starts, the initial integer seed depends on the system clock. Unlike Excel's
RAND() function, you can use RiskSim at any time to specify an integer seed in the
range 1 through 2,147,483,647, which is used as the previous random integer for the
sequence of random numbers generated by the RiskSim functions.
In the Risk Simulation dialog box, the "Random number seed" edit box changes the seed
only for the RiskSim functions; it does not have any effect on Excel's built-in RAND()
function.
Each of RiskSim's random number generator functions uses RandSeed as a building block.
RANDBINOMIAL(trials,probability_s) uses RandSeed as the cumulative probability in
Excel's built-in CRITBINOM function.
RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12) uses two values of
RandNormal to obtain correlated normal values.
RANDCUMULATIVE(value_cumulative_table) uses the value of RandSeed, R; it searches
for the adjacent cumulative probabilities that bracket R and interpolates on the linear
segment of the cumulative distribution to find the corresponding value.
RANDDISCRETE(value_discrete_table) compares RandSeed with summed probabilities
of the input table until the sum exceeds the RandSeed value, and then returns the previous
value from the input table.
RANDEXPONENTIAL(lambda) uses the value of RandSeed, R, as follows. If the
exponential density function is f(t) = lambda*EXP(-lambda*t), the cumulative is P(T<=t)
= 1 - EXP(-lambda*t). Associating R with P(T<=t), the inverse cumulative is t = -LN(1-
R)/lambda. Since R and 1-R are both uniformly distributed between 0 and 1, RiskSim
uses -LN(R)/lambda for the returned value.
RANDINTEGER(bottom,top) returns bottom + INT(RandSeed*(top-bottom+1)).
RANDNORMAL(mean,standard_dev) uses two RandSeed values in the well-
documented Box-Muller method.
RANDPOISSON(mean) compares RandSeed with cumulative probabilities of Excel's
built-in POISSON function until the probability exceeds the RandSeed value, and then
returns the previous value.
RANDSAMPLE(population) uses RandSeed for each of the cells that were selected when
the function was array-entered, avoiding population values that have already been
selected, thus providing sampling without replacement.
RANDTRIANGULAR(minimum,most_likely,maximum) uses RandSeed once. The
triangular density function has two linear segments, so the cumulative distribution has
two quadratic segments. The returned value is determined by interpolation on the
appropriate quadratic segment.
RANDUNIFORM(minimum,maximum) returns minimum + RandSeed*(maximum-
minimum). RANDUNIFORM(0,1) is equivalent to Excel's built-in RAND() function.
RiskSim includes a FAST... version of each of the nine functions, e.g., FASTBINOMIAL,
FASTCUMULATIVE, etc. The FAST... functions are identical to the RAND... functions
except there is no error checking of arguments.

7.11 MODELING UNCERTAIN RELATIONSHIPS


Base Model, Four Inputs
Price is fixed. The three uncontrollable inputs are independent.

Figure 7.22 Four Inputs Influence Chart

[Influence chart: Price, Fixed Costs, Units Sold, and Unit Variable Cost each feed into Net Cash Flow.]
Figure 7.23 Four Inputs Display


A B
1 Controllable Input
2 Price $29
3 Uncontrollable Inputs
4 Fixed Costs $12,000
5 Units Sold 700
6 Unit Variable Cost $8
7 Output Variable
8 Net Cash Flow $2,700
Figure 7.24 Four Inputs Formulas


A B
1 Controllable Input
2 Price 29
3 Uncontrollable Inputs
4 Fixed Costs 12000
5 Units Sold 700
6 Unit Variable Cost 8
7 Output Variable
8 Net Cash Flow =(B2-B6)*B5-B4

Three Inputs
Price is variable. Units sold depends on price. The two cost inputs are independent.

Figure 7.25 Three Inputs Influence Chart

[Influence chart: Price, Fixed Costs, and Unit Variable Cost are inputs; Units Sold is an intermediate variable that depends on Price; together they determine Net Cash Flow.]

Figure 7.26 Three Inputs Display


A B C D E
1 Controllable Input Price Units Sold
2 Price $29 $29 700
3 Uncontrollable Inputs $39 550
4 Fixed Costs $12,000 $49 400
5 Unit Variable Cost $8 $59 250
6 Intermediate Variable
7 Units Sold 700 Slope -15
8 Output Variable Intercept 1135
9 Net Cash Flow $2,700
Figure 7.27 Three Inputs Formulas


A B C D E
1 Controllable Input Price Units Sold
2 Price 29 29 700
3 Uncontrollable Inputs 39 550
4 Fixed Costs 12000 49 400
5 Unit Variable Cost 8 59 250
6 Intermediate Variable
7 Units Sold =E8+E7*B2 Slope =SLOPE(E2:E5,D2:D5)
8 Output Variable Intercept =INTERCEPT(E2:E5,D2:D5)
9 Net Cash Flow =(B2-B5)*B7-B4
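
The SLOPE and INTERCEPT formulas above fit a straight line through the four (price, units sold) points, and the model then uses that line to compute units sold from price. A short Python restatement of the same calculation (illustrative only; the worksheet remains the working model) is:

    prices = [29, 39, 49, 59]
    units_sold_data = [700, 550, 400, 250]

    # Least-squares line through the points (here an exact fit): slope -15, intercept 1135.
    n = len(prices)
    mean_p, mean_u = sum(prices) / n, sum(units_sold_data) / n
    slope = (sum((p - mean_p) * (u - mean_u) for p, u in zip(prices, units_sold_data))
             / sum((p - mean_p) ** 2 for p in prices))
    intercept = mean_u - slope * mean_p

    price, fixed_costs, unit_variable_cost = 29, 12000, 8
    units_sold = intercept + slope * price                 # cell B7
    net_cash_flow = (price - unit_variable_cost) * units_sold - fixed_costs
    print(slope, intercept, units_sold, net_cash_flow)     # -15, 1135, 700, 2700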

Two Inputs
Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.

Figure 7.28 Two Inputs Influence Chart

[Influence chart: Price and Fixed Costs are inputs; Units Sold depends on Price, and Unit Variable Cost depends on Fixed Costs; together they determine Net Cash Flow.]


Figure 7.29 Two Inputs Display


A B C D E
1 Controllable Input Price Units Sold
2 Price $29 $29 700
3 Uncontrollable Inputs $39 550
4 Fixed Costs $12,000 $49 400
5 Intermediate Variable $59 250
6 Unit Variable Cost $8.00
7 Units Sold 700 Slope -15
8 Output Variable Intercept 1135
9 Net Cash Flow $2,700
10
11 Fixed Costs Unit Variable Cost
12 $10,000 $11
13 $12,000 $8
14 $15,000 $6
15
16 a 0.000000166667
17 b -0.005166666667
18 c 46

Figure 7.30 Two Inputs Formulas


A B C D E
1 Controllable Input Price Units Sold
2 Price 29 29 700
3 Uncontrollable Inputs 39 550
4 Fixed Costs 12000 49 400
5 Intermediate Variable 59 250
6 Unit Variable Cost =E16*B4^2+E17*B4+E18
7 Units Sold =E8+E7*B2 Slope =SLOPE(E2:E5,D2:D5)
8 Output Variable Intercept =INTERCEPT(E2:E5,D2:D5)
9 Net Cash Flow =(B2-B6)*B7-B4
10
11 Fixed Costs Unit Variable Cost
12 10000 11
13 12000 8
14 15000 6
15
16 a =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
17 b =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
18 c =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
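
The three-cell TRANSPOSE(LINEST(...)) array formula in E16:E18 fits a quadratic, Unit Variable Cost = a*FixedCosts^2 + b*FixedCosts + c, through the three (fixed costs, unit variable cost) points. The same coefficients can be reproduced outside Excel; the short sketch below assumes the numpy library is available and is an illustration only:

    import numpy as np

    fixed_costs = [10000, 12000, 15000]
    unit_variable_costs = [11, 8, 6]

    # Coefficients a, b, c of the quadratic fit, highest power first,
    # matching cells E16:E18 in Figure 7.29.
    a, b, c = np.polyfit(fixed_costs, unit_variable_costs, 2)
    print(a, b, c)                          # about 1.6667e-07, -0.0051667, 46

    fixed = 12000
    print(a * fixed ** 2 + b * fixed + c)   # about 8.00, as in cell B6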

Four Inputs with Three Uncertainties


Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.
Fixed costs, units sold, and unit variable cost are uncertain.
94 Chapter 7 Monte Carlo Simulation Using RiskSim

Figure 7.31 Three Uncertainties Influence Chart

[Influence chart: Price and Fixed Costs are inputs; Units Sold Median depends on Price, and Unit Variable Cost Median depends on Fixed Costs; Units Sold combines its median with Units Sold Uncertainty, and Unit Variable Cost combines its median with Unit Variable Cost Uncertainty; together with Price and Fixed Costs they determine Net Cash Flow.]

Figure 7.32 Three Uncertainties Display


A B C D E
1 Controllable Input Price Units Sold
2 Price $29 $29 700
3 Uncontrollable Inputs $39 550
4 Fixed Costs $12,000 $49 400
5 Units Sold Uncertainty 10 $59 250
6 Unit Variable Cost Uncertainty $0.10
7 Intermediate Variable Slope -15
8 Units Sold Median 700 Intercept 1135
9 Units Sold 710
10 Unit Variable Cost Median $8.00
11 Unit Variable Cost $8.10 Fixed Costs Unit Variable Cost
12 Output Variable $10,000 $11
13 Net Cash Flow $2,839 $12,000 $8
14 $15,000 $6
15
16 a 0.000000166667
17 b -0.005166666667
18 c 46
Figure 7.33 Three Uncertainties Formulas


A B C D E
1 Controllable Input Price Units Sold
2 Price 29 29 700
3 Uncontrollable Inputs 39 550
4 Fixed Costs 12000 49 400
5 Units Sold Uncertainty 10 59 250
6 Unit Variable Cost Uncertainty 0.1
7 Intermediate Variable Slope =SLOPE(E2:E5,D2:D5)
8 Units Sold Median =E8+E7*B2 Intercept =INTERCEPT(E2:E5,D2:D5)
9 Units Sold =B8+B5
10 Unit Variable Cost Median =E16*B4^2+E17*B4+E18
11 Unit Variable Cost =B10+B6 Fixed Costs Unit Variable Cost
12 Output Variable 10000 11
13 Net Cash Flow =(B2-B11)*B9-B4 12000 8
14 15000 6
15
16 a =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
17 b =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
18 c =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))

Intermediate Details
Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.
Fixed costs, units sold, and unit variable cost are uncertain.
Include revenue, total variable cost, and total costs as intermediate variables.

Figure 7.34 Intermediate Details Influence Chart

[Influence chart: same structure as Figure 7.31, with Revenue, Total Variable Cost, and Total Costs added as intermediate variables between the inputs and Net Cash Flow.]
Figure 7.35 Intermediate Details Display


A B C D E
1 Controllable Input Price Units Sold
2 Price $29 $29 700
3 Uncontrollable Inputs $39 550
4 Fixed Costs $12,000 $49 400
5 Units Sold Uncertainty 10 $59 250
6 Unit Variable Cost Uncertainty $0.10
7 Intermediate Variable Slope -15
8 Units Sold Median 700 Intercept 1135
9 Units Sold 710
10 Revenue $20,590
11 Unit Variable Cost Median $8.00 Fixed Costs Unit Variable Cost
12 Unit Variable Cost $8.10 $10,000 $11
13 Total Variable Cost $5,751 $12,000 $8
14 Total Costs $17,751 $15,000 $6
15 Output Variable
16 Net Cash Flow $2,839 a 0.000000166667
17 b -0.005166666667
18 c 46

Figure 7.36 Intermediate Details Formulas


A B C D E
1 Controllable Input Price Units Sold
2 Price 29 29 700
3 Uncontrollable Inputs 39 550
4 Fixed Costs 12000 49 400
5 Units Sold Uncertainty 10 59 250
6 Unit Variable Cost Uncertainty 0.1
7 Intermediate Variable Slope =SLOPE(E2:E5,D2:D5)
8 Units Sold Median =E8+E7*B2 Intercept =INTERCEPT(E2:E5,D2:D5)
9 Units Sold =B8+B5
10 Revenue =B9*B2
11 Unit Variable Cost Median =E16*B4^2+E17*B4+E18 Fixed Costs Unit Variable Cost
12 Unit Variable Cost =B11+B6 10000 11
13 Total Variable Cost =B12*B9 12000 8
14 Total Costs =B4+B13 15000 6
15 Output Variable
16 Net Cash Flow =B10-B14 a =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
17 b =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))
18 c =TRANSPOSE(LINEST(E12:E14,D12:D14^{1,2}))

Chapter 8 Multiperiod What-If Modeling
8.1 APARTMENT BUILDING PURCHASE PROBLEM
You are considering the purchase of an apartment building in northern California. The
building contains 25 units and is listed for $2,000,000. You plan to keep the building for
three years and then sell it.
You know that the annual taxes on the property are currently $20,000 and will increase to
$25,000 after closing. You estimate that these taxes will grow at a rate of 2 percent per
year. You estimate that it will cost about $1,000 per unit per year to maintain the
apartments, and these maintenance costs are expected to grow at a 15 percent per year
rate.
You have not decided on the rent to charge. Currently, the rent is $875 per unit per
month, but there is substantial turnover, and the occupancy is only 75 percent. That is, on
average, 75 percent of the units are rented at any time. You estimate that if you lowered
the rent to $675 per unit per month, you would have 100 percent occupancy. You think
that intermediate rental charges would produce intermediate occupancy percentages; for
example, a $775 rental charge would have 87.5 percent occupancy.
You will decide on the monthly rental charge for the first year, and you think the rental
market is such that you will be able to increase it 7 percent per year for the second and
third years. Furthermore, whatever occupancy percentage occurs in the first year will
hold for the second and third years. For example, if you decide on the $675 monthly
rental charge for the first year, the occupancy will be 100 percent all three years.
At the end of three years, you will sell the apartment building. The realtors in your area
usually estimate the selling price of a rental property as a multiple of its annual rental
income (before expenses). You estimate that this multiple will be 9. That is, if the rental
income in the third year is $200,000, then the sale price will be $1,800,000.
Your objective is to achieve the highest total accumulated cash at the end of the three
year period. If rental income exceeds expenses in the first or second years, you will invest
the excess in one-year certificates of deposit (CDs) yielding 5 percent. Thus, total
accumulated cash will include net cash flow (income minus expense) in each of the three
years, interest from CDs received at the end of the second and third years, and cash from
the sale of the property at the end of the third year.
In your initial analysis you have decided to ignore depreciation and other issues related to
income taxes.
Instead of purchasing the apartment building, you could invest the entire $2,000,000 in
certificates of deposit yielding 5 percent per year.
Figure 8.1 Base Case Model Display


A B C D E F
1 Apartment Building Purchase Monthly Rent Occupancy
2 $675 100
3 Controllable Factors $775 87.5
4 Unit monthly rent $775 $875 75
5 Uncertain Factors
6 Annual unit maintenance $1,000 slope -0.125
7 Annual maint. increase 15% intercept 184.375
8 Annual tax increase 2.0%
9 Gross rent multiplier 9.00
10 Other Assumptions
11 First year property taxes $25,000
12 Annual rent increase 7%
13 CD annual yield 5%
14 Intermediate variable
15 Occupancy percentage 87.50%
16 Performance measure
17 Final cash value $2,610,848
18
19 One Two Three
20 Unit monthly rent $775 $829 $887
21 Annual rental income $203,438 $217,678 $232,916
22
23 Annual maintenance cost $25,000 $28,750 $33,063
24 Annual property tax $25,000 $25,500 $26,010
25 Total annual expenses $50,000 $54,250 $59,073
26
27 Operating cash flow $153,438 $163,428 $173,843
28
29 CD investment $153,438 $324,538
30 Year-end CD interest $7,672 $16,227
31
32 Sale receipt $2,096,240
33
34 Final Cash Value $2,610,848
35
36 CD investment $2,000,000 $2,100,000 $2,205,000
37 Year-end CD interest $100,000 $105,000 $110,250
38 Final Cash Value $2,315,250
Figure 8.2 Base Case Model Formulas


A B C D E F
1 Apartment Building Purchase Monthly Rent Occupancy
2 675 100
3 Controllable Factors 775 87.5
4 Unit monthly rent 775 875 75
5 Uncertain Factors
6 Annual unit maintenance 1000 slope =SLOPE(F2:F4,E2:E4)
7 Annual maint. increase 0.15 intercept =INTERCEPT(F2:F4,E2:E4)
8 Annual tax increase 0.02
9 Gross rent multiplier 9
10 Other Assumptions
11 First year property taxes 25000
12 Annual rent increase 0.07
13 CD annual yield 0.05
14 Intermediate variable
15 Occupancy percentage =(F7+F6*B4)/100
16 Performance measure
17 Final cash value =D34
18
19 One Two Three
20 Unit monthly rent =B4 =B20*(1+$B$12) =C20*(1+$B$12)
21 Annual rental income =B20*25*$B$15*12 =C20*25*$B$15*12 =D20*25*$B$15*12
22
23 Annual maintenance cost =B6*25 =(1+$B$7)*B23 =(1+$B$7)*C23
24 Annual property tax =B11 =(1+$B$8)*B24 =(1+$B$8)*C24
25 Total annual expenses =SUM(B23:B24) =SUM(C23:C24) =SUM(D23:D24)
26
27 Operating cash flow =B21-B25 =C21-C25 =D21-D25
28
29 CD investment =B27 =C27+C29+C30
30 Year-end CD interest =B13*C29 =B13*D29
31
32 Sale receipt =D21*B9
33
34 Final Cash Value =SUM(D27:D32)
35
36 CD investment 2000000 =B36+B37 =C36+C37
37 Year-end CD interest =B13*B36 =B13*C36 =B13*D36
38 Final Cash Value =D36+D37
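
For readers who prefer to see the logic outside the worksheet, the following Python sketch restates the base-case formulas of Figures 8.1 and 8.2 and reproduces the final cash value of about $2,610,848 at a $775 rent. It is an illustration only (the parameter names are just for this example); the spreadsheet remains the working model:

    def final_cash_value(rent, units=25, maint_per_unit=1000, maint_growth=0.15,
                         tax_year1=25000, tax_growth=0.02, rent_growth=0.07,
                         gross_rent_multiplier=9, cd_yield=0.05):
        # Occupancy line through ($675, 100%), ($775, 87.5%), ($875, 75%):
        # occupancy percentage = 184.375 - 0.125 * rent.
        occupancy = (184.375 - 0.125 * rent) / 100
        cd_balance = 0.0
        for year in range(3):                            # years One, Two, Three
            income = rent * (1 + rent_growth) ** year * units * occupancy * 12
            expenses = (maint_per_unit * units * (1 + maint_growth) ** year
                        + tax_year1 * (1 + tax_growth) ** year)
            operating = income - expenses
            if year < 2:
                # Excess cash goes into a one-year CD that pays interest next year.
                cd_balance = cd_balance * (1 + cd_yield) + operating
            else:
                sale = income * gross_rent_multiplier    # selling price multiple of 9
                return operating + cd_balance * (1 + cd_yield) + sale

    print(round(final_cash_value(775)))                  # about 2,610,848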

Figure 8.3 Ranges based on decision maker’s or expert’s judgment


Uncertain Factors Low Base High
Annual unit maintenance $700 $1,000 $2,000
Annual maint. increase 10% 15% 30%
Annual tax increase 2.0% 2.0% 3.0%
Gross rent multiplier 7.00 9.00 10.00

Apartment Building Analysis Notes


Influence Diagram (for single period)
Modeling effect of rent on occupancy rate
Linear fit: algebra (slope and intercept)
XY Scatter chart; Insert Trendline
Quadratic fit: if $775 yields 82.5% occupancy instead of 87.5%
Base Case model


Use Solver to find optimum rent to maximize final cash value
Use Sensit.xla Plot of final cash value depending on rent; relatively insensitive
Use Sensit.xla Spider
Sensitivity Cases
Ranges based on decision maker’s or expert’s judgment
Sensit.xla Tornado chart: identify critical variables
Monte Carlo simulation
RiskSim.xla
Triangular distributions for critical variables
What is probability that final cash will be less than $2,315,250?

8.2 PRODUCT LAUNCH FINANCIAL MODEL


Figure 8.4 Original Model Display
A B C D E F G H I J K L
1
2 FINANCE The @RISK Demonstration Model :
3 Product Launch Risk Analysis 2001-2010
4
5 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
6 ======== ======== ======== ======== ======== ======== ======== ======== ======== ========
7 Price No Entry $70.00 $88.20 $119.00 $112.70 $99.40 $94.50 $91.70 $90.30
8 Price With Entry $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94
9 Volume No Entry 3500 4340 6580 5565 5180 5180 4970 4935
10 Volume With Entry 3300 4158 3564 3399 3300 3300 3432 3696
11 Competitor Entry: 1
12
13 Design Costs $50,000.00
14 Capital Investment $100,000.00
15 Operating Expense Factor 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15
16
17 Sales Price $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94
18 Sales Volume 3300 4158 3564 3399 3300 3300 3432 3696
19 Sales Revenue $174,900 $279,875 $283,338 $216,176 $201,135 $183,645 $187,353 $191,970
20 Unit Production Cost $23.33 $24.26 $25.23 $26.24 $27.29 $28.38 $29.52 $30.70
21 Overhead $3,300 $6,944 $10,528 $8,904 $8,288 $8,288 $7,952 $7,896
22 Cost of Goods Sold $80,289 $107,830 $100,461 $98,104 $98,354 $101,957 $109,264 $121,366
23 Gross Margin $94,611 $172,045 $182,877 $118,072 $102,781 $81,688 $78,089 $70,604
24 Operating Expense $12,043 $16,175 $15,069 $14,716 $14,753 $15,294 $16,390 $18,205
25 Net Before Tax ($50,000) $0 $82,568 $155,870 $167,808 $103,357 $88,028 $66,395 $61,699 $52,400
26 Depreciation $20,000 $20,000 $20,000 $20,000 $20,000
27 Tax ($23,000) ($9,200) $28,781 $62,500 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104
28 Taxes Owed $0 $0 $0 $59,081 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104
29 Net After Tax ($50,000) $0 $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296
30 ======== ======== ======== ======== ======== ======== ======== ======== ======== ========
31 Net Cash Flow ($50,000) ($100,000) $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296
32 NPV 10% $164,877
33
Figure 8.5 Input Assumptions


A B C D E F G H I J K L
1
2 FINANCE The @RISK Demonstration Model :
3 Product Launch Risk Analysis 2001-2010
4
5 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
6 ======== ======== ======== ======== ======== ======== ======== ======== ======== ========
7 Price No Entry $70.00 $88.20 $119.00 $112.70 $99.40 $94.50 $91.70 $90.30
8 Price With Entry $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94
9 Volume No Entry 3500 4340 6580 5565 5180 5180 4970 4935
10 Volume With Entry 3300 4158 3564 3399 3300 3300 3432 3696
11 Competitor Entry: 1
12
13 Design Costs $50,000.00
14 Capital Investment $100,000.00
15 Operating Expense Factor 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15
16
17 Sales Price $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94
18 Sales Volume 3300 4158 3564 3399 3300 3300 3432 3696
19 Sales Revenue $174,900 $279,875 $283,338 $216,176 $201,135 $183,645 $187,353 $191,970
20 Unit Production Cost $23.33 $24.26 $25.23 $26.24 $27.29 $28.38 $29.52 $30.70
21 Overhead $3,300 $6,944 $10,528 $8,904 $8,288 $8,288 $7,952 $7,896
22 Cost of Goods Sold $80,289 $107,830 $100,461 $98,104 $98,354 $101,957 $109,264 $121,366
23 Gross Margin $94,611 $172,045 $182,877 $118,072 $102,781 $81,688 $78,089 $70,604
24 Operating Expense $12,043 $16,175 $15,069 $14,716 $14,753 $15,294 $16,390 $18,205
25 Net Before Tax ($50,000) $0 $82,568 $155,870 $167,808 $103,357 $88,028 $66,395 $61,699 $52,400
26 Depreciation $20,000 $20,000 $20,000 $20,000 $20,000
27 Tax ($23,000) ($9,200) $28,781 $62,500 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104
28 Taxes Owed $0 $0 $0 $59,081 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104
29 Net After Tax ($50,000) $0 $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296
30 ======== ======== ======== ======== ======== ======== ======== ======== ======== ========
31 Net Cash Flow ($50,000) ($100,000) $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296
32 NPV 10% $164,877
33

Figure 8.6 Modifications for SensIt Display


A B C D E F G H I J K L
1 Inputs
2 Price w/o Entry $70.00
3 Price w/ Entry $53.00
4 Volume No Entry 3,500
5 Volume w/ Entry 3,300
6 Competitor Entry 1
7 Design Costs $50,000
8 Capital Investment $100,000
9 Operating Expense Factor 15.0%
10 Unit Production Costs 23.33
11 Overhead $3,300
12
13
14
15 FINANCE The @RISK Demonstration Model :
16 Product Launch Risk Analysis 2001-2010
17
18 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
19 ======== ======== ======== ======== ======== ======== ======== ======== ======== ========
20 Price No Entry $70.00 $88.20 $119.00 $112.70 $99.40 $94.50 $91.70 $90.30
21 Price With Entry $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94
22 Volume No Entry 3500 4340 6580 5565 5180 5180 4970 4935
23 Volume With Entry 3300 4158 3564 3399 3300 3300 3432 3696
24 Competitor Entry: 1
25
26 Design Costs $50,000.00
27 Capital Investment $100,000.00
28 Operating Expense Factor 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15
29
30 Sales Price $53.00 $67.31 $79.50 $63.60 $60.95 $55.65 $54.59 $51.94
31 Sales Volume 3300 4158 3564 3399 3300 3300 3432 3696
32 Sales Revenue $174,900 $279,875 $283,338 $216,176 $201,135 $183,645 $187,353 $191,970
33 Unit Production Cost $23.33 $24.26 $25.23 $26.24 $27.29 $28.38 $29.52 $30.70
34 Overhead $3,300 $6,944 $10,528 $8,904 $8,288 $8,288 $7,952 $7,896
35 Cost of Goods Sold $80,289 $107,830 $100,461 $98,104 $98,354 $101,957 $109,264 $121,366
36 Gross Margin $94,611 $172,045 $182,877 $118,072 $102,781 $81,688 $78,089 $70,604
37 Operating Expense $12,043 $16,175 $15,069 $14,716 $14,753 $15,294 $16,390 $18,205
38 Net Before Tax ($50,000) $0 $82,568 $155,870 $167,808 $103,357 $88,028 $66,395 $61,699 $52,400
39 Depreciation $20,000 $20,000 $20,000 $20,000 $20,000
40 Tax ($23,000) ($9,200) $28,781 $62,500 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104
41 Taxes Owed $0 $0 $0 $59,081 $67,992 $38,344 $40,493 $30,542 $28,382 $24,104
42 Net After Tax ($50,000) $0 $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296
43 ======== ======== ======== ======== ======== ======== ======== ======== ======== ========
44 Net Cash Flow ($50,000) ($100,000) $82,568 $96,789 $99,816 $65,013 $47,535 $35,853 $33,317 $28,296
45 NPV 10% $164,877
46
Figure 8.7 Modifications for SensIt Formulas


A B C D E F
1 Inputs
2 Price w/o Entry 70
3 Price w/ Entry 53
4 Volume No Entry 3500
5 Volume w/ Entry 3300
6 Competitor Entry 1
7 Design Costs 50000
8 Capital Investment 100000
9 Operating Expense Factor 0.15
10 Unit Production Costs 23.33
11 Overhead 3300
12
13
14
15 FINANCE The @RISK Demonstratio
16 Product Launch Risk Analysis 2001-20
17
18 2001 2002 2003 2004
19 ======== ======== ======== ========
20 Price No Entry =C2 =1.26*E20
21 Price With Entry =C3 =1.27*E21
22 Volume No Entry =C4 =1.24*E22
23 Volume With Entry =C5 =1.26*E23
24 Competitor Entry: =C6
25
26 Design Costs =C7
27 Capital Investment =C8
28 Operating Expense Factor =C9 =$E$28
29
30 Sales Price =IF($C$24=0,E20,E21) =IF($C$24=0,F20,F21)
31 Sales Volume =IF($C$24=0,E22,E23) =IF($C$24=0,F22,F23)
32 Sales Revenue =(E30*E31) =(F30*F31)
33 Unit Production Cost =C10 =1.04*E33
34 Overhead =C11 6944
35 Cost of Goods Sold =(E31*E33)+E34 =(F31*F33)+F34

Figure 8.8 Data for Competitor Entry as Base Case


A B C D E F G
1 Inputs Low Base High
2 Price w/o Entry $70.00 $50.00 $70.00 $90.00
3 Price w/ Entry $53.00 $40.00 $53.00 $68.00
4 Volume No Entry 3,500 3,100 3,500 3,900
5 Volume w/ Entry 3,300 2,800 3,300 3,800
6 Competitor Entry 0 0 1 1
7 Design Costs $50,000 $37,000 $50,000 $63,000
8 Capital Investment $100,000 $60,000 $100,000 $140,000
9 Operating Expense Factor 15.0% 6.5% 15.0% 23.0%
10 Unit Production Costs 23.33 15.50 23.33 32.00
11 Overhead $3,300 $2,800 $3,300 $4,000
Figure 8.9 Tornado Chart for Competitor Entry as Base Case

[Sensit - Sensitivity Analysis - Tornado. Output: NPV 10%, horizontal axis $0 to $700,000. Inputs from top to bottom, with the input values shown at the ends of each bar: Competitor Entry (1, 0); Price w/ Entry ($40.00, $68.00); Unit Production Costs (32.00, 15.50); Volume w/ Entry (2,800, 3,800); Capital Investment ($140,000, $60,000); Operating Expense Factor (23.0%, 6.5%); Design Costs ($63,000, $37,000); Overhead ($4,000, $2,800); Price w/o Entry ($90.00, $50.00); Volume No Entry (3,900, 3,100).]

Figure 8.10 Data for No Competitor Entry as Base Case


A B C D E F G
1 Inputs Low Base High
2 Price w/o Entry $70.00 $50.00 $70.00 $90.00
3 Price w/ Entry $53.00 $40.00 $53.00 $68.00
4 Volume No Entry 3,500 3,100 3,500 3,900
5 Volume w/ Entry 3,300 2,800 3,300 3,800
6 Competitor Entry 0 0 0 1
7 Design Costs $50,000 $37,000 $50,000 $63,000
8 Capital Investment $100,000 $60,000 $100,000 $140,000
9 Operating Expense Factor 15.0% 6.5% 15.0% 23.0%
10 Unit Production Costs 23.33 15.50 23.33 32.00
11 Overhead $3,300 $2,800 $3,300 $4,000
Figure 8.11 Tornado Chart for No Competitor Entry as Base Case

[Sensit - Sensitivity Analysis - Tornado. Output: NPV 10%, horizontal axis $100,000 to $1,100,000. Inputs from top to bottom, with the input values shown at the ends of each bar: Price w/o Entry ($50.00, $90.00); Competitor Entry (1, 0); Unit Production Costs (32.00, 15.50); Volume No Entry (3,100, 3,900); Operating Expense Factor (23.0%, 6.5%); Capital Investment ($140,000, $60,000); Design Costs ($63,000, $37,000); Overhead ($4,000, $2,800); Price w/ Entry ($68.00, $40.00); Volume w/ Entry (3,800, 2,800).]

8.3 MACHINE SIMULATION MODEL


Adapted from Clemen's Making Hard Decisions. AJS, Ltd., is a manufacturing company
that performs contract work for a wide variety of firms. It primarily manufactures and
assembles metal items, and so most of its equipment is designed for precision machining
tasks. The executive of AJS currently are trying to decide between two processes for
manufacturing a product. Their main criterion for measuring the value of a manufacturing
process is net present value (NPV). The contractor will pay AJS $8 per unit. AJS is using
a three-year horizon for its evaluation (the current year and the next two years).

AJS Process 1
Under the first process, AJS's current machinery is used to make the product. The
following inputs are used:
Demand Demand for each of the three years is unknown. The three annual demands are
modeled as discrete uncertain quantities with the probability distributions shown in the
spreadsheet display.
Variable Cost Variable cost per unit changes each year, depending on the costs for
materials and labor. The uncertainty about each variable cost is represented by a
continuous normal distribution with mean $4.00 and standard deviation $0.40.
Machine Failure Each year, AJS's machines fail occasionally, but obviously it is
impossible to predict when or how many failures will occur during the year. Each time a
machine fails, it costs the firm $8000. The uncertainty about the number of machine
failures in each of the three years is represented by a Poisson random variable with
average 4 failures per year.
Fixed Cost Each year a fixed cost of $12,000 is incurred.

AJS Process 2
The second process involves scrapping the current equipment (it has no salvage value)
and purchasing new equipment to make the product at a cost of $60,000. Assume that the
firm pays cash for the new machine, and ignore tax effects.
Demand Because of the new machine, the final product is slightly altered and improved,
and consequently the demands are likely to be higher than before, although more
uncertain. The new demand distributions are shown in the spreadsheet display.
Variable Cost Variable cost per unit still changes each year. With the new machine it is
judged to be slightly lower but with more uncertainty, so the cost is described by a
normal distribution with mean $3.50 and standard deviation $1.00.
Machine Failure Equipment failures are less likely with the new equipment, with an
average of three per year. Such failures tend to be less serious with the new machine,
costing only $6000.
Fixed Cost The annual fixed cost of $12,000 is unchanged.
Figure 8.12 Process 1 Display and Formulas


A B C D E F G
1 Process 1
2 Zero One Two
3 Demand D P(D) D P(D) D P(D)
4 11,000 0.2 8,000 0.2 4,000 0.1
5 16,000 0.6 19,000 0.4 21,000 0.5
6 21,000 0.2 27,000 0.4 37,000 0.4
7
8 Var Cost Mean StDev
9 Normal $4.00 $0.40
10
11 Machine Mean
12 Failure 4
13 Poisson
14
15 Equipment $0
16 Unit Price $8
17 Failure Cost $8,000
18 Fixed Cost $12,000
19 Discount Rate 10%
20
21 Year Initial Zero One Two
22 Demand 16,000 19,000 21,000 Mode
23 Var Cost $4.00 $4.00 $4.00 Mean
24 Failures 4 4 4 Mean
25 Cash Flow $0 $20,000 $32,000 $40,000
26
27 NPV $74,681
28
29 Formula in B25: =-B15
30
31 Formula in C25: =C22*($B16-C23)-C24*$B17-$B18
32 Copy to D25:E25
33
34 Formula in B27: =B25+NPV(B19,C25:E25)
Figure 8.13 Process 2 Display


A B C D E F G
1 Process 2
2 Zero One Two
3 Demand D P(D) D P(D) D P(D)
4 14,000 0.3 12,000 0.36 9,000 0.4
5 19,000 0.4 23,000 0.36 26,000 0.1
6 24,000 0.3 31,000 0.28 42,000 0.5
7
8 Var Cost Mean StDev
9 Normal $3.50 $1.00
10
11 Machine Mean
12 Failure 3
13 Poisson
14
15 Equipment $60,000
16 Unit Price $8
17 Failure Cost $6,000
18 Fixed Cost $12,000
19 Discount Rate 10%
20
21 Year Initial Zero One Two
22 Demand 19,000 23,000 26,000 Mode
23 Var Cost $3.50 $3.50 $3.50 Mean
24 Failures 3 3 3 Mean
25 Cash Flow -$60,000 $55,500 $73,500 $87,000
26
27 NPV $116,563

Figure 8.14 RiskSim Functions for Process 1 and Process 2


A B C D E
20
21 Year Initial Zero One Two
22 Demand =randdiscrete(B4:C6) =randdiscrete(D4:E6) =randdiscrete(F4:G6)
23 Var Cost =randnormal($B$9,$C$9) =randnormal($B$9,$C$9) =randnormal($B$9,$C$9)
24 Failures =randpoisson($B$12) =randpoisson($B$12) =randpoisson($B$12)
25 Cash Flow =-B15 =C22*($B16-C23)-C24*$B17-$B18 =D22*($B16-D23)-D24*$B17-$B18 =E22*($B16-E23)-E24*$B17-$B18
26
27 NPV =B25+NPV(B19,C25:E25)
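
The RiskSim functions in Figure 8.14 can also be mirrored by a small Monte Carlo loop written directly in Python. The sketch below is an illustration only, not a replacement for RiskSim; it uses the Process 1 inputs and the same cash-flow and NPV structure as Figure 8.12 (the helper names discrete, poisson, and one_trial are just for this example):

    import math, random

    # Process 1 inputs from Figure 8.12
    demand_tables = [[(11000, 0.2), (16000, 0.6), (21000, 0.2)],
                     [(8000, 0.2), (19000, 0.4), (27000, 0.4)],
                     [(4000, 0.1), (21000, 0.5), (37000, 0.4)]]
    unit_price, failure_cost, fixed_cost = 8, 8000, 12000
    var_cost_mean, var_cost_sd, failure_mean = 4.00, 0.40, 4
    equipment, discount_rate = 0, 0.10

    def discrete(table):
        # Draw one value from a (value, probability) table.
        r, cumulative = random.random(), 0.0
        for value, p in table:
            cumulative += p
            if r <= cumulative:
                return value
        return table[-1][0]

    def poisson(mean):
        # Simple inverse-transform Poisson draw (adequate for small means).
        r, k, p = random.random(), 0, math.exp(-mean)
        cumulative = p
        while cumulative < r:
            k += 1
            p *= mean / k
            cumulative += p
        return k

    def one_trial():
        npv = -equipment
        for year, table in enumerate(demand_tables, start=1):
            demand = discrete(table)
            var_cost = random.gauss(var_cost_mean, var_cost_sd)
            failures = poisson(failure_mean)
            cash = demand * (unit_price - var_cost) - failures * failure_cost - fixed_cost
            npv += cash / (1 + discount_rate) ** year
        return npv

    trials = [one_trial() for _ in range(1000)]
    print(sum(trials) / len(trials))       # mean NPV, comparable to Figure 8.15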
Figure 8.15 RiskSim Output for Process 1


RiskSim - One Output - Summary Mean $90,526
Date 9-Apr-01 St. Dev. $47,290
Time 7:07 PM Mean St. Error $1,495
Workbook AJS_WhatIf.xls Minimum -$59,664
Worksheet Process 1 Probability First Quartile $58,050
Output Cell $B$27 Median $91,460
Output Label NPV Third Quartile $124,435
Seed 0.5 Maximum $234,703
Trials 1,000 Skewness -0.1034

[RiskSim histogram, 09-Apr-01, 07:07 PM: Frequency (0 to 400) versus NPV, Upper Limit of Interval (-$100,000 to $200,000).]

[RiskSim cumulative chart, 09-Apr-01, 07:07 PM: Cumulative Probability (0.0 to 1.0) versus NPV (-$100,000 to $250,000).]
Figure 8.16 RiskSim Output for Process 2


RiskSim - One Output - Summary Mean $116,159
Date 9-Apr-01 St. Dev. $73,675
Time 7:08 PM Mean St. Error $2,330
Workbook AJS_WhatIf.xls Minimum -$70,685
Worksheet Process 2 Probability First Quartile $60,199
Output Cell $B$27 Median $114,335
Output Label NPV Third Quartile $168,191
Seed 0.5 Maximum $347,514
Trials 1,000 Skewness 0.1390

[RiskSim histogram, 09-Apr-01, 07:08 PM: Frequency (0 to 300) versus NPV, Upper Limit of Interval (-$100,000 to $300,000).]

[RiskSim cumulative chart, 09-Apr-01, 07:08 PM: Cumulative Probability (0.0 to 1.0) versus NPV (-$100,000 to $350,000).]
Follow these instructions to show two or more risk profiles on the same chart.
Use RiskSim to obtain the sorted values, cumulative probabilities, and XY charts for
strategy A and strategy B.
To add the data for strategy B to the existing plot for strategy A, select the sorted values
and cumulative probabilities for strategy B (without including the text labels in row 1),
and choose Edit | Copy.
Click just inside the outer border of the strategy A chart to select it. From the main menu,
choose Edit | Paste Special. In the Paste Special dialog box, select "Add cells as New
series," select "Values (Y) in Columns," check the box for "Categories (X Values) in
First Column," and click OK.
Use the same method to add data for other strategies to the strategy A chart.
To change the lines and markers of a data series, click a data point on the chart to select
the data series, and choose Format | Selected Data Series | Patterns.
If the X values are quite different for the various strategies, it may be necessary to adjust
the minimum and maximum values on the Scale tab of the Format Axis dialog box.
Figure 8.17 Comparison of Process 1 and Process 2


Process 1 Process 2

Mean $90,526 Mean $116,159


St. Dev. $47,290 St. Dev. $73,675
Mean St. Error $1,495 Mean St. Error $2,330
Minimum -$59,664 Minimum -$70,685
First Quartile $58,050 First Quartile $60,199
Median $91,460 Median $114,335
Third Quartile $124,435 Third Quartile $168,191
Maximum $234,703 Maximum $347,514
Skewness -0.1034 Skewness 0.1390

[RiskSim cumulative chart: risk profiles for Process 1 and Process 2 plotted on the same axes; Cumulative Probability (0.0 to 1.0) versus NPV (-$100,000 to $350,000).]

Chapter 9 Modeling Inventory Decisions
This chapter describes simulation and expected value methods for determining how much
of a product or service to have on hand for a single period when there is uncertain
demand and no possibility of reordering.

9.1 NEWSVENDOR PROBLEM


This approach is appropriate for decision situations with
highly seasonal or style goods,
perishable goods like flowers and foods,
goods that become obsolete, like newspapers and magazines, and
perishable services, like airline seats for a specific flight
and hotel rooms for a specific date.
This decision problem is sometimes called the newsvendor problem, and it is the basis for
more elaborate models called yield management or revenue management.

Stationery Wholesaler Example


A wholesaler of stationery is deciding how many desk calendars to stock for the coming
year. It is impossible to reorder, and leftover units are worthless. The wholesaler has
approximated the uncertain demand as shown in the following table.
Demand, in thousands Probability
100 0.10
200 0.15
300 0.50
400 0.25
The calendars sell for $100 per thousand, and the incremental cost of purchase is $70 per
thousand. The incremental cost of selling (sales commissions) is $5 per thousand.
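
One direct way to evaluate the decision is to compute the expected profit of each stocking level from the demand table. The minimal Python sketch below (quantities in thousands, illustrative only; the variable names are just for this example) applies the prices and costs just described:

    demand_probabilities = {100: 0.10, 200: 0.15, 300: 0.50, 400: 0.25}
    price, purchase_cost, selling_cost = 100, 70, 5     # dollars per thousand

    def expected_profit(stock):
        # Leftover calendars are worthless, so sales are limited by demand and stock.
        total = 0.0
        for demand, probability in demand_probabilities.items():
            sold = min(stock, demand)
            total += probability * (sold * (price - selling_cost) - stock * purchase_cost)
        return total

    for stock in (100, 200, 300, 400):
        print(stock, expected_profit(stock))

Comparing the four expected values identifies the stocking quantity with the highest expected profit.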


Chapter 10 Modeling Waiting Lines
10.1 QUEUE SIMULATION
A warehouse has one dock used to unload railroad freight cars. Incoming freight cars are
delivered to the warehouse during the night. It takes exactly half a day to unload a car. If
more than two cars are waiting to be unloaded on a given day, the unloading of some of
the cars is postponed until the following day. The cost is $100 per day for each car
delayed.
Past experience has indicated that the number of cars arriving during the night has the
frequencies shown in the table below. Furthermore, there is no apparent pattern, so that
the number arriving on any night is independent of the number arriving on any other
night.
Figure 10.1 Arrival Frequency
Number of cars arriving    Relative frequency
0 0.23
1 0.30
2 0.30
3 0.10
4 0.05
5 0.02
6 or more 0.00
1.00
Concepts for Queuing (waiting-line) Models
Arrival pattern
Service time
Number of servers
Queue discipline
Performance measures
Equilibrium
Average waiting time
Average number of customers in line
System utilization, rho = mean arrival rate / mean service rate
Stable system: rho < 1

Figure 10.2 Influence Chart for Simulation Model


[Influence chart: for each day (Day 1, Day 2, ..., Day N), the Number of Arrivals and the previous day's Number Delayed determine the Number To Unload; the Number Actually Unloaded is limited by the Unloading Capacity; the Number Delayed carries over to the next day and drives the Cost of Delays, which determines Total Cost.]
Figure 10.3 Simulation Model Spreadsheet Display


A B C D E F G H I
1 Unloading Capacity 2 Daily Delay Cost $ 100
2
3 Random Number of Number Actually Number Annual Delay Cost $ 16,500
4 Day Number Arrivals To Unload Unloaded Delayed
5 1 0.812 3 3 2 1
6 2 0.524 2 3 2 1
7 3 0.671 2 3 2 1
8 4 0.250 1 2 2 0
9 5 0.940 3 3 2 1
10 6 0.771 2 3 2 1
11 7 0.026 0 1 1 0
12 8 0.178 0 0 0 0
13 9 0.683 2 2 2 0
14 10 0.727 2 2 2 0
44 40 0.082 0 0 0 0
45 41 0.425 1 1 1 0
46 42 0.826 3 3 2 1
47 43 0.855 3 4 2 2
48 44 0.971 3 5 2 3
49 45 0.429 1 4 2 2
50 46 0.592 2 4 2 2
51 47 0.085 0 2 2 0
52 48 0.018 0 0 0 0
53 49 0.678 2 2 2 0
54 50 0.510 2 2 2 0
55
56 Total 86 33
57
58 Daily Average 1.72 0.66

Figure 10.4 Simulation Model Spreadsheet Formulas


A B C D E F G H I
1 Unloading Capacity 2 Daily Delay Cost 100
2
3 Random Number of Number Actually Number Annual Delay Cost =250*F58*I1
4 Day Number Arrivals To Unload Unloaded Delayed
5 1 =RAND() =IF(B5<0.2,0,IF(B5<0.5,1,IF(B5<0.8,2,3))) =C5 =MIN(D5,$C$1) =D5-E5
6 2 =RAND() =IF(B6<0.2,0,IF(B6<0.5,1,IF(B6<0.8,2,3))) =F5+C6 =MIN(D6,$C$1) =D6-E6
7 3 =RAND() =IF(B7<0.2,0,IF(B7<0.5,1,IF(B7<0.8,2,3))) =F6+C7 =MIN(D7,$C$1) =D7-E7
8 4 =RAND() =IF(B8<0.2,0,IF(B8<0.5,1,IF(B8<0.8,2,3))) =F7+C8 =MIN(D8,$C$1) =D8-E8
9 5 =RAND() =IF(B9<0.2,0,IF(B9<0.5,1,IF(B9<0.8,2,3))) =F8+C9 =MIN(D9,$C$1) =D9-E9
10 6 =RAND() =IF(B10<0.2,0,IF(B10<0.5,1,IF(B10<0.8,2,3))) =F9+C10 =MIN(D10,$C$1) =D10-E10
11 7 =RAND() =IF(B11<0.2,0,IF(B11<0.5,1,IF(B11<0.8,2,3))) =F10+C11 =MIN(D11,$C$1) =D11-E11
12 8 =RAND() =IF(B12<0.2,0,IF(B12<0.5,1,IF(B12<0.8,2,3))) =F11+C12 =MIN(D12,$C$1) =D12-E12
13 9 =RAND() =IF(B13<0.2,0,IF(B13<0.5,1,IF(B13<0.8,2,3))) =F12+C13 =MIN(D13,$C$1) =D13-E13
14 10 =RAND() =IF(B14<0.2,0,IF(B14<0.5,1,IF(B14<0.8,2,3))) =F13+C14 =MIN(D14,$C$1) =D14-E14
44 40 =RAND() =IF(B44<0.2,0,IF(B44<0.5,1,IF(B44<0.8,2,3))) =F43+C44 =MIN(D44,$C$1) =D44-E44
45 41 =RAND() =IF(B45<0.2,0,IF(B45<0.5,1,IF(B45<0.8,2,3))) =F44+C45 =MIN(D45,$C$1) =D45-E45
46 42 =RAND() =IF(B46<0.2,0,IF(B46<0.5,1,IF(B46<0.8,2,3))) =F45+C46 =MIN(D46,$C$1) =D46-E46
47 43 =RAND() =IF(B47<0.2,0,IF(B47<0.5,1,IF(B47<0.8,2,3))) =F46+C47 =MIN(D47,$C$1) =D47-E47
48 44 =RAND() =IF(B48<0.2,0,IF(B48<0.5,1,IF(B48<0.8,2,3))) =F47+C48 =MIN(D48,$C$1) =D48-E48
49 45 =RAND() =IF(B49<0.2,0,IF(B49<0.5,1,IF(B49<0.8,2,3))) =F48+C49 =MIN(D49,$C$1) =D49-E49
50 46 =RAND() =IF(B50<0.2,0,IF(B50<0.5,1,IF(B50<0.8,2,3))) =F49+C50 =MIN(D50,$C$1) =D50-E50
51 47 =RAND() =IF(B51<0.2,0,IF(B51<0.5,1,IF(B51<0.8,2,3))) =F50+C51 =MIN(D51,$C$1) =D51-E51
52 48 =RAND() =IF(B52<0.2,0,IF(B52<0.5,1,IF(B52<0.8,2,3))) =F51+C52 =MIN(D52,$C$1) =D52-E52
53 49 =RAND() =IF(B53<0.2,0,IF(B53<0.5,1,IF(B53<0.8,2,3))) =F52+C53 =MIN(D53,$C$1) =D53-E53
54 50 =RAND() =IF(B54<0.2,0,IF(B54<0.5,1,IF(B54<0.8,2,3))) =F53+C54 =MIN(D54,$C$1) =D54-E54
55
56 Total =SUM(C5:C54) =SUM(F5:F54)
57
58 Daily Average =C56/50 =F56/50
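
The day-by-day logic of Figure 10.4 can also be expressed as a short loop. The Python sketch below is an illustration only; it uses the same cumulative breakpoints as the nested IF formula in column C (0 cars with probability 0.2, 1 with 0.3, 2 with 0.3, 3 with 0.2), which is how the spreadsheet formulas approximate the Figure 10.1 frequencies:

    import random

    capacity, daily_delay_cost, days = 2, 100, 50

    def arrivals():
        # Same breakpoints as =IF(B5<0.2,0,IF(B5<0.5,1,IF(B5<0.8,2,3)))
        r = random.random()
        if r < 0.2:
            return 0
        if r < 0.5:
            return 1
        if r < 0.8:
            return 2
        return 3

    delayed, total_delayed = 0, 0
    for day in range(days):
        to_unload = delayed + arrivals()          # carryover plus tonight's arrivals
        unloaded = min(to_unload, capacity)       # at most two cars unloaded per day
        delayed = to_unload - unloaded
        total_delayed += delayed

    average_delayed = total_delayed / days
    print(250 * average_delayed * daily_delay_cost)   # annual delay cost, as in =250*F58*I1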
Figure 10.5 Simulation Model Dynamic Histogram Display


K L M N O P Q R S T U V
1 50-Day Trial $ 17,000 Minimum $ 1,500 Interval Max Frequency
2 1 $ 12,500 Maximum $ 58,000 5000 11
3 2 $ 28,000 10000 39
4 3 $ 2,500 Mean $ 12,845 15000 24
5 4 $ 16,000 20000 10
6 5 $ 6,000 StDev $ 9,016 25000 8
7 6 $ 9,500 30000 3
8 7 $ 10,500 35000 2
9 8 $ 13,500 40000 2
10 9 $ 7,000 45000 0
11 10 $ 15,500 50000 0
12 11 $ 21,500 55000 0
13 12 $ 16,000 60000 1
14 13 $ 9,000 65000 0
15 14 $ 4,500 70000 0
16 15 $ 8,500 75000 0
17 16 $ 8,500 80000 0
18 17 $ 15,000 85000 0
19 18 $ 9,000 90000 0
20 19 $ 2,000 95000 0
21 20 $ 10,500 100000 0
22 21 $ 16,500 More 0
23 22 $ 18,500
24 23 $ 8,500
25 24 $ 5,500
26 25 $ 13,500
Simulation
27 26 $ 5,000
28 27 $ 23,500 45
29 28 $ 58,000 40
30 29 $ 11,500
31 30 $ 7,000 35
Frequency of 100
50-Day Trials

32 31 $ 7,000 30
33 32 $ 6,000 25
34 33 $ 7,500
35 34 $ 9,500 20
36 35 $ 7,500 15
37 36 $ 12,500 10
38 37 $ 8,500
39 38 $ 14,000 5
40 39 $ 7,000 0
41 40 $ 31,000 5000 20000 35000 50000 65000 80000 95000
42 41 $ 22,000
Annual Cost of Delays
43 42 $ 40,000
44 43 $ 10,500
45 44 $ 8,500
Figure 10.6 Simulation Model Dynamic Histogram Formulas


L M N O P Q R S
1 50-Day Trial =I3 Minimum =MIN(M2:M101) Interval Max Frequency
2 1 =TABLE(,K1) Maximum =MAX(M2:M101) 5000 =FREQUENCY(M2:M101,R2:R21)
3 2 =TABLE(,K1) 10000 =FREQUENCY(M2:M101,R2:R21)
4 3 =TABLE(,K1) Mean =AVERAGE(M2:M101) 15000 =FREQUENCY(M2:M101,R2:R21)
5 4 =TABLE(,K1) 20000 =FREQUENCY(M2:M101,R2:R21)
6 5 =TABLE(,K1) StDev =STDEV(M2:M101) 25000 =FREQUENCY(M2:M101,R2:R21)
7 6 =TABLE(,K1) 30000 =FREQUENCY(M2:M101,R2:R21)
8 7 =TABLE(,K1) 35000 =FREQUENCY(M2:M101,R2:R21)
9 8 =TABLE(,K1) 40000 =FREQUENCY(M2:M101,R2:R21)
10 9 =TABLE(,K1) 45000 =FREQUENCY(M2:M101,R2:R21)
11 10 =TABLE(,K1) 50000 =FREQUENCY(M2:M101,R2:R21)
12 11 =TABLE(,K1) 55000 =FREQUENCY(M2:M101,R2:R21)
13 12 =TABLE(,K1) 60000 =FREQUENCY(M2:M101,R2:R21)
14 13 =TABLE(,K1) 65000 =FREQUENCY(M2:M101,R2:R21)
15 14 =TABLE(,K1) 70000 =FREQUENCY(M2:M101,R2:R21)
16 15 =TABLE(,K1) 75000 =FREQUENCY(M2:M101,R2:R21)
17 16 =TABLE(,K1) 80000 =FREQUENCY(M2:M101,R2:R21)
18 17 =TABLE(,K1) 85000 =FREQUENCY(M2:M101,R2:R21)
19 18 =TABLE(,K1) 90000 =FREQUENCY(M2:M101,R2:R21)
20 19 =TABLE(,K1) 95000 =FREQUENCY(M2:M101,R2:R21)
21 20 =TABLE(,K1) 100000 =FREQUENCY(M2:M101,R2:R21)
22 21 =TABLE(,K1) More =FREQUENCY(M2:M101,R2:R21)
23 22 =TABLE(,K1)
24 23 =TABLE(,K1)
25 24 =TABLE(,K1)
26 25 =TABLE(,K1)


Part 3 Decision Trees

Part 3 describes decision tree models, which are particularly useful for sequential
decision problems under uncertainty. Documentation and examples are included for the
TreePlan decision tree add-in for Excel.
Sensitivity analysis with standard Excel features is used to check decision tree input
assumptions regarding probabilities and cash flows.
Subsequent chapters describe value of information and risk attitude.


Chapter 11 Introduction to Decision Trees
A decision tree can be used as a model for sequential decision problems under
uncertainty. A decision tree describes graphically the decisions to be made, the events
that may occur, and the outcomes associated with combinations of decisions and events.
Probabilities are assigned to the events, and values are determined for each outcome. A
major goal of the analysis is to determine the best decisions.

11.1 DECISION TREE STRUCTURE


Decision tree models include such concepts as nodes, branches, terminal values, strategy,
payoff distribution, certainty equivalent, and the rollback method. The following problem
illustrates the basic concepts.

DriveTek Problem, Part A


DriveTek Research Institute discovers that a computer company wants a new tape drive
for a proposed new computer system. Since the computer company does not have
research people available to develop the new drive, it will subcontract the development to
an independent research firm. The computer company has offered a fixed fee for the best
proposal for developing the new tape drive. The contract will go to the firm with the best
technical plan and the highest reputation for technical competence.
DriveTek Research Institute wants to enter the competition. Management estimates a
moderate cost for preparing a proposal, but they are concerned that they may not win the
contract.
If DriveTek decides to prepare a proposal, and if they win the contract, their engineers
are not sure about how they will develop the tape drive. They are considering three
alternative approaches. The first approach is a very expensive mechanical method, and
the engineers are certain they can develop a successful model with this approach. A
second approach involves electronic components. The engineers think that the electronic
approach is a relatively inexpensive method for developing a model of the tape drive, but
they are not sure that the results will be satisfactory for satisfying the contract. A third
approach, also relatively inexpensive, uses magnetic components. This magnetic method costs more than
the electronic method, and the engineers think that it has a higher chance of success.
DriveTek Research can work on only one approach at a time and has time to try only two
approaches. If it tries either the magnetic or electronic method and the attempt fails, the
second choice must be the mechanical method to guarantee a successful model.
The management of DriveTek Research needs help in incorporating this information into
a decision to proceed or not.

Nodes and Branches


Decision trees have three kinds of nodes and two kinds of branches. A decision node is a
point where a choice must be made; it is shown as a square. The branches extending from
a decision node are decision branches, each branch representing one of the possible
alternatives or courses of action available at that point. The set of alternatives must be
mutually exclusive (if one is chosen, the others cannot be chosen) and collectively
exhaustive (all possible alternatives must be included in the set).
There are two major decisions in the DriveTek problem. First, the company must decide
whether or not to prepare a proposal. Second, if it prepares a proposal and is awarded the
contract, it must decide which of the three approaches to try to satisfy the contract.
An event node is a point where uncertainty is resolved (a point where the decision maker
learns about the occurrence of an event). An event node, sometimes called a "chance
node," is shown as a circle. The event set consists of the event branches extending from
an event node, each branch representing one of the possible events that may occur at that
point. The set of events must be mutually exclusive (if one occurs, the others cannot
occur) and collectively exhaustive (all possible events must be included in the set). Each
event is assigned a subjective probability; the sum of probabilities for the events in a set
must equal one.
The three sources of uncertainty in the DriveTek problem are: whether it is awarded the
contract or not, whether the electronic approach succeeds or fails, and whether the
magnetic approach succeeds or fails.
In general, decision nodes and branches represent the controllable factors in a decision
problem; event nodes and branches represent uncontrollable factors.
Decision nodes and event nodes are arranged in order of subjective chronology. For
example, the position of an event node corresponds to the time when the decision maker
learns the outcome of the event (not necessarily when the event occurs).
The third kind of node is a terminal node, representing the final result of a combination of
decisions and events. Terminal nodes are the endpoints of a decision tree, shown as the
end of a branch on hand-drawn diagrams and as a triangle or vertical bar on computer-generated diagrams.
The following table shows the three kinds of nodes and two kinds of branches used to
represent a decision tree.

Figure 11.1 Nodes and Symbols


Type of Node Written Symbol Computer Symbol Node Successor
Decision square square decision branches
Event circle circle event branches
Terminal endpoint triangle or bar terminal value

In the DriveTek problem, the first portion of the decision tree is shown in Figure 11.2.

Figure 11.2 DriveTek Initial Decision and Event

Awarded contract

Prepare proposal

Not awarded contract

Don't prepare proposal

If DriveTek is awarded the contract, they must decide which approach to use. For the
electronic and magnetic approaches, the result is uncertain, as shown in Figure 11.3. The
arrangement of the decision and event branches is called the structure of the decision
tree.
Figure 11.3 DriveTek Decisions and Events (Structure)

Use mechanical method

Electronic success

Try electronic method


Awarded contract

Electronic failure

Magnetic success
Prepare proposal
Try magnetic method

Magnetic failure

Not awarded contract

Don't prepare proposal

For representing a sequential decision problem, the tree diagram is usually better than the
written description. In some decision problems, the choice may be obvious by looking at
the diagram. That is, the decision maker may know enough about the desirability of the
outcomes (endpoints in the tree) and how likely they are. But usually the next step in the
analysis after documenting the structure is to assign values to the endpoints.

11.2 DECISION TREE TERMINAL VALUES


Each terminal node has an associated terminal value, sometimes called a payoff value,
outcome value, or endpoint value. Each terminal value measures the result of a scenario:
the sequence of decisions and events on a unique path leading from the initial decision
node to a specific terminal node. To determine the terminal value, one approach assigns a
cash flow value to each decision branch and event branch and then sums the cash flow
values on the branches leading to a terminal node. Some
problems require a more elaborate value model to determine the terminal values.

DriveTek Problem, Part B


DriveTek thinks it will cost $50,000 to prepare a proposal. If they are awarded the
contract, DriveTek will receive an immediate payment of $250,000. The engineers think
that the sure-success mechanical method will cost $120,000. The possibly-successful
electronic approach will cost $50,000, and the more-likely-successful magnetic approach
will cost $80,000. In the DriveTek problem, these distinct cash flows associated with
many of the decision and event branches are shown in Figure 11.4.

Figure 11.4 DriveTek Cash Flows and Outcome Values


Use mechanical method
$80,000
-$120,000

Electronic success
$150,000
Try electronic method $0
Awarded contract
-$50,000
$250,000 Electronic failure
$30,000
-$120,000

Magnetic success
Prepare proposal $120,000
Try magnetic method $0
-$50,000
-$80,000
Magnetic failure
$0
-$120,000

Not awarded contract


-$50,000
$0

Don't prepare proposal


$0
$0

Figure 11.4 also shows the sum of branch cash flows at the endpoints. For example, the
$30,000 terminal value on the far right of the diagram is associated with the scenario
shown in Figure 11.5.

Figure 11.5 Terminal Value for a Scenario


Branch Type Branch Name Cash Flow
Decision Prepare proposal –$50,000
Event Awarded contract +$250,000
Decision Try electronic method –$50,000
Event Electronic failure (Use mechanical method) –$120,000
Terminal value = +$30,000
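That is, the terminal value is simply the sum of the branch cash flows along the path:

-$50,000 + $250,000 - $50,000 - $120,000 = +$30,000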
11.3 DECISION TREE PROBABILITIES


DriveTek Problem, Part C
DriveTek management thinks there is a fifty-fifty chance of winning the contract. The
engineers think that the inexpensive electronic method has only a 50% chance of
satisfactory results. In their opinion, the somewhat more costly magnetic method has a 70% chance of success.

Figure 11.6 DriveTek Probabilities and Terminal Values


Use mechanical method
$80,000

0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method

0.3
Magnetic failure
$0

0.5
Not awarded contract
-$50,000

Don't prepare proposal


$0

Figure 11.6 is a complete decision tree model.

Next: How do you decide what choice to make at each decision node?

Concepts: Payoff distribution, certainty equivalent, expected value, rollback method


Chapter 12  Decision Trees Using TreePlan
TreePlan is a decision tree add-in for Microsoft Excel 97 (and later versions of Excel) for
Windows and Macintosh. It was developed by Professor Michael R. Middleton at the
University of San Francisco and modified for use at Fuqua (Duke) by Professor James E.
Smith.

12.1 TREEPLAN INSTALLATION


All of TreePlan’s functionality is in a single file, TreePlan.xla. Depending on your
preference, there are three ways to install TreePlan. (These instructions also apply to the
other Decision ToolPak add-ins: SensIt.xla and RiskSim.xla.)

Occasional Use
If you plan to use TreePlan on an irregular basis, simply use Excel’s File | Open
command to load TreePlan.xla each time you want to use it. You may keep the
TreePlan.xla file on a floppy disk, your computer’s hard drive, or a network server.

Selective Use
You can use Excel’s Add-In Manager to install TreePlan. First, copy TreePlan.xla to a
location on your computer’s hard drive. Second, if you save TreePlan.xla in the Excel or
Office Library subdirectory, go to the third step. Otherwise, run Excel, choose Tools |
Add-Ins; in the Add-Ins dialog box, click the Browse button, use the Browse dialog box
to specify the location of TreePlan.xla, and click OK. Third, in the Add-Ins dialog box,
note that TreePlan is now listed with a check mark, indicating that its menu command
will appear in Excel, and click OK.
If you plan to not use TreePlan and you want to free up main memory, uncheck the box
for TreePlan in the Add-In Manager. When you do want to use TreePlan, choose Tools |
Add-Ins and check TreePlan’s box.
To remove TreePlan from the Add-In Manager, use Windows Explorer or another file
manager to delete TreePlan.xla from the Library subdirectory or from the location you
specified when you used the Add-In Manager’s Browse command. The next time you
start Excel and choose Tools | Add-Ins, a dialog box will state “Cannot find add-in …
treeplan.xla. Delete from list?” Click Yes.

Steady Use
If you want TreePlan’s options immediately available each time you run Excel, use
Windows Explorer or another file manager to save TreePlan.xla in the Excel XLStart
directory. Alternatively, in Excel you can use Tools | Options | General to specify an
alternate startup file location and use a file manager to save TreePlan.xla there. When you
start Excel, it tries to open all files in the XLStart directory and in the alternate startup file
location.
For additional information visit “TreePlan FAQ” at www.treeplan.com.
After opening TreePlan.xla in Excel, the command "Decision Tree" appears at the bottom
of the Tools menu (or, if you have a customized main menu, at the bottom of the sixth
main menu item).

12.2 BUILDING A DECISION TREE IN TREEPLAN


You can start TreePlan either by choosing Tools | Decision Tree from the menu bar or by
pressing Ctrl+t (hold down the Ctrl key and press t). If the worksheet doesn't have a
decision tree, TreePlan prompts you with a dialog box with three options; choose New
Tree to begin a new tree. TreePlan draws a default initial decision tree with its upper left
corner at the selected cell. For example, the figure below shows the initial tree when
$B$2 is selected. (Note that TreePlan writes over existing values in the spreadsheet:
begin your tree to the right of the area where your data is stored, and do not subsequently
add or delete rows or columns in the tree-diagram area.) In Excel 5 and 95 a terminal
node is represented by a triangle instead of a vertical bar.

Figure 12.1 TreePlan Initial Default Decision Tree


A B C D E F G H I
1
2
3 Decision 1
4 0
5 0 0
6 1
7 0
8 Decision 2
9 0
10 0 0
11
Build up a tree by adding or modifying branches or nodes in the default tree. To change
the branch labels or probabilities, click on the cell containing the label or probability and
type the new label or probability. To modify the structure of the tree (e.g., add or delete
branches or nodes in the tree), select the node or the cell containing the node in the tree to
modify, and choose Tools | Decision Tree or press Ctrl+t. TreePlan will then present a
dialog box showing the available commands.
For example, to add an event node to the top branch of the tree shown above, select the
square cell (cell G4) next to the vertical line at the end of a terminal branch and press
Ctrl+t. TreePlan then presents this dialog box.

Figure 12.2 TreePlan Terminal Dialog Box

To add an event node to the branch, we change the selected terminal node to an event
node by selecting Change to event node in the dialog box, selecting the number of
branches (here two), and pressing OK. TreePlan then redraws the tree with a chance node
in place of the terminal node.
Figure 12.3
A B C D E F G H I J K L M
1
2 0.5
3 Event 3
4 0
5 Decision 1 0 0
6
7 0 0 0.5
8 Event 4
9 0
10 1 0 0
11 0
12
13 Decision 2
14 0
15 0 0
16

The dialog boxes presented by TreePlan vary depending on what you have selected when
you choose Tools | Decision Tree or press Ctrl+t. The dialog box shown below is
presented when you press Ctrl+t with an event node selected; a similar dialog box is
presented when you select a decision node. If you want to add a branch to the selected
node, choose Add branch and press OK. If you want to insert a decision or event node
before the selected node, choose Insert decision or Insert event and press OK. To get a
description of the available commands, click on the Help button.
Figure 12.4

The Copy subtree command is particularly useful when building large trees. If two or
more parts of the tree are similar, you can copy and paste "subtrees" rather than building
up each part separately. To copy a subtree, select the node at the root of the subtree and
choose Copy subtree. This tells TreePlan to copy the selected node and everything to the
right of it in the tree. To paste this subtree, select a terminal node and choose Paste
subtree. TreePlan then duplicates the specified subtree at the selected terminal node.
Since TreePlan decision trees are built directly in Excel, you can use Excel's commands
to format your tree. For example, you can use bold or italic fonts for branch labels: select
the cells you want to format and change them using Excel's formatting commands. To
help you, TreePlan provides a Select dialog box that appears when you choose Tools |
Decision Tree or press Ctrl+t without a node selected. You can also bring up this dialog
box by pressing the Select button on the Node dialog box. From here, you can select all
items of a particular type in the tree. For example, if you choose Probabilities and press
OK, TreePlan selects all cells containing probabilities in the tree. You can then format all
of the probabilities simultaneously using Excel's formatting commands. (Because of
limitations in Excel, the Select dialog box will not be available when working with very
large trees.)

12.3 ANATOMY OF A TREEPLAN DECISION TREE


An example of a TreePlan decision tree is shown below. In the example, a firm must
decide (1) whether to prepare a proposal for a possible contract and (2) which method to
use to satisfy the contract. The tree consists of decision nodes, event nodes and terminal
nodes connected by branches. Each branch is surrounded by cells containing formulas,
cell references, or labels pertaining to that branch. You may edit the labels, probabilities,
and partial cash flows associated with each branch. The partial cash flows are the amount
the firm "gets paid" to go down that branch. Here, the firm pays $50,000 if it decides to
prepare the proposal, receives $250,000 up front if awarded the contract, spends $50,000
to try the electronic method, and spends $120,000 on the mechanical method if the
electronic method fails.

Figure 12.5
[The figure shows the solved DriveTek decision tree (the same model built step by step in the tutorial below), with rollback values of $20,000 at the initial decision node and $90,000 at the decision node following the awarded contract, annotated with callouts that identify the role of each kind of cell:]

PROBABILITIES: Enter numbers or formulas in these cells.
PARTIAL CASH FLOWS: Enter numbers or formulas in these cells.
BRANCH LABELS: Type text in these cells.
TERMINAL VALUES: TreePlan formula for sum of partial cash flows along path.
ROLLBACK EVs: TreePlan formula for expected value at this point in the tree.
DECISION NODES: TreePlan formula for which alternative is optimal.
EVENT NODES and TERMINAL NODES are also labeled on the diagram.

The trees are "solved" using formulas embedded in the spreadsheet. The terminal values
sum all the partial cash flows along the path leading to that terminal node. The tree is
then "rolled back" by computing expected values at event nodes and by maximizing at
decision nodes; the rollback EVs appear next to each node and show the expected value
at that point in the tree. The numbers in the decision nodes indicate which alternative is
optimal for that decision. In the example, the "1" in the first decision node indicates that
it is optimal to prepare the proposal, and the "2" in the second decision node indicates the
firm should try the electronic method because that alternative leads to a higher expected
value, $90,000, than the mechanical method, $80,000.
TreePlan has a few options that control the way calculations are done in the tree. To
select these options, press the Options button in any of TreePlan's dialog boxes. The first
choice is whether to Use Expected Values or Use Exponential Utility Function for
computing certainty equivalents. The default is to rollback the tree using expected values.
If you choose to use exponential utilities, TreePlan will compute utilities of endpoint cash
flows at the terminal nodes and compute expected utilities instead of expected values at
event nodes. Expected utilities are calculated in the cell below the certainty equivalents.
You may also choose to Maximize (profits) or Minimize (costs) at decision nodes; the
default is to maximize profits. If you choose to minimize costs instead, the cash flows are
interpreted as costs, and decisions are made by choosing the minimum expected value or
certainty equivalent rather than the maximum. See the Help file for details on these
options.

12.4 STEP-BY-STEP TREEPLAN TUTORIAL


A decision tree can be used as a model for a sequential decision problem under
uncertainty. A decision tree describes graphically the decisions to be made, the events
that may occur, and the outcomes associated with combinations of decisions and events.
Probabilities are assigned to the events, and values are determined for each outcome. A
major goal of the analysis is to determine the best decisions.
Decision tree models include such concepts as nodes, branches, terminal values, strategy,
payoff distribution, certainty equivalent, and the rollback method. The following problem
illustrates the basic concepts.

DriveTek Problem
DriveTek Research Institute discovers that a computer company wants a new tape drive
for a proposed new computer system. Since the computer company does not have
research people available to develop the new drive, it will subcontract the development to
an independent research firm. The computer company has offered a fee of $250,000 for
the best proposal for developing the new tape drive. The contract will go to the firm with
the best technical plan and the highest reputation for technical competence.
DriveTek Research Institute wants to enter the competition. Management estimates a cost
of $50,000 to prepare a proposal with a fifty-fifty chance of winning the contract.
However, DriveTek's engineers are not sure about how they will develop the tape drive if
they are awarded the contract. Three alternative approaches can be tried. The first
approach is a mechanical method with a cost of $120,000, and the engineers are certain
they can develop a successful model with this approach. A second approach involves
electronic components. The engineers estimate that the electronic approach will cost only
$50,000 to develop a model of the tape drive, but with only a 50 percent chance of
satisfactory results. A third approach uses magnetic components; this costs $80,000, with
a 70 percent chance of success.
DriveTek Research can work on only one approach at a time and has time to try only two
approaches. If it tries either the magnetic or electronic method and the attempt fails, the
second choice must be the mechanical method to guarantee a successful model.
The management of DriveTek Research needs help in incorporating this information into
a decision to proceed or not.
[Source: The tape drive example is adapted from Spurr and Bonini, Statistical Analysis
for Business Decisions, Irwin.]

Nodes and Branches


Decision trees have three kinds of nodes and two kinds of branches. A decision node is a
point where a choice must be made; it is shown as a square. The branches extending from
a decision node are decision branches, each branch representing one of the possible
alternatives or courses of action available at that point. The set of alternatives must be
mutually exclusive (if one is chosen, the others cannot be chosen) and collectively
exhaustive (all possible alternatives must be included in the set).
There are two major decisions in the DriveTek problem. First, the company must decide
whether or not to prepare a proposal. Second, if it prepares a proposal and is awarded the
contract, it must decide which of the three approaches to try to satisfy the contract.
An event node is a point where uncertainty is resolved (a point where the decision maker
learns about the occurrence of an event). An event node, sometimes called a "chance
node," is shown as a circle. The event set consists of the event branches extending from
an event node, each branch representing one of the possible events that may occur at that
point. The set of events must be mutually exclusive (if one occurs, the others cannot
occur) and collectively exhaustive (all possible events must be included in the set). Each
event is assigned a subjective probability; the sum of probabilities for the events in a set
must equal one.
The three sources of uncertainty in the DriveTek problem are: whether it is awarded the
contract or not, whether the electronic approach succeeds or fails, and whether the
magnetic approach succeeds or fails.
In general, decision nodes and branches represent the controllable factors in a decision
problem; event nodes and branches represent uncontrollable factors.
Decision nodes and event nodes are arranged in order of subjective chronology. For
example, the position of an event node corresponds to the time when the decision maker
learns the outcome of the event (not necessarily when the event occurs).
The third kind of node is a terminal node, representing the final result of a combination of
decisions and events. Terminal nodes are the endpoints of a decision tree, shown as the
end of a branch on hand-drawn diagrams and as a triangle on computer-generated
diagrams.
The following table shows the three kinds of nodes and two kinds of branches used to
represent a decision tree.
Figure 12.6 Nodes and Symbols


Type of Node Written Symbol Computer Symbol Node Successor
Decision square square decision branches
Event circle circle event branches
Terminal endpoint triangle or bar terminal value

Terminal Values
Each terminal node has an associated terminal value, sometimes called a payoff value,
outcome value, or endpoint value. Each terminal value measures the result of a scenario:
the sequence of decisions and events on a unique path leading from the initial decision
node to a specific terminal node.
To determine the terminal value, one approach assigns a cash flow value to each decision
branch and event branch and then sums the cash flow values on the branches leading to a terminal node. In the DriveTek problem, there are distinct
cash flows associated with many of the decision and event branches. Some problems
require a more elaborate value model to determine the terminal values.
The following diagram shows the arrangement of branch names, probabilities, and cash
flow values on an unsolved tree.
Figure 12.7
Use mechanical method

-$120,000

0.5
Electronic success

0.5 Try electronic method $0


Awarded contract
-$50,000 0.5
$250,000 Electronic failure

-$120,000

0.7
Magnetic success
Prepare proposal
Try magnetic method $0
-$50,000
-$80,000 0.3
Magnetic failure

-$120,000

0.5
Not awarded contract

$0

Don't prepare proposal

$0

To build the decision tree, you use TreePlan’s dialog boxes to develop the structure. You
enter a branch name, branch cash flow, and branch probability (for an event) in the cells
above and below the left side of each branch. As you build the tree diagram, TreePlan
enters formulas in other cells.

Building the Tree Diagram


1. Start with a new worksheet. (If no workbook is open, choose File | New. If a
workbook is open, choose Insert | Worksheet.)
2. Select cell A1. From the Tools menu, choose Decision Tree. In the TreePlan
New dialog box, click the New Tree button. A decision node with two branches
appears.
Figure 12.8

Figure 12.9
A B C D E F G
1
2 Decision 1
3 0
4 0 0
5 1
6 0
7 Decision 2
8 0
9 0 0

3. Do not type the quotation marks in the following instructions. Select cell D2,
and enter Prepare proposal. Select cell D4, and enter -50000. Select cell D7,
and enter Don't prepare proposal.

Figure 12.10
A B C D E F G
1
2 Prepare proposal
3 -50000
4 -50000 -50000
5 2
6 0
7 Don't prepare proposal
8 0
9 0 0

4. Select cell F3. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Event Node, select Two Branches, and
click OK. The tree is redrawn.
Figure 12.11

Figure 12.12
A B C D E F G H I J K
1 0.5
2 Event 3
3 -50000
4 Prepare proposal 0 -50000
5
6 -50000 -50000 0.5
7 Event 4
8 -50000
9 2 0 -50000
10 0
11
12 Don't prepare proposal
13 0
14 0 0

5. Select cell H2, and enter Awarded contract. Select cell H4, and enter 250000.
Select cell H7, and enter Not awarded contract.
Figure 12.13
A B C D E F G H I J K
1 0.5
2 Awarded contract
3 200000
4 Prepare proposal 250000 200000
5
6 -50000 75000 0.5
7 Not awarded contract
8 -50000
9 1 0 -50000
10 75000
11
12 Don't prepare proposal
13 0
14 0 0

6. Select cell J3. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Decision Node, select Three Branches,
and click OK. The tree is redrawn.

Figure 12.14
A B C D E F G H I J K L M N O
1
2 Decision 5
3 200000
4 0 200000
5
6 0.5
7 Awarded contract Decision 6
8 1 200000
9 250000 200000 0 200000
10
11
12 Prepare proposal Decision 7
13 200000
14 -50000 75000 0 200000
15
16 0.5
17 Not awarded contract
18 1 -50000
19 75000 0 -50000
20
21
22 Don't prepare proposal
23 0
24 0 0

7. Select cell L2, and enter Use mechanical method. Select cell L4, and enter -120000. Select cell L7, and enter Try electronic method. Select cell L9, and
enter -50000. Select cell L12, and enter Try magnetic method. Select cell L14, and enter -80000.

Figure 12.15
A B C D E F G H I J K L M N O
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Awarded contract Try electronic method
8 2 150000
9 250000 150000 -50000 150000
10
11
12 Prepare proposal Try magnetic method
13 120000
14 -50000 50000 -80000 120000
15
16 0.5
17 Not awarded contract
18 1 -50000
19 50000 0 -50000
20
21
22 Don't prepare proposal
23 0
24 0 0

8. Select cell N8. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Event Node, select Two Branches, and
click OK. The tree is redrawn.
Figure 12.16
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Event 8
8 0.5 150000
9 Awarded contract Try electronic method 0 150000
10 2
11 250000 150000 -50000 150000 0.5
12 Event 9
13 150000
14 0 150000
15 Prepare proposal
16
17 -50000 50000 Try magnetic method
18 120000
19 -80000 120000
20
21 0.5
22 1 Not awarded contract
23 50000 -50000
24 0 -50000
25
26
27 Don't prepare proposal
28 0
29 0 0

9. Select cell P7, and enter Electronic success. Select cell P12, and enter
Electronic failure. Select cell P14, and enter -120000.
Figure 12.17
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Electronic success
8 0.5 150000
9 Awarded contract Try electronic method 0 150000
10 3
11 250000 120000 -50000 90000 0.5
12 Electronic failure
13 30000
14 -120000 30000
15 Prepare proposal
16
17 -50000 35000 Try magnetic method
18 120000
19 -80000 120000
20
21 0.5
22 1 Not awarded contract
23 35000 -50000
24 0 -50000
25
26
27 Don't prepare proposal
28 0
29 0 0

10. Select cell N18. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Event Node, select Two Branches, and
click OK. The tree is redrawn.
Figure 12.18
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Electronic success
8 150000
9 0.5 Try electronic method 0 150000
10 Awarded contract
11 3 -50000 90000 0.5
12 250000 120000 Electronic failure
13 30000
14 -120000 30000
15
16 0.5
17 Event 10
18 Prepare proposal 120000
19 Try magnetic method 0 120000
20 -50000 35000
21 -80000 120000 0.5
22 Event 11
23 120000
24 0 120000
25
26 1 0.5
27 35000 Not awarded contract
28 -50000
29 0 -50000
30
31
32 Don't prepare proposal
33 0
34 0 0

11. Select cell P16, and enter .7. Select cell P17, and enter Magnetic success. Select
cell P21, and enter .3. Select cell P22, and enter Magnetic failure. Select cell
P24, and enter -120000.
Figure 12.19
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Electronic success
8 150000
9 0.5 Try electronic method 0 150000
10 Awarded contract
11 2 -50000 90000 0.5
12 250000 90000 Electronic failure
13 30000
14 -120000 30000
15
16 0.7
17 Magnetic success
18 Prepare proposal 120000
19 Try magnetic method 0 120000
20 -50000 20000
21 -80000 84000 0.3
22 Magnetic failure
23 0
24 -120000 0
25
26 1 0.5
27 20000 Not awarded contract
28 -50000
29 0 -50000
30
31
32 Don't prepare proposal
33 0
34 0 0

12. Double-click the sheet tab (or right-click the sheet tab and choose Rename from
the shortcut menu), and enter Original. Save the workbook.

Interpreting the Results


The $30,000 terminal value on the far right of the diagram in cell S13 is associated with
the following scenario:

Figure 12.20
Branch Type Branch Name Cash Flow
Decision Prepare proposal –$50,000
Event Awarded contract $250,000
Decision Try electronic method –$50,000
Event Electronic failure (Use mechanical method) –$120,000
Terminal value $30,000

TreePlan put the formula =SUM(P14,L11,H12,D20) into cell S13 for determining the
terminal value.
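Tracing the arguments of this formula back to the tree worksheet (Figure 12.19): D20 holds the -$50,000 partial cash flow for Prepare proposal, H12 holds the $250,000 for Awarded contract, L11 holds the -$50,000 for Try electronic method, and P14 holds the -$120,000 for Electronic failure; these four partial cash flows sum to the $30,000 displayed in cell S13.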
Other formulas, called rollback formulas, are in cells below and to the left of each node.
These formulas are used to determine the optimal choice at each decision node.
In cell B26, a formula displays 1, indicating that the first branch is the optimal choice.
Thus, the initial choice is to prepare the proposal. In cell J11, a formula displays 2,
indicating that the second branch (numbered 1, 2, and 3, from top to bottom) is the
optimal choice. If awarded the contract, DriveTek should try the electronic method. A
subsequent chapter provides more details about interpretation.

Formatting the Tree Diagram


The following steps show how to use TreePlan and Excel features to format the tree
diagram. You may choose to use other formats for your own tree diagrams.
13. From the Edit menu, choose Move or Copy Sheet (or right-click the sheet tab
and choose Move Or Copy from the shortcut menu). In the lower left corner of
the Move Or Copy dialog box, check the Create A Copy box, and click OK.
14. On sheet Original (2), select cell H9. From the Tools menu, choose Decision Tree. In
the TreePlan Select dialog box, verify that the option button for Cells with
Probabilities is selected, and click OK. With all probability cells selected, click
the Align Left button.

Figure 12.21

15. Select cell H12. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Partial Cash Flows
is selected, and click OK. With all partial cash flow cells selected, click the
Align Left button. With those cells still selected, choose Format | Cells. In the
Format Cells dialog box, click the Number tab. In the Category list box, choose
Currency; type 0 (zero) for Decimal Places; select $ in the Symbol list box;
select -$1,234 for Negative Numbers. Click OK.

Figure 12.22

16. Select cell I12. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Rollback EVs/CEs
is selected, and click OK. With all rollback cells selected, choose Format | Cells.
Repeat the Currency formatting of step 15 above.
17. Select cell S3. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Terminal Values is
selected, and click OK. With all terminal value cells selected, choose Format |
Cells. Repeat the Currency formatting of step 15 above.
Figure 12.23
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 $80,000
4 -$120,000 $80,000
5
6 0.5
7 Electronic success
8 $150,000
9 0.5 Try electronic method $0 $150,000
10 Awarded contract
11 2 -$50,000 $90,000 0.5
12 $250,000 $90,000 Electronic failure
13 $30,000
14 -$120,000 $30,000
15
16 0.7
17 Magnetic success
18 Prepare proposal $120,000
19 Try magnetic method $0 $120,000
20 -$50,000 $20,000
21 -$80,000 $84,000 0.3
22 Magnetic failure
23 $0
24 -$120,000 $0
25
26 1 0.5
27 $20,000 Not awarded contract
28 -$50,000
29 $0 -$50,000
30
31
32 Don't prepare proposal
33 $0
34 $0 $0

18. Double-click the Original (2) sheet tab (or right-click the sheet tab and choose
Rename from the shortcut menu), and enter Formatted. Save the workbook.

Displaying Model Inputs


When you build a decision tree model, you may want to discuss the model and its
assumptions with co-workers or a client. For such communication it may be preferable to
hide the results of formulas that show rollback values and decision node choices. The
following steps show how to display only the model inputs.
19. From the Edit menu, choose Move or Copy Sheet (or right-click the sheet tab
and choose Move Or Copy from the shortcut menu). In the lower left corner of
the Move Or Copy dialog box, check the Create A Copy box, and click OK.
20. On sheet Formatted (2), select cell B1. From the Tools menu, choose Decision
Tree. In the TreePlan Select dialog box, verify that the option button for
Columns with Nodes is selected, and click OK. With all node columns selected,
choose Format | Cells | Number. In the Category list box, select Custom. Select
the entry in the Type edit box, and type ;;; (three semicolons). Click OK.
Figure 12.24

Explanation: A custom number format has four sections of format codes. The sections are
separated by semicolons, and they define the formats for positive numbers, negative
numbers, zero values, and text, in that order. When you specify three semicolons without
format codes, Excel does not display positive numbers, negative numbers, zero values, or
text. The formula remains in the cell, but its result is not displayed. Later, if you want to
display the result, you can change the format without having to enter the formula again.
Editing an existing format does not delete it. All formats are saved with the workbook
unless you explicitly delete a format.
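As an illustration (this particular format is not used in the tutorial), the custom format 0.0;(0.0);"zero";@ would display positive numbers with one decimal place, negative numbers in parentheses, the word zero for zero values, and text entries as typed. The three-semicolon format leaves all four sections empty, so nothing is displayed.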
21. Select cell A27. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Rollback EVs/CEs
is selected, and click OK. With all rollback values selected, choose Format |
Cells | Number. In the Category list box, select Custom. Scroll to the bottom of
the Type list box, and select the three-semicolon entry. Click OK.
22. Double-click the Formatted (2) sheet tab (or right-click the sheet tab and choose
Rename from the shortcut menu), and enter Model Inputs. Save the workbook.
Printing the Tree Diagram


23. In the Name Box list box, select TreeDiagram (or select cells A1:S34).
24. To print the tree diagram from Excel, with the tree diagram range selected
choose File | Print Area | Set Print Area. Choose File | Page Setup. In the Page
Setup dialog box, click the Page tab; for Orientation click the option button for
Landscape, and for Scaling click the option button for Fit To 1 Page Wide By 1
Page Tall. Click the Header/Footer tab; in the Header list box select None, and
in the Footer list box select None (or select other appropriate headers and
footers). Click the Sheet tab; clear the check box for Gridlines, and clear the
check box for Row And Column Headings. Click OK. Choose File | Print and
click OK.
25. To print the tree diagram from Word, clear the check boxes for Gridlines and for
Row And Column Headings on Excel’s Page Setup dialog box Sheet tab. Select
the tree diagram range. Hold down the Shift key and from the Edit menu choose
Copy Picture. In the Copy Picture dialog box, click the option button As Shown
When Printed, and click OK. In Word select the location where you want to
paste the tree diagram and choose Edit | Paste.

Figure 12.25

Use mechanical method


$80,000
-$120,000

0.5
Electronic success
$150,000
0.5 Try electronic method $0
Awarded contract
-$50,000 0.5
$250,000 Electronic failure
$30,000
-$120,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method $0
-$50,000
-$80,000 0.3
Magnetic failure
$0
-$120,000

0.5
Not awarded contract
-$50,000
$0

Don't prepare proposal


$0
$0
Alternative Model
If you want to emphasize that the time constraint forces DriveTek to use the mechanical
approach if they try either of the uncertain approaches and experience a failure, you can
change the terminal nodes in cells R13 and R23 to decision nodes, each with a single
branch.

Figure 12.26

Use mechanical method


$80,000
-$120,000

0.5
Electronic success
$150,000
0.5 Try electronic method $0
Awarded contract
-$50,000 0.5
$250,000 Electronic failure Use mechanical method
$30,000
$0 -$120,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method $0
-$50,000
-$80,000 0.3
Magnetic failure Use mechanical method
$0
$0 -$120,000

0.5
Not awarded contract
-$50,000
$0

Don't prepare proposal


$0
$0

12.5 DECISION TREE SOLUTION


Strategy
A strategy specifies an initial choice and any subsequent choices to be made by the
decision maker. The subsequent choices usually depend upon events. The specification of
a strategy must be comprehensive; if the decision maker gives the strategy to a colleague,
the colleague must know exactly which choice to make at each decision node.
Most decision problems have many possible strategies, and a goal of the analysis is to
determine the optimal strategy, taking into account the decision maker's risk attitude.
There are four strategies in the DriveTek problem. One of the strategies is: Prepare the
proposal; if not awarded the contract, stop; if awarded the contract, try the magnetic
method; if the magnetic method is successful, stop; if the magnetic method fails, use the
mechanical method. The four strategies will be discussed in detail below.
Payoff Distribution
Each strategy has an associated payoff distribution, sometimes called a risk profile. The
payoff distribution of a particular strategy is a probability distribution showing the
probability of obtaining each terminal value associated with a particular strategy.
In decision tree models, the payoff distribution can be shown as a list of possible payoff
values, x, and the discrete probability of obtaining each value, P(X=x), where X
represents the uncertain terminal value associated with a strategy. Since a strategy
specifies a choice at each decision node, the uncertainty about terminal values depends
only on the occurrence of events. The probability of obtaining a specific terminal value
equals the product of the probabilities on the event branches on the path leading to the
terminal node.
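For example, under a strategy that prepares the proposal and, if awarded the contract, tries the electronic method, the terminal value $150,000 requires both the Awarded contract event and the Electronic success event, so P(X = $150,000) = 0.5 * 0.5 = 0.25.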

DriveTek Strategies
In this section each strategy of the DriveTek problem is described by a shorthand
statement and a more detailed statement. The possible branches following a specific
strategy are shown in decision tree form, and the payoff distribution is shown in a table
with an explanation of the probability calculations.
Strategy 1 (Mechanical): Prepare; if awarded, use mechanical.


Details: Prepare the proposal; if not awarded the contract, stop (payoff = -$50,000); if
awarded the contract, use the mechanical method (payoff = $80,000).

Figure 12.27

Use mechanical method


$80,000

0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method

0.3
Magnetic failure
$0

0.5
Not awarded contract
-$50,000

Don't prepare proposal


$0

Figure 12.28
Probability
Value, x P(X=x)
$80,000 0.50
-$50,000 0.50
1.00
Strategy 2 (Electronic): Prepare; if awarded, try electronic.


Details: Prepare the proposal; if not awarded the contract, stop (payoff = -$50,000); if
awarded the contract, try the electronic method; if the electronic method is successful,
stop (payoff = $150,000); if the electronic method fails, use the mechanical method
(payoff = $30,000).

Figure 12.29

Use mechanical method


$80,000

0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method

0.3
Magnetic failure
$0

0.5
Not awarded contract
-$50,000

Don't prepare proposal


$0

Figure 12.30
Probability
Value, x P(X=x)
$150,000 0.25 = 0.5 * 0.5
$30,000 0.25 = 0.5 * 0.5
-$50,000 0.50
1.00
Strategy 3 (Magnetic): Prepare; if awarded, try magnetic.


Details: Prepare the proposal; if not awarded the contract, stop (payoff = -$50,000); if
awarded the contract, try the magnetic method; if the magnetic method is successful, stop
(payoff = $120,000); if the magnetic method fails, use the mechanical method (payoff =
$0).

Figure 12.31

Use mechanical method


$80,000

0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method

0.3
Magnetic failure
$0

0.5
Not awarded contract
-$50,000

Don't prepare proposal


$0

Figure 12.32
Probability
Value, x P(X=x)
$120,000 0.35 = 0.5 * 0.7
$0 0.15 = 0.5 * 0.3
-$50,000 0.50
1.00
Strategy 4 (Don't): Don't.


Details: Don't prepare the proposal (payoff = $0).

Figure 12.33

Use mechanical method


$80,000

0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method

0.3
Magnetic failure
$0

0.5
Not awarded contract
-$50,000

Don't prepare proposal


$0

Figure 12.34
Probability
Value, x P(X=x)
$0 1.00
1.00

Strategy Choice
Since each strategy can be characterized completely by its payoff distribution, selecting
the best strategy becomes a problem of choosing the best payoff distribution.
One approach is to make a choice by direct comparison of the payoff distributions.
Figure 12.35
Strategy 1 (Mechanical)
Probability
Value, x P(X=x)
$80,000 0.50
-$50,000 0.50
1.00

Strategy 2 (Electronic)
Probability
Value, x P(X=x)
$150,000 0.25
$30,000 0.25
-$50,000 0.50
1.00

Strategy 3 (Magnetic)
Probability
Value, x P(X=x)
$120,000 0.35
$0 0.15
-$50,000 0.50
1.00

Strategy 4 (Don't)
Probability
Value, x P(X=x)
$0 1.00
1.00
Another approach for making choices involves certainty equivalents.

Certainty Equivalent
A certainty equivalent is a certain payoff value which is equivalent, for the decision
maker, to a particular payoff distribution. If the decision maker can determine his or her
certainty equivalent for the payoff distribution of each strategy, then the optimal strategy
is the one with the highest certainty equivalent.
The certainty equivalent is the minimum selling price for a payoff distribution; it depends
on the decision maker's personal attitude toward risk. A decision maker may be risk
preferring, risk neutral, or risk avoiding.
If the terminal values are not regarded as extreme (relative to the decision maker's total
assets), if the decision maker will encounter other decision problems with similar payoffs,
and if the decision maker has the attitude that he or she will "win some and lose some,"
then the decision maker's attitude toward risk may be described as risk neutral.
If the decision maker is risk neutral, the expected value is the appropriate certainty
equivalent for choosing among the strategies. Thus, for a risk neutral decision maker, the
optimal strategy is the one with the highest expected value.
The expected value of a payoff distribution is calculated by multiplying each terminal
value by its probability and summing the products. The expected value calculations for
each of the four strategies of the DriveTek problem are shown below.
Figure 12.36
Strategy 1 (Mechanical)
Probability
Value, x P(X=x) x * P(X=x)
$80,000 0.50 $40,000
-$50,000 0.50 -$25,000
$15,000

Strategy 2 (Electronic)
Probability
Value, x P(X=x) x * P(X=x)
$150,000 0.25 $37,500
$30,000 0.25 $7,500
-$50,000 0.50 -$25,000
$20,000

Strategy 3 (Magnetic)
Probability
Value, x P(X=x) x * P(X=x)
$120,000 0.35 $42,000
$0 0.15 $0
-$50,000 0.50 -$25,000
$17,000

Strategy 4 (Don't)
Probability
Value, x P(X=x) x * P(X=x)
$0 1.00 $0
$0
The four strategies of the DriveTek problem have expected values of $15,000, $20,000,
$17,000, and $0. Strategy 2 (Electronic) is the optimal strategy with expected value
$20,000.
A risk neutral decision maker's choice is based on the expected value. However, note that
if strategy 2 (Electronic) is chosen, the decision maker does not receive $20,000. The
actual payoff will be $150,000, $30,000, or -$50,000, with probabilities shown in the
payoff distribution.
Rollback Method
If we have a method for determining certainty equivalents (expected values for a risk
neutral decision maker), we don't need to examine every possible strategy explicitly.
Instead, the method known as rollback determines the single best strategy.
The rollback algorithm, sometimes called backward induction or "average out and fold
back," starts at the terminal nodes of the tree and works backward to the initial decision
node, determining the certainty equivalent rollback values for each node. Rollback values
are determined as follows:
• At a terminal node, the rollback value equals the terminal value.
• At an event node, the rollback value for a risk neutral decision maker is
determined using expected value; each branch probability is multiplied by the
successor rollback value, and the products are summed.
• At a decision node, the rollback value is set equal to the highest rollback value
on the immediate successor nodes.
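Applied to the DriveTek problem (the resulting values appear in the tree diagram below), the rollback calculations are:

At the electronic event node: 0.5 * $150,000 + 0.5 * $30,000 = $90,000
At the magnetic event node: 0.7 * $120,000 + 0.3 * $0 = $84,000
At the method decision node: the maximum of $80,000, $90,000, and $84,000 is $90,000
At the contract event node: 0.5 * $90,000 + 0.5 * (-$50,000) = $20,000
At the initial decision node: the maximum of $20,000 and $0 is $20,000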
In TreePlan tree diagrams the rollback values are located to the left and below each
decision, event, and terminal node. Terminal values and rollback values for the DriveTek
problem are shown below.

Figure 12.37

Use mechanical method


$80,000
$80,000

0.5
Electronic success
$150,000
0.5 Try electronic method $150,000
Awarded contract
$90,000 0.5
$90,000 Electronic failure
$30,000
$30,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method $120,000
$20,000
$84,000 0.3
Magnetic failure
$0
$0

0.5
$20,000 Not awarded contract
-$50,000
-$50,000

Don't prepare proposal


$0
$0
Optimal Strategy
After the rollback method has determined certainty equivalents for each node, the optimal
strategy can be identified by working forward through the tree. At the initial decision
node, the $20,000 rollback value equals the rollback value of the "Prepare proposal"
branch, indicating the alternative that should be chosen. DriveTek will either be awarded
the contract or not; there is a subsequent decision only if DriveTek obtains the contract.
(In a more complicated decision tree, the optimal strategy must include decision choices
for all decision nodes that might be encountered.) At the decision node following
"Awarded contract," the $90,000 rollback value equals the rollback value of the "Try
electronic method" branch, indicating the alternative that should be chosen.
Subsequently, if the electronic method fails, DriveTek must use the mechanical method
to satisfy the contract.
Cell B26 has the formula =IF(A27=E20,1,IF(A27=E34,2)) which displays 1, indicating
that the first branch is the optimal choice. Thus, the initial choice is to prepare the
proposal. Cell J11 has the formula =IF(I12=M4,1,IF(I12=M11,2,IF(I12=M21,3))) which
displays 2, indicating that the second branch (numbered 1, 2, and 3, from top to bottom)
is the optimal choice. If awarded the contract, DriveTek should try the electronic method.
The pairs of rollback values at the relevant decision nodes ($20,000 and $90,000) and the
preferred decision branches are shown below in bold.
Figure 12.38
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 $80,000
4 $80,000
5
6 0.5
7 Electronic success
8 $150,000
9 0.5 Try electronic method $150,000
10 Awarded contract
11 2 $90,000 0.5
12 $90,000 Electronic failure
13 $30,000
14 $30,000
15
16 0.7
17 Magnetic success
18 Prepare proposal $120,000
19 Try magnetic method $120,000
20 $20,000
21 $84,000 0.3
22 Magnetic failure
23 $0
24 $0
25
26 1 0.5
27 $20,000 Not awarded contract
28 -$50,000
29 -$50,000
30
31
32 Don't prepare proposal
33 $0
34 $0

Taking into account event branches with subsequent terminal nodes, all branches and
terminal values associated with the optimal risk neutral strategy are shown below.
Figure 12.39

Use mechanical method


$80,000

0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000

0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method

0.3
Magnetic failure
$0

0.5
Not awarded contract
-$50,000

Don't prepare proposal


$0

The rollback method has identified strategy 2 (Electronic) as optimal. The rollback value
on the initial branch of the optimal strategy is $20,000, which must be the same as the
expected value for the payoff distribution of strategy 2. Some of the intermediate
calculations for the rollback method differ from the calculations for the payoff
distributions, but both approaches identify the same optimal strategy with the same initial
expected value. For decision trees with a large number of strategies, the rollback method
is more efficient.

12.6 NEWOX DECISION TREE PROBLEM


The Newox Company is considering whether or not to drill for natural gas on its own
land. If they drill, their initial expenditure will be $40,000 for drilling costs. If they strike
gas, they must spend an additional $30,000 to cap the well and provide the necessary
hardware and control equipment. (This $30,000 cost is not a decision; it is associated
with the event "strike gas.") If they decide to drill but no gas is found, there are no other
subsequent alternatives, so their outcome value is -$40,000.
If they drill and find gas, there are two alternatives. Newox could sell to West Gas, which
has made a standing offer of $200,000 to purchase all rights to the gas well's production
(assuming that Newox has actually found gas). Alternatively, if gas is found, Newox can
decide to keep the well instead of selling to West Gas; in this case Newox manages the
gas production and takes its chances by selling the gas on the open market.
At the current price of natural gas, if gas is found it would have a value of $150,000 on
the open market. However, there is a possibility that the price of gas will rise to double its
current value, in which case a successful well will be worth $300,000.
The company's engineers feel that the chance of finding gas is 30 percent; their staff
economist thinks there is a 60 percent chance that the price of gas will double.

12.7 BRANDON DECISION TREE PROBLEM


Brandon Appliance Corporation, a leading producer of microwave ovens, is
considering the introduction of a new product. The new product is a microwave oven that
will defrost, cook, brown, and boil food as well as sense when the food is done.
Brandon must decide on a course of action for implementing this new product line. An
initial decision must be made to (1) nationally distribute the product from the start, (2)
conduct a marketing test first, or (3) not market the product at all. If a marketing test is
conducted, Brandon will consider the result and then decide whether to abandon the
product line or make it available for national distribution.
The finance department has provided some cost information and probability assignments
relating to this decision. The preliminary costs for research and development have
already been incurred and are considered irrelevant to the marketing decision. A success
nationally will increase profits by $5,000,000, and failure will reduce them by
$1,000,000, while abandoning the product will not affect profits. The test market analysis
will cost Brandon an additional $35,000.
If a market test is not performed, the probability of success in a national campaign is 60
percent. If the market test is performed, the probability of a favorable test result is 58
percent. With favorable test results, the probability for national success is approximately
93 percent. However, if the test results are unfavorable, the national success probability is
approximately 14 percent.

Decision Tree Strategies


Brandon Appliance Corporation must decide on a course of action for implementing this
new microwave oven. An initial decision must be made to (1) nationally distribute the
product from the start, (2) conduct a marketing test first, or (3) not market the product at
all. If a marketing test is conducted, Brandon will consider the result and then decide
whether to abandon the product line or make it available for national distribution. The
following decision tree is based on information about cash flows and probability
assignments.

Figure 12.40 (cash flows in thousands of dollars)
0.6
Success
+$5,000
National

0.4
Failure
-$1,000

0.93
Success
+$4,965
National

0.07
0.58 Failure
Favorable -$1,035

Don't
Brandon -$35

Test
0.14
Success
+$4,965
National

0.86
0.42 Failure
Unfavorable -$1,035

Don't
-$35

Don't
$0

In a decision tree model, a strategy is a specification of an initial choice and any subsequent choices that must be made by the decision maker.
How many strategies are there in the Brandon problem?
Describe each strategy.
Figure 12.41 Strategy 1: National


0.6
Success
+$5,000
National

0.4
Failure
-$1,000

0.93
Success
+$4,965
National

0.07
0.58 Failure
Favorable -$1,035

Don't
Brandon -$35

Test
0.14
Success
+$4,965
National

0.86
0.42 Failure
Unfavorable -$1,035

Don't
-$35

Don't
$0
Figure 12.42 Strategy 2: Test; if Favorable, National; if Unfavorable, National
(The figure repeats the decision tree of Figure 12.40 to illustrate Strategy 2.)
Figure 12.43 Strategy 3: Test; if Favorable, National; if Unfavorable, Don't
(The figure repeats the decision tree of Figure 12.40 to illustrate Strategy 3.)
Figure 12.44 Strategy 4: Test; if Favorable, Don't; if Unfavorable, National
(The figure repeats the decision tree of Figure 12.40 to illustrate Strategy 4.)
Figure 12.45 Strategy 5: Test; if Favorable, Don't; if Unfavorable, Don't
(The figure repeats the decision tree of Figure 12.40 to illustrate Strategy 5.)
Figure 12.46 Strategy 6: Don't
(The figure repeats the decision tree of Figure 12.40 to illustrate Strategy 6.)
Chapter 13 Sensitivity Analysis for Decision Trees
13.1 ONE-VARIABLE SENSITIVITY ANALYSIS
One-Variable Sensitivity Analysis using an Excel data table
1. Construct a decision tree model or financial planning model.
2. Identify the model input cell (H1) and model output cell (A10).
3. Modify the model so that probabilities will always sum to one. (That is, enter the
formula =1-H1 in cell H6.)

Figure 13.1 Display for One-Variable Sensitivity Analysis


A B C D E F G H I J K L
1 0.6
Model Input Cell
2 High sales
3 +$300
4 Introduce product +$600 +$300
5
=1-H1
6 -$300 +$100 0.4
7 Low sales
8 -$200
9 1 +$100 -$200
10 +$100
11
12 Model Don't introduce
13 Output $0
14 Cell $0 $0

4. Enter a list of input values in a column (N3:N13).


5. Enter a formula for determining output values at the top of an empty column to the right of the input values (=A10 in cell O2).
6. Select the data table range (N2:O13).

7. From the Data menu choose the Table command.

Figure 13.2
M N O P
1
2 +$100 =A10
3 0.00
4 0.10
5 0.20
6 0.30
7 0.40
8 0.50
9 0.60
10 0.70
11 0.80
12 0.90
13 1.00
14

8. In the Data Table dialog box, select the Column Input Cell edit box. Type the
model input cell (H1), or point to the model input cell (in which case the edit
box displays $H$1). Click OK.

Figure 13.3

9. The Data Table command substitutes each input value into the model input cell,
recalculates the worksheet, and displays the corresponding model output value
in the table.
10. Optional: Change the formula in cell O2 to =CHOOSE(B9,"Introduce","Don't").

Figure 13.4
M N O P
1 P(High Sales) Exp. Value
2
3 0.00 0
4 0.10 0
5 0.20 0
6 0.30 0
7 0.40 0
8 0.50 50
9 0.60 100
10 0.70 150
11 0.80 200
12 0.90 250
13 1.00 300
14
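The sweep that the Data Table command performs can also be reproduced outside Excel. The short Python sketch below (an illustration, not part of the text's workbook) recomputes the Figure 13.4 results directly from the endpoint values of Figure 13.1: the Introduce branch pays +$300 with probability P(High sales) and -$200 otherwise, and Don't introduce pays $0.

# One-variable sensitivity analysis: sweep P(High sales) and report the
# expected value of the optimal decision, as in Figure 13.4.
def optimal_ev(p_high):
    ev_introduce = p_high * 300 + (1 - p_high) * (-200)  # endpoint values from Figure 13.1
    ev_dont = 0
    return max(ev_introduce, ev_dont)

for i in range(11):
    p = i / 10
    print(f"P(High Sales) = {p:.2f}   Exp. Value = {optimal_ev(p):.0f}")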

13.2 TWO-VARIABLE SENSITIVITY ANALYSIS


Two-Variable Sensitivity Analysis using an Excel data table

Figure 13.5 Decision Tree for Strategy Region Table


A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 +$80,000
4 -$120,000 +$80,000
5
6 0.50
7 Electronic success
8 +$150,000
9 0.50 Try electronic method $0 +$150,000
10 Awarded contract
11 2 -$50,000 +$90,000 0.50
12 +$250,000 +$90,000 Electronic failure
13 +$30,000
14 -$120,000 +$30,000
15
16 0.70
17 Magnetic success
18 Prepare proposal +$120,000
19 Try magnetic method $0 +$120,000
20 -$50,000 +$20,000
21 -$80,000 +$84,000 0.30
22 Magnetic failure
23 $0
24 -$120,000 $0
25
26 1 0.50
27 +$20,000 Not awarded contract
28 -$50,000
29 $0 -$50,000
30
31
32 Don't prepare proposal
33 $0
34 $0 $0

Optional: Activate the Base Case worksheet. From the Edit menu, choose Move Or Copy
Sheet. In the Move Or Copy dialog box, check the box for Create A Copy, and click OK.
Double-click the new worksheet tab and enter Strategy Region Table.

Setup for Data Table


Select cell P11, and enter the formula =1-P6. Select cell P21, and enter the formula =1-P16.
In cell U3 enter P(Elec OK). In cell V3 enter 1, and in cell V4 enter 0.9. Select cells
V3:V4. In the lower right corner of cell V4, click the fill handle and drag down to cell
V13. With cells V3:V13 still selected, click the Increase Decimal button once so that all
values are displayed with one decimal place.
Select columns V:AG. (Select column V. Click and drag the horizontal scroll bar until
column AG is visible. Hold down the Shift key and click column AG.) From the Format
menu choose Column | Width. In the Column Width edit box type 5 and click OK.
In cell W1 enter P(Mag OK). In cell W2 enter 0 (zero), and in cell X2 enter 0.1. Select
cells W2:X2. In the lower right corner of cell X2, click the fill handle and drag right to
cell AG2. With cells W2:AG2 still selected, click the Increase Decimal button once so
that all values are displayed with one decimal place.
Select cell V2 and enter the formula =CHOOSE(J11,"Mech","Elec","Mag"). With the
base case assumptions the formula shows Elec.

Figure 13.6
U V W X Y Z AA AB AC AD AE AF AG
1 P(Mag OK)
2 Elec 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
3 P(Elec OK) 1.0
4 0.9
5 0.8
6 0.7
7 0.6
8 0.5
9 0.4
10 0.3
11 0.2
12 0.1
13 0.0

Obtaining Results Using Data Table Command


Select the entire data table, cells V2:AG13.

From the Data menu, choose Table. In the Table dialog box, type P16 in the Row Input
Cell edit box, type P6 in the Column Input Cell edit box, and click OK.
With cells V2:AG13 still selected, click the Align Right button.

Figure 13.7
U V W X Y Z AA AB AC AD AE AF AG
1 P(Mag OK)
2 Elec 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
3 P(Elec OK) 1.0 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
4 0.9 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
5 0.8 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
6 0.7 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag
7 0.6 Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag
8 0.5 Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag Mag
9 0.4 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
10 0.3 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
11 0.2 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
12 0.1 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
13 0.0 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
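The strategy region table can be checked without Excel as well. The Python sketch below (illustrative only) performs the same two-way sweep: for each combination of P(Electronic success) and P(Magnetic success) it reports which method has the highest expected value once the contract is awarded, using the endpoint values of Figure 13.5 ($80,000 mechanical; $150,000 or $30,000 electronic; $120,000 or $0 magnetic), which is the method choice the CHOOSE formula in cell V2 tracks.

# Two-variable sensitivity analysis (strategy region table), as in Figure 13.7.
def best_method(p_elec, p_mag):
    # Expected net value of each method given the contract is awarded,
    # using the endpoint values shown in Figure 13.5.
    ev = {
        "Mech": 80_000,
        "Elec": p_elec * 150_000 + (1 - p_elec) * 30_000,
        "Mag": p_mag * 120_000 + (1 - p_mag) * 0,
    }
    return max(ev, key=ev.get)

print("Rows: P(Elec OK), columns: P(Mag OK)")
print("      " + " ".join(f"{j / 10:>5.1f}" for j in range(11)))
for i in range(10, -1, -1):
    p_elec = i / 10
    row = " ".join(f"{best_method(p_elec, j / 10):>5}" for j in range(11))
    print(f"{p_elec:>5.1f} {row}")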

Embellishments
Select cells U1:AG13, and click the Copy button. Select cell AI1, right-click, and from
the shortcut menu choose Paste Special. In the Paste Special dialog box, click the Values
option button, and click OK. Right-click again, choose Paste Special, click the Formats
option button, and click OK.
Select columns AJ:AU. Choose Format | Column | Width, type 5, and click OK.
Select cell AJ2, right-click, and from the shortcut menu choose Clear Contents. Select
cells AK2:AU2, move the cursor near the border of the selection until it becomes an
arrow, click and drag the selection down to cells AK14:AU14. Similarly, select cell AK1
and move its contents down to cell AP15. Also, move the contents of cell AI3 to cell AI8.
Select cell AN1, and enter Strategy Region Table.

Figure 13.8
AI AJ AK AL AM AN AO AP AQ AR AS AT AU
1 Strategy Region Table
2
3 1.0 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
4 0.9 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
5 0.8 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
6 0.7 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag
7 0.6 Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag
8 P(Elec OK) 0.5 Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag Mag
9 0.4 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
10 0.3 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
11 0.2 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
12 0.1 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
13 0.0 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
14 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
15 P(Mag OK)

Apply borders to appropriate ranges and cells to show the strategy regions. Apply
shading to cell AR8 to show the base case strategy.

Figure 13.9
AI AJ AK AL AM AN AO AP AQ AR AS AT AU
1 Strategy Region Table
2
3 1.0 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
4 0.9 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
5 0.8 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
6 0.7 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag
7 0.6 Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag
8 P(Elec OK) 0.5 Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag Mag
9 0.4 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
10 0.3 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
11 0.2 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
12 0.1 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
13 0.0 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
14 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
15 P(Mag OK)

13.3 MULTIPLE-OUTCOME SENSITIVITY ANALYSIS


Sensitivity Analysis for Multiple-Outcome Event Probabilities
Choose one of the outcome probabilities that will be explicitly changed.
For example, focus on P(Low Sales).

Keep the same relative likelihoods (base case) for the other probabilities.

Figure 13.10
A B C D E F G H I J K L M N O
1 0.2 P(Low Sales) OptStrat
2 High Sales
3 +$1,500 1.00 Don't
4 +$2,500 +$1,500 0.90 Don't
5 0.80 Don't
6 0.5 0.70 Don't
7 Intro Medium Sales 0.60 Intro
8 +$500 0.50 Intro
9 -$1,000 +$400 +$1,500 +$500 0.40 Intro
10 Base -> 0.30 Intro
11 0.3 0.20 Intro
12 Low Sales 0.10 Intro
13 1 -$500 0.00 Intro
14 +$400 +$500 -$500
15
16
17 Don't
18 $0
19 $0 $0

Figure 13.11
A B C D E F G H I J K L M N O
1 =(0.2/(0.2+0.5))*(1-H11) P(Low Sales) OptStrat
2 High Sales =CHOOSE(B13,"Intro","Don't")
3 1.00
4 0.90
5 0.80
6 =(0.5/(0.2+0.5))*(1-H11) 0.70
7 Intro Medium Sales 0.60
8 0.50
9 0.40
10 Base -> 0.30
11 0.3 0.20
12 Low Sales 0.10
13 0.00
14
15
16
17 Don't
18
19
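The same sweep can be written in a few lines of Python (an illustration, not part of the text's workbook), reproducing the OptStrat column of Figure 13.10 while holding the other outcome probabilities in their base-case ratio.

# Sensitivity to P(Low Sales), keeping the base-case ratio 0.2 : 0.5 for
# P(High Sales) : P(Medium Sales), as in the Figure 13.11 formulas.
def optimal_strategy(p_low):
    p_high = (0.2 / (0.2 + 0.5)) * (1 - p_low)
    p_med = (0.5 / (0.2 + 0.5)) * (1 - p_low)
    ev_intro = p_high * 1500 + p_med * 500 + p_low * (-500)  # endpoint values from Figure 13.10
    return "Intro" if ev_intro > 0 else "Don't"              # Don't introduce pays $0

for i in range(10, -1, -1):
    p_low = i / 10
    print(f"P(Low Sales) = {p_low:.2f}   OptStrat = {optimal_strategy(p_low)}")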

13.4 ROBIN PINELLI'S SENSITIVITY ANALYSIS


Adapted from Clemen's Making Hard Decisions. Robin Pinelli is considering three job
offers. In trying to decide which to accept, Robin has concluded that three objectives are
important in this decision. First, of course, is to maximize disposable income, the
amount left after paying for housing, utilities, taxes, and other necessities. Second, Robin
likes cold weather and enjoys winter sports. The third objective relates to the quality of
the community. Being single, Robin would like to live in a city with a lot of activities and
a large population of single professionals.

Developing attributes for these three objectives turns out to be relatively straightforward.
Disposable income can be measured directly by calculating monthly take-home pay
minus average monthly rent (being careful to include utilities) for an appropriate
apartment. The second attribute is annual snowfall. For the third attribute, Robin has
located a magazine survey of large cities that scores those cities as places for single
professionals to live. Although the survey is not perfect from Robin's point of view, it
does capture the main elements of her concern about the quality of the singles community
and available activities. Also, all three of the cities under consideration are included in the
survey.
Here are descriptions of the three job offers:
1 MPR Manufacturing in Flagstaff, Arizona. Disposable income estimate: $1600
per month. Snowfall range: 150 to 320 cm per year. Magazine score: 50 (out of
100).
2 Madison Publishing in St. Paul, Minnesota. Disposable income estimate: $1300
to $1500 per month. (The estimate is uncertain because Robin knows there is wide
variation in apartment rental prices and will not know what is appropriate
and available until spending some time in the city.) Snowfall range: 100 to 400
cm per year. Magazine score: 75.
3 Pandemonium Pizza in San Francisco, California. Disposable income estimate:
$1200 per month. Snowfall range: negligible. Magazine score: 95.
Robin has created a decision tree to represent the situation. The uncertainty about
snowfall and disposable income is represented by the chance nodes as Robin has
included them in the tree. The ratings in the consequence matrix are such that the worst
consequence has a rating of zero points and the best has 100.
Ratings in the consequence matrix (three attribute values at each endpoint of the decision
tree) are proportional scores, corresponding to linear individual utility over the range of
possible values for each attribute.
After considering the situation, Robin concludes that the quality of the city is most
important, the amount of snowfall is next, and the third is income. (Income is important,
but the variation between $1200 and $1600 is not enough to make much difference to
Robin.) Furthermore, Robin concludes that the weight of the magazine rating in the
consequence matrix should be 1.5 times the weight for the snowfall rating and three times
as much as the weight for the income rating. This information is used to calculate the
weights for the three attributes and to calculate overall scores for each of the endpoints in
the decision tree.
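These weight ratios determine the attribute weights shown in Figure 13.12. The short Python sketch below (illustrative only, mirroring the V6:V8 formulas) reproduces the weights 0.167, 0.333, and 0.500 and the overall score 48.83 for one Madison endpoint (ratings 75, 25, 56).

# Attribute weights from the assessed ratios (Figure 13.12, cells V2:V8).
mag_over_snow = 1.5    # magazine weight is 1.5 times the snowfall weight
mag_over_income = 3.0  # magazine weight is 3 times the income weight

w_mag = 1 / (1 / mag_over_snow + 1 / mag_over_income + 1)  # 0.500
w_snow = w_mag / mag_over_snow                             # 0.333
w_income = w_mag / mag_over_income                         # 0.167
print(round(w_income, 3), round(w_snow, 3), round(w_mag, 3))

# Overall score for the first Madison endpoint (income 75, snowfall 25, magazine 56):
print(round(w_income * 75 + w_snow * 25 + w_mag * 56, 2))   # 48.83, as in Figure 13.12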

Figure 13.12 Decision Tree and Multi-Attribute Utility (Robin Pinelli)


A B C D E F G H I J K L M N O P Q R S T U V
1 Robin Pinelli, Clemen2, pp. 150-151 Individual Utility Weight Ratio Input
2 Overall Mag/Snow 1.50
3 Non-TreePlan Formulas Utility Income Snowfall Magazine Mag/Income 3.00
4 V6 =V8/V3 0.15
5 V7 =V8/V2 Snowfall 100 cm Weights
6 V8 =1/(1/V2+1/V3+1) 48.83 75 25 56 Income 0.167
7 O6 =$V$6*Q6+$V$7*R6+$V$8*S6 Snowfall 0.333
8 Select O6:O10; click and drag Magazine 0.500
9 fill handle to O51. 0.60 0.70
10 Disp. Income $1500 Snowfall 200 cm
11 57.17 75 50 56
12
13
14 0.15
15 Snowfall 400 cm
16 73.83 75 100 56
17 Madison Publishing
18
19 55.08 0.15
20 Snowfall 100 cm
21 40.50 25 25 56
22
23
24 0.40 0.70
25 Disp. Income $1300 Snowfall 200 cm
26 48.83 25 50 56
27
28
29 0.15
30 Snowfall 400 cm
31 65.50 25 100 56
32
33
34 1 0.15
35 Snowfall 150 cm
36 29.17 100 37.5 0
37
38
39 0.70
40 MPR Manufacturing Snowfall 230 cm
41 35.83 100 57.5 0
42 35.96
43
44 0.15
45 Snowfall 320 cm
46 43.33 100 80 0
47
48
49
50 Pandemonium Pizza
51 50.00 0 0 100
52 50.00

Figure 13.13 Sensitivity Analysis of Weight-Ratio Input Assumptions


X Y Z AA AB AC AD AE AF AG AH
1 Sensitivity Analysis
2 Mag/Income Weight Ratio
3 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
4 Mag/Snow 1.00 Madison Madison Madison Madison Madison Madison Madison Madison Madison
5 Weight 1.25 Madison Madison Madison Madison Madison Madison Madison Madison Madison
6 Ratio 1.50 Madison Madison Madison Madison Madison Madison Madison Madison Madison
7 1.75 Madison Madison Madison Madison Madison Madison Madison Pizza Pizza
8 2.00 Madison Madison Madison Madison Madison Pizza Pizza Pizza Pizza
9 2.25 Madison Madison Madison Madison Pizza Pizza Pizza Pizza Pizza
10 2.50 Madison Madison Madison Pizza Pizza Pizza Pizza Pizza Pizza
11 2.75 Madison Madison Madison Pizza Pizza Pizza Pizza Pizza Pizza
12 3.00 Madison Madison Madison Pizza Pizza Pizza Pizza Pizza Pizza
13
14
15
16 Mag/Income Weight Ratio
17 0.25 0.50 0.75 1.00 1.50 2.00 2.50 3.00 3.50
18 Mag/Snow 0.25 MPR MPR MPR MPR Madison Madison Madison Madison Madison
19 Weight 0.50 MPR MPR MPR Madison Madison Madison Madison Madison Madison
20 Ratio 0.75 MPR MPR MPR Madison Madison Madison Madison Madison Madison
21 1.00 MPR MPR MPR Madison Madison Madison Madison Madison Madison
22 1.25 MPR MPR MPR Madison Madison Madison Madison Madison Madison
23 1.50 MPR MPR MPR Madison Madison Madison Madison Madison Madison
24 1.75 MPR MPR MPR Madison Madison Madison Madison Madison Madison
25 2.00 MPR MPR MPR Madison Madison Madison Madison Madison Pizza
26 2.25 MPR MPR MPR Madison Madison Madison Madison Pizza Pizza
27
28
29 Formulas
30 Y3 =CHOOSE(B34,"Madison","MPR", "Pizza")
31 Y17 =CHOOSE(B34,"Madison","MPR", "Pizza")
32
33 Data Tables Y3:AH11 and Y17:AH26
34 V3 Row Input Cell
35 V2 Column Input Cell
Chapter 14 Value of Information in Decision Trees
14.1 VALUE OF INFORMATION
Useful concept for
Evaluating potential information-gathering activities
Comparing importance of multiple uncertainties

14.2 EXPECTED VALUE OF PERFECT INFORMATION


Several computational methods
Flipping tree, moving an event set of branches, appropriate for any decision tree
Payoff table, most appropriate only for single-stage tree (one set of uncertain
outcomes with no subsequent decisions)
Expected improvement
All three methods start by determining Expected Value Under Uncertainty, EVUU,
which is the expected value of the optimal strategy without any additional information.
To use these methods, you need (a) a model of your decision problem under uncertainty
with payoffs and probabilities and (b) a willingness to summarize a payoff distribution
(payoffs with associated probabilities) using expected value.
The methods can be modified to use certainty equivalents for a decision maker who is not
risk neutral.

Expected Value of Perfect Information, Reordered Tree

Figure 14.1 Structure, Cash Flows, Endpoint Values, and Probabilities


0.5
High Sales
$400,000
$700,000

0.3
Introduce Product Medium Sales
$100,000
-$300,000 $400,000

0.2
Low Sales
1 -$200,000
$100,000

Don't Introduce
$0
$0

Figure 14.2 Rollback Expected Values


0.5
High Sales
$400,000

0.3
Introduce Product Medium Sales
$100,000
$190,000

0.2
Low Sales
1 -$200,000
$190,000

Don't Introduce
$0

The two figures above show what is called the prior problem, i.e., the decision problem
under uncertainty before obtaining any additional information.

Figure 14.3 Structure Using Perfect Prediction

High Sales

Introduce Product Medium Sales

"High Sales" Low Sales

Don't Introduce

High Sales

Introduce Product Medium Sales

Perfect Prediction "Medium Sales" Low Sales

Don't Introduce

High Sales

Introduce Product Medium Sales

"Low Sales" Low Sales

Don't Introduce

Before you get a perfect prediction, you are uncertain about what that prediction will be.
If you originally think the probability of High Sales is 0.5, then you should also think the
probability is 0.5 that a perfect prediction will tell you that sales will be high.
After you get a prediction of "High Sales," the probability of actually having high sales is
1.0.

Figure 14.4 Rollback Using Free Perfect Prediction


1.0
High Sales
$400,000

0.0
Introduce Product Medium Sales
$100,000
$400,000
0.5 0.0
"High Sales" Low Sales
1 -$200,000
$400,000

Don't Introduce
$0

0.0
High Sales
$400,000

1.0
Introduce Product Medium Sales
$100,000
$100,000
0.3 0.0
Perfect Prediction "Medium Sales" Low Sales
1 -$200,000
$230,000 $100,000

Don't Introduce
$0

0.0
High Sales
$400,000

0.0
Introduce Product Medium Sales
$100,000
-$200,000
0.2 1.0
"Low Sales" Low Sales
2 -$200,000
$0

Don't Introduce
$0

EVUU: Expected Value Under Uncertainty


the expected value of the best strategy without any additional information
EVPP Expected Value using a (free) Perfect Prediction
EVPI Expected Value of Perfect Information
EVPI = EVPP – EVUU
In this example, EVPI = $230,000 – $190,000 = $40,000

For a perfect prediction, the information message "Low Sales" is the same as the event
Low Sales, so the detailed structure shown above is not needed.

Figure 14.5 Shortcut EVPP

Introduce Product
0.5 $400,000
High Sales
1
$400,000
Don't Introduce
$0

Introduce Product
0.3 $100,000
Perfect Prediction Medium Sales
1
$230,000 $100,000
Don't Introduce
$0

Introduce Product
0.2 -$200,000
Low Sales
2
$0
Don't Introduce
$0

Expected Value of Perfect Information, Payoff Table


This method is most appropriate only for a single-stage decision tree (one set of uncertain
outcomes with no subsequent decisions).

Figure 14.6 Payoff Table for Prior Problem with Expected Values
Alternatives
Probability Event Introduce Don't
0.5 High Sales $400,000 $0
0.3 Medium Sales $100,000 $0
0.2 Low Sales -$200,000 $0

Expected Value $190,000 $0



For each row in the body of the payoff table, if you receive a perfect prediction that the
event in that row will occur, which alternative would you choose and what would your
payoff be?
Before you receive the prediction, you don't know which of the payoffs you will receive
(either $400,000 or $100,000 or $0), so you summarize the payoff distribution using
expected value, EVPP.

Figure 14.7 Payoff Table with EVPP


Alternatives Payoff Using
Probability Event Introduce Don't Perfect Prediction
0.5 High Sales $400,000 $0 $400,000
0.3 Medium Sales $100,000 $0 $100,000
0.2 Low Sales -$200,000 $0 $0

Expected Value $190,000 $0 $230,000


EVUU EVPP

EVPI = $230,000 – $190,000 = $40,000
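For readers who prefer to check the arithmetic in code, here is a minimal Python sketch (not part of the text's workbook) of the payoff-table method, using the probabilities and payoffs of Figures 14.6 and 14.7.

# EVPI from a payoff table: EVUU, EVPP, and their difference.
probs = [0.5, 0.3, 0.2]                            # High, Medium, Low sales
payoffs = {"Introduce": [400_000, 100_000, -200_000],
           "Don't": [0, 0, 0]}

def expected(values):
    return sum(p * v for p, v in zip(probs, values))

evuu = max(expected(v) for v in payoffs.values())          # $190,000 (Introduce)
row_best = [max(vals) for vals in zip(*payoffs.values())]  # best payoff for each prediction
evpp = expected(row_best)                                  # $230,000
print("EVUU =", evuu, " EVPP =", evpp, " EVPI =", evpp - evuu)   # EVPI = $40,000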

Expected Value of Perfect Information, Expected Improvement


Like the payoff table method, this method is most appropriate only for a single-stage
decision tree.
(1) Use the prior decision tree or prior payoff table to find EVUU (the expected value of
the best strategy without any additional information).
(2) If you are committed to the best strategy, consider each outcome of the uncertain
event and whether you would change your choice if you received a perfect prediction that
the event was going to occur.
In the example, you would not change your choice if you are told that sales will be high
or medium. However, if you are told that sales will be low, you would change your
choice from Introduce to Don't.
(3) Determine how much your payoff will improve in each of the cases.
In the example, your payoff will not improve if you are told that sales will be high or
medium, but your payoff will improve by $200,000 (from –$200,000 to $0) if you are
told that sales will be low.
(4) Compute expected improvement associated with having the perfect prediction by
weighting each improvement by its associated probability.

In the example, the improvements associated with a perfect prediction of high, medium,
and low are $0, $0, and $200,000, respectively, with probabilities 0.5, 0.3, 0.2.
EVPI = Expected Improvement = 0.5*0 + 0.3*0 + 0.2*200,000 = $40,000

Expected Value of Perfect Information, Single-Season Product

Figure 14.8 Prior Problem, Four Alternatives and Three Outcomes


A B C D E F
1 Single-Season Product
2
3 Data
4
5 Price $3.00
6 Equip. Size
7 None Small Medium Large
8 Fixed Cost $0 $1,000 $2,000 $3,000
9 Var. Cost $0.00 $0.90 $0.70 $0.50
10 Capacity 0 4500 5500 6500
11
12 Payoff Table
13
14 Equip. Size
15 Prob. Demand None Small Medium Large
16 0.3 3000 $0 $5,300 $4,900 $4,500
17 0.4 4000 $0 $7,400 $7,200 $7,000
18 0.3 5000 $0 $8,450 $9,500 $9,500
19
20 Exp.Val. $0 $7,085 $7,200 $7,000
21
22
23 C16 formula: =($B$5-C$9)*MIN(C$10,$B16)-C$8
24 copied to C16:F18
25
26 C20 formula: =SUMPRODUCT($A16:$A18,C16:C18)
27 copied to C20:F20

Figure 14.9 EVPP


A B C D E F G H I
14 Equip. Size Payoff Using
15 Prob. Demand None Small Medium Large Perfect Prediction
16 0.3 3000 $0 $5,300 $4,900 $4,500 $5,300
17 0.4 4000 $0 $7,400 $7,200 $7,000 $7,400
18 0.3 5000 $0 $8,450 $9,500 $9,500 $9,500
19
20 Exp.Val. $0 $7,085 $7,200 $7,000 $7,400
21
22 H16 formula =MAX(C16:F16) copied to H16:H18
23 C20 formula copied to H20

EVPI = EVPP – EVUU = $7,400 – $7,200 = $200
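The same approach can be applied to the Single-Season Product example. The Python sketch below (illustrative only) rebuilds the payoff table from the data block of Figure 14.8 and reproduces EVUU = $7,200, EVPP = $7,400, and EVPI = $200.

# Single-Season Product: payoff table, EVUU, EVPP, and EVPI (Figures 14.8 and 14.9).
price = 3.00
sizes = {"None": (0, 0.00, 0),          # fixed cost, variable cost, capacity
         "Small": (1_000, 0.90, 4_500),
         "Medium": (2_000, 0.70, 5_500),
         "Large": (3_000, 0.50, 6_500)}
demand_probs = {3_000: 0.3, 4_000: 0.4, 5_000: 0.3}

def payoff(size, demand):
    fixed, var, cap = sizes[size]
    return (price - var) * min(cap, demand) - fixed   # same logic as the C16 formula

ev = {s: sum(p * payoff(s, d) for d, p in demand_probs.items()) for s in sizes}
evuu = max(ev.values())                                                        # $7,200 (Medium)
evpp = sum(p * max(payoff(s, d) for s in sizes) for d, p in demand_probs.items())  # $7,400
print(ev, " EVPI =", evpp - evuu)                                              # EVPI = $200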

Figure 14.10 Basic Probability Decision Tree


High Sales

Introduce Product

Low Sales
Success Prediction

Don't Introduce

High Sales

Introduce Product

Low Sales
Market Survey Inconclusive

Don't Introduce

High Sales

Introduce Product

Low Sales
Failure Prediction

Don't Introduce

High Sales

Introduce Product

Low Sales
Don't Survey

Don't Introduce

Figure 14.11 DriveTek EVPI Magnetic Success/Failure


Use mechanical method
+$80,000
+$80,000

0.5
Electronic success
+$150,000
0.5 Try electronic method +$150,000
Awarded contract
2 +$90,000 0.5
+$90,000 Electronic failure
+$30,000
+$30,000

0.7
Magnetic success
Prepare proposal +$120,000
Try magnetic method +$120,000
+$20,000
+$84,000 0.3
Magnetic failure
$0
$0
No Additional Information
1 1 0.5
+$20,000 +$20,000 Not awarded contract
-$50,000
-$50,000

Don't prepare proposal


$0
$0

Use mechanical method


+$80,000
+$80,000

0.5
Electronic success
+$150,000
0.5 Try electronic method +$150,000
Awarded contract
3 +$90,000 0.5
+$120,000 Electronic failure
+$30,000
+$30,000

1.0
2 Magnetic success
+$30,500 Prepare proposal +$120,000
Try magnetic method +$120,000
+$35,000
+$120,000 0.0
Magnetic failure
$0
0.7 $0
"Magnetic Success"
1 0.5
+$35,000 Not awarded contract
-$50,000
-$50,000

Don't prepare proposal


$0
$0

Use mechanical method


+$80,000
+$80,000

0.5
Perfect Prediction Electronic success
+$150,000
+$30,500 0.5 Try electronic method +$150,000
Awarded contract
2 +$90,000 0.5
+$90,000 Electronic failure
+$30,000
+$30,000

0.0
Magnetic success
Prepare proposal +$120,000
Try magnetic method +$120,000
+$20,000
$0 1.0
Magnetic failure
$0
0.3 $0
"Magnetic Failure"
1 0.5
+$20,000 Not awarded contract
-$50,000
-$50,000

Don't prepare proposal


$0
$0

14.3 DRIVETEK POST-CONTRACT-AWARD PROBLEM


DriveTek decided to prepare the proposal, and it turned out that they were awarded the
contract. The $50,000 cost and $250,000 up-front payment are in the past. The current
decision is to determine which method to use to satisfy the contract.
The following decision trees show costs as negative cash flows, so the decision criterion
is to maximize expected cash flow. An alternative formulation (not shown here) would
show all costs as positive values and would minimize expected cost.

Figure 14.12 EVUU

Use mechanical
-120000
-120000 -120000

0.5
Electronic success
-50000
Try electronic 0 -50000

2 -50000 -110000 0.5


-110000 Electronic failure
-170000
-120000 -170000

0.7
Magnetic success
-80000
Try magnetic 0 -80000

-80000 -116000 0.3


Magnetic failure
-200000
-120000 -200000

Figure 14.13 EVPP Elec

Use mechanical
-120000
-120000 -120000

1
Electronic success
-50000
0.5 Try electronic 0 -50000
"Electronic success"
2 -50000 -50000 0
0 -50000 Electronic failure
-170000
-120000 -170000

0.7
Magnetic success
-80000
Try magnetic 0 -80000

-80000 -116000 0.3


Magnetic failure
-200000
-83000 -120000 -200000

Use mechanical
-120000
-120000 -120000

0
Electronic success
-50000
0.5 Try electronic 0 -50000
"Electronic failure"
3 -50000 -170000 1
0 -116000 Electronic failure
-170000
-120000 -170000

0.7
Magnetic success
-80000
Try magnetic 0 -80000

-80000 -116000 0.3


Magnetic failure
-200000
-120000 -200000

Figure 14.14 EVPP Mag

Use mechanical
-120000
-120000 -120000

0.5
Electronic success
-50000
0.7 Try electronic 0 -50000
"Magnetic success"
3 -50000 -110000 0.5
0 -80000 Electronic failure
-170000
-120000 -170000

1
Magnetic success
-80000
Try magnetic 0 -80000

-80000 -80000 0
Magnetic failure
-200000
-89000 -120000 -200000

Use mechanical
-120000
-120000 -120000

0.5
Electronic success
-50000
0.3 Try electronic 0 -50000
"Magnetic failure"
2 -50000 -110000 0.5
0 -110000 Electronic failure
-170000
-120000 -170000

0
Magnetic success
-80000
Try magnetic 0 -80000

-80000 -200000 1
Magnetic failure
-200000
-120000 -200000

Figure 14.15 EVPP Both


EVPP Both
Use mechanical
-120000
-120000 -120000

1
Electronic success
-50000
0.7 Try electronic 0 -50000
"Magnetic success"
2 -50000 -50000 0
0 -50000 Electronic failure
-170000
-120000 -170000

1
Magnetic success
-80000
Try magnetic 0 -80000

0.5 -80000 -80000 0


"Electronic success" Magnetic failure
-200000
0 -50000 -120000 -200000

Use mechanical
-120000
-120000 -120000

1
Electronic success
-50000
0.3 Try electronic 0 -50000
"Magnetic failure"
2 -50000 -50000 0
0 -50000 Electronic failure
-170000
-120000 -170000

0
Magnetic success
-80000
Try magnetic 0 -80000

-80000 -200000 1
Magnetic failure
-200000
-71000 -120000 -200000

Use mechanical
-120000
-120000 -120000

0
Electronic success
-50000
0.7 Try electronic 0 -50000
"Magnetic success"
3 -50000 -170000 1
0 -80000 Electronic failure
-170000
-120000 -170000

1
Magnetic success
-80000
Try magnetic 0 -80000

0.5 -80000 -80000 0


"Electronic failure" Magnetic failure
-200000
0 -92000 -120000 -200000

Use mechanical
-120000
-120000 -120000

0
Electronic success
-50000
0.3 Try electronic 0 -50000
"Magnetic failure"
1 -50000 -170000 1
0 -120000 Electronic failure
-170000
-120000 -170000

0
Magnetic success
-80000
Try magnetic 0 -80000

-80000 -200000 1
Magnetic failure
-200000
-120000 -200000

14.4 SENSITIVITY ANALYSIS VS EVPI


Working Paper Title: Do Sensitivity Analyses Really Capture Problem Sensitivity? An
Empirical Analysis Based on Information Value
Authors: James C. Felli, Naval Postgraduate School and Gordon B. Hazen, Northwestern
University
Date: March 1998
The most common methods of sensitivity analysis (SA) in decision-analytic modeling are
based either on proximity in parameter-space to decision thresholds or on the range of
payoffs that accompany parameter variation. As an alternative, we propose the use of the
expected value of perfect information (EVPI) as a sensitivity measure and argue from
first principles that it is the proper measure of decision sensitivity. EVPI has significant
advantages over conventional SA, especially in the multiparametric case, where graphical
SA breaks down. In realistically sized problems, simple one- and two-way SAs may not
fully capture parameter interactions, raising the disturbing possibility that many
published decision analyses might be overconfident in their policy recommendations. To
investigate the extent of this potential problem, we re-examined 25 decision analyses
drawn from the published literature and calculated EVPI values for parameters on which
sensitivity analyses had been performed, as well as the entire set of problem parameters.
While we expected EVPI values to indicate greater problem sensitivity than conventional
SA due to revealed parameter interaction, we in fact found the opposite: compared to
EVPI, the one- and two-parameter SAs accompanying these problems dramatically
overestimated problem sensitivity to input parameters. This phenomenon can be
explained by invoking the flat maxima principle enunciated by von Winterfeldt and
Edwards.
http://www.mccombs.utexas.edu/faculty/jim.dyer/DA_WP/WP980019.pdf
Chapter 15 Value of Imperfect Information
15.1 TECHNOMETRICS PROBLEM
Prior Problem
Technometrics, Inc., a large producer of electronic components, is having some problems
with the manufacturing process for a particular component. Under its current production
process, 25 percent of the units are defective. The profit contribution of this component
is $40 per unit. Under the contract the company has with its customers, Technometrics
refunds $60 for each component that the customer finds to be defective; the customers
then repair the component to make it usable in their applications. Before shipping the
components to customers, Technometrics could spend an additional $30 per component
to rework any components thought to be defective (regardless of whether the part is really
defective). The reworked components can be sold at the regular price and will definitely
not be defective in the customers' applications. Unfortunately, Technometrics cannot tell
ahead of time which components will fail to work in their customers' applications. The
following payoff table shows Technometrics' net cash flow per component.

Figure 15.1 Payoff Table


Component Technometrics' Choice
Condition Ship as is Rework first
Good +$40 +$10
Defective -$20 +$10

What should Technometrics do?


How much should Technometrics be willing to pay for a test that could evaluate the
condition of the component before making the decision to ship as is or rework first?

Imperfect Information
An engineer at Technometrics has developed a simple test device to evaluate the
component before shipping. For each component, the test device registers positive,
inconclusive, or negative. The test is not perfect, but it is consistent for a particular
component; that is, the test yields the same result for a given component regardless of
how many times it is tested. To calibrate the test device, it was run on a batch of known
good components and on a batch of known defective components. The results in the table
below, based on relative frequencies, show the probability of a test device result,
conditional on the true condition of the component.

Figure 15.2 Likelihoods


Component Condition
Test Result Good Defective
Positive 0.70 0.10
Inconclusive 0.20 0.30
Negative 0.10 0.60

For example, of the known defective components tested, sixty percent had a negative test
result.
An analyst at Technometrics suggested using Bayesian revision of probabilities to
combine the assessments about the reliability of the test device (shown above) with the
original assessment of the components' condition (25 percent defectives).
Technometrics uses expected monetary value for making decisions under uncertainty.
What is the maximum (per component) the company should be willing to pay for using
the test device?

Probabilities From Relative Frequencies

Figure 15.3 Joint Outcome Table


Component Condition
Test Result Good Defective
Positive
Inconclusive
Negative

Random Process: select a component at random



Six possible outcomes (most detailed description of result of random process), described
by test result and component condition

Figure 15.4 Six Possible Outcomes


Component Condition
Test Result Good Defective
Positive P&G P&D
Inconclusive I&G I&D
Negative N&G N&D

Event: a collection of outcomes


We say an event has occurred when the single outcome of the random process is
contained in the event.
Five obvious events
For example, the event Good contains the three outcomes in the left column, and the event
Negative contains the two outcomes in the bottom row.
400 Components Classified by Test Result and Condition

Figure 15.5 Joint Frequency Table


Component Condition
Test Result Good Defective
Positive 210 10
Inconclusive 60 30
Negative 30 60

Figure 15.6 Joint Frequency Table with Row and Column Totals
Component Condition
Test Result Good Defective
Positive 210 10 220
Inconclusive 60 30 90
Negative 30 60 90
300 100 400

Figure 15.7 Joint Probability Table with Row and Column Totals
Component Condition
Test Result Good Defective
Positive 0.525 0.025 0.550
Inconclusive 0.150 0.075 0.225
Negative 0.075 0.150 0.225
0.750 0.250 1.000

Figure 15.8 Decision Tree Model


A B C D E F G H I J K L M N O P Q R S
1 EVSI 0.7500
2 $ 2.25 Good
3 $40.00
4 Ship as is $40.00
5
6 $25.00 0.2500
7 Defective
8 No add'l info -$20.00
9 1 -$20.00
10 $25.00
11 EVUU
12 Rework first
13 $10.00
14 $10.00
15
16 0.9545
17 Good
18 $40.00
19 Ship as is $40.00
20
21 $37.27 0.0455
22 0.5500 Defective
23 Positive -$20.00
24 2 1 -$20.00
25 $27.25 $37.27
26
27 Rework first
28 $10.00
29 $10.00
30
31 0.6667
32 Good
33 $40.00
34 Ship as is $40.00
35
36 $20.00 0.3333
37 0.2250 Defective
38 Test Inconclusive -$20.00
39 1 -$20.00
40 $27.25 $20.00
41 EVSP
42 Rework first
43 $10.00
44 $10.00
45
46 0.3333
47 Good
48 $40.00
49 Ship as is $40.00
50
51 $0.00 0.6667
52 0.2250 Defective
53 Negative -$20.00
54 2 -$20.00
55 $10.00
56
57 Rework first
58 $10.00
59 $10.00

Revision of Probability

Figure 15.9 Display

U V W X Y
1 Prior 0.75 0.25 = P(Main)
2 Likelihood Good Bad
3 Positive 0.7 0.1 = P(Info | Main)
4 Inconclusive 0.2 0.3
5 Negative 0.1 0.6
6
7 Joint Good Bad Preposterior
8 Positive 0.525 0.025 0.550 = P(Info)
9 Inconclusive 0.150 0.075 0.225
10 Negative 0.075 0.150 0.225
11
12 Posterior Good Bad
13 Positive 0.9545 0.0455 = P(Main | Info)
14 Inconclusive 0.6667 0.3333
15 Negative 0.3333 0.6667

Figure 15.10 Formulas


U V W X Y
1 Prior 0.75 0.25 = P(Main)
2 Likelihood Good Bad
3 Positive 0.7 0.1 = P(Info | Main)
4 Inconclusive 0.2 0.3
5 Negative 0.1 0.6
6
7 Joint Good Bad Preposterior
8 Positive =V$1*V3 =W$1*W3 =SUM(V8:W8) = P(Info)
9 Inconclusive =V$1*V4 =W$1*W4 =SUM(V9:W9)
10 Negative =V$1*V5 =W$1*W5 =SUM(V10:W10)
11
12 Posterior Good Bad
13 Positive =V8/$X8 =W8/$X8 = P(Main | Info)
14 Inconclusive =V9/$X9 =W9/$X9
15 Negative =V10/$X10 =W10/$X10
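The Bayesian revision of Figures 15.9 and 15.10 and the resulting value of the test can be reproduced outside Excel. The following Python sketch (an illustration, not part of the text's workbook) computes the preposterior and posterior probabilities and EVSI = $2.25 per component, matching Figure 15.8.

# Bayesian revision and expected value of sample information for Technometrics.
prior = {"Good": 0.75, "Defective": 0.25}
likelihood = {                                   # P(test result | condition), Figure 15.2
    "Positive": {"Good": 0.70, "Defective": 0.10},
    "Inconclusive": {"Good": 0.20, "Defective": 0.30},
    "Negative": {"Good": 0.10, "Defective": 0.60}}
payoff = {"Ship as is": {"Good": 40, "Defective": -20},   # Figure 15.1, per component
          "Rework first": {"Good": 10, "Defective": 10}}

# Prior (no-information) analysis: EVUU = $25 for "Ship as is".
evuu = max(sum(prior[c] * payoff[a][c] for c in prior) for a in payoff)

# Preposterior and posterior probabilities, then expected value with the test.
ev_with_test = 0.0
for result, lk in likelihood.items():
    p_result = sum(prior[c] * lk[c] for c in prior)               # P(Info)
    posterior = {c: prior[c] * lk[c] / p_result for c in prior}   # P(Main | Info)
    best = max(sum(posterior[c] * payoff[a][c] for c in prior) for a in payoff)
    print(result, round(p_result, 3), {c: round(p, 4) for c, p in posterior.items()})
    ev_with_test += p_result * best

print("EVUU =", evuu, " EV with test =", round(ev_with_test, 2),
      " EVSI =", round(ev_with_test - evuu, 2))                   # EVSI = $2.25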
Chapter 16 Modeling Attitude Toward Risk
16.1 RISK UTILITY FUNCTION
A certainty equivalent is a certain payoff value which is equivalent, for the decision
maker, to a particular payoff distribution. If the decision maker can determine his or her
certainty equivalent for the payoff distribution of each strategy in a decision problem,
then the optimal strategy is the one with the highest certainty equivalent.
The certainty equivalent, i.e., the minimum selling price for a payoff distribution,
depends on the decision maker's personal attitude toward risk. A decision maker may be
risk preferring, risk neutral, or risk avoiding.
If the terminal values are not regarded as extreme relative to the decision maker's total
assets, if the decision maker will encounter other decision problems with similar payoffs,
and if the decision maker has the attitude that he or she will "win some and lose some,"
then the decision maker's attitude toward risk may be described as risk neutral.
If the decision maker is risk neutral, the certainty equivalent of a payoff distribution is
equal to its expected value. The expected value of a payoff distribution is calculated by
multiplying each terminal value by its probability and summing the products.
If the terminal values in a decision situation are extreme or if the situation is "one-of-a-
kind" so that the outcome has major implications for the decision maker, an expected
value analysis may not be appropriate. Such situations may require explicit consideration
of risk.
Unfortunately, it can be difficult to determine one's certainty equivalent for a complex
payoff distribution. We can aid the decision maker by first determining his or her
certainty equivalent for a simple payoff distribution and then using that information to
infer the certainty equivalent for more complex payoff distributions.
A utility function, U(x), can be used to represent a decision maker's attitude toward risk.
The values or certainty equivalents, x, are plotted on the horizontal axis; utilities or
expected utilities, u or U(x), are on the vertical axis. You can use the plot of the function
by finding a value on the horizontal axis, scanning up to the plotted curve, and looking
left to the vertical axis to determine the utility.
A typical risk utility function might have the general shape shown below if you draw a
smooth curve approximately through the points.

Figure 16.1 Typical Risk Utility Function
(Plot of utility U(x), or expected utility, from 0.0 to 1.0 on the vertical axis against monetary value x, or certainty equivalent, from -$50,000 to $150,000 on the horizontal axis.)

Since more value generally means more utility, the utility function is monotonically
increasing, so its inverse is well-defined. On the plot of the utility function, you locate a
utility on the vertical axis, scan right to the plotted curve, and look down to read the
corresponding value.
The concept of a payoff distribution, risk profile, gamble, or lottery is important for
discussing utility functions. A payoff distribution is a set of payoffs, e.g., x1, x2, and x3,
with corresponding probabilities, P(X=x1), P(X=x2), and P(X=x3). For example, a
payoff distribution may be represented in decision tree form as shown below.

Figure 16.2 Payoff Distribution Probability Tree

P(X=x1)   x1
P(X=x2)   x2
P(X=x3)   x3

The fundamental property of a utility function is that the utility of the certainty equivalent
CE of a payoff distribution is equal to the expected utility of the payoffs, i.e.,
U(CE) = P(X=x1)*U(x1) + P(X=x2)*U(x2) + P(X=x3)*U(x3).
It follows that if you compute the expected utility (EU) of a lottery,
EU = P(X=x1)*U(x1) + P(X=x2)*U(x2) + P(X=x3)*U(x3),
the certainty equivalent of the payoff distribution can be determined using the inverse of
the utility function. That is, you locate the expected utility on the vertical axis, scan right
to the plotted curve, and look down to read the corresponding certainty equivalent.
If a utility function has been determined, you can use this fundamental property to
determine the certainty equivalent of any payoff distribution. Calculations for the
Magnetic strategy in the DriveTek problem are shown below. First, using a plot of the
utility function, locate each payoff x on the horizontal axis and determine the
corresponding utility U(x) on the vertical axis. Second, compute the expected utility EU
of the lottery by multiplying each utility by its probability and summing the products.
Third, locate the expected utility on the vertical axis and determine the corresponding
certainty equivalent CE on the horizontal axis.

Figure 16.3 Calculations Using Risk Utility Function


P(X=x) x U(x) P(X=x)*U(x)
0.50 -$50,000 0.00 0.0000
0.15 $0 0.45 0.0675
0.35 $120,000 0.95 0.3325
0.4000 EU

-$8,000 CE

16.2 EXPONENTIAL RISK UTILITY


Instead of using a plot of a utility function, an exponential function may be used to
represent risk attitude. The general form of the exponential utility function is
U(x) = A – B*EXP(–x/RT).
The risk tolerance parameter RT determines the curvature of the utility function reflecting
the decision maker’s attitude toward risk. Subsequent sections cover three methods for
determining RT.
EXP is Excel's standard exponential function, i.e., EXP(z) represents the value e raised to
the power of z, where e is the base of the natural logarithms.
The parameters A and B determine scaling. After RT is determined, if you want to plot a
utility function so that U(High) = 1.0 and U(Low) = 0.0, you can use the following
formulas to determine the scaling parameters A and B.
A = EXP (–Low/RT) / [EXP (–Low/RT) – EXP (–High/RT)]
B = 1 / [EXP (–Low/RT) – EXP (–High/RT)]
The inverse function for finding the certainty equivalent CE corresponding to an expected
utility EU is
CE = –RT*LN[(A–EU)/B],
where LN(y) represents the natural logarithm of y.
After the parameters A, B, and RT have been determined, the exponential utility function
and its inverse can be used to determine the certainty equivalent for any lottery.
Calculations for the Magnetic strategy in the DriveTek problem are shown in Figure 16.4.

Figure 16.4 Exponential Risk Utility Results

Computed values are displayed with four decimal places, but Excel's 15-digit precision is
used in all calculations. For a decision maker with a risk tolerance parameter of
$100,000, the payoff distribution for the Magnetic strategy has a certainty equivalent of
-$7,676. That is, if the decision maker is facing the payoff distribution shown in A9:B12
in Figure 16.4, he or she would be willing to pay $7,676 to be relieved of the obligation.
Formulas are shown in Figure 16.5. To construct the worksheet, enter the text in column A
and the monetary values in column B. To define names, select A2:B4, and choose Insert |
Name | Create. Similarly, select A6:B7, and choose Insert | Name | Create. Then enter the
formulas in B6:B7. Enter formulas in C10 and D10, and copy down. Finally, enter the
EU formula in D13 and the CE formula in D15. The defined names are absolute
references by default.

Figure 16.5 Exponential Risk Utility Formulas

Figure 16.6 shows results for the same payoff distribution using a simplified form of the
exponential risk utility function with A = 1 and B = 1. This function could be represented
as U(x) = 1–EXP(–x/RT) with inverse CE = –RT*LN(1–EU). The utility and expected
utility calculations are different, but the certainty equivalent is the same.
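As a cross-check on Figures 16.4 through 16.6, here is a short Python sketch (illustrative only) that applies both the simplified and the general exponential utility forms to the Magnetic-strategy payoff distribution of Figure 16.3, with RT = $100,000; scaling the general form over the range -$50,000 to $120,000 is an assumption made for this illustration.

from math import exp, log

RT = 100_000
payoffs = [(-50_000, 0.50), (0, 0.15), (120_000, 0.35)]   # Magnetic strategy, Figure 16.3

def u_simple(x):
    # Simplified form with A = 1 and B = 1.
    return 1 - exp(-x / RT)

eu = sum(p * u_simple(x) for x, p in payoffs)
ce = -RT * log(1 - eu)
print(round(eu, 4), round(ce))            # about -0.0798 and -7,676

# General form scaled so that U(Low) = 0 and U(High) = 1 over the payoff range.
low, high = -50_000, 120_000
B = 1 / (exp(-low / RT) - exp(-high / RT))
A = exp(-low / RT) * B

def u_general(x):
    return A - B * exp(-x / RT)

eu2 = sum(p * u_general(x) for x, p in payoffs)
ce2 = -RT * log((A - eu2) / B)
print(round(eu2, 4), round(ce2))          # different expected utility, same CE of about -7,676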

Figure 16.6 Simplified Exponential Risk Utility Results



16.3 APPROXIMATE RISK TOLERANCE


The value of the risk tolerance parameter RT is approximately equal to the maximum
value of Y for which the decision maker is willing to accept a payoff distribution with
equally-likely payoffs of $Y and −$Y/2 instead of accepting $0 for certain.

Figure 16.7 Approximate Risk Tolerance


0.5
Heads
+$Y
Play

0.5
Tails
-$Y/2

Don't
$0

For example, in a personal decision, you may be willing to play the game shown in
Figure 16.7 with equally-likely payoffs of $100 and -$50, but you might not play with
payoffs of $100,000 and –$50,000. As the better payoff increases from $100 to $100,000
(and the corresponding worse payoff increases from –$50 to –$50,000), you reach a value
where you are indifferent between playing the game and receiving $0 for certain. At that
point, the value of the better payoff is an approximation of RT for an exponential risk
utility function describing your risk attitude.
In a business decision for a small company, the company may be willing to play the game
with payoffs of $200,000 and –$100,000 but not with payoffs of $20,000,000 and
-$10,000,000. Somewhere between a better payoff of $200,000 and $20,000,000, the
company would be indifferent between playing the game and not playing, thereby
determining the approximate RT for its business decisions.

16.4 EXACT RISK TOLERANCE USING EXCEL


A simple payoff distribution, called a risk attitude assessment lottery, may be used to
determine the decision maker's attitude toward risk. This lottery has equal probability of
obtaining each of the two payoffs. It is good practice to use a better payoff at least as
large as the highest payoff in the decision problem and a worse payoff as small as or
smaller than the lowest payoff. In any case, the payoffs should be far enough apart that
the decision maker perceives a definite difference in the two outcomes. Three values
must be specified for the fifty-fifty lottery: the Better payoff, the Worse payoff, and the
Certainty Equivalent, as shown in Figure 16.8.

Figure 16.8 Risk Attitude Assessment Lottery

Certainty Equivalent =
  0.5  Better Payoff
  0.5  Worse Payoff

According to the fundamental property of a risk utility function, the utility of the
certainty equivalent equals the expected utility of the lottery, so the three values are
related as follows.
U(CertEquiv) = 0.5*U(BetterPayoff) + 0.5*U(WorsePayoff)
If you use the general form for an exponential utility function with parameters A, B, and
RT, and if you simplify terms, it follows that RT must satisfy the following equation.
Exp(–CertEquiv/RT) = 0.5*Exp(–BetterPayoff/RT) + 0.5*Exp(–WorsePayoff/RT)
Given the values for CE, Better, and Worse, you could use trial-and-error to find the
value of RT that exactly satisfies the equation. In Excel you can use Goal Seek or Solver
by creating a worksheet like Figure 16.9.
Enter the text in column A. Enter the assessment lottery values in B2:B4. Enter a
tentative RT value in B6. Select A2:B4, and use Insert | Name | Create; repeat for A6:B6
and A8:B9. Note that the parentheses symbol is not allowed in a defined name, so Excel
changes U(CE) to U_CE and EU(Lottery) to EU_Lottery.

Figure 16.9 Formulas for Risk Tolerance Search

Figure 16.10 Tentative Values for Risk Tolerance Search

Figure 16.10 shows tentative values for the search. From the Tools menu, choose Goal Seek.
In the Goal Seek dialog box, enter B11, 0, and B6. If you point to cells, the reference
appears in the edit box as an absolute reference, as shown in Figure 16.11. Click OK.

Figure 16.11 Goal Seek Dialog Box

The Goal Seek Status dialog box shows that a solution has been found. Click OK. The
worksheet appears as shown in Figure 16.12.

Figure 16.12 Results of Goal Seek Search

The difference between U(CE) and EU(Lottery) is not exactly zero. If you start at
$250,000, Goal Seek converges to a difference of -6.2E-05, or -0.000062, which is
closer to zero, resulting in an RT of $243,041.
If extra precision is needed, use Solver. With Solver's default settings, the difference is
2.39E–08 with RT equal to $243,261. If you change the precision from 0.000001 to
0.00000001 or an even smaller value in Solver's Options, the difference will be even
closer to zero.
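Outside Excel, the same equation can be solved with any one-dimensional root finder. The Python sketch below uses simple bisection; the assessment numbers are hypothetical illustrations, not the values used in Figures 16.9 through 16.12.

from math import exp

def rt_gap(rt, worse, ce, better, p_better=0.5):
    # U(CertEquiv) minus expected utility of the lottery, for exponential utility.
    eu = p_better * exp(-better / rt) + (1 - p_better) * exp(-worse / rt)
    return exp(-ce / rt) - eu

def solve_rt(worse, ce, better, lo=1_000.0, hi=1e9, iterations=200):
    # Bisection for a risk-averse assessment (Worse < CE < expected value).
    # lo must be large enough that exp(-worse/rt) does not overflow.
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (rt_gap(lo, worse, ce, better) > 0) == (rt_gap(mid, worse, ce, better) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical assessment (not the Figure 16.10 values): indifferent between
# $15,000 for certain and a 50-50 lottery on +$100,000 / -$50,000.
print(round(solve_rt(-50_000, 15_000, 100_000)))   # roughly $278,000 for these numbers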

16.5 EXACT RISK TOLERANCE USING RISKTOL.XLA


The Goal Seek and Solver methods for determining the risk tolerance parameter RT yield
static results. For a dynamic result, use the risktol.xla add-in function. A major advantage
of risktol.xla is that it facilitates sensitivity analysis. Whenever an input to the function
changes, the result is recalculated. The function syntax is
RISKTOL(WorsePayoff,CertEquiv,BetterPayoff,BetterProb).
When you open the risktol.xla file, the function is added to the Math & Trig function
category list.
The function returns a very precise value of the risk tolerance parameter for an
exponential utility function. The result is consistent with CertEquiv as the decision
maker’s certainty equivalent for a two-payoff assessment lottery with payoffs
WorsePayoff and BetterPayoff, with probability BetterProb of obtaining BetterPayoff and
probability 1 − BetterProb of obtaining WorsePayoff.
In case of an error, the RISKTOL function returns:
#N/A if there are too few or too many arguments. The first three arguments
(WorsePayoff, CertEquiv, and BetterPayoff) are required; the fourth argument
(BetterProb) is optional, with default value 0.5.
#VALUE! if WorsePayoff >= CertEquiv, or CertEquiv >= BetterPayoff, or
BetterProb (if specified) <= 0 or >= 1.
#NUM! if the search procedure fails to converge.
In Figure 16.13, the text in cells A2:A4 has been used as defined names for cells B2:B4, and
the text in cell A6 is the defined name for cell B6, as shown in the name box. After
opening the risktol.xla file, enter the function name and arguments, as shown in the
formula bar. If one of the three inputs changes, the result in cell B6 is recalculated.

Figure 16.13 Exact Risk Tolerance Using RiskTol.xla



16.6 EXPONENTIAL UTILITY AND TREEPLAN


TreePlan's default is to rollback the tree using expected values. If you choose to use
exponential utilities in TreePlan's Options dialog box, TreePlan will redraw the decision
tree diagram with formulas for computing the utility and certainty equivalent at each
node. For the Maximize option, the rollback formulas are U = A–B*EXP(–X/RT) and
CE = -LN((A-EU)/B)*RT, where X and EU are cell references. For the Minimize option,
the formulas are U = A-B*EXP(X/RT) and CE = LN((A-EU)/B)*RT.
TreePlan uses the name RT to represent the risk tolerance parameter of the exponential
utility function. The names A and B determine scaling. If the names A, B, and RT don't
exist on the worksheet when you choose to use exponential utility, they are initially
defined as A=1, B=1, and RT=999999999999. You can redefine the names using the
Insert | Name | Define or Insert | Name | Create commands.
To plot the utility curve, enter a list of X values in a column on the left, and enter the
formula =A−B*EXP(−X/RT) in a column on the right, where X is a reference to the
corresponding cell on the left. Select the values in both columns, and use the
ChartWizard to develop an XY (Scatter) chart.
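As an alternative to the ChartWizard, the curve can also be plotted with a few lines of Python using matplotlib (an illustration outside the text's Excel toolkit; the RT value shown is just an example).

import numpy as np
import matplotlib.pyplot as plt

A, B, RT = 1.0, 1.0, 100_000          # TreePlan's simple scaling; RT is an example value
x = np.linspace(-50_000, 150_000, 201)
u = A - B * np.exp(-x / RT)           # exponential utility curve

plt.plot(x, u)
plt.xlabel("Monetary value x or certainty equivalent")
plt.ylabel("Utility U(x) or expected utility")
plt.title("Exponential utility, RT = $100,000")
plt.show()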
If RT is specified using approximate risk tolerance values, you can perform sensitivity
analysis by (1) using the defined name RT for a cell, (2) constructing a data table with a
list of possible RT values and an appropriate output formula (usually a choice indicator at
a decision node or a certainty equivalent), and (3) specifying the RT cell as the input cell
in the Data Table dialog box.

16.7 EXPONENTIAL UTILITY AND RISKSIM


After using RiskSim to obtain model output results, select the column containing the
Sorted Data, copy to the clipboard, select a new sheet, and paste. Alternatively, you can
use the unsorted values, and you can also do the following calculations on the original
sheet containing the model results. This example uses only ten iterations; 500 or 1,000
iterations are more appropriate.
Use one of the methods described previously to specify values of RT, A, and B. Since the
model output values shown in Figures 16.14 and 16.15 range from approximately $14,000 to
$176,000, the utility function is defined for a range from worse payoff $0 to better payoff
$200,000. RT was determined using risktol.xla with a risk-seeking certainty equivalent of
$110,000.
To obtain the utility of each model output value in cells A2:A11, select cell B2, and enter
the formula =A−B*EXP(−A2/RT). With cell B2 still selected, click the fill handle in the lower right
corner of the cell and drag down to cell B11. Enter the formulas in cells A13:C13 and the
labels in row 14.

Figure 16.14 Risk Utility Formulas for RiskSim


A B C
1 Sorted Data Utility
2 14229.56 =A-B*EXP(-A2/RT)
3 32091.92 =A-B*EXP(-A3/RT)
4 51091.48 =A-B*EXP(-A4/RT)
5 66383.79 =A-B*EXP(-A5/RT)
6 69433.32 =A-B*EXP(-A6/RT)
7 87322.23 =A-B*EXP(-A7/RT)
8 95920.93 =A-B*EXP(-A8/RT)
9 135730.71 =A-B*EXP(-A9/RT)
10 154089.36 =A-B*EXP(-A10/RT)
11 175708.87 =A-B*EXP(-A11/RT)
12
13 =AVERAGE(A2:A11) =AVERAGE(B2:B11) =-LN((A-B13)/B)*RT
14 Exp. Value Exp.Util. CE

Figure 16.15 Risk Utility Results for RiskSim


A B C
1 Sorted Data Utility
2 $ 14,230 0.05862
3 $ 32,092 0.13462
4 $ 51,091 0.21851
5 $ 66,384 0.28841
6 $ 69,433 0.30260
7 $ 87,322 0.38767
8 $ 95,921 0.42966
9 $ 135,731 0.63382
10 $ 154,089 0.73363
11 $ 175,709 0.85600
12
13 $ 88,200 0.40435 $ 90,757
14 Exp. Value Exp.Util. CE

16.8 RISK SENSITIVITY FOR MACHINE PROBLEM


Figure 16.16

Columns B and C hold the Process 1 NPV and utility for each RiskSim iteration, and columns F and G hold the Process 2 NPV and utility, in rows 2 through 1001 (for example, the first iteration is $107,733 with utility 0.102133 for Process 1 and $86,161 with utility 0.082554 for Process 2). Cell I2 contains the risk tolerance, $1,000,000. (AJS, Clemen2, pp. 428-430)

            Process 1    Process 2
ExpUtility  0.085527     0.107258
CertEquiv   $89,407      $113,458
ExpValue    $90,526      $116,159

Goal Seek   CE2 - CE1    $24,050

NPV values from RiskSim Summary
Cell I2 has defined name RT
Formulas
C2  =1-EXP(-B2/RT)       Copy down to C1001
G2  =1-EXP(-F2/RT)       Copy down to G1001
J6  =AVERAGE(C2:C1001)
K6  =AVERAGE(G2:G1001)
J8  =-RT*LN(1-J6)
K8  =-RT*LN(1-K6)
J10 =AVERAGE(B2:B1001)
K10 =AVERAGE(F2:F1001)
J16 =K8-J8

Figure 16.17

RiskTolerance   CE Process 1   CE Process 2
$5,000          -$25,597       -$37,262
$10,000         $3,504         -$10,097
$15,000         $23,904        $10,897
$20,000         $37,468        $26,409
$25,000         $46,811        $38,010
$30,000         $53,528        $46,998
$35,000         $58,541        $54,184
$40,000         $62,404        $60,067
$45,000         $65,459        $64,972
$50,000         $67,930        $69,122
$55,000         $69,966        $72,675
$60,000         $71,672        $75,749
$65,000         $73,119        $78,431
$70,000         $74,363        $80,791
$75,000         $75,443        $82,882
$80,000         $76,389        $84,746
$85,000         $77,224        $86,417
$90,000         $77,966        $87,924
$95,000         $78,631        $89,288
$100,000        $79,229        $90,529

Formulas
N2 =J8
O2 =K8
Data Table: I2 Column Input Cell

The embedded chart plots Certainty Equivalent (vertical axis, -$40,000 to $100,000) against the Risk Tolerance Parameter for Exponential Utility (horizontal axis, $0 to $100,000) for Process 1 and Process 2.

16.9 RISK UTILITY SUMMARY


Concepts
Strategy, Payoff Distribution, Certainty Equivalent

Figure 16.18 Utility Function

The chart plots Utility or Expected Utility, U(x), from 0.0 to 1.0 on the vertical axis against Value or Certainty Equivalent, x, from -50,000 to 150,000 on the horizontal axis.

Fundamental Property of Utility Function


The utility of the CE of a lottery equals the expected utility of the lottery's payoffs.
U(CE) = EU = p1*U(x1) + p2*U(x2) + p3*U(x3)

Using a Utility Function To Find the CE of a Lottery


1. U(x): Locate each payoff on the horizontal axis and determine the corresponding
utility on the vertical axis.
2. EU: Compute the expected utility of the lottery by multiplying each utility by its
probability and summing the products.
3. CE: Locate the expected utility on the vertical axis and determine the
corresponding certainty equivalent on the horizontal axis.

Exponential Utility Function


General form: U(x) = A − B*EXP(−x/RT)
Parameters A and B affect scaling.
Parameter RT (RiskTolerance) depends on risk attitude and affects curvature.
Inverse: CE = −RT*LN[(A−EU)/B]

TreePlan's Simple Form of Exponential Utility


Set A and B equal to 1.
U(x) = 1 − EXP(−x/RT)
CE = −RT*LN(1−EU)
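For example, here is a quick numerical sketch of the simple form; the RiskTolerance value is illustrative only, not from a chapter example. With RT equal to 100,000 in a cell with the defined name RT:
=1-EXP(-50000/RT)    utility of a $50,000 payoff, approximately 0.3935
=-RT*LN(1-0.3935)    inverse applied to that utility, approximately $50,000
The round trip recovers the original payoff, which confirms that the CE formula is the inverse of the utility formula.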

Approximate Assessment of RiskTolerance


Refer to the Clemen textbook, Figure 13.12, on page 478.

Figure 16.19 Assessing Approximate Risk Tolerance


Risk tolerance parameter for an exponential utility function is approximately
equal to the maximum amount Y for which the decision maker will play.

Play: 0.5 chance of Heads, winning +$Y; 0.5 chance of Tails, losing $Y/2.
Don't: $0 for certain.
Candidate values of Y, arranged from more risk aversion (left) to less risk aversion (right):
Heads  +$10  +$100  +$1,000  +$10,000  +$100,000  +$200,000  +$300,000
Tails  -$5   -$50   -$500    -$5,000   -$50,000   -$100,000  -$150,000
Don't  $0    $0     $0       $0        $0         $0         $0

Exact Assessment of RiskTolerance


The RISKTOL.XLA Excel add-in file adds the following function to the Math & Trig
function category list:
RISKTOL(WorsePayoff,CertEquiv,BetterPayoff,BetterProb)
The first three arguments are required, and the last argument is optional with default
value 0.5. WorsePayoff and BetterPayoff are payoffs of an assessment lottery, and
CertEquiv is the decision maker's certainty equivalent for the lottery.
RISKTOL returns #N/A if there are too few or too many arguments; #VALUE! if
WorsePayoff >= CertEquiv, CertEquiv >= BetterPayoff, or BetterProb (if specified) is
<= 0 or >= 1; and #NUM! if the search procedure fails to converge.
For example, consider a 50-50 lottery with payoffs of $100,000 and $0. A decision maker
has decided that the certainty equivalent is $43,000. If you open the RISKTOL.XLA file
and type =RISKTOL(0,43000,100000) in a cell, the result is 176226. Thus, the value of
the RiskTolerance parameter in an exponential utility function for this decision maker
should be 176226.
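As a hedged check of this result on a worksheet (the cell addresses are illustrative), suppose cell J1 contains 176226. Using TreePlan's simple form with A = 1 and B = 1:
J2 =0.5*(1-EXP(-100000/J1))+0.5*(1-EXP(-0/J1))    expected utility of the 50-50 lottery, approximately 0.2165
J3 =-J1*LN(1-J2)    certainty equivalent, approximately $43,000
Recovering the assessed $43,000 certainty equivalent confirms the RiskTolerance value.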

Using Exponential Utility for TreePlan Rollback Values


1. Select a cell, and enter a value for the RiskTolerance parameter.
2. With the cell selected, choose Insert | Name | Define, and enter RT.
3. From TreePlan's Options dialog box, select Use Exponential Utility. The new
decision tree diagram includes the EXP and LN functions for determining U(x)
and the inverse.

Using Exponential Utility for a Payoff Distribution


Enter the exponential utility function directly, using the appropriate value for
RiskTolerance. If the payoff values are equally-likely, use the AVERAGE function to
determine the expected utility; otherwise, use SUMPRODUCT. Enter the inverse
function directly to obtain the certainty equivalent.
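The following minimal layout is one possible sketch (the cell addresses, payoffs, and probabilities are illustrative): payoff values in A2:A4, their probabilities in B2:B4, and the RiskTolerance value in a cell with the defined name RT.
C2 =1-EXP(-A2/RT)    utility of each payoff; copy down to C4
D2 =SUMPRODUCT(B2:B4,C2:C4)    expected utility
D3 =-RT*LN(1-D2)    certainty equivalent
For equally likely values, such as RiskSim iteration results, =AVERAGE(C2:C4) replaces the SUMPRODUCT formula, as in Figure 16.14.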
Part 4 Data Analysis

Part 4 reviews basic concepts of data analysis and uses multiple regression to model
relationships for both cross-sectional and time series data.
The spreadsheet analysis uses Excel's standard Analysis ToolPak. Several chapters
include step-by-step instructions for descriptive statistics, histograms, and multiple
regression.



Chapter 17 Introduction to Data Analysis
Why analyze data?
understand and explain past variation
predict future observations
measure relationships among variables
object of analysis: person, thing, business entity, etc.
characteristic of interest: weight, hair color, diameter, sales, etc.
measurement of the characteristic: pounds, blond/brunette/red/etc., inches, dollars, etc.

17.1 LEVELS OF MEASUREMENT


called measurement scales by some authors
important distinctions because analysis and summary methods are very different
two general levels of measurement, each with two specific levels

Categorical Measure
also called qualitative measure
assign a category level to each object of analysis
Nominal Measure: simple classification, "assign a name"
Ordinal Measure: ranked categories, "assign an ordered classification"

Numerical Measure
also called quantitative measure
assign a numerical value to each object of analysis
Interval Measure: rankings and numerical differences are meaningful

Ratio Measure: natural zero and numerical ratios are meaningful

17.2 DESCRIBING CATEGORICAL DATA


List each categorical level with frequencies (counts) or relative frequencies (percentages).
Use an Excel pivot table to obtain frequencies.
Use an Excel bar chart, column chart, or pie chart.
To display the relationship between two categorical measures, use a two-way
classification table.
For nominal data, the appropriate summary measure is the mode (most frequently
occurring level).
For ordinal data, the appropriate summary measures are the mode and median (the
middle-ranked category level with approximately 50% of the counts below and
approximately 50% above).
Do not assign meaningless numerical values to the categorical levels.
Do not use the mean and standard deviation.
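If you prefer worksheet formulas to a pivot table, COUNTIF can produce the frequencies. The ranges below are illustrative, not from a chapter example: category labels for each object in A2:A101 and the distinct levels listed in C2:C5.
D2 =COUNTIF($A$2:$A$101,C2)    frequency of the level in C2; copy down to D5
E2 =D2/SUM($D$2:$D$5)    relative frequency; copy down to E5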

17.3 DESCRIBING NUMERICAL DATA


Frequency Distribution and Histogram
Determine the range (maximum minus minimum), generally use between 5 and 15
equally-spaced intervals, and pick "nice" numbers for the upper limit of each interval
(Excel "bins").
Use Excel's Histogram analysis tool, or use Excel's FREQUENCY array-entered
worksheet function with an Excel Column chart (vertical bars).

Numerical Summary Measures


Appropriate summary measures for central tendency ("What's a typical value?") include
mean (average, most appropriate for mound-shaped data), median, and mode.
Appropriate summary measures for dispersion ("How typical is the typical value?")
include range, standard deviation (most appropriate for mound-shaped distributions), and
fractiles (first quartile, or 25th percentile, is a value with approximately 25% of the
values below it and approximately 75% of the values above).

Appropriate summary measures for shape are Excel's SKEW worksheet function and
Pearson's coefficient of skewness.

Distribution Shapes

Figure 17.1 Positively Skewed Distribution (Skewed to the Right)



In a distribution with positive skew, the mean is greater than the median.

Figure 17.2 Negatively Skewed Distribution (Skewed to the Left)



In a distribution with negative skew, the mean is less than the median.

Figure 17.3 Mound-Shaped Distribution (Symmetric)



In a symmetric distribution, the mean and median are equal.

Figure 17.4 Bimodal Distribution



In a bimodal distribution, there is often a distinguishing characteristic for the two groups
of data that have been combined into a single distribution.
Chapter 18 Univariate Numerical Data
Excel includes several analysis tools useful for summarizing single-variable data. The
Descriptive Statistics analysis tool provides measures of central tendency, variability, and
skewness. The Histogram analysis tool provides a frequency distribution table,
cumulative frequencies, and the histogram column chart.
These tools are appropriate for data without any time dimension. If the data were
collected over time, first examine a time sequence plot of the data to detect patterns. If
the time sequence plot appears random, then the univariate tools may be used to
summarize the data.
If the Data Analysis command doesn't appear on the Tools menu, choose the Add-Ins
command from the Tools menu; in the Add-Ins Available list box, check the box next to
Analysis ToolPak. If Analysis ToolPak doesn't appear in the Add-Ins Available list box, you
may need to add the Analysis ToolPak through a custom installation using the Microsoft
Excel Setup program.

18.1 ANALYSIS TOOL: DESCRIPTIVE STATISTICS


Example 18.1 The operating costs of the vehicles used by your company's salespeople
are too high. A major component of operating expense is fuel costs; to analyze fuel costs,
you collect mileage data from the company's cars for the previous month. Later you may
examine other characteristics of the cars, for example, make, model, driver, or routes.
The following steps describe how to use Excel's Descriptive Statistics analysis tool.
1. Open a new worksheet and enter the gas mileage data in column A as shown in
Figure 18.1. Be sure the values in your data set are entered in a single column on
the worksheet, with a label in the cell just above the first value. Excel uses this
label in the report on summary values.
2. From the Tools menu, choose the Data Analysis command. The Analysis Tools
dialog box is shown in Figure 18.1.

Figure 18.1 Analysis Tools Dialog Box

3. Double-click Descriptive Statistics. The dialog box for Descriptive Statistics appears as shown in Figure 18.2, with prompts for inputs and outputs.
4. Input Range: Enter the reference for the range of cells containing the data,
including the labels for the data sets. In Example 18.1 either type A1:A18 or
click on cell A1 and drag to cell A18 (in which case $A$1:$A$18 appears as the
input range). Press the Tab key to move to the next field of the dialog box. Do
not press Enter or click OK until all the boxes are filled.
5. Grouped By: Click Columns for this example (if the data were arranged in rows
on the worksheet, you would choose Rows).
6. Labels in First Row (or Labels in First Column, where the data are arranged
in rows): Select this checkbox because the input range in this example includes a
label.

Figure 18.2 Descriptive Statistics Dialog Box

7. Output Range: Click the option button, click the adjacent edit box, and specify
a reference for the upper-left cell of the range where the descriptive statistics
output should appear, either by typing C1 or by clicking on cell C1 (in which
case $C$1 appears as the output range as shown in this example). Alternatively,
you can choose to send the output to a new sheet in the current workbook or to a
new sheet in a new workbook.
8. Summary statistics: This feature is the primary reason for using the Descriptive
Statistics analysis tool, so it should be selected. The summary statistics require
two columns in the output range for each data set.
9. Confidence Level for Mean: Select this checkbox to see the half-width of a
confidence interval for the mean, and type a number in the % edit box for the
desired confidence level. This example requests the half-width for a 90%
confidence interval.
10. Kth Largest: Select this checkbox if you want to know the kth largest value in
the data set, and type a number for k in the Kth Largest edit box. This example
requests the fourth largest value.
11. Kth Smallest: Select this checkbox to get the kth smallest value in the data set
and type a number for k in the Kth Smallest edit box. This example requests the
fourth smallest value.

12. When finished, click OK. Excel computes the descriptive statistics and puts the
results in the output range.

Formatting the Output Table


The following steps describe how to change the column width and numerical display for
the descriptive statistics output table.
1. To adjust column C's width to fit the longest entry, double-click the column
heading border between C and D. To adjust column D's width, double-click the
column heading border between D and E. (Alternatively, select columns C and
D. From the Format menu, choose the Column command and choose AutoFit
Selection.)
Some of the values in the output table are displayed with nine decimal places. To make
the table easier to read, select cells, even noncontiguous ones, as a group and reformat
them with fewer decimal places.
2. First select the Mean and Standard Error values in cells D3 and D4. (Click on
D3, drag to cell D4, and release the mouse button.) Then hold down the Control
key, and click on cell D7, drag to cell D10, and release. Finally, hold down the
Control key, and click on cell D18. To decrease the number of decimal places
displayed, repeatedly click on the Decrease Decimal button until the selected
cells show three decimal places. (Alternatively, select the nonadjacent cells as
described and choose the Cells command from the Format menu. In the Format
Cells dialog box, select the Number tab. In the Category list box, select Number.
Type 3 in the Decimal Places edit box, or click the spinner controls until 3
appears, and click OK.)
3. To adjust column D's width to fit the longest entry, double-click the column
heading border between D and E.
The results are shown in columns A through D in Figure 18.3. The values in column D
are static. If the data values in column A are changed, these results are not automatically
updated. You must use the Descriptive Statistics command again to obtain updated
results.
Column F in Figure 18.3 shows the worksheet functions that would produce the same
results shown in column D. The worksheet functions are dynamic. If the data values in
column A are changed, the result of a worksheet function is automatically recalculated
(unless you have selected manual calculation using Tools | Options | Calculation |
Manual). A worksheet function is useful if you want dynamic recalculation or if you don't
want all of the summary statistics.
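Column F of Figure 18.3 is not reproduced here, but the following worksheet functions are one possible set that produces the same summary values for the mileage data in A2:A18 (the exact formulas in the figure may differ):
=AVERAGE(A2:A18)    Mean
=STDEV(A2:A18)/SQRT(COUNT(A2:A18))    Standard Error
=MEDIAN(A2:A18)    Median
=MODE(A2:A18)    Mode
=STDEV(A2:A18)    Standard Deviation
=VAR(A2:A18)    Sample Variance
=KURT(A2:A18)    Kurtosis
=SKEW(A2:A18)    Skewness
=MAX(A2:A18)-MIN(A2:A18)    Range
=MIN(A2:A18)    Minimum
=MAX(A2:A18)    Maximum
=SUM(A2:A18)    Sum
=COUNT(A2:A18)    Count
=LARGE(A2:A18,4)    Largest(4)
=SMALL(A2:A18,4)    Smallest(4)
=TINV(0.1,COUNT(A2:A18)-1)*STDEV(A2:A18)/SQRT(COUNT(A2:A18))    Confidence Level(90.0%)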
A worksheet usually displays the results of formulas, not the formulas themselves. If you
want to see all formulas, choose Tools | Options | View | Formulas. However, the formula

view uses different column widths and formatting for the entire worksheet. To display
only specific formulas, put a single quotation mark before the equal sign so that Excel
displays the cell contents as text, as shown in column F in Figure 18.3.

Figure 18.3 Descriptive Statistics Output

Interpreting Descriptive Statistics


The output table contains three measures of central tendency: mean, median, and mode.
The mean gas mileage is 23.471 mpg, computed by dividing the sum (399) by the count
(17).
The median is the middle-ranked value, here 21 mpg. Thus, approximately half of the
cars have gas mileage greater than 21 mpg, and approximately half get less than 21 mpg.
If the 17 values are sorted, and ranks 1 through 17 are assigned to the sorted values, then
the middle-ranked value is the ninth value, 21 mpg. There are 8 values below this ninth-
ranked value and 8 values above. (In a data set with an odd number of values, n, the
median is the value with rank (n + 1)/2. In a data set with an even number of values, the
median is a value halfway between the two middle values with ranks n/2 and n/2 + 1.)
The mode is the most frequently occurring value, reported here as 21 mpg. Actually, the
value 21 mpg appears twice and the value 19 mpg also appears twice, so there are two
modes. When two or more values have the same number of duplicate values (multiple
modes), Excel reports the value that appears first in your data set.
In some data sets, each value may be unique, in which case each value is a mode, and
Excel reports "#N/A." Where this occurs, first develop a frequency distribution and then
report a range of values with the highest frequency; this result is termed a modal interval.
The output table contains several measures of variation. The range (33 mpg) equals the
maximum (41 mpg) minus the minimum (8 mpg). In some data sets the range may be a

misleading measure of variation because it is based only on the two most extreme values,
which may not be representative.
The sample standard deviation (9.214 mpg) is the most widely used measure of variation
in data analysis. For each value in the data set the deviation between the value and the
mean is computed. Each deviation is squared, and the squared deviations are summed.
The sum of the squared deviations is divided by the count minus one (that is, n – 1),
obtaining the sample variance (84.890). The standard deviation equals the square root of
the variance.
The standard deviation has the same units or dimensions as the original values: mpg, in
this example. The variance is expressed in squared units: squared miles per gallon. The
standard deviation and variance reported in the output table are the sample standard
deviation and sample variance, computed using n – 1 in the denominator. To determine
the population standard deviation and population variance, computed using n in the
denominator, use the STDEVP and VARP worksheet functions.
The largest(4) and smallest(4) values in the output table are the fourth largest (33 mpg)
and fourth smallest (16 mpg) gas mileage values. To obtain similar results for all values
in the data set, use the Rank and Percentile analysis tool. These values correspond to
approximately the 75th percentile (third quartile) and 25th percentile (first quartile) in the
data set of 17 values. Interpolated values for the third and first quartiles are obtained
using the QUARTILE worksheet function, =QUARTILE(A2:A18,3) and
=QUARTILE(A2:A18,1), respectively.
The standard error of the mean (2.235 mpg) equals the sample standard deviation
divided by the square root of the sample size. The standard error is a measure of
uncertainty about the mean, and it is used for statistical inference (confidence intervals
and hypothesis tests).
The value shown for the confidence level (90.0%) (3.901 mpg) is the half-width of a 90%
confidence interval for the mean. The specified confidence level, 90% in this example,
corresponds to t = 1.746 for the t distribution with 10% in the sum of two tails and n – 1
= 17 – 1 = 16 degrees of freedom. The half-width of a confidence interval is t times the
standard error—that is, 1.7459 times 2.2346 mpg, or 3.901 mpg.
A 90% confidence interval for the mean extends from the mean minus the half-width to
the mean plus the half-width—that is, from 23.471 – 3.901 to 23.471 + 3.901, or
approximately 19.6 to 27.4 mpg. Therefore, if we think of these 17 cars as a random
sample from a larger population, we can say there is a 90% chance that the unknown
population mean is between 19.6 and 27.4 mpg.
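The same interval can be sketched with worksheet formulas; the references assume the layout described above, with the mean in D3, the standard error in D4, and the mileage data in A2:A18.
=TINV(0.10,COUNT(A2:A18)-1)    t value, approximately 1.746
=D3-TINV(0.10,COUNT(A2:A18)-1)*D4    lower limit, approximately 19.6
=D3+TINV(0.10,COUNT(A2:A18)-1)*D4    upper limit, approximately 27.4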
Kurtosis measures the degree of peakedness in symmetric distributions. If a symmetric
distribution is more peaked than the normal distribution, that is, if there are more values in the
tails than in a corresponding normal distribution, the kurtosis measure is positive. If the
distribution is flatter than the normal distribution, that is, if there are fewer values
in the tails, the kurtosis measure is negative. In this example, the distribution is
approximately symmetric with negative kurtosis (–0.547). (Excel computes the kurtosis
value using the fourth power of deviations from the mean. For details, search Help for
"KURT function.")
Skewness refers to the lack of symmetry in a distribution. If there are a few extreme
values in the positive direction, we say the distribution is positively skewed, or skewed to
the right. If there are a few extreme values in the negative direction, the distribution is
negatively skewed, or skewed to the left. Otherwise, the distribution is symmetric or
approximately symmetric. In this example, the measure is positive (+0.361). (Excel
computes the skewness value using the third power of deviations from the mean. For
details, search Help for "SKEW function.")

Another Measure of Skewness


Pearson's coefficient of skewness is a simple alternative to Excel's measure of skewness.
Pearson's coefficient is defined as 3 * (mean – median) / standard deviation. The mean is
affected by extreme values in a data set. Extreme values in the positive direction cause
the mean to be greater than the median, in which case Pearson's coefficient has a positive
value. Extreme values in the negative direction cause the mean to be less than the
median, in which case the coefficient is negative. The constant 3 and the standard
deviation in Pearson's coefficient affect the scaling and allow comparison of one
distribution with another.
Follow these steps to compute Pearson's coefficient of skewness on your worksheet.
1. Select a blank cell (F10) and enter the formula =3*(D3-D5)/D7. Click the
Decrease Decimal button to display three decimal places.
2. Enter the label Pearson's Coefficient of Skewness in cells F6 through F9.
3. If you want to document the formula using names, select cells C3:D7. From the
Insert menu, choose Name | Create; in the Create Names dialog box, check
Create Names in Left Column and click OK. Then select the cell containing the
formula (F10) and from the Insert menu choose Name | Apply. In the Apply
Names list box, select all names and click OK.
The result is shown in Figure 18.4.

Figure 18.4 Pearson's Coefficient of Skewness

The following guidelines apply to Pearson's Coefficient of Skewness and to Excel's SKEW worksheet function:
Pearson's Skew < –0.5 Excel's SKEW < –1 negatively skewed
–0.5 ≤ Pearson's Skew ≤ +0.5 –1 ≤ Excel's SKEW ≤ +1 approximately symmetric
Pearson's Skew > +0.5 Excel's SKEW > +1 positively skewed
For the small data set of Example 18.1, the value 0.804 for Pearson's Coefficient of
Skewness indicates that the data are positively skewed, and the value 0.361 for Excel's
SKEW worksheet function (shown in the Descriptive Statistics output) indicates that the
data are approximately symmetric with only slight positive skew. For larger data sets, the
two measures usually produce the same conclusion.
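If you prefer a single range-based formula to the cell references used above, the same coefficient can be computed directly from the mileage data in A2:A18:
=3*(AVERAGE(A2:A18)-MEDIAN(A2:A18))/STDEV(A2:A18)    returns approximately 0.804, matching =3*(D3-D5)/D7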

18.2 ANALYSIS TOOL: HISTOGRAM


The Histogram analysis tool determines a frequency distribution table for your data and
prepares a histogram chart. In addition to individual frequencies there is an option to
include cumulative frequencies in the results.
You should determine the intervals of the distribution before using this tool. Otherwise,
Excel will use a number of intervals approximately equal to the square root of the number
of values in your data set, with equal-width intervals starting and ending at the minimum
and maximum values of your data set. If you specify the intervals yourself, you can use
numbers that are multiples of two, five, or ten, which are much easier to analyze.
To determine intervals, first use the Descriptive Statistics analysis tool to determine the
minimum and maximum values of the data set. Alternatively, enter the MIN and MAX
functions on your worksheet. Use these extreme values to help determine the limits for
your histogram's intervals. Usually 5 to 15 intervals are used for a histogram.
For the gas mileage data, the minimum is 8 and the maximum is 41. A compact
histogram could start the first interval at 5, use an interval width of 5, and finish the last
interval at 45, requiring 8 intervals. The approach used here adds an empty interval at
each end; at the low end is an interval "5 or less," and at the high end is an interval "more
than 45."
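For example, with the mileage values in A2:A18, the worksheet-function alternative mentioned above is simply:
=MIN(A2:A18)    returns 8
=MAX(A2:A18)    returns 41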
Excel refers to the maximum value for each interval as a bin. Here, the first bin is 5, and
the interval will contain all values that are 5 or less. The Histogram tool automatically
adds an interval labeled "More" to the bins you specify. Here, the last bin specified is 45,
and the last interval (More) will contain all values greater than 45.
Refer to Figure 18.5 and follow these steps to obtain the frequency distribution and
histogram.
1. Hide columns B through F. (Select columns B through F by clicking on B and
dragging to F. Right-click and select Hide from the shortcut menu. To unhide
the columns, select the two adjacent columns, A and G, right-click, and select
Unhide. If column A is hidden, click the Select All button in the top-left corner
at the intersection of the row and column headings, right-click a column
heading, and select Unhide.)
2. Enter Bin as a label in cell H1, enter 5 in cell H2, and enter 10 in cell H3. Select
H2:H3. Drag the AutoFill square in the lower-right corner of the selected range
down to cell H10.
3. From the Tools menu, choose the Data Analysis command and choose
Histogram from the Analysis Tools list box.

Figure 18.5 Bins and Histogram Dialog Box

4. Input Range: Enter the reference for the range of cells containing the data
(A1:A18), including the label.
5. Bin Range: Enter the reference for the range of cells containing the values that
separate the intervals (H1:H10), including the label. These interval break points,
or bins, must be in ascending order.
6. Labels: Check this box to indicate that labels have been included in the
references for the input range and bin range.
7. Output Range: Enter the reference for the upper-left cell of the range where
you want the output table to appear (I1). The combined table and chart output
requires approximately ten columns.
8. Pareto: To obtain a standard frequency distribution and chart, clear the Pareto
checkbox. If this box is checked, the intervals are sorted according to
frequencies before preparing the chart. (In this example the box has been
cleared.)
9. Cumulative Percentage: Check this box for cumulative frequencies in addition
to the individual frequencies for each interval. (In this example the box has been
cleared.)
10. Chart Output: Check this box to obtain a histogram chart in addition to the
frequency distribution table on the worksheet. (In this example the box has been
checked.)

11. After you provide inputs to the dialog box, click OK. (If you receive the error
message "Cannot add chart to a shared workbook," click the OK button. Then
click New Workbook under Output in the Histogram dialog box. Use the Edit |
Move or Copy Sheet command to copy the results to the original workbook.)
Excel puts the frequency distribution and histogram on the worksheet. As shown in
Figure 18.6, the output table in columns I and J includes the original bins specified. These
bins are the upper limits of the intervals; that is, the bin values are interval boundaries.
For example, the interval associated with bin value 15 (cell I4) includes mileage values
strictly greater than 10 (the previous bin value) and less than or equal to 15. There are
two such mileage values in this data set: 12 mpg and 15 mpg. Thus, for bin value 15 the
frequency is 2 (cell J4).
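If you want frequencies that recalculate when the data change, Excel's FREQUENCY function is a dynamic alternative to the Histogram tool's static output. This sketch assumes the mileage data in A2:A18 and the bins in H2:H10 as set up above; column K is an arbitrary choice for the results.
1. Select K2:K11, one cell longer than the bin range (the extra cell holds the More count).
2. Type =FREQUENCY(A2:A18,H2:H10) and press Ctrl+Shift+Enter to array-enter the formula.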

Figure 18.6 Histogram Output Table and Chart

Histogram Embellishments
To make the chart more like a traditional histogram and easier to interpret, make the
following changes.
1. Legend: Because only one series is shown on the chart, a legend isn't needed.
Click on the legend ("Frequency" on the right side of the chart) and press the
Delete key.
2. Plot area pattern: The plot area is the rectangular area bounded by the x and y
axes. Double-click the plot area (above the bars); in the Format Plot Area dialog
box, change Border to None and change Area to None. Click OK.
3. Y-axis labels: If you resize the chart vertically, intermediate values (0.5, 1.5,...)
may appear on the y axis, but frequencies must be integer values. Double-click
the y-axis (value axis); in the Format Axis dialog box on the Scale tab, set the
Major Unit and Minor Unit values to 1. Click OK.

4. Bar width: In traditional histograms, the bars are adjacent to each other, not
separated. Double-click one of the bars; in the Format Data Series dialog box on
the Options tab, change the gap width from 150% to 0%. Click OK.
5. X-axis labels: Double-click the x-axis (category axis); in the Format Axis dialog
box on the Alignment tab, double-click the Degrees edit box and type 0 (zero).
With this setting, the x-axis labels will be horizontal even if the chart is resized.
Click OK.
6. Chart title: Click on Histogram (chart title). Type Distribution of Gas
Mileage, hold down Alt and press Enter, type for 17 cars, and press Enter.
Click the Bold button to change from bold to normal type.
7. Y-axis title: Click on Frequency (value axis title). Click the Bold button to
change from bold to normal type.
8. X-axis title: Click on Bin. Enter Interval Maximum, in miles per gallon. Click
the Bold button to change from bold to normal type. Excel puts the x-axis values
at the center of each interval, not at the marks that separate the intervals. This
title makes it clear to the reader that these values are the maximum ones for each
interval.
9. Bar color: Columns in a dark color may print as black with no gaps, in which
case it is difficult to see the boundaries. Click on the center of one of the
columns to select the data series. Click the right mouse button, choose Format
Data Series, and click the Patterns tab. In the dialog box, leave Border at
Automatic and change Area from Automatic to None. Click OK.
To move the chart, click just inside the chart's outer border (chart area) and drag the chart
to the desired location. To resize the chart, first click the chart area and then click and
drag one of the eight handles.
When you first create a chart, Excel uses automatic scaling for the font sizes of the chart
title, the axis titles, and the axis labels. When you resize the chart, the font sizes change
and the number of axis labels displayed may change. For example, if the axis labels on
the horizontal axis have a large font size and you resize the chart to be narrow, perhaps
only every other axis label will be displayed.
One approach to chart and font sizing is to first decide the size of the chart. For this
example the chart is 6 columns wide using the standard column width of 8.43 and 14
rows high. The font size of the three titles is Arial 10, and the font size of the two axes is
Arial 8 so that all axis labels are displayed. The resulting histogram chart is shown in
Figure 18.7.

Figure 18.7 Histogram Chart with Embellishments

18.3 BETTER HISTOGRAMS USING EXCEL


Figure 18.8 Better Histogram Chart

The chart, titled Histogram, shows Frequency on the vertical axis and Miles Per Gallon on the horizontal axis, with numerical labels 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50 aligned under the tick marks between the bars.

A histogram is usually shown in Excel as a Column chart type (vertical bars). The labels
of a Column chart are aligned under each bar as shown in Figure 18.7, and there is no

Excel feature for changing the alignment. A better histogram has a horizontal axis with
numerical labels aligned under the tick marks between the bars as shown in Figure 18.8.
To download a free Excel add-in for automatically creating a better histogram from data
on a worksheet or to view step-by-step instructions for creating a better histogram using
Excel's built-in features, go to the Better Histograms page at www.treeplan.com.

EXERCISES
Exercise 18.1 Construct a frequency distribution and histogram for the following selling
prices of 15 properties:
$26,000 $38,000 $43,600
31,000 39,600 44,800
37,400 31,200 40,600
34,800 37,200 41,800
39,200 38,400 45,200
Use intervals $5,000 wide starting at $25,000. Comment on the symmetry or skewness of
the selling prices.
Exercise 18.2 Determine measures of central tendency and dispersion for the selling
prices of the 15 properties in Exercise 18.1. Which measure(s) of central tendency should
be used to describe a typical selling price? What is the mode or modal interval?
Exercise 18.3 To verify the symmetry or skewness observed in Exercise 18.1, calculate
Pearson's coefficient of skewness.
Chapter 19 Bivariate Numerical Data
A scatterplot is useful for examining the relationship between two numerical variables. In
Excel this kind of chart is called an XY (scatter) chart; other names include scatter
diagram, scattergram, and XY plot. Such a graphical display is often the first step before
fitting a curve to the data using a regression model.
Example 19.1 (Adapted from Cryer, p. 139) The data shown in Figure 19.1 were
collected in a study of real estate property valuation. The 15 properties were sold in a
particular calendar year in a particular neighborhood in a city stratified into a number of
neighborhoods. Although the data displayed are from a single year, similar data are
available for each neighborhood for a number of years. Cryer's RealProp.dat file contains
4 variables for 60 observations; these 15 properties are the first and every fourth
observation.
Because we expect that selling price might depend on square feet of living space, selling
price becomes the dependent variable and square feet the explanatory variable. Some call
the dependent variable the response variable or the y variable. Similarly, other terms for
the explanatory variable are predictor variable, independent variable, or the x variable.
Our initial purpose is to visually examine the relationship between the square feet of
living space and the selling price of the parcels. Then we will calculate two summary
measures, correlation and covariance, using both the analysis tool and functions. Finally,
we will include a third variable, assessed value of the property, and use the analysis tool
to compute pairwise correlations. In subsequent chapters we will fit straight lines and
curves to these same data using regression models.

Figure 19.1 Initial XY (Scatter) Chart

19.1 XY (SCATTER) CHARTS


The following steps describe how to create and embellish a scatterplot using Excel's
Chart Wizard.
1. Arrange the data in columns on a worksheet with the x values (for the horizontal
axis) on the left and the y values (for the vertical axis) on the right as shown in
Figure 19.1. If the x variable is not on the left, insert a column on the left, select
the x data, and click and drag to move the x data to the column on the left.
2. Select the x and y values (A2:B16). Do not include the labels above the data.
3. Click on the Chart Wizard tool.
4. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, select XY
(Scatter) in the Chart Type list box and verify that the chart sub-type is "Scatter.
Compares pairs of values." Click on the wide button Press and Hold to View
Sample to preview the chart. Click Next.
5. In step 2 (Chart Source Data) on the Data Range tab, verify that cells A2:B16
were selected and that Excel is treating the data series as columns. (If you don't
select the data range before starting the Chart Wizard, you can enter the data
range in this step.) On the Series tab, verify that Excel is using cells A2:A16 for
x values and cells B2:B16 as y values. (If the data ranges for the x and y values
aren't correct, you can specify their locations here.) Click Next.

6. In step 3 (Chart Options) on the Titles tab, select the Chart Title edit box and
type Real Estate Properties. Don't press Enter; use the mouse or Tab key to
move among the edit boxes. Type Living Space, in Sq. Ft. for the value (x) axis
title (the horizontal axis), and Selling Price, in Thousands of Dollars for the
value (y) axis title (the vertical axis).
7. In step 3 (Chart Options) on the Gridlines tab, clear all checkboxes.
8. In step 3 (Chart Options) on the Legends tab, clear the checkbox for Show
Legend. (With only one set of data on the chart, a legend is not needed.) Click
Next.
9. In step 4 (Chart Location), verify that you want to place the chart as an object in
the current worksheet. Click Finish.
The chart is embedded on the worksheet, as shown in Figure 19.1. The property data
show a general positive relationship; more living space is associated with a higher selling
price, on the average. Follow steps 10 through 12 to obtain the embellished scatterplot
shown in Figure 19.2.

Figure 19.2 Final XY (Scatter) Chart

10. Change the x-axis to display 400 to 1400 square feet. Select the value (x) axis.
Right-click, choose Format Axis from the shortcut menu, and click the Scale
tab. Type 400 in the Minimum edit box, 1400 in the Maximum edit box, and 200
in the Major Unit edit box. Click OK.
11. Change the y-axis to display 20 to 50 thousands of dollars. Select the value (y)
axis. Right-click, choose Format Axis from the shortcut menu, and click the

Scale tab; type 20, 50, and 10 in the Minimum, Maximum, and Major Unit edit
boxes. Click the Number tab and set Decimal Places to zero. Then click OK.
12. To obtain the appearance shown in Figure 19.2, click just inside the outer border
of the chart to select the chart area. Click and drag the sizing handles so the
chart is approximately 6 standard column widths by 15 rows. Click the chart title
and choose Arial Bold 12 from the formatting toolbar. For each horizontal and
vertical axis and title, click the chart object and choose Arial Regular 10 from
the formatting toolbar. Double-click the y-axis title and change the space after
the comma to a carriage return. Double-click the grey plot area and change the
pattern for both border and area to None. Select the Price data (B2:B16) and
click the Increase Decimal button several times so that three digits are displayed
to the right of the decimal point.

19.2 ANALYSIS TOOL: CORRELATION


The correlation coefficient is a useful summary measure for bivariate data, in the same
sense that the mean and standard deviation are useful summary measures for univariate
data. The possible values for the correlation coefficient range from –1 (exact negative
correlation, with all points falling on a downward-sloping straight line) through 0 (no
linear relationship) to +1 (exact positive correlation, with all points falling on an upward-
sloping straight line). The correlation coefficient measures only the amount of straight-
line relationship; a strong curvilinear relationship (a U-shaped pattern, for example)
might have a correlation coefficient close to zero. The long name for the correlation
coefficient is "Pearson product moment correlation coefficient," which is often shortened
to simply "correlation."
The following steps describe how to obtain the correlation coefficient using the analysis
tool.
1. Enter the x and y data in a worksheet as shown in columns A and B of Figure
19.3 and enter Analysis Tool: Correlation in cell D1.
2. From the Tools menu, choose Data Analysis. From the Data Analysis dialog
box, select Correlation in the Analysis Tools list box and click OK.
3. In the Input section of the Correlation dialog box, specify the location of the
data in the Input Range edit box, including the labels (A1:B16). Verify that the
data is grouped in columns and be sure the Labels in First Row box is checked.
4. In the Output options section, click the Output Range button, select the Range
edit box, and specify the upper-left cell where the correlation output will be
located (D2).

5. Click OK. The output appears in cells D2:F4 as shown in Figure 19.3. (The
discussions of CORREL function and covariance outputs follow.)
The output is a matrix of pairwise correlations. The diagonal values are 1, indicating that
each variable has perfect positive correlation with itself. The value 0.814651 is the
correlation of Price and SqFt. The upper-right section is blank, because its values would
be the same as those in the lower-left section.
The following steps describe how to use Excel's CORREL function to determine the
correlation.
1. Enter CORREL Function in cell D6.
2. Select cell D7. Click the Insert Function tool button (icon fx). In the Insert
Function dialog box, select Statistical in the category list box. In the function list
box, select CORREL. Then click OK.
3. To move the CORREL dialog box, click in any open area and drag. Select the
Array1 edit box, and click and drag on the worksheet to select A2:A16. Select
the Array2 edit box, and click and drag to select B2:B16. Do not include the text
labels in row 1 in either selection. Then click OK.
The value of the correlation coefficient appears in cell D7. Alternatively, you could have
entered the formula =CORREL(A2:A16,B2:B16) by typing or by a combination of
typing and pointing. Unlike the static text output of the analysis tool, the worksheet
function is dynamic. If the data values in A2:B16 are changed, the value of the
correlation coefficient in cell D7 will change.

Figure 19.3 Bivariate Correlation and Covariance



19.3 ANALYSIS TOOL: COVARIANCE


The covariance is another measure for summarizing the extent of the linear relationship
between two numerical variables. Unfortunately, the covariance is difficult to interpret
because its measurement units are the product of the units for the two variables. For the
selling price and living space data in Example 19.1, the covariance is expressed in units
of square feet times thousands of dollars. It is usually preferable to use the correlation
coefficient because it is scale-free. However, the covariance is used in finance theory to
describe the relationship of one stock price with another.
The covariance computed by the analysis tool is a population covariance; that is, Excel
2002 uses n in the denominator (instead of using n – 1, which would be appropriate for
sample covariance), where n is the number of data points.
The following steps describe how to obtain the covariance using the analysis tool.
1. Enter the x and y data in a worksheet as shown in columns A and B of Figure
19.3 and enter Analysis Tool: Covariance in cell D10.
2. From the Tools menu, choose Data Analysis. From the Data Analysis dialog
box, select Covariance in the Analysis Tools list box and click OK.
3. In the Input section of the Covariance dialog box, specify the location of the
data in the Input Range edit box, including the labels (A1:B16). Verify that the
data is grouped in columns and be sure the Labels box is checked.
4. In the Output Options section, click the Output Range button, select the Range
edit box, and specify the upper-left cell where the correlation output will be
located (D11).
5. Click OK. The output appears in cells D11:F13 as shown in Figure 19.3.
The output is a matrix of pairwise population covariances. The diagonal values are
population variances (the square of the population standard deviation) for each variable.
The value 914.1886 is the population covariance of Price and SqFt. The upper-right
section is blank, because its values would be the same as those in the lower-left section.
The following steps describe how to use Excel's COVAR function to determine the
population covariance.
1. Optional: Enter COVAR Function in cell D15.
2. Select cell D16. Click the Insert Function tool button (icon fx). In the Insert
Function dialog box, select Statistical in the category list box. In the function list
box, select COVAR. Then click OK.
3. To move the COVAR dialog box, click in any open area and drag. Select the
Array1 edit box, and click and drag on the worksheet to select A2:A16. Select

the Array2 edit box, and click and drag to select B2:B16. Do not include the text
labels in row 1 in either selection. Then click OK.
The population covariance value appears in cell D16. Alternatively, you could have
entered the formula =COVAR(A2:A16,B2:B16) by typing or by a combination of typing
and pointing. If the data values in A2:B16 are changed, the population covariance value
in cell D16 will change. The covariance computed by Excel's COVAR function uses n in
the denominator. In this example, n = 15, so 853.2427 = (14/15)*914.1886.

19.4 CORRELATIONS FOR SEVERAL VARIABLES


The Correlation analysis tool is most useful for determining pairwise correlations for
three or more variables, often as an aid to selecting variables for a multiple regression
model. The following steps describe how to obtain correlations for several variables.
1. Enter the data in cells A1:C16 as shown in Figure 19.4. If the data for SqFt and
Price are already in columns A and B, select A1:B16, copy to the clipboard
(using the shortcut menu), select a new sheet, and paste into cell A1; then select
column B, choose Insert from the shortcut menu, and enter the Assessed data.
2. Optional: Enter Analysis Tool: Correlation in cell E1.

Figure 19.4 Pairwise Correlations

3. From the Tools menu, choose Data Analysis. From the Data Analysis dialog
box, select Correlation in the Analysis Tools list box and press OK. The
Correlation dialog box appears as shown in Figure 19.5.

Figure 19.5 Correlation Dialog Box

4. In the Input section, specify the location of the data in the Input Range edit box,
including the labels (A1:C16). Verify that the data is grouped in columns and be
sure the Labels box is checked.
5. In the Output Options section, click the Output Range button, click the adjacent
edit box, and specify the upper-left cell where the correlation output will be
located (E3).
6. Click OK. The output appears in cells E3:H6 as shown in Figure 19.4.
The output shows three pairwise correlations. The highest correlation, 0.814651, is
between SqFt and Price. The correlation between Assessed and Price, 0.67537, is smaller,
indicating less of a linear relationship between these two variables. The lowest
correlation, 0.424219, is between SqFt and Assessed.
If we must use a single explanatory variable to predict selling price in a linear regression
model, these correlations suggest that SqFt is a better candidate than Assessed, because
0.814651 is higher than 0.67537. If we can use two explanatory variables to predict
selling price in a multiple regression model, both SqFt and Assessed should be useful,
and there shouldn't be a problem with multicollinearity because the correlation between
these two explanatory variables is only 0.424219.

EXERCISES
Exercise 19.1 (Adapted from Keller, p. 642) An economist wanted to determine how
office vacancy rates depend on average rent. She took a random sample of the monthly
office rents per square foot and the percentage of vacant office space in ten different
cities. The results are shown in the following table.

City    Vacancy Percentage    Monthly Rent per Sq. Ft.
1 10 $5.00
2 2 2.50
3 7 4.75
4 8 4.50
5 4 3.00
6 11 4.50
7 8 4.00
8 6 3.00
9 3 3.25
10 5 2.75
Arrange the data in appropriate columns and prepare a scatterplot. Does there appear to
be a positive or negative relationship between the two variables?
Exercise 19.2 Compute the correlation coefficient for the data in Exercise 19.1.
Comment on the direction and strength of the linear relationship.
Exercise 19.3 (Adapted from Canavos, p. 104) Does a student's test grade seem to
depend on the number of hours spent studying? The following table shows the number of
hours 20 students reported studying for a major test and their test grades.
Study Test Study Test
Student Hours Grade Student Hours Grade
1 5 54 11 12 74
2 10 56 12 20 78
3 4 63 13 16 83
4 8 64 14 14 86
5 12 62 15 22 83
6 9 61 16 18 81
7 10 63 17 30 88
8 12 73 18 21 87
9 15 78 19 28 89
10 12 72 20 24 93

Arrange the data in appropriate columns and prepare a scatterplot. Does there appear to
be a positive or negative relationship between the two variables?
Exercise 19.4 Compute the correlation coefficient for the data in Exercise 19.3.
Comment on the direction and strength of the linear relationship.
Chapter 20 One-Sample Inference for the Mean
This chapter covers the basic methods of statistical inference for the mean of a single
population. These methods are appropriate for a single random sample consisting of
values for a single variable. For example, a random sample of a particular brand of tires
would be used to construct a confidence interval for the average mileage of all tires of
that brand or to test the hypothesis that the average mileage of all tires is at least 40,000
miles.

20.1 NORMAL VERSUS t DISTRIBUTION


If the values in the population have a normal distribution, and if the standard deviation of
the population values is known, then the sample means have a normal distribution.
However, due to the central limit theorem, the normal distribution is often used to
describe uncertainty about sample means when the sample size is large, even though the
population distribution may not be normal or the population standard deviation may be
unknown. A common guideline is that "large" means 30 or more.
If the values in the population have a normal distribution, and if the standard deviation of
the population values is unknown and must be estimated using the sample, then the
standardized sample means have a t distribution. The t distribution is often used for
analyzing small samples, even when the shape of the population distribution is unknown.
You can use a histogram or other methods to check that your sample data are
approximately normal. As long as the population isn't extremely skewed or otherwise
nonnormal, the t distribution is generally regarded as an adequate approximation for the
sampling distribution of means.
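As a small illustrative check with worksheet functions (not from a chapter example), the two distributions are already close at that sample size:
=NORMSINV(0.95)    normal z value for 5% in one tail, approximately 1.645
=TINV(0.10,29)    t value with 29 degrees of freedom, approximately 1.699 (TINV takes the two-tail probability)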

20.2 HYPOTHESIS TESTS


A hypothesis test is an alternative to the confidence interval method of statistical
inference. To conduct a hypothesis test, first set up two opposing hypothetical statements
describing the population. These two statements are called the null hypothesis, H0, and

the alternative hypothesis, HA. Usually, the alternative hypothesis is a statement about
what we are trying to show or prove. For example, to detect if the mean of monthly
accounts is significantly less than $70, the alternative hypothesis is HA: Mean < 70.
The null hypothesis is the opposite of the alternative hypothesis, that is, H0: Mean ≥ 70 or
simply H0: Mean = 70. Using the hypothesis test method, develop the distribution of
sample results that would be expected if the null hypothesis is true. Then compare the
particular sample result with this sampling distribution. If the sample result is one that is
likely to be obtained when the null hypothesis is true, we cannot reject the null
hypothesis, and we cannot conclude that the alternative hypothesis is true. On the other
hand, if the sample result is one that is unlikely to occur when the null hypothesis is true,
reject the null hypothesis and conclude the alternative hypothesis may be true.

Left-Tail, Right-Tail, or Two-Tail


There are three kinds of hypothesis tests, depending on the direction specified in the
alternative hypothesis. If the alternative hypothesis is HA: Mean < 70, we must observe a
sample mean significantly below 70 to reject the null hypothesis and conclude that the
population mean is really less than 70. This kind of test is a left-tail test because sample
means that cause rejection of the null hypothesis are in the left tail of the sampling
distribution.
If we are trying to show that the average breaking strength of steel rods is greater than
500 pounds (HA: Mean > 500), then a right-tail test is appropriate. In this case, we must
observe a sample mean significantly greater than 500 to reject the null hypothesis.
If we are trying to detect a change in either direction instead of a single direction, then a
two-tail test is appropriate. For example, an insurance company may want to determine
whether the actual mean commission payment to its agents differs from the previously
planned $32,000 per year. In this situation, the null hypothesis specifies "no change" or
"no difference," for example, H0: Mean = 32,000, and the alternative hypothesis is HA:
Mean ≠ 32,000. We can reject the null hypothesis if we observe a sample mean either
significantly above 32,000 or significantly below 32,000.

Decision Approach or Reporting Approach


There are two ways to summarize the results of a hypothesis test. Using the decision
approach, the decision maker must specify a significance level or alpha. Typical
significance levels are 10%, 5%, or 1%. This value is the probability in the left tail, right
tail, or sum of two tails of the sampling distribution; it determines the region of sample
means in which we reject the null hypothesis. In effect, the significance level specifies
what the decision maker regards as "close" or "far away" with regard to the null
hypothesis. A smaller significance level (for example, 1% instead of 5%) requires that the
sample mean must be farther away from the hypothesized population mean to reject the

null hypothesis. The end result of using this approach is a decision to either reject or not
reject the null hypothesis.
The other way to summarize the results of a hypothesis test is to report a p-value
(probability value, or prob-value). Using this reporting approach, we do not specify a
significance level or make a decision about rejecting the null hypothesis. Instead, we
simply report how likely it is that the observed sample result, or a sample result more
extreme, could be obtained if the null hypothesis is true. In a left-tail or right-tail test, we
report the probability in a single tail; in a two-tail test, we report the probability of
obtaining a difference (between the observed sample mean and the hypothesized
population mean) in either direction. A small p-value is associated with a more extreme
sample result, that is, a sample mean that is significantly different from the hypothesized
population mean.
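As a hedged sketch of computing a p-value for the accounts example (HA: Mean < 70) with worksheet functions, assume the sample values are in A2:A31; the range and cell addresses are illustrative.
B1 =(AVERAGE(A2:A31)-70)/(STDEV(A2:A31)/SQRT(COUNT(A2:A31)))    t statistic
B2 =TDIST(ABS(B1),COUNT(A2:A31)-1,1)    one-tail probability, which is the left-tail p-value when the sample mean is below 70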



Chapter 21 Simple Linear Regression
Simple linear regression can be used to determine a straight-line equation describing the
average relationship between two variables. Three methods are described in this chapter:
the Add Trendline command, the Regression analysis tool, and Excel functions. Before
fitting a line, it is important to examine a scatterplot as described in Chapter 19. If
the points on the scatterplot fall approximately on a straight line, the methods described
in this chapter are appropriate. If the points fall on a curve or have another pattern,
consider the nonlinear methods described in Chapter 22.
The data analyzed in this chapter are selling price and living space for 15 real estate
properties as shown in Figure 19.2. Because we expect that selling price might depend on
square feet of living space, selling price becomes the dependent variable and square feet
the explanatory variable. Some call the dependent variable the response variable or the y
variable. Similarly, other terms for the explanatory variable are predictor variable,
independent variable, or the x variable.
The first step is to examine the relationship between selling price, in thousands of dollars,
and living space, in square feet, by constructing a scatterplot. The general approach is to
arrange the data so that the x variable for the horizontal axis is in a column on the left and
the y variable for the vertical axis is in a column on the right. Then select the data
excluding the labels, click the Chart Wizard tool, and follow the steps for an XY (scatter)
chart. Details of these steps with subsequent rescaling and formatting are described in
Section 19.1. The results are shown in Figure 21.1, where the chart title is Arial 10 bold
and the axes and axis titles are Arial 8.

Figure 21.1 Scatterplot before Inserting Trendline

21.1 INSERTING A LINEAR TRENDLINE


The points in Figure 21.1 follow an approximate straight line, so a linear trendline is
appropriate. The method of ordinary least squares determines the intercept and slope for
the linear trendline such that the sum of the squared vertical distances between the actual
y values and the line is as small as possible. Such a line is often called the line of average
relationship. The following steps describe inserting a linear trendline on the scatterplot
and formatting the results.
1. Select the data series by clicking on one of the data points. The points are
highlighted, the name box shows "Series 1," and the formula bar shows that the
SERIES is selected.
2. From the Chart menu, choose the Add Trendline command. Alternatively, right-
click the data series and choose Add Trendline from the shortcut menu.
3. Click the Type tab of the Add Trendline dialog box, as shown in Figure 21.2.
4. On the Add Trendline Type tab, click the Linear icon. (The nonlinear
trend/regression types are described in Chapter 22.)

Figure 21.2 Add Trendline Dialog Box Type Tab

5. Click the Options tab of the Add Trendline dialog box, as shown in Figure 21.3.
6. On the Add Trendline Options tab, select the Automatic: Linear (Series1) button
for Trendline Name. Be sure the checkbox for Set Intercept is clear. Click to put
checks in the Display Equation on Chart and Display R-squared Value on Chart
checkboxes, as shown in Figure 21.3. Then click OK. The trendline, equation,
and R2 are inserted on the scatterplot as shown in Figure 21.4.

Figure 21.3 Add Trendline Dialog Box Options Tab



Figure 21.4 Initial Trendline on Scatterplot

Trendline Interpretation
We can answer the question "What is the average relationship?" by examining the fitted
equation y = 0.021x + 18.789, which may be written as
Predicted Price = 18.789 + 0.021 * SqFt.
The y-intercept or constant term in the equation is 18.789, measured in the same units as
the y variable. Naively, the constant term says that a property with zero square feet of
living space has a selling price of 18.789 thousands of dollars. However, there are no
properties with fewer than 521 square feet in our data, so this constant can be considered
a starting point that is relevant for properties with living space between 521 and 1,298
square feet.
The slope or regression coefficient, 0.021, indicates the average change in the y variable
for a unit change in the x variable. The measurement units in this example are 0.021
thousands of dollars per square foot, or $21 per square foot. If two properties differ by
100 square feet of living space, we expect the selling prices to differ by 0.021 * 100 = 2.1
thousands of dollars, or $2,100.
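For example, using the displayed equation, the predicted average selling price for a property with 1,000 square feet of living space is
Predicted Price = 18.789 + 0.021 * 1,000 = 39.8 thousands of dollars,
or approximately $39,800, the same value obtained with the worksheet functions in Section 21.3.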
One popular way to answer the question "How good is the relationship?" is to examine
the value for R2, which measures the proportion of variation in the dependent variable, y,
that is explained using the x variable and the regression line. Here the R2 value of 0.6637
indicates that approximately 66% of the variation in selling prices can be explained by a
linear model using living space. Perhaps the remaining 34% of the variation can be
explained using other property characteristics in a multiple regression model.

Trendline Embellishments
If the equation displayed on the chart is used to calculate predicted selling prices, the
results may be imprecise because the intercept and slope have only three decimal places.
To display more decimal places, double-click the chart to activate it and click on the
region containing the equation and R2 value to select them for editing. Then click the
Increase Decimal tool repeatedly to display more decimal places. The equation values
shown in Figure 21.5 were obtained by clicking Increase Decimal twice to change from
three decimal places to five. These changes affect both the equation and R2 value, and
these changes must be made before any other editing.
With the equation and R2 value selected, you can move the entire text box by clicking and
dragging near the edge of the box, and you can use the regular text editing options for
rearranging the text. Figure 21.5 shows the result of such editing; variable names were
substituted for x and y, terms were rearranged, and the last three significant figures of R2
were deleted. Once you begin any such editing, you are unable to use the Increase
Decimal or Decrease Decimal tools to change the displayed precision.

Figure 21.5 Final Trendline on Scatterplot

21.2 REGRESSION ANALYSIS TOOL


The Add Trendline command provides only the fitted line, equation, and R2. To obtain
additional information for assessing the relationship between the two variables, follow
these steps to use the Regression analysis tool.

1. Arrange the data in columns with the x variable on the left and the y variable on
the right, as before. Make space for the results of the regression analysis to the
right of the data. Allow at least 16 columns. (Delete the scatterplot or move it far
to the right.)
2. From the Tools menu, choose the Data Analysis command. In the Data Analysis
dialog box, scroll the list box, select Regression, and click OK. The Regression
dialog box appears as shown in Figure 21.6.

Figure 21.6 Regression Dialog Box

In the Regression dialog box, move from box to box using the mouse or the tab key. For a
box requiring a range, select the box and then select the appropriate range on the
worksheet by pointing. To see cells on the worksheet, move the Regression dialog box by
clicking on its title bar and dragging, or click the collapse button on the right side of each
range edit box. Click the Help button for additional information.
3. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable. Include the label above the data.
4. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable. Include the label above the data.

5. Labels: Select this box, because the labels at the top of the Input Y Range and
Input X Range were included in those ranges.
6. Constant is Zero: Select this box only if you want to force the regression line to
pass through the origin (0,0).
7. Confidence Level: Excel automatically includes 95% confidence intervals for
the regression coefficients. For an additional confidence interval, select this box
and enter the level in the Confidence Level box.
8. Output location: Click the Output Range button, click to select the range edit
box on its right, and point to or type a reference for the top-left corner cell of a
range 16 columns wide where the summary output and charts should appear.
Alternatively, click the New Worksheet Ply button if you want the output to
appear on a separate sheet and optionally type a name for the new sheet, or click
the New Workbook button if you want the output in a separate workbook.
9. Residuals: Select this box to obtain the fitted values (predicted y) and residuals.
10. Residual Plots: Select this box to obtain charts of residuals versus each x
variable.
11. Standardized Residuals: Select this box to obtain standardized residuals (each
residual divided by the standard deviation of the residuals). This output makes it
easy to identify outliers.
12. Line Fit Plots: Select this box to obtain an XY (scatter) chart of the y input data
and fitted y values versus the x variable. This chart is similar to the scatterplot
with an inserted trendline shown in Figure 21.4.
13. Normal Probability Plots: This option is not implemented properly, so don't
check this box.
14. After selecting all options and pointing to or typing references, click OK. (If you
receive the error message "Cannot add chart to a shared workbook," click the
OK button. Then click New Workbook under Output in the Regression dialog
box. If desired, use the Edit | Move or Copy Sheet command to copy the results
back to the original workbook.) The summary output and charts appear.
15. Optional: To change column widths so that all summary output is visible, make
a nonadjacent selection. First select the cell containing the Adjusted R Square
label (D6). Hold down the Control key while clicking the following cells:
Significance F (I11), Coefficients (E16), Standard Error (F16), and Upper 95%
(J16). From the Format menu, choose Column | AutoFit Selection. The
formatted summary output is shown in Figure 21.7.

Figure 21.7 Regression Tool Summary Output

16. Optional: The residual output appears below the summary output. To relocate
the residuals to facilitate comparisons, select columns C:E and choose Insert
from the shortcut menu. Select the residual output (H24:J39), including the row
of labels but excluding the Observation numbers, and choose Cut or Copy from
the shortcut menu. Select cell C1 and choose Paste from the shortcut menu.
Adjust the widths of columns C:E and decrease the decimals displayed in cells
C2:E16 to obtain the results shown in Figure 21.8.

Figure 21.8 Relocated Residual Output



Regression Interpretation
The intercept and slope of the fitted regression line are in the lower-left section labeled
"Coefficients" of the summary output in Figure 21.7. The Intercept coefficient
18.7894675 is the constant term in the linear regression equation, and the SqFt coefficient
0.02101025 is the slope. The regression equation is
Predicted Price = 18.7894675 + 0.02101025 * SqFt.
For an explanation of the intercept and slope, refer to Trendline Interpretation, Section
21.1.
In the residual output shown in Figure 21.8, the predicted prices, sometimes termed the
fitted values, are the result of estimating the selling price of each property using this
regression equation. The residuals are the difference between the actual and fitted values.
For example, the first property has 521 square feet. On the average, we would expect this
property to have a selling price of $29,736, but its actual selling price is $26,000. The
residual for this property is $26,000 – $29,736—that is, –$3,736. Its actual selling price is
$3,736 below what is expected. The residuals are also termed deviations or errors.
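These values can be reproduced directly from the regression coefficients. A brief sketch for the first property, using the coefficients in Figure 21.7:
Fitted Price = 18.7894675 + 0.02101025 * 521 = 29.736 thousands of dollars
Residual     = 26 - 29.736 = -3.736 thousands of dollars, that is, -$3,736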
The four most common measures to answer the question "How good is the relationship?"
are the standard error, R2, t statistics, and analysis of variance. The standard error,
3.23777441, shown in cell E7 of Figure 21.7, is expressed in the same units as the
dependent variable, selling price. As the standard deviation of the residuals, it measures
the scatter of the actual selling prices around the regression line. This summary of the
residuals is $3,238. The standard error is often called the standard error of the estimate.
R square, shown in cell E5 of Figure 21.7, measures the proportion of variation in the
dependent variable that is explained using the regression line. This proportion must be a
number between zero and one, and it is often expressed as a percentage. Here
approximately 66% of the variation in selling prices is explained using living space as a
predictor in a linear equation. Adjusted R square, shown in cell E6, is useful for
comparing this model with other models using additional explanatory variables.
The t statistics, shown in cells G17:G18 of Figure 21.7, are part of individual hypothesis
tests of the regression coefficients. For example, these 15 properties could be treated as a
sample from a larger population. The null hypothesis is that there is no relationship: the
population regression coefficient for living space is zero, implying that differences in
living space don't affect selling price. With a sample regression coefficient of 0.02101025
and a standard error of the coefficient (an estimate of the sampling error) of 0.004148397,
the coefficient is 5.064667 standard errors from zero. The two-tail p-value, 0.000217,
shown in cell H18, is the probability of obtaining these results, or something more
extreme, assuming the null hypothesis is true. Therefore, we reject the null hypothesis
and conclude there is a significant relationship between selling price and living space.
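These values can be checked from the other entries in the summary output; a sketch using the coefficient and its standard error from Figure 21.7:
t = 0.02101025 / 0.004148397 = 5.064667
=TDIST(5.064667,13,2)        should return approximately 0.000217, the two-tail p-value in cell H18
where 13 is the residual degrees of freedom (15 observations minus 2 estimated coefficients).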

The analysis of variance table, shown in cells D10:I14 of Figure 21.7, is a test of the
overall fit of the regression equation. Because it summarizes a test of the null hypothesis
that all regression coefficients are zero, it will be discussed in Chapter 23 with multiple
regression.

Regression Charts
For simple linear regression the analysis tool provides two charts: residual plot and line
fit plot. These charts are embedded near the top of the worksheet to the right of the
summary output. In the real estate properties example, the charts are originally located in
cells M1:S12; after relocating the residuals, the charts are in cells P1:V12.

Figure 21.9 Initial Line Fit Plot

The line fit plot is shown in Figure 21.9. This chart is similar to the scatterplot with
inserted trendline, except that the predicted values in this chart are markers without a line.
The following steps describe how to format the line fit plot.
1. Select the data series for Predicted Price by clicking one of the square markers
that are in a straight line. (Alternatively, select any chart object and use the up
and down arrow keys to make the selection.) The points are highlighted and
"=SERIES("Predicted Price",...)" appears in the formula bar. Right-click, choose
Format Data Series from the shortcut menu, and click the Patterns tab. Select
Automatic for Line and select None for Marker. Then click OK.
2. Select the x-axis by clicking on the horizontal line at the bottom of the plot area.
A square handle appears at each end of the x-axis. Right-click, choose Format
Axis from the shortcut menu, and click the Scale tab. Clear the Auto checkbox
for Minimum and type 400 in its edit box; clear the Auto checkbox for
Maximum and type 1400 in its edit box; clear the Auto checkbox for Major Unit
and type 200 in its edit box. Then click OK.
3. Select the y-axis. Right-click, choose Format Axis from the shortcut menu, and
click the Scale tab. Clear the Auto checkbox for Minimum and type 20 in its edit
box; clear the Auto checkbox for Maximum and type 50 in its edit box; clear the
Auto checkbox for Major Unit and type 10 in its edit box. Click the Number tab,
select Number in the Category list box, and click the Decimal Places spinner
control to select 0. Then click OK.
4. Optional: To obtain the appearance shown in Figure 21.10, select and enter more
descriptive text for the chart title, x-axis title, and y-axis title. Resize the chart so
that it is approximately 7 columns wide and 14 rows high. Select the chart title
and choose Arial 10 bold from the formatting toolbar. For the legend, axes and
axis titles, select each object and choose Arial 8.

Figure 21.10 Final Line Fit Plot

The residual plot (after resizing to approximately 6 columns by 14 rows) is shown in Figure 21.11. This type of chart is useful for determining whether the functional form of
the fitted line is appropriate. If the residual plot is a random pattern, the linear fitted line
is satisfactory; if the residual plot shows a pattern, additional modeling may be needed.
When there is only one x variable (simple regression), the residual plot provides a view
that is similar to making the fitted line in Figure 21.10 horizontal. When there are several
x variables (multiple regression), the residual plot is an even more valuable tool for
checking model adequacy, because there is usually no way to view the fitted equation in
three or more dimensions.

Figure 21.11 Regression Tool Residual Plot

21.3 REGRESSION FUNCTIONS


A third method for obtaining regression results is worksheet functions. Five functions
described here are appropriate for simple regression (one x variable), and four of these
have identical syntax for their arguments. For example, the syntax for the INTERCEPT
function is
INTERCEPT(known_y's,known_x's).
The same syntax applies to the SLOPE, RSQ (R square), and STEYX (standard error of the estimate) functions. These four functions are entered in cells H2:H5 of Figure 21.12, and the values
returned by these functions are shown in cells F2:F5.
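For the data layout of Figure 21.12, with square feet in cells A2:A16 and selling price in cells B2:B16, the four formulas take the form
=INTERCEPT(B2:B16,A2:A16)     intercept of the least-squares line
=SLOPE(B2:B16,A2:A16)         slope (regression coefficient)
=RSQ(B2:B16,A2:A16)           R square
=STEYX(B2:B16,A2:A16)         standard error of the estimate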
To prepare Figure 21.12, the function results in column H are copied to the clipboard
(Edit | Copy), and the values are pasted into column F (Edit | Paste Special | Values). The
formulas are displayed in column H by choosing Options from the Tools menu, clicking
the View tab, and checking the Formulas checkbox in the Window Option section.
Cells H9 and H11 show two methods for obtaining a predicted selling price for a property
with 1,000 square feet of living space. If the intercept and slope of the regression
equation have already been calculated, the formula "= intercept + slope * x" can be
entered into a cell (H9) using appropriate cell references. Here the predicted selling price
is 39.7997169881321, in thousands of dollars, or approximately $39,800.
Another method for obtaining a predicted value based on simple linear regression is the
FORECAST function, with syntax
FORECAST(x,known_y's,known_x's).

This method, shown in cell H11, calculates the intercept and slope using least squares and
returns the predicted value of y for the specified value of x.
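For the same data layout, the formula for predicting the selling price of a 1,000-square-foot property is
=FORECAST(1000,B2:B16,A2:A16)
which returns approximately 39.7997, matching the result in cell H9. (The first argument could also be a cell reference containing the new x value.)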

Figure 21.12 Regression Using Functions

Yet another method for obtaining predicted y values is the TREND function, which has
the following syntax:
TREND(known_y's,known_x's,new_x's,const)
This function, unlike the FORECAST function, can also be used for multiple regression
(two or more x variables). Because the TREND function is an array function, it must be
entered in a special way, as described in the following steps.
1. Enter the data for the x and y variables (A2:B16) and values of the x variable
(D13:D16) for which predicted y values will be calculated.
2. Select a range where the predicted y values are to appear (H13:H16).
3. From the Insert menu, choose the Function command. Alternatively, click the
Insert Function button (icon fx). In the Insert Function dialog box, select
Statistical in the category list box and select TREND in the function list box.
Then click OK.
4. In the TREND dialog box, type or point (click and drag) to ranges on the
worksheet containing the known y values (B2:B16), known x values (A2:A16),
and new x values (D13:D16). Do not include the labels in row 1 in these ranges.
In the edit box labeled "Const," type the integer 1, which is interpreted as true,
indicating that an intercept term is desired. Then click OK.

5. With the function cells (H13:H16) still selected, press the F2 key (for editing).
The word "Edit" appears in the status bar at the bottom of the screen. Hold down
the Control and Shift keys and press Enter. The formula bar shows curly
brackets around the TREND function, indicating that the array function has been
entered correctly.
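Putting these steps together, the completed entry in cells H13:H16 is the array formula
{=TREND(B2:B16,A2:A16,D13:D16,1)}
where the curly brackets are supplied by Excel when you hold down Control and Shift and press Enter; they are not typed.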
A companion function, LINEST, provides regression coefficients, standard errors, and
other summary measures. Like TREND, this function can be used for multiple regression
(two or more x variables) and must be array-entered. Its syntax is
LINEST(known_y's,known_x's,const,stats).
The "const" and "stats" arguments are true-or-false values, where "const" specifies
whether the fitted equation has an intercept term and "stats" indicates whether summary
statistics are desired.
To obtain the results shown in Figure 21.13, select D1:E5, type or use the Insert Function
tool to enter LINEST, press F2, and finally hold down the Control and Shift keys while
you press Enter. Cells D7:E11 show the numerical results that appear in cells D1:E5, and
cells D13:E17 describe the contents of those cells. These same values appear with labels
in the Regression analysis tool summary output shown in Figure 21.7.
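As a sketch, assuming the same data layout as Figure 21.12, the array formula entered in cells D1:E5 is
{=LINEST(B2:B16,A2:A16,TRUE,TRUE)}
For simple regression the five rows of output contain, from top to bottom, the slope and intercept; their standard errors; R square and the standard error of the estimate; the F statistic and the residual degrees of freedom; and the regression and residual sums of squares.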

Figure 21.13 Regression Using LINEST



EXERCISES
Exercise 21.1 Refer to the data on vacancy percentages and monthly rents for ten cities in
Exercise 19.1.
1. Prepare a scatterplot and insert a linear trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of vacancy percentage for a city where monthly rent per square
foot is $3.50.
Exercise 21.2 Refer to the data on study hours and test grades for 20 students in Exercise
19.3.
1. Prepare a scatterplot and insert a linear trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of test grade for a student who studies ten hours.
4. Student 7 studied ten hours and received a test grade of 63. Taking into account the
number of study hours, is this test grade below average, average, or above average?

Chapter 22 Simple Nonlinear Regression
This chapter describes four methods for modeling a nonlinear relationship between two
variables: polynomial, logarithm, power, and exponential. For each functional form, I
describe both inserting a trendline on a scatterplot and using the Regression analysis tool
on transformed variables to obtain additional summary measures and diagnostics. For an
exponential relationship, I also describe using the LOGEST function to obtain similar
results.
It is important to examine a scatterplot as an aid to selecting the appropriate nonlinear
form. Figure 22.1 shows four single-bulge nonlinear patterns that might be observed on a
scatterplot. Each panel has a label indicating the direction of the bulge, and the direction
may be used to determine an appropriate nonlinear form.

Figure 22.1 Single-Bulge Nonlinear Patterns



For example, the upper-left panel shows data where the bulge points toward the
northwest (NW). The power (for x > 1) and logarithmic functions are appropriate for this
pattern. The lower-left panel shows data with a bulge toward the southwest (SW), in
which case the power, logarithmic, or exponential functions are candidates. And the
lower-right panel shows data with a bulge toward the southeast (SE), where the power
(for x > 1) and exponential functions are appropriate. In addition, all four data patterns
may be modeled using a quadratic function (polynomial of order 2).
If the pattern of the data on a scatterplot doesn't fit any of the single-bulge examples
shown in Figure 22.1, some other functional form may be needed. For example, if the
data have two bulges (an S shape), a cubic function (polynomial of order 3) may be
appropriate.
The general approach for inserting a nonlinear trendline is as follows. First, construct the
scatterplot. (Arrange the data on a worksheet with the x data in a column on the left and
the y data in a column on the right. Select both the x and y data and use the Chart Wizard
to construct the XY chart.) Second, click a data point on the chart to select the data series,
and choose Add Trendline from the Chart menu; alternatively, right-click the data series
and choose Add Trendline from the shortcut menu. The upper portion of the Add
Trendline dialog box Type tab is shown in Figure 22.2.

Figure 22.2 Add Trendline Dialog Box Type Tab

To obtain the trendline results shown in this chapter, select the appropriate type
(polynomial, logarithmic, power, or exponential) and in the Options tab select the
checkboxes for Display Equation on Chart and Display R-squared Value on Chart.
The first example is the real estate property data set described in Chapter 19. The
dependent variable is selling price, in thousands of dollars, and the explanatory variable
is living space, in square feet. Details for constructing the scatterplot are described in
Chapter 19, and steps for inserting a linear trendline are in Chapter 21.

In the residual plot of real estate property data—shown in Figure 21.11—the first two
properties with low square footage and the last two or three properties with high square
footage have negative residuals. This observation is some indication that a nonlinear fit
may be more appropriate. Although the curvature is minimal, the scatterplot shows a
slight bulge pointing toward the northwest (NW). Thus, the quadratic (polynomial of
order 2), power, and logarithmic functions are candidates.

22.1 POLYNOMIAL
Figure 22.3 shows the results for a quadratic fit (polynomial of order 2). The R2 value of
68% is only slightly better than the value of 66% obtained with the linear fit described in
Chapter 21.

Figure 22.3 Polynomial Trendline

The following steps describe how to obtain more complete regression results using the
quadratic model.
1. Enter the data into columns A and C as shown in Figure 22.4. If the SqFt and
Price data are already in columns A and B, select column B and choose Insert
from the shortcut menu. Enter the label SqFt^2 in cell B1.
2. Select cell B2 and enter the formula =A2^2. To copy the formula to the other
cells in column B, select cell B2 and double-click the fill handle in its lower-
right corner. The squared values appear in column B.

3. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
scroll the list box, select Regression, and click OK. The Regression dialog box
appears.
4. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (C1:C16), including the label in row 1.
5. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variables (A1:B16), including the labels in row 1.
6. Labels: Select this box, because labels were included in the Input X and Y
Ranges.
7. Do not select the checkboxes for Constant is Zero or Confidence Level.
8. Output options: Click the Output Range option button, select the edit box to the
right, and point to or enter a reference for the top-left corner cell of a range 16
columns wide where the summary output and charts should appear (E1). If
desired, check the appropriate boxes for Residuals. Then click OK.
Figure 22.4 shows the regression output after deleting the ANOVA portion (by selecting
E10:M14 and choosing Delete | Shift Cells Up from the shortcut menu). Compared to the
linear model in Chapter 21, this quadratic model has a slightly larger standard error and a
smaller adjusted R2; using these criteria, the quadratic model is not really better than the
linear one.

Figure 22.4 Polynomial Regression Results

To make a prediction of average selling price using the quadratic model, enter the SqFt
value in a cell (A17, for example) and a formula for SqFt^2 (=A17^2 in cell B17). Then
build a formula for predicted price (=F12+F13*A17+F14*B17 in cell C17). Chapter 23 discusses interpretation of multiple regression output and other methods for making
predictions.
The quadratic model, using x and x2 as explanatory variables, can be used to fit a wide
variety of single-bulge data patterns. If a scatterplot shows data with two bulges (an S
shape) like the Polynomial icon shown in Figure 22.2, a cubic model may be appropriate.
The Add Trendline feature may give erroneous results for a polynomial of order 3, so an
alternative is to use the Regression tool using x, x2, and x3 as explanatory variables.

22.2 LOGARITHMIC
The logarithmic model creates a trendline using the equation
y = c * Ln(x) + b
where Ln is the natural log function with base e (approximately 2.718). Because the log
function is defined only for positive values of x, the values of the explanatory variable in
your data set must be positive. If any x values are zero or negative, the Logarithmic icon
on the Add Trendline Type tab will be grayed out. (As a workaround, you can add a
constant to each x value.) The results of adding a logarithmic trendline to the scatterplot
of real estate property data are shown in Figure 22.5.

Figure 22.5 Logarithmic Trendline

The following steps describe how to use the Regression analysis tool to obtain more
complete regression results using the logarithmic model.

1. Enter the data into columns A and C as shown in Figure 22.6. If the SqFt and
Price data are already in columns A and B, select column B and choose Insert
from the shortcut menu. Enter the label Ln(SqFt) in cell B1.
2. Select cell B2 and enter the formula =LN(A2). To copy the formula to the other
cells in column B, select cell B2 and double-click the fill handle in its lower-
right corner. The log values appear in column B.
3. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
scroll the list box, select Regression, and click OK. The Regression dialog box
appears.
4. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (C1:C16), including the label in row 1.
5. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable (B1:B16), including the label in row 1.
6. Labels: Select this box, because labels were included in the Input X and Y
Ranges.
7. Do not select the checkboxes for Constant is Zero or Confidence Level.
8. Output options: Click the Output Range option button, select the text box to the
right, and point to or enter a reference for the top-left corner cell of a range 16
columns wide where the summary output and charts should appear (E1). If
desired, check the appropriate boxes for Residuals. Then click OK.
Figure 22.6 shows the regression output after deleting the ANOVA portion (by selecting
E10:M14 and choosing Delete | Shift Cells Up from the shortcut menu). Compared with
the linear model in Chapter 21, this logarithmic model has a smaller standard error and a
higher adjusted R2; using these criteria, the logarithmic model is somewhat better than the
linear one.

Figure 22.6 Logarithmic Regression Results

To make a prediction of average selling price using the logarithmic model, enter the SqFt
value in a cell (A17, for example) and a formula for Ln(SqFt) (=LN(A17) in cell B17).
Then build a formula for predicted price (=F12+F13*B17 in cell C17).

22.3 POWER
The power model creates a trendline using the equation
y = c * x^b.
Excel uses a log transformation of the original x and y data to determine fitted values, so
the values of both the dependent and explanatory variables in your data set must be
positive. If any y or x values are zero or negative, the Power icon on the Add Trendline
Type tab will be grayed out. (As a workaround, you can add a constant to each y and x
value.) The results of adding a power trendline to the scatterplot of real estate property
data are shown in Figure 22.7.
The power trendline feature does not find values of b and c that minimize the sum of
squared deviations between actual y and predicted y (= c * x^b). Instead, Excel's method
takes the logarithm of both sides of the power formula, which then can be written as
Ln(y) = Ln(c) + b * Ln(x),
and uses standard linear regression with Ln(y) as the dependent variable and Ln(x) as the
explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of
squared deviations between actual Ln(y) and predicted Ln(y), using the formula
Ln(y) = Intercept + Slope * Ln(x).

Therefore, the Intercept value corresponds to Ln(c), and c in the power formula is equal
to Exp(Intercept). The Slope value corresponds to b in the power formula.
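Combining these relationships, a prediction from the power model can be written as
Predicted y = Exp(Intercept) * x^Slope = Exp(Intercept + Slope * Ln(x)),
which is the form used to build the prediction formula at the end of this section.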

Figure 22.7 Power Trendline

The following steps describe how to use the Regression analysis tool on the transformed
data to obtain regression results for the power model.
1. Enter the data into columns A and B as shown in Figure 22.8.
2. Enter the label Ln(SqFt) in cell C1. Select cell C2 and enter the formula
=LN(A2).
3. Enter the label Ln(Price) in cell D1. Select cell D2 and enter the formula
=LN(B2).
4. To copy the formulas to the other cells, select cells C2 and D2, and double-click
the fill handle in the lower-right corner of cell D2. The log values appear in
columns C and D.
5. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression and click OK. The Regression dialog box appears.
6. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (D1:D16), including the label in row 1.
7. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable (C1:C16), including the label in row 1.
8. Labels: Select this box, because labels are included in the Input X and Y
Ranges.

9. Do not select the checkboxes for Constant is Zero or Confidence Level.


10. Output options: Click the Output Range option button, select the text box to the
right, and point to or enter a reference for the top-left corner cell of a range 16
columns wide where the summary output and charts should appear (F1). If
desired, check the appropriate boxes for Residuals. Then click OK.
Figure 22.8 shows the regression output after deleting the ANOVA portion (by selecting
F10:N14 and choosing Delete | Shift Cells Up from the shortcut menu). The R Square
and Standard Error values cannot be compared directly with the linear model in Chapter
21. Here, R Square is the proportion of variation in Ln(y) explained by Ln(x) in a linear
model, and the Standard Error is expressed in the same units of measurement as Ln(y).

Figure 22.8 Power Regression Results

To determine the value of c for the power formula, select cell G14 and enter the formula
=EXP(G12). To make a prediction of average selling price using the power model, enter
the SqFt value in a cell (A17, for example). Then build a formula for predicted price
(=G14*A17^G13 in cell B17).

22.4 EXPONENTIAL
The exponential model creates a trendline using the equation
y = c * e^(bx).
Excel uses a log transformation of the original y data to determine fitted values, so the
values of the dependent variable in your data set must be positive. If any y values are zero
or negative, the Exponential icon on the Add Trendline Type tab will be grayed out. (As a
workaround, you can add a constant to each y value.)
This function may be used to model exponentially increasing growth. The data shown in
Figure 22.9 are an example of such a pattern.

Figure 22.9 Annual Sales Data

Time series data are often displayed using an Excel line chart instead of an XY (scatter)
chart. The following steps describe how to construct the line chart with an exponential
trendline shown in Figure 22.10.
1. Enter the year and sales data as shown in Figure 22.9.
2. Select the sales data (B2:B9) and click the Chart Wizard button.
3. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
"Line with markers displayed at each data value." Click Next. In step 2 (Chart
Source Data) on the Series tab, select the range edit box for Category (X) Axis
Labels, and click and drag A2:A9 on the worksheet. Click Next. In step 3 (Chart
Options) on the Titles tab, type the chart and axis labels shown in Figure 22.10;
on the Legend tab, clear the checkbox for Show Legend. Click Finish.
4. Click one of the data points of the chart to select the data series. Right-click and
choose Add Trendline from the shortcut menu. On the Type tab, click the
Exponential icon. On the Options tab, click Display Equation on Chart and click
Display R-squared Value on Chart. Then click OK.
Because this is a line chart instead of an XY (scatter) chart, Excel does not use the Year
data in column A for fitting the exponential function. The Year data are used only as
labels for the x-axis, but the values used for x in the exponential function are the numbers
1 through 8.
The exponential trendline feature does not find values of b and c that minimize the sum
of squared deviations between actual y and predicted y (= c * e^(bx)). Instead, Excel's
method takes the logarithm of both sides of the exponential formula, which then can be
written as
Ln(y) = Ln(c) + b * x
and uses standard linear regression with Ln(y) as the dependent variable and x as the
explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of
squared deviations between actual Ln(y) and predicted Ln(y), using the formula
Ln(y) = Intercept + Slope * x.
Therefore, the Intercept value corresponds to Ln(c), and c in the exponential formula is
equal to Exp(Intercept). The Slope value corresponds to b in the exponential formula.
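Combining these relationships, a prediction from the exponential model can be written as
Predicted y = Exp(Intercept) * Exp(Slope * x) = Exp(Intercept + Slope * x),
which is the form used to build the prediction formula later in this section.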

Figure 22.10 Exponential Trendline

The following steps describe how to use the Regression analysis tool on the transformed
data to obtain regression results for the exponential model.
1. Enter the data into columns A, B, and C as shown in Figure 22.11. If the Year
and Sales data are already in columns A and B as shown in Figure 22.9, select
column B, choose Insert from the shortcut menu, and enter the label X and
integers 1 through 8 in column B.
2. Enter the label Ln(Sales) in cell D1. Enter the formula =LN(C2) in cell D2.
3. To copy the formula, select cell D2 and double-click the fill handle in its lower-
right corner. The log values appear in column D.
4. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression and click OK. The Regression dialog box appears.

5. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (D1:D9), including the label in row 1.
6. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable (B1:B9), including the label in row 1.
7. Labels: Labels were included in the Input X and Y Ranges, so select this box.
8. Do not select the checkboxes for Constant is Zero or Confidence Level.
9. Output options: Click the Output Range option button, select the range edit box
to the right, and point to or enter a reference for the top-left corner cell of a
range 16 columns wide where the summary output and charts should appear
(F1). If desired, check the appropriate boxes for Residuals. Then click OK.
Figure 22.11 shows the regression output after deleting the ANOVA portion (by selecting
F10:N14 and choosing Delete | Shift Cells Up from the shortcut menu). The R Square
and Standard Error values cannot be compared directly with the linear model in Chapter
21. Here, R square is the proportion of variation in Ln(y) explained by x in a linear model,
and the standard error is expressed in the same units of measurement as Ln(y).

Figure 22.11 Exponential Regression Results

To determine the value of c for the exponential formula, select cell G14, and enter the
formula =EXP(G12). To make a prediction of average sales using the exponential model,
enter the x value in a cell (9 in cell B10, for example). Then build a formula for predicted
sales (=G14*EXP(G13*B10) in cell C10).
An alternative method for obtaining exponential regression results is to use the LOGEST
and GROWTH worksheet functions. The descriptions of these functions in Excel's on-
line help use the equation
y = b * m^x.
This b value corresponds to c in the trendline exponential equation, and this m corresponds to e^b.
LOGEST provides regression coefficients, standard errors, and other summary measures.
This function can be used for multiple regression (two or more x variables) and must be
array-entered. Its syntax is
LOGEST(known_y's,known_x's,const,stats).
The "const" and "stats" arguments are true-or-false values, where "const" specifies
whether b is forced to equal one and "stats" indicates whether summary statistics are
desired.
To obtain the results shown in Figure 22.12, select E1:F5, type or use the Insert Function
button to enter LOGEST, press F2, and finally hold down the Control and Shift keys
while you press Enter. Cells E7:F11 show the numerical results that appear in cells
E1:F5, and cells E13:F17 describe the contents of those cells. These same values, except
m, appear with labels in the Regression analysis tool summary output shown in Figure
22.11.

Figure 22.12 Regression Using LOGEST

The GROWTH function is similar to the TREND function, except that it returns fitted
values for the exponential equation instead of the linear equation. GROWTH can also be
used for multiple regression (two or more x variables) and must be array-entered.
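For example, assuming the layout of Figure 22.11, with the x values 1 through 8 in cells B2:B9, sales in cells C2:C9, and the new x value 9 in cell B10, a predicted sales value could be obtained with
=GROWTH(C2:C9,B2:B9,B10,1)
entered with Control+Shift+Enter. The result should match the prediction built from the regression output (=G14*EXP(G13*B10)).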

EXERCISES
Exercise 22.1 Seven identical automobiles were driven by employees for business
purposes for several days. The drivers reported average speed, in miles per hour, and gas
mileage, in miles per gallon, as shown in the following table.
Speed Gas Mileage
MPH MPG
32 20
37 23
44 26
49 27
56 26
62 25
68 22
1. Prepare a scatterplot and insert a quadratic trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of gas mileage for an automobile driven at an average speed of 50
miles per hour.
Exercise 22.2 A chain store tried different prices for a television set in five retail markets
during a four-week period. The following table shows the retail prices and sales rates, in
units sold per thousand of residents in the market.
Price Sales Rate
$275 1.60
$300 0.95
$325 0.65
$350 0.50
$375 0.45
1. Prepare a scatterplot and insert an appropriate trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of sales rate for a market where the price is $295.

Chapter 23 Multiple Regression
In Chapter 21, a simple linear regression model examined the relationship between
selling price and living space for 15 real estate properties. The standard error was $3,238,
and R square was 0.664, indicating 66% of the variation in selling prices could be
explained using living space as the explanatory variable in a linear model.
More of the variation in selling prices might be explained by using an additional variable.
Data on the most recent assessed value (for property tax purposes) are also available;
perhaps selling price is related to assessed value. Multiple regression can examine the
relationship between selling price and two explanatory variables, living space and
assessed value. (The pairwise correlations among these three variables were examined in
Chapter 19.) The following steps describe how to use the Regression analysis tool for
multiple regression.
1. Arrange the data in columns with the two explanatory variables in columns on
the left and the dependent variable in a column on the right. The two (or more)
explanatory variables must be in adjacent columns. If the data from Chapter 21
(or Example 19.1) are in columns A and B, insert a new column B and enter the
new data for assessed value as shown in Figure 16.1.
2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
scroll the list box, select Regression, and choose OK.
3. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (selling prices, C1:C16). Include the label above the data.
4. Input X Range: Point to or enter the reference for the range containing values of
the two explanatory variables (SqFt and Assessed, A1:B16). Include the labels
above the data.
5. Other dialog box entries: Fill in the other checkboxes and edit boxes as shown in
Figure 23.1. Then click OK. If the error message "Regression - Cannot add chart
to a shared workbook" appears, click Cancel; to obtain chart output, select New
Workbook under Output Options in the Regression dialog box.

Figure 23.1 Regression Dialog Box

6. Optional: To change column widths so that all summary output labels are
visible, select the cell containing the Adjusted R Square label (E6) and hold
down the Control key while selecting cells containing the labels Coefficients
(F16), Standard Error (G16), Significance F (J11), and Upper 95% (K16). From
the Format menu, choose the Column command and select AutoFit Selection.
The results are shown in Figure 23.2.

Figure 23.2 Multiple Regression Summary Output

23.1 INTERPRETATION OF REGRESSION OUTPUT


Referring to the coefficients in cells F17:F19 shown in Figure 23.2, and rounding to three
decimal places, the regression equation is
Price = 14.123 + 0.017 * SqFt + 0.361 * Assessed.
In a multiple regression model, the coefficients are called net regression coefficients or
partial slopes. For example, if assessed value is held constant (or if we could examine a
subset of the properties that have equal assessed value), and living space is allowed to
vary, then selling price varies by 0.017 thousands of dollars for a unit change in square
feet of living space. Similarly, if living space is held constant, then selling price varies by
0.361 thousands of dollars for a unit change in assessed value (also measured in
thousands of dollars).
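For example, using the rounded coefficients, a property with 800 square feet of living space and an assessed value of $25,000 (Assessed = 25) has a predicted selling price of approximately
Predicted Price = 14.123 + 0.017 * 800 + 0.361 * 25 = 36.748 thousands of dollars, or roughly $36,700.
The unrounded coefficients give $36,445, the value obtained with the TREND function in Section 23.3.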

Significance of Coefficients
The t statistic for the SqFt coefficient is greater than two, indicating that 0.017 is
significantly different from zero. We can reject the null hypothesis that there is no
relationship between SqFt and Price in this model and conclude that a significant
relationship exists.
The t statistic for the Assessed coefficient is 2.79, indicating that 0.361 is significantly
different from zero.

The p-value is a two-tail probability using the t distribution. Since we would expect to see
a positive relationship between selling price and each explanatory variable, one-tail tests
are appropriate here. Dividing each p-value in the summary output by two, the one-tail p-
values are approximately 0.00038 and 0.0081. Thus, in this model we can reject the
hypotheses of no relationship between selling price and each explanatory variable at the
1% level of significance.
The t statistic for the Intercept term is usually ignored.

Interpretation of the Regression Statistics


Referring to row 7 of Figure 23.2, the standard error for the multiple regression model is
$2,623, which is an improvement over the $3,238 standard error for the simple regression
model. The R-Square value in row 5 indicates that approximately 80% of the variation in
selling price can be explained using a linear model with living space and assessed value
as explanatory variables. This is also an improvement over the simple model with one
explanatory variable, where only 66% of the variation was explained.

Interpretation of the Analysis of Variance


The analysis of variance output shown in rows 10 through 14 of Figure 23.2 is the result
of testing the null hypothesis that all regression coefficients are simultaneously equal to
zero. The final result is a p-value, labeled Significance F in the output. Here, the p-value
is approximately 0.00007, the probability of getting these results in a random sample
from a population with no relationship between selling price and the explanatory
variables. Our p-value indicates it is extremely unlikely to observe these results in a
random sample from such a population, so we reject the hypothesis of no relationship and
conclude that at least one significant relationship exists.

23.2 ANALYSIS OF RESIDUALS


Residual plots are useful for checking to see whether the assumptions of linear
relationships and constant variance are appropriate. Excel provides plots of residuals
versus each of the explanatory variables, as shown in Figure 23.3 and Figure 23.4. These
charts are located to the right of the regression summary output.

Figure 23.3 Residuals versus SqFt of Living Space

If the relationship between selling price and living space is linear (after taking into
account assessed value), then a random pattern should appear in the residual plot. On the
other hand, if we see curvature or some other systematic pattern, then we should change
our model to incorporate the nonlinear relationship.
Most observers would conclude that the residual plot is essentially random, so no
additional modeling is required. Because our sample size is so small (15 observations), it
can be difficult to detect nonlinear patterns.
Residual plots are also useful for detecting situations where the residuals are smaller in one region and larger in another; in that case the residual plot has a funnel shape, like a tree resting on its side. When this occurs, the standard error of the estimate, which summarizes all of the
residual terms, would overstate the variation in one region and understate the variation in
another.
Looking at the plot of residuals versus assessed values shown in Figure 23.4, the pattern
also appears random. Once again, the small sample size makes it difficult to detect
nonlinear patterns.

Figure 23.4 Residuals versus Assessed Value

23.3 USING TREND TO MAKE PREDICTIONS


When satisfied with the model, we can proceed to use the model to make predictions of
selling price for new properties. Assume there are four properties with 600, 800, 1,000,
and 1,200 square feet of living space and assessed values of $22,500, $25,000, $27,500,
and $30,000, respectively. The following steps describe how to use the TREND function
for making the predictions about selling price. The syntax for the TREND function is
TREND(known_y's,known_x's,new_x's,const).
1. Enter the values for the explanatory variables on the worksheet (A18:B21) as
shown in Figure 23.5 (where Predicted Price, Residuals, and Standard Residuals
have been relocated, and rows 11 through 14 are hidden).
2. Select the cells that will contain the predicted values (D18:D21). Type an equals
sign, the TREND function in lowercase, and appropriate references for the
function arguments:
=trend(c2:c16,a2:b16,a18:b21,1)
Don't press Enter; instead, hold down the Control and Shift keys and press
Enter. The formula bar displays TREND in uppercase, indicating that Excel
recognizes the function name, and displays curly brackets around the function as
shown in Figure 23.5, indicating that the array function has been entered
correctly.

Instead of typing the TREND function, an alternative is to select the output cells
(D18:D21) and click the Insert Function tool (icon fx). In the Insert Function dialog box,
select Statistical in the category list box, select TREND in the function list box, and click
OK. In the TREND dialog box, type or point to (click and drag) ranges on the worksheet
containing the known y values (C2:C16), known x values (A2:B16), and new x values
(A18:B21). Do not include the labels in row 1 in these ranges. In the edit box labeled
"Const," type the integer 1, which is interpreted as true, indicating that an intercept term
is desired. Then click OK. With the function cells (D18:D21) still selected, press the F2
key (for editing). The word "Edit" appears in the status bar at the bottom of the screen.
Hold down the Control and Shift keys and press Enter.

Figure 23.5 Multiple Regression Predictions

Interpretation of the Predictions


The best-guess prediction of selling price for a property with 800 square feet of living
space and an assessed value of $25,000 is $36,445. An approximate 95% prediction
interval uses this best guess plus or minus two standard errors of the estimate ($36,445 ±
2 * $2,623, or $36,445 ± $5,246, which is from $31,199 to $41,691). We are 95%
confident that the selling price will be in this range.

However, there are two things approximate about this prediction interval. First, instead of
using the standard error of the estimate, which measures only the scatter of the actual
values around the regression equation, we should use the standard error of a prediction,
which also takes into account uncertainty in the coefficients of the regression equation.
The standard error of a prediction is always greater than the standard error of the
estimate. Unfortunately, there is no simple way to compute the standard error of a
prediction using Excel.
Second, the number of standard errors for a 95% prediction interval based on 15
observations with our model should use a value of the t statistic with 12 degrees of
freedom, which is 2.179, not 2. (For a very large sample size, the normal distribution is
appropriate, and the number of standard errors is 1.96, which is approximately 2.)
Therefore, our approximate interval is very approximate. An exact 95% prediction
interval would be wider.
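As a rough illustration of the second adjustment, using 2.179 standard errors instead of 2 gives
$36,445 ± 2.179 * $2,623 = $36,445 ± $5,716, or approximately $30,729 to $42,161,
which is still approximate because it uses the standard error of the estimate rather than the standard error of a prediction.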

EXERCISES
Exercise 23.1 The president of a national real estate company wanted to know why
certain branches of the company outperformed others. He felt that the key factors in
determining total annual sales (in $ millions) were the advertising budget (in $ thousands)
and the number of sales agents. To analyze the situation, he took a sample of eight offices
and collected the data in the following table.
Office   Advertising ($ thousands)   Number of Agents   Annual Sales ($ millions)
1                 249                       15                    32
2                 183                       14                    18
3                 310                       21                    49
4                 246                       18                    52
5                 288                       13                    36
6                 248                       21                    43
7                 256                       20                    24
8                 241                       19                    41
1. Prepare a regression model and interpret the coefficients.
2. Test to determine whether there is a linear relationship between each explanatory
variable and the dependent variable, with a 5% level of significance.
3. Make a prediction of annual sales for a branch with an advertising budget of
$250,000 and 17 agents.
Exercise 23.2 (adapted from Canavos, p. 602) A university placement office conducted a
study to determine whether the variation in starting salaries for school of business
graduates can be explained by the students' grade point average (GPA) and age upon
graduation. The placement office obtained the sample data shown in the following table.
GPA Age Starting Salary
2.95 22 $25,500
3.40 23 28,100
3.20 27 28,200
3.10 25 25,000
3.05 23 22,700
2.75 28 22,500
3.15 26 26,000
2.75 26 23,800
1. Prepare a regression model and interpret the coefficients.
2. Determine whether grade point average and age contribute substantially in
explaining the variation in the sample of starting salaries.
3. Make a prediction of starting salary for a 24-year-old graduate with a 3.00 GPA.

Chapter 24 Regression Using Categorical Variables
This chapter describes regression models in which an explanatory variable or dependent
variable is categorical (qualitative) instead of numerical (quantitative).

24.1 CATEGORIES AS EXPLANATORY VARIABLES


In the regression models of previous chapters, the explanatory variables were numerical
variables. In many situations it is better to use categorical variables as predictors. When
binary, the categorical variables indicate the presence or absence of a characteristic, such
as male/female, married/unmarried, or weekend/weekday. These binary variables can be
used as predictors in a regression model by assigning the value 0 or 1 for each
observation in the data set. The 0/1 variable is sometimes called an indicator variable or
dummy variable.
In other situations a categorical variable has more than two categories, such as season
(winter, spring, summer, or fall), weather (sunny, overcast, or rain), or academic major
(accounting, management, or finance). In these cases we use a number of indicator
variables equal to one less than the number of categories. For each observation the value
of an indicator variable is 1 or 0, indicating whether the observation corresponds to one
of the categories. For an observation that corresponds to the category that doesn't have an
indicator variable, the value for all indicator variables is 0; this category is sometimes
called the default category or base-case category.
Example 24.1 (adapted from Cryer, p. 139) In addition to square feet of living space and
assessed value, each property is categorized by construction grade (low, medium, or
high) as shown in Figure 24.1. This categorical variable can be used as a predictor
variable in a regression model for explaining variation in the selling price of the property.

Figure 24.1 Real Estate Property Data

The initial analysis uses only construction grade as the predictor of selling price, followed
by a multiple regression model using construction grade and the other predictor variables
(square feet of living space and assessed value).
The following steps describe how to use indicator variables in a regression model. An
indicator variable is defined for each of the three categories. Low is selected as the base-
case category; only indicator variables for the Medium and High categories are included
in the regression model.
1. Arrange the data in a worksheet as shown in Figure 24.1.
2. Select columns C:E. With the pointer in the selected range, right-click and
choose Insert from the shortcut menu. Enter the labels Low, Medium, and High
in cells C1:E1.
3. Enter a formula in cell C2 for determining values of the Low indicator variable:
=IF(B2="Low",1,0). The meaning of this formula is "If the grade is low, use
the value 1; otherwise use the value 0."
4. Enter a formula in cell D2 for determining values of the Medium indicator
variable: =IF(B2="Medium",1,0). The meaning of this formula is "If the grade
is medium, use the value 1; otherwise use the value 0."
5. Enter a formula in cell E2 for determining values of the High indicator variable:
=IF(B2="High",1,0). The meaning of this formula is "If the grade is high, use
the value 1; otherwise use the value 0." If the three formulas are entered
correctly, the contents of cells C2:E2 are 1, 0, and 0.
6. Select the new formulas in cells C2:E2. To copy the formulas to the other cells,
double-click the fill handle (small square in the lower-right corner of the
selected range). The worksheet should appear as shown in Figure 24.2.

Figure 24.2 Indicator Variables

7. Optional: The formulas in columns C, D, and E contain relative references to
column B. If these formulas are copied to other parts of the worksheet, the
references may not be correct. To eliminate the formulas and retain the zero-one
values, select columns C, D, and E, right-click and choose Copy from the
shortcut menu; with C, D, and E still selected, right-click, choose Paste Special
from the shortcut menu, select Values (also, select None as the Operation and
clear both checkboxes for Skip Blanks and Transpose), and click OK.
8. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
scroll the list box, select Regression, and click OK.
9. If necessary, refer to Chapter 21 for details on filling in the dialog box. The
Input Y Range contains the selling prices (G1:G16), the Input X Range contains
the values for the two explanatory variables, Medium and High (D1:E16), the
Output Range is I1, and the Labels, Residuals, and Standardized Residuals
checkboxes are selected.
10. Optional: Adjust column widths so that all labels of the regression output are
visible. Details are described in Chapter 21. The formatted Summary Output
section is shown in Figure 24.3.
Figure 24.3 Regression Output Using Two Indicators

24.2 INTERPRETATION OF REGRESSION USING INDICATORS
Referring to the coefficients in the summary output shown in Figure 24.3 and rounding to
three decimal places, the fitted regression model is
Price = 29.400 + 9.356 * Medium + 14.533 * High.
For a property with low construction grade (substituting Medium = 0 and High = 0 into
the model), the fitted selling price is 29.400. The average selling price for properties with
low construction grade is thus $29,400. For a property with medium construction grade
(Medium = 1 and High = 0), the fitted selling price is 38.756. For a property with high
construction grade (Medium = 0 and High = 1), the fitted selling price is 43.933.
The Intercept constant, 29.400, is the average selling price for the base-case category.
The Medium coefficient, 9.356, indicates the difference in the average selling price for
the Medium category from the base-case category, Low. And the High coefficient,
14.533, indicates the difference in the average selling price for the High category from
the base-case category.
The R-square value of 0.820701 indicates that 82% of the variation in selling prices can
be explained using only construction grade. This compares favorably with approximately
80% explained variation for the multiple regression model of Chapter 23 using living
space and assessed value as explanatory variables.
These regression results yield the same average selling prices that would be obtained by
simply averaging the price for each construction grade. For example, the mean selling
price for the three high construction grade properties (44.8, 41.8, and 45.2) is 43.933.
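Written out, the arithmetic behind that average is (44.8 + 41.8 + 45.2) / 3 = 131.8 / 3 = 43.933, which equals the Intercept plus the High coefficient, 29.400 + 14.533.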
An advantage of using indicator variables is that they can be combined with other
explanatory variables in a multiple regression model. The following steps provide a
general description of how to use construction grade, assessed value, and living space as
explanatory variables.
1. The four x variables (SqFt, Medium, High, and Assessed) must be in adjacent
columns. If the data are arranged as shown in Figure 24.2, one method is to
select column F (Assessed), right-click, and choose Insert from the shortcut
menu. Then select column A (SqFt), right-click, and choose Copy from the
shortcut menu; select column F (empty), right-click, and choose Paste from the
shortcut menu. (Alternatively, after inserting empty column F, select column A,
position the mouse pointer near the edge of column A until it turns into an
arrow, and click and drag column A to column F.)
2. In the Regression dialog box, the Input Y Range contains the selling prices
(H1:H16), the Input X Range contains the values for the four explanatory
variables, Medium, High, SqFt, and Assessed (D1:G16), the Output Range is J1,
and the Labels, Residuals, and Standardized Residuals checkboxes are selected.

24.3 INTERPRETATION OF MULTIPLE REGRESSION


After adjusting column widths, the summary output is shown in Figure 24.4. Rounding to
three decimal places, the fitted regression model is
Price = 19.152 + 6.035 * Medium + 7.953 * High + 0.010 * SqFt + 0.184 * Assessed.
The net regression coefficients, which take all four variables into consideration, differ
from those in the Chapter 23 model (which used only SqFt and Assessed) and from the previous
model in this chapter (using only Medium and High). For example, for properties with
the same construction grade and assessed value, selling price varies by 0.010 thousands
of dollars for a unit change in square feet of living space, on the average.
R square indicates that 92% of the variation in selling prices can be explained using this
linear model with construction grade, living space, and assessed value as explanatory
variables. The remaining unexplained variation is summarized by the $1,783 standard
error of estimate.
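As an illustration of using this fitted equation, suppose a medium-grade property has 1,000 square feet of living space and an assessed value of 40 (both input values are hypothetical, chosen only to show the arithmetic, with assessed value in the same units as the worksheet's Assessed column). Then predicted Price = 19.152 + 6.035 * 1 + 7.953 * 0 + 0.010 * 1000 + 0.184 * 40 = 19.152 + 6.035 + 10.000 + 7.360 = 42.547, or approximately $42,547.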
Figure 24.4 Multiple Regression Output

24.4 CATEGORIES AS THE DEPENDENT VARIABLE


Discriminant analysis refers to the use of models where the dependent variable is
categorical. If the dependent variable is binary (two categorical values, coded as 0 and 1),
then multiple regression can be used to determine a fitted model. The more general
problem involving a dependent variable with three or more categories requires advanced
nonregression techniques not described here.
Example 24.2 (adapted from Cryer, p. 614) Figure 24.5 contains financial ratio data on
16 firms from 1968 to 1972. Seven of these firms went bankrupt two years later, and nine
firms were financially sound at the end of the same period. Two financial ratios were
selected as explanatory variables: net income to total assets (NI/TA) and current assets to
net sales (CA/NS). The problem is to determine a linear combination of the two variables
that best discriminates between the bankrupt firms and the financially sound firms.
Figure 24.5 Financial Ratio and Bankruptcy Data

The following steps describe how to perform discriminant analysis for a binary dependent
variable using multiple regression.
1. Enter the data shown in Figure 24.5 on a worksheet.
2. Use the Regression analysis tool as described in Chapters 21, 22, and 23. The
Input Y Range is the bankruptcy 1/0 variable (C1:C17), the Input X Range
contains the two financial ratios (A1:B17), and the Output Range is E1. Select
the Labels checkbox and the Residuals checkbox.
3. Format the regression summary output as described in Chapter 21. The result is
shown in Figure 24.6.
Figure 24.6 Financial Ratio and Bankruptcy Regression Output

Referring to the coefficients in the summary output shown in Figure 24.6 and rounding to
four decimal places, the fitted regression model is
Bankrupt = - 0.0027 - 1.7623 * NI/TA + 0.9600 * CA/NS.
The Predicted Bankrupt values calculated using this model are located below the
regression summary output. The following steps relocate the predicted values and
calculate other values for the discriminant analysis.
4. To make room for additional calculations, select columns D:F. With the pointer
in the selected range, right-click and choose Insert from the shortcut menu.
5. To relocate the predicted values, select cells I25:I41. With the pointer in the
selected range, right-click and choose Copy from the shortcut menu. Then select
cell D1, right-click, and choose Paste from the shortcut menu.
6. Optional: With the pasted range D1:D17 still selected, choose Column from the
Format menu and select AutoFit Selection. Select the predicted values D2:D17
and repeatedly click the Decrease Decimal tool button until three decimal places
are displayed.
The regression model uses the two financial ratios to predict the value 1 for bankrupt
firms and 0 for the sound firms. However, the predicted values are not exactly equal to 1
or 0, so we need a rule for predicting which firms are bankrupt and which are sound. A
simple rule is to predict bankruptcy if the Predicted Bankrupt value is greater than 0.5
and predict soundness if the Predicted Bankrupt value is less than or equal to 0.5.
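To make the rule concrete, the following Python sketch applies the fitted equation and the 0.5 cutoff to a single firm; the two ratio values are hypothetical, and the coefficients are those reported above.

    # Fitted discriminant model from the regression output, plus the 0.5 rule.
    def predicted_bankrupt(ni_ta, ca_ns):
        return -0.0027 - 1.7623 * ni_ta + 0.9600 * ca_ns

    fitted = predicted_bankrupt(0.05, 0.60)        # hypothetical NI/TA and CA/NS
    classification = 1 if fitted > 0.5 else 0      # 1 = predicted bankrupt, 0 = predicted sound
    print(round(fitted, 3), classification)        # 0.485 0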
7. Enter the label Classification in cell E1 and adjust the column width. To
classify the Predicted Bankrupt values, enter a formula in cell E2:
=IF(D2>0.5,1,0). The meaning of this formula is "If the Predicted Bankrupt
value is greater than 0.5, use the value 1; otherwise use the value 0."
8. Enter the label Correct in cell F1. To determine which firms were classified
correctly, enter a formula in cell F2: =IF(C2=E2,1,0). This means "If the actual
Bankrupt value equals the predicted classification, use the value 1; otherwise use
the value 0."
9. Select the two formulas (E2:F2). To copy the formulas to the other cells, double-
click the fill handle (small square in the lower-right corner of the selected
range).
10. To determine the total number of correct classifications, select cell F18 and click
the sum tool twice. The results are shown in Figure 24.7.

Figure 24.7 Bankruptcy Predictions

Interpretation of the Classifications


Using the break point 0.5 to determine the classification from the Predicted Bankrupt
values, the Correct values in Figure 24.7 show that observations in rows 3, 7, 11, and 15
are misclassified. Two of the seven bankrupt firms were misclassified, and two of the
nine sound firms were misclassified.
Overall, 12 of 16 firms (75%) were properly classified by the model. If this "hit rate" is
acceptable, then we could use the model to predict the soundness of another firm. We
would substitute the firm's financial ratios into our model, evaluate the regression
equation to obtain a fitted value, and predict bankruptcy if the fitted value exceeds 0.5.
Additional analysis could involve trying classification threshold values other than 0.5.
Such analysis could be automated using Excel's Data Table feature.
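A sketch of that kind of threshold analysis in Python follows; the actual and predicted values are placeholders, not the values in Figure 24.7.

    # Count correct classifications for several cutoff values.
    actual = [1, 1, 1, 0, 0, 0]                       # placeholder 1 = bankrupt, 0 = sound
    predicted = [0.91, 0.62, 0.44, 0.55, 0.28, 0.12]  # placeholder fitted values
    for cutoff in (0.3, 0.4, 0.5, 0.6):
        correct = sum(1 for a, p in zip(actual, predicted) if (1 if p > cutoff else 0) == a)
        print(cutoff, correct, "of", len(actual), "correct")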

EXERCISES
Exercise 24.1 Refer to the real estate property data in Figure 24.1. Determine the selling
price per square foot of living space for each of the 15 properties. Develop a regression
model using indicator variables for construction grade to explain the variation in price per
square foot. Interpret the coefficients. What is the expected price per square foot for a
property with low construction grade?
Exercise 24.2 (adapted from Canavos, p. 607) A personnel recruiter for industry wishes
to identify the factors that explain the starting salaries for business school graduates. He
believes that a student's grade point average (GPA) and academic major are appropriate
explanatory variables.
GPA Major Starting Salary
2.95 Management $21,500
3.20 Management 23,000
3.40 Management 24,100
2.85 Accounting 24,000
3.10 Accounting 27,000
2.85 Accounting 27,800
2.75 Finance 20,500
3.10 Finance 22,200
3.15 Finance 21,800
Fit an appropriate model to these data, evaluate it, and interpret it. What is the expected
starting salary for an accounting major with a 3.00 GPA?
Exercise 24.3 The performance of each production line employee in a manufacturing
plant has been classified as satisfactory or unsatisfactory. Each employee took pre-
employment tests for manual dexterity and analytic aptitude. The company wants to use
the test data to predict how future job applicants will perform.
Manual Dexterity   Analytic Aptitude   Performance (Satisfactory=1, Unsatisfactory=0)
85 56 1
89 70 1
67 76 1
67 63 1
53 73 1
100 93 1
78 80 1
64 50 1
75 76 0
53 73 0
67 83 0
85 90 0
64 90 0
60 96 0
71 80 0
57 56 0
75 100 0
50 90 0
1. Use a regression model for discriminant analysis of these data.
2. What proportion of the employees is properly classified by the model?
3. If a prospective employee scores 75 on manual dexterity and 80 on analytic aptitude,
what is the predicted performance: satisfactory or unsatisfactory?
Exercise 24.4 A credit manager has classified each of the company's loans as being either
current or in default. For each loan, the manager has data describing the person's annual
income and assets (both in thousands of dollars) and years of employment. The manager
wants to use this information to develop a rule for predicting whether a loan applicant
will default.
Income   Assets   Years of Employment   Performance (Current=1, Default=0)
44 105 10 1
26 109 19 1
39 120 12 1
50 139 20 1
42 84 9 1
35 120 13 0
28 84 10 0
37 114 5 0
26 109 15 0
33 114 10 1
37 150 5 0
30 144 4 0
32 75 15 1
32 135 8 0
42 135 4 0
33 94 13 1
33 124 7 0
25 135 14 0
1. Use a regression model for discriminant analysis of these data.
2. What proportion of the loans is properly classified by the model?
3. If an applicant has $40,000 annual income, $100,000 assets, and 11 years of
employment, what is the predicted performance: current or default?
Chapter 25 Regression Models for Cross-Sectional Data
25.1 CROSS-SECTIONAL REGRESSION CHECKLIST
Plot Y versus each X
1 Verify that the relationship agrees with your prior judgment, e.g., positive vs
negative relationship, linear vs nonlinear, strong vs weak
2 Identify outliers or unusual observations and decide whether to exclude
3 Determine whether the relationship is linear; if not, consider using a nonlinear
form, e.g., quadratic (include X and X^2 in the model)

Examine the correlation matrix


4 Identify potential multicollinearity problems, i.e., high correlation between a
pair of X variables; if so, consider using only one X of the pair in the model

Calculate the regression model with diagnostics


5 Verify that the sign of each regression coefficient agrees with your prior
judgment, i.e., positive vs negative relationship; otherwise, consider excluding
that X and rerun the regression
6 Examine each plot of residuals vs X; if there is a non-random pattern (e.g., U-
shape or upside-down-U-shape), use a nonlinear form for that X in a new model
7 Identify key X variables by comparing standardized regression coefficients,
usually computed by multiplying an X coefficient by the standard deviation of
that X and dividing by the standard deviation of Y. This dimensionless
standardized regression coefficient measures how much Y (in standard deviation
units) is affected by a change in X (in standard deviation units).
8 If a goal is to find a model with small standard error of estimate (approx.
standard deviation of residuals), use the t-stat screening method. Disregard the t-
stat for the intercept. If there are X variables with a t-stat between -1 and +1,
remove the single X variable whose t-stat is closest to zero, and rerun the
regression. Remove only one X variable at a time.
9 Before using the final model, examine each plot of residuals vs X to verify that
the random scatter is the same for all values of X. If there is more scatter for
higher values of X, consider using a log transformation of X in the model
(instead of using X itself). If the scatter is not uniform with respect to X, the
standard error of estimate may not be a useful measure of uncertainty because it
overstates the uncertainty for some values of X and understates the uncertainty
for other values of X.

Use the model


10 If the purpose is to identify unusual observations, examine the residuals directly
for large negative or large positive values, or examine the standardized residuals
(each residual divided by the standard deviation of residuals) for values more
extreme than +2 or -2 or for values more extreme than +3 or -3.
11 If the purpose is to make predictions, use the X values for a new observation to
compute a predicted Y. Use the standard error of estimate to provide an interval
estimate, e.g., an approximate 95% prediction interval that ranges from two
standard errors below to two standard errors above the predicted Y. Avoid
extrapolation, i.e., do not make predictions using X values outside the range of
the original data.
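As a small illustration of items 7 and 11, the Python sketch below computes a standardized coefficient and an approximate 95% prediction interval; all of the numbers in it are placeholders, not results from any model in this book.

    # Item 7: standardized coefficient = b * (std dev of X) / (std dev of Y).
    b_x, s_x, s_y = 2.5, 4.0, 20.0          # placeholder coefficient and standard deviations
    print(b_x * s_x / s_y)                   # 0.5

    # Item 11: approximate 95% prediction interval = predicted Y plus or minus 2 standard errors.
    y_hat, se = 150.0, 6.0                   # placeholder prediction and standard error of estimate
    print(y_hat - 2 * se, y_hat + 2 * se)    # 138.0 162.0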
Chapter 26 Time Series Data and Forecasts
26.1 TIME SERIES PATTERNS
Meandering time series pattern: small changes from period to period, with possibly larger
changes over a longer period of time.
Use an autoregressive model (Chapter 27).

Figure 26.1 Typical Meandering Time Series Pattern (Value versus Time)
Figure 26.2 Typical Long-Term Trend Time Series Patterns (Value versus Time: Positive Nonlinear, Positive Linear, Negative Linear, and No Trend)

Figure 26.3 Typical Quarterly Seasonal Time Series with Linear Trend (Value versus Quarter, quarters 1 through 20)
Figure 26.4 Quarterly Seasonal Pattern with Nonlinear Trend (Value versus Quarter, quarters 1 through 36)

Strong seasonal pattern, no trend during first 12 quarters, positive trend during middle 12
quarters, no trend during last 12 quarters
Chapter 27 Autocorrelation and Autoregression
This chapter describes techniques for analyzing time sequence data that exhibit a non-
seasonal meandering pattern, where adjacent observations have values that are usually
close but distant observations may have very different values. Meandering patterns are
quite common for many economic time series, such as stock prices. If the time sequence
data have seasonality—that is, a recurring pattern over time—the techniques described in
Chapter 29 are appropriate.
To obtain the results shown in following figures, enter the month and wage data in
columns A and B of a worksheet as shown in Figure 27.1. For each type of analysis
described in this chapter, create a copy of the original data by choosing Move or Copy
Sheet from the Edit menu, checking the Create a Copy checkbox, and clicking OK.

Figure 27.1 Wage Data and Time Sequence Plot

The first step is to examine a time sequence plot. Select the wage data, and use Excel's
Chart Wizard to create a Line chart type. Figure 27.1 shows the data and a plot of average
hourly wages of textile and apparel workers for the 18 months from January 1986
through June 1987. These data are the last 18 values from the 72-value data file
APAWAGES.DAT that accompanies Cryer, second edition; the original source is Survey
of Current Business, September issues, 1981–1987.

27.1 LINEAR TIME TREND


Initial inspection of the time sequence plot in Figure 27.1 suggests that a straight-line fit
may be an appropriate model. To obtain the results shown in following figures, create a
copy of the data shown in Figure 27.1. From the Tools menu, choose Data Analysis. In
the Data Analysis dialog box, select Regression from the Analysis Tools list box and
click OK. In the Regression dialog box, the Input Y Range is B1:B19 and the Input X
Range is A1:A19. Check the Labels box. Click the Output Range option button, select the
adjacent text box, and specify D1. Check the Residuals and Line Fit Plots checkboxes in
the Residuals section. Then click OK. (If the error message "Cannot add chart to a shared
workbook" appears, click Cancel; in the Regression dialog box, click New Workbook in
the Output options, and click OK.) An edited portion of the regression output is shown in
Figure 27.2.

Figure 27.2 Simple Linear Regression Output

The R-square value indicates that approximately 73% of the variation in wages can be
explained using a linear time trend. The regression model is Fitted Wage = 5.7709 +
0.0095 * Month, indicating that wages increase by 0.0095 dollars per month, on the
average. The t statistic and p-value confirm a significant linear relationship.
Although these summary measures indicate an excellent fit, the line fit plot
in Figure 27.3 shows that the regression model assumption of independent residuals may
be violated. When wages are above the linear time trend, they tend to stay above, and
when they are below the trend line, they tend to stay below. In other words, if the
previous residual is positive, the current residual is likely to be positive, and if the
previous residual is negative, the current residual is likely to be negative. Thus, the
residuals are not independent. Successive residuals in this model are positively
correlated. This "stickiness" is positive autocorrelation, which can be quantified using the
Durbin-Watson statistic.

Figure 27.3 Time Sequence Plot and Linear Fit

27.2 DURBIN-WATSON STATISTIC


The Durbin-Watson statistic may be used to test for correlation of successive residuals in
a time series model. The statistic is calculated by first determining the difference between
successive residuals. For example, in Figure 27.4, we could compute F26 – F25, F27 –
F26, F28 – F27, and so on. These differences are squared and then summed to determine
the numerator of the Durbin-Watson statistic. In Excel, the numerator can be computed
using the SUMXMY2 function, where XMY2 means the square of x minus y. The
denominator of the Durbin-Watson statistic is the sum of the squared residuals, which can
be computed using Excel's SUMSQ function. Both functions accept arrays as arguments.

Figure 27.4 Residual Output and Durbin-Watson Statistic


For the linear time trend model, the residuals are in cells F25:F42. In Figure 27.4, cell
H25 contains the following formula for computing the Durbin-Watson statistic:
=SUMXMY2(F26:F42,F25:F41)/SUMSQ(F25:F42)
In general, for time periods 1 through n, the first argument for SUMXMY2 is the range
containing residuals for periods 2 through n, and the second argument is the range for
residuals for periods 1 through n – 1. The argument for SUMSQ is the range containing
residuals for periods 1 through n.
The possible values of the Durbin-Watson statistic range from 0 to 4. Values close to 0
indicate strong positive autocorrelation; a value of 2 indicates zero autocorrelation;
values near 4 indicate strong negative autocorrelation. Here the value 1.050 shows that
there is some positive autocorrelation of residuals.
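The same statistic can be computed directly from its definition, as in this Python sketch; the residual values are placeholders rather than the residuals in Figure 27.4.

    # Durbin-Watson: sum of squared successive differences divided by sum of squared residuals.
    residuals = [0.02, 0.03, 0.01, -0.02, -0.04, -0.01, 0.02]    # placeholder residuals
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    print(round(numerator / denominator, 3))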

27.3 AUTOCORRELATION
The Durbin-Watson statistic measures autocorrelation of residuals associated with a
model. It is often useful to examine the correlation of time series values with themselves
before modeling. This approach looks at the correlation between current and previous
values. The previous values are called lagged values, and the number of time periods
between each current and previous value is the lag length. For example, values that are
one time period before the current values are called lag 1; values that are two periods
earlier are called lag 2.
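A lag 1 correlation can also be checked in a line or two of Python, pairing each value with the value one period earlier; the series below is a placeholder.

    import numpy as np

    y = np.array([5.82, 5.79, 5.80, 5.85, 5.88, 5.87, 5.91])   # placeholder time series
    r = np.corrcoef(y[1:], y[:-1])[0, 1]                        # current versus lag 1
    print(round(r, 4))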
The following steps describe how to construct an autocorrelation plot for lag 1.
1. Enter the month and wage data in columns A and B of a sheet as shown in
Figure 27.1 or copy previously entered data to a new sheet.
2. Select column B, right-click, and choose Insert from the shortcut menu.
3. Type the label Lag 1 in cell B1.
4. Select cells C2:C18 containing the first 17 wage values, right-click, and choose
Copy from the shortcut menu.
5. Select cell B3, right-click, and choose Paste from the shortcut menu. The top
section of the sheet appears as shown in Figure 27.5.
Figure 27.5 Arranging Lag 1 Data

6. Select row 2, right-click, and choose Delete from the shortcut menu. The results
appear as shown in columns A, B, and C in Figure 27.6.
7. To calculate the correlation coefficient, enter the label CORREL= in cell F1
and enter the formula =CORREL(B2:B18,C2:C18) in cell G1. The value of the
correlation coefficient, r = 0.8545, appears in cell G1 as shown in Figure 27.6.
8. To prepare the chart, select cells B2:C18 and click the Chart Wizard button.
9. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, click the
XY (Scatter) chart type and click Next. In step 2 (Chart Source Data), verify the
data range and click Next. In step 3 (Chart Options) on the Titles tab, type chart
and axis titles as shown in Figure 27.6; on the Gridlines tab, clear all
checkboxes; on the Legend tab, clear the checkbox for Show Legend and click
Finish.
10. To facilitate interpreting the autocorrelation plot, change its size and axes. Use
the handles on the outermost edge of the chart to obtain a nearly square shape.
For both the vertical axis and the horizontal axis, select the axis, double-click or
right-click and choose Format Axis from the shortcut menu, click the Scale tab,
change Minimum to 5.7, change Maximum to 6, change Major Unit to .05, and
click OK. Change font size of the axes and titles to 8. The result appears as
shown in Figure 27.6.
Figure 27.6 Lagged Data and Autocorrelation Plot

The autocorrelation plot shown in Figure 27.6 shows relatively strong correlation
between current wage and one-month previous wage. When the wage is low in a
particular month, it is likely that it will be low in the following month; when the wage is
high in a particular month, it is likely to be high in the following month.

27.4 AUTOREGRESSION
A regression model may be used to quantify the functional relationship between current
and previous values of time sequence data. When regression is used to analyze data that
exhibit autocorrelation, the technique is called autoregression, and the model is called an
autoregressive model. If only one-period lagged data are used for the explanatory
variable, the model is called an AR(1) model.
To develop an AR(1) model for the wage data, prepare the autocorrelation plot described in the
previous section. Right-click on a data point and choose Add Trendline from the shortcut
menu. In the Add Trendline dialog box, click the Type tab and click the Linear icon.
Click the Options tab and click the checkboxes for Display Equation on Chart and
Display R-squared Value on Chart. Then click OK. Optionally, click and drag to relocate
the equation and R-square value. The results appear as shown in Figure 27.7.
Figure 27.7 AR(1) Model Using Add Trendline

The linear fit equation could be written as Wage = 0.8253 + 0.86 * Lag 1, or Current =
0.8253 + 0.86 * Previous, or Y(t) = 0.8253 + 0.86 * Y(t-1). The R-square value indicates that
approximately 73% of the variation in wages can be explained using this simple linear
autoregressive model.
A forecast of wage for period 19 can be expressed as Y(19) = 0.8253 + 0.86 * Y(18) = 0.8253
+ 0.86 * 5.91 = 5.9079. A forecast for period 20 could be based on the forecast for period
19: Y(20) = 0.8253 + 0.86 * Y(19) = 0.8253 + 0.86 * 5.9079 = 5.9061. Of course, the likely
error increases for forecasts made further into the future. To quantify the error, to obtain
additional diagnostics, and to plot fitted and actual values in a time sequence plot, use the
Regression analysis tool.
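Because each forecast feeds the next one, multi-period forecasts are easy to generate with a short loop. The Python sketch below uses the fitted intercept and slope and the period-18 wage from the text.

    # Iterate the AR(1) recursion: next forecast = 0.8253 + 0.86 * previous value.
    intercept, slope = 0.8253, 0.86
    forecast = 5.91                          # actual wage in period 18
    for period in (19, 20, 21):
        forecast = intercept + slope * forecast
        print(period, round(forecast, 4))    # 19: 5.9079, 20: 5.9061, 21: 5.9045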
If a blank sheet is needed, choose Worksheet from the Insert menu. Copy the data shown
in columns A, B, and C in Figure 27.7, select a blank worksheet, select cell A1, and
Paste. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
click Regression in the Analysis Tools list box and click OK.
In the Regression dialog box, the Input Y Range is C1:C18 and the Input X Range is
B1:B18. Check the Labels checkbox. The Output Range is E1. Check the Residuals and
Residual Plots checkboxes. Then click OK. (If the error message "Cannot add chart to a
shared workbook" appears, click Cancel; in the Regression dialog box, click New
Workbook in the Output Options, and click OK.) The results are shown in Figure 27.8.
Figure 27.8 AR(1) Model Using Regression Tool

Referring to cell F7 in Figure 27.8, the standard error of estimate for this AR(1) model is
0.03235, slightly larger than the standard error for the linear time trend model, 0.0319.
Thus, an approximate 95% prediction interval uses the previously calculated point
estimate plus or minus about six and a half cents (two standard errors = 2 * $0.03235 = $0.0647). The
residual plot, not shown here, has an essentially random pattern, indicating that the linear
relationship between wage and lag 1 is appropriate.
The following steps describe how to construct a time sequence plot showing actual and
fitted values.
1. Select C1:C18 and hold down the Control key while selecting F24:F41. Click
the Chart Wizard tool.
2. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
Line for the chart type, select "Line with markers displayed at each data value"
for the chart sub-type, and click Next.
3. In step 2 (Chart Source Data) on the Series tab, select the range edit box for
Category (X) Axis Labels, click and drag cells A2:A18 on the worksheet, and
click Next.
4. In step 3 (Chart Options) on the Titles tab, type the chart and axis titles as shown
in Figure 27.9. On the Gridlines tab, uncheck all boxes and click Finish.
5. Select the horizontal axis and double-click, or right-click and choose Format
Axis from the shortcut menu. In the Format Axis dialog box, click the
Alignment tab, select the Degrees edit box, type 0 (zero), and click OK.
6. Select the Predicted Wage data series by clicking one of its markers on the chart.
Right-click and choose Format Data Series from the shortcut menu. In the
Format Data Series dialog box, click the Patterns tab. For Line, click the Custom
button and select the small dashed-line pattern from the Line Style drop-down
list box. Click OK.
7. Use the chart's fill handles to resize the chart to be approximately 8 standard
columns wide and 17 rows high. Change the font size of the chart title, axis
titles, axes, and legend to 8. The chart appears as shown in Figure 27.9.
Figure 27.9 Time Sequence Plot and AR(1) Fit

Each Predicted Wage value shown in Figure 27.9 depends upon the actual wage in the
previous month. The standard error of estimate is a summary measure of the vertical
distances between the actual wage and predicted wage for each month.

27.5 AUTOCORRELATION COEFFICIENTS FUNCTION


Autocorrelation coefficients are useful for measuring autocorrelation at various lags. The
results may be used as a guide for determining the appropriate number of lagged values
for explanatory variables in an autoregressive model. A function that provides the
autocorrelation coefficients for any specified lag is called an autocorrelation coefficients
function (ACF). A plot of autocorrelation coefficients versus lags is called a correlogram.
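In compact form, the calculation standardizes the series and then averages the lagged cross-products, which is what the worksheet steps below do with SUMPRODUCT and OFFSET. The Python sketch uses a placeholder series of 18 values.

    import numpy as np

    y = np.array([5.82, 5.79, 5.80, 5.85, 5.88, 5.87, 5.91, 5.90, 5.94,
                  5.93, 5.96, 5.95, 5.98, 6.00, 5.99, 6.02, 6.01, 6.03])   # placeholders
    n = len(y)
    z = (y - y.mean()) / y.std(ddof=1)          # z values using the sample standard deviation
    for lag in range(1, 7):
        acf = np.sum(z[lag:] * z[:-lag]) / (n - 1)
        print(lag, round(acf, 3))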
The following steps describe how to calculate autocorrelation coefficients.
1. Enter the month and wage data in columns A and B, or make a copy of the data
shown in Figure 27.1.
2. Enter the label Z in cell C1. Select cells B1:C19 and from the Insert menu
choose Name | Create. In the Create Names dialog box, check the Top Row
checkbox and click OK. This step creates the name Wage for the range B2:B19
and the name Z for the range C2:C19.
3. Select cell C2 and enter the formula
=(B2-AVERAGE(Wage))/STDEV(Wage).
With cell C2 selected, double-click the fill handle in the lower-right corner.
With cells C2:C19 still selected, click the Decrease Decimal button repeatedly
until three decimal places are displayed.
4. Enter the labels Lag and ACF in cells E1 and F1, respectively. Enter the digits 1
through 6 in cells E2:E7. (Here we examine only the first 6 lags. For monthly
data where seasonality is expected, the first 12 lags should be investigated.)
5. Select cell F2. Enter the formula
=SUMPRODUCT(OFFSET(Z,E2,0,18-E2),OFFSET(Z,0,0,18-E2))/17.
With cell F2 selected, double-click the fill handle in the lower-right corner. With
cells F2:F7 still selected, click the Decrease Decimal button repeatedly until
three decimal places are displayed. The results appear as shown in columns A:F
in Figure 27.10. (To adapt the formula to other data, use the number of
observations, n, instead of 18, and use n–1 instead of 17.)
6. To create the correlogram, select cells F2:F7 and click the Chart Wizard tool.
7. In step 1 of the Chart Wizard (Chart Type), select Column as the chart type and
Clustered Column as the chart sub-type, and click Next. In step 2 (Chart Source
Data), verify the data range and click Next. In step 3 (Chart Options) on the
Titles tab, type the chart and axis titles shown in Figure 27.10; on the Gridlines
tab, clear all checkboxes; on the Legend tab, clear the checkbox for Show
Legend, and click Finish.
8. Double-click the vertical axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box, click the Scale tab; click
Minimum and type –0.2; click Maximum and type 1; click Major Unit and type
0.2; click OK.
9. Double-click the horizontal axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box, click the Patterns tab; in the Tick-
Mark Labels section click Low and click OK. The correlogram appears as
shown in Figure 27.10.
The lag 1 autocorrelation coefficient 0.822 shown in Figure 27.10 differs slightly from
the regular correlation coefficient 0.8545 for current and lag 1 shown in cell G1 in Figure
27.6. One of the reasons is that the autocorrelation coefficient uses z values for current
and lag based on the mean and standard deviation of all 18 observations, but the regular
correlation coefficient computes z values using the first 17 observations for current and
using the last 17 for lag. The autocorrelation coefficients for wages decrease gradually,
indicating that it may be worthwhile to investigate autoregressive models incorporating
lagged values beyond lag 1.
Figure 27.10 Autocorrelation Coefficients Function (ACF)

27.6 AR(2) MODEL


The autocorrelation coefficients computed in the previous section are 0.822 for lag 1 and
0.664 for lag 2, suggesting that the autoregressive model might be improved by using
both lag 1 and lag 2 as explanatory variables.
The following steps describe how to arrange the data for an AR(2) model.
1. Enter the month and wage data in columns A and B, or make a copy of the data
shown in Figure 27.1.
2. Select columns B and C. Right-click and choose Insert from the shortcut menu.
3. Enter the labels Lag 1 and Lag 2 in cells B1 and C1, respectively.
4. Copy the wage data in cells D2:D18, select cell B3, and paste.
5. Copy the wage data in cells D2:D17, select cell C4, and paste. The top portion
of the worksheet appears as shown in Figure 27.11.
Figure 27.11 Arranging Lag 2 Data

6. Select rows 2 and 3. Choose Delete from the shortcut menu. Columns A through
D appear as shown in Figure 27.12.
After arranging the data, from the Tools menu choose Data Analysis. In the Data
Analysis dialog box, click Regression in the Analysis Tools list box and click OK. In the
Regression dialog box, the Input Y Range is D1:D17 and the Input X Range is B1:C17.
Check the Labels checkbox. The Output Range is F1. Optionally, select outputs in the
Residuals section and click OK. Formatted and edited results without the ANOVA table
are shown in Figure 27.12.

Figure 27.12 AR(2) Data and Edited Regression Tool Output

Compared to the AR(1) model, this AR(2) model has a slightly higher standard error of
estimate and a lower adjusted R-square. The t statistic for the Lag 2 explanatory variable is
0.16251, indicating that the Lag 2 regression coefficient is not significantly different from
zero. After taking lag 1 into account, the addition of lag 2 is not useful for explaining the
variation in wages.
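The lag arrangement and the AR(2) fit can be sketched in Python as follows; the wage values are placeholders, and numpy's least-squares routine stands in for the Regression tool.

    import numpy as np

    y = np.array([5.82, 5.79, 5.80, 5.85, 5.88, 5.87, 5.91, 5.90, 5.94, 5.93])  # placeholders
    current, lag1, lag2 = y[2:], y[1:-1], y[:-2]          # drop the first two periods
    X = np.column_stack([np.ones(len(current)), lag1, lag2])
    coef, *_ = np.linalg.lstsq(X, current, rcond=None)
    print(coef)                                           # intercept, lag 1, and lag 2 coefficients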
EXERCISES
Exercise 27.1 (adapted from Keller, p. 930) As a preliminary step in forecasting future
values, a large mail-order retail outlet has recorded the sales figures, in millions of
dollars, shown in the following table.
Year Sales Year Sales
1974 6.7 1984 14.2
1975 7.4 1985 18.1
1976 8.5 1986 16.0
1977 11.2 1987 11.2
1978 12.5 1988 14.8
1979 10.7 1989 15.2
1980 11.9 1990 14.1
1981 11.4 1991 12.2
1982 9.8 1992 15.7
1983 11.5
1. Fit a linear time trend and compute the Durbin-Watson statistic.
2. Construct an autocorrelation plot and develop an autoregressive model.
3. Make forecasts for 1993 using the linear time trend and autoregressive model.
Exercise 27.2 The following table shows annual sales in thousands of units for a new
product from the Ekans company.
Year Sales Year Sales Year Sales
1980 36 1985 61 1990 79
1981 44 1986 63 1991 87
1982 52 1987 66 1992 97
1983 56 1988 69 1993 101
1984 58 1989 73 1994 103
1. Fit a linear time trend and compute the Durbin-Watson statistic.
2. Calculate values of the autocorrelation function for lags 1 through 6.
3. Try autoregressive models AR(1), AR(2), AR(3), and AR(4). Which of these models
is most appropriate?
Chapter 28 Time Series Smoothing
This chapter describes two methods for smoothing time series data: moving averages and
exponential smoothing. The purpose of smoothing is to eliminate the irregular and
seasonal variation in the data so it's easier to see the long-run behavior of the time series.
The long-run pattern is called the trend, and it may also include variation due to the
business cycle. The smoothed version of the data may be used to make a forecast of
trend, or it may be used as part of the analysis of seasonality, as described in Chapter 29.
The data set used for moving averages in this chapter and for seasonal analysis in Chapter
29 is quarterly U.S. retail sales, in billions of dollars, from first quarter 1983 through
fourth quarter 1987. These data, shown in column C of Figure 28.1, are a quarterly
aggregation of the monthly data in the file RETAIL.DAT that accompanies the second
edition of Cryer; the original source is Survey of Current Business, 1987.
Figure 28.1 Labels and Sales Data

The following steps describe how to construct a time sequence plot using two lines
(quarter and year) for labeling the horizontal axis.
1. Enter the labels Year, Quarter, and Sales in row 1 and enter the years, quarters,
and sales data in columns A, B, and C.
2. Select cells A1:C21 and click the Chart Wizard button.
3. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, select
Line for chart type and "Line with markers displayed at each data value" for
chart sub-type. Click Next.
4. In step 2 (Chart Source Data), verify the data range and click Next.
5. In step 3 (Chart Options) on the Titles tab, type the chart and axis titles shown in
Figure 28.2. On the Gridlines tab, clear all checkboxes. On the Legend tab, clear
the checkbox for Show Legend. Click Finish.
6. Click and drag the sizing handles so that the chart is approximately 9 columns
wide and 20 rows high.
7. To change the font size of the chart title, axis titles, and axes to 7, select each
object, click the Font Size tool on the Formatting toolbar, and enter 7.
8. Select the vertical axis and click the Decrease Decimal button.
9. Double-click the vertical axis; in the Format Axis dialog box on the Scale tab,
enter 200 for the Minimum.
10. Double-click the horizontal axis; in the Format Axis dialog box on the
Alignment tab, enter 0 (zero) in the Degrees edit box. The chart appears as
shown in Figure 28.2.

Figure 28.2 Time Sequence Plot of Sales Data

Quarterly U.S. retail sales exhibit strong seasonality with an upward linear trend. A
moving average may be used to eliminate the seasonal variation so the trend is even more
apparent.

28.1 MOVING AVERAGE USING ADD TRENDLINE


The following steps describe how to insert the moving average line on the time sequence
chart.
1. Right-click one of the markers of the data series and choose Add Trendline from
the shortcut menu.
2. In the Add Trendline dialog box on the Type tab, click the Moving Average icon
and enter 4 as the Period, as shown in Figure 28.3. The moving average line
appears on the chart as shown in Figure 28.4.
Figure 28.3 Add Trendline Dialog Box

The first moving average shown in Figure 28.4 is an average of the first four quarters and
is associated with 1983 quarter IV. The period is specified as 4 in this example because
the repeating pattern is four quarters long. If the time series data are monthly, the period
is usually 12. If daily data have a recurring pattern each week, the period should be 7.

Figure 28.4 Time Sequence Plot with Moving Average

When the Add Trendline command is used to obtain the moving average, the default
pattern is a medium-weight line as shown in Figure 28.4. The style and weight of the line
may be changed by double-clicking on the moving average line, but it isn't possible to
add markers. Also, there is no way to access the values that Excel uses to plot the moving
average.
28.2 MOVING AVERAGE DATA ANALYSIS TOOL


The following steps describe how to obtain the moving average values and a chart.
1. Copy the labels and sales data shown in Figure 28.1 to a new worksheet. Enter
the label MovAvg in cell D1 and the label StdError in cell E1.
2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
click Moving Average in the Analysis Tools list box, and click OK. The Moving
Average dialog box appears as shown in Figure 28.5.

Figure 28.5 Moving Average Dialog Box

3. Make entries in the Moving Average dialog box as shown in Figure 28.5. Then
click OK. (If you receive the error message "Cannot add chart to a shared
workbook," click the OK button. To construct a line chart, select the Sales and
Moving Average data and click the Chart Wizard button.) The output appears in
columns D and E, as shown in Figure 28.6.
4. Double-click the vertical axis. In the Format Axis dialog box on the Scale tab,
click the Minimum edit box and enter 200. The results appear as shown in
Figure 28.6.
Figure 28.6 Output of Moving Average Analysis Tool

The Moving Average analysis tool puts formulas in the worksheet. Cell D5 contains the
formula =AVERAGE(C2:C5), cell D6 contains =AVERAGE(C3:C6), and so on. Each
average uses four values: the current sales and the three previous sales.
Cell E8 contains the formula =SQRT(SUMXMY2(C5:C8,D5:D8)/4). The
SUMXMY2(C5:C8,D5:D8) portion of this formula computes the difference between the
smoothed values in cells D5:D8 and the actual values in cells C5:C8, squares each of the
four differences, and sums the squared differences. Each of the standard error values in
column E is based on the four most recent values.
A simplistic forecasting model could use the last moving average, 376.8, as a forecast for
the next quarter's trend, with the standard error, 23.7, as a measure of uncertainty. A
forecast of the seasonal component could be combined with this trend forecast to obtain a
more accurate prediction of next quarter's sales.
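The worksheet formulas above translate directly into a short Python sketch; the quarterly sales values are placeholders.

    import math

    sales = [230, 260, 265, 295, 245, 275, 280, 310]     # placeholder quarterly sales
    mov_avg = [None] * len(sales)
    std_err = [None] * len(sales)
    for t in range(3, len(sales)):                       # average of current and three previous quarters
        mov_avg[t] = sum(sales[t - 3:t + 1]) / 4
    for t in range(6, len(sales)):                       # needs four smoothed values
        diffs = [sales[i] - mov_avg[i] for i in range(t - 3, t + 1)]
        std_err[t] = math.sqrt(sum(d * d for d in diffs) / 4)
    print(mov_avg)
    print(std_err)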

28.3 EXPONENTIAL SMOOTHING TOOL


The moving average approach to smoothing uses a specified number of actual values to
obtain the smoothed result. For seasonal data, the number of values in each average is
usually set equal to the cycle length. For example, for quarterly data, four actual values
are used to calculate the smoothed value.
Instead of using a finite number of values, the exponential smoothing approach
theoretically uses the entire past history of the actual time series values to compute
smoothed values. Practically, the smoothed or forecast values are calculated using a
simple recursive formula:
Forecast(t+1) = Alpha * Actual(t) + (1 – Alpha) * Forecast(t)
where alpha is a number between 0 and 1 called the smoothing constant. To apply this
formula to actual values, we must choose an initial forecast value and an appropriate
value of alpha.
Excel uses the term damping factor for the quantity (1 – alpha). Thus, to obtain
exponential smoothed forecasts using a smoothing constant, alpha, equal to 0.1, we must
specify a value for the damping factor equal to 0.9.
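The recursion is easy to express directly. The Python sketch below follows the same convention as the analysis tool, using the first actual value as the forecast for the second period; the actual values are placeholders.

    alpha = 0.1                                      # smoothing constant (damping factor = 0.9)
    actual = [2.1, 1.8, 2.5, 2.2, 1.9, 2.7, 1.3]     # placeholder percent changes
    forecast = [None, actual[0]]                     # no forecast for period 1
    for t in range(1, len(actual)):
        forecast.append(alpha * actual[t] + (1 - alpha) * forecast[t])
    print([f if f is None else round(f, 3) for f in forecast])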
The following data are based on quarterly Iowa nonfarm income per capita from the data
file IOWAINC.DAT that accompanies the Cryer textbook. The values shown in column
B of Figure 28.7 are percent changes, rounded to one decimal place, using the last 18
periods.

Figure 28.7 Data and Output for Smoothing Constant 0.1

The following steps describe how to use the Exponential Smoothing analysis tool without
specifying an initial smoothed value.
1. Enter the Quarter and Actual labels and data in columns A and B of a new
worksheet as shown in Figure 28.7. Enter the label Forecast in cell C1 and the
label StdError in cell D1.
2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
click Exponential Smoothing in the Analysis Tools list box and click OK. The
Exponential Smoothing dialog box appears as shown in Figure 28.8.

Figure 28.8 Exponential Smoothing Dialog Box

3. Make entries in the Exponential Smoothing dialog box as shown in Figure 28.8.
Then click OK. The output appears in columns C and D, with the chart output to
the right. Adjust the size of the chart by clicking and dragging a handle on the
border to obtain the results shown in Figure 28.7.
The Exponential Smoothing analysis tool puts formulas in the worksheet. The actual
value in the first period is used as the forecast for the second period. That is, cell C3
contains the formula =B2. The forecast for the third period uses the actual value and
forecast from the second period in the recursive formula; cell C4 contains the formula
=0.1*B3+0.9*C3. In general, the forecast for a specific period is based on the actual and
forecast values from the previous period.
The damping factor specified here is 0.9, so the smoothing constant alpha is 0.1. To
obtain a forecast, the most recent actual value receives weight 0.1 in the recursive
formula. Because this weight is relatively small, the smoothed values respond very
slowly to changes in the actual values.
Cell D6 contains the formula =SQRT(SUMXMY2(B3:B5,C3:C5)/3). Each of the
standard error values in column D is based on the three previous actual values and
forecasts.
To obtain a forecast for quarter 19, a simplistic forecasting model could use the actual
and forecast values from quarter 18 in the recursive formula: 0.1 * 1.3 + 0.9 * 2.669 =
2.532. This forecast could be obtained by selecting cell C19 and dragging the fill handle
in the lower-right corner down to cell C20, which then contains the copied formula
=0.1*B19+0.9*C19, with the result 2.532.
EXERCISES
Exercise 28.1 (adapted from Mendenhall, p. 635) The week's end closing prices for the
securities of the Color-Vision Company, a manufacturer of color television sets, have
been recorded over a period of 30 consecutive weeks as shown in the following table.
Week Price Week Price Week Price
1 $71 11 $75 21 $72
2 70 12 70 22 73
3 69 13 75 23 72
4 68 14 75 24 77
5 64 15 74 25 83
6 65 16 78 26 81
7 72 17 86 27 81
8 78 18 82 28 85
9 75 19 75 29 85
10 75 20 73 30 84
1. Determine the five-week moving average.
2. Use exponential smoothing with smoothing constant, alpha, of 0.1.
3. Use exponential smoothing with smoothing constant, alpha, of 0.5.
4. Which of the three smoothing results are most appropriate for detecting the long-
term trend for these data?
Exercise 28.2 (adapted from Mendenhall, p. 638) The following table shows gross
monthly sales revenue, in thousands of dollars, of a pharmaceutical company from
January 1989 through December 1992.
Year
Month 1989 1990 1991 1992
January 18.0 23.3 24.7 28.3
February 18.5 22.6 24.4 27.5
March 19.2 23.1 26.0 28.8
April 19.0 20.9 23.2 22.7
May 17.8 20.2 22.8 19.6
June 19.5 22.5 24.3 20.3
July 20.0 24.1 27.4 20.7
August 20.7 25.0 28.6 21.4
September 19.1 25.2 28.8 22.6
October 19.6 23.8 25.1 28.3
November 20.8 25.7 29.3 27.5
December 21.0 26.3 31.4 28.1
1. Construct a time sequence plot of the monthly sales revenue.


2. To help identify the long-term trend, smooth the time series using a three-month
moving average.
3. Smooth the time series using exponential smoothing with smoothing constant, alpha,
of 0.1.
4. Smooth the time series using exponential smoothing with smoothing constant, alpha,
of 0.3.
Chapter 29 Time Series Seasonality
This chapter describes three methods for analyzing seasonal patterns in time series data.
These methods may be used whenever the data have a pattern that repeats itself on a
regular basis. These recurring patterns are often associated with the seasons of the year,
but the same methods of analysis may be applied to any systematic, repeating pattern.
The first two methods use regression: regression using indicator variables and
autoregression. The focus of the third method is determining seasonal indexes: classical time series
decomposition. The three methods are illustrated using quarterly U.S. retail sales, in
billions of dollars, from first quarter 1983 through fourth quarter 1987. To develop Figure
29.1, select A2:C21 and use the Chart Wizard to create a line chart.

Figure 29.1 Labels, Data, and Time Sequence Plot

The time series shown in Figure 29.1 has a strong seasonal pattern with an upward trend.
Sales are consistently highest in quarter IV of each year and lowest in quarter I. The trend
appears to be linear.
29.1 REGRESSION USING INDICATOR VARIABLES


Retail sales may be analyzed using a multiple regression model including both the trend
and seasonal components. The trend component may be modeled as a linear time trend
using the data shown in column D in Figure 29.2. The seasonal component may be
described using seasonal indicator variables. As shown in columns E:H in Figure 29.2,
one of four possible categories (Winter, Spring, Summer, and Fall, corresponding to
quarters I, II, III, and IV) is associated with each observation. The number of indicator
variables included in the multiple regression model is one less than the number of
categories being modeled, so three indicator variables are used. If the data are monthly,
11 indicator variables are used.
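The layout in Figure 29.2 amounts to a time index plus three 0/1 columns, which can be sketched in Python as follows; the two years of quarters shown here are illustrative.

    # Time index plus indicator variables for Winter, Spring, and Summer (Fall is the base case).
    quarters = ["I", "II", "III", "IV"] * 2                 # two illustrative years
    time = range(1, len(quarters) + 1)
    winter = [1 if q == "I" else 0 for q in quarters]
    spring = [1 if q == "II" else 0 for q in quarters]
    summer = [1 if q == "III" else 0 for q in quarters]
    for row in zip(time, quarters, winter, spring, summer):
        print(row)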

Figure 29.2 Data for Regression

The following steps describe how to develop a regression model with linear time trend
and seasonal indicator variables.
1. Enter the labels and data shown in Figure 29.2. (Enter 1 and 2 in cells D2:D3,
select D2:D3, and double-click the fill handle. Enter the zero-one pattern in cells
E2:H5, copy, and paste to cells E6, E10, E14, and E18.)
2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression from the Analysis Tools list box and click OK. The
Regression dialog box appears as shown in Figure 29.3.
Figure 29.3 Regression Dialog Box

3. In the Regression dialog box, the Input Y Range is C1:C21 and the Input X
Range is D1:G21. (It is important to include only three of the four indicator
variables as x variables for the regression model.) Check the Labels box. Click
the Output Range option button, select the adjacent text box, and specify J1.
Check all checkboxes in the Residuals section. Then click OK. (If the error
message "Cannot add chart to a shared workbook" appears, click Cancel; in the
Regression dialog box, click New Workbook in the Output Options, and click
OK.) An edited portion of the regression output is shown in Figure 29.4.

Figure 29.4 Edited Portion of Regression Summary Output


The Coefficients section of the output in Figure 29.4 shows that the fitted equation is
Sales = 311.005 + 5.106*Time – 56.601*Winter – 19.387*Spring – 22.574*Summer.
After taking seasonality into account, retail sales increase by 5.106 billions of dollars per
quarter, on the average. The Fall quarter indicator variable was not included in the
regression input, so the Fall seasonal effect is included in the constant term 311.005. The
coefficient for the Winter indicator variable tells us that retail sales in the Winter quarter
are 56.601 billions of dollars less than sales in the Fall, on the average. Similarly, the
seasonal effect of Spring relative to Fall is measured by the –19.387 coefficient, and the
effect of Summer relative to Fall is measured by the –22.574 coefficient.
R square indicates that approximately 98.2% of the variation in retail sales can be
explained using linear time trend and seasonal indicators. The standard error of the
residuals is 6.089 billions of dollars, which may be loosely interpreted as the error
associated with predictions using this model. The absolute values of the t statistics are far
greater than two, and the related p-values are less than 0.0005, indicating significant
relationships between each explanatory variable and retail sales.
The Regression analysis tool's line fit plot for explanatory variable Time shows the actual
and fitted values in a time sequence plot. The following steps describe some
embellishments to obtain the chart shown in Figure 29.5.
4. Click and drag the chart sizing handles so that the chart is approximately 10
columns wide and 20 rows high. Change the font size to 10 for the chart title,
axis titles, axes, and legend.
5. Select the vertical axis. Double-click, or right-click and choose Format Axis
from the shortcut menu. In the Format Axis dialog box, click the Scale tab.
Click Minimum and type 200. Click Maximum and type 450. Click OK.
6. Select the horizontal axis. Double-click, or right-click and choose Format Axis
from the shortcut menu. In the Format Axis dialog box, click the Scale tab.
Click Minimum and type 1. Click Maximum and type 20. Click Major Unit and
type 1. Click OK.
7. Click one of the square markers associated with the Predicted Sales data series,
or use the up and down arrow keys to select the series. The formula bar shows
=SERIES("Predicted Sales",...). Double-click, or right-click and choose Format
Data Series from the shortcut menu. In the Format Data Series dialog box, click
the Patterns tab. Click Automatic for Line, click None for Marker, and click OK.
The chart appears as shown in Figure 29.5.
Figure 29.5 Formatted Regression Chart Output

A forecast of retail sales in quarter 21 (Winter 1988) is obtained by setting Time = 21,
Winter = 1, Spring = 0, and Summer = 0. Referring to the fitted equation,
predicted Sales = 311.005 + 5.106 * 21 – 56.601 * 1 – 19.387 * 0 – 22.574 * 0
= 311.005 + 107.226 – 56.601 – 0 – 0
= 361.63 billions of dollars.
Forecasts for individual quarters may be calculated in a similar manner.
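For repeated calculations of this kind, the fitted equation can be wrapped in a small Python function using the coefficients reported above.

    # Forecast retail sales (billions of dollars) from the fitted trend-plus-seasonal equation.
    def forecast_sales(time, quarter):
        winter = 1 if quarter == "I" else 0
        spring = 1 if quarter == "II" else 0
        summer = 1 if quarter == "III" else 0          # quarter "IV" (Fall) is the base case
        return (311.005 + 5.106 * time
                - 56.601 * winter - 19.387 * spring - 22.574 * summer)

    print(round(forecast_sales(21, "I"), 2))           # quarter 21, Winter 1988: 361.63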
To calculate fitted values and forecasts for a large number of quarters, the TREND
function is convenient. The following steps describe how to obtain fitted values for the
first 20 quarters and forecasts for the next 4 quarters.
8. Copy cells A18:B21 and paste into cell A22. Enter 1988 in cell A22.
9. Select cells D20:D21 and drag the fill handle down to cell D25.
10. Copy cells E18:H21 and paste into cell E22.
11. Enter the label Forecast in cell I1.
12. Select cells I2:I25. Click the Insert Function tool button (icon fx). In the Insert
Function dialog box, select Statistical in the category list box, select TREND in
the function list box, and click OK. In the TREND dialog box, fill in the dialog
box as shown in Figure 29.6 and click OK.

Figure 29.6 TREND Function Dialog Box

13. With I2:I25 selected, press F2 (or click in formula bar). To array-enter the
formula, hold down the Control and Shift keys and press Enter. Click the
Decrease Decimal button to display one decimal place. The results appear as
shown in Figure 29.7.

Figure 29.7 Forecast Using TREND Function



The forecasts for the next four quarters are shown in cells I22:I25 in Figure 29.7. The
forecast for quarter 21 (Winter 1988) using TREND agrees with the value calculated
earlier using the fitted equation from the Regression analysis tool: 361.6 billions of
dollars.
The following steps describe how to prepare a time sequence plot showing the actual,
fitted, and forecast values.
14. Select cells C1:C25. Hold down the Control key and select I1:I25. Click the
Chart Wizard button.
15. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
Line for chart type and "Line with markers displayed at each data value" for
chart sub-type. Click Next.
16. In step 2 (Chart Source Data) on the Series tab, select the range edit box for
Category (X) Axis Labels, and click and drag A2:B25. Click Next.
17. In step 3 (Chart Options) on the Titles tab, type the chart and axis labels shown
in Figure 29.8. On the Gridlines tab, clear all checkboxes. Click Finish.
18. Double-click the vertical axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box, click the Scale tab; click
Minimum and type 200; click Maximum and type 500; click Major Unit and
type 50; click OK.
19. Double-click the horizontal axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box on the Alignment tab, select the
Degrees edit box, type 0 (zero), and click OK.
20. Click on a data point or use the up and down arrow keys to select the actual
sales data series. Double-click, or right-click and choose Format Data Series
from the shortcut menu. In the Format Data Series dialog box, click the Patterns
tab. Click None for Line, click Automatic for Marker, and click OK.
21. Click on a data point or use the up and down arrow keys to select the forecast
data series. Double-click, or right-click and choose Format Data Series from the
shortcut menu. In the Format Data Series dialog box, click the Patterns tab.
Click Automatic for Line, click None for Marker, and click OK.
22. To format the chart as shown in Figure 29.8, click and drag the chart sizing
handles so that the chart is approximately 10 standard columns wide and 20
rows high. Change the font size to 8 for the chart title, axis titles, and legend.
Change the font size to 6 for the axes.

Figure 29.8 Time Sequence Plot with Forecast

29.2 AR(4) MODEL


Seasonal autoregression is an alternative to using indicator variables to model
seasonality. The general idea is to relate values in the current period to values with an
appropriate lag. For seasonal quarterly data, we expect current Winter sales to be
correlated with the previous year's Winter sales. Autocorrelation and autoregression are
discussed in Chapter 27, which includes details for calculating the autocorrelation
coefficients function (ACF) in section 27.5. The ACF results are useful for identifying
which lagged variables should be included in the autoregressive model.
The following steps describe how to construct the ACF shown in Figure 29.9.
1. Enter the data shown in columns A:C in Figure 29.1 on a new sheet.
2. Enter the labels Z, Lag, and ACF in cells D1, F1, and G1, and enter the digits 1
through 8 in cells F2:F9.
3. Select cells C1:D21. From the Insert menu choose Name | Create. In the Create
Names dialog box check the Top Row checkbox. Click OK.
4. In cell D2, enter the formula =(C2-AVERAGE(Sales))/STDEV(Sales). With
cell D2 selected, double-click the fill handle.
5. In cell G2, enter the formula
=SUMPRODUCT(OFFSET(Z,F2,0,20-F2),OFFSET(Z,0,0,20-F2))/19.

With cell G2 selected, double-click the fill handle.


6. Select cells G2:G9, click the Chart Wizard button, and create a Clustered
Column chart. See Chapter 27, section 27.5, for details on obtaining the
appearance shown in Figure 29.9.
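The lag-k autocorrelation produced by the SUMPRODUCT and OFFSET formula in step 5 can also be checked with a short script. The sketch below is an outside-Excel check, not part of the worksheet; the series shown is synthetic, so substitute the 20 sales values from Figure 29.1 to reproduce the ACF plotted in Figure 29.9.

import numpy as np

def acf(values, max_lag=8):
    # Same arithmetic as the worksheet formula: standardize with the sample
    # standard deviation, then for each lag k sum the products of observations
    # k periods apart and divide by n - 1 (here 19).
    x = np.asarray(values, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    return [float(z[k:] @ z[:n - k]) / (n - 1) for k in range(1, max_lag + 1)]

# Synthetic demonstration series -- replace with the actual sales values.
demo = [300 + 5 * t + (25 if t % 4 == 0 else -10) for t in range(1, 21)]
print(np.round(acf(demo), 3))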

Figure 29.9 Autocorrelation Coefficients Function (ACF)

Referring to Figure 29.9, the correlation is highest at lag 4, as expected. An
autoregressive model may be used to explain variation in sales with lag 4 for seasonality
and lag 1 for short-term trend (after taking seasonality into account). The following steps
describe how to construct the AR(4) model.
7. Enter the data shown in columns A:C in Figure 29.1 on a new sheet.
Alternatively, copy the data, choose Worksheet from the Insert menu, and paste.
8. Select columns C and D. Right-click and choose Insert from the shortcut menu.
Enter the labels Lag 1 and Lag 4 in cells C1 and D1.
9. Select cells E2:E20. Click the Copy button, or right-click and choose Copy from
the shortcut menu. Select cell C3. Click the Paste button, or right-click and
choose Paste from the shortcut menu.
10. Copy cells E2:E17 and paste into cell D6. The top portion of the worksheet
appears as shown in Figure 29.10.

Figure 29.10 Arranging Lagged Data

11. Select rows 2:5. Right-click and choose Delete from the shortcut menu. The data
appear as shown in columns C:E in Figure 29.11.
12. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression from the Analysis Tools list box and click OK. In the
Regression dialog box, the Input Y Range is E1:E17 and the Input X Range is
C1:D17. Check the Labels box. Click the Output Range option button, select the
adjacent text box, and specify H1. Check the Residuals checkbox in the
Residuals section. Then click OK. A portion of the regression output is shown in
Figure 29.11.

Figure 29.11 Lagged Data and Regression Output

Rounded to four decimal places, the fitted equation is Sales = 87.5903 – 0.1198 * Lag1 +
0.9236 * Lag4. The t statistics and p-values indicate significant relationships, and R
square shows that approximately 97% of the variation in sales can be explained using the
lagged variables.
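For readers who want to verify the AR(4) calculation outside Excel, here is a minimal Python sketch. The series shown is a synthetic placeholder; substitute the actual sales values from Figure 29.1 to reproduce the fitted equation above and the forecasts prepared later in Figure 29.12.

import numpy as np

# Synthetic quarterly series for illustration only.
y = np.array([300 + 5 * t + (20 if t % 4 == 0 else -8) for t in range(1, 21)], float)

# Regress Sales on Lag1 and Lag4; the first four quarters are lost to the lags.
X = np.column_stack([np.ones(16), y[3:-1], y[:-4]])     # constant, Lag1, Lag4
b0, b1, b4 = np.linalg.lstsq(X, y[4:], rcond=None)[0]
print("Sales = %.4f + %.4f*Lag1 + %.4f*Lag4" % (b0, b1, b4))

# Forecast the next four quarters recursively, as the worksheet formula
# =I$17+I$18*E17+I$19*E14 does: each forecast uses the most recent value
# (lag 1) and the value four quarters earlier (lag 4).
hist = list(y)
for _ in range(4):
    hist.append(b0 + b1 * hist[-1] + b4 * hist[-4])
print("Forecasts:", [round(v, 1) for v in hist[-4:]])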
The standard error of this AR(4) model is 5.9 billions of dollars, very close to the
standard error of the model using indicator variables, 6.1 billions of dollars. The
following steps describe how to obtain forecasts for the next four quarters and a plot of
actual, fitted, and forecast values.
13. Copy cells A14:B17 and paste into cell A18. Enter 1988 in cell A18.
14. Enter the label Forecast in cell F1.
15. The Predicted Sales values from regression output appear below the Summary
Output. Copy cells I26:I41 into cell F2.
16. Select cell E18. Enter the formula =I$17+I$18*E17+I$19*E14. Click the fill
handle and drag down to cell E21. The results appear as shown in Figure 29.12.

Figure 29.12 Preparing Forecasts

17. Select cells E18:E21. Move the mouse pointer near the edge of the selected
region until the pointer becomes an arrow. Click and drag right to column F.
(Alternatively, cut E18:E21 and paste special values to F18.) The results appear
as shown in Figure 29.13.

Figure 29.13 Sales Data and Forecasts for Chart

18. To prepare a line chart, select cells E1:F21 and click the Chart Wizard button. In
step 2 (Chart Source Data), select the range edit box for Category (X) Axis
Labels, and click and drag cells A2:B21.
19. Details for the Chart Wizard steps and formatting are described in steps 15
through 22 in section 29.1. The results appear as shown in Figure 29.14.

Figure 29.14 Time Sequence Plot with AR(4) Forecast

29.3 CLASSICAL TIME SERIES DECOMPOSITION


A third method for analyzing seasonality is classical time series decomposition. The time
series values are decomposed into several components: long-term trend; business cycle
effects; seasonality; and unexplained, random variation. Because it is usually very
difficult to isolate the business cycle effects, the approach described here assumes the
trend component has both long-term average and cyclical effects. The multiplicative
model is
Valuet = Trendt * Seasonalt * Randomt.
The trend component is expressed in the same units as the original time series values, and
the seasonal and random components are expressed as index numbers (percentages) or
decimal equivalents.
A common method for estimating the trend component uses moving averages. Other
approaches are exponential smoothing, linear time trend using simple regression, and
nonlinear regression. The following steps describe centered moving averages.
1. Enter the data shown in columns A:C in Figure 29.7 on a new sheet.
Alternatively, copy the data, choose Worksheet from the Insert menu, and paste.
2. Enter the labels Early_MA, Late_MA, and Center_MA in cells D1:F1, as
shown in Figure 29.15.

3. Select cell D4 and enter the formula =AVERAGE(C2:C5). This average of the
first four quarters is actually associated with a time point located between the
second and third quarters. Because it is located on the row of the third quarter, it
is labeled "Early_MA."
4. Select cell E4 and enter the formula =AVERAGE(C3:C6). This average of the
second through fifth quarters is actually associated with a time point located
between the third and fourth quarters. Since it is located on the row of the third
quarter, it is labeled "Late_MA."
5. Select cell F4 and enter the formula =AVERAGE(D4:E4). This average of the
Early_MA and Late_MA is centered on the third quarter.
6. Select cells D4:F4. Click the fill handle in the lower-right corner of the selection
and drag down to cell F19. Format the extended selection to display one decimal
place. The results appear as shown in Figure 29.15.

Figure 29.15 Worksheet for Centered Moving Average

7. To chart the moving average, select cells C1:C25. Hold down the Control key
and select cells F1:F25. Click the Chart Wizard button.

8. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
Line for chart type and "Line with markers displayed at each data value" for
chart sub-type; click Next. In step 2 (Chart Source Data) on the Series tab, select
the range edit box for Category (X) Axis Labels, and click and drag A2:B25;
click Next. In step 3 (Chart Options) on the Titles tab, type the chart and axis
labels shown in Figure 29.16; on the Gridlines tab, clear all checkboxes; click
Finish.
9. To format the chart, double-click the vertical axis, or right-click and choose
Format Axis from the shortcut menu. In the Format Axis dialog box, click the
Scale tab; click Minimum and type 200; click Maximum and type 500; click
Major Units and type 50; click OK.
10. Double-click the horizontal axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box on the Alignment tab, select the
Degrees edit box, type 0 (zero), and click OK.
11. Click on a data point to select the centered moving average data series. Double-
click, or right-click and choose Format Data Series from the shortcut menu. In
the Format Data Series dialog box, click the Patterns tab. Click Automatic for
Line, click None for Marker, and click OK.
12. To display all labels on the horizontal axis, click and drag the sizing handles to
make the chart wider. Also, select a smaller font size for the axes, axis titles, and
legend. The results are shown in Figure 29.16.

Figure 29.16 Plot of Actual Sales and Centered Moving Average



13. Enter the labels Ratio, AvgRatio, and Standard in cells G1:I1.
14. Select cell G4. Enter the formula =C4/F4. With cell G4 selected, click the fill
handle and drag down to cell G19. The results appear as shown in column G in
Figure 29.17. These numbers are the ratio of actual sales to the moving average.
For example, the number 1.0748 in cell G5 indicates that actual sales in that
particular fourth quarter were 107.48% of the average sales during the year.
15. Select cell H2 and enter the formula =AVERAGE(G6,G10,G14,G18). With
cell H2 selected, click the fill handle and drag down to cell H3.
16. Select cell H4 and enter the formula =AVERAGE(G4,G8,G12,G16). With cell
H4 selected, click the fill handle and drag down to cell H5. The results are
shown in column H in Figure 29.17. These formulas summarize the ratios for a
particular quarter for all years. For example, the value 1.0175 (approximately
1.02) in cell H3 indicates that sales in the second quarter are typically 2% above
the annual average. If the set of ratios in column G for a particular quarter has
outliers, these summaries in column H could use the MEDIAN or TRIMMEAN
functions.
17. Select cell H6 and click the AutoSum tool twice.
18. The base for an index is 1.00, so the four prospective indexes should sum to 4.
To modify the average ratios so that they sum to 4, select cell I2 and enter the
formula =H2*4/$H$6. With cell I2 selected, click the fill handle and drag down
to cell I5.
19. Select cell I6 and click the AutoSum tool twice. The seasonal indexes in column
I sum to 4 as shown in Figure 29.17.
One use for the seasonal indexes shown in cells I2:I5 in Figure 29.17 is to seasonally
adjust historical data. The multiplicative model is Valuet = Trendt * Seasonalt * Randomt,
so if an original value is divided by the seasonal index, the result has only trend and
random components remaining. Successive seasonally adjusted values can be compared
to detect changes in the long-run behavior of the time series.
A second use is to combine the seasonal index with a forecast of trend to obtain a forecast
of value. The trend forecast may be obtained by extrapolating the moving average or
using a regression model. The following steps describe how to seasonally adjust the
historical data, extrapolate the linear time trend of the adjusted values four quarters, and
multiply the extrapolated trend by the appropriate seasonal index to obtain the forecasts.
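The numbered steps below carry this out in the worksheet. For reference, here is a minimal Python sketch of the same calculations (centered moving average, seasonal indexes, seasonal adjustment, trend extrapolation, and recombined forecasts). The series shown is a synthetic placeholder; substitute the 20 quarterly sales values from Figure 29.7.

import numpy as np

sales = np.array([300 + 5 * t + (25 if t % 4 == 0 else -10) for t in range(1, 21)], float)
n = len(sales)

# Centered moving average for quarters 3 through 18 (steps 1-6).
early = np.array([sales[t - 2:t + 2].mean() for t in range(2, n - 2)])
late = np.array([sales[t - 1:t + 3].mean() for t in range(2, n - 2)])
center = (early + late) / 2

# Ratios of actual sales to the moving average, averaged by quarter and rescaled
# so that the four seasonal indexes sum to 4 (steps 13-19).
ratios = sales[2:n - 2] / center           # first ratio belongs to quarter 3
avg = np.array([ratios[q::4].mean() for q in range(4)])
index = np.empty(4)
index[[2, 3, 0, 1]] = avg * 4 / avg.sum()  # reorder so index[0] is quarter 1

# Seasonally adjust, extrapolate a linear trend, and multiply the extrapolated
# trend by the matching seasonal index (the procedure in steps 20-28).
season = np.tile(index, 6)[:n + 4]
adjusted = sales / season[:n]
slope, intercept = np.polyfit(np.arange(1, n + 1), adjusted, 1)
trend = intercept + slope * np.arange(n + 1, n + 5)
print("Seasonal indexes:", np.round(index, 4))
print("Forecasts:", np.round(trend * season[n:], 1))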

Figure 29.17 Worksheet for Seasonal Indexes

20. Enter the labels Index, Trend, and Forecast in cells J1:L1.
21. Select cells I2:I5 and click the Copy button (or right-click and choose Copy
from the shortcut menu). Select cell J2, right-click, and choose Paste Special
from the shortcut menu. In the Paste Special dialog box, select Values for Paste
and None for Operation. Leave the Skip Blanks and Transpose checkboxes clear
and click OK.
22. Copy the values in cells J2:J5 and paste into cells J6, J10, J14, J18, and J22.
23. Select cell K2 and enter the formula =C2/J2. With cell K2 selected, click the fill
handle and drag down to cell K21. The values in cells K2:K21 are the seasonally
adjusted historical data.
24. With cells K2:K21 selected, right-click and choose Copy from the shortcut
menu. With cells K2:K21 still selected, right-click and choose Paste Special
from the shortcut menu. In the Paste Special dialog box, select Values for Paste
and None for Operation. Leave the Skip Blanks and Transpose checkboxes clear
and click OK.
25. With cells K2:K21 selected, click the fill handle in the lower-right corner of cell
K21 and drag down to cell K25. The results are shown in column K in Figure
29.18. When Excel's AutoFill is used in this manner, the series of numbers in
K2:K21 is extended using a linear trend. The same results could be obtained
using the values 1 through 20 as explanatory variables for fitting simple linear
regression and using the values 21 through 24 for predictions.

Figure 29.18 Worksheet for Forecasts

Figure 29.19 Extrapolation of Seasonally Adjusted Sales



26. To chart the actual sales, seasonally adjusted sales, and the linear extrapolation,
select cells C1:C25, hold down the Control key, and select cells K1:K25. Click
the Chart Wizard, prepare a line chart, and format using steps 8 through 12 in
this section. The result is shown in Figure 29.19.
27. To combine the trend and seasonal components in the forecasts, select cell L22
and enter the formula =J22*K22. With cell L22 selected, double-click the fill
handle. The results appear as shown in Figure 29.18.
28. To chart the actual sales and forecasts, select cells C1:C25, hold down the
Control key, and select cells L1:L25. Click the Chart Wizard, prepare a line
chart, and format using steps 8 through 12 in this section. The result is shown in
Figure 29.20.

Figure 29.20 Actual Sales and Forecasts

The three methods analyze seasonality using different models, so there are some
differences in the results, as shown in Figure 29.21.

Figure 29.21 Forecast Results



The additive model using linear time trend and seasonal indicator variables and the
multiplicative model using classical time series decomposition have very similar results.
For these particular data, the autoregressive model produces forecasts that are
consistently below the results of the other models; the autoregressive model using lag 1
and lag 4 would be more appropriate for seasonal data with a long-term meandering
pattern.

EXERCISES
Exercise 29.1 (adapted from Mendenhall, p. 647) The following table shows quarterly
earnings, in millions of dollars, for a multimedia communications firm for the years 1984
through 1989.
Year
Quarter 1984 1985 1986 1987 1988 1989
1 302.2 426.5 504.2 660.9 743.6 1043.6
2 407.3 451.5 592.4 706.0 774.5 1037.8
3 483.3 543.9 647.9 751.3 915.7 1167.6
4 463.2 590.5 726.4 758.6 1013.4 1345.3
1. Construct a time sequence plot of the quarterly earnings.
2. Develop a regression model using linear time trend and quarterly indicator variables.
Make forecasts for the next four quarters.
3. Develop a regression model using quadratic time trend and quarterly indicator
variables. Make forecasts for the next four quarters.
4. Develop an AR(4) model. Make forecasts for the next four quarters.
5. Use classical time series decomposition to obtain seasonal indexes.
Exercise 29.2 (adapted from Mendenhall, p. 646) Texas Chemical Products manufactures
an agricultural chemical that is applied to farmlands after crops have been harvested.
Because the chemical tends to deteriorate in storage, Texas Chemical cannot stockpile
quantities in advance of the winter season demand for the product. The following table
shows sales of the product, in thousands of pounds, over four consecutive years.

Year
Month 1 2 3 4
January 123 134 144 145
February 130 146 159 146
March 157 174 168 164
April 155 163 153 158
May 161 176 179 182
June 169 154 164 169
July 142 166 160 166
August 157 168 170 174
September 169 166 160 166
October 185 223 208 215
November 209 238 221 213
December 238 252 244 258
1. Construct a time sequence plot of the monthly sales.
2. Develop a regression model using linear time trend and monthly indicator variables.
Make forecasts for the next 12 months.
3. Develop an AR(12) model. Make forecasts for the next 12 months.
4. Use classical time series decomposition to obtain seasonal indexes.



Regression Models
for Time Series Data
30
30.1 TIME SERIES REGRESSION CHECKLIST
Relevant explanatory variables (X) for time series data related to business activity (Y),
e.g., sales over time, include several general types:
a Internal business activity, like advertising, promotion, research and development
b Competitor business activity, like competitor sales and competitor advertising
c Industry activity, like number of competitors and market size
d General economic activity, like personal disposable income

Plot Y versus time


1 Identify any systematic pattern to help determine an appropriate model

Plot Y versus each X


2 Verify that the relationship agrees with your prior judgment, e.g., positive vs
negative relationship, linear vs nonlinear, strong vs weak
3 Identify outliers or unusual observations and decide whether to exclude
4 Determine whether the relationship is linear; if not, consider using a nonlinear
form, e.g., quadratic (include X and X^2 in the model)

Examine the correlation matrix


5 Include a time period variable in the correlation matrix. For example, if there are
n equally-spaced time periods, include a variable in your data set with values
1,2,...,n.
6 Identify potential multicollinearity problems, i.e., high correlation between a
pair of X variables; if so, consider using only one X of the pair in the model

Calculate the regression model with diagnostics


7 Verify that the sign of each regression coefficient agrees with your prior
judgment, i.e., positive vs negative relationship; otherwise, consider excluding
that X and rerun the regression
8 Examine each plot of residuals vs X; if there is a non-random pattern (e.g., U-
shape or upside-down-U-shape), use a nonlinear form for that X in a new model
9 In addition to the residual plots generated automatically by Excel's Regression
tool, prepare and examine a plot of residuals vs time. If there is a snake-like
pattern of residuals, consider adding lag Y as an explanatory variable.
Optionally, compute the Durbin-Watson statistic to detect autocorrelation of
residuals.
10 Identify key X variables by comparing standardized regression coefficients,
usually computed by multiplying an X coefficient by the standard deviation of
that X and dividing by the standard deviation of Y. This dimensionless
standardized regression coefficient measures how much Y (in standard deviation
units) is affected by a change in X (in standard deviation units); see the sketch
following this checklist.
11 If a goal is to find a model with small standard error of estimate (approx.
standard deviation of residuals), use the t-stat screening method. Disregard the t-
stat for the intercept. If there are X variables with a t-stat between -1 and +1,
remove the single X variable whose t-stat is closest to zero, and rerun the
regression. Remove only one X variable at a time.
12 Before using the final model, examine each plot of residuals vs X to verify that
the random scatter is the same for all values of X. If there is more scatter for
higher values of X, consider using a log transformation of X in the model
(instead of using X itself). If the scatter is not uniform with respect to X, the
standard error of estimate may not be a useful measure of uncertainty because it
overstates the uncertainty for some values of X and understates the uncertainty
for other values of X.

Use the model


13 If the purpose is to identify unusual observations, examine the residuals directly
for large negative or large positive values, or examine the standardized residuals
(each residual divided by the standard deviation of residuals) for values more
extreme than +2 or -2 or for values more extreme than +3 or -3.
14 If the purpose is to make predictions, use the X values for a new observation to
compute a predicted Y. Use the standard error of estimate to provide an interval
estimate, e.g., an approximate 95% prediction interval that ranges from two
standard errors below to two standard errors above the predicted Y. Note that a
time series forecast usually extrapolates beyond the original range of data, so the
standard error of estimate is a minimum indication of the uncertainty
surrounding a forecast.
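A few of the quantities named in this checklist (the Durbin-Watson statistic in item 9, the standardized coefficients in item 10, and the approximate prediction interval in item 14) are easy to compute with a short script. The sketch below uses synthetic data purely for illustration; substitute your own Y column and X columns.

import numpy as np

# Synthetic data for illustration only -- replace y and X with your own series.
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.arange(1, n + 1), rng.normal(100, 10, n)])
y = 50 + 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 5, n)

A = np.column_stack([np.ones(n), X])               # add the intercept column
coef = np.linalg.lstsq(A, y, rcond=None)[0]
resid = y - A @ coef
se = np.sqrt(resid @ resid / (n - A.shape[1]))     # standard error of estimate

# Item 10: standardized coefficients, b_j * sd(X_j) / sd(Y).
std_coef = coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Item 9 (optional): Durbin-Watson statistic for autocorrelation of residuals.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Item 14: approximate 95% prediction interval for a new observation.
x_new = np.array([1.0, n + 1, 100.0])
pred = x_new @ coef
print(np.round(std_coef, 3), round(dw, 2),
      (round(pred - 2 * se, 1), round(pred + 2 * se, 1)))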

30.2 AUTOCORRELATION OF RESIDUALS


Figure 30.1 Undesirable Extreme Negative Autocorrelation
[Plot of residuals versus time, periods 1 through 15.]

Figure 30.2 Undesirable Extreme Positive Autocorrelation
[Plot of residuals versus time, periods 1 through 15.]

Figure 30.3 Desirable Zero Autocorrelation
[Plot of residuals versus time, periods 1 through 15.]
Part 5 Constrained Optimization

Part 5 describes decision models involving constrained optimization. The topic is
introduced using the classic product mix problem. Subsequent chapters examine
constrained optimization problems in the areas of marketing, transportation logistics, and
finance.
The spreadsheet analysis uses Excel's standard Solver add-in for linear, nonlinear, and
integer problems.



Product Mix Optimization
31
31.1 LINEAR PROGRAMMING CONCEPTS
Formulation
Decision variables (Excel Solver “Changing Cells”)
Objective function (“Target Cell”)
Constraints and right-hand-side values (“Constraints”)
Non-negativity constraints (“Constraints”)

Graphical Solution
Constraints
Feasible region
Corner points (extreme points)
Objective function value at each corner point
Total enumeration vs. simplex algorithm (search)
Optimal solution

Sensitivity Analysis
Post-optimality analysis and interpretation of computer print-outs
Shadow price (a marginal value)
(Excel Solver Sensitivity Report, Constraints section, “Shadow Price”)
The shadow price for a particular constraint is the amount of change in the value
of the objective function corresponding to a unit change in the right-hand-side
value of the constraint.
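For example (with hypothetical numbers): if a parts constraint has a shadow price of $12.50 and its right-hand-side value is increased from 800 units to 810 units, and the increase stays within the allowable range, the optimal value of the objective function increases by 10 × $12.50 = $125.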

Range on a right-hand-side (RHS) value


(Excel Solver Sensitivity Report, Constraints section, “Allowable
Increase/Decrease”)
Range over which the shadow price applies. The optimal values of the decision
variables would change depending on the exact RHS value, but the current mix
of decision variables remains optimal over the specified range of RHS values.
Range on an objective function coefficient
(Excel Solver Sensitivity Report, Changing Cells section, “Allowable
Increase/Decrease”)
Range over which an objective function coefficient could change with the
current optimal solution remaining optimal (same mix and values of decision
variables). The value of the objective function would change depending on the
exact value of the objective function coefficient.
Simplex algorithm terminology
Slack, surplus, and artificial variables
Basic variables (variables "in the solution," typically with non-zero values)
Non-basic variables (value equal to zero)
Complementary slackness

31.2 BASIC PRODUCT MIX PROBLEM


Figure 31.1 Display

A B C D E F G
1 Small Example 1: Product mix problem
2 Your company manufactures TVs and stereos, using a common parts inventory
3 of power supplies, speaker cones, etc. Parts are in limited supply and you must
4 determine the most profitable mix of products to build.
5
6 TV set Stereo RHS
7 Number to Build-> 250 100 Used Available Slack
8 Part Name Chassis 1 1 350 450 100
9 Picture Tube 1 0 250 250 0
10 Speaker Cone 2 2 700 800 100
11 Power Supply 1 1 350 450 100
12 Electronics 2 1 600 600 0
13 Profit
14 Per Unit $75 $50
15 By Product $18,750 $5,000
16 Total $23,750

Figure 31.2 Formulas


A B C D E F G
1 Small Example 1: Product mix problem
2 Your company manufactures TVs and stereos, using a common parts inventory
3 of power supplies, speaker cones, etc. Parts are in limited supply and you must
4 determine the most profitable mix of products to build.
5
6 TV set Stereo RHS
7 Number to Build-> 250 100 Used Available Slack
8 Part Name Chassis 1 1 =SUMPRODUCT($C$7:$D$7,C8:D8) 450 =F8-E8
9 Picture Tube 1 0 =SUMPRODUCT($C$7:$D$7,C9:D9) 250 =F9-E9
10 Speaker Cone 2 2 =SUMPRODUCT($C$7:$D$7,C10:D10) 800 =F10-E10
11 Power Supply 1 1 =SUMPRODUCT($C$7:$D$7,C11:D11) 450 =F11-E11
12 Electronics 2 1 =SUMPRODUCT($C$7:$D$7,C12:D12) 600 =F12-E12
13 Profit
14 Per Unit $75 $50
15 By Product =C14*C7 =D14*D7
16 Total =SUMPRODUCT(C7:D7,C14:D14)

Figure 31.3 Graphical Solution

[Chart: Number of TVs (vertical axis, 0 to 600) plotted against Number of Stereos
(horizontal axis, 0 to 700), showing the five constraints (Chassis & Power Supply,
Picture Tube, Speaker Cone, and Electronics) and the Feasible Region.]

Figure 31.4 Solver Parameters Main Dialog Box

Figure 31.5 Solver Add Constraint Dialog Box



Figure 31.6 Solver Options Dialog Box

Figure 31.7 Solver Solution



Figure 31.8 Solver Answer Report


Target Cell (Max)
Cell Name Original Value Final Value
$C$16 Total Profit $23,750 $25,000

Adjustable Cells
Cell Name Original Value Final Value
$C$7 Number to Build-> TV set 250 200
$D$7 Number to Build-> Stereo 100 200

Constraints
Cell Name Cell Value Formula Status Slack
$E$8 Chassis Used 400 $E$8<=$F$8 Not Binding 50
$E$9 Picture Tube Used 200 $E$9<=$F$9 Not Binding 50
$E$10 Speaker Cone Used 800 $E$10<=$F$10 Binding 0
$E$11 Power Supply Used 400 $E$11<=$F$11 Not Binding 50
$E$12 Electronics Used 600 $E$12<=$F$12 Binding 0

Figure 31.9 Solver Sensitivity Report


Adjustable Cells
Final Reduced Objective Allowable Allowable
Cell Name Value Cost Coefficient Increase Decrease
$C$7 Number to Build-> TV set 200 $0.00 $75.00 $25.00 $25.00
$D$7 Number to Build-> Stereo 200 $0.00 $50.00 $25.00 $12.50

Constraints
Final Shadow Constraint Allowable Allowable
Cell Name Value Price R.H. Side Increase Decrease
$E$8 Chassis Used 400 $0.00 450 1E+30 50
$E$9 Picture Tube Used 200 $0.00 250 1E+30 50
$E$10 Speaker Cone Used 800 $12.50 800 100 100
$E$11 Power Supply Used 400 $0.00 450 1E+30 50
$E$12 Electronics Used 600 $25.00 600 50 200

Figure 31.10 Solver Limits Report


Target
Cell Name Value
$C$16 Total Profit $25,000

Adjustable Lower Target Upper Target
Cell Name Value Limit Result Limit Result
$C$7 Number to Build-> TV set 200 0 $10,000 200 $25,000
$D$7 Number to Build-> Stereo 200 0 $15,000 200 $25,000
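The chapter solves this model with Excel's Solver. As an independent check (a sketch only, not part of the workbook), the code below sets up the same linear program with SciPy's linprog; it should reproduce the 200/200 product mix and $25,000 profit of Figure 31.8, and in recent SciPy versions the constraint marginals correspond to the shadow prices of Figure 31.9.

import numpy as np
from scipy.optimize import linprog

# Maximize 75*TV + 50*Stereo subject to the five parts constraints of Figure 31.1.
# linprog minimizes, so the profit coefficients are negated.
c = [-75, -50]
A_ub = [[1, 1],   # Chassis
        [1, 0],   # Picture Tube
        [2, 2],   # Speaker Cone
        [1, 1],   # Power Supply
        [2, 1]]   # Electronics
b_ub = [450, 250, 800, 450, 600]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method="highs")
print("Number to build:", res.x, "Total profit:", -res.fun)

# Shadow prices (duals of the parts constraints); with the sign flipped they
# should match the Sensitivity Report: $12.50 for Speaker Cone, $25.00 for
# Electronics, and $0 for the non-binding constraints. Requires SciPy 1.7+.
print("Shadow prices:", -np.asarray(res.ineqlin.marginals))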

31.3 OUTDOORS PROBLEM


Outdoors, Inc., has lawn furniture as one of its product lines. They currently have three
items in that line: a lawn chair, a standard bench, and a table. These products are
produced in a two-step manufacturing process involving the tube bending department and
the welding department. The hours required by each item in each department are as
follows:

Product
Department Chair Bench Table Present Capacity
Bending 1.2 1.7 1.2 1,000 hours
Welding 0.8 0.0 2.3 1,200 hours
The profit contribution that Outdoors receives from manufacture and sale of one unit of
each product is $3 for a chair, $3 for a bench, and $5 for a table.
The company is trying to plan its production mix for the current selling season. They feel
that they can sell any number they produce, but unfortunately production is further
limited by available material because of a prolonged strike. The company currently has
on hand 2,000 pounds of tubing. The three products require the following amounts of
this tubing: 2 pounds per chair, 3 pounds per bench, and 4.5 pounds per table.
In order to determine the optimal product mix, the production manager has formulated
the linear programming problem as shown below.

Product
Chair Bench Table
Contribution $3 $3 $5

Constraint Chair Bench Table Relation Limit
Bending 1.2 1.7 1.2 <= 1,000
Welding 0.8 0.0 2.3 <= 1,200
Tubing 2.0 3.0 4.5 <= 2,000

A. The inventory manager suggests that the company produce 200 units of each
product. Is the plan to produce 200 units of each product a feasible plan, i.e.,
does it satisfy all constraints? If not, which constraints are not satisfied?
B. If the company produces 200 chairs, 200 benches, and 200 tables, how much
tubing, if any, will be left over?
Each of the following questions refers to the solution of the original linear programming
problem.
C. A local manufacturing firm has excess capacity in its welding department and
has offered to sell 100 hours of welding time to Outdoors for $3 per hour. This
arrangement would cost $300 and would increase welding capacity from 1,200
hours to 1,300 hours. Should Outdoors purchase the additional welding
capacity? Why or why not?
D. The marketing manager thinks that the original estimate of $3 profit contribution
per chair should be changed to $2.50 per chair. Should the production manager
solve the linear programming problem again using the $2.50 value, or should
Outdoors go ahead with the plan to produce 700 chairs, zero benches, and 133
tables? Why or why not?
E. A local metal products distributor has offered to sell Outdoors some additional
metal tubing for 60 cents per pound. Should Outdoors buy additional tubing at
this price? If so, how much would their contribution increase if they bought 500
pounds and used it in an optimal fashion?
F. The R&D department has been redesigning the bench to make it more
profitable. The new design will require 1.1 hours of tube bending time, 2 hours
of welding time, and 2.0 pounds of metal tubing. If they can sell one unit of this
bench with a unit contribution of $3, what effect will it have on overall
contribution?
G. Marketing has suggested a new patio awning that would require 1.8 hours of
tube bending time, 0.5 hours of welding time, and 1.3 pounds of metal tubing.

What contribution must this new product have to make it attractive to produce
this season?
H. Outdoors, Inc., has a chance to sell some of its capacity in tube bending at a
price of $1.50 per hour. If it sells 200 hours at that price, how will this affect
contribution?
I. If Outdoors, Inc., feels that it must produce benches to round out its production
line, what effect will production of benches have on overall contribution?
Adapted from Vatter et al., Quantitative Methods in Management, Irwin, 1978.

Spreadsheet Model

Figure 31.11 Model


A B C D E F G H
1 Outdoors, Inc.
2 Chair Bench Table
3 Number to Build-> 100 100 100 Used Available Slack
4 Resource Tube Bending 1.2 1.7 1.2 410 1000 590
5 Welding 0.8 0 2.3 310 1200 890
6 Tubing 2 3 4.5 950 2000 1050
7 Profits Per Unit $3 $3 $5
8 By Product $300 $300 $500
9 Total $1,100

Figure 31.12 Formulas


A B C D E F G H
1 Outdoors, Inc.
2 Chair Bench Table
3 Number to Build-> 100 100 100 Used Available Slack
4 Resource Tube Bending 1.2 1.7 1.2 =SUMPRODUCT(C$3:E$3,C4:E4) 1000 =G4-F4
5 Welding 0.8 0 2.3 =SUMPRODUCT(C$3:E$3,C5:E5) 1200 =G5-F5
6 Tubing 2 3 4.5 =SUMPRODUCT(C$3:E$3,C6:E6) 2000 =G6-F6
7 Profits Per Unit 3 3 5
8 By Product =C7*C3 =D7*D3 =E7*E3
9 Total =SUMPRODUCT(C3:E3,C7:E7)

Figure 31.13 Solution


A B C D E F G H
1 Outdoors, Inc.
2 Chair Bench Table
3 Number to Build-> 700 0 133.33 Used Available Slack
4 Resource Tube Bending 1.2 1.7 1.2 1000 1000 0
5 Welding 0.8 0.0 2.3 866.67 1200 333.33
6 Tubing 2.0 3.0 4.5 2000 2000 0
7 Profits Per Unit $3 $3 $5
8 By Product $2,100.00 $0.00 $666.67
9 Total $2,766.67

Solver Reports

Figure 31.14 Answer Report


Target Cell (Max)
Cell Name Original Value Final Value
$C$9 Total Chair $1,100 $2,767

Adjustable Cells
Cell Name Original Value Final Value
$C$3 Number to Build-> Chair 100 700
$D$3 Number to Build-> Bench 100 0
$E$3 Number to Build-> Table 100 133.33

Constraints
Cell Name Cell Value Formula Status Slack
$F$4 Tube Bending Used 1000 $F$4<=$G$4 Binding 0
$F$5 Welding Used 866.67 $F$5<=$G$5 Not Binding 333.33
$F$6 Tubing Used 2000 $F$6<=$G$6 Binding 0

Figure 31.15 Sensitivity Report


Adjustable Cells
Final Reduced Objective Allowable Allowable
Cell Name Value Cost Coefficient Increase Decrease
$C$3 Number to Build-> Chair 700 $0.00 $3.00 $2.00 $0.778
$D$3 Number to Build-> Bench 0 -$1.383 $3.00 $1.383 1E+30
$E$3 Number to Build-> Table 133 $0.00 $5.00 $1.75 $2.00

Constraints
Final Shadow Constraint Allowable Allowable
Cell Name Value Price R.H. Side Increase Decrease
$F$4 Tube Bending Used 1000 $1.167 1000 200 466.67
$F$5 Welding Used 866.67 $0.00 1200 1E+30 333.33
$F$6 Tubing Used 2000 $0.80 2000 555.56 333.33
Modeling Marketing
Decisions
32
32.1 ALLOCATING ADVERTISING EXPENDITURES
Figure 32.1 Quick Tour
A B C D E F G H I
1 Quick Tour of Microsoft Excel Solver
2 Month Q1 Q2 Q3 Q4 Total
3 Seasonality 0.9 1.1 0.8 1.2
4
5 Units Sold 3,592 4,390 3,192 4,789 15,962
6 Sales Revenue $143,662 $175,587 $127,700 $191,549 $638,498
7 Cost of Sales 89,789 109,742 79,812 119,718 399,061
8 Gross Margin 53,873 65,845 47,887 71,831 239,437
9
10 Salesforce 8,000 8,000 9,000 9,000 34,000
11 Advertising 10,000 10,000 10,000 10,000 40,000
12 Corp Overhead 21,549 26,338 19,155 28,732 95,775
13 Total Costs 39,549 44,338 38,155 47,732 169,775
14
15 Prod. Profit $14,324 $21,507 $9,732 $24,099 $69,662
16 Profit Margin 10% 12% 8% 13% 11%
17
18 Product Price $40.00
19 Product Cost $25.00
20
21 The following examples show you how to work with the model above to solve for one value or several
22 values to maximize or minimize another value, enter and change constraints, and save a problem model.
23

24 Row Contains Explanation
25 3 Fixed values Seasonality factor: sales are higher in quarters 2 and 4,
26 and lower in quarters 1 and 3.
27
28 5 =35*B3*(B11+3000)^0.5 Forecast for units sold each quarter: row 3 contains
29 the seasonality factor; row 11 contains the cost of
30 advertising.
31
32 6 =B5*$B$18 Sales revenue: forecast for units sold (row 5) times
33 price (cell B18).
34
35 7 =B5*$B$19 Cost of sales: forecast for units sold (row 5) times
36 product cost (cell B19).
37
38 8 =B6-B7 Gross margin: sales revenues (row 6) minus cost of
39 sales (row 7).
40
41 10 Fixed values Sales personnel expenses.
42
43 11 Fixed values Advertising budget (about 6.3% of sales).
44
45 12 =0.15*B6 Corporate overhead expenses: sales revenues (row 6)
46 times 15%.
47

A B C D E F G H I
48 13 =SUM(B10:B12) Total costs: sales personnel expenses (row 10) plus
49 advertising (row 11) plus overhead (row 12).
50
51 15 =B8-B13 Product profit: gross margin (row 8) minus total costs
52 (row 13).
53
54 16 =B15/B6 Profit margin: profit (row 15) divided by sales revenue
55 (row 6).
56
57 18 Fixed values Product price.
58
59 19 Fixed values Product cost.
60
61 This is a typical marketing model that shows sales rising from a base figure (perhaps due to the sales
62 personnel) along with increases in advertising, but with diminishing returns. For example, the first
63 $5,000 of advertising in Q1 yields about 1,092 incremental units sold, but the next $5,000 yields only
64 about 775 units more.
65
66 You can use Solver to find out whether the advertising budget is too low, and whether advertising
67 should be allocated differently over time to take advantage of the changing seasonality factor.
68
69 Solving for a Value to Maximize Another Value
70 One way you can use Solver is to determine the maximum value of a cell by changing another cell. The
71 two cells must be related through the formulas on the worksheet. If they are not, changing the value in
72 one cell will not change the value in the other cell.
73
74 For example, in the sample worksheet, you want to know how much you need to spend on advertising
75 to generate the maximum profit for the first quarter. You are interested in maximizing profit by changing
76 advertising expenditures.
77
78 • On the Tools menu, click Solver. In the Set target cell box, type b15 or
79 select cell B15 (first-quarter profits) on the worksheet. Select the Max option.
80 In the By changing cells box, type b11 or select cell B11 (first-quarter advertising)
81 on the worksheet. Click Solve.
82
83 You will see messages in the status bar as the problem is set up and Solver starts working. After a
84 moment, you'll see a message that Solver has found a solution. Solver finds that Q1 advertising of
85 $17,093 yields the maximum profit $15,093.
86
87 • After you examine the results, select Restore original values and click OK to
88 discard the results and return cell B11 to its former value.
89
90 Resetting the Solver Options
91
92 If you want to return the options in the Solver Parameters dialog box to their original settings so that
93 you can start a new problem, you can click Reset All.
94

A B C D E F G H I
95 Solving for a Value by Changing Several Values
96
97 You can also use Solver to solve for several values at once to maximize or minimize another value. For
98 example, you can solve for the advertising budget for each quarter that will result in the best profits for
99 the entire year. Because the seasonality factor in row 3 enters into the calculation of unit sales in row 5
100 as a multiplier, it seems logical that you should spend more of your advertising budget in Q4 when the
101 sales response is highest, and less in Q3 when the sales response is lowest. Use Solver to determine
102 the best quarterly allocation.
103
104 • On the Tools menu, click Solver. In the Set target cell box, type f15 or select
105 cell F15 (total profits for the year) on the worksheet. Make sure the Max option is
106 selected. In the By changing cells box, type b11:e11 or select cells B11:E11
107 (the advertising budget for each of the four quarters) on the worksheet. Click Solve.
108
109 • After you examine the results, click Restore original values and click OK to
110 discard the results and return all cells to their former values.
111
112 You've just asked Solver to solve a moderately complex nonlinear optimization problem; that is, to find
113 values for the four unknowns in cells B11 through E11 that will maximize profits. (This is a nonlinear
114 problem because of the exponentiation that occurs in the formulas in row 5). The results of this
115 unconstrained optimization show that you can increase profits for the year to $79,706 if you spend
116 $89,706 in advertising for the full year.
117
118 However, most realistic modeling problems have limiting factors that you will want to apply to certain
119 values. These constraints may be applied to the target cell, the changing cells, or any other value that
120 is related to the formulas in these cells.
121
122 Adding a Constraint
123
124 So far, the budget recovers the advertising cost and generates additional profit, but you're reaching a
125 point of diminishing returns. Because you can never be sure that your model of sales response to
126 advertising will be valid next year (especially at greatly increased spending levels), it doesn't seem
127 prudent to allow unrestricted spending on advertising.
128
129 Suppose you want to maintain your original advertising budget of $40,000. Add the constraint to the
130 problem that limits the sum of advertising during the four quarters to $40,000.
131
132 • On the Tools menu, click Solver, and then click Add. The Add Constraint
133 dialog box appears. In the Cell reference box, type f11 or select cell F11
134 (advertising total) on the worksheet. Cell F11 must be less than or equal to $40,000.
135 The relationship in the Constraint box is <= (less than or equal to) by default, so
136 you don't have to change it. In the box next to the relationship, type 40000. Click
137 OK, and then click Solve.
138
139 • After you examine the results, click Restore original values and then click OK
140 to discard the results and return the cells to their former values.
141

A B C D E F G H I
142 The solution found by Solver allocates amounts ranging from $5,117 in Q3 to $15,263 in Q4. Total
143 Profit has increased from $69,662 in the original budget to $71,447, without any increase in the
144 advertising budget.
145
146 Changing a Constraint
147
148 When you use Microsoft Excel Solver, you can experiment with slightly different parameters to decide
149 the best solution to a problem. For example, you can change a constraint to see whether the results
150 are better or worse than before. In the sample worksheet, try changing the constraint on advertising
151 dollars to $50,000 to see what that does to total profits.
152
153 • On the Tools menu, click Solver. The constraint, $F$11<=40000, should
154 already be selected in the Subject to the constraints box. Click Change. In
155 the Constraint box, change 40000 to 50000. Click OK, and then click Solve.
156 Click Keep solver solution and then click OK to keep the results that are
157 displayed on the worksheet.
158
159 Solver finds an optimal solution that yields a total profit of $74,817. That's an improvement of $3,370
160 over the last figure of $71,447. In most firms, it's not too difficult to justify an incremental investment of
161 $10,000 that yields an additional $3,370 in profit, or a 33.7% return on investment. This solution also
162 results in profits of $4,889 less than the unconstrained result, but you spend $39,706 less to get there.
163
164 Saving a Problem Model
165
166 When you click Save on the File menu, the last selections you made in the Solver Parameters
167 dialog box are attached to the worksheet and retained when you save the workbook. However, you
168 can define more than one problem for a worksheet by saving them individually using Save Model in
169 the Solver Options dialog box. Each problem model consists of cells and constraints that you
170 entered in the Solver Parameters dialog box.
171
172 When you click Save Model, the Save Model dialog box appears with a default selection, based
173 on the active cell, as the area for saving the model. The suggested range includes a cell for each
174 constraint plus three additional cells. Make sure that this cell range is an empty range on the
175 worksheet.
176
177 • On the Tools menu, click Solver, and then click Options. Click Save Model.
178 In the Select model area box, type h15:h18 or select cells H15:H18 on the
179 worksheet. Click OK.
180
181 Note You can also enter a reference to a single cell in the Select model area box. Solver will use
182 this reference as the upper-left corner of the range into which it will copy the problem specifications.
183
184
185 To load these problem specifications later, click Load Model on the Solver Options dialog box,
186 type h15:h18 in the Model area box or select cells H15:H18 on the sample worksheet, and then
187 click OK. Solver displays a message asking if you want to reset the current Solver option settings with
188 the settings for the model you are loading. Click OK to proceed.
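The Quick Tour above does all of this with Solver. As an outside-Excel cross-check (a sketch only, not part of the workbook), the code below rebuilds the model from the worksheet formulas (units sold = 35 * seasonality * SQRT(advertising + 3000), 15% overhead, $40 price, $25 cost) and maximizes total profit subject to the $40,000 advertising budget; it should land near the allocation quoted above ($5,117 in Q3, $15,263 in Q4) and a total profit of roughly $71,447.

import numpy as np
from scipy.optimize import minimize

season = np.array([0.9, 1.1, 0.8, 1.2])
salesforce = np.array([8000.0, 8000.0, 9000.0, 9000.0])
price, unit_cost = 40.0, 25.0

def total_profit(adv):
    units = 35 * season * np.sqrt(adv + 3000)     # row 5 of the worksheet
    revenue = price * units                        # row 6
    gross_margin = (price - unit_cost) * units     # row 8
    costs = salesforce + adv + 0.15 * revenue      # rows 10 through 13
    return np.sum(gross_margin - costs)            # row 15, totalled over quarters

res = minimize(lambda a: -total_profit(a), x0=np.full(4, 10000.0),
               bounds=[(0.0, None)] * 4,
               constraints=[{"type": "eq", "fun": lambda a: a.sum() - 40000.0}])
print("Advertising by quarter:", np.round(res.x))
print("Total profit:", round(-res.fun))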

Figure 32.2 Quick Tour Influence Chart

[Influence chart for the SolvSamp.xls Quick Tour model, with the worksheet row number
for each variable: Profit Margin (16), Prod. Profit (15), Gross Margin (8), Cost of
Sales (7), Sales Revenue (6), Corporate Overhead (12), Total Costs (13), Units Sold (5),
Product Cost (19), Seasonality (3), Advertising (11), Product Price (18), Overhead
Rate (15%), Salesforce (10).]
Nonlinear Product
Mix Optimization
33
33.1 DIMINISHING PROFIT MARGIN
Figure 33.1 Product Mix Problem
A B C D E F G H I
1 Example 1: Product mix problem with diminishing profit margin.
2 Your company manufactures TVs, stereos and speakers, using a common parts inventory
3 of power supplies, speaker cones, etc. Parts are in limited supply and you must determine
4 the most profitable mix of products to build. But your profit per unit built decreases with
5 volume because extra price incentives are needed to load the distribution channel.
6
7
8 TV set Stereo Speaker
9 Number to Build-> 100 100 100
10 Part Name Inventory No. Used
11 Chassis 450 200 1 1 0
12 Picture Tube 250 100 1 0 0 Diminishing
13 Speaker Cone 800 500 2 2 1 Returns
14 Power Supply 450 200 1 1 0 Exponent:
15 Electronics 600 400 2 1 1 0.9
16 Profits:
17 By Product $4,732 $3,155 $2,208
18 Total $ 10,095
19
20 This model provides data for several products using common parts, each with a different profit margin
21 per unit. Parts are limited, so your problem is to determine the number of each product to build from the
22 inventory on hand in order to maximize profits.
23

24 Problem Specifications
25
26 Target Cell D18 Goal is to maximize profit.
27
28 Changing cells D9:F9 Units of each product to build.
29
30 Constraints C11:C15<=B11:B15 Number of parts used must be less than or
31 equal to the number of parts in inventory.
32
33 D9:F9>=0 Number to build value must be greater than or
34 equal to 0.
35
36 The formulas for profit per product in cells D17:F17 include the factor ^H15 to show that profit per unit
37 diminishes with volume. H15 contains 0.9, which makes the problem nonlinear. If you change H15 to
38 1.0 to indicate that profit per unit remains constant with volume, and then click Solve again, the
39 optimal solution will change. This change also makes the problem linear.
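A sketch of the same nonlinear model outside Excel is shown below. It is only an illustration: the parts usage and inventory come from Figure 33.1, and the per-unit profit coefficients (about $75, $50, and $35) are inferred from the profits shown there (for example, $4,732 / 100^0.9 ≈ 75 for TV sets). Setting the exponent to 1.0 turns it back into the linear product mix problem, mirroring the note about H15 above.

import numpy as np
from scipy.optimize import minimize

usage = np.array([[1, 1, 0],     # Chassis
                  [1, 0, 0],     # Picture Tube
                  [2, 2, 1],     # Speaker Cone
                  [1, 1, 0],     # Power Supply
                  [2, 1, 1]])    # Electronics
inventory = np.array([450, 250, 800, 450, 600])
profit_coef = np.array([75.0, 50.0, 35.0])    # inferred from Figure 33.1
exponent = 0.9                                # the diminishing-returns factor in H15

def neg_profit(x):
    return -np.sum(profit_coef * np.maximum(x, 0.0) ** exponent)

res = minimize(neg_profit, x0=np.array([100.0, 100.0, 100.0]),
               bounds=[(0.0, None)] * 3,
               constraints=[{"type": "ineq", "fun": lambda x: inventory - usage @ x}])
print("Number to build:", np.round(res.x, 1), "Total profit:", round(-res.fun))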
Integer-Valued
Optimization Models
34
34.1 TRANSPORTATION PROBLEM
Figure 34.1 Transportation Problem
A B C D E F G H I
1 Example 2: Transportation Problem.
2 Minimize the costs of shipping goods from production plants to warehouses near metropolitan demand
3 centers, while not exceeding the supply available from each plant and meeting the demand from each
4 metropolitan area.
5
6 Number to ship from plant x to warehouse y (at intersection):
7 Plants: Total San Fran Denver Chicago Dallas New York
8 S. Carolina 5 1 1 1 1 1
9 Tennessee 5 1 1 1 1 1
10 Arizona 5 1 1 1 1 1
11 --- --- --- --- ---
12 Totals: 3 3 3 3 3
13
14 Demands by Whse --> 180 80 200 160 220
15 Plants: Supply Shipping costs from plant x to warehouse y (at intersection):
16 S. Carolina 310 10 8 6 5 4
17 Tennessee 260 6 5 4 3 6
18 Arizona 280 3 4 5 5 9
19
20 Shipping: $ 83 $19 $17 $15 $13 $19
21
22 The problem presented in this model involves the shipment of goods from three plants to five regional
23 warehouses. Goods can be shipped from any plant to any warehouse, but it obviously costs more to
24 ship goods over long distances than over short distances. The problem is to determine the amounts
25 to ship from each plant to each warehouse at minimum shipping cost in order to meet the regional
26 demand, while not exceeding the plant supplies.
27

28 Problem Specifications
29
30 Target cell B20 Goal is to minimize total shipping cost.
31
32 Changing cells C8:G10 Amount to ship from each plant to each
33 warehouse.
34
35 Constraints B8:B10<=B16:B18 Total shipped must be less than or equal to
36 supply at plant.
37
38 C12:G12>=C14:G14 Totals shipped to warehouses must be greater
39 than or equal to demand at warehouses.
40
41 C8:G10>=0 Number to ship must be greater than or equal
42 to 0.
43
44 You can solve this problem faster by selecting the Assume linear model check box in the Solver
45 Options dialog box before clicking Solve. A problem of this type has an optimum solution at which
46 amounts to ship are integers, if all of the supply and demand constraints are integers.
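The worksheet solves this with Solver. As an outside-Excel illustration of the same linear model (a sketch only), the code below formulates it with SciPy's linprog, flattening the 3 x 5 shipment amounts row by row.

import numpy as np
from scipy.optimize import linprog

cost = np.array([[10, 8, 6, 5, 4],     # S. Carolina to the five warehouses
                 [6, 5, 4, 3, 6],      # Tennessee
                 [3, 4, 5, 5, 9]])     # Arizona
supply = np.array([310, 260, 280])
demand = np.array([180, 80, 200, 160, 220])

A_ub = np.vstack([np.kron(np.eye(3), np.ones((1, 5))),    # total out of each plant <= supply
                  -np.kron(np.ones((1, 3)), np.eye(5))])  # total into each warehouse >= demand
b_ub = np.concatenate([supply, -demand])
res = linprog(cost.ravel(), A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 15, method="highs")
print(res.x.reshape(3, 5))
print("Minimum total shipping cost:", res.fun)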

34.2 MODIFIED TRANSPORTATION PROBLEM


Figure 34.2 Display
A B C D E F G H I
1 Modified Example 2: Transportation Problem.
2
3 Minimize the costs of shipping goods from production plants to warehouses near metropolitan demand
4 centers, while not exceeding the supply available from each plant and meeting the demand from each
5 metropolitan area.
6
7 Number to ship from plant to warehouse
8 Warehouse Shipped Plant
9 Plant San Fran Denver Chicago Dallas New York from plant supply
10 S. Carolina 1 1 1 1 1 5 310
11 Tennessee 1 1 1 1 1 5 260
12 Arizona 1 1 1 1 1 5 280
13 Shipped to warehouse 3 3 3 3 3
14 Warehouse demand 180 80 200 160 220
15
16 Shipping cost from plant to warehouse
17 Warehouse
18 Plant San Fran Denver Chicago Dallas New York
19 S. Carolina $10 $8 $6 $5 $4
20 Tennessee $6 $5 $4 $3 $6
21 Arizona $3 $4 $5 $5 $9
22
23 Total shipping cost $83

Figure 34.3 Formulas


A B C D E F G H I
1 Modified Example 2: Transportation Problem.
2
3 Minimize the costs of shipping goods from production plants to warehouses near metropolitan demand
4 centers, while not exceeding the supply available from each plant and meeting the demand from each
5 metropolitan area.
6
7 Number to ship from plant to warehouse
8 Warehouse Shipped Plant
9 Plant San Fran Denver Chicago Dallas New York from plant supply
10 S. Carolina 1 1 1 1 1 =SUM(C10:G10) 310
11 Tennessee 1 1 1 1 1 =SUM(C11:G11) 260
12 Arizona 1 1 1 1 1 =SUM(C12:G12) 280
13 Shipped to warehouse =SUM(C10:C12) =SUM(D10:D12) =SUM(E10:E12) =SUM(F10:F12) =SUM(G10:G12)
14 Warehouse demand 180 80 200 160 220
15
16 Shipping cost from plant to warehouse
17 Warehouse
18 Plant San Fran Denver Chicago Dallas New York
19 S. Carolina $10 $8 $6 $5 $4
20 Tennessee $6 $5 $4 $3 $6
21 Arizona $3 $4 $5 $5 $9
22
23 Total shipping cost =SUMPRODUCT(C10:G12,C19:G21)

34.3 SCHEDULING PROBLEM


Figure 34.4 Personnel Scheduling
A B C D E F G H I J K L M
1 Example 3: Personnel scheduling for an Amusement Park.
2 For employees working five consecutive days with two days off, find the schedule that meets demand
3 from attendance levels while minimizing payroll costs.
4
5
6 Sch. Days off Employees Sun Mon Tue Wed Thu Fri Sat
7 A Sunday, Monday 0 0 0 1 1 1 1 1
8 B Monday, Tuesday 8 1 0 0 1 1 1 1
9 C Tuesday, Wed. 0 1 1 0 0 1 1 1
10 D Wed., Thursday 10 1 1 1 0 0 1 1
11 E Thursday, Friday 0 1 1 1 1 0 0 1
12 F Friday, Saturday 7 1 1 1 1 1 0 1
13 G Saturday, Sunday 0 0 1 1 1 1 1 0
14
15 Schedule Totals: 25 25 17 17 15 15 18 25
16
17 Total Demand: 22 17 13 14 15 18 24
18
19 Pay/Employee/Day: $40
20 Payroll/Week: $ 1,000
21
22 The goal for this model is to schedule employees so that you have sufficient staff at the lowest cost. In
23 this example, all employees are paid at the same rate, so by minimizing the number of employees working
24 each day, you also minimize costs. Each employee works five consecutive days, followed by two days
25 off.
26
27 Problem Specifications
28
29 Target cell D20 Goal is to minimize payroll cost.
30
31 Changing cells D7:D13 Employees on each schedule.
32
33 Constraints D7:D13>=0 Number of employees must be greater than or equal
34 to 0.
35
36 D7:D13=Integer Number of employees must be an integer.
37
38 F15:L15>=F17:L17 Employees working each day must be greater than or
39 equal to the demand.
40
41 Possible schedules Rows 7-13 1 means employee on that schedule works that day.
42
43 In this example, you use an integer constraint so that your solutions do not result in fractional numbers of
44 employees on each schedule. Selecting the Assume linear model check box in the Solver Options
45 dialog box before you click Solve will greatly speed up the solution process.

Figure 34.5 Personnel Scheduling with Corrections


A B C D E F G H I J K L M
1 Example 3: Personnel scheduling for an Amusement Park. (with corrections)
2 For employees working five consecutive days with two days off, find the schedule that meets demand
3 from attendance levels while minimizing payroll costs.
4
5
6 Sch. Days off Employees Sun Mon Tue Wed Thu Fri Sat
7 A Sunday, Monday 0 0 0 1 1 1 1 1
8 B Monday, Tuesday 8 1 0 0 1 1 1 1
9 C Tuesday, Wed. 0 1 1 0 0 1 1 1
10 D Wed., Thursday 10 1 1 1 0 0 1 1
11 E Thursday, Friday 0 1 1 1 1 0 0 1
12 F Friday, Saturday 7 1 1 1 1 1 0 0
13 G Saturday, Sunday 0 0 1 1 1 1 1 0
14
15 Schedule Totals: 25 25 17 17 15 15 18 18
16
17 Total Demand: 22 17 13 14 15 18 24
18
19 Pay/Employee/Day: $40
20 Payroll/Week: $ 5,000
21
22 The goal for this model is to schedule employees so that you have sufficient staff at the lowest cost. In
23 this example, all employees are paid at the same rate, so by minimizing the number of employees working
24 each day, you also minimize costs. Each employee works five consecutive days, followed by two days
25 off.
26
27 Problem Specifications
28
29 Target cell D20 Goal is to m inim ize payroll cost.
30
31 Changing cells D7:D13 Em ployees on each schedule.
32
33 Constraints D7:D13>=0 Num ber of em ployees m ust be greater than or equal
34 to 0.
35
36 D7:D13=Integer Num ber of em ployees m ust be an integer.
37
38 F15:L15>=F17:L17 Em ployees working each day m ust be greater than or
39 equal to the dem and.
40
41 Possible schedules Rows 7-13 1 m eans em ployee on that schedule works that day.
42
43 In this example, you use an integer constraint so that your solutions do not result in fractional numbers of
44 employees on each schedule. Selecting the Assume linear model check box in the Solver Options
45 dialog box before you click Solve will greatly speed up the solution process.
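As a quick arithmetic check of the corrected worksheet: 25 employees each working five days at $40 per employee per day gives a weekly payroll of 25 x 5 x $40 = $5,000, matching cell D20 here. The $1,000 in Figure 34.4 equals 25 x $40, so the correction is presumably a payroll formula that now includes the five working days (for example, =D15*5*D19 instead of =D15*D19; the exact formulas are not reproduced in either figure). The other correction is to schedule F, whose Saturday indicator is now 0 to match its Friday and Saturday days off; the values shown are starting values rather than a Solver solution, since the Saturday total of 18 is still below the Saturday demand of 24.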


Chapter 35 Optimization Models for Finance Decisions
35.1 WORKING CAPITAL MANAGEMENT PROBLEM
Figure 35.1 Working Capital Management
A B C D E F G H I J
1 Example 4: Working Capital Management.
2 Determine how to invest excess cash in 1-month, 3-month and 6-month CDs so as to
3 maximize interest income while meeting company cash requirements (plus safety margin).
4
5 Yield Term Purchase CDs in months:
6 1-mo CDs: 1.0% 1 1, 2, 3, 4, 5 and 6 Interest
7 3-mo CDs: 4.0% 3 1 and 4 Earned:
8 6-mo CDs: 9.0% 6 1 Total $ 7,700
9
10 Month: Month 1 Month 2 Month 3 Month 4 Month 5 Month 6 End
11 Init Cash: $400,000 $205,000 $216,000 $237,000 $158,400 $109,400 $125,400
12 Matur CDs: 100,000 100,000 110,000 100,000 100,000 120,000
13 Interest: 1,000 1,000 1,400 1,000 1,000 2,300
14 1-mo CDs: 100,000 100,000 100,000 100,000 100,000 100,000
15 3-mo CDs: 10,000 10,000
16 6-mo CDs: 10,000
17 Cash Uses: 75,000 (10,000) (20,000) 80,000 50,000 (15,000) 60,000
18 End Cash: $205,000 $216,000 $237,000 $158,400 $109,400 $125,400 $187,700
19
20 -290000
21
22 If you're a financial officer or a manager, one of your tasks is to manage cash and short-term investments in a
23 way that maximizes interest income, while keeping funds available to meet expenditures. You must trade off
24 the higher interest rates available from longer-term investments against the flexibility provided by keeping funds
25 in short-term investments.
26
27 This model calculates ending cash based on initial cash (from the previous month), inflows from maturing
28 certificates of deposit (CDs), outflows for new CDs, and cash needed for company operations for each month.
29
30 You have a total of nine decisions to make: the amounts to invest in one-month CDs in months 1 through 6;
31 the amounts to invest in three-month CDs in months 1 and 4; and the amount to invest in six-month CDs in
32 month 1.
33

A B C D E F G H I J
34 Problem Specifications
35
36 Target cell H8 Goal is to maximize interest earned.
37
38 Changing cells B14:G14 Dollars invested in each type of CD.
39 B15, E15, B16
40
41 Constraints B14:G14>=0 Investment in each type of CD must be greater than
42 B15:B16>=0 or equal to 0.
43 E15>=0
44
45 B18:H18>=100000 Ending cash must be greater than or equal to
46 $100,000.
47
48 The optimal solution determined by Solver earns a total interest income of $16,531 by investing as much as
49 possible in six-month and three-month CDs, and then turning to one-month CDs. This solution satisfies all of the
50 constraints.
51
52 Suppose, however, that you want to guarantee that you have enough cash in month 5 for an equipment
53 payment. Add a constraint that the average maturity of the investments held in month 1 should not be more
54 than four months.
55
56 The formula in cell B20 computes a total of the amounts invested in month 1 (B14, B15, and B16), weighted
57 by the maturities (1, 3, and 6 months), and then it subtracts from this amount the total investment, weighted by
58 4. If this quantity is zero or less, the average maturity will not exceed four months. To add this constraint,
59 restore the original values and then click Solver on the Tools menu. Click Add. Type b20 in the Cell
60 Reference box, keep the default <= relation, type 0 in the Constraint box, and then click OK. To solve the problem, click Solve.
61
62 To satisfy the four-month maturity constraint, Solver shifts funds from six-month CDs to three-month CDs. The
63 shifted funds now mature in month 4 and, according to the present plan, are reinvested in new three-month
64 CDs. If you need the funds, however, you can keep the cash instead of reinvesting. The $56,896 turning
65 over in month 4 is more than sufficient for the equipment payment in month 5. You've traded about $460 in
66 interest income to gain this flexibility.
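The figure displays values only; the following is a sketch of the likely formulas, assuming the layout shown above (columns B through H for Month 1 through the End column):

    Ending cash, cell B18:            =B11+B12+B13-B14-B15-B16-B17, copied across to C18:H18

that is, initial cash plus maturing CDs plus interest, minus new CD purchases and cash used in operations, with each month's initial cash equal to the previous month's ending cash.

    Average-maturity test, cell B20:  =1*B14+3*B15+6*B16-4*(B14+B15+B16)

With the month-1 amounts shown ($100,000 in one-month, $10,000 in three-month, and $10,000 in six-month CDs), this evaluates to 190,000 - 480,000 = -290,000, the value displayed in row 20, and the added constraint B20<=0 keeps the dollar-weighted average maturity of the month-1 investments at four months or less.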

35.2 WORK CAP ALTERNATE FORMULATIONS


Figure 35.2 Working Capital Management Horizontal Time

Figure 35.3 Working Capital Management Vertical Time



35.3 STOCK PORTFOLIO PROBLEM


Figure 35.4 Efficient Stock Portfolio
A B C D E F G H I J K
1 Example 5: Efficient stock portfolio.
2 Find the weightings of stocks in an efficient portfolio that maximizes the portfolio rate of
3 return for a given level of risk. This worksheet uses the Sharpe single-index model; you
4 can also use the Markowitz method if you have covariance terms available.
5
6 Risk-free rate 6.0% Market variance 3.0%
7 Market rate 15.0% Maximum weight 100.0%
8
9 Beta ResVar Weight *Beta *Var.
10 Stock A 0.80 0.04 20.0% 0.160 0.002
11 Stock B 1.00 0.20 20.0% 0.200 0.008
12 Stock C 1.80 0.12 20.0% 0.360 0.005
13 Stock D 2.20 0.40 20.0% 0.440 0.016
14 T-bills 0.00 0.00 20.0% 0.000 0.000
15
16 Total 100.0% 1.160 0.030
17 Return Variance
18 Portfolio Totals: 16.4% 7.1%
19
20 Maximize Return: A21:A29 Minimize Risk: D21:D29
21 0.1644 0.07077
22 5 5
23 TRUE TRUE
24 TRUE TRUE
25 TRUE TRUE
26 TRUE TRUE
27 TRUE TRUE
28 TRUE TRUE
29 TRUE TRUE
30
31 One of the basic principles of investment management is diversification. By holding a portfolio of several
32 stocks, for example, you can earn a rate of return that represents the average of the returns from the
33 individual stocks, while reducing your risk that any one stock will perform poorly.
34
35 Using this model, you can use Solver to find the allocation of funds to stocks that minimizes the portfolio
36 risk for a given rate of return, or that maximizes the rate of return for a given level of risk.
37
38 This worksheet contains figures for beta (market-related risk) and residual variance for four stocks. In
39 addition, your portfolio includes investments in Treasury bills (T-bills), assumed to have a risk-free rate of
40 return and a variance of zero. Initially equal amounts (20 percent of the portfolio) are invested in each
41 security.
42
43 Use Solver to try different allocations of funds to stocks and T-bills to either maximize the portfolio rate of
44 return for a specified level of risk or minimize the risk for a given rate of return. With the initial allocation
45 of 20 percent across the board, the portfolio return is 16.4 percent and the variance is 7.1 percent.
46

A B C D E F G H I J K
47 Problem Specifications
48
49 Target cell E18 Goal is to maximize portfolio return.
50
51 Changing cells E10:E14 Weight of each stock.
52
53 Constraints E10:E14>=0 Weights must be greater than or equal to 0.
54
55 E16=1 Sum of weights must equal 1.
56
57 G18<=0.071 Variance must be less than or equal to 0.071.
58
59 Beta for each stock B10:B13
60
61 Variance for each stock C10:C13
62
63 Cells D21:D29 contain the problem specifications to minimize risk for a required rate of return of 16.4
64 percent. To load these problem specifications into Solver, click Solver on the Tools menu, click
65 Options, click Load Model, select cells D21:D29 on the worksheet, and then click OK until the
66 Solver Parameters dialog box is displayed. Click Solve. As you can see, Solver finds portfolio
67 allocations in both cases that surpass the rule of 20 percent across the board.
68
69 You can earn a higher rate of return (17.1 percent) for the same risk, or you can reduce your risk without
70 giving up any return. These two allocations both represent efficient portfolios.
71
72 Cells A21:A29 contain the original problem model. To reload this problem, click Solver on the Tools
73 menu, click Options, click Load Model, select cells A21:A29 on the worksheet, and then click OK.
74
75 Solver displays a message asking if you want to reset the current Solver option settings with the settings
76 for the model you are loading. Click OK to proceed.
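The Portfolio Totals in the figure are consistent with the Sharpe single-index model. Under that model, the portfolio beta is the sum of each security's weight times its beta; the portfolio return is the risk-free rate plus the portfolio beta times the difference between the market rate and the risk-free rate; and the portfolio variance is the portfolio beta squared times the market variance, plus the sum of each squared weight times that security's residual variance. As a check on the initial 20-percent-each allocation: portfolio beta = 0.2 x (0.80 + 1.00 + 1.80 + 2.20 + 0.00) = 1.16; return = 6% + 1.16 x (15% - 6%) = 16.4%; variance = 1.16^2 x 3% + 0.2^2 x (0.04 + 0.20 + 0.12 + 0.40) = 4.04% + 3.04% = 7.1% (rounded), matching the Portfolio Totals row. In the worksheet these quantities would typically be computed with SUMPRODUCT; for example, a weighted-beta formula such as =SUMPRODUCT(E10:E14,B10:B14) is plausible given the layout, although the workbook's exact formulas are not reproduced in the figure.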

35.4 MONEYCO PROBLEM


Figure 35.5 Display
A B C D E F G H I J K L M
1 Return on investments
2 CD rate = 0.06
3 A B C D E CD1 CD2 CD3
4 Time 1 -1.00 -1.00 -1.00 -1.00
5 Time 2 1.15 -1.00 1.06 -1.00
6 Time 3 1.28 -1.00 1.06 -1.00
7 Time 4 1.40 1.15 1.32 1.06
8
9 Max to invest $500 $500 $500 $500 $500 $1,000,000 $1,000,000 $1,000,000
10
11 Amount Invested $100 $100 $100 $100 $100 $100 $100 $100 Feasible
12
13 Cash flows from investments Cash in Cash out
14 Time 1 -$100 -$100 -$100 $0 $0 -$100 $0 $0 $1,000 $600
15 Time 2 $0 $115 $0 $0 -$100 $106 -$100 $0 $0 $21
16 Time 3 $0 $0 $128 -$100 $0 $0 $106 -$100 $0 $34
17 Time 4 $140 $0 $0 $115 $132 $0 $0 $106 $493 Final balance
18
19
20 Legend
21
22 data cells input assumptions, uncontrollable, constraints Defined Names
23 Amount_Invested = $B$11:$I$11
24 changing cells decision variables, controllable Cash_out = $K$14:$K$17
25 Final_balance = $K$17
26 computed cells intermediate and output variables, target Max_to_invest = $B$9:$I$9
27

Figure 35.6 Formulas


A B C D E F G H I J K L
1 Return on investments
2 CD rate = 0.06
3 A B C D E CD1 CD2 CD3
4 Time 1 -1.00 -1.00 -1.00 -1.00
5 Time 2 1.15 -1.00 =1+$B$2 -1.00
6 Time 3 1.28 -1.00 =1+$B$2 -1.00
7 Time 4 1.40 1.15 1.32 =1+$B$2
8
9 Max to invest $500 $500 $500 $500 $500 $1,000,000 $1,000,000 $1,000,000
10
11 Amount Invested $100 $100 $100 $100 $100 $100 $100 $100 =IF(AND(Amount_Invested<
12
13 Cash flows from investments Cash in Cash out
14 Time 1 =B4*B$11 =C4*C$11 =D4*D$11 =E4*E$11 =F4*F$11 =G4*G$11 =H4*H$11 =I4*I$11 $1,000 =SUM(B14:J14)
15 Time 2 =B5*B$11 =C5*C$11 =D5*D$11 =E5*E$11 =F5*F$11 =G5*G$11 =H5*H$11 =I5*I$11 $0 =SUM(B15:J15)
16 Time 3 =B6*B$11 =C6*C$11 =D6*D$11 =E6*E$11 =F6*F$11 =G6*G$11 =H6*H$11 =I6*I$11 $0 =SUM(B16:J16)
17 Time 4 =B7*B$11 =C7*C$11 =D7*D$11 =E7*E$11 =F7*F$11 =G7*G$11 =H7*H$11 =I7*I$11 =SUM(B17:J17) Final balanc
18
19 Array-entered (Control+Shift+Enter) formula in K11: =IF(AND(Amount_Invested<=Max_to_invest,Cash_out>=0),"Feasible","Not Feasible")
20 Enter =B4*B$11 in cell B14 and copy to cells B14:I17
21 Enter =SUM(B14:J14) in cell K14 and copy to K14:K17

Figure 35.7 Solver Dialog Box
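Although the dialog box contents are not reproduced here as text, a Solver setup consistent with the defined names in Figure 35.5 and the feasibility test in cell K11 would be: set target cell Final_balance (K17) to be maximized; use Amount_Invested (B11:I11) as the changing cells; and add the constraints Amount_Invested <= Max_to_invest and Cash_out >= 0 (and, typically, Amount_Invested >= 0), so that each investment stays within its limit in row 9 and the net cash position in column K is nonnegative in every period. This mirrors the array-entered formula in K11, which labels a trial set of investments Feasible only when both conditions hold.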


Appendix
Excel for the Macintosh

The step-by-step instructions and screen shots in this book are based on Excel 2002
(Office XP). This appendix describes some differences between Excel 2002 on Windows
and Excel on the Macintosh.
If you are using Excel on an Apple Macintosh computer, first learn the Macintosh
graphical user interface, the basic features of the operating system, and the online help.
For example, to get answers to your questions about using Mac OS X, choose Mac Help
from the Help menu, type your question, and press the Return key.

The Shortcut Menu


One difference you will encounter frequently is how to open the shortcut menu: Windows users press the right mouse button, while Macintosh users either hold down the Control (Ctrl) key and click the mouse button or hold down the Option and Command keys and click the mouse button. This distinction matters because this book makes frequent use of double-clicking, right-clicking, and shortcut menus.

Relative and Absolute References


When entering a formula, with the insertion point in a cell reference, Windows users press the F4 key to cycle through the four combinations of relative and absolute references; Macintosh users, particularly on keyboards without function keys, substitute Command-T for F4.
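For example, if the insertion point is in the reference A1 within a formula, pressing F4 (or Command-T) changes it to $A$1; pressing again gives A$1, then $A1, and then back to A1.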




