

CHAPTER 4: REGRESSION
MODELS


Discussion Questions and Problems

4.1 What is the meaning of least squares in a regression model?


"Least squares" refers to how the regression line is chosen. When given data points, they are first plotted on a graph, creating a scatter plot. Many lines could be drawn through these points to describe the relationship between the variables. The best regression line is the one with the least error, the least-squares regression line: the line with the minimum sum of squared errors. Why must the errors be squared? Error = actual value − predicted value. Since errors can be both positive and negative, they could cancel each other out when summed together. To avoid this, the errors are squared before summing.
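The idea can be sketched in a few lines of Python. The data points and candidate lines below are hypothetical, chosen only to illustrate that raw errors cancel while squared errors do not:

```python
# Hypothetical data points (x, actual y), for illustration only.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

def sse(b0, b1):
    # Error = actual value - predicted value; squaring keeps
    # positive and negative errors from canceling each other out.
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

# Raw (unsquared) errors for the line y = 0 + 2x sum to ~0 even though
# the line misses every point -- this is why squaring is needed.
raw_error_sum = sum(y - (0 + 2 * x) for x, y in points)

# Among several candidate (intercept, slope) lines, the least-squares
# choice is the one with the minimum sum of squared errors.
candidates = [(0.0, 2.0), (0.5, 1.5), (0.1, 1.95)]
best = min(candidates, key=lambda c: sse(*c))
```

Here `best` is the candidate line with the smallest SSE, which is what a full least-squares fit finds over all possible lines rather than a short list.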

4.2 Discuss the use of dummy variables in regression analysis.


In regression analysis it is often assumed that all the variables are quantitative, for example how many years of experience an employee has or how old a house is. However, qualitative variables may also be examined, and this is what a dummy variable represents. A dummy variable can be used to indicate, say, whether an employee has a college degree or what condition a house is in at the time of sale. A dummy variable is therefore not measured on a numeric scale and is also called an indicator variable or binary variable. The number of dummy variables must be one less than the number of categories of the qualitative variable. For example, when deciding whether an employee with a college education is better for the company overall:
X3 = 1 if the employee has a college degree
X3 = 0 otherwise

This variable can then be included in the regression equation to determine whether employees with a college education are more productive for the company than those without one.
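A small sketch of this coding in Python. The employee records are hypothetical; only the 0/1 coding mirrors the X3 variable above:

```python
# Hypothetical employee records with one quantitative and one qualitative variable.
employees = [
    {"experience": 5, "degree": "yes"},
    {"experience": 3, "degree": "no"},
    {"experience": 8, "degree": "yes"},
]

def encode(emp):
    # X3 = 1 if the employee has a college degree, 0 otherwise.
    # A qualitative variable with c categories needs c - 1 dummies;
    # "degree" has 2 categories, so a single dummy suffices.
    x3 = 1 if emp["degree"] == "yes" else 0
    return (emp["experience"], x3)

rows = [encode(e) for e in employees]
```

The encoded `rows` can then be fed to any regression routine alongside the quantitative variables.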

4.3 Discuss how the coefficient of determination and the coefficient of correlation are
related and how they are used in regression analysis.
Both the coefficient of determination and the coefficient of correlation measure the strength of the linear relationship. The coefficient of determination, r², is the proportion of the variability in Y that is explained by the regression equation. It ranges from 0 to 1, and the closer it is to 1, the stronger the relationship. The coefficient of correlation, r, ranges from −1 to +1 and can be found by taking the square root of the coefficient of determination. A positive r is associated with a positive slope, and a negative r with a negative slope.
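The relationship between the two coefficients can be sketched directly, here using the SSR, SST, and slope values from Problem 4.13 later in this set:

```python
import math

def determination_and_correlation(ssr, sst, slope):
    # r^2 is the proportion of variability in Y explained by the regression.
    r_squared = ssr / sst
    # r is the square root of r^2; its sign follows the sign of the slope.
    r = math.copysign(math.sqrt(r_squared), slope)
    return r_squared, r

# Values from Problem 4.13 below: SSR = 845.659, SST = 998, slope = +0.74.
r2, r = determination_and_correlation(ssr=845.659, sst=998, slope=0.74)
```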

4.4 Explain how a scatter diagram can be used to identify the type of regression to use.
A scatter diagram is a graph of the data. It can be used to identify the relationship between the variables: from the plotted points one can judge whether the relationship is linear or nonlinear, or whether there is no relationship at all. Specifics such as whether the relationship is positive or negative can also be assessed from the scatter diagram.


4.5
In a regression model, when it is not clear which variables should be included in or excluded from the model, the adjusted r² should be used. It helps in determining whether the addition of more variables to the regression model is useful. It gives a better picture than the ordinary r² because it begins to fall when more than the required number of variables is added to the model, thus helping to decide how many variables should be used.
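The adjusted r² formula is a one-liner; the sketch below checks it against the Excel summary outputs that appear in Problem 4.24 later in this set (17 observations, three- and two-variable models):

```python
def adjusted_r_squared(r2, n, k):
    # n = number of observations, k = number of independent variables.
    # The (n - 1) / (n - k - 1) factor penalizes each extra variable,
    # so adjusted r^2 falls when an unhelpful variable is added.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# From Problem 4.24: r^2 = 0.886137 with k = 3, r^2 = 0.885616 with k = 2.
adj_three_vars = adjusted_r_squared(0.886137, n=17, k=3)  # ~0.859861
adj_two_vars = adjusted_r_squared(0.885616, n=17, k=2)    # ~0.869276
```

Note that the two-variable model has the higher adjusted r² even though its ordinary r² is slightly lower, which is exactly why the bedrooms variable is dropped in 4.24.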

4.6
The F test is used to determine whether there is a relationship between the dependent variable and the independent variables. A large F value (with a correspondingly small significance level) shows that the regression model is significant and that a relationship exists, while a low F value shows that the model is not significant.
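The decision rule can be sketched as a comparison against a critical value from an F table. The critical values below are standard 0.05-level table entries (F with 1 and 4, and 1 and 7, degrees of freedom), applied to the F statistics from Problems 4.12 and 4.14 in this set:

```python
def is_significant(f_computed, f_critical):
    # The model is significant when the computed F exceeds the critical value.
    return f_computed > f_critical

# Problem 4.12: F = 5.83 vs F(1, 4) = 7.71 -> not significant.
drum_model = is_significant(5.83, 7.71)
# Problem 4.14: F = 38.9 vs F(1, 7) = 5.59 -> significant.
grade_model = is_significant(38.9, 5.59)
```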

4.10

a)
[Scatter diagram of Green Shades TV appearances (X) against demand for bass drums (Y); chart omitted.]

b)

X    Y     Ŷ = 1 + X    (Y − Ȳ)²    (Y − Ŷ)²    (Ŷ − Ȳ)²
3    3     4            12.25       1           6.25
4    6     5            0.25        1           2.25
7    7     8            0.25        1           2.25
6    5     7            2.25        4           0.25
8    10    9            12.25       1           6.25
5    8     6            2.25        4           0.25

Ȳ = 6.5
SST = 29.5    SSE = 12.0    SSR = 17.5
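A quick check of these sums of squares in Python, using the data and the fitted model Ŷ = 1 + X from above:

```python
# Problem 4.10 data: bass drum demand (y) and TV appearances (x).
y = [3, 6, 7, 5, 10, 8]
x = [3, 4, 7, 6, 8, 5]

y_bar = sum(y) / len(y)          # mean demand, 6.5
y_hat = [1 + xi for xi in x]     # predictions from Y-hat = 1 + X

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
```

The identity SST = SSR + SSE holds exactly for this data.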

c)
If the Green Shades performed on TV 6 times during the last month, then using the regression model Ŷ = 1 + 1X, the demand for bass drums is Ŷ = 1 + 1(6) = 7 drums.

4.12

ANOVA
              df    SS      MS      F Value    Significance F
Regression     1    17.5    17.5    5.83       0.073
Residual       4    12.0    3.0
Total          5    29.5

              Coefficients    Standard Error    t Stat    P-value
Intercept     1               2.385             0.419     0.697
Sales (X)     1               0.414             2.415     0.073

Regression line: Ŷ = 1 + X


There is no statistical significance at the 0.05 level, since the p-value (0.073) is greater than 0.05.
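The printout's t statistics can be checked directly, since t = coefficient / standard error, and for a single independent variable F is the square of the slope's t statistic (a quick sketch):

```python
# Problem 4.12 printout values: coefficient / standard error = t statistic.
t_intercept = 1 / 2.385   # ~0.419, matches the printout
t_slope = 1 / 0.414       # ~2.415, matches the printout

# With one independent variable, F = t^2 for the slope.
f_from_t = t_slope ** 2   # ~5.83, matching the ANOVA table's F value
```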

4.13

A)

Regression model:

( x x)( y y )
( x x )^ 2

(Y)

(X)

( x x )( y y )

( x x )^ 2

93

98

236.444

285.23

7
CHAPTER 4: REGRESSION
MODELS

78

77

4.111

16.901

84

88

34.444

47.457

73

80

6.667

1.235

84

96

74.444

221.679

64

61

301.667

404.457

64

66

226.667

228.346

95

95

222.222

192.901

76

69

36.333

146.679

711

730

1143

1544.9

1st grade average X = 730


Final average Y = 711

1143/1544.9=0.74

(711/9)-0.74(730/9) = 18.99

= 18.99 + 0.74x
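The slope and intercept can be recomputed from the raw grades as a check (a sketch of the least-squares formulas used above):

```python
# Problem 4.13 data.
x = [98, 77, 88, 80, 96, 61, 66, 95, 69]   # first test grade
y = [93, 78, 84, 73, 84, 64, 64, 95, 76]   # final average
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of cross-products and squared deviations from the formula for b1.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # ~1143
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # ~1544.9

b1 = sxy / sxx            # slope, ~0.74
b0 = y_bar - b1 * x_bar   # intercept, ~18.99
```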

B)

Ŷ = 18.99 + 0.74(83) = 80.41


C)

(y − ȳ)²     (ŷ − ȳ)²
196          156.135
1            9.252
25           25.977
36           0.676
25           121.345
225          221.396
225          124.994
256          105.592
9            80.291
SST = 998    SSR = 845.659

Given the formulas SST = Σ(y − ȳ)² and SSR = Σ(ŷ − ȳ)², SST equals 998 and SSR equals 845.659.

r² = SSR/SST = 845.659/998 = 0.8473 ≈ 0.85
r = √0.8473 = 0.92048 ≈ 0.92
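These sums of squares, and the resulting r², can be recomputed from the raw grades (a sketch, refitting the line from part A first):

```python
# Problem 4.13 data, refit so the block is self-contained.
x = [98, 77, 88, 80, 96, 61, 66, 95, 69]   # first test grade
y = [93, 78, 84, 73, 84, 64, 64, 95, 76]   # final average
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares, 998
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # regression sum of squares, ~845.66

r_squared = ssr / sst    # ~0.8473
r = r_squared ** 0.5     # ~0.92
```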


4.14

MSE = SSE/(n − k − 1)

(y − ȳ)²     (ŷ − ȳ)²         (y − ŷ)²
196          156.135          2.2617
1            9.252            4.1689
25           25.977           0.0009
36           0.676            28.8106
25           121.345          36.1959
225          221.396          0.0143
225          124.994          14.5871
256          105.592          32.7596
9            80.291           35.5335
SST = 998    SSR = 845.659    SSE = 152.341

MSE = 152.341/(9 − 1 − 1) = 21.76
MSR = SSR/k = 845.659/1 = 845.659
F = 845.659/21.76 = 38.9

The critical value F(1, 7) at the 0.05 level is 5.59. Since 38.9 > 5.59, the result is statistically significant: there is a relationship between the first test grade and the final average from Problem 4.13.
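The whole F test can be recomputed from the raw grades (a sketch, refitting the line first so the block stands alone):

```python
# Problem 4.13 data.
x = [98, 77, 88, 80, 96, 61, 66, 95, 69]   # first test grade
y = [93, 78, 84, 73, 84, 64, 64, 95, 76]   # final average
n, k = len(x), 1                            # 9 observations, 1 independent variable
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sst - ssr               # ~152.34

mse = sse / (n - k - 1)       # ~21.76
msr = ssr / k                 # ~845.66
f = msr / mse                 # ~38.9, well above the critical F(1, 7) = 5.59
```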

4.16


The formula is Ŷ = 13,473 + 37.65x

A)
x = 1,860
Ŷ = 13,473 + 37.65(1,860) = $83,502
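The arithmetic can be checked by evaluating the fitted model directly:

```python
# Problem 4.16 model: predicted selling price from square footage.
def predicted_price(sq_feet):
    return 13473 + 37.65 * sq_feet

price = predicted_price(1860)   # $83,502
```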

B)
The predicted selling price for the 1,860-square-foot house is $83,502. This prediction is based on the other homes sold within the neighborhood, not just on the square footage, so an individual house may sell either below or above the predicted price.

C) Other variables that may influence the selling price include location, the number of bedrooms, the number of bathrooms, and whether the house is one story or two; these could also be included in the model.

D) The coefficient of determination for the model is r² = 0.63² = 0.3969.

4.17

A)
Days out of town = x1 = 5 days
Distance traveled = x2 = 300 miles

Ŷ = $90.00 + $48.50(5) + $0.40(300)

Expected expenses = $452.50
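Evaluating the multiple regression model directly confirms the figure:

```python
# Problem 4.17 model: expenses from days out of town (x1) and miles traveled (x2).
def expected_expenses(days, miles):
    return 90.00 + 48.50 * days + 0.40 * miles

expenses = expected_expenses(days=5, miles=300)   # $452.50
```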


B)
The reimbursement request was for $685, while the expected expenses were $452.50. Since the expected expenses were less than the amount requested, Williams should provide receipts for the 5-day trip to justify the high expenses.

C)
Travel expenses depend on many variables, including food, gas, rental vehicles, and hotels; business trips may also involve meetings, conferences, or events. Only 46% of the cost is accounted for under the proposed model, so it is not efficient: the remaining percentage is due to these other variables.

4.24
Use the data in Problem 4-22 and develop a regression model to predict
selling price based on the square footage, number of bedrooms, and age.
Use this to predict the selling price of a 10-year-old, 2,000-square-foot
house with 3 bedrooms.


SUMMARY OUTPUT (square footage, bedrooms, and age)

Regression Statistics
Multiple R           0.941348
R Square             0.886137
Adjusted R Square    0.859861
Standard Error       13439.77
Observations         17

ANOVA
              df    SS          MS          F           Significance F
Regression     3    1.83E+10    6.09E+09    33.72406    2.12E-06
Residual      13    2.35E+09    1.81E+08
Total         16    2.06E+10

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept     82185.65        23008.77          3.571928    0.00341     32478.23     131893.1
Sq Footage    25.94076        9.583037          2.706946    0.017955    5.237866     46.6437
Bedrooms      -2151.749       8826.087          -0.2438     0.811196    -21219.3     16915.86
Age Years     -1711.54        327.1908          -5.231      0.000162    -2418.39     -1004.68

SUMMARY OUTPUT (square footage and age)

Regression Statistics
Multiple R           0.941072
R Square             0.885616
Adjusted R Square    0.869276
Standard Error       12980.45
Observations         17

ANOVA
              df    SS          MS          F           Significance F
Regression     2    1.83E+10    9.13E+09    54.19754    2.56E-07
Residual      14    2.36E+09    1.68E+08
Total         16    2.06E+10

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept     79391.75        19269.82          4.120006    0.001041    38062.11     120721.4
Sq Footage    24.31904        6.66242           3.65018     0.002624    10.02957     38.60851
Age Years     -1712.21        315.9977          -5.4184     9.06E-05    -2389.96     -1034.46

The p-value of bedrooms is 0.81 > 0.15, so I will exclude it.

The new formula is Selling price = 79,391 + 24(sq feet) − 1,712(years)

= 79,391 + (24 × 2,000) − (1,712 × 10)
= 79,391 + 48,000 − 17,120
= $110,271
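Evaluating the reduced model with the rounded coefficients used above:

```python
# Problem 4.24 reduced model (rounded coefficients): price from size and age.
def selling_price(sq_feet, age):
    return 79391 + 24 * sq_feet - 1712 * age

price = selling_price(sq_feet=2000, age=10)   # $110,271
```

Using the unrounded coefficients from the summary output would shift the prediction slightly, but the conclusion is the same.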

4.27
A sample of 20 automobiles was taken, and the miles per gallon (MPG),
horsepower, and the total weight were recorded. Develop a linear
regression model to predict MPG, using horsepower as the only independent
variable. Develop another model with weight as the independent variable.
Which of these two models is better? Explain.


The Horsepower vs. MPG model is the better predictor because its R² value of 0.7702 is higher than the weight model's 0.7326. Being closer to 1, more of the variation in MPG is explained, the points lie closer to the regression line, and there is less error with this set of data.
4.29

Use the data in problem 4-27 to find the best quadratic regression model.
(There is more than one to consider.) How does this compare to the models
in 4-27?
a.

Based on the R² values, the horsepower quadratic appears to be the most predictive, with the least room for error: its R² of 0.8096 is the highest and the closest to 1.

Horsepower quadratic: Ŷ = 66.84 − 0.5769x + 0.0016x²
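The fitted quadratic can be evaluated directly. The 100-hp input below is a hypothetical example, not a data point from the problem; the coefficients are the rounded values from the summary output:

```python
# Problem 4.29 horsepower quadratic (rounded coefficients from the output).
def mpg_quadratic(hp):
    return 66.84 - 0.5769 * hp + 0.0016 * hp ** 2

mpg = mpg_quadratic(100)   # ~25.15 MPG for a hypothetical 100-hp engine
```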

SUMMARY OUTPUT (horsepower quadratic)

Regression Statistics
Multiple R           0.899791
R Square             0.809624
Adjusted R Square    0.785827
Standard Error       3.960713
Observations         19

ANOVA
              df    SS          MS          F           Significance F
Regression     2    1067.425    533.7125    34.02206    1.73E-06
Residual      16    250.996     15.68725
Total         18    1318.421

                Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept       66.84044        10.20438          6.550173    6.68E-06    45.20813     88.47275
Horsepower      -0.576918       0.213474          -2.7025     0.015692    -1.029466    -0.12436
Horsepower^2    0.001603        0.00105           1.526283    0.146463    -0.000623    0.00383

Weight Quadratic

SUMMARY OUTPUT (weight quadratic)

Regression Statistics
Multiple R           0.880033
R Square             0.774458
Adjusted R Square    0.746265
Standard Error       4.311027
Observations         19

ANOVA
              df    SS          MS          F           Significance F
Regression     2    1021.062    510.5309    27.47012    6.7E-06
Residual      16    297.3593    18.58495
Total         18    1318.421

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept     84.20719        14.66184          5.743287    3.02E-05    53.12547     115.2889
Weight        -0.030712       0.010187          -3.01482    0.00828     -0.05231     -0.00912
Weight^2      3.45E-06        1.7E-06           2.037323    0.05851     -1.4E-07     7.05E-06


Weight quadratic: Ŷ = 84.207 − 0.0307x + 0.00000345x²

Case Study

-The first concern is the maintenance cost. As the age of the aircraft increases, so does the
cost of maintenance.
-Maintenance should be looked at for both airlines as maintenance costs vary greatly.
-The data provided are not sufficient to draw firm conclusions. Maintenance cost seems to depend on the airline rather than on the age of the aircraft.

26
CHAPTER 4: REGRESSION
MODELS

-Northern Airline appears more efficient as the cost of maintenance has little variation
from year to year.
-Southeast Airline shows a steady increase in engine and airframe maintenance costs.

Based on the overall data, Southeast Airline seems more efficient at repairs that require immediate attention. Northern Airline, with its higher but steadier costs, is better suited for preventative maintenance.
