
Multiple Regression

MULTIPLE REGRESSION

• We now assume that k independent variables are
potentially related to the dependent variable
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$
where $\varepsilon$ is the error variable (deviation between
predicted value and actual value of y)
• Graphically, for k = 2 the regression equation
describes a plane. For k > 2 it is called a response
surface
• The linear model assumptions on $\varepsilon$ are the same
as those for simple linear regression
MULTIPLE REGRESSION
Example
• La Quinta Motor Inns is a moderately priced chain of
motor inns catering to the frequent business traveller.
It is trying to increase its market share by building new
inns
• To predict profitable sites, they acquired data on 100
randomly selected inns belonging to La Quinta
• To measure profitability, La Quinta used operating
margin defined as the ratio (profit + depreciation +
interest expenses) / total revenue. La Quinta defines
profitable inns as those with an operating margin in
excess of 50% and unprofitable ones with margins of
less than 30%
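The margin definition and the two thresholds above can be written as a small function. This is a minimal Python sketch: the function names and the dollar figures in the usage lines are hypothetical, chosen only for illustration, and the "in between" label for margins from 30% to 50% is an assumption, since the slide does not name that range.

```python
def operating_margin(profit, depreciation, interest, revenue):
    """Operating margin in percent: (profit + depreciation + interest) / revenue."""
    return 100.0 * (profit + depreciation + interest) / revenue


def classify(margin_pct):
    """Label an inn using the 50% and 30% thresholds quoted on the slide."""
    if margin_pct > 50.0:
        return "profitable"
    if margin_pct < 30.0:
        return "unprofitable"
    return "in between"  # assumption: the source gives no name for this range


# Hypothetical figures (in thousands of dollars), not taken from the data set
margin = operating_margin(profit=300, depreciation=150, interest=100, revenue=1000)
print(round(margin, 1), classify(margin))
```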
MULTIPLE REGRESSION
Example
Diagram: factors assumed to influence operating margin ("Profitability")
– Competition: Rooms (number of motel/hotel rooms within 5 km of the site)
– Market awareness: Proximity (distance to the nearest La Quinta inn)
– Customers: Office space; University enrolment
– Community: Income (median household income)
– Geography: Distance (km) to downtown
MULTIPLE REGRESSION
Data
Margin  Nb     Nearest          Office space  Enrollment  Income   Distance to
(%)     rooms  competitor (km)  (000 m2)      (000)       (000 $)  downtown (km)
55.5    3203   6.8              54.9          8.0         37       4.3
33.8    2810   4.5              49.6          17.5        35       23.2
49.0    2890   3.9              25.4          20.0        35       4.2
31.9    3422   5.3              43.4          15.5        38       19.5
57.4    2687   1.4              67.8          15.5        42       11.1
49.0    3759   4.7              63.5          19.0        33       17.4
46.0    2341   3.7              58.0          23.0        29       11.9
50.2    3021   2.7              57.2          8.5         41       8.8
46.0    2655   1.8              66.6          22.0        34       13.0
45.5    2691   5.1              51.9          13.5        46       9.2
44.2    3471   3.5              52.3          12.0        39       8.7
29.8    3567   4.0              14.0          13.5        32       14.6
38.4    3264   4.3              40.4          22.5        29       16.7
54.4    3234   5.1              64.9          19.5        39       13.4
34.5    2730   0.5              17.1          17.0        33       6.9
44.9    3003   1.4              40.2          15.5        37       16.4
56.5    2045   2.4              56.2          15.0        38       6.9
39.3    3591   3.2              51.0          18.5        29       12.2
62.8    1613   2.7              68.6          21.5        29       6.6
40.3    2848   4.0              74.4          19.0        30       13.5
41.1    3098   3.7              44.9          11.5        34       12.4
35.7    3591   2.3              27.0          9.5         43       8.0
MULTIPLE REGRESSION
Output of linear regression tool
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7246
R Square             0.5251
Adjusted R Square    0.4944
Standard Error       5.51
Observations         100

ANOVA
             df   SS       MS      F       Significance F
Regression   6    3123.8   520.6   17.14   0.00000
Residual     93   2825.6   30.4
Total        99   5949.5

                          Coefficients  Standard Error  t Stat  P-value
Intercept                 38.14         6.99            5.45    0.00000
Nb rooms                  -0.01         0.00            -6.07   0.00000
Nearest competitor (km)   1.02          0.39            2.60    0.01080
Office space (000 m2)     0.20          0.03            5.80    0.00000
Enrollment                0.21          0.13            1.59    0.11585
Income (000 $)            0.41          0.14            2.96    0.00390
Distance (km) downtown    -0.14         0.11            -1.26   0.21065
MULTIPLE REGRESSION
Estimating the coefficients and assessing the model

• The regression model is estimated by
$$\hat{y} = 38.14 - 0.0076 x_1 + 1.02 x_2 + 0.198 x_3 + 0.21 x_4 + 0.41 x_5 - 0.14 x_6$$
• We assess the model in three ways: the standard
error of estimate, the coefficient of determination
and Fisher’s F test
MULTIPLE REGRESSION
Standard error of estimate
• As in simple linear regression, the square of
the standard error of estimate is an estimator of
$\sigma_\varepsilon^2 = \mathrm{Var}(\varepsilon)$. Given a
sample of size n, it is based on the squared
differences between the points and the response
surface
$$S_{y \cdot x}^2 = \frac{\sum (Y - Y_C)^2}{n - k - 1}$$
where $Y_C$ is the value of Y given by the regression equation
• Excel provides a standard error of estimate of 5.51
(in percentage points). Compared to the mean value of the
dependent variable (45.74%), it appears that this
value is not particularly small
MULTIPLE REGRESSION
Standard error of estimate
– We need to compute the standard error of estimate
$$s = \sqrt{\frac{SSE}{n - k - 1}}$$
– Compare s to the mean value of y
• From the printout, Standard Error = 5.5121
• Calculating the mean value of y, we have $\bar{y} = 45.739$
– It seems s is not particularly small.
– Can we conclude the model does not fit the data
well?
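The printed standard error can be reproduced from the ANOVA table. A short Python sketch, assuming only the figures shown on the printout (SSE = 2825.6, n = 100, k = 6) and the mean margin quoted above; the variable names are illustrative.

```python
import math

# Values read from the regression printout
sse = 2825.6      # residual sum of squares (SSE)
n = 100           # number of observations
k = 6             # number of independent variables
y_mean = 45.739   # mean operating margin (%)

s = math.sqrt(sse / (n - k - 1))   # standard error of estimate
ratio = s / y_mean                 # s relative to the mean of y

print(f"s = {s:.2f}, s / mean(y) = {ratio:.3f}")
```

The ratio comes out at roughly 12% of the mean margin, which is the comparison the slide uses when it calls s "not particularly small".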
MULTIPLE REGRESSION
Coefficient of determination
– The definition is
$$R^2 = 1 - \frac{SSE}{SST}$$
– From the printout, $R^2$ = 0.5251
– 52.51% of the variation in the measure of
profitability is explained by the linear regression
model formulated above.
– When adjusted for degrees of freedom,
$$\text{Adjusted } R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = 49.44\%$$
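Both coefficients can be checked against the ANOVA table. A minimal Python sketch, assuming SSE = 2825.6 and SST = 5949.5 as printed (the variable names are illustrative):

```python
# Sums of squares and sizes read from the regression printout
sse = 2825.6   # residual sum of squares
sst = 5949.5   # total sum of squares
n, k = 100, 6  # observations, independent variables

r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```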
MULTIPLE REGRESSION
Coefficient of determination
• The Excel output provides $R^2$ = 0.5251. This means
that 52.51% of the variation in operating margin is
explained by the six independent variables, while
47.49% remains unexplained
• Note that Excel gives a second $R^2$ statistic, called the
“coefficient of determination adjusted for degrees of
freedom”, which has been adjusted to take into
account the sample size and the number of
independent variables. In our example, the adjusted
coefficient of determination is 49.44%, indicating that,
no matter how we measure the coefficient of
determination, the model’s fit is reasonably good
MULTIPLE REGRESSION
Testing the validity of the model
• To test the overall validity of the regression model, we
specify the following hypotheses
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ and $H_1$: at least one $\beta_i \neq 0$
• This test is conducted by comparing Fisher’s F statistic
given by the regression tool output (F = 17.14) to its
critical value
$F_{\alpha, k, n-k-1}$ = FINV(α, k, n-k-1)
= FINV(0.05, 6, 93) = 2.20
• In our example F > $F_{\alpha, k, n-k-1}$ (and the p-value
associated with the test is shown as 0.00000 on the output),
so there is strong evidence to infer that
the model is valid
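The F statistic itself is the ratio of the two mean squares in the ANOVA table. The Python sketch below recomputes it from the printed sums of squares; the critical value 2.20 is copied from the slide rather than recomputed, because the standard library has no inverse F distribution.

```python
# Sums of squares read from the ANOVA table
ss_reg = 3123.8   # regression sum of squares
ss_res = 2825.6   # residual sum of squares
n, k = 100, 6     # observations, independent variables

ms_reg = ss_reg / k              # mean square, regression (df = k)
ms_res = ss_res / (n - k - 1)    # mean square, residual (df = n - k - 1)
f_stat = ms_reg / ms_res

f_crit = 2.20  # FINV(0.05, 6, 93), value taken from the slide
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```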
MULTIPLE REGRESSION
Interpreting the coefficients
• Intercept b0 = 38.14. This is the predicted operating
margin when all of the independent variables are 0.
It has no meaningful interpretation here
• The relationship between operating margin and the
number of hotel and motel rooms within 5 km is
described by b1 = -0.0076. In this model, for each
additional room within 5 km of the La Quinta inn, the
average operating margin decreases by 0.0076 percentage
points, assuming that the other independent variables
in this model are held constant
MULTIPLE REGRESSION
Interpreting the coefficients (cont’d)
• The coefficient b2 = 1.02 specifies that for each
additional km between a La Quinta inn and its nearest
competitor, the average operating margin increases by
1.02 percentage points, assuming the other
independent variables are held constant
• The relationship between office space and operating
margin is expressed by b3 = 0.198. Keeping all other
independent variables fixed, the average operating
margin increases by 1.98 percentage points for
every extra 10,000 m2 of office space
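Each slope is read as a change in predicted margin per unit of its variable, with the other variables held fixed. A small Python sketch of that arithmetic, assuming the slopes from the estimated equation and the units of the data table (office space in thousands of m2); the dictionary keys and the function name are illustrative.

```python
# Estimated slopes from the regression equation (units as in the data table)
slopes = {
    "rooms": -0.0076,        # per additional room within 5 km
    "nearest_km": 1.02,      # per additional km to the nearest competitor
    "office_000m2": 0.198,   # per additional 1,000 m2 of office space
}


def margin_change(variable, delta):
    """Change in predicted margin (percentage points) when one variable
    changes by `delta` units and all other variables are held constant."""
    return slopes[variable] * delta


print(round(margin_change("office_000m2", 10), 2))  # 10 units = 10,000 m2
print(round(margin_change("rooms", 100), 2))        # 100 extra rooms nearby
```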
MULTIPLE REGRESSION
Interpreting the coefficients (cont’d)
• The relationship between operating margin and
college/university enrolment is described by b4 = 0.21, which
we interpret to mean that for each additional thousand
students, the operating margin increases by 0.21 percentage
points when the other variables are constant. Both office space and
enrolment produced positive coefficients, indicating that
these measures of economic activity are positively related to
the operating margin
• Similarly, the coefficient for household income b5 = 0.41
suggests that motels in more affluent communities have
higher operating margins
• Finally, b6 = -0.14 indicates that the operating margin
decreases as the distance from the downtown core increases.
It may be that business people prefer to stay close to
downtown
MULTIPLE REGRESSION
Testing the coefficients
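• In the multiple regression model, we can test
whether there is enough
evidence of a linear relationship between
each independent variable and the
dependent variable for the entire population
• For each coefficient, the test of $H_0: \beta_i = 0$ against
$H_1: \beta_i \neq 0$ is read from the p-value in the regression output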
MULTIPLE REGRESSION
Testing the coefficients (cont’d)
COEFFICIENT  VARIABLE                  CR (t stat)  p-VALUE  LINEAR RELATIONSHIP?
1            Nb rooms                  -6.07        0.0000   YES
2            Nearest competitor (km)   2.60         0.0108   YES
3            Office space (000 m2)     5.80         0.0000   YES
4            Enrollment                1.59         0.1159   NO
5            Income (000 $)            2.96         0.0039   YES
6            Distance (km) downtown    -1.26        0.2107   NO

• The number of hotel/motel rooms, distance to the
nearest motel, amount of office space and median
household income are linearly related to the operating
margin. We found no evidence to infer that college
enrolment and distance to the downtown core are
linearly related to operating margin
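The YES/NO column follows from comparing each p-value with a significance level. The Python sketch below reproduces those decisions from the printed p-values; the 5% level is an assumption (the slides state 0.05 only for the F test), and the names are illustrative.

```python
ALPHA = 0.05  # assumed significance level for the individual t tests

# p-values copied from the regression printout
p_values = {
    "Nb rooms": 0.00000,
    "Nearest competitor (km)": 0.01080,
    "Office space (000 m2)": 0.00000,
    "Enrollment": 0.11585,
    "Income (000 $)": 0.00390,
    "Distance (km) downtown": 0.21065,
}

# Reject H0: beta_i = 0 when the p-value falls below ALPHA
decisions = {name: p < ALPHA for name, p in p_values.items()}

for name, related in decisions.items():
    print(f"{name:26s} {'YES' if related else 'NO'}")
```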
MULTIPLE REGRESSION
Prediction intervals
• We are trying to predict the operating margin for
a site with the following characteristics:
– 3815 rooms within 5 kilometres of the site,
– The closest other hotel/motel is 1.5 km away,
– The amount of office space is 47,600 m2,
– There is one college nearby with a total enrolment of
24,500,
– From the census, the median household income in
the area is $35,000,
– The distance to the downtown core is 18 km
MARGIN = 38.14 – 0.0076(3815) + 1.02(1.5) + 0.198(47.6)
+ 0.21(24.5) + 0.41(35) – 0.14(18) = 37.1 %
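The same point prediction can be computed programmatically. A minimal Python sketch, assuming the rounded coefficients of the estimated equation and the site values listed above, expressed in the units of the data table; it gives the point prediction only, not the interval around it.

```python
# Intercept followed by the six slopes of the estimated equation
coef = [38.14, -0.0076, 1.02, 0.198, 0.21, 0.41, -0.14]

# Site characteristics in the units of the data table:
# rooms, nearest competitor (km), office space (000 m2),
# enrolment (000), income (000 $), distance to downtown (km)
site = [3815, 1.5, 47.6, 24.5, 35, 18]

# Intercept plus the sum of slope * value over the six variables
margin = coef[0] + sum(b * x for b, x in zip(coef[1:], site))

print(f"Predicted operating margin: {margin:.1f} %")
```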
