
Master of Applied Statistics ST111: Regression and analysis of variance

Pia Veldt Larsen

Module 3: Multiple linear regression


Contents

3.1 Introduction
3.2 Matrix notation
3.3 Fitting the model
    3.3.1 The least squares line
    3.3.2 Coefficient of multiple determination
    3.3.3 Estimating the variance
3.4 Summary

3.1 Introduction

The general idea of a simple linear regression model is that the response variable $Y_i$ is a straight-line function of a single explanatory variable $x_i$. In this module, we extend the concept of simple linear regression models to multiple linear regression models by allowing the response variable to be a function of $k$ explanatory variables $x_{i,1}, \ldots, x_{i,k}$. This relationship is straight-line, and in its basic form it can be written as
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k} + \varepsilon_i, \qquad (3.1)$$

where the random errors $\varepsilon_i$, $i = 1, \ldots, n$, are independent normally distributed random variables with zero mean and constant variance $\sigma^2$. The defining property of a multiple linear regression model is that the mean of the response variable,
$$E[Y_i] = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k},$$
is a linear function of the regression parameters $\beta_0, \beta_1, \ldots, \beta_k$. (The mean response $E[Y]$ in (3.1) is an affine function of the explanatory variables $x_{i,1}, \ldots, x_{i,k}$.) It is standard to assume normality in the definition of multiple linear regression models. In situations where the normality assumption is not satisfied, one might use a generalised linear model instead. The MAS course ST112 is concerned with generalised linear models.

Example 3.1 Holiday cottages


Price DKK 1000 (y)   Age years (x1)   Area m^2 (x2)
745                  36               66
895                  37               68
442                  47               64
440                  32               53
1598                  1              101

Table 3.1: Data on holiday cottages

Table 3.1 contains the sales prices of 5 holiday cottages in Odsherred, Denmark, together with the age and the livable area of each cottage. Suppose it is thought that the price obtained for a cottage depends primarily on the age and livable area. A possible model for the data might be the linear regression model
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \varepsilon_i, \qquad i = 1, \ldots, 5,$$

where the random errors $\varepsilon_i$ are independent, normally distributed random variables with zero mean and constant variance. Further details on this dataset can be found here.

Example 3.2 Ice cream consumption

These data refer to a study of how ice cream consumption is related to a number of explanatory variables. In the exercises to Modules 1 and 2, you have fitted a simple linear regression model to the ice cream consumption, using temperature as an explanatory variable. The model provided a reasonably good fit to the relationship between ice cream consumption and temperature, but we may be able to refine the model by incorporating three more explanatory variables: the ice cream price, the average annual family income, and the year (the data were collected over three years: 1951 to 1953). It seems plausible that (some of) these variables affect the ice cream consumption. Table 3.2 shows some of the data on ice cream consumption, temperature, ice cream price and average family income. (The Year variable is coded such that 0 corresponds to 1951, 1 corresponds to 1952, and 2 corresponds to 1953.) Figure 3.1 shows a scatterplot of the ice cream consumption against each of the four explanatory variables. (An outlying point has been removed from the original dataset.) The scatterplots suggest that the ice cream consumption depends linearly on the temperature and the year. For the two remaining variables, the scatterplots are less convincing; however, there is some (weak) indication of straight-line relationships.


Consumption Pints pp (y)   Temperature Fahrenheit (x1)   Price US$ (x2)   Family income US$ 1000 (x3)   Year (x4)
0.386                      41                            0.270            78                            0
0.374                      56                            0.282            79                            0
0.393                      63                            0.277            81                            0
...                        ...                           ...              ...                           ...
0.437                      64                            0.268            91                            2

Table 3.2: Ice cream consumption data

The linear regression model relating ice cream consumption to the four explanatory variables is given by
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,4} + \varepsilon_i, \qquad i = 1, \ldots, 29,$$

where the random errors $\varepsilon_i$ are independent and normally distributed with zero mean and constant variance. Further details on this dataset can be found here.

Example 3.3 Polynomial regression

A special type of multiple regression is polynomial regression. In polynomial regression, the response variable is a function of second- or higher-order polynomials in one or more explanatory variables; for example,
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \qquad i = 1, \ldots, n.$$

Note that, if we rename $x_i$ as $x_{i,1}$ and $x_i^2$ as $x_{i,2}$, we can write the above model as
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \varepsilon_i, \qquad i = 1, \ldots, n.$$

Adding higher-order terms (e.g. $x_i^2$ or $x_i^3$) to a model can thus be considered equivalent to adding new explanatory variables $x_{i,2} = x_i^2$ or $x_{i,3} = x_i^3$ to the model. Recall that a multiple linear regression model is defined to be linear in the regression parameters rather than in the explanatory variables. The convenience of this definition is apparent in connection with polynomial regression. In polynomial regression the response variable is a linear (affine) function of second- or higher-order polynomials in one or more explanatory variables, but it is not a linear function of the explanatory variables themselves. However, the response variable is a linear function of the regression parameters. Thus, polynomial regression is included in the definition of multiple linear regression.
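To make the renaming concrete, here is a minimal Python/numpy sketch (not part of the original text; the data values are invented purely for illustration) that fits a quadratic polynomial regression by treating $x_i$ and $x_i^2$ as two ordinary explanatory variables:

```python
import numpy as np

# Illustrative data: one explanatory variable x and a response y
# (the numbers are invented purely to show the mechanics).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 8.2, 14.1, 22.3, 32.0])

# Treat x and x**2 as two ordinary explanatory variables and build the
# design matrix with an intercept column, as in the renaming above.
X = np.column_stack([np.ones_like(x), x, x**2])

# Least squares fit of the quadratic polynomial regression model.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of beta_0, beta_1, beta_2
```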


[Figure 3.1: Ice cream consumption against explanatory variables (scatterplot panels for price, income, temperature and year).]

Although the only difference between simple linear regression models and multiple linear regression models is the number of explanatory variables in the model, multiple regression analysis is considerably more complicated than simple regression analysis. There are several reasons for this. Firstly, in the simple regression situation, we would always start by plotting the data and use information drawn from the scatterplot to guide us towards an appropriate model. When there are two explanatory variables, the scatterplot of the response variable against the two explanatory variables is a three-dimensional plot, which can be rather more difficult to interpret. When there are three or more explanatory variables, a direct plot of the data is not possible at all. Secondly, it can be a problem to choose a best-fitting model, since there may be several different models which describe the data almost equally well. (Module 8 is concerned with the question of model selection.) Finally, supposing we have found a best-fitting model, it can sometimes be difficult to interpret what the model means in real-life terms.

In Section 3.2, we introduce matrix notation, and re-express the definition of a multiple linear regression model in this notation. With matrix notation it is possible to express results for multiple linear regression models in a concise and clear way. In Section 3.3, we estimate the parameters in the multiple linear regression model; in particular, we fit the best straight line to the data, and estimate the variation in the data away from this line.


3.2 Matrix notation

Statistical results for multiple linear regression models, such as parameter estimates, test statistics, etc., quickly become complex and tedious to write out, in particular when the number of explanatory variables is more than just two or three. A very useful way to simplify the complex expressions is by introducing matrix notation. We shall introduce matrix notation through the following example.

Example 3.1 (continued) Holiday cottages

Writing out the five equations corresponding to the five observations in Table 3.1 in the example on price, age and livable area of holiday cottages in Odsherred, we get
$$\begin{aligned}
745 &= \beta_0 + 36\beta_1 + 66\beta_2 + \varepsilon_1 \\
895 &= \beta_0 + 37\beta_1 + 68\beta_2 + \varepsilon_2 \\
442 &= \beta_0 + 47\beta_1 + 64\beta_2 + \varepsilon_3 \\
440 &= \beta_0 + 32\beta_1 + 53\beta_2 + \varepsilon_4 \\
1598 &= \beta_0 + \beta_1 + 101\beta_2 + \varepsilon_5.
\end{aligned} \qquad (3.2)$$
In order to express these equations more concisely, we introduce the following notation. Let $y_i$ denote the $i$th observed response variable and let the vector $y$ be the column vector containing the $y_i$'s, that is
$$y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix} = \begin{pmatrix} 745 \\ 895 \\ 442 \\ 440 \\ 1598 \end{pmatrix}. \qquad (3.3)$$
Further, let $x_{(i)} = (1, x_{i,1}, x_{i,2})$ denote the 3-dimensional row vector whose first element is 1, and whose last 2 elements are the values of the two explanatory variables for the $i$th observation. If we stack the five row vectors $x_{(1)}, \ldots, x_{(5)}$, we get the design matrix $x$, given by
$$x = \begin{pmatrix} 1 & x_{1,1} & x_{1,2} \\ 1 & x_{2,1} & x_{2,2} \\ 1 & x_{3,1} & x_{3,2} \\ 1 & x_{4,1} & x_{4,2} \\ 1 & x_{5,1} & x_{5,2} \end{pmatrix} = \begin{pmatrix} 1 & 36 & 66 \\ 1 & 37 & 68 \\ 1 & 47 & 64 \\ 1 & 32 & 53 \\ 1 & 1 & 101 \end{pmatrix}. \qquad (3.4)$$
Note that the matrix $x$ is referred to as the design matrix (or model specification matrix) because it designs (or specifies) the exact form of the model: by changing $x$, we change the model into a new model. We can express the five equations in (3.2) in matrix form as
$$y = x\beta + \varepsilon,$$


where $y$ and $x$ are given in (3.3) and (3.4), respectively, and where $\beta$ is the column vector of regression parameters, and $\varepsilon$ is the column vector containing the random errors $\varepsilon_1, \ldots, \varepsilon_5$, that is,
$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} \quad \text{and} \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \end{pmatrix}.$$
For example, the first equation in (3.2) is given by $y_1 = x_{(1)}\beta + \varepsilon_1$, where $x_{(1)}$ is the first row of $x$, and $y_1$ and $\varepsilon_1$ are the first elements of $y$ and $\varepsilon$, respectively. That is,
$$y_1 = x_{(1)}\beta + \varepsilon_1,$$

that is,
$$745 = (1, 36, 66)\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} + \varepsilon_1 = \beta_0 + 36\beta_1 + 66\beta_2 + \varepsilon_1.$$
Similarly, the second equation in (3.2) is given by $y_2 = x_{(2)}\beta + \varepsilon_2$. Before reading on, make sure you understand how the remaining equations in (3.2) follow from the vectors $y$, $\beta$ and $\varepsilon$ and the design matrix $x$. Above, we have expressed, in matrix form, the observed response variables $y_1, \ldots, y_5$ as functions of the explanatory variables $x_{i,1}, x_{i,2}$, for $i = 1, \ldots, 5$. The corresponding multiple linear regression model for the unobserved response variables $Y_1, \ldots, Y_5$ can be written, in matrix form, as
$$Y = x\beta + \varepsilon,$$
where the random errors $\varepsilon_i$, $i = 1, \ldots, 5$, in $\varepsilon$ are independent normally distributed random variables with zero mean and constant variance $\sigma^2$. Recall that in definition (3.1), the multiple linear regression model assumes the form
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k} + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where the $\varepsilon_i$ are independent normal random variables with zero mean and constant variance. Following the procedure in Example 3.1, we can write out the $n$ equations, obtaining the equation system
$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_{1,1} + \beta_2 x_{1,2} + \cdots + \beta_k x_{1,k} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_{2,1} + \beta_2 x_{2,2} + \cdots + \beta_k x_{2,k} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_{n,1} + \beta_2 x_{n,2} + \cdots + \beta_k x_{n,k} + \varepsilon_n.
\end{aligned} \qquad (3.5)$$


From this, we can construct the $n$-dimensional vector $Y$, containing the response variables, and the $n \times (k+1)$-dimensional design matrix (or model specification matrix, or simply model matrix) $x$. The first column in $x$ is a vector of 1's (corresponding to the intercept $\beta_0$) while the remaining $k$ columns correspond to the $k$ explanatory variables. In particular, the $i$th row $x_{(i)}$ of $x$ is given by $x_{(i)} = (1, x_{i,1}, x_{i,2}, \ldots, x_{i,k})$. Thus, $Y$ and $x$ are given by, respectively,
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{pmatrix}.$$
Further, let $\beta$ denote the vector of regression parameters, and let $\varepsilon$ denote the vector of random errors, that is,
$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$
Recall that $x$ has dimension $n \times (k+1)$, and observe that $\beta$ has dimension $(k+1) \times 1$. Thus, we can matrix-multiply $x$ by $\beta$. The product $x\beta$ has dimension $n \times 1$, that is, it is an $n$-dimensional column vector. Similarly to Example 3.1, we can express the system of equations (3.5) in matrix form as
$$Y = x\beta + \varepsilon, \qquad (3.6)$$
where the random errors $\varepsilon_i$, $i = 1, \ldots, n$, in $\varepsilon$ are independent normally distributed random variables with zero mean and constant variance $\sigma^2$. The vector of fitted values or predicted values is given by $\hat{y} = x\hat{\beta}$. The fitted values are estimates of the expected response for given values of the explanatory variables $x_{i,1}, \ldots, x_{i,k}$, $i = 1, \ldots, n$.

Example 3.2 (continued) Ice cream consumption

In matrix form, the multiple linear regression model for the ice cream data is given by $Y = x\beta + \varepsilon$, where the random errors $\varepsilon_i$, $i = 1, \ldots, 29$, in $\varepsilon$ are independent normally distributed random variables with zero mean and constant variance $\sigma^2$, and $x$ is the $29 \times 5$ design matrix
$$x = \begin{pmatrix} 1 & 41 & 0.270 & 78 & 0 \\ 1 & 56 & 0.282 & 79 & 0 \\ 1 & 63 & 0.277 & 81 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 64 & 0.268 & 91 & 2 \end{pmatrix}.$$


The first column in $x$ relates to the intercept $\beta_0$, and the last four columns correspond to the temperature, the ice cream price, the average family income, and the year, respectively.
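As an aside, building such a design matrix on a computer amounts to prepending a column of 1's to the columns of observed explanatory variables. A minimal Python/numpy sketch (not part of the original text), using only the first three rows reproduced in Table 3.2; with the full dataset the matrix would have 29 rows:

```python
import numpy as np

# First three observations from Table 3.2.
temp   = np.array([41.0, 56.0, 63.0])     # temperature, x1
price  = np.array([0.270, 0.282, 0.277])  # ice cream price, x2
income = np.array([78.0, 79.0, 81.0])     # family income, x3
year   = np.array([0.0, 0.0, 0.0])        # coded year, x4

# Design matrix: a column of 1's for the intercept, then one column
# per explanatory variable, as in the display above.
x = np.column_stack([np.ones(len(temp)), temp, price, income, year])
print(x.shape)  # (3, 5); with all 29 observations it would be (29, 5)
```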

3.3 Fitting the model

This section is concerned with fitting multiple linear regression models. In order to fit the best regression line, we shall use the principle of least squares, the same principle we used in Module 2 when fitting simple linear regression models. In Subsection 3.3.1, we discuss how the principle of least squares applies to the setting of multiple linear regression models. The resulting parameter estimates are presented in matrix notation. Subsection 3.3.2 provides a measure of the strength of the straight-line relationship, and in Subsection 3.3.3 an unbiased estimate of the common variance $\sigma^2$ is given.

3.3.1 The least squares line

As in the simple case, we use the principle of least squares to find the best fitting model. According to this principle, the best fitting model is the one that minimises the sum of squared residuals, where the residuals are the deviations between the observed response variables and the values predicted by the fitted model. As in the simple case: the smaller the residuals, the closer the fit. Note that the residuals $\varepsilon_i$ are given by
$$\varepsilon_i = y_i - \beta_0 - \beta_1 x_{i,1} - \beta_2 x_{i,2} - \cdots - \beta_k x_{i,k}, \qquad i = 1, \ldots, n.$$
It follows that the residual sum of squares, RSS, is given by
$$\mathrm{RSS} = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i,1} - \beta_2 x_{i,2} - \cdots - \beta_k x_{i,k} \right)^2.$$
We are interested in the values $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ of $\beta_0, \beta_1, \ldots, \beta_k$ which minimise this sum. In order to minimise the RSS with respect to $\beta_0, \beta_1, \ldots, \beta_k$, we derive the $k+1$ partial derivatives of the RSS:
$$\frac{\partial\,\mathrm{RSS}}{\partial \beta_0}, \frac{\partial\,\mathrm{RSS}}{\partial \beta_1}, \frac{\partial\,\mathrm{RSS}}{\partial \beta_2}, \ldots, \frac{\partial\,\mathrm{RSS}}{\partial \beta_k}.$$
(Note that we are following exactly the same procedure as in the simple case.) Putting the derivatives equal to zero and re-arranging the terms yields the following system of $k+1$ equations with $k+1$ unknowns:
$$\begin{aligned}
\beta_0\, n + \beta_1 \sum_{i=1}^{n} x_{i,1} + \beta_2 \sum_{i=1}^{n} x_{i,2} + \cdots + \beta_k \sum_{i=1}^{n} x_{i,k} &= \sum_{i=1}^{n} y_i \\
\beta_0 \sum_{i=1}^{n} x_{i,1} + \beta_1 \sum_{i=1}^{n} x_{i,1}^2 + \beta_2 \sum_{i=1}^{n} x_{i,1} x_{i,2} + \cdots + \beta_k \sum_{i=1}^{n} x_{i,1} x_{i,k} &= \sum_{i=1}^{n} x_{i,1} y_i \\
&\;\;\vdots \\
\beta_0 \sum_{i=1}^{n} x_{i,k} + \beta_1 \sum_{i=1}^{n} x_{i,1} x_{i,k} + \beta_2 \sum_{i=1}^{n} x_{i,2} x_{i,k} + \cdots + \beta_k \sum_{i=1}^{n} x_{i,k}^2 &= \sum_{i=1}^{n} x_{i,k} y_i.
\end{aligned} \qquad (3.7)$$


These equations must be solved simultaneously for $\beta_0, \beta_1, \ldots, \beta_k$ in order to obtain the least squares estimates $\hat{\beta}_0, \ldots, \hat{\beta}_k$ of $\beta_0, \beta_1, \ldots, \beta_k$, respectively. The least squares line for $Y$ given $x_1, \ldots, x_k$ is given by
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k.$$
You can see that the ideas behind fitting multiple linear regression models are, in concept, a straightforward extension of the ideas developed in the context of simple linear regression.

Example 3.1 (continued) Holiday cottages

For the data on price ($y$), age ($x_1$) and livable area ($x_2$) of holiday cottages in Odsherred, Denmark, we need to calculate the following sums in order to obtain the equation system (3.7). We find that
$$\sum_{i=1}^{5} x_{i,1} = 153, \quad \sum_{i=1}^{5} x_{i,1}^2 = 5899, \quad \sum_{i=1}^{5} x_{i,1} y_i = 96387, \quad \sum_{i=1}^{5} x_{i,2} y_i = 323036,$$
$$\sum_{i=1}^{5} x_{i,2} = 352, \quad \sum_{i=1}^{5} x_{i,2}^2 = 26086, \quad \sum_{i=1}^{5} y_i = 4120, \quad \sum_{i=1}^{5} x_{i,1} x_{i,2} = 9697, \quad n = 5.$$
Substituting these results into the equation system (3.7), we get the following equations:
$$\begin{aligned}
5\beta_0 + 153\beta_1 + 352\beta_2 &= 4120 \\
153\beta_0 + 5899\beta_1 + 9697\beta_2 &= 96387 \\
352\beta_0 + 9697\beta_1 + 26086\beta_2 &= 323036.
\end{aligned}$$
A computer will solve this system of equations, providing the least squares estimates
$$\hat{\beta}_0 = -281.43, \quad \hat{\beta}_1 = -7.611, \quad \hat{\beta}_2 = 19.01.$$
Thus, the least squares line for these data is given by
$$\hat{y} = -281.43 - 7.611\,x_1 + 19.01\,x_2.$$
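As a check on the arithmetic, the three normal equations above can be solved directly on a computer. A short Python/numpy sketch (not part of the original module), using only the sums quoted above:

```python
import numpy as np

# Coefficient matrix and right-hand side of the normal equations (3.7)
# for the holiday cottage data.
A = np.array([[   5.0,   153.0,   352.0],
              [ 153.0,  5899.0,  9697.0],
              [ 352.0,  9697.0, 26086.0]])
b = np.array([4120.0, 96387.0, 323036.0])

beta_hat = np.linalg.solve(A, b)   # solve for (beta_0, beta_1, beta_2)
print(beta_hat)                    # approximately [-281.4, -7.61, 19.01]
```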

In order to give explicit formulae for the least squares estimates of the regression parameters, it is convenient to switch to matrix notation. Without matrix notation, the formulae very quickly become unmanageable when the number of explanatory variables increases. Recall that the multiple linear regression model (3.6) is given in matrix form by
$$Y = x\beta + \varepsilon,$$


where
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{pmatrix},$$
and where the random errors $\varepsilon_i$, $i = 1, \ldots, n$, in $\varepsilon$ are independent normally distributed random variables with zero mean and constant variance $\sigma^2$. It can be shown that the vector of least squares estimates of $\beta$ is given by
$$\hat{\beta} = \left( x^T x \right)^{-1} x^T y, \qquad (3.8)$$

where $y$ is the vector of observed response variables, and where the superscripts $T$ and $-1$ denote transposed and inverse matrices, respectively. The transpose of an $n \times k$ matrix $a$ is a $k \times n$ matrix $a^T$ which has as rows the columns of $a$ (or, equivalently, as columns the rows of $a$). For example, let $a$ be the $n \times k$ matrix
$$a = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,k} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,k} \\ \vdots & \vdots & & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,k} \end{pmatrix};$$
then the transpose $a^T$ of $a$ is given by the $k \times n$ matrix
$$a^T = \begin{pmatrix} a_{1,1} & a_{2,1} & \cdots & a_{n,1} \\ a_{1,2} & a_{2,2} & \cdots & a_{n,2} \\ \vdots & \vdots & & \vdots \\ a_{1,k} & a_{2,k} & \cdots & a_{n,k} \end{pmatrix}.$$

The inverse of an $n \times n$ matrix $a$ exists if there is an $n \times n$ matrix $a^{-1}$ with the property that
$$a^{-1}a = aa^{-1} = I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix},$$
where $I_n$ is the $n \times n$ identity matrix with 1's in the diagonal and 0's elsewhere. If the matrix $a^{-1}$ exists, it is called the inverse of $a$. A matrix $a$ is called invertible if the inverse $a^{-1}$ exists. For example, let $a$ be an invertible $n \times n$ matrix
$$a = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{pmatrix};$$
then the inverse matrix $a^{-1}$ is given by
$$a^{-1} = \begin{pmatrix} a^{1,1} & a^{1,2} & \cdots & a^{1,n} \\ a^{2,1} & a^{2,2} & \cdots & a^{2,n} \\ \vdots & \vdots & & \vdots \\ a^{n,1} & a^{n,2} & \cdots & a^{n,n} \end{pmatrix},$$
where the elements $a^{i,j}$ satisfy $a^{-1}a = aa^{-1} = I_n$. Note that, in this course, it is not necessary to know in detail how to invert a matrix; we shall always use a computer.

Example 3.1 (continued) Holiday cottages

For the data on price, age and livable area of holiday cottages, we have from earlier that
$$y = \begin{pmatrix} 745 \\ 895 \\ 442 \\ 440 \\ 1598 \end{pmatrix}, \qquad x = \begin{pmatrix} 1 & 36 & 66 \\ 1 & 37 & 68 \\ 1 & 47 & 64 \\ 1 & 32 & 53 \\ 1 & 1 & 101 \end{pmatrix}.$$
It can be shown that the vector containing the least squares estimates of $\beta_0$, $\beta_1$ and $\beta_2$ is given by
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \left(x^T x\right)^{-1} x^T y = \left( \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 36 & 37 & 47 & 32 & 1 \\ 66 & 68 & 64 & 53 & 101 \end{pmatrix} \begin{pmatrix} 1 & 36 & 66 \\ 1 & 37 & 68 \\ 1 & 47 & 64 \\ 1 & 32 & 53 \\ 1 & 1 & 101 \end{pmatrix} \right)^{-1} x^T \begin{pmatrix} 745 \\ 895 \\ 442 \\ 440 \\ 1598 \end{pmatrix}$$
$$= \begin{pmatrix} 5 & 153 & 352 \\ 153 & 5899 & 9697 \\ 352 & 9697 & 26086 \end{pmatrix}^{-1} x^T y = \begin{pmatrix} -281.43 \\ -7.611 \\ 19.01 \end{pmatrix},$$

where we have computed the matrix multiplications on a computer. Note that the estimates are the same as the ones that we obtained earlier.

Example 3.2 (continued) Ice cream consumption

For the ice cream data, the observed response variables $y$ and the design matrix $x$ are given by
$$y = \begin{pmatrix} 0.386 \\ 0.374 \\ 0.393 \\ \vdots \\ 0.437 \end{pmatrix}, \qquad x = \begin{pmatrix} 1 & 41 & 0.270 & 78 & 0 \\ 1 & 56 & 0.282 & 79 & 0 \\ 1 & 63 & 0.277 & 81 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 64 & 0.268 & 91 & 2 \end{pmatrix},$$
where the last four columns of $x$ correspond to temperature, price, average family income, and year, respectively. The least squares estimates are given by
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \\ \hat{\beta}_4 \end{pmatrix} = \left(x^T x\right)^{-1} x^T y = \begin{pmatrix} 0.5348 \\ 0.002946 \\ -0.7359 \\ -0.001861 \\ 0.04150 \end{pmatrix}.$$
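The full ice cream dataset is not reproduced in this module, but the computation itself is a single call once the design matrix is built. A hedged Python/numpy sketch follows: the subset below uses only the four rows shown in Table 3.2 and just two of the four explanatory variables, so its output will not reproduce the estimates quoted above. Numerically, a least squares solver is also preferable to forming $(x^T x)^{-1}$ explicitly, although it gives the same estimates.

```python
import numpy as np

def least_squares(x, y):
    """Least squares estimates for the model y = x*beta + error,
    where x is an (n, k+1) design matrix whose first column is all 1's."""
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta_hat

# Only the rows printed in Table 3.2, with temperature and price as
# explanatory variables (a toy subset, not the full 29-row dataset).
x = np.array([[1.0, 41.0, 0.270],
              [1.0, 56.0, 0.282],
              [1.0, 63.0, 0.277],
              [1.0, 64.0, 0.268]])
y = np.array([0.386, 0.374, 0.393, 0.437])
print(least_squares(x, y))
```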

The least squares line, describing how the ice cream consumption ($y$) is related to the temperature ($x_1$), the ice cream price ($x_2$), the average family income ($x_3$), and the year ($x_4$), is given by
$$\hat{y} = 0.535 + 0.00295\,x_1 - 0.736\,x_2 - 0.00186\,x_3 + 0.0415\,x_4.$$
Recall that, in an exercise for Module 2, you fitted a simple linear regression model to the ice cream consumption regarded as a function of the temperature only. If we fit the simple model (with temperature as the only explanatory variable) to the data when the outlier has been removed, the least squares estimates are $\hat{\beta}_0 = 0.2207$ and $\hat{\beta}_1 = 0.002735$ (corresponding to the intercept and the temperature, respectively). You can see that these estimates change considerably when the three extra explanatory variables are included in the model. In general, including more explanatory variables in the model affects the least squares estimates. This is because the least squares estimators are not independent. The estimates are only left unchanged if the matrix $\left(x^T x\right)^{-1}$ is diagonal, in which case the least squares estimators are independent.

Example 3.4 Simple linear regression

In the special case when $k = 1$, that is, when there is just one explanatory variable, the parameter estimates in formula (3.8) of the multiple linear regression model reduce to the parameter estimates of simple linear regression. When $k = 1$, the vector of observed response variables $y$ and the design matrix $x$ are given by, respectively,
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad x = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}.$$
It can be shown that the vector of least squares estimates is given by
$$\hat{\beta} = \left(x^T x\right)^{-1} x^T y = \frac{1}{s_{xx}} \begin{pmatrix} s_{xx}\,\bar{y} - s_{xy}\,\bar{x} \\ s_{xy} \end{pmatrix},$$
where $\bar{x} = \sum_{i=1}^{n} x_i / n$, $\bar{y} = \sum_{i=1}^{n} y_i / n$, $s_{xx} = \sum_i x_i^2 - \left(\sum_i x_i\right)^2 / n$ and $s_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$. Thus, the least squares estimates of the two regression parameters are given by, respectively,
$$\hat{\beta}_1 = s_{xy} / s_{xx}$$

and
$$\hat{\beta}_0 = \bar{y} - \left(s_{xy}\,\bar{x}\right) / s_{xx} = \bar{y} - \hat{\beta}_1 \bar{x},$$
which are exactly the least squares estimates that were derived in Module 2. The multiple linear regression model may also be fitted using other statistical methods, for example the method of maximum likelihood. It can be shown that the least squares estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ of the regression parameters $\beta_0, \beta_1, \ldots, \beta_k$ are maximum likelihood estimates.
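The reduction can be checked numerically. The sketch below (with invented data, not taken from the examples in this module) computes the simple-regression estimates both from the Module 2 formulas and from the matrix formula (3.8), and the two agree:

```python
import numpy as np

# Invented data with a single explanatory variable.
x1 = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y  = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Module 2 formulas.
sxx = np.sum(x1**2) - np.sum(x1)**2 / len(x1)
sxy = np.sum((x1 - x1.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x1.mean()

# Matrix formula (3.8) with the k = 1 design matrix.
X = np.column_stack([np.ones(len(x1)), x1])
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

print(b0, b1)     # simple-regression estimates
print(beta_hat)   # the same values from the matrix formula
```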

3.3.2 Coefficient of multiple determination

Having fitted a simple linear regression model to a set of data in Module 2, we would use the coefficient of determination to measure how closely the fitted model described the variation in the data. The definition of this measure easily generalises to the situation of multiple linear regression. Recall that the coefficient of determination compares the amount of variation in the data away from the fitted model with the total amount of variation in the data. Since the observed residuals $\hat{\varepsilon}_i = y_i - \hat{y}_i$ denote the deviances between the observed data $y_i$ and the values fitted by the model, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i,1} + \hat{\beta}_2 x_{i,2} + \cdots + \hat{\beta}_k x_{i,k}$, it is natural to use the observed residuals, or the residual sum of squares RSS, to measure the variation away from the fitted model. In the multiple linear regression case, the residual sum of squares, RSS, is given by
$$\mathrm{RSS} = \mathrm{RSS}(\hat{\beta}) = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}^T \hat{\varepsilon} = \left(y - x\hat{\beta}\right)^T \left(y - x\hat{\beta}\right). \qquad (3.9)$$

Following the line of arguments from the simple case, a measure of the strength of the straight-line relationship between $y$ and $x$ is the proportional reduction in variation obtained by using the least squares fitted model instead of the naive model $\hat{y} = \bar{y}$. That is, as in the simple case, the variation explained by the model ($s_{yy} - \mathrm{RSS}$) as a proportion of the total variation ($s_{yy}$):
$$r^2 = \frac{s_{yy} - \mathrm{RSS}}{s_{yy}}. \qquad (3.10)$$

The number $r^2$ is an estimate of the coefficient of multiple determination, $R^2 = \left(S_{yy} - \mathrm{RSS}\right) / S_{yy}$. As in the simple case, $R^2$ will always lie between zero and one: if it is close to 1, it is an indication that the data points lie close to the fitted model; if it is close to zero, it is an indication that the model hardly provides any more information about the variation in the data than the naive model does.

The coefficient of multiple determination is a measure of how well a linear model describes the variation in the data compared to the naive model; it is not a measure of whether or not a linear model is appropriate for the data. (A non-linear model might be more appropriate.) Methods for assessing the appropriateness of the assumption of linearity will be discussed in Module 4.

Example 3.1 (continued) Holiday cottages

The coefficient of multiple determination for the data on price, age and livable area of holiday cottages is given by $r^2 = 0.944 = 94.4\%$. The multiple linear regression model relating the price of holiday cottages in Odsherred to the age of the cottage and the livable area in the cottage seems to explain a large amount of the variation in the data. One should be careful, though, not to over-interpret the model: there are very few data points!

Example 3.2 (continued) Ice cream consumption

For the ice cream data, $r^2 = 0.763$. That is, the multiple linear regression model using price, temperature, income and year as explanatory variables explains 76.3% of the variation in ice cream consumption.
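To connect the definition to the fitted quantities, here is a small Python/numpy sketch of the computation (the helper and the data are illustrative, not taken from the examples above):

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of multiple determination for a least squares fit of y on x
    (x is the (n, k+1) design matrix with an intercept column)."""
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    rss = np.sum((y - x @ beta_hat) ** 2)   # variation away from the fitted model
    syy = np.sum((y - y.mean()) ** 2)       # total variation
    return (syy - rss) / syy

# Invented illustration with two explanatory variables.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(0, 10, 20), rng.uniform(0, 5, 20)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0.0, 1.0, 20)
x = np.column_stack([np.ones(20), x1, x2])
print(r_squared(x, y))  # close to 1 for these nearly linear data
```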

3.3.3 Estimating the variance

Estimating the common variance $\sigma^2$ in the multiple linear regression model (3.6) is done in much the same way as estimating the common variance in a simple linear regression model. Recall from Module 2 that we used the residual sum of squares, RSS, divided by the degrees of freedom in the model, $n - 2$, as an unbiased estimate of the common variance in the simple model. For a multiple linear regression model, the RSS is given in (3.9), and the degrees of freedom are given by
$$\mathrm{d.f.} = \text{number of observations} - \text{number of estimated parameters} = n - (k + 1).$$
Thus, an unbiased estimate of the variance $\sigma^2$ is given by
$$s^2 = \frac{\mathrm{RSS}(\hat{\beta})}{n - k - 1}.$$

Example 3.1 (continued) Holiday cottages

An unbiased estimate of the common variance $\sigma^2$ for the data on prices of holiday cottages is given by $s^2 = 25\,344$.

Example 3.2 (continued) Ice cream consumption

An unbiased estimate of the common variance $\sigma^2$ for the data on ice cream consumption is given by $s^2 = 0.0008779$.
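In code, the estimate is simply the residual sum of squares divided by $n - (k+1)$. A minimal sketch in the same style as the helpers above (illustrative; x is an (n, k+1) design matrix and y the vector of responses):

```python
import numpy as np

def sigma2_hat(x, y):
    """Unbiased estimate of the error variance for the model y = x*beta + error."""
    n, p = x.shape                                   # p = k + 1 estimated parameters
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    rss = np.sum((y - x @ beta_hat) ** 2)
    return rss / (n - p)                             # RSS(beta_hat) / (n - k - 1)
```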


3.4 Summary

A multiple linear regression model generalises the simple linear regression model by allowing the response variable to depend on more than one explanatory variable. In order to avoid complex expressions with lots of indices, matrix notation has been introduced. Using this notation, it is possible to present most results neatly. As in the case of simple linear regression models, we used the principle of least squares to fit the regression line. According to the principle of least squares, the best fitting line is the line which minimises the deviations of the observed data away from the line. This line is called the least squares line. The regression parameters for the least squares line, the least squares estimates, are estimates of the unknown regression parameters in the model. The coefficient of multiple determination is a measure of how well the fitted line describes the variation in the data. Finally, an unbiased estimate of the common variance has been given.

Keywords: multiple linear regression model, regression parameters, polynomial regression, matrix notation, model specification matrix, design matrix, model matrix, fitted values, predicted values, residual, residual sum of squares, least squares line, least squares estimates, transposed matrix, inverse matrix, invertible matrix, coefficient of multiple determination, degrees of freedom, unbiased variance estimate.
