
REGRESSION AND CORRELATION

11.1 SIMPLE REGRESSION ANALYSIS


Description
Regression analysis is a statistical technique used to determine
the functional form of the relationship between variables. The
objective is usually to predict or estimate the value of one
variable (the dependent or response variable) corresponding to
given values of other variables (the independent variables or
regressors).
A mathematical equation that allows us to predict values
of one dependent variable from known values of one or
more independent variables is called a regression equation.
Simple Linear Regression
analysis of the relationship between a response
variable and a single regressor
Multiple Regression
analysis of the relationship between a response
variable and multiple regressors
History: Sir Francis Galton showed that the heights of sons
of tall fathers over successive generations regressed toward
the mean height of the population. In other words, sons of
unusually tall fathers tend to be shorter than their fathers
and sons of unusually short fathers tend to be taller than
their fathers.

Simple Linear Regression Model


Y = \beta_0 + \beta_1 X + \varepsilon

where Y = response variable
X = regressor
β₀ = y-intercept
β₁ = slope
ε = random error such that ε ~ N(0, σ²)
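
To make the model concrete, here is a small Python sketch (not from the text; the parameter values are arbitrary choices for illustration) that simulates observations from Y = β₀ + β₁X + ε with ε ~ N(0, σ²):

import random

# Sketch: simulate observations from the simple linear regression model
# Y = beta0 + beta1*X + epsilon, where epsilon ~ N(0, sigma^2).
# The parameter values below are arbitrary illustrations.
beta0, beta1, sigma = 2.0, 0.5, 1.0
random.seed(1)

xs = [float(x) for x in range(1, 11)]                          # regressor values
ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]  # response values

for x, y in zip(xs, ys):
    print(f"X = {x:4.1f}   Y = {y:6.2f}")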
Fitted Regression Line
\hat{Y} = b_0 + b_1 X

where \hat{Y} = predicted or fitted value


Estimating the Regression Coefficients
A. METHOD OF LEAST SQUARES

The residual sum of squares is often called the sum
of squares of the errors about the regression line and is
denoted by SSE, i.e.,

SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2

In the method of least squares, a minimization
procedure for estimating the parameters, SSE is
minimized by differentiating it with respect to β₀ and β₁,
i.e.,

\frac{\partial SSE}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)

and

\frac{\partial SSE}{\partial \beta_1} = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) X_i

Setting the partial derivatives to zero and rearranging
the terms, the normal equations are obtained, given by

n b_0 + b_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i

b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i

Solving the normal equations simultaneously yields
the computing formulas for b0 and b1. Hence, the
estimates of β₀ and β₁ are, respectively,

b_1 = \frac{s_{xy}}{s_x^2}

and

b_0 = \bar{y} - b_1 \bar{x}.

NOTE:
b0 represents the value of Y when X is zero.
b1 represents the change in Y for every one-unit change in X.
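
To make the computing formulas concrete, the following minimal Python sketch (the function name and data are illustrative, not from the text) computes b1 = s_xy / s_x² and b0 = ȳ − b1·x̄:

# Minimal sketch: least-squares estimates b0 and b1 for simple linear regression.
# The data below are made up purely for illustration.

def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance s_xy and sample variance s_x^2 (the divisor n - 1 cancels in the ratio)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_x2          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 8.1, 9.8]
    b0, b1 = least_squares(x, y)
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")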

PROPERTIES OF LEAST SQUARES ESTIMATES


Under the conditions of the SLRM, the least squares
estimators b0 and b1 are the Best Linear Unbiased
Estimators (BLUE) of β₀ and β₁.
B. MAXIMUM LIKELIHOOD ESTIMATES
Recall, Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
Assumptions:
ε_i follows the normal distribution
Thus,

b_0 = \bar{y} - b_1 \bar{x}, \qquad b_1 = \frac{s_{xy}}{s_x^2}, \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2

are the MLEs of β₀, β₁ and σ², respectively.

Note:
1. \hat{\sigma}^2_{MLE} is biased.
2. \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2

PROPERTIES OF THE FITTED REGRESSION LINE


1. \sum_{i=1}^{n} e_i = 0
2. \sum_{i=1}^{n} e_i^2 is minimum.
3. \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i
4. The regression line passes through (\bar{x}, \bar{y}).
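
The properties above can be checked numerically; this minimal Python sketch (made-up data) verifies properties 1, 3 and 4 for a fitted line:

# Sketch: numerical check of the properties of the fitted regression line (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.3, 4.1, 5.8, 8.2, 9.6]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

print(round(sum(residuals), 10))             # property 1: residuals sum to 0
print(round(sum(y) - sum(y_hat), 10))        # property 3: sum(Y_i) = sum(Y_hat_i)
print(round((b0 + b1 * x_bar) - y_bar, 10))  # property 4: line passes through (x_bar, y_bar)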

Example 1:
Suppose that we want to predict a student's grade in
freshman chemistry based upon his score on an intelligence
test administered prior to his attending college. Refer to the
following table.
Student   Test Score, x   Chemistry Grade, y
1         65              85
2         50              74
3         55              76
4         65              90
5         55              85
6         70              87
7         65              94
8         70              98
9         55              81
10        70              91
11        50              76
12        55              74

The scatter plot of the values is given in the following
graph:

[Scatter plot: Chemistry Grade (y) versus Test Score (x)]
Observe that the points approximately follow a straight line.
Once a reasonable linear relationship has been ascertained, we
usually try to express this mathematically by a straight-line
equation called the linear regression line. The slope-intercept
form of a line can be written in the form

y = a + bx
Definition 11.1.4. The least squares estimates of the
parameters in the regression line y = a + bx are obtained
from the formulas

b = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}

and

a = \bar{y} - b\bar{x}
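
As an illustration, a short Python sketch applying the formulas of Definition 11.1.4 to the test-score and chemistry-grade data of Example 1 (the computation outline is ours; only the data come from the text):

# Sketch: least-squares line for Example 1 using the formulas of Definition 11.1.4.
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]   # test scores
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]   # chemistry grades
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n
print(f"fitted line: y = {a:.2f} + {b:.2f} x")   # roughly y = 30.04 + 0.90x for these data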

Exercises:
1. Find the regression line for the data in Remark 7.1.3.
2. A study was made on the amount of converted sugar in a
certain process at various temperatures. The data were
coded and recorded as follows:
Temperature, x   Converted Sugar, y
1.0              8.1
1.1              7.8
1.2              8.5
1.3              9.8
1.4              9.5
1.5              8.9
1.6              8.6
1.7              10.2
1.8              9.3
1.9              9.2
2.0              10.5
a. Estimate the linear regression line.
b. Estimate the amount of converted sugar produced
when the coded temperature is 1.75.
3. A study was made by a retail merchant to determine the
relation between weekly advertising expenditures and
sales. The following data were recorded:
Advertising Costs, in $   Sales, in $
40                        385
20                        400
25                        395
20                        365
30                        475
50                        440
40                        490
20                        420
50                        560
40                        525
25                        480
50                        510

a. Plot a scatter diagram.


b. Find the equation of the regression line to predict
weekly sales from advertising expenditures.
c. Estimate the weekly sales when advertising costs are
$35.
CORRELATION ANALYSIS
Description
Correlation analysis is a statistical technique used to determine
the strength or degree of linear relationship existing between
two variables.
Correlation analysis attempts to measure the strength
of relationships between two variables by means of a single
number called a correlation coefficient.

Correlation Coefficient, ρ
measures the degree of linear relationship between two variables X and Y
its value ranges from -1 (perfect linear relationship with negative slope) to +1 (perfect linear relationship with positive slope)
A linear correlation coefficient is a measure of the
linear relationship between the two random variables X and
Y and usually denoted by r.
If points follow closely a straight line of positive
slope, then we have a high positive correlation between X
and Y. If points follow closely a straight line of negative
slope, then we have a high negative correlation between X
and Y.

Scatter diagrams/plots illustrating different values of ρ:

[Scatter plots of Y versus X illustrating ρ > 0 and ρ < 0]

ρ is estimated by the Pearson product-moment
correlation coefficient or simply the sample
correlation coefficient, r, given by

r = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}} = \frac{SP_{xy}}{\sqrt{SS_x SS_y}}
where

s_{xy} = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - \frac{\sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n} \right) = \frac{SP_{xy}}{n-1},

s_x^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} \right) = \frac{SS_x}{n-1},  and

s_y^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} \right) = \frac{SS_y}{n-1}.

Qualitative interpretation of the correlation coefficient:

Absolute value of correlation coefficient   Strength of linear relationship between X and Y
0 - 0.2                                     Very weak
0.2 - 0.4                                   Weak
0.4 - 0.6                                   Moderate
0.6 - 0.8                                   Strong
0.8 - 1.0                                   Very strong

Pearson product-moment correlation coefficient:

r = \frac{b s_x}{s_y}
where b is the value obtained from the estimation of the


linear regression line, sx is the standard deviation of the
values of X and sy is the standard deviation of the values of
Y.

Remarks: The value of r ranges from -1 to +1. A value of r


= -1 or r = +1 means that SSE = 0 or the points lie exactly
on a straight line. This is called a perfect linear relationship.
If r is close to -1 or +1, the linear relationship between the
two variables is strong and we say that we have a high
correlation. However, if r is close to 0, the linear
relationship between X and Y is weak or perhaps
nonexistent.
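
To illustrate, a minimal Python sketch (the data are invented for illustration) computes r from r = s_xy / (s_x · s_y) and checks that it agrees with r = b·s_x / s_y:

import math

# Sketch: sample correlation coefficient computed two equivalent ways (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.4, 3.1, 4.8, 4.9, 6.3, 7.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = s_xy / (s_x * s_y)          # definition of the sample correlation coefficient
b = s_xy / s_x ** 2             # least-squares slope
print(round(r, 4), round(b * s_x / s_y, 4))   # the two expressions for r agree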

DESCRIPTIVE MEASURES OF ASSOCIATION
BETWEEN X AND Y IN THE REGRESSION MODEL
A. Coefficient of Determination, R²
The coefficient of determination, a measure of
goodness-of-fit, gives the proportion of the
variability in the response variable that is explained
by the model and is computed as

SSR b1s xy
R
2 x100% .
SST
sy
2

NOTE: [0%, 100%]
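
A minimal Python sketch (illustrative data only) showing that SSR/SST computed from the fitted line agrees with b₁·s_xy / s_y²:

# Sketch: coefficient of determination R^2 for a simple linear regression (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 4.2, 5.9, 8.3, 9.7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
b1 = s_xy / s_x2
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
ssr = sum((yhi - y_bar) ** 2 for yhi in y_hat)    # regression sum of squares
print(round(ssr / sst, 4), round(b1 * s_xy / s_y2, 4))   # both give R^2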

Test of Hypothesis
H0: ρ = 0 (There is no correlation between X and Y.)
vs
H1: (a) ρ ≠ 0 (There is correlation between X and Y.)
    (b) ρ > 0 (There is positive correlation between X and Y.)
    (c) ρ < 0 (There is negative correlation between X and Y.)

Test Statistic (for H0: ρ = 0):

t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}   with   v = n - 2

Test Statistic (for H0: ρ = ρ0):

Z = \frac{\sqrt{n-3}}{2} \ln\left[\frac{(1+r)(1-\rho_0)}{(1-r)(1+\rho_0)}\right]
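
A minimal Python sketch of the two test statistics (the values of r, n and ρ₀ below are illustrative; critical values and decision rules are not shown):

import math

def t_statistic(r, n):
    # t_c = r * sqrt(n - 2) / sqrt(1 - r^2), with v = n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def z_statistic(r, n, rho0):
    # Z = (sqrt(n - 3) / 2) * ln[ (1 + r)(1 - rho0) / ((1 - r)(1 + rho0)) ]
    return (math.sqrt(n - 3) / 2) * math.log(((1 + r) * (1 - rho0)) / ((1 - r) * (1 + rho0)))

print(round(t_statistic(0.87, 12), 3))       # example: r = 0.87, n = 12
print(round(z_statistic(0.87, 12, 0.5), 3))  # example: testing rho0 = 0.5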
