
17-1

17-2

CORRELATION ANALYSIS AND REGRESSION ANALYSIS

17-3

Correlation
A measure of association between two numerical variables.

Example (positive correlation)


Typically, in the summer as the temperature increases people are thirstier.

17-4

Scatter Diagram
A scatter diagram shows the relationship between two variables in graphical form. It summarizes the nature of the relationship: whether it is positive or negative, and how strong it is.

17-5

Scatter Diagrams with varied r values


Figure: four scatter diagrams of Y against X, with r = +1 (r² = 1), r = -1 (r² = 1), r = +0.9 (r² = 0.81), and r = 0 (r² = 0).

17-6

Specific Example
On seven randomly chosen summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.

Temperature (°F)              75  83  85  85  92  97  99
Water Consumption (ounces)    16  20  25  27  32  48  48

17-7

How would you describe the graph?

17-8

How strong is the linear relationship?
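The question above can be answered numerically with Pearson's sample correlation coefficient r, computed from the deviations of each temperature and water value from its mean. The following is a minimal Python sketch using only the standard library; the function name `pearson_r` is our own, hypothetical choice.

```python
from math import sqrt

# Temperature/water data from the seven summer days
temperature = [75, 83, 85, 85, 92, 97, 99]   # X, degrees F
water = [16, 20, 25, 27, 32, 48, 48]         # Y, ounces over three hours

def pearson_r(x, y):
    """Pearson's sample correlation coefficient for paired lists x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(temperature, water)
print(f"r = {r:.3f}")  # prints r = 0.963
```

A value this close to +1 indicates a strong positive linear relationship between temperature and water consumption for these seven days.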

17-9

Correlation Analysis
Correlation analysis is a statistical technique used to measure the strength of the linear relationship between two variables. Correlation can be used along with regression analysis to determine the nature of the relationship between variables. The prominent correlation coefficients are:
1. The Pearson product moment correlation coefficient

17-10

Measuring the Relationship


Pearson's Sample Correlation Coefficient, r
measures the direction and the strength of the linear association between two paired numerical variables.

17-11

Direction of Association
Figure: panels illustrating positive correlation and negative correlation.

Strength of Linear Association

r value   Interpretation
 1        perfect positive linear relationship
 0        no linear relationship
-1        perfect negative linear relationship

17-13

Other Strengths of Association


r value   Interpretation
0.9       strong association
0.5       moderate association
0.25      weak association

17-14


17-15

17-16

Product Moment Correlation


The product moment correlation, r, summarizes the strength of association between two metric (interval- or ratio-scaled) variables, say X and Y. Because it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.

17-17

Product Moment Correlation


From a sample of n paired observations of X and Y, the product moment correlation, r, can be calculated as:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

r varies between -1.0 and +1.0.

17-18

Ad Spending and Corresponding Sales of Royal Products

Company   Advertising Exp. (X, in crores)   Sales (Y, in thousands)
1         6                                 10
2         9                                 12
3         8                                 12
4         3                                 4
5         10                                12
6         4                                 6
7         5                                 8
8         2                                 2
9         11                                18
10        9                                 9
11        10                                17
12        2                                 2

17-19

Product Moment Correlation


The correlation coefficient may be calculated as follows, with X = advertising expenditure and Y = sales:

$$\bar{X} = \frac{6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2}{12} = 6.583$$

$$\bar{Y} = \frac{10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2}{12} = 9.333$$

$$\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) &= (6-6.58)(10-9.33) + (9-6.58)(12-9.33) + (8-6.58)(12-9.33) + (3-6.58)(4-9.33) \\
&\quad + (10-6.58)(12-9.33) + (4-6.58)(6-9.33) + (5-6.58)(8-9.33) + (2-6.58)(2-9.33) \\
&\quad + (11-6.58)(18-9.33) + (9-6.58)(9-9.33) + (10-6.58)(17-9.33) + (2-6.58)(2-9.33) \\
&= -0.3886 + 6.4614 + 3.7914 + 19.0814 + 9.1314 + 8.5914 + 2.1014 + 33.5714 \\
&\quad + 38.3214 - 0.7986 + 26.2314 + 33.5714 \\
&= 179.6668
\end{aligned}$$

17-20

Product Moment Correlation

$$\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= (6-6.58)^2 + (9-6.58)^2 + (8-6.58)^2 + (3-6.58)^2 + (10-6.58)^2 + (4-6.58)^2 \\
&\quad + (5-6.58)^2 + (2-6.58)^2 + (11-6.58)^2 + (9-6.58)^2 + (10-6.58)^2 + (2-6.58)^2 \\
&= 0.3364 + 5.8564 + 2.0164 + 12.8164 + 11.6964 + 6.6564 + 2.4964 + 20.9764 \\
&\quad + 19.5364 + 5.8564 + 11.6964 + 20.9764 \\
&= 120.9168
\end{aligned}$$

$$\begin{aligned}
\sum_{i=1}^{n} (Y_i - \bar{Y})^2 &= (10-9.33)^2 + (12-9.33)^2 + (12-9.33)^2 + (4-9.33)^2 + (12-9.33)^2 + (6-9.33)^2 \\
&\quad + (8-9.33)^2 + (2-9.33)^2 + (18-9.33)^2 + (9-9.33)^2 + (17-9.33)^2 + (2-9.33)^2 \\
&= 0.4489 + 7.1289 + 7.1289 + 28.4089 + 7.1289 + 11.0889 + 1.7689 + 53.7289 \\
&\quad + 75.1689 + 0.1089 + 58.8289 + 53.7289 \\
&= 304.6668
\end{aligned}$$

Thus,

$$r = \frac{179.6668}{\sqrt{(120.9168)(304.6668)}} = 0.9361$$
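The hand calculation for Royal Products can be checked with a short Python sketch that computes the three sums of the product moment formula from the table, taking X = advertising expenditure and Y = sales. Because r is symmetric in X and Y, swapping the labels would not change the result. The sketch uses unrounded means, so the sums may differ from the hand-computed ones in the last decimal place.

```python
from math import sqrt

# Royal Products data for the 12 companies
ad_spend = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]    # X, in crores
sales = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]    # Y, in thousands

n = len(ad_spend)
x_bar = sum(ad_spend) / n
y_bar = sum(sales) / n

# Numerator and the two sums under the square root in the formula for r
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad_spend, sales))
s_xx = sum((x - x_bar) ** 2 for x in ad_spend)
s_yy = sum((y - y_bar) ** 2 for y in sales)
r = s_xy / sqrt(s_xx * s_yy)

print(f"mean X = {x_bar:.3f}, mean Y = {y_bar:.3f}")
print(f"S_xy = {s_xy:.4f}, S_xx = {s_xx:.4f}, S_yy = {s_yy:.4f}")
print(f"r = {r:.4f}")  # prints r = 0.9361
```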


17-22

Rank correlation
Researchers often have to make decisions based on data measured on an ordinal scale. In such cases, Spearman's rank correlation coefficient is an appropriate measure of the relationship between the variables. It can be calculated with the following formula:

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where D is the difference between the two ranks given to each item and N is the number of items ranked.

The ranking of television models by the existing and new systems:

Television Model   Existing System   New System
A                  3                 1
B                  5                 5
C                  10                9
D                  2                 3
E                  7                 2
F                  6                 4
G                  4                 6
H                  1                 7
I                  8                 10
J                  9                 8

17-23

Calculation of Rank Correlation Coefficient

Television Model   Existing System (X)   New System (Y)   D = (R1 - R2)   D²
A                  3                     1                 2              4
B                  5                     5                 0              0
C                  10                    9                 1              1
D                  2                     3                -1              1
E                  7                     2                 5              25
F                  6                     4                 2              4
G                  4                     6                -2              4
H                  1                     7                -6              36
I                  8                     10               -2              4
J                  9                     8                 1              1
                                                                          ΣD² = 80

17-25

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 80}{10(100 - 1)} = 1 - \frac{480}{990} = 1 - 0.4848 \approx 0.52$$

This indicates a moderate positive correlation between the two rankings: the existing and new systems rank the television models in broadly similar order.
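The rank correlation calculation can be sketched in Python as follows. The function name `spearman_rs` is our own; it deliberately rejects tied or incomplete rankings, because the 6ΣD² shortcut formula is exact only when each list is a ranking 1..N without ties.

```python
# Ranks given to television models A..J
existing = [3, 5, 10, 2, 7, 6, 4, 1, 8, 9]   # R1: existing system (X)
new = [1, 5, 9, 3, 2, 4, 6, 7, 10, 8]        # R2: new system (Y)

def spearman_rs(r1, r2):
    """Spearman's rank correlation via r_s = 1 - 6*sum(D^2) / (N(N^2 - 1))."""
    n = len(r1)
    if n != len(r2) or n < 2:
        raise ValueError("need two equal-length rankings with N >= 2")
    expected = list(range(1, n + 1))
    if sorted(r1) != expected or sorted(r2) != expected:
        raise ValueError("shortcut formula assumes untied ranks 1..N")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

d = [x - y for x, y in zip(existing, new)]   # D = R1 - R2 for each model
print("D =", d)
print("sum of D^2 =", sum(v * v for v in d))
rs = spearman_rs(existing, new)
print(f"r_s = {rs:.4f}")  # prints r_s = 0.5152
```

For ranks containing ties, the usual approach is to assign average ranks and compute Pearson's r on those ranks instead of using the shortcut.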

17-26

Regression
A set of statistical methods for finding the line of best fit for one numerical response (dependent) variable based on one or more explanatory (independent) variables.

Regression: 3 Main Purposes


To describe (or model)
To predict (or estimate)
To control (or administer)

17-27

17-28

Regression Analysis
Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
1. Determine whether the independent variables explain a significant variation in the dependent variable.
2. Determine how much of the variation in the dependent variable can be explained by the independent variables (the strength of the relationship).
3. Predict the values of the dependent variable.

17-29

Example
Plan an outdoor party: estimate the number of soft drinks to buy per person based on how hot the weather is, using the temperature/water data and regression.
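As a minimal sketch of this idea, the code below fits a least squares line to the seven temperature/water pairs and uses it to estimate water consumption on a hypothetical 90 °F party day. The 90 °F value and the function name `fit_line` are illustrative assumptions; converting ounces into a number of soft drinks per person would need a further assumption about serving size, which is left out here.

```python
# Temperature/water data from the seven summer days
temperature = [75, 83, 85, 85, 92, 97, 99]   # X, degrees F
water = [16, 20, 25, 27, 32, 48, 48]         # Y, ounces over three hours

def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

a, b = fit_line(temperature, water)
forecast_temp = 90  # hypothetical party-day temperature, degrees F
predicted = a + b * forecast_temp
print(f"water = {a:.2f} + {b:.3f} * temperature")
print(f"predicted at {forecast_temp} F: {predicted:.1f} ounces")
```

The fitted line describes only days between 75 and 99 °F; predictions for temperatures well outside that range are extrapolations and should be treated with caution.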

17-30

Real Life Applications


Estimating Seasonal Sales for Department Stores (Periodic)

17-31

Real Life Applications


Predicting Student Grades Based on Time Spent Studying

17-32

Practice Problems
Can the number of points a player scores in a basketball game be predicted by the time the player spends in the game? By the player's height?

17-33

Types of Regression Models


Figure panels: Positive Linear Relationship; Negative Linear Relationship; Relationship NOT Linear; No Relationship.

17-34

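Least Squares Method

The least squares method assumes the regression equation

$$Y_i = a + bX_i + e_i, \qquad e_i = Y_i - \hat{Y}_i$$

where Y is the dependent variable, X is the independent variable, a is the Y-intercept, b is the slope of the line, and e_i is the residual for observation i. The constants are estimated as

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad a = \bar{Y} - b\bar{X}$$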

Calculations for determining constants a and b

Man Hours (X)   Productivity in Units (Y)   XY              X²
3.6             9.3                         33.48           12.96
4.8             10.2                        48.96           23.04
2.4             9.7                         23.28           5.76
7.2             11.5                        82.80           51.84
6.9             12.0                        82.80           47.61
8.4             14.2                        119.28          70.56
10.7            18.6                        199.02          114.49
11.2            28.4                        318.08          125.44
6.1             13.2                        80.52           37.21
7.9             10.8                        85.32           62.41
9.5             22.7                        215.65          90.25
5.4             12.3                        66.42           29.16
ΣX = 84.1       ΣY = 172.9                  ΣXY = 1355.61   ΣX² = 670.73

17-35

17-36

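$$b = \frac{12(1355.61) - (84.1)(172.9)}{12(670.73) - (84.1)^2} = \frac{1726.43}{975.95} = 1.769$$

$$a = \frac{172.9}{12} - 1.769 \times \frac{84.1}{12} = 14.408 - 12.398 = 2.01$$

$$\hat{Y} = 2.01 + 1.769X$$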
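The least squares arithmetic for the man-hours data can be checked with a short Python sketch that applies the raw-sum formulas for b and a directly to the table values. It is a plain standard-library check, not a general regression routine.

```python
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]              # X
productivity = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]  # Y

n = len(man_hours)
sum_x = sum(man_hours)
sum_y = sum(productivity)
sum_xy = sum(x * y for x, y in zip(man_hours, productivity))
sum_x2 = sum(x * x for x in man_hours)

# Raw-sum form of the least squares estimates
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"sum X = {sum_x:.1f}, sum Y = {sum_y:.1f}, sum XY = {sum_xy:.2f}, sum X^2 = {sum_x2:.2f}")
print(f"Y-hat = {a:.2f} + {b:.3f} X")
```

The printed constants should agree with the hand-computed values up to rounding.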

17-37

The Strength of Association R²

$$R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}$$

Total Variance = Explained Variance + Unexplained Variance
Explained Variance = Total Variance - Unexplained Variance
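As an illustration, the sketch below computes R² for the man-hours/productivity regression by splitting the total variation in Y into the part explained by the fitted line and the unexplained (residual) part. Here each "variance" is computed as a sum of squared deviations; the ratio is unchanged if both sums are divided by the same n or n - 1.

```python
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]              # X
productivity = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]  # Y

n = len(man_hours)
x_bar = sum(man_hours) / n
y_bar = sum(productivity) / n

# Least squares line in deviation form: b = S_xy / S_xx, a = y_bar - b * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(man_hours, productivity))
s_xx = sum((x - x_bar) ** 2 for x in man_hours)
b = s_xy / s_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in man_hours]

# Variation measured as sums of squared deviations
total = sum((y - y_bar) ** 2 for y in productivity)                     # total
unexplained = sum((y - f) ** 2 for y, f in zip(productivity, fitted))   # residual
explained = total - unexplained                                         # due to the line

r_squared = explained / total
print(f"total = {total:.3f}, explained = {explained:.3f}, unexplained = {unexplained:.3f}")
print(f"R^2 = {r_squared:.3f}")
```

For a least squares line with an intercept, the explained part also equals the sum of squared deviations of the fitted values from the mean of Y, which is why the two identities above hold.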
