
CHAPTER 8

Draw a scatter plot (scatter diagram) to see the relationship between two variables.
Understand and interpret the terms dependent variable and independent variable.
Find a linear regression model and use it to make predictions.
Study the strength of the relationship, called correlation analysis.

In a simple linear relationship, only TWO variables are involved:


X = independent variable
Y = dependent variable

Examples:
1. A sociologist wants to find out whether an increase in the crime rate is due to an increase in the cost of living.
   X = cost of living
   Y = crime rate
2. A fitness instructor wants to find out the relationship between weight loss and the amount of workout time.
   X = amount of workout time
   Y = weight loss

A scatter plot is a plot of the paired (x, y) values. It is used to examine the relationship between two variables, X and Y.

It gives a general idea of whether X is related to Y.

A plot that shows a clear pattern indicates that there is a relationship between X and Y. A plot that shows no particular pattern indicates that there is no relationship between X and Y.

Increasing pattern. As X increases, Y also increases.


Positive linear relationship between X and Y.

Decreasing pattern. As X increases, Y decreases.


Negative linear relationship between X and Y.

No particular pattern.
No relationship between X and Y.

Question: You are a marketing analyst for Hasbro Toys. You gather the following data:

Ad (RM)         1   2   3   4   5
Sales (Units)   1   1   2   2   4

Sketch a scatter plot of the data above.

Answer:
[Scatter plot: Sales, Y (vertical axis, 0 to 4) against Advertising, X (horizontal axis, 0 to 5)]

1. Are X and Y related?
2. Is the relationship positive or negative?
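
The scatter plot for the advertising data can also be sketched in plain text. The snippet below is a minimal illustration, not part of the original slides: the function name `text_scatter` is hypothetical, and it assumes small non-negative integer values so that each point maps to one character cell.

```python
def text_scatter(xs, ys):
    """Return a text scatter plot as a list of lines, highest Y first.

    Assumes small non-negative integer data (an assumption made for this sketch).
    Each row is one Y value; columns run from X = 0 on the left.
    """
    points = set(zip(xs, ys))
    lines = []
    for y in range(max(ys), -1, -1):
        row = "".join("*" if (x, y) in points else "." for x in range(max(xs) + 1))
        lines.append(f"{y} | {row}")
    return lines


ad = [1, 2, 3, 4, 5]      # Ad (RM), from the question above
sales = [1, 1, 2, 2, 4]   # Sales (Units)

for line in text_scatter(ad, sales):
    print(line)
```

The stars drift upward from left to right, which is the increasing pattern described earlier.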


A linear regression model is a mathematical equation that describes the linear relationship between X and Y. It can be used to predict the values of Y from known values of X. It represents a straight line, so it is of the form y = mx + c, where m is the slope and c is the y-intercept.


In statistical regression, we write the linear model as

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where $\beta_0$ = y-intercept, $\beta_1$ = slope, and $\varepsilon$ = random error component.


This regression line is usually estimated by using the paired sample data. The estimated regression line is given by

$$Y' = a + bX$$

where a = estimated y-intercept and b = estimated slope.

The method used to find the values of a and b is slightly different from the familiar method you learned in algebra.

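It uses the least-squares method, which chooses a and b so that the sum of the squared differences between the observed Y values and the values given by the line is as small as possible.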


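Formula to estimate a and b:

$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$

Now we can fit the regression line to the data using the values of a and b. The estimated regression line is

$$Y' = a + bX$$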
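
The formulas for a and b translate directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `fit_line` is an illustrative choice, and the data are the advertising figures from the scatter plot question.

```python
def fit_line(xs, ys):
    """Estimate intercept a and slope b using the sums-based formulas for a and b.

    Assumes xs and ys have the same length and the X values are not all equal
    (otherwise the denominator of b is zero).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


ad = [1, 2, 3, 4, 5]      # Ad (RM)
sales = [1, 1, 2, 2, 4]   # Sales (Units)

a, b = fit_line(ad, sales)
print(f"Y' = {a:.2f} + {b:.2f}X")
```

The same function can be used to check the hand calculations in the worked examples that follow.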

Question: You are an economist for the county cooperative. You gather the following data.

Fertilizer (lb.)   4     6     10    12
Yield (lb.)        3.0   5.5   6.5   9.0

Find the estimated regression line relating crop yield and fertilizer.

Answer: Construct this table first.

          X      Y      X^2    XY
          4      3.0    16     12
          6      5.5    36     33
          10     6.5    100    65
          12     9.0    144    108
Total:    32     24.0   296    218
Mean:     8      6

Answer: Using values from the table, estimate a and b.

$$b = \frac{4(218) - (32)(24)}{4(296) - (32)^2} = 0.65$$

$$a = 6 - 0.65(8) = 0.8$$

Therefore, the estimated regression line is

$$Y' = 0.8 + 0.65X$$



Answer:
[Plot: Yield (Y) against Fertilizer (X), with the fitted line Y' = 0.8 + 0.65X]

Answer: What do a and b in the regression line mean?
1. Y-intercept, a = 0.8: the average crop yield (Y) is expected to be 0.8 lb. when no fertilizer is used (X = 0, Y' = 0.8).
2. Slope, b = 0.65: the crop yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in fertilizer (X).

Question: A student wants to know the relationship between number of pages and the price of the book. To analyze this, he selects a sample of 8 textbooks currently on sale in a bookstore. Develop a regression line to fit the data given.


Book         No. of Pages (X)   Price (Y)
History      500                84
Algebra      700                75
Geometry     800                99
Physics      600                72
Sociology    400                69
Biology      500                81
Statistics   600                63
Nursing      800                93

Answer: Construct this table first.

          X       Y      X^2         XY
          500     84     250,000     42,000
          700     75     490,000     52,500
          800     99     640,000     79,200
          600     72     360,000     43,200
          400     69     160,000     27,600
          500     81     250,000     40,500
          600     63     360,000     37,800
          800     93     640,000     74,400
Total:    4900    636    3,150,000   397,200
Mean:     612.5   79.5

Answer: Using values from the table, estimate a and b.

$$b = \frac{8(397200) - (4900)(636)}{8(3150000) - (4900)^2} \approx 0.0514$$

$$a = 79.5 - 0.0514(612.5) \approx 48$$

Therefore, the estimated regression line is

$$Y' = 48 + 0.0514X$$

Now that we have estimated the regression line, we can predict Y for a given value of X by substituting X into the estimated regression line, Y' = a + bX. However, the value of X inserted in the equation must be within the range of X in the data set.


For Example 3, predict the price of the book that has 550 pages.

$$Y' = 48 + 0.0514(550) = 76.27$$

Thus, if the book has 550 pages, the price is estimated to be RM76.27.

REMEMBER! To predict Y, the value of X must be within the range of X in the data set.
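
The range rule can be built into the prediction step itself. The sketch below is one possible way to do this; the function name `predict` and the choice to raise an error for out-of-range values are assumptions made for illustration. It uses the fitted values from the textbook example, where the number of pages in the sample runs from 400 to 800.

```python
def predict(a, b, x, x_min, x_max):
    """Return Y' = a + bX, refusing values of X outside the sampled range."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"X = {x} is outside the data range [{x_min}, {x_max}]")
    return a + b * x


# Fitted line from the textbook example: Y' = 48 + 0.0514X, pages between 400 and 800.
price = predict(48, 0.0514, 550, x_min=400, x_max=800)
print(round(price, 2))  # 76.27
```

Calling `predict(48, 0.0514, 1000, x_min=400, x_max=800)` raises a `ValueError`, because a 1000-page book lies outside the sampled range.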

Correlation measures the strength (strong or weak) of the linear relationship between two variables.

The correlation coefficient tells us about the strength and direction of the relationship.


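A numerical measure of correlation for quantitative data is the Pearson correlation coefficient, r. The formula is given by

$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$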
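
The Pearson formula can be computed from the same column sums used for the regression line, plus the sum of Y squared. The following is a minimal sketch in plain Python; the function name `pearson_r` is an illustrative choice, and the data are the fertilizer and yield figures from the earlier regression example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Compute the Pearson correlation coefficient from the sums-based formula.

    Assumes xs and ys have the same length and that neither variable is constant
    (otherwise the denominator is zero).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


fertilizer = [4, 6, 10, 12]
crop_yield = [3.0, 5.5, 6.5, 9.0]

r = pearson_r(fertilizer, crop_yield)
print(round(r, 4))
```

The numerator is the same as the numerator of the slope b, so r and b always have the same sign.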


$-1 \le r \le 1$

Values of r close to 1 indicate a strong positive linear relationship between X and Y.

Values of r close to -1 indicate a strong negative linear relationship between X and Y.

Values of r close to 0 indicate little or no linear relationship between X and Y.

Question: A food analyst wants to know how much a person would spend on food, given a certain amount of income. He selects a random sample of 7 people, with their income and food expenditure as shown below.

Income (RM '00)         35   49   21   39   15   28   25
Food Expend. (RM '00)   9    15   7    11   5    8    9


(i) Find the estimated regression line for the data.
(ii) How much would a person spend on food if his income is RM 3000?
(iii) Compute the Pearson correlation coefficient, r, and interpret its value.


Answer: Construct this table first.

          Income, X   Food Exp, Y   X^2    Y^2   XY
          35          9             1225   81    315
          49          15            2401   225   735
          21          7             441    49    147
          39          11            1521   121   429
          15          5             225    25    75
          28          8             784    64    224
          25          9             625    81    225
Total:    212         64            7222   646   2150
Mean:     30.2857     9.1429

Answer:

$$b = \frac{7(2150) - (212)(64)}{7(7222) - (212)^2} = 0.2642$$

$$a = 9.1429 - 0.2642(30.2857) = 1.1414$$

(i) Therefore, the estimated regression line is

$$Y' = 1.1414 + 0.2642X$$


The slope b = 0.2642 is positive, so the relationship is positive. That is, people with higher incomes tend to spend more on food.

Answer: (ii) If income is RM3000, that is X = 30, then the food expenditure is

$$Y' = 1.1414 + 0.2642(30) = 9.0674$$


So we expect him to spend RM906.74 on food if his income is RM3000.


Answer: (iii) Pearson correlation coefficient, r

$$r = \frac{7(2150) - (212)(64)}{\sqrt{\left[7(7222) - (212)^2\right]\left[7(646) - (64)^2\right]}} = 0.9587$$

The value r = 0.9587 shows a very strong positive linear relationship between income and food expenditure. As income increases, food expenditure also tends to increase.