
APPLIED STATISTICS (SQQS2013)
TUTORIAL 5: CORRELATION AND LINEAR REGRESSION

1. A study is done to investigate whether Statistics scores have some effect on students' CPA scores. The data below are the Statistics final examination scores of 10 randomly selected students and their corresponding CPA scores.

Statistics Scores   87     69     75     56     63     90     71     74     80     78
CPA                 3.41   3.15   3.28   2.46   2.89   3.73   3.11   3.23   3.50   3.34
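As a cross-check for the calculations requested in the questions that follow, the summary quantities can be computed directly from the ten data pairs. This is only a minimal sketch: the variable names are illustrative, the pairing of scores assumes each Statistics score is matched with the CPA value in the same position of the table, and the test statistic uses the same slope-test formula that appears in the solution to Question 2.

```python
from math import sqrt

# Data pairs as read from the table (assumed pairing: same position in each row).
stats_scores = [87, 69, 75, 56, 63, 90, 71, 74, 80, 78]
cpa = [3.41, 3.15, 3.28, 2.46, 2.89, 3.73, 3.11, 3.23, 3.50, 3.34]

n = len(stats_scores)
x_bar = sum(stats_scores) / n
y_bar = sum(cpa) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - x_bar) ** 2 for x in stats_scores)
s_yy = sum((y - y_bar) ** 2 for y in cpa)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stats_scores, cpa))

r = s_xy / sqrt(s_xx * s_yy)   # Pearson correlation coefficient
b1 = s_xy / s_xx               # least squares slope
b0 = y_bar - b1 * x_bar        # least squares intercept
r_squared = r ** 2             # coefficient of determination

# Slope test statistic: t = b1 / sqrt(Se^2 / Sxx), with n - 2 degrees of freedom.
se2 = (s_yy - b1 * s_xy) / (n - 2)
t_slope = b1 / sqrt(se2 / s_xx)

# Predicted CPA for a Statistics score of 65.
y_hat_65 = b0 + b1 * 65

print(f"r = {r:.4f}, b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"R^2 = {r_squared:.4f}, t = {t_slope:.2f}, prediction at 65 = {y_hat_65:.2f}")
```

The rounded intercept, slope and coefficient of determination should agree with the values 0.8110, 0.0323 and 91.01% quoted in the solution. The t value still has to be compared with the tabulated critical value for 8 degrees of freedom at the chosen significance level.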

a) Identify the dependent and independent variables.
b) Calculate the Pearson correlation coefficient. Interpret the coefficient obtained.
c) Can we conclude that there is a relationship between the Statistics and CPA scores at the 2% significance level?
d) Fit a least squares regression line.
e) Based on your answer in (d), interpret the coefficients obtained.
f) Is there enough evidence to conclude that the Statistics scores have a significant positive effect on CPA scores at the 2.5% significance level?
g) Predict the CPA score of a student who gets 65 in Statistics.
h) Interpret the coefficient of determination.

Solution:
a) Dependent variable: CPA
   Independent variable: Statistics scores
b) $r = 0.9540$. The correlation coefficient suggests a strong positive linear relationship between the Statistics and CPA scores.
c) Reject $H_0$. There is a relationship between the Statistics and CPA scores at the 2% significance level.
d) $\hat{y} = 0.8110 + 0.0323x$
e) The estimated CPA score for a student who scores zero in Statistics is 0.8110. For every one-mark increase in the Statistics score, the CPA score increases by 0.0323 on average.
f) Reject $H_0$. The Statistics scores have a significant positive effect on CPA scores at the 2.5% significance level.
g) $\hat{y} = 0.8110 + 0.0323(65) = 2.9105$
h) 91.01% of the variation in CPA scores can be explained by the variation in the Statistics scores. Only 8.99% is unexplained, due to error.

2. An architect wants to determine the relationship between the height of a building in feet (y) and the number of stories in the building (x). The following results are based on ten buildings that have been measured.
[Figure: scatter plot of building height y (feet) against number of stories x]

Hint:
$\sum x = 444$, $\sum y = 5968$, $\sum xy = 275237$
$S_{xx} = 870.4$, $S_{yy} = 123921.6$, $b_0 = 73.5391$

a) Does the scatter plot suggest an approximately linear relationship? Explain.
b) Determine the strength of the relationship between the height of a building and the number of stories in the building. Interpret the value.
c) Fit a least squares line.
d) Can we conclude that the number of stories in a building has a significant positive effect on its height at the 5% significance level?

Solution:
a) Yes. The data points scatter closely around a straight line.
b) $S_{xy} = 275237 - \frac{(444)(5968)}{10} = 10257.8$
   $r = \frac{10257.8}{\sqrt{(870.4)(123921.6)}} = 0.9877$
   The correlation coefficient suggests a strong positive linear relationship between the height of a building and the number of stories in the building.
c) $b_1 = \frac{10257.8}{870.4} = 11.7852$
   $\hat{y} = 73.5391 + 11.7852x$
d) $H_0: \beta_1 = 0$
   $H_1: \beta_1 > 0$
   $S_e^2 = \frac{123921.6 - 11.7852(10257.8)}{8} = 378.9219$
   $t_{test} = \frac{11.7852 - 0}{\sqrt{378.9219 / 870.4}} = 17.8616$
   $t_{0.05,\,8} = 1.8595$
   Since $t_{test} > t_{0.05,\,8}$, reject $H_0$. There is enough evidence to conclude that the number of stories in a building has a significant positive effect on its height at the 5% significance level.
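The arithmetic in this solution can be retraced from the hint values alone. The following is a minimal sketch with illustrative variable names. The critical value 1.8595 is taken from the solution and not computed.

```python
from math import sqrt

# Summary values given in the hint.
n = 10
sum_x, sum_y, sum_xy = 444, 5968, 275237
s_xx, s_yy = 870.4, 123921.6

s_xy = sum_xy - sum_x * sum_y / n      # corrected sum of cross-products
r = s_xy / sqrt(s_xx * s_yy)           # Pearson correlation coefficient
b1 = s_xy / s_xx                       # slope
b0 = sum_y / n - b1 * sum_x / n        # intercept

se2 = (s_yy - b1 * s_xy) / (n - 2)     # residual variance estimate
t_test = (b1 - 0) / sqrt(se2 / s_xx)   # test statistic for H0: beta_1 = 0

t_crit = 1.8595                        # t_{0.05, 8}, as quoted in the solution
reject_h0 = t_test > t_crit            # right-tailed test, H1: beta_1 > 0

print(f"Sxy = {s_xy:.1f}, r = {r:.4f}, b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"Se^2 = {se2:.4f}, t = {t_test:.4f}, reject H0: {reject_h0}")
```

Because the sketch keeps the slope unrounded, the residual variance and t value may differ from 378.9219 and 17.8616 in the second decimal place. The conclusion is unchanged.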

3. Suppose that the sales manager of a large automotive parts distributor wants to estimate the total annual sales of a region. Several factors appear to be related to sales, including the number of retail outlets (X1), number of automobiles registered (X2), personal incomes (X3), average age of automobiles (X4) and number of supervisors (X5). The following output shows the results of the analysis obtained by the sales manager. Based on the output, answer the following questions.
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   1594.237         5    318.847       148.003   .000(a)
Residual     8.617            4    2.154
Total        1602.855         9
a Predictors: (Constant), x5, x3, x2, x4, x1
b Dependent Variable: sales

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B         Std. Error          Beta                        t        Sig.
(Constant)   -20.157   5.041                                           -3.998   .016
x1           .000      .003                -.020                       -.148    .889
x2           1.696     .514                .311                        3.299    .030
x3           .425      .043                .922                        9.775    .001
x4           2.316     .932                .144                        2.483    .068
x5           -.145     .203                -.042                       -.714    .515
a Dependent Variable: sales

Correlations
                              sales      x1         x2         x3         x4      x5
sales   Pearson Correlation   1          .899(**)   .604       .962(**)   -.369   .243
        Sig. (2-tailed)                  .000       .064       .000       .294    .500
x1      Pearson Correlation   .899(**)   1          .775(**)   .820(**)   -.504   .144
        Sig. (2-tailed)       .000                  .008       .004       .137    .691
x2      Pearson Correlation   .604       .775(**)   1          .400       -.314   .364
        Sig. (2-tailed)       .064       .008                  .252       .377    .301
x3      Pearson Correlation   .962(**)   .820(**)   .400       1          -.439   .115
        Sig. (2-tailed)       .000       .004       .252                  .204    .751
x4      Pearson Correlation   -.369      -.504      -.314      -.439      1       .471
        Sig. (2-tailed)       .294       .137       .377       .204               .169
x5      Pearson Correlation   .243       .144       .364       .115       .471    1
        Sig. (2-tailed)       .500       .691       .301       .751       .169

** Correlation is significant at the 0.01 level (2-tailed).

a) Write down the estimated equation of the regression line.
b) Is there sufficient evidence to indicate that there is a positive relationship between sales and X2 at the 2.5% level of significance?
c) At the 5% significance level, test the overall validity of the model.
d) Which explanatory variables have no significant effect on Y at the 5% significance level?
e) Which variable(s) have a negative relationship with X1?
f) Which two variables have the strongest relationship?
g) Describe the strength and direction of the relationship between X5 and the dependent variable.
h) State the value of the coefficient of determination and interpret it.

Solution:
a) $\hat{y} = -20.157 + 0.000x_1 + 1.696x_2 + 0.425x_3 + 2.316x_4 - 0.145x_5$
b) Fail to reject $H_0$ (from the correlation output, one-tailed p-value = 0.064/2 = 0.032 > 0.025). The positive relationship between sales and X2 is not significant at the 2.5% significance level.
c) Reject $H_0$ (F = 148.003, Sig. = .000 < 0.05). The model is valid at the 5% significance level.
d) X1, X4 and X5
e) X4
f) X3 and sales
g) There is a weak positive relationship between X5 and the dependent variable (r = 0.243).
h) $R^2 = 1594.237 / 1602.855 = 0.9946$. 99.46% of the variation in total annual sales can be explained by the variation in number of retail outlets (X1), number of automobiles registered (X2), personal incomes (X3), average age of automobiles (X4) and number of supervisors (X5). Only 0.54% is unexplained, due to error.
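Several of these answers can be read mechanically from the printed output. The sketch below copies the relevant figures into plain Python structures (the dictionary and variable names are illustrative, not part of the output) and applies the usual decision rules. Halving the two-tailed Sig. value for the one-tailed test in (b) is an assumption that holds because the sample correlation is positive.

```python
# Figures copied from the ANOVA, Coefficients and Correlations output.
ss_regression, ss_residual = 1594.237, 8.617
df_regression, df_residual = 5, 4

# (h) coefficient of determination and (c) overall F statistic.
r_squared = ss_regression / (ss_regression + ss_residual)
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# (d) two-tailed Sig. value of each slope coefficient, compared with alpha = 0.05.
sig_coefficients = {"x1": 0.889, "x2": 0.030, "x3": 0.001, "x4": 0.068, "x5": 0.515}
not_significant = [name for name, p in sig_coefficients.items() if p > 0.05]

# (e) Pearson correlations of each variable with x1.
corr_with_x1 = {"sales": 0.899, "x2": 0.775, "x3": 0.820, "x4": -0.504, "x5": 0.144}
negative_with_x1 = [name for name, r in corr_with_x1.items() if r < 0]

# (b) one-tailed p-value for the sales/x2 correlation (assumed: two-tailed value halved).
p_one_tailed = 0.064 / 2
reject_h0_b = p_one_tailed < 0.025

print(f"R^2 = {r_squared:.4f}, F = {f_stat:.2f}")
print("Not significant at 5%:", not_significant)
print("Negative correlation with x1:", negative_with_x1)
print("Reject H0 in (b):", reject_h0_b)
```

The F value recomputed from the rounded sums of squares differs slightly from the printed 148.003, since the output was calculated from unrounded figures.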

4. The electric power consumed each month by a chemical plant (y) is thought to be related to the average ambient temperature ($x_1$), the number of days in the month ($x_2$), the average product purity ($x_3$) and the tons of product produced ($x_4$). The past years' historical data are available and have been recorded. The output below displays the results of the analysis.
Correlations
                           y          x1         x2         x3         x4
y    Pearson Correlation   1          .744(**)   .802(**)   .890(**)   .823(**)
     Sig. (2-tailed)       .          .001       .000       .000       .000
     N                     15         15         15         15         15
x1   Pearson Correlation   .744(**)   1          .849(**)   .914(**)   .934(**)
     Sig. (2-tailed)       .001       .          .000       .000       .000
     N                     15         15         15         15         15
x2   Pearson Correlation   .802(**)   .849(**)   1          .769(**)   .976(**)
     Sig. (2-tailed)       .000       .000       .          .001       .000
     N                     15         15         15         15         15
x3   Pearson Correlation   .890(**)   .914(**)   .769(**)   1          .868(**)
     Sig. (2-tailed)       .000       .000       .001       .          .000
     N                     15         15         15         15         15
x4   Pearson Correlation   .823(**)   .934(**)   .976(**)   .868(**)   1
     Sig. (2-tailed)       .000       .000       .000       .000       .
     N                     15         15         15         15         15
** Correlation is significant at the 0.01 level (2-tailed).

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .973(a)   .946       .925                3.23125
a Predictors: (Constant), x4, x3, x1, x2

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1838.698         4    459.675       44.026   .000(a)
Residual     104.410          10   10.441
Total        1943.108         14
a Predictors: (Constant), x4, x3, x1, x2
b Dependent Variable: y

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B         Std. Error          Beta                        t        Sig.
(Constant)   3.716     2.274                                           1.634    .133
x1           -1.400    .727                -.643                       -1.924   .083
x2           1.335     .564                1.497                       2.367    .040
x3           5.896     .856                1.453                       6.891    .000
x4           -.755     .545                -1.299                      -1.385   .196
a Dependent Variable: y

a) State the sample size for the above study.
b) Which variables are the independent variables?
c) Which variable is the dependent variable?
d) What is the intercept value?
e) Which independent variable has the strongest relationship with the power consumption? State the value.
f) Interpret the relationship between the average ambient temperature and power consumption. State whether the relationship is significant at $\alpha = 0.05$.
g) Write down the regression model obtained.
h) Interpret the values of the intercept and the temperature coefficient in the equation.
i) List the independent variables that cause y to decrease when they increase.
j) List the independent variables that have a significant effect on power consumption at $\alpha = 0.05$.
k) Based on the output, carry out a test at the 2% level of significance. Should the variable tons of product produced be included in the model?
l) Predict the power consumption for a month in which $x_1$ = 25°F, $x_2$ = 24 days, $x_3$ = 15% and $x_4$ = 98 tons.
m) How well does the model fit the data?

Solution:
a) n = 15
b) The average ambient temperature ($x_1$), the number of days in the month ($x_2$), the average product purity ($x_3$) and the tons of product produced ($x_4$).
c) The electric power consumed (y)
d) 3.716
e) The average product purity ($x_3$), r = 0.890
f) r = 0.744, so there is a strong positive relationship between the average ambient temperature and power consumption. Reject $H_0$ (Sig. = .001 < 0.05). The relationship is significant at the 5% significance level.
g) $\hat{y} = 3.716 - 1.400x_1 + 1.335x_2 + 5.896x_3 - 0.755x_4$
h) The estimated power consumption is 3.716 when all the independent variables are zero. Holding the other variables constant, for every 1°F increase in temperature, the power consumption decreases by 1.400.
i) $x_1$ and $x_4$
j) $x_2$ and $x_3$
k) Fail to reject $H_0$ (Sig. = .196 > 0.02). The tons of product produced has no significant effect at the 2% level of significance, so it should not be included in the model.
l) $\hat{y} = 3.716 - 1.400(25) + 1.335(24) + 5.896(15) - 0.755(98) = 15.206$
m) 94.6% of the variation in electric power consumed can be explained by the variation in average ambient temperature, number of days in the month, average product purity and tons of product produced. Only 5.4% is unexplained, due to error.
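The answers that depend only on the Coefficients and ANOVA tables can be reproduced with a few lines of code. This is a sketch under stated assumptions: the dictionary names and the `predict` helper are hypothetical, the purity of 15% is entered as the number 15 exactly as it appears in the question, and the adjusted R Square uses the standard formula 1 - (SSE / df_residual) / (SST / df_total).

```python
# Unstandardized coefficients and two-tailed Sig. values copied from the output.
coefficients = {"const": 3.716, "x1": -1.400, "x2": 1.335, "x3": 5.896, "x4": -0.755}
sig_values = {"x1": 0.083, "x2": 0.040, "x3": 0.000, "x4": 0.196}


def predict(x1, x2, x3, x4):
    """Evaluate the fitted regression equation (hypothetical helper)."""
    c = coefficients
    return c["const"] + c["x1"] * x1 + c["x2"] * x2 + c["x3"] * x3 + c["x4"] * x4


# Variables with a negative slope, and variables significant at alpha = 0.05.
negative_effect = [v for v in sig_values if coefficients[v] < 0]
significant = [v for v, p in sig_values.items() if p < 0.05]

# Is x4 significant at the 2% level?
x4_significant_at_2pct = sig_values["x4"] < 0.02

# Prediction for 25 degrees F, 24 days, purity 15 (assumed units as in the question), 98 tons.
y_hat = predict(25, 24, 15, 98)

# Goodness of fit from the ANOVA table.
ss_regression, ss_residual, ss_total = 1838.698, 104.410, 1943.108
df_residual, df_total = 10, 14
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

print("Negative effect:", negative_effect)
print("Significant at 5%:", significant)
print(f"Prediction = {y_hat:.3f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adjusted_r_squared:.3f}")
```

Working from the printed coefficients keeps the check independent of the raw data, which are not reproduced in this tutorial.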
