
Chapter-IV Regression Analysis

Contents:
4.1 Introduction
4.2 Regression Equations
    How to Find the Regression Equation
4.3 Properties of the Regression Coefficients
4.4 Difference between Correlation and Regression
4.5 Standard Error of Estimate

4.1 Introduction


Regression analysis is a technique used for modeling and analysing numerical data consisting of values of a dependent variable (response variable) and of one or more independent variables (explanatory variables). The dependent variable in the regression equation is modeled as a function of the independent variables, the corresponding parameters ("constants"), and an error term. The error term is treated as a random variable; it represents the unexplained variation in the dependent variable. The parameters are estimated so as to give a "best fit" of the data. Most commonly the best fit is evaluated using the least squares method, but other criteria have also been used.

There are two types of variables in regression analysis:
1. Dependent variable
2. Independent variable

The dependent variable is also known as the regressed, predicted, or explained variable. The independent variable is also known as the regressor, predictor, or explanatory variable.

Simple regression is used to examine the relationship between one dependent and one independent variable. After performing an analysis, the regression statistics can be used to predict the dependent variable when the independent variable is known. Regression goes beyond correlation by adding prediction capabilities.

The regression line (known as the least squares line) is a plot of the expected value of the dependent variable for all values of the independent variable. Technically, it is the line that minimizes the squared residuals; it is the line that best fits the data on a scatterplot.

In the regression equation, let y be the dependent variable and x the independent variable. Here are three equivalent ways to describe a linear regression model (a fitting sketch follows this list):
1. y = intercept + (slope × x) + error
2. y = constant + (coefficient × x) + error
3. y = a + b x + e
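As an illustration of form (3), the following is a minimal sketch of fitting a straight line by least squares with NumPy. The small data set and variable names are hypothetical placeholders chosen for illustration only; any paired (x, y) observations could be substituted.

```python
import numpy as np

# Hypothetical (x, y) observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Least squares fit of y = a + b*x; polyfit returns coefficients highest power first
b, a = np.polyfit(x, y, 1)

# Residuals e = y - (a + b*x): the unexplained variation in y
e = y - (a + b * x)

print("intercept a =", a)
print("slope b =", b)
print("residuals e =", e)
```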

The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. It is expressed in the units of the Y-axis divided by the units of the X-axis. If the slope is positive, Y increases as X increases. If the slope is negative, Y decreases as X increases.

The Y intercept is the Y value of the line when X equals zero. It defines the elevation of the line.

For two variables X and Y, we have two regression lines, and together they show the mutual relationship between the two variables. The regression line of Y on X gives the most probable estimate of the value of Y for a given value of X, whereas the regression line of X on Y gives the most probable estimate of the value of X for a given value of Y. Only one regression line: in the case of perfect correlation (r = +1 or r = -1), the two lines of regression coincide and we get only one line.

4.2 Regression Equations


Regression equations are algebraic expressions of the regression lines.

Regression Equation of Y on X
Y = a + bX
According to the principle of least squares, the normal equations for estimating a and b are
ΣY = Na + bΣX
ΣXY = aΣX + bΣX²

Regression Equation of X on Y
X = a + bY
According to the principle of least squares, the normal equations for estimating a and b are
ΣX = Na + bΣY
ΣXY = aΣY + bΣY²

Regression Equation from Deviations Taken from the Arithmetic Means of X and Y
Y - Ymean = byx (X - Xmean)
where byx is the regression coefficient of Y on X:
byx = Σ(xi - xmean)(yi - ymean) / Σ(xi - xmean)²
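As a rough sketch of how the normal equations for Y on X can be solved numerically, the code below sets up the 2×2 system ΣY = Na + bΣX, ΣXY = aΣX + bΣX² and solves it with NumPy, then checks that the deviation formula gives the same slope. The data values are hypothetical placeholders, not taken from the text.

```python
import numpy as np

# Hypothetical paired observations
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 7.0, 8.0, 12.0])
n = len(x)

# Normal equations for Y on X:
#   sum(Y)  = N*a + b*sum(X)
#   sum(XY) = a*sum(X) + b*sum(X^2)
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

a, b = np.linalg.solve(A, rhs)

# The deviation-from-means form gives the same slope
byx = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()

print("a =", a, "b =", b, "byx (deviation form) =", byx)
```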

How to Find the Regression Equation


Method 1: The regression equation can be found by solving the two normal equations for a and b.

Method 2: The regression equation can be found by using the deviations-from-the-arithmetic-mean formula to estimate byx. The following example uses this method.
Five randomly selected students took a math aptitude test before they began their statistics course. The Statistics Department has three questions:
i. What linear regression equation best predicts statistics performance, based on math aptitude scores?
ii. If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
iii. How well does the regression equation fit the data?

In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades. The last two rows show the sums and mean scores that we will use to conduct the regression analysis.

Student | xi  | yi  | (xi - xmean) | (yi - ymean) | (xi - xmean)² | (yi - ymean)² | (xi - xmean)(yi - ymean)
1       | 95  | 85  | 17           | 8            | 289           | 64            | 136
2       | 85  | 95  | 7            | 18           | 49            | 324           | 126
3       | 80  | 70  | 2            | -7           | 4             | 49            | -14
4       | 70  | 65  | -8           | -12          | 64            | 144           | 96
5       | 60  | 70  | -18          | -7           | 324           | 49            | 126
Sum     | 390 | 385 |              |              | 730           | 630           | 470
Mean    | 78  | 77  |              |              |               |               |

The regression equation is a linear equation of the form y - ymean = byx (x - xmean), where byx is the regression coefficient of y on x:

byx = Σ(x - xmean)(y - ymean) / Σ(x - xmean)² = 470 / 730 = 0.643836

y - 77 = 0.643836 (x - 78)
y = 0.643836 x + 26.78082

Once you have the regression equation, using it is a snap. Choose a value for the independent variable (x), perform the computation, and you have an estimated value (y) for the dependent variable.

In our example, the independent variable is the student's score on the aptitude test. The dependent variable is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade would be:

y = 0.643836 x + 26.78082 = 0.643836 × 80 + 26.78082 = 51.50688 + 26.78082 = 78.288 (approximately)
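The short script below reproduces this worked example end to end: it computes the deviation sums from the five (xi, yi) pairs in the table, derives byx and the intercept, and then predicts the statistics grade for an aptitude score of 80. It is a verification sketch using NumPy, not part of the original text.

```python
import numpy as np

# Aptitude scores (x) and statistics grades (y) from the worked example
x = np.array([95, 85, 80, 70, 60], dtype=float)
y = np.array([85, 95, 70, 65, 70], dtype=float)

x_mean, y_mean = x.mean(), y.mean()          # 78 and 77

# Regression coefficient of y on x via the deviation formula (470/730)
byx = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean)**2).sum()

# Intercept from y - y_mean = byx * (x - x_mean)
a = y_mean - byx * x_mean                    # about 26.78

# Predicted statistics grade for an aptitude score of 80
y_hat = a + byx * 80                         # about 78.29

print(f"byx = {byx:.6f}, intercept = {a:.5f}, prediction at x=80: {y_hat:.3f}")
```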

4.3 Properties of the Regression coefficients


1. The correlation coefficient is the geometric mean of the two regression coefficients: r² = byx × bxy, so r = ±√(byx × bxy).
2. If one of the regression coefficients is greater than unity, the other must be less than unity: if byx > 1, then bxy < 1.
3. Both regression coefficients have the same sign.
4. The correlation coefficient has the same sign as the regression coefficients.
5. The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient: (byx + bxy)/2 ≥ r.
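As a quick numerical check of these properties, the snippet below computes both regression coefficients and the correlation coefficient for the aptitude-test data used above, and confirms that r matches the geometric mean of byx and bxy. It is an illustrative sketch, not part of the original text.

```python
import numpy as np

x = np.array([95, 85, 80, 70, 60], dtype=float)   # aptitude scores
y = np.array([85, 95, 70, 65, 70], dtype=float)   # statistics grades

dx, dy = x - x.mean(), y - y.mean()

byx = (dx * dy).sum() / (dx**2).sum()   # regression coefficient of y on x (470/730)
bxy = (dx * dy).sum() / (dy**2).sum()   # regression coefficient of x on y (470/630)

r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient

# Property 1: r is the geometric mean of byx and bxy (same sign as both)
print("byx =", round(byx, 4), "bxy =", round(bxy, 4))
print("r =", round(r, 4), " sqrt(byx*bxy) =", round(np.sqrt(byx * bxy), 4))

# Property 5: the arithmetic mean of the coefficients is at least r
print("(byx + bxy)/2 =", round((byx + bxy) / 2, 4), ">= r:", (byx + bxy) / 2 >= r)
```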

4.4 Difference between Correlation and Regression

The difference between regression and correlation needs to be emphasised. Both methods attempt to describe the association between two (or more) variables, and are often confused by students and professional scientists alike!

1. Correlation makes no a priori assumption as to whether one variable is dependent on the other(s), and it is not concerned with the functional relationship between the variables; instead, it gives an estimate of the degree of association between them. In fact, correlation analysis tests for the interdependence of the variables.

2. Regression attempts to describe the dependence of a variable on one (or more) explanatory variables; it implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless of whether the path of effect is direct or indirect.

4.5 Standard Error of Estimate

The standard error of estimate is also called the standard deviation of the error term e. It measures the variability of the observed values around the regression line.

Standard error of estimate of Y on X:

Syx = √[ (ΣY² - aΣY - bΣXY) / (n - 2) ]
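Continuing with the aptitude-test example, the sketch below computes the standard error of estimate for the regression of Y on X in two equivalent ways: from the sums in the formula above and directly from the residuals. The numeric result (roughly 10.4) is my own calculation, not a value stated in the original text.

```python
import numpy as np

x = np.array([95, 85, 80, 70, 60], dtype=float)
y = np.array([85, 95, 70, 65, 70], dtype=float)
n = len(x)

# Fitted regression of Y on X (same coefficients as in the worked example)
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
a = y.mean() - b * x.mean()

# Formula form: sqrt[(sum(Y^2) - a*sum(Y) - b*sum(XY)) / (n - 2)]
syx_formula = np.sqrt(((y**2).sum() - a * y.sum() - b * (x * y).sum()) / (n - 2))

# Residual form: sqrt[sum((Y - Yhat)^2) / (n - 2)]
residuals = y - (a + b * x)
syx_residual = np.sqrt((residuals**2).sum() / (n - 2))

print(round(syx_formula, 3), round(syx_residual, 3))   # both roughly 10.4
```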
