
UNIT 8 CORRELATION AND REGRESSION ANALYSIS

Structure
8.0 Objectives
8.1 Introduction
8.2 Correlation
8.2.1 Concept
8.2.2 Correlation and Independence
8.2.3 Nonsense Correlation
8.3 Regression
8.3.1 Concept
8.3.2 Correlation and Regression
8.3.3 Simple Regression
8.3.4 Multiple Regression
8.4 Types of Data
8.5 Let Us Sum Up
8.6 Exercises
8.7 Key Words
8.8 Some Useful Books
8.9 Answers or Hints to Check Your Progress
8.10 Answers or Hints to Exercises

8.0 OBJECTIVES

After going through this unit, you will be able to:
refresh the concept of linear correlation;
state that zero correlation does not imply that the variables are independent; on the other hand, independence of variables implies zero correlation;
appreciate the fact that the presence of a high degree of correlation does not necessarily amount to the existence of a meaningful relationship among the variables under consideration;
distinguish between the concept of correlation and that of regression;
refresh the method of least squares in connection with two variable regression;
distinguish between direct regression and reverse regression;
understand how the approach to multiple regression is an extension of the approach followed in two variable regression; and
know about various types of data that can be used in regression analysis.
Quantitative Methods-I
8.1 INTRODUCTION
Quantitative techniques are important tools of analysis in today's research in Economics. These tools can be broadly divided into two classes: mathematical tools and statistical tools. Economic research is often concerned with theorizing of some economic phenomenon. Different mathematical tools are employed to express such a theory in a precise mathematical form. This mathematical form of economic theory is what is generally called a mathematical model. A major purpose of the formulation of a mathematical model is to subject it to further mathematical treatment to gain a deeper understanding of the economic phenomenon that the researcher may be primarily interested in. However, the theory so developed needs to be tested in the real-world situation. In other words, the usefulness of a mathematical model depends on its empirical verification. Thus, in economic research, often a researcher is hard-pressed to put the mathematical model in such a form that it can render itself to empirical verification. For this purpose, various statistical techniques have been found to be extremely useful. We should note here that often such techniques have been appropriately modified to suit the purposes of the economists. Consequently, a very rich and powerful area of economic analysis known as Econometrics has grown over the years. We may provide a working definition of Econometrics here. It may be described as the application of statistical tools in the quantitative analysis of economic phenomena. We may mention here that econometricians have not only provided important tools for economic analysis but their contributions have also significantly enriched the subject matter of Statistical Science in general. Today, no researcher can possibly ignore the need for being familiar with econometric tools for the purpose of serious empirical economic analysis. In the subsequent units, you will learn about regression models of econometric analysis. The concepts of correlation and regression form the core of regression models. You are already familiar with these two concepts, as you have studied them in the compulsory course on Quantitative Methods (MEC-003). In this unit we are going to put the two concepts in the perspective of empirical research in Economics. Here, our emphasis will be on examining how the applications of these two concepts are important in studying the possibility of relationship that may exist among economic variables.

8.2 CORRELATION
8.2.1 Concept
In the introduction to this chapter, we have already referred to a mathematical model of some real-world observable economic phenomenon. In general, a model consists of some functional relationships, some equations, some identities and some constraints. Once a model like this is formulated, the next issue is to examine how this model works in the real-world situation, for example, in India. This is what is known as the estimation of an econometric model. It may be mentioned here that Lawrence Klein did some pioneering work in the formulation and estimation of such models. In fact, many complex econometric models consisting of hundreds of functions, equations, identities and constraints have been constructed and estimated for different economies of the world, including India, by using empirical data.

The estimation of such complete macro-econometric models, however, involves certain issues that are beyond our scope. As a result, we shall abstract from such kind of a model and focus on a single equation economic relationship and consider its empirical verification. For example, in the Keynesian model of income determination, the consumption function plays a pivotal role. The essence of this relationship is that consumption depends on income. We may specify a simple consumption function in the form of a linear equation with two constraints: one, the autonomous part of consumption being positive and two, the marginal propensity to consume being more than zero but less than one.
Thus, our consumption equation is

    C = a + bY,  with a > 0 and 0 < b < 1

where C denotes consumption, Y denotes income, a is the autonomous part of consumption and b is the marginal propensity to consume.
This kind of a single equation and its estimation is commonly known as the regression model in the econometric literature. It may be mentioned here that such a single-equation regression model need not be a part of any econometric model and can be a mathematical formulation of some independently observed economic phenomenon. Any scientific inquiry has to be conducted systematically, and economic inquiry is no exception. In the case of our regression model involving consumption and income, for example, a preliminary step may be to examine whether, in the real-world situation, there exists any relationship between consumption and income at all. This is precisely what we attempt with the help of the concept of correlation. Thus, at the moment, we are not concerned with the issue of dependence of consumption on income or vice-versa. We are simply interested in the possible co-movement of the two variables. We shall focus on the difference between correlation and regression later.
Correlation can be defined as a quantitative measure of the degree or strength of the relationship that may exist between two variables. You are already familiar with the concept of Karl Pearson's coefficient of correlation. If X and Y are two variables, we know that this correlation coefficient is given by the ratio of the covariance between X and Y to the product of the standard deviation of X and that of Y. In symbols:

    r = cov(X, Y) / (σ_X σ_Y)

The symbols have their usual meaning. Here, the covariance in the numerator is important. This, in fact, gives a measure of the simultaneous change in the two variables. It is divided by the product of the standard deviations of X and Y to make the measure free of any unit, in order to facilitate a comparison between more than one set of bi-variate data which may be expressed in different units. It may be noted here that this measure of the correlation coefficient is independent of a shift in the origin and a change of scale. The correlation coefficient lies between +1 and -1. In symbols:

    -1 ≤ r ≤ +1

If the two variables tend to move in the same direction, the correlation coefficient is positive. In the event of the two variables tending to move in the opposite directions, the correlation coefficient assumes a negative value. In the case of a perfect correlation, the correlation coefficient is either +1 or -1, which is almost impossible in economics. When there does not seem to be any relationship between the two variables on the basis of the available data, the correlation coefficient may assume a value equal to zero.
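The computation described above can be sketched in a few lines of code. This is a minimal illustration; the income and consumption figures below are invented for the purpose, not taken from any data source.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)

# Hypothetical bi-variate data: income and consumption moving together.
income = [100, 120, 140, 160, 180]
consumption = [80, 95, 108, 125, 138]

r = pearson_r(income, consumption)
print(round(r, 4))  # strongly positive, close to +1

# r is independent of a shift in the origin and a change of scale:
print(round(pearson_r([2 * v + 5 for v in income], consumption), 4))  # same value
```

Note how dividing by the two standard deviations leaves r free of units, which is why rescaling income leaves the coefficient unchanged.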
It should be noted here that Karl Pearson's correlation coefficient measures linear correlation between two variables. This means that there exists a proportional relationship between the two variables, i.e., the two variables change in a fixed proportion. For example, we may find that the correlation coefficient between disposable income and personal consumption expenditure in India, on the basis of some national income data, is 0.7. It only means that consumption in relation to income or income in relation to consumption changes by a factor of 0.7. We again stress here that at the moment we are not commenting on whether income is the independent variable and consumption is the dependent variable or it is the other way round.

It is important here to comment on what is known as the coefficient of determination. Although it is numerically equal to the square of the correlation coefficient, conceptually it is quite different from the correlation coefficient. We shall discuss this concept in detail in the next unit.
Example 8.1

If three uncorrelated variables x1, x2 and x3 have the same standard deviation, find the correlation coefficient between x1 + x2 and x2 + x3.

Suppose u = x1 + x2 and similarly, v = x2 + x3. Then, we have to find r_uv. Let σ1, σ2 and σ3 be the standard deviations of x1, x2 and x3 respectively, with σ1 = σ2 = σ3 = σ. Let cov(x1, x2), cov(x2, x3) and cov(x1, x3) be the covariances between the pairs of variables (x1, x2), (x2, x3) and (x1, x3) respectively. Since it is given that the variables are uncorrelated, we have

    cov(x1, x2) = cov(x2, x3) = cov(x1, x3) = 0

Therefore,

    cov(u, v) = cov(x1 + x2, x2 + x3) = cov(x1, x2) + cov(x1, x3) + var(x2) + cov(x2, x3) = σ²

and

    var(u) = var(x1) + var(x2) = 2σ²,  var(v) = var(x2) + var(x3) = 2σ²

Hence,

    r_uv = cov(u, v) / (σ_u σ_v) = σ² / (√(2σ²) √(2σ²)) = σ² / 2σ² = 0.5
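Example 8.1 can also be checked numerically. This is an illustrative sketch only: independent random draws are merely approximately uncorrelated, so the sample correlation comes out close to, rather than exactly, 0.5.

```python
import math
import random

random.seed(42)
n = 50_000  # a large sample so the estimate settles near the theoretical value

# Three (approximately) uncorrelated variables with the same standard deviation.
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]
x3 = [random.gauss(0.0, 1.0) for _ in range(n)]

u = [a + b for a, b in zip(x1, x2)]
v = [b + c for b, c in zip(x2, x3)]

def pearson_r(x, y):
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / m
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / m)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / m)
    return cov / (sx * sy)

print(round(pearson_r(u, v), 2))  # close to the theoretical value 0.5
```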
8.2.2 Correlation and Independence

We should appreciate that in the real-world situation the relationship between two variables may not be linear in nature. In fact, often variables are involved in all kinds of non-linear relationships. Thus, we should be very clear that even when Karl Pearson's correlation coefficient is found to be zero, the two variables might still be related in a non-linear manner. The frequently quoted statement, "Independence of two variables implies zero correlation coefficient but the converse is not necessarily true," exemplifies this fact. Statistics and consequently Econometrics of non-linear relationships are quite involved in nature and beyond the scope of the present discussion. Consequently, at this stage, linearity should be taken as a necessary simplifying assumption. However, we shall see later that essentially a non-linear relationship can sometimes be reduced to a linear relationship through some appropriate transformation and the tools of linear analysis can still be effectively applied to such transformed relationships. We often employ such techniques as a practical solution to the complexities involved in a non-linear relationship.
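A one-line construction makes this point concrete. In the sketch below (invented data), Y is completely determined by X through a non-linear (quadratic) rule, yet Pearson's r is zero because X is symmetric around its mean.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]   # perfectly dependent on x, but non-linearly

print(pearson_r(x, y))  # 0.0: zero linear correlation despite total dependence
```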
8.2.3 Nonsense Correlation

Sometimes two variables, even when they do not seem to be related in any manner, may display a high degree of correlation. Yule called this kind of correlation 'nonsense correlation'. If we measure two variables at regular time intervals, both the variables may display a strong time-trend. As a result, the two variables may display a strong correlation even when they are unrelated. Thus, one should be very careful while using such a source of data. In fact, a new branch of econometrics, known as Time Series econometrics, has been developed for exclusively handling such a situation. Another situation when two seemingly unrelated variables may display a high degree of correlation is the result of the influence of a third variable on both of them. Thus, the existence of a correlation between two variables does not necessarily imply a relationship between them. It only indicates that the data are not inconsistent with the possibility of such a relationship. The reasonableness of a possible relationship must be established on theoretical considerations first, and then we should proceed with the computation of correlation.
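The time-trend mechanism behind nonsense correlation is easy to reproduce. In this invented sketch, two series are generated independently and share nothing except a deterministic trend, yet their correlation comes out near +1; differencing away the trend makes the spurious association vanish.

```python
import math
import random

random.seed(7)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Two unrelated variables, each dominated by its own time trend.
t = range(100)
series_a = [2.0 * ti + random.gauss(0.0, 5.0) for ti in t]
series_b = [3.5 * ti + random.gauss(0.0, 5.0) for ti in t]

print(round(pearson_r(series_a, series_b), 3))  # very high, yet "nonsense"

# First differences strip the common trend; the correlation collapses.
diff_a = [b - a for a, b in zip(series_a, series_a[1:])]
diff_b = [b - a for a, b in zip(series_b, series_b[1:])]
print(round(pearson_r(diff_a, diff_b), 3))  # near zero
```

Differencing is one simple way of handling the non-stationarity that Time Series econometrics studies in depth.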
Check Your Progress 1
1) i) Define correlation between two variables.
ii) How do you measure linear correlation between two variables?
iii) Why is this measure called a measure of linear correlation?

2) Explain why two independent variables have zero correlation, although the converse is not necessarily true.

3) Does the presence of strong correlation between two variables necessarily imply the existence of a meaningful relationship between them?

8.3 REGRESSION
8.3.1 Concept
The term regression literally means a backward movement. Francis Galton first used the term in the late nineteenth century. He studied the relationship between the height of parents and that of children. Galton observed that although tall parents had tall children and, similarly, short parents had short children in a statistical sense, in general the children's height tended towards an average value. In other words, the children's height moved backward, or regressed, to the average. However, the term regression in statistics now has nothing to do with its earlier connotation of a backward movement.
Regression analysis can be described as the study of the dependence of one variable on another variable or more variables. In other words, we can use it for examining the relationship that may exist among certain variables. For example, we may be interested in issues like how the aggregate demand for money depends upon the aggregate income level in an economy. We may employ the regression technique to examine this. Here, the aggregate demand for money is called the dependent variable and the aggregate income level is called the independent variable. Consequently, we have a simple demand for money function. In this context, we present the following table to show some of the terms that are also used in the literature in place of dependent variable and independent variable.
Table 8.1: Classifying Terms for Variables in Regression Analysis

Dependent Variable      Independent Variable
Explained Variable      Explanatory Variable
Regressand              Regressor
Predictand              Predictor
Endogenous Variable     Exogenous Variable
Controlled Variable     Control Variable
Target Variable         Control Variable
Response Variable       Stimulus Variable

Source: Maddala (2002) and Gujarati (2003).
It is now important to clarify that the terms dependent and independent do not necessarily imply a causal connection between the two types of variables. Thus, regression analysis per se is not really concerned with causality analysis. A causal connection has to be established first by some theory that is outside the parlance of regression analysis. In our earlier example of the consumption function and the present example of the demand for money function, we have theories like the Keynesian income hypothesis and the transaction demand for money. On the basis of such theories, perhaps we can employ the regression technique to get some preliminary idea of some causal connection involving certain variables. In fact, causality study is now a highly specialized branch of econometrics and goes far beyond the scope of ordinary regression analysis.
A major purpose of regression analysis is to predict the value of one variable given the value of another variable or more variables. Thus, we may be interested in predicting the aggregate demand for money from a given value of aggregate income.
We should be clear that, by virtue of the very nature of economics and other branches of social science, the concern is a statistical relationship involving some variables rather than an exact mathematical relationship as we may obtain in natural science. Consequently, if we are able to establish some kind of a relationship between an independent variable X and a dependent variable Y, it can be expected to give us some sort of an average value of Y for a given value of X. This kind of a relationship is known as a statistical or stochastic relationship. The regression method is essentially concerned with the analysis of such kind of a stochastic relationship.

From the above discussion, it should be clear that in our context, the dependent variable is assumed to be stochastic or random. In contrast, the independent variables are taken to be non-stochastic or non-random. However, we must mention here that at an advanced level, even the independent variables are assumed to be stochastic. In the next unit, we shall discuss the stochastic nature of regression analysis in detail.
If a regression relationship has just one independent variable, it is called a two variable or simple regression. On the other hand, if we have more than one independent variable in it, then it is a multiple regression.
8.3.2 Correlation and Regression

Earlier we made a reference to the conceptual difference between correlation and regression. We may discuss it here. In regression analysis, we examine the nature of the relationship between the dependent and the independent variables. Here, as stated earlier, we try to estimate the average value of one variable from the given values of other variables. In correlation, on the other hand, our focus is on the measurement of the strength of such a relationship. Consequently, in regression, we classify the variables in two classes of dependent and independent variables. In correlation, the treatment of the variables is rather symmetric; we do not have such kind of a classification. Finally, in regression, at our level, we take the dependent variable as random or stochastic and the independent variables as non-random or fixed. In correlation, in contrast, all the variables are implicitly taken to be random in nature.
8.3.3 Simple Regression
Here, we are focusing on just one independent variable. The first thing that we have to do is to specify the relationship between X and Y. Let us assume that there is a linear relationship between the two variables like:

    Y = a + bX
The concept of linearity, however, requires some clarification. We are postponing that discussion to the next unit. Moreover, there can be various types of intrinsically non-linear relationships also. The treatment of such relationships is beyond our scope. Our purpose is to estimate the constants a and b from empirical observations on X and Y.
The Method of Least Squares
Usually, we have a sample of observations of a given size, say n. If we plot the n pairs of observations, we obtain a scatter-plot, as it is known in the literature. An example of a scatter-plot is presented below.

Fig. 8.1: Scatter-Plot
A visual inspection of the scatter-plot makes it clear that for different values of X, the corresponding values of Y are not aligned on a straight line. As we have mentioned earlier, in regression, we are concerned with an inexact or statistical relationship. And this is the consequence of such a relationship. Now, the constants a and b are respectively the intercept and slope of the straight line described by the above-mentioned linear equation, and several straight lines with different pairs of the values (a, b) can be passed through the above scatter. Our concern is the choice of a particular pair as the estimates of a and b for the regression equation under consideration. Obviously, this calls for an objective criterion.

Such a criterion is provided by the method of least squares. The philosophy behind the least squares method is that we should fit a straight line through the scatter-plot in such a manner that the vertical differences between the observed values of Y and the corresponding values obtained from the straight line for different values of X, called errors, are minimum. The line fitted in such a fashion is called the regression line. The values of a and b obtained from the regression line are taken to be the estimates of the intercept and slope (regression coefficient) of the regression equation. The values of Y obtained from the regression line are called the estimated values of Y. A stylized scatter-plot with a straight line fitted in it is presented below:

The method of least squares requires that we should choose our a and b in such a manner that the sum of the squares of the vertical differences between the actual or observed values of Y and the ones obtained from the straight line is minimum. Putting it mathematically, we minimize

    Σ(Y - Ŷ)²  with respect to a and b

where Ŷ = a + bX is called the estimated value of Y. The values of a and b so obtained are known as the least-square estimates of a and b and are normally denoted by â and b̂. This is a well-known minimization procedure of calculus and you must have done that in the course on Quantitative Methods (MEC-003). You also must have obtained the normal equations and solved them for obtaining â and b̂. We are leaving that as an exercise for this unit. The earlier shown scatter-plot with a regression line is presented below:

Fig. 8.3: Scatter-Plot with the Regression Line


This regression line, obviously, has a negative intercept. If we recapitulate, the two normal equations that we obtained from the above-mentioned procedure are given by

    ΣY = na + bΣX

and

    ΣXY = aΣX + bΣX²

After solving the two equations simultaneously, we obtain the least square estimates

    b̂ = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)²

and

    â = Ȳ - b̂X̄
In regression analysis, the slope coefficient assumes special significance. It measures the rate of change of the dependent variable with respect to the independent variable. As a result, it is this constant that indicates whether there exists a relationship between X and Y or not. The regression equation

    Y = a + bX

is in fact called the regression of Y on X; the slope b of this equation is termed as the regression coefficient of Y on X. It is also denoted by b_YX. A glance at the expression of the regression coefficient of Y on X makes it quite clear that the above expression can also be written as

    b_YX = r (σ_Y / σ_X)

Thus, putting the values of a and b, the regression equation of Y on X can be written as

    Y - Ȳ = b_YX (X - X̄)
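The least-square estimates derived above translate directly into code. This is a minimal sketch on invented data points, not taken from any source.

```python
def fit_y_on_x(x, y):
    """Closed-form least squares for Y = a + bX:
    b = sum((X - mean_X)(Y - mean_Y)) / sum((X - mean_X)^2), a = mean_Y - b*mean_X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b_hat = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    a_hat = my - b_hat * mx  # forces the fitted line through (mean_X, mean_Y)
    return a_hat, b_hat

# Hypothetical observations scattered around a rising line.
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

a_hat, b_hat = fit_y_on_x(x, y)
print(round(a_hat, 2), round(b_hat, 2))  # intercept ≈ 0.09, slope ≈ 1.99

# Estimated (fitted) values of Y from the regression line:
y_hat = [a_hat + b_hat * xi for xi in x]
```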

Reverse Regression

Suppose, in another regression relationship, X acts as the dependent variable and Y as the independent variable. Then that relationship is called the regression of X on Y. Here, we should definitely avoid the temptation of expressing X in terms of Y from the regression equation of Y on X to obtain that of X on Y, and trying to mechanically extract the least square estimates of its constants from the already known values of â and b̂. The regression of X on Y is in fact intrinsically different from that of Y on X. Geometrically speaking, in the regression of X on Y, we minimize the sum of the squares of the horizontal distances, as against the minimization of the sum of the squares of the vertical distances in Y on X, for obtaining the least square estimates. If our regression equation of X on Y is given by

    X = a' + b'Y

then its least square estimates are given by the criterion:

    Minimize Σ(X - X̂)²  with respect to a' and b'

where X̂ = a' + b'Y.

By applying the usual minimization procedure, we obtain the following two normal equations:

    ΣX = na' + b'ΣY

and

    ΣXY = a'ΣY + b'ΣY²

We can simultaneously solve these two equations to get the least square estimates

    b̂' = Σ(X - X̄)(Y - Ȳ) / Σ(Y - Ȳ)²

and

    â' = X̄ - b̂'Ȳ

The slope b' of the regression of X on Y is called the regression coefficient of X on Y. It measures the rate of change of X with respect to Y. In order to distinguish it clearly from the regression coefficient of Y on X, we also use the symbol b_XY for it.

Putting the values of a' and b', the regression equation of X on Y can be written as

    X - X̄ = b_XY (Y - Ȳ)
To highlight the important difference between the two kinds of regression, the regression of Y on X is sometimes termed as the direct regression and that of X on Y is called the reverse regression. Maddala (2002) gives an example of direct regression and reverse regression in connection with the issue of gender bias in the offer of emoluments. Let us assume that the variable X represents qualifications and the variable Y represents emoluments. We may be interested in finding whether males and females with the same qualifications receive the same emoluments or not. We may examine this by running the direct regression of Y on X. Alternatively, we may be curious about whether males and females with the same emoluments possess the same qualifications or not. We may investigate this by running the reverse regression of X on Y. Thus, it is perhaps valid to run both the regressions in order to have a clear insight into the question of gender bias in emoluments.
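The warning above can be made concrete. In this sketch (invented data), the slope of the reverse regression is computed in its own right and compared with the mechanical inversion of the direct slope; the two disagree whenever the points do not lie exactly on one line.

```python
def slope_y_on_x(x, y):
    """Regression coefficient b_YX = sum((X - mean_X)(Y - mean_Y)) / sum((X - mean_X)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

def slope_x_on_y(x, y):
    """Regression coefficient b_XY: the roles of the variables are swapped."""
    return slope_y_on_x(y, x)

# Hypothetical scatter that is positively related but far from a perfect line.
x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 8, 7]

b_yx = slope_y_on_x(x, y)   # minimizes vertical squared distances
b_xy = slope_x_on_y(x, y)   # minimizes horizontal squared distances

print(round(b_yx, 3), round(b_xy, 3), round(1 / b_yx, 3))
# b_xy differs from 1/b_yx because the two regressions are intrinsically different.
```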

Properties

Let us now briefly consider some of the properties of regression.

1) The product of the two regression coefficients is always equal to the square of the correlation coefficient:

    b_YX · b_XY = r²

2) The two regression coefficients have the same sign. In fact, the sign of the two coefficients depends upon the sign of the correlation coefficient. Since the standard deviations of both X and Y are, by definition, positive, if the correlation coefficient is positive, both the regression coefficients are positive; similarly, if the correlation coefficient happens to be negative, both the regression coefficients become negative.

3) The two regression lines always intersect each other at the point (X̄, Ȳ).

4) When r = ±1, there is an exact linear relationship between X and Y and, in that case, the two regression lines coincide with each other.

5) When r = 0, the two regression equations reduce to Y = Ȳ and X = X̄. In such a situation, neither Y nor X can be estimated from their respective regression equations.
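Properties (1) and (2) can be verified numerically. A small check on invented data (here the variables move in opposite directions, so every coefficient carries a negative sign):

```python
import math

# Hypothetical negatively related data.
x = [2, 4, 5, 7, 9, 11]
y = [10, 9, 8, 6, 5, 3]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)
b_yx = sxy / sxx          # slope of Y on X
b_xy = sxy / syy          # slope of X on Y

print(round(r, 3), round(b_yx, 3), round(b_xy, 3))

# Property (1): the product of the regression coefficients equals r².
assert abs(b_yx * b_xy - r ** 2) < 1e-12
# Property (2): both coefficients carry the sign of r.
assert r < 0 and b_yx < 0 and b_xy < 0
# Property (3) holds by construction: each intercept is defined as one mean
# minus the slope times the other mean, so each line passes through (mx, my).
```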
As mentioned earlier, the coefficient of determination is an important concept in the context of regression analysis. However, the concept will be more contextual if we discuss it in the next unit.
Example 8.2

From the following results, obtain the two regression equations and the estimate of the yield of crop when the rainfall is 22 cm, and the rainfall when the yield is 600 kg.

                        Yield in kg    Rainfall in cm
Mean                    508.4          26.7
Standard Deviation      36.8           4.6

Coefficient of correlation between yield and rainfall = 0.52.

Let Y be yield and X be rainfall. So, for estimating the yield, we have to run the regression of Y on X, and for the purpose of estimating the rainfall, we have to use the regression of X on Y.

We have X̄ = 26.7, Ȳ = 508.4, σ_X = 4.6, σ_Y = 36.8 and r = 0.52.

∴ the regression coefficients are b_YX = 0.52 × (36.8 / 4.6) = 4.16 and b_XY = 0.52 × (4.6 / 36.8) = 0.065.

Hence, the regression equation of Y on X is
    Y - 508.4 = 4.16(X - 26.7)
    or Y = 4.16X + 397.33

Similarly, the regression equation of X on Y is
    X - 26.7 = 0.065(Y - 508.4)
    or X = 0.065Y - 6.346

When X = 22, Y = 4.16 × 22 + 397.33 = 488.8 (approx.)
When Y = 600, X = 0.065 × 600 - 6.346 = 32.7 (approx.)

Hence, the estimated yield of crop is 488.8 kg and the estimated rainfall is 32.7 cm.
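The whole of Example 8.2 can be reproduced from the summary statistics alone, since the regression coefficients need only the two standard deviations and r (b_YX = r·σ_Y/σ_X and b_XY = r·σ_X/σ_Y), and each line passes through the point of means:

```python
# Figures from Example 8.2: Y is yield (kg), X is rainfall (cm).
mean_x, mean_y = 26.7, 508.4
sd_x, sd_y = 4.6, 36.8
r = 0.52

b_yx = r * sd_y / sd_x   # 4.16: regression coefficient of Y on X
b_xy = r * sd_x / sd_y   # 0.065: regression coefficient of X on Y

# Each regression line passes through (mean_x, mean_y):
yield_at_22cm = mean_y + b_yx * (22 - mean_x)
rain_at_600kg = mean_x + b_xy * (600 - mean_y)

print(round(yield_at_22cm, 1))  # 488.8 kg
print(round(rain_at_600kg, 1))  # 32.7 cm
```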
8.4 TYPES OF DATA
We conclude this unit by discussing the types of data that may be used for the purpose of economic analysis in general and regression analysis in particular. We can use three kinds of data for the empirical verification of any economic phenomenon. They are: time series, cross section, and pooled or panel data.
Time Series Data

A time series is a collection of the values of a variable that are observed at different points of time. Generally, the interval between two successive points of time remains fixed. In other words, we collect data at regular time intervals. Such data may be collected daily, weekly, monthly, quarterly or annually. We have, for example, a daily data series for gold price, weekly money supply figures, a monthly price index, a quarterly GDP series and annual budget data. Sometimes, we may have the same data in more than one time interval series; for example, both quarterly and annual GDP series may be available. The time interval is generally called the frequency of the time series. It should be clear that the above-mentioned list of time intervals is by no means an exhaustive one. There can be, for example, an hourly time series like that of a stock price sensitivity index. Similarly, we may have decennial population census figures. We should note that, conventionally, if the frequency is one year or more, it is called a low frequency time series. On the other hand, if the frequency is less than one year, it is termed as a high frequency time series. A major problem with time series is what is known as non-stationary data. The presence of non-stationarity is the main reason for the nonsense correlation that we talked about in connection with our discussion on correlation.
Cross Section Data
In cross section data, we have observations for a variable for different units at the
same point of time. For example, we have the state domestic product figures for
different states in India for a particular year. Similarly, we may collect various stock
price figures at the same point of time in a particular day. Cross section data are also
not free from problems. One main problem with this kind of data is that of the
heterogeneity that we shall refer to in the next unit.
Pooled Data
Here, we may have time series observations for various cross sectional units. For
example, we may have time series of domestic product of each state for India and
we may have a panel of such series. This is why such kind of a data set is called
panel data. Thus, in this kind of data, we combine the element of time series with
that of cross section data. One major advantage with such kind of data is that we
may have quite a large data set and the problem of degrees of freedomthat mainly
arises due to the non-availability of adequatedata can largely be overcome. Recently,
the treatment of panel data has received much attention inempirical economic analysis.
Check Your Progress 2
1) Explain how regression is not primarily concerned with causality analysis.
2) Bring out the difference between correlation and regression.

3) What is the distinction between Time Series Data and Cross Section Data?

4) Explain the concept of reverse regression.
I 8.5 LET US SUM UP
I
Regression models occupy a central place in empirical economic analysis. These models are essentially based on the concepts of correlation and regression. Correlation is a quantitative measure of the strength of the linear relationship that may exist among some variables. The existence of a high degree of correlation, however, is not necessarily the evidence of a meaningful relationship. It only suggests that the data are not inconsistent with the possibility of such a relationship. Regression, on the other hand, focuses on the direction of a linear relationship. Here, one is concerned with the dependence of one variable on other variables. Regression, in itself, does not suggest any causal relationship. Correlation and regression are both concerned with a statistical or stochastic relationship as against a mathematical or an exact relationship. In the conventional regression analysis, the dependent variable is treated to be stochastic or random, whereas the independent variables are taken to be non-stochastic in nature. The constants of a regression equation are estimated from the empirical observations by using the least square technique. In a two variable regression equation, there is one dependent variable and one independent variable. The slope coefficient of a regression equation is called the regression coefficient. It measures the rate of change of the dependent variable with respect to the independent variable. The distinction between the concept of direct regression and that of reverse regression is crucial in regression analysis. Sometimes, by running both kinds of regression, important insight can be gained in empirical economic analysis. In multiple regression, there are at least two independent variables.
Finally, in regression analysis, three types of data, namely, time series, cross section and pooled, can be used.
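The least square technique summed up above can be sketched in a few lines of Python. The following is an illustrative reconstruction of the standard closed-form solution for the two variable case, not code taken from this unit; the function name `least_squares` is chosen here for clarity.

```python
# Two-variable least squares: estimate a and b in Y = a + bX by minimising
# the sum of squared differences between the observed values of Y and the
# values estimated from the regression equation.
def least_squares(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form solution: b = S_xy / S_xx and a = mean(Y) - b * mean(X)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b

# Illustration with the output and profit-per-unit data of Exercise 4:
output = [5, 7, 9, 11, 13, 15]
profit = [1.70, 2.40, 2.80, 3.40, 3.70, 4.40]
a, b = least_squares(output, profit)
print(round(a, 2), round(b, 3))   # roughly 0.5 and 0.257
```

Note that `b` is the regression coefficient discussed above: the rate of change of the dependent variable with respect to the independent variable.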
8.6 EXERCISES
1) Prove that the correlation coefficient lies between -1 and +1.
2) Show that the correlation coefficient is unaffected by a shift in the origin and a change of scale.

3) For the regression equation of Y on X, derive the least square estimators of the parameters. Try and work out the same for the regression equation of X on Y.
4) From the following data, derive that regression equation which you consider to be economically more meaningful. Give justification for your choice.
   Output            5     7     9     11    13    15
   Profit per unit   1.70  2.40  2.80  3.40  3.70  4.40

5) To study the effect of rain on the yield of wheat, the following results were obtained:
                           Mean    Standard Deviation
   Yield in kg per acre     800           12
   Rainfall in inches        50            2
   Correlation coefficient is 0.80.
   Estimate the yield, when rainfall is 80 inches.

8.7 KEY WORDS
Coefficient of : It is equal to the square of the correlation coefficient.
Determination
Correlation : It is a quantitative measure of the strength of the
relationship that may exist among certain variables.
Cross Section Data : In cross section data, we have observations for a
variable for different units at the same point of time.
Econometrics : It is described as the application of statistical tools in
the quantitative analysis of economic phenomena.
Mathematical Model : The mathematical form of some economic theory is
what is generally called a mathematical model.
Method of Least : It is the method of estimating the parameters of a
Square regression equation in such a fashion that the sum of
the squares of the differences between the actual
values or observed values of the dependent variable
and their estimated values from the regression
equation is minimum.
Multiple Regression : It is a regression equation with more than one
independent variable.
Nonsense Correlation : The presence of correlation between two
variables when there does not exist any meaningful
relationship between them is known as nonsense
correlation.
Pooled Data : In pooled data, we have time series observations
for various cross sectional units. Here, we combine
the element of time series with that of cross section
data.
Regression Equation : It is the equation that specifies the relationship
between the dependent and the independent
variables for the purpose of estimating the constants
or the parameters of the equation with the help of
empirical data on the variables.
Regression : It is a statistical analysis of the nature of the
relationship between the dependent and the
independent variables.
Reverse Regression : It is an independent estimation of a new regression
equation when the independent variable of the original
equation is changed into the dependent variable and
the dependent variable of the original equation is
changed into the independent variable.
Time Series Data : It is a series of the values of a variable obtained at
different points of time.
Two Variable : It is a regression equation with one independent
Regression variable.

8.8 SOME USEFUL BOOKS


Gujarati, Damodar N. (2003); Basic Econometrics, Fourth Edition, Chapter 2,
Chapter 3, and Chapter 7; McGraw-Hill, New York.
Maddala, G.S. (2002); Introduction to Econometrics, Third Edition, Chapter 3
and Chapter 4, John Wiley & Sons Ltd., West Sussex.
Pindyck, Robert S. and Rubinfeld, Daniel L. (1991); Econometric Models and
Economic Forecasts, Third Edition, Chapter 1; McGraw-Hill, New York.
Karmel, P.H. and Polasek, M. (1986); Applied Statistics for Economists, Fourth
Edition, Chapter 8; Khosla Publishing House, Delhi.

8.9 ANSWERS OR HINTS TO CHECK YOUR PROGRESS
Check Your Progress 1
1) i) See section 8.2.1.
ii) See section 8.2.1.
No, because it is assumed that there exists a linear relationship between
the variables.
2) See section 8.2.2.
3) See section 8.2.3.
Check Your Progress 2
1) See section 8.3.1.
2) See section 8.3.2.
3) See section 8.4.

8.10 ANSWERS OR HINTS TO EXERCISES


1) Do Yourself.
2) Do Yourself.
3) Do Yourself.
4) Y = 0.257X + 0.50 (regression of profit per unit on output).
5) 944 kg per acre.
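The hint of 944 kg per acre for Exercise 5 follows from writing the regression line of yield (Y) on rainfall (X) in terms of the means, standard deviations and correlation coefficient. The short Python sketch below is an illustration of that arithmetic, not code from this unit; the function name `predict` is chosen here for convenience.

```python
# Regression line of Y on X expressed through summary statistics:
#   Y_hat = mean_y + r * (s_y / s_x) * (x_new - mean_x)
def predict(mean_x, mean_y, s_x, s_y, r, x_new):
    b = r * s_y / s_x                   # regression coefficient of Y on X
    return mean_y + b * (x_new - mean_x)

# Exercise 5: yield mean 800, SD 12; rainfall mean 50, SD 2; r = 0.80
print(predict(50, 800, 2, 12, 0.80, 80))   # 944.0 kg per acre
```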
