
ECONOMETRICS I

CHAPTER 1: THE NATURE OF
REGRESSION ANALYSIS

Textbook: Damodar N. Gujarati (2004), Basic Econometrics,
4th edition, The McGraw-Hill Companies
HISTORICAL ORIGIN OF THE TERM
REGRESSION
• The term regression was introduced by Francis
Galton.
• He found that, although there was a tendency for
tall parents to have tall children and for short
parents to have short children, the average height
of children born of parents of a given height
tended to move or “regress” toward the average
height in the population as a whole. This
tendency is called Galton’s law of universal
regression.
THE MODERN INTERPRETATION OF
REGRESSION
• Regression analysis is concerned with the study of the
dependence of one variable, the dependent variable,
on one or more other variables, the explanatory
variables, with a view to estimating and/or predicting
the (population) mean or average value of the former
in terms of the known or fixed (in repeated sampling)
values of the latter.
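As a minimal sketch of this definition, the code below estimates the mean of the dependent variable at each fixed value of the explanatory variable. The father and son heights are invented for illustration (they are not Galton's data), and the helper name conditional_means is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (father_height, son_height) pairs in inches, invented for illustration.
pairs = [
    (65, 64), (65, 66), (65, 68),
    (70, 68), (70, 69), (70, 73),
    (75, 71), (75, 73), (75, 75),
]

def conditional_means(pairs):
    """Average of the dependent variable for each fixed value of the explanatory variable."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

means = conditional_means(pairs)
for father, avg_son in means.items():
    print(f"father height {father} in: mean son height {avg_son:.1f} in")
```

With these invented numbers, sons of 65-inch fathers average 66 inches and sons of 75-inch fathers average 73 inches, so both group means lie closer to the middle than the fathers' heights do.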
Examples of Regression Analysis
1. Reconsider Galton’s law of universal
regression.

We want to find out how the average height
of sons changes, given the father’s height.
Look at the scatter diagram or scattergram
on the next slide.
Figure 1.1 Hypothetical distribution of sons’ heights
corresponding to given heights of fathers.
Examples of Regression Analysis
2. Consider the heights of boys measured at
fixed ages.

Notice that corresponding to any given age we
have a range of heights. Therefore, knowing
the age, we may be able to predict the
average height corresponding to that age.
Figure 1.2 Hypothetical distribution of heights
corresponding to selected ages.
Examples of Regression Analysis
5. A labor economist may want to study the rate
of change of money wages in relation to the
unemployment rate.

Figure 1.3
Examples of Regression Analysis
6. From monetary economics it is known that, other things remaining
the same, the higher the rate of inflation π, the lower the
proportion k of their income that people would want to hold in the
form of money, as depicted in Figure 1.4 (next slide).

A quantitative analysis of this relationship will enable the monetary
economist to predict the amount of money, as a proportion of their
income, that people would want to hold at various rates of
inflation.
Figure 1.4 Money holding in relation to
the inflation rate π
STATISTICAL AND DETERMINISTIC
RELATIONSHIPS
• In regression analysis we are concerned
with what is known as statistical dependence
among variables, not the functional or
deterministic dependence found, for example,
in classical physics.
• In statistical relationships among variables we
essentially deal with random or stochastic
variables. These variables have probability
distributions.
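The difference can be sketched as follows, assuming a hypothetical linear law and a normally distributed disturbance; the coefficients 2.0 and 0.5, the spread sigma, and both function names are invented for illustration.

```python
import random

def deterministic_y(x):
    """Deterministic relationship: the same x always yields exactly the same y."""
    return 2.0 + 0.5 * x  # hypothetical exact law

def statistical_y(x, rng, sigma=1.0):
    """Statistical relationship: y varies around the systematic part for a given x."""
    return deterministic_y(x) + rng.gauss(0.0, sigma)  # assumed normal disturbance

rng = random.Random(0)  # fixed seed so the sketch is reproducible
x = 10.0
exact = [deterministic_y(x) for _ in range(5)]
noisy = [statistical_y(x, rng) for _ in range(5)]

print("deterministic:", exact)
print("statistical:  ", [round(v, 2) for v in noisy])
```

For a fixed x the deterministic values are all identical, while the statistical values are draws from a probability distribution centered on the systematic part.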
REGRESSION VERSUS CAUSATION
• Although regression analysis deals with the
dependence of one variable on other
variables, it does not necessarily imply
causation.
• A statistical relationship per se cannot logically
imply causation.
REGRESSION VERSUS CORRELATION
• In correlation analysis we try to measure
the strength or degree of linear association
between two variables. The correlation
coefficient measures this strength of (linear)
association.
• In regression analysis we try to estimate the
average value of one variable on the basis of
the fixed values of other variables.
REGRESSION VERSUS CORRELATION
• In correlation analysis we treat any two
variables symmetrically. There is no distinction
between variables. Both variables are
considered random.

• Most of the regression theory is based on the
assumption that the dependent variable is
stochastic but the explanatory variables are
fixed or nonstochastic.
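The symmetry of correlation and the asymmetry of regression can be illustrated with a small sketch. The data are invented, and the least-squares slope formula used here is a standard estimator that these slides have not yet introduced, so treat it as an assumption.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Sample correlation coefficient; swapping x and y gives the same value."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(x, y):
    """Least-squares slope from regressing y (dependent) on x (explanatory)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical observations, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r_xy, r_yx = correlation(x, y), correlation(y, x)
slope_y_on_x = ols_slope(x, y)
slope_x_on_y = ols_slope(y, x)

print("correlation(x, y) and correlation(y, x):", r_xy, r_yx)
print("slope of y on x:", slope_y_on_x)
print("slope of x on y:", slope_x_on_y)
```

The correlation is the same in both directions, whereas the two regression slopes differ because each regression treats a different variable as the dependent one.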
TERMINOLOGY
Dependent variable     Explanatory variable
Explained variable     Independent variable
Predictand             Predictor
Regressand             Regressor
Response               Stimulus
Endogenous             Exogenous
Outcome                Covariate
Controlled variable    Control variable
TERMINOLOGY
• In a simple (two-variable) regression analysis
we study the dependence of a variable on
only a single explanatory variable, such as that
of consumption expenditure on real income.
• In a multiple regression analysis we study the
dependence of one variable on more than one
explanatory variable, such as that of money
demand on interest rates, income, and
inflation.
TERMINOLOGY
• The term random is a synonym for the term
stochastic. A random (stochastic) variable is a
variable that can take on any set of values,
positive or negative, with a given probability.
NOTATION
• Y: dependent variable
• X1, X2, … , Xk : explanatory variables
• Xk : kth explanatory variable
• Xki : ith observation on variable Xk (cross-sectional data)
• Xkt : tth observation on variable Xk (time series data)
• N (or T): the total number of observations or values in
the population (T for time series data).
• n (or t): the total number of observations in the
sample (t for time series data).
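Using this notation, a simple and a multiple regression could be written as in the sketch below. The coefficients \beta_0, ..., \beta_k and the disturbance term u_i are assumed symbols that the notation list does not define; for time series data the subscript i would be replaced by t.

```latex
% Simple (two-variable) regression: one explanatory variable
Y_i = \beta_0 + \beta_1 X_{1i} + u_i

% Multiple regression: k explanatory variables
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i
```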
TYPES OF DATA
• There are mainly three types of data for
empirical analysis:
1. Time series data
2. Cross-sectional data
3. Pooled data
Time series data
• A time series is a set of observations on the
values that a variable takes at different times.
Cross-sectional data
• Cross-sectional data are data on one or more
variables collected at the same point in time.
GPA    study hours/week
3.5    10
2.7    8
1.9    9
2.3    5
2.0    8
2.2    6
2.5    3
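As an illustration of how such cross-sectional data might be used, the sketch below fits a two-variable regression of GPA on weekly study hours using the seven observations in the table. The least-squares formulas are a standard estimator not yet covered in these slides, and the variable names are hypothetical.

```python
from statistics import mean

# The seven cross-sectional observations from the table above.
gpa = [3.5, 2.7, 1.9, 2.3, 2.0, 2.2, 2.5]
hours = [10, 8, 9, 5, 8, 6, 3]

# Least-squares estimates for: gpa = intercept + slope * hours + disturbance
mean_hours, mean_gpa = mean(hours), mean(gpa)
sxy = sum((h - mean_hours) * (g - mean_gpa) for h, g in zip(hours, gpa))
sxx = sum((h - mean_hours) ** 2 for h in hours)
slope = sxy / sxx
intercept = mean_gpa - slope * mean_hours

print(f"estimated slope: {slope:.4f} GPA points per extra study hour")
print(f"estimated intercept: {intercept:.4f}")
```

With only seven observations the estimates are purely illustrative, and, as noted under regression versus causation, the fitted slope does not by itself show that studying more causes a higher GPA.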
Pooled data
• In pooled data there are elements of both
time series and cross-sectional data.
time    GPA    study hours/week
2000    2.5    9
2000    2.7    8
2000    2.3    6
2005    1.9    5
2005    3.1    12
2010    2.4    7
2010    2.0    5
2010    3.9    11
2010    1.2    2
• Panel data are a special type of pooled data in
which the same cross-sectional unit is
surveyed over time.
person    time    GPA    study hours/week
1         2010    2.5    9
1         2011    2.7    7
1         2012    2.3    6
2         2010    1.9    8
2         2011    3.1    12
2         2012    2.4    6
3         2010    2.0    5
3         2011    3.9    11
3         2012    1.2    2
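The defining feature of panel data, the same units observed repeatedly, can be checked with a short sketch. It stores the table above as tuples, groups the rows by person, and verifies that every person is observed in every year; the variable names and the "balanced" check are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

# Rows of the panel table above: (person, year, GPA, study hours per week).
panel = [
    (1, 2010, 2.5, 9), (1, 2011, 2.7, 7), (1, 2012, 2.3, 6),
    (2, 2010, 1.9, 8), (2, 2011, 3.1, 12), (2, 2012, 2.4, 6),
    (3, 2010, 2.0, 5), (3, 2011, 3.9, 11), (3, 2012, 1.2, 2),
]

# Group observations by cross-sectional unit (person).
by_person = defaultdict(list)
for person, year, gpa, hours in panel:
    by_person[person].append((year, gpa, hours))

# All survey years appearing anywhere in the data.
years = sorted({year for _, year, _, _ in panel})

# The panel is balanced if each person is observed in every year.
is_balanced = all(
    sorted(year for year, _, _ in obs) == years for obs in by_person.values()
)

print("persons:", sorted(by_person))
print("years:", years)
print("balanced panel:", is_balanced)
```

The pooled table earlier has no person identifier, so this grouping is not possible there; that is what separates general pooled data from panel data.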
Sources of Data
• Government agencies (Department of
Commerce...)
• International agencies (World Bank...)
• Surveys

In the social sciences the data that one generally
obtains are nonexperimental in nature, that is, not
subject to the control of the researcher.
The quality of the data used in
economics is often not that good, for several reasons:
1. Possibility of observational errors.
2. Approximations and roundoffs.
3. Nonresponse to surveys may cause
selectivity bias.
4. The sampling method used in obtaining the
data may vary so widely that it might be very
difficult to compare them.
5. Economic data are generally available at a
highly aggregate level (GNP...). Such highly aggregated
data may not tell us much about the individual
or micro-level units.
6. Because of confidentiality, certain data can be
published only in highly aggregate form
(health data...).

The researcher should always keep in mind that
the results of research are only as good as
the quality of data.
