
CORRELATION ANALYSIS

Correlation is an analysis of the co-variation between two or more variables, i.e. it measures the linear relationship between two or more variables. Correlation analysis is the statistical tool generally used to describe the degree to which one variable is related to another.

For instance, when the demand for a certain commodity increases, its price goes up, and when its demand decreases, its price comes down. Similarly, the height of children goes up with age, their weight goes up with height, and the general level of prices goes up with the money supply. Such relationships can be noticed for several other phenomena as well. The theory by means of which quantitative connections between two sets of phenomena are determined is called the ‘Theory of Correlation’.

On the basis of correlation one can study the comparative changes occurring in two related phenomena, and their possible cause-and-effect relationship can be examined. It should, however, be borne in mind that relationships such as “a black cat causes bad luck”, “filled-up pitchers result in good fortune” and similar popular beliefs cannot be explained by the theory of correlation, since they are imaginary and cannot be justified mathematically. Thus, correlation is concerned with the relationship between two related and quantifiable variables. If two quantities vary in sympathy, so that a movement (an increase or decrease) in one tends to be accompanied by a movement in the same or opposite direction in the other, and the greater the change in one, the greater the change in the other, the quantities are said to be correlated. This type of relationship is known as correlation, or what is sometimes called co-variation in statistics.

When both variables are of such a character that they can be measured and the results expressed in quantitative units, and paired measurements of the two variables are available for a group of individuals, it is possible under certain conditions to calculate a constant known as the correlation coefficient, which expresses the degree of relationship.

TYPES OF CORRELATION:

1. Positive or negative correlation
2. Simple, partial and multiple correlation
3. Linear and non-linear correlation.

• POSITIVE & NEGATIVE CORRELATION:

Whether the correlation is positive or negative would depend upon the direction in
which the variables are moving.
• Positive correlation: When both variables vary in the same direction, the correlation is said to be positive. If one variable increases and the other also increases, it is called positive correlation. Similarly, if one variable decreases and the other also decreases, it is also positive correlation.

E.g.1
X 5 15 25 35 45
Y 10 20 30 40 50

Here X is increasing and Y is also increasing, hence positive correlation.

E.g.2
X 90 80 60 40 20
Y 75 70 50 30 10

Here X variable is decreasing and Y variable is also decreasing, hence positive correlation.

• Negative correlation: If one variable increases (or decreases) while the other decreases (or increases), i.e. the two move in opposite directions, it is called negative correlation.

E.g.1
X 25 30 44 65 80
Y 45 33 20 15 10

Here X variable is increasing and Y variable is decreasing, hence negative correlation.


E.g.2
X 100 90 80 70 60
Y 10 20 30 40 50

Here X variable is decreasing and Y variable is increasing, hence negative correlation.
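
Reading the direction off the tables works for tidy data like these. A slightly more systematic check, sketched below, looks at the sign of the covariance (the average product of deviations from the means): positive when the variables move together, negative when they move in opposite directions. The helper names are illustrative, and using the covariance sign as the test of direction is an assumption of this sketch rather than a method given in these notes.

```python
# Judge the direction of co-movement from the sign of the covariance.
# The data are the four example tables above; helper names are illustrative.


def covariance(xs, ys):
    """Population covariance: the mean of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def direction(xs, ys):
    """Return 'positive', 'negative' or 'none' according to the covariance sign."""
    c = covariance(xs, ys)
    if c > 0:
        return "positive"
    if c < 0:
        return "negative"
    return "none"


examples = {
    "Positive E.g.1": ([5, 15, 25, 35, 45], [10, 20, 30, 40, 50]),
    "Positive E.g.2": ([90, 80, 60, 40, 20], [75, 70, 50, 30, 10]),
    "Negative E.g.1": ([25, 30, 44, 65, 80], [45, 33, 20, 15, 10]),
    "Negative E.g.2": ([100, 90, 80, 70, 60], [10, 20, 30, 40, 50]),
}

for name, (xs, ys) in examples.items():
    print(f"{name}: {direction(xs, ys)}")
```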

• SIMPLE, PARTIAL & MULTIPLE CORRELATION:

The study of correlation between two variables (one independent and the other dependent) involves simple correlation. When more than two variables are involved, the study is one of either multiple correlation or partial correlation. Multiple correlation studies the relationship between a dependent variable and two or more independent variables taken together. In partial correlation we measure the correlation between a dependent variable and one particular independent variable, assuming that all other independent variables remain constant.

E.g. X, Y – simple correlation
X, Y, Z or more variables – partial or multiple correlation.
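
For three variables, partial and multiple correlation can be worked out from the three simple correlations. The sketch below uses the standard three-variable formulas $r_{12.3} = (r_{12} - r_{13}r_{23}) / \sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$ and $R_{1.23} = \sqrt{(r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}) / (1 - r_{23}^2)}$, which are not stated in these notes; the correlation values are hypothetical.

```python
import math

# Hypothetical pairwise correlations (not from the text):
# variable 1 = X (dependent), variables 2 and 3 = Y and Z (independent).
r12, r13, r23 = 0.8, 0.6, 0.5


def partial_r(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant (r_12.3)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_R(r12, r13, r23):
    """Multiple correlation of variable 1 on variables 2 and 3 jointly (R_1.23)."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))


print("partial r_12.3  =", round(partial_r(r12, r13, r23), 4))
print("multiple R_1.23 =", round(multiple_R(r12, r13, r23), 4))
```

Note that the multiple correlation is never smaller than either simple correlation of the dependent variable with one predictor.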

• LINEAR & NON-LINEAR CORRELATION:

Non-linear correlation is also called curvilinear correlation. The distinction is based upon the constancy of the ratio of change between the variables.

• Linear correlation: When the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. In such a case, if the values of the variables are plotted on graph paper, a straight line is obtained; this is why the correlation is called linear.
• Non-linear (curvilinear) correlation: When the amount of change in one variable does not bear a constant ratio to the amount of change in the other, i.e. the ratio varies instead of staying constant, the correlation is said to be non-linear or curvilinear. In such a situation a curve is obtained if the values of the variables are plotted on graph paper.
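
The constant-ratio criterion can be checked directly by computing the ratio of change Δy/Δx between successive observations. In this sketch both series are hypothetical (y = 2x + 3 and y = x²), and `change_ratios` and `is_linear` are illustrative helper names.

```python
# Compare the ratio of change dy/dx between successive observations.
# Both data series are hypothetical illustrations.


def change_ratios(xs, ys):
    """Ratio (y2 - y1) / (x2 - x1) for each consecutive pair of points."""
    return [
        (y2 - y1) / (x2 - x1)
        for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
    ]


def is_linear(xs, ys, tol=1e-9):
    """True when every ratio of change equals the first one (within tol)."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


xs = [1, 2, 3, 4, 5]
straight = [2 * x + 3 for x in xs]  # y = 2x + 3
curved = [x ** 2 for x in xs]       # y = x^2

print(change_ratios(xs, straight))  # [2.0, 2.0, 2.0, 2.0]
print(change_ratios(xs, curved))    # [3.0, 5.0, 7.0, 9.0]
print(is_linear(xs, straight), is_linear(xs, curved))  # True False
```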

• The different methods of studying correlation are as follows:

1. Scatter diagram method
2. Graphic method
3. Karl Pearson’s co-efficient of correlation
4. Concurrent deviation method
5. Method of least squares

• Scatter diagram method:

• The two variables are plotted against each other to prepare a dot chart called a scatter diagram.
• The given data are plotted on graph paper as dots, i.e. one dot for each pair of X and Y values, so there are as many points as there are observations.
• If the points are widely scattered, the relationship between the two variables is poor.
• If the points are closely clustered, the relationship between the two variables is good.
[Figure: scatter diagrams showing perfect positive correlation (r = +1), perfect negative correlation (r = -1), high and low degrees of positive and negative correlation, and no correlation (r = 0).]

• Graphic method:
• In this method the individual values of the two variables are plotted on graph paper.
• This gives two curves, one for variable X and one for variable Y. By examining the direction and closeness of the two curves, one can conclude whether or not the variables are correlated.
• If both curves move in the same direction, the correlation is said to be positive. If they move in opposite directions, the correlation is said to be negative.

• Karl Pearson’s co-efficient of correlation:

• This method, popularly known as Karl Pearson’s co-efficient of correlation, is the most widely used in practice.
• The coefficient is denoted by the symbol ‘r’, and its value lies between -1 and +1.

• Direct method

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} $$

where N is the number of observations.
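
The direct formula translates almost term for term into code. This minimal sketch (the function name `pearson_r_direct` is illustrative) applies it to the data of Negative correlation E.g.1 above.

```python
import math


def pearson_r_direct(xs, ys):
    """Karl Pearson's r by the direct (raw-sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Data from "Negative correlation, E.g.1" earlier in these notes.
X = [25, 30, 44, 65, 80]
Y = [45, 33, 20, 15, 10]
print(round(pearson_r_direct(X, Y), 3))  # -0.926
```

The strongly negative value agrees with the visual reading of that table.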


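• Short cut method

$$ r = \frac{\frac{1}{N}\sum uv - \bar{u}\,\bar{v}}{\sqrt{\frac{1}{N}\sum u^2 - \bar{u}^2}\,\sqrt{\frac{1}{N}\sum v^2 - \bar{v}^2}} $$

where $c_1 = \sum x / N$ and $c_2 = \sum y / N$, $u = x - c_1$ and $v = y - c_2$, and $\bar{u} = \sum u / N$, $\bar{v} = \sum v / N$ are the means of the deviations. (When $c_1$ and $c_2$ are the actual means, $\bar{u} = \bar{v} = 0$.)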
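
The sketch below computes r from the deviations u and v. Besides using the actual means for c1 and c2, it also tries an arbitrary pair of "assumed means" (50 and 20, chosen only for illustration) to show that r does not depend on the origin chosen; allowing arbitrary constants here is an assumption beyond these notes, which define c1 and c2 as the means.

```python
import math


def pearson_r_shortcut(xs, ys, c1, c2):
    """Pearson's r from deviations u = x - c1, v = y - c2 (any constants c1, c2)."""
    n = len(xs)
    us = [x - c1 for x in xs]
    vs = [y - c2 for y in ys]
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    cov_uv = sum(u * v for u, v in zip(us, vs)) / n - u_bar * v_bar
    sd_u = math.sqrt(sum(u * u for u in us) / n - u_bar ** 2)
    sd_v = math.sqrt(sum(v * v for v in vs) / n - v_bar ** 2)
    return cov_uv / (sd_u * sd_v)


X = [25, 30, 44, 65, 80]
Y = [45, 33, 20, 15, 10]

# c1, c2 = the actual means, as in the notes
r_means = pearson_r_shortcut(X, Y, sum(X) / len(X), sum(Y) / len(Y))
# c1, c2 = arbitrary "assumed means" (hypothetical choice)
r_assumed = pearson_r_shortcut(X, Y, 50, 20)
print(round(r_means, 3), round(r_assumed, 3))
```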

• Rank correlation co-efficient or Spearman’s rank correlation coefficient:

This method was introduced by Charles Edward Spearman in 1904. The rank correlation coefficient is based on the ranks of the items rather than their actual values, and it is denoted by R or ρ. It is applied to problems in which the data cannot be measured quantitatively but a qualitative assessment is possible, such as beauty, honesty, etc. In this case the best individual is given rank 1, the next rank 2, and so on. The coefficient of rank correlation is given by the formula

$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

where $D^2$ is the square of the difference between corresponding ranks, and N is the number of pairs of observations.
The rank correlation co-efficient lies between -1 and +1.
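
A minimal sketch of the rank correlation formula, assuming the ranks have already been assigned and there are no ties. The two judges' rankings are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho = 1 - 6*sum(D^2) / (N(N^2 - 1)); assumes no tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five contestants (1 = best).
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))
```

Here ΣD² = 4 and N = 5, so ρ = 1 - 24/120 = 0.8.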

• Equal ranks:
In some cases it is necessary to rank two or more individuals or entries as equal. In such cases it is customary to give each of them the average of the ranks they would otherwise occupy.
If two entries tie at the 4th position, each of them is given the rank (4+5)/2, which is 4.5.
If three are ranked equal at the 4th place, each is given the rank (4+5+6)/3, that is 5.

When equal ranks are assigned to some entries,

$$ \rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_k^3 - m_k)\right]}{N(N^2 - 1)} $$

where $m_1, m_2, \ldots, m_k$ are the numbers of items sharing each common (tied) rank.
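
The sketch below assigns average ranks to tied items (rank 1 = largest value, matching "the best individual is given rank 1") and adds the (m³ - m)/12 correction for every tied group in either series. The marks are hypothetical and the helper names are illustrative.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of the positions they fill."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1       # first position occupied by v
        m = ordered.count(v)               # number of items tied at v
        ranks.append(first + (m - 1) / 2)  # mean of first, first+1, ..., first+m-1
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied items."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho_ties(xs, ys):
    """Spearman's rho with average ranks and the tie correction in the numerator."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Hypothetical marks awarded in two tests to five students.
test_1 = [80, 70, 70, 60, 50]  # two students tie at 2nd place -> 2.5 each
test_2 = [75, 75, 65, 55, 55]  # ties at 1st and 4th place
print(average_ranks(test_1), average_ranks(test_2))
print(round(spearman_rho_ties(test_1, test_2), 3))
```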

• LEAST SQUARE METHOD:

A technique for fitting a straight line through a set of points in such a way that the sum of the squared vertical distances of the n points from the line is minimized.

• Co-efficient of Correlation:
The extent or degree of relationship between the two variables is measured by a parameter called the co-efficient of correlation.

PROPERTIES OF CORRELATION CO-EFFICIENT


1. The correlation co-efficient lies between -1 and +1.

2. Interpretation of particular values:
a. If r = +1, there is perfect positive correlation.
b. If r = 0, there is no (linear) correlation.
c. If r = -1, there is perfect negative correlation.

3. The co-efficient of correlation is independent of change of origin and scale (if exactly one variable is multiplied by a negative constant, only the sign of r changes).

4. The correlation co-efficient is the geometric mean of the two regression co-efficients.

5. $r = \pm\sqrt{b_{xy}\, b_{yx}}$
a. where $b_{xy}$ and $b_{yx}$ are the regression co-efficients, and r takes their common sign.

6. The degree of relationship between two variables is symmetric:
a. $r_{xy} = r_{yx}$

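
Properties 3, 5 and 6 can be checked numerically. This sketch reuses the Negative correlation E.g.1 data; the change of origin and scale (subtracting 40 and 20, dividing by 5 and 2) is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def r_and_regression_coefficients(xs, ys):
    """Return (r, b_xy, b_yx) computed from deviations about the means."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r, sxy / syy, sxy / sxx


X = [25, 30, 44, 65, 80]
Y = [45, 33, 20, 15, 10]
r, b_xy, b_yx = r_and_regression_coefficients(X, Y)

# Property 5: r is the geometric mean of b_xy and b_yx, carrying their common sign.
print(math.isclose(r, math.copysign(math.sqrt(b_xy * b_yx), b_xy)))

# Property 3: a change of origin and (positive) scale leaves r unchanged.
U = [(x - 40) / 5 for x in X]
V = [(y - 20) / 2 for y in Y]
print(math.isclose(r, r_and_regression_coefficients(U, V)[0]))

# Property 6: symmetry, r_xy = r_yx.
print(math.isclose(r, r_and_regression_coefficients(Y, X)[0]))
```
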
REGRESSION ANALYSIS

The term ‘regression’ was first used in 1877 by Sir Francis Galton, who made a study showing that the height of children born to tall parents tends to move back, or ‘regress’, towards the mean height of the population. He designated the word regression as the name of the process of predicting one variable from another.

Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

Regression analysis is a statistical device with the help of which we are in a position
to estimate or predict the unknown values of one variable from known values of
another variable. The variable which we are trying to predict is the “dependent
variable” and the variable which is used to predict the variable of interest is the
“independent variable”.

• REGRESSION LINE:

If bivariate data are plotted as points on graph paper, the points will be found to follow a certain pattern showing the relationship between the variables. When the points show a linear trend, the best-fitting straight line is determined by the method of least squares. Such straight lines, used to obtain the best estimates of one variable for given values of the other, are called regression lines.

There are two types of regression lines:

1. Regression line of x on y
2. Regression line of y on x

The line of regression is the line of best fit.

• REGRESSION EQUATION:

The relationship between the two variables x and y can be expressed in the form of a mathematical equation. This equation is known as the regression equation.

There are two types of regression equation:

1. Regression equation of X on Y
The regression equation of X on Y gives the estimated value of X for given values of Y.

X = a + bY
where X = dependent variable
Y = independent variable
a = X-intercept
b = slope of the line
Here ‘a’ and ‘b’ are constants.

By the method of least squares, the normal equations for X = a + bY are:

$$ \sum X = Na + b\sum Y \qquad (1) $$
$$ \sum XY = a\sum Y + b\sum Y^2 \qquad (2) $$

2. Regression equation of Y on X
The regression equation of Y on X gives the estimated value of Y for given values of X.

Y = a + bX
where Y = dependent variable
X = independent variable
a = Y-intercept
b = slope of the line
Here ‘a’ and ‘b’ are constants.

By the method of least squares, the normal equations for Y = a + bX are:

$$ \sum Y = Na + b\sum X \qquad (1) $$
$$ \sum XY = a\sum X + b\sum X^2 \qquad (2) $$
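
Solving the two normal equations simultaneously gives a and b. The sketch below does this with Cramer's rule for the Y-on-X line; swapping the roles of the two lists gives the X-on-Y line, since its normal equations have the same form with X and Y interchanged. The data and the function name are hypothetical.

```python
def fit_by_normal_equations(xs, ys):
    """Solve  sum(Y) = N*a + b*sum(X)  and  sum(XY) = a*sum(X) + b*sum(X^2)
    for the line Y = a + bX, using Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    det = n * sx2 - sx ** 2           # determinant of the coefficient matrix
    a = (sy * sx2 - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Hypothetical data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_by_normal_equations(X, Y)  # Y on X
a_xy, b_xy = fit_by_normal_equations(Y, X)  # X on Y: swap the roles
print(f"Y = {a_yx:.2f} + {b_yx:.2f}X")
print(f"X = {a_xy:.2f} + {b_xy:.2f}Y")
```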

REGRESSION COEFFICIENT:

The regression coefficient of X on Y is denoted by the symbol $b_{xy}$:

$$ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} $$

where r is the correlation coefficient, $\sigma_x$ is the standard deviation of X and $\sigma_y$ is the standard deviation of Y.

It measures the change in X corresponding to a unit change in Y.

Using the regression coefficient $b_{xy}$, the regression equation of X on Y is

$$ X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}), \quad\text{i.e.}\quad X - \bar{X} = b_{xy}(Y - \bar{Y}) $$

where $\bar{X}$ and $\bar{Y}$ are the means of X and Y.

Similarly, the regression coefficient of Y on X is denoted by the symbol $b_{yx}$:

$$ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} $$

with r, $\sigma_x$ and $\sigma_y$ as above.

It measures the change in Y corresponding to a unit change in X.

Using the regression coefficient $b_{yx}$, the regression equation of Y on X is

$$ Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{X}), \quad\text{i.e.}\quad Y - \bar{Y} = b_{yx}(X - \bar{X}) $$

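To find $b_{xy}$ and $b_{yx}$ there are two methods, viz. the direct method and the short cut method.

Direct method:

$$ b_{xy} = \frac{N\sum XY - \sum X\sum Y}{N\sum Y^2 - (\sum Y)^2} \qquad b_{yx} = \frac{N\sum XY - \sum X\sum Y}{N\sum X^2 - (\sum X)^2} $$

Short cut method:

$$ b_{xy} = \frac{\sum xy}{\sum y^2} \qquad b_{yx} = \frac{\sum xy}{\sum x^2} $$

Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviations from the means.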
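
The direct and short-cut formulas should agree. This sketch computes both on hypothetical data and then uses $b_{yx}$ in the regression equation $Y - \bar{Y} = b_{yx}(X - \bar{X})$ to estimate Y for X = 6; the function names are illustrative.

```python
def coefficients_direct(xs, ys):
    """b_xy and b_yx from raw sums (direct method)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    b_xy = (n * sxy - sx * sy) / (n * sy2 - sy ** 2)
    b_yx = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return b_xy, b_yx


def coefficients_shortcut(xs, ys):
    """b_xy and b_yx from deviations x = X - mean(X), y = Y - mean(Y)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sum(b * b for b in dy), sxy / sum(a * a for a in dx)


X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]

b_xy, b_yx = coefficients_direct(X, Y)
print("direct:  ", (b_xy, b_yx))
print("shortcut:", coefficients_shortcut(X, Y))

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)


def estimate_y(x):
    """Regression of Y on X through the means: Y - Ybar = b_yx (X - Xbar)."""
    return y_bar + b_yx * (x - x_bar)


print("estimated Y at X = 6:", estimate_y(6))
```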
PROPERTIES OF REGRESSION COEFFICIENT:
1. Both regression coefficients have the same sign, i.e. either both negative or both positive.

2. It is never possible for one of the regression coefficients to be negative and the other positive.

3. Both regression coefficients cannot be numerically greater than one at the same time.

4. If one of the regression coefficients is numerically greater than one, the other must be numerically less than one, i.e. if $|b_{xy}| > 1$, then $|b_{yx}| < 1$ (because $b_{xy}\, b_{yx} = r^2 \le 1$).

5. $r = \pm\sqrt{b_{xy}\, b_{yx}}$

6. ‘r’ has the same sign as the regression coefficients. If the regression coefficients are negative, r is also negative, and vice versa.

7. If $b_{xy}$ and $b_{yx}$ are positive, then r is positive.

8. Regression coefficients are independent of change of origin but not of scale.
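
Property 8 can be seen numerically: shifting X and Y leaves $b_{yx}$ unchanged, while rescaling X by a factor h and Y by a factor k multiplies it by k/h. The data, the shift and the scale constants below are hypothetical.

```python
def slope_y_on_x(xs, ys):
    """b_yx = sum(xy) / sum(x^2) using deviations from the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]
b = slope_y_on_x(X, Y)

# Change of origin: shift X by +100 and Y by -50, so the slope is unchanged.
b_shifted = slope_y_on_x([x + 100 for x in X], [y - 50 for y in Y])

# Change of scale: multiply X by 10 and Y by 2, so the slope is multiplied by 2/10.
b_scaled = slope_y_on_x([10 * x for x in X], [2 * y for y in Y])

print(b, b_shifted, b_scaled)
```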


DR. D.Y.PATIL INSTITUTE OF MANAGEMENT STUDIES.

ASSIGNMENT 1

Subject: Business statistics

Topic: Correlation & Regression (theory & practical)

Submitted to:
Prof. Ramdev

Submitted By:
Rakesh s gharat
Class – 3A
Roll No: 17
Div-1E
Sector: retail mgt

MBA I (SEM 1)
