
BIVARIATE ANALYSIS


The major differentiating point between univariate and bivariate analysis, in addition to looking at more than one variable, is that the purpose goes beyond simple description: it is the analysis of the relationship between the two variables.

Univariate Data

- Involving a single variable
- Does not deal with causes or relationships
- The major purpose of univariate analysis is to DESCRIBE
- Central tendency: mean, mode, median
- Dispersion: range, variance, max, min, quartiles, standard deviation
- Frequency distributions
- Bar graph, histogram, pie chart, line graph, box-and-whisker plot

Bivariate Data

- Involving two variables
- Deals with causes or relationships
- The major purpose of bivariate analysis is to EXPLAIN
- Analysis of two variables simultaneously
- Correlations, comparisons, relationships, causes, explanations
- Independent and dependent variables

Bivariate analysis shows the RELATIONSHIP between two continuous variables.

Joint Probability Distribution

Relationship simply refers to the extent to which it becomes easier to know/predict a value for the Dependent variable if we know a case's value on the Independent variable.

1. SCATTER DIAGRAMS
2. COVARIANCE
3. CORRELATION
4. REGRESSION

SCATTER DIAGRAMS


Scatter diagrams are useful for variables that are closely related and have a relatively high covariance.

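UNIVARIATE

SUM OF SQUARES

$SS = \sum (X - \bar{X})^2$

$SS = \sum X^2 - (\sum X)^2 / n$

$\text{Variance} = SS / (n - 1)$

BIVARIATE

SUM OF PRODUCTS

$SP = \sum (X - \bar{X})(Y - \bar{Y})$

$SP = \sum XY - (\sum X)(\sum Y) / n$

$\text{Covariance} = SP / (n - 1)$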
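The definitional and computational forms of the sum of squares and the sum of products should agree. The following is a minimal Python sketch of that check, assuming the Cr and Ni values (ppm) from the Kansas shale example used in this presentation; the function names are illustrative and not part of the original slides.

```python
def sum_of_squares(xs):
    """Return (definitional, computational) corrected sum of squares."""
    n = len(xs)
    mean = sum(xs) / n
    definitional = sum((x - mean) ** 2 for x in xs)
    computational = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return definitional, computational


def sum_of_products(xs, ys):
    """Return (definitional, computational) corrected sum of products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    definitional = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    computational = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return definitional, computational


# Cr and Ni (ppm) from the shale example in this presentation
cr = [205, 255, 195, 220, 235]
ni = [130, 165, 100, 135, 145]
n = len(cr)

ss_def, ss_comp = sum_of_squares(cr)
sp_def, sp_comp = sum_of_products(cr, ni)

variance_cr = ss_comp / (n - 1)        # Variance = SS / (n - 1)
covariance_cr_ni = sp_comp / (n - 1)   # Covariance = SP / (n - 1)

print("SS (Cr):", ss_def, ss_comp)
print("SP (Cr, Ni):", sp_def, sp_comp)
print("Variance (Cr):", variance_cr)
print("Covariance (Cr, Ni):", covariance_cr_ni)
```

The computational form is convenient for hand calculation from column totals; the definitional form is usually the safer choice in floating-point code because it avoids subtracting two large, nearly equal numbers.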

Covariance
Covariance is the joint variation of two variables about their respective means. The covariance is sometimes called a measure of "linear dependence" between the two random variables. When the covariance is normalized by the product of the two standard deviations, one obtains the Pearson correlation coefficient, whose square gives the goodness of fit of the best possible linear function describing the relation between the variables.

In this sense covariance is a linear gauge of dependence.

Covariance
Cr, Ni and V (ppm) in an Upper Pennsylvanian Shale from Kansas

| | Cr (X) | Ni (Y) | V | XY (Cr × Ni) |
|---|---|---|---|---|
| | 205 | 130 | 180 | 26650 |
| | 255 | 165 | 215 | 42075 |
| | 195 | 100 | 135 | 19500 |
| | 220 | 135 | 200 | 29700 |
| | 235 | 145 | 205 | 34075 |
| Sum | 1110 | 675 | 935 | 152000 |
| Mean | 222 | 135 | 187 | |
| S² | 570 | 562.5 | | |
| SD | 23.87 | 23.72 | | |

Uncorrected Sum of Products = $\sum XY$

Corrected Sum of Products = $\sum (X - \bar{X})(Y - \bar{Y})$

Sum of Products (SP): $\sum XY - (\sum X)(\sum Y)/n = 152000 - (1110)(675)/5 = 152000 - 149850 = 2150$

Covariance = SP/(n-1) = 2150/4 = 537.5

Sum of Squares (SS): $\sum X^2 - (\sum X)^2/n$

Variance = SS/(n-1)

Covariance provides a measure of the strength of the linear association between two random variables. For more than two variables, the pairwise covariances are collected in a variance-covariance matrix.

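Computation of Covariance between Cr, Ni and V

| | Cr (X) | X² | Ni (Y) | Y² | Cr × Ni (XY) | V (Z) | Z² | Ni × V (YZ) | Cr × V (XZ) |
|---|---|---|---|---|---|---|---|---|---|
| | 205 | 42025 | 130 | 16900 | 26650 | 180 | 32400 | 23400 | 36900 |
| | 255 | 65025 | 165 | 27225 | 42075 | 215 | 46225 | 35475 | 54825 |
| | 195 | 38025 | 100 | 10000 | 19500 | 135 | 18225 | 13500 | 26325 |
| | 220 | 48400 | 135 | 18225 | 29700 | 200 | 40000 | 27000 | 44000 |
| | 235 | 55225 | 145 | 21025 | 34075 | 205 | 42025 | 29725 | 48175 |
| Sum | 1110 | 248700 | 675 | 93375 | 152000 | 935 | 178875 | 129100 | 210225 |

Sum of squares:

$SS_{Cr} = 248700 - (1110)^2/5 = 248700 - 246420 = 2280$

$SS_{Ni} = 93375 - (675)^2/5 = 93375 - 91125 = 2250$

Corrected sum of products:

$SP_{CrNi} = 152000 - (1110)(675)/5 = 152000 - 149850 = 2150$

VARIANCE-COVARIANCE matrix (variances on the diagonal, covariances off the diagonal):

| | Cr | Ni | V |
|---|---|---|---|
| Cr | 570 | 537.5 | 663.75 |
| Ni | 537.5 | 562.5 | 718.75 |
| V | 663.75 | 718.75 | 1007.5 |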
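The full variance-covariance matrix for Cr, Ni and V can be rebuilt from the raw values with the same SP/(n-1) rule. This is a small Python sketch using only the standard library; the dictionary layout and the helper name `covariance` are assumptions made for illustration.

```python
from itertools import product

# Cr, Ni and V (ppm) from the shale example
data = {
    "Cr": [205, 255, 195, 220, 235],
    "Ni": [130, 165, 100, 135, 145],
    "V": [180, 215, 135, 200, 205],
}


def covariance(xs, ys):
    """Sample covariance: corrected sum of products divided by n - 1."""
    n = len(xs)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sp / (n - 1)


names = list(data)

# covariance(x, x) is the variance, so the diagonal holds the variances
cov_matrix = {
    (a, b): covariance(data[a], data[b]) for a, b in product(names, repeat=2)
}

for a in names:
    print(a, [cov_matrix[(a, b)] for b in names])
```

The matrix is symmetric, since the covariance of Cr with Ni equals the covariance of Ni with Cr.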

Interpretation of covariance values must proceed in the same manner as an interpretation of variances. Individual values are not too meaningful because they are dependent upon the units of measurement.

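CORRELATION

$r_{jk} = \mathrm{COV}_{jk} / (s_j s_k)$

In practice, the sample correlation coefficient r is commonly calculated by the equation

$r_{jk} = \dfrac{SP_{jk}}{\sqrt{SS_j \, SS_k}} = \dfrac{\sum X_j X_k - (\sum X_j)(\sum X_k)/n}{\sqrt{\left[\sum X_j^2 - (\sum X_j)^2/n\right]\left[\sum X_k^2 - (\sum X_k)^2/n\right]}}$

For Cr and Ni:

$r_{CrNi} = SP_{CrNi} / \sqrt{SS_{Cr} \, SS_{Ni}} = 2150 / \sqrt{(2280)(2250)} = 2150 / 2264.95 = 0.949248$

$r_{CrNi} = \mathrm{COV}_{CrNi} / (s_{Cr} \, s_{Ni}) = 537.5 / (23.874 \times 23.717) = 537.5 / 566.237 = 0.949248$

Correlation matrix:

| | Cr | Ni |
|---|---|---|
| Cr | 1 | |
| Ni | 0.949248 | 1 |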

In order to estimate the degree of interrelation between variables in a manner not influenced by the measurement units, the correlation coefficient r is used. Correlation is the ratio of the covariance of two variables to the product of their standard deviations.

Correlation can take a value between -1 and 1:

1 is a perfect positive correlation
0 is no correlation (the values don't seem linked at all)
-1 is a perfect negative correlation
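The claim that r is not influenced by measurement units, while covariance is, can be checked numerically. The sketch below assumes the Cr and Ni values from the shale example and converts Cr from ppm to weight percent (1 ppm = 0.0001 %) purely as an illustration; the helper names are hypothetical.

```python
import math


def corrected_sp(xs, ys):
    """Corrected sum of products, using deviations from the means."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def correlation(xs, ys):
    """Pearson r = SP / sqrt(SS_x * SS_y)."""
    return corrected_sp(xs, ys) / math.sqrt(
        corrected_sp(xs, xs) * corrected_sp(ys, ys)
    )


cr = [205, 255, 195, 220, 235]   # ppm
ni = [130, 165, 100, 135, 145]   # ppm
n = len(cr)

# r from sums of squares and products
r = correlation(cr, ni)

# r from covariance and standard deviations: COV / (s_x * s_y)
cov = corrected_sp(cr, ni) / (n - 1)
s_cr = math.sqrt(corrected_sp(cr, cr) / (n - 1))
s_ni = math.sqrt(corrected_sp(ni, ni) / (n - 1))
r_alt = cov / (s_cr * s_ni)

# Change the unit of Cr only: ppm -> weight percent
cr_pct = [x * 1e-4 for x in cr]
cov_pct = corrected_sp(cr_pct, ni) / (n - 1)
r_pct = correlation(cr_pct, ni)

print("r:", round(r, 6), "r (via covariance):", round(r_alt, 6))
print("covariance, ppm vs percent:", cov, cov_pct)
print("r, ppm vs percent:", round(r, 6), round(r_pct, 6))
```

The covariance shrinks by the conversion factor, while r stays the same, which is why r rather than covariance is used to compare pairs of variables.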

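If r measures the linear relationship between two variables, it should be possible to compute the line of dependence between them.

Linear Regression

Calculations

| | Output in Units (X) | Persons employed (Y) | X² | XY |
|---|---|---|---|---|
| | 1 | 1 | 1 | 1 |
| | 3 | 2 | 9 | 6 |
| | 5 | 3 | 25 | 15 |
| | 6 | 4 | 36 | 24 |
| | 5 | 5 | 25 | 25 |
| Sum | 20 | 15 | 96 | 71 |

Regression line: $Y = a + bX$

Normal equations:

$\sum Y = Na + b\sum X$, that is, $15 = 5a + 20b$

$\sum XY = a\sum X + b\sum X^2$, that is, $71 = 20a + 96b$

Solving the two equations gives b = 0.6875 and a = 0.25, so

$Y = 0.25 + 0.6875X$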
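The two normal equations form a 2 × 2 linear system in a and b, which can be solved directly. Below is a minimal Python sketch using Cramer's rule on the output/employment data from this example; the variable names are chosen here for illustration.

```python
# Output in units (X) and persons employed (Y)
xs = [1, 3, 5, 6, 5]
ys = [1, 2, 3, 4, 5]

n = len(xs)
sum_x = sum(xs)                                 # 20
sum_y = sum(ys)                                 # 15
sum_xx = sum(x * x for x in xs)                 # 96
sum_xy = sum(x * y for x, y in zip(xs, ys))     # 71

# Normal equations:
#   sum_y  = n * a     + b * sum_x
#   sum_xy = a * sum_x + b * sum_xx
# Solve for a and b with Cramer's rule.
det = n * sum_xx - sum_x ** 2
a = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

print(f"Y = {a} + {b}X")
```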

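An alternate way of finding the regression equation is to use deviations from the respective means instead of the normal equations. With $x = X - \bar{X}$ and $y = Y - \bar{Y}$:

| | X | Y | x = (X - X̄) | y = (Y - Ȳ) | xy | x² | y² |
|---|---|---|---|---|---|---|---|
| | 1 | 1 | -3 | -2 | 6 | 9 | 4 |
| | 3 | 2 | -1 | -1 | 1 | 1 | 1 |
| | 5 | 3 | 1 | 0 | 0 | 1 | 0 |
| | 6 | 4 | 2 | 1 | 2 | 4 | 1 |
| | 5 | 5 | 1 | 2 | 2 | 1 | 4 |
| Sum | 20 | 15 | 0 | 0 | 11 | 16 | 10 |

$\bar{X} = 20/5 = 4$, $\bar{Y} = 15/5 = 3$

$b_{yx} = \sum xy / \sum x^2 = 11/16 = 0.6875$

The regression of Y on X is given by $Y - \bar{Y} = b_{yx}(X - \bar{X})$:

$Y - 3 = 0.6875(X - 4)$

$Y = 0.6875X - 2.75 + 3$

$Y = 0.6875X + 0.25$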
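The deviation method gives the same line as the normal equations because it is the same system with the intercept eliminated. A short derivation, written with N observations and the sums used in this example:

```latex
\begin{aligned}
\sum Y &= Na + b\sum X
  \;\Rightarrow\; a = \bar{Y} - b\bar{X} \\
\sum XY &= a\sum X + b\sum X^2
  = \frac{(\sum X)(\sum Y)}{N} - b\,\frac{(\sum X)^2}{N} + b\sum X^2 \\
\sum XY - \frac{(\sum X)(\sum Y)}{N}
  &= b\left[\sum X^2 - \frac{(\sum X)^2}{N}\right] \\
b &= \frac{SP}{SS_X} = \frac{\sum xy}{\sum x^2}
  = \frac{71 - (20)(15)/5}{96 - (20)^2/5} = \frac{11}{16} = 0.6875
\end{aligned}
```

Substituting b back into $a = \bar{Y} - b\bar{X}$ gives a = 3 - 0.6875 × 4 = 0.25.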

| | Output in Units (X) | Persons employed (Y) | Regressed values (Ŷ) | (Y - Ȳ)² | (Ŷ - Ȳ)² |
|---|---|---|---|---|---|
| | 1 | 1 | 0.9375 | 4 | 4.253906 |
| | 3 | 2 | 2.3125 | 1 | 0.472656 |
| | 5 | 3 | 3.6875 | 0 | 0.472656 |
| | 6 | 4 | 4.375 | 1 | 1.890625 |
| | 5 | 5 | 3.6875 | 4 | 0.472656 |
| Sum | | | | 10 | 7.5625 |

Total sum of squares (SST) of Y:

$SST = \sum (Y - \bar{Y})^2 = 10$

Sum of squares due to regression (SSR):

$SSR = \sum (\hat{Y} - \bar{Y})^2 = 7.5625$

The left over variation can be called the sum of squares due to deviation (SSD):

$SSD = SST - SSR = 10 - 7.5625 = 2.4375$

The goodness of fit of the line to the points can be defined by

$R^2 = SSR / SST = 7.5625 / 10 = 0.75625$
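The decomposition of the total variation into a regression part and a leftover part can be verified numerically. A minimal Python sketch, assuming the fitted line Y = 0.25 + 0.6875X from this example; SSD is computed directly from the residuals so that the identity SST = SSR + SSD is checked rather than assumed.

```python
xs = [1, 3, 5, 6, 5]   # output in units
ys = [1, 2, 3, 4, 5]   # persons employed
a, b = 0.25, 0.6875    # fitted intercept and slope from the example

y_mean = sum(ys) / len(ys)
y_hat = [a + b * x for x in xs]   # regressed values

sst = sum((y - y_mean) ** 2 for y in ys)               # total
ssr = sum((yh - y_mean) ** 2 for yh in y_hat)          # due to regression
ssd = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # due to deviation

r_squared = ssr / sst

print("Regressed values:", y_hat)
print("SST:", sst, "SSR:", ssr, "SSD:", ssd)
print("R squared:", r_squared)
```

Here R squared is also the square of the correlation coefficient between X and Y, which holds for simple linear regression with an intercept.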

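SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.869626 |
| R Square | 0.75625 |
| Adjusted R Square | 0.675 |
| Standard Error | 0.901388 |
| Observations | 5 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 7.5625 | 7.5625 | 9.307692 | 0.055391 |
| Residual | 3 | 2.4375 | 0.8125 | | |
| Total | 4 | 10 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.25 | 0.987421 | 0.253185 | 0.816484 | -2.89241 | 3.392414 | -2.89241 | 3.392414 |
| X Variable 1 | 0.6875 | 0.225347 | 3.050851 | 0.055391 | -0.02965 | 1.404655 | -0.02965 | 1.404655 |

RESIDUAL OUTPUT

| Observation | Predicted Y | Residuals |
|---|---|---|
| 1 | 0.9375 | 0.0625 |
| 2 | 2.3125 | -0.3125 |
| 3 | 3.6875 | -0.6875 |
| 4 | 4.375 | -0.375 |
| 5 | 3.6875 | 1.3125 |

PROBABILITY OUTPUT

| Percentile | Y |
|---|---|
| 10 | 1 |
| 30 | 2 |
| 50 | 3 |
| 70 | 4 |
| 90 | 5 |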
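Most of the figures in the summary output follow from the sums already computed. The sketch below recomputes them with the standard textbook formulas for simple linear regression, using only the Python standard library. The P-values and confidence limits are left out because they need the t and F distributions, which the standard library does not provide. Variable names are illustrative.

```python
import math

xs = [1, 3, 5, 6, 5]
ys = [1, 2, 3, 4, 5]
n = len(xs)

x_mean, y_mean = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)                       # sum of x^2 deviations
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sst = sum((y - y_mean) ** 2 for y in ys)

b = sxy / sxx               # slope
a = y_mean - b * x_mean     # intercept

ssr = b * sxy               # regression sum of squares
sse = sst - ssr             # residual sum of squares
df_resid = n - 2            # two estimated coefficients
mse = sse / df_resid        # residual mean square

r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_resid
standard_error = math.sqrt(mse)
f_stat = ssr / mse          # regression has 1 degree of freedom

se_b = math.sqrt(mse / sxx)
se_a = math.sqrt(mse * (1 / n + x_mean ** 2 / sxx))
t_b = b / se_b
t_a = a / se_a

print("Multiple R:", round(multiple_r, 6))
print("R Square:", r_squared, "Adjusted:", round(adj_r_squared, 6))
print("Standard Error:", round(standard_error, 6))
print("F:", round(f_stat, 6))
print("Intercept:", a, "SE:", round(se_a, 6), "t:", round(t_a, 6))
print("Slope:", b, "SE:", round(se_b, 6), "t:", round(t_b, 6))
```

With a single predictor, the F statistic equals the square of the slope's t statistic, which is why both carry the same P-value in the output above.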

Thank you
V. Hanumantha Rao Director (Retd.), GSI