FinQuiz Notes 2015
2. CORRELATION ANALYSIS
Scatter plot and correlation analysis are used to examine how two sets of data are related.

2.1 Scatter Plots

A scatter plot graphically shows the relationship between two variables. Each observation is represented by a point, and the points are not connected. If the points on the scatter plot cluster together in a straight line, the two variables have a strong linear relation.

2.2 Correlation Analysis

Covariance is used to examine whether the risk of a portfolio could be diversified or decreased. If there is zero covariance between two assets, there is no relationship between the rates of return of the two assets, and the assets can be included in the same portfolio.

Covariance can range from −∞ to +∞, and the covariance of a random variable with itself is simply the variance of that random variable. The covariance number does not tell the investor whether the relationship between two variables (e.g. the returns of two assets X and Y) is strong or weak; it only tells the direction of this relationship. For example,

o A positive covariance shows that the rates of return of the two assets move in the same direction: when the rate of return of asset X is negative, the returns of the other asset tend to be negative as well, and vice versa.
o A negative covariance shows that the rates of return of the two assets move in opposite directions: when the return on asset X is positive, the returns of the other asset Y tend to be negative, and vice versa.

2.3 Calculating and Interpreting the Correlation Coefficient

The correlation coefficient measures the direction and strength of the linear association between two variables. The correlation coefficient between two assets X and Y can be calculated using the following formula:

r = Cov(X, Y) / (sX × sY)

For example, with Cov(X, Y) = 47.78, Var(X) = 40 and Var(Y) = 250:

r = 47.78 / (√40 × √250) = 0.478

o The correlation coefficient can range from −1 to +1.
o Two variables are perfectly positively correlated if the correlation coefficient is +1.
o A correlation coefficient of −1 indicates a perfect inverse (negative) linear relationship between the returns of two assets.
o When the correlation coefficient equals 0, there is no linear relationship between the returns of the two assets.
o The closer the correlation coefficient is to +1 or −1, the stronger the relationship between the returns of the two assets.
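The covariance and correlation formulas above can be sketched directly in Python. This is a minimal illustration; the two return series below are made up for the example and are not from the notes.

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance: sum((xi - x_bar)(yi - y_bar)) / (n - 1)."""
    x_bar, y_bar = mean(x), mean(y)
    n = len(x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Sample correlation coefficient r = Cov(X, Y) / (sX * sY); always in [-1, +1]."""
    return covariance(x, y) / (stdev(x) * stdev(y))

returns_x = [0.04, -0.02, 0.06, 0.01, -0.03]   # hypothetical returns of asset X
returns_y = [0.05, -0.01, 0.07, 0.02, -0.04]   # hypothetical returns of asset Y

r = correlation(returns_x, returns_y)
print(round(r, 4))   # close to +1: the two series move strongly together
```

Because the two hypothetical series move in the same direction in every period, the computed r is close to +1, consistent with the interpretation bullets above.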
Difference b/w Covariance & Correlation: The covariance primarily provides information to the investor about whether the relationship between asset returns is positive, negative or zero, but the correlation coefficient tells the degree of relationship between asset returns.

NOTE:
Correlation coefficients are valid only if the means, variances & covariances of X and Y are finite and constant. When these assumptions do not hold, the correlation between two different variables depends largely on the sample selected.

2.4 Limitations of Correlation Analysis

NOTE:
Spurious correlation may suggest investment strategies that appear profitable but actually would not be so if implemented.

2.6 Testing the Significance of the Correlation Coefficient

A t-test is used to determine whether the sample correlation coefficient, r, is statistically significant.

Two-Tailed Test:
Null Hypothesis H0: the correlation in the population is 0 (ρ = 0); Alternative Hypothesis Ha: the correlation in the population is different from 0 (ρ ≠ 0).

NOTE:
The null hypothesis is the hypothesis to be tested. The alternative hypothesis is the hypothesis that is accepted if the null is rejected.

Test statistic:

t = r√(n − 2) / √(1 − r²)  ~  t with n − 2 degrees of freedom

where,
r = sample correlation coefficient, calculated as r = Cov(X, Y) / (sX × sY)
t = t-statistic (or calculated t)
n − 2 = degrees of freedom

Decision Rule:
If the test statistic is < −t-critical or > +t-critical with n − 2 degrees of freedom (i.e. if the absolute value of t > tc), Reject H0; otherwise Do Not Reject H0.

NOTE:
The magnitude of r needed to reject the null hypothesis (H0: ρ = 0) decreases as sample size n increases, because as n increases:
o the number of degrees of freedom increases
o the absolute value of tc decreases
o the t-value increases

Similarly, the probability of a Type II error decreases when sample size (n) increases, all else equal.

NOTE:
Type I error = reject the null hypothesis although it is true.
Type II error = do not reject the null hypothesis although it is wrong.

Practice: Example 7, 8, 9 & 10, Volume 1, Reading 9.

Reading 9 Correlation and Regression FinQuiz.com
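The significance test above can be sketched as follows, reusing r = 0.478 from the earlier correlation example. The sample size n = 100 and the critical value are assumptions added for illustration (tc ≈ 1.984 is the approximate two-tailed 5% critical t for 98 degrees of freedom).

```python
import math

def corr_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.478, 100   # r from the earlier example; n is hypothetical
tc = 1.984          # approximate two-tailed 5% critical t for 98 df

t = corr_t_stat(r, n)
reject_h0 = abs(t) > tc   # decision rule: reject H0 if |t| > tc
print(round(t, 2), reject_h0)
```

With n = 100 the calculated t is well above the critical value, so H0: ρ = 0 would be rejected, illustrating how even a moderate r becomes significant as n grows.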
3. LINEAR REGRESSION
Regression analysis is used to:

o Predict the value of a dependent variable based on the value of at least one independent variable
o Explain the impact of changes in an independent variable on the dependent variable.

Linear regression assumes a linear relationship between the dependent and the independent variables. Linear regression is also known as linear least squares since it selects values for the intercept b0 and slope b1 that minimize the sum of the squared vertical distances between the observations and the regression line.

Estimated Regression Model: The sample regression line provides an estimate of the population regression line. Note that the population parameter values b0 and b1 are not observable; only estimates of b0 and b1 are observable.

Independent variable: The variable used to explain the dependent variable. Also called exogenous or predicting variable.

Intercept (b0): The predicted value of the dependent variable when the independent variable is set to zero.

b0 = ȳ − b1x̄

Slope Coefficient or regression coefficient (b1): The change in the dependent variable for a unit change in the independent variable.

b1 = Cov(X, Y) / Var(X)   or   b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

Error Term: It represents the portion of the dependent variable that cannot be explained by the independent variable.

Example:
n = 100, x̄ = 36,009.45, ȳ = 5,411.41

cov(X, Y) = Σ(xi − x̄)(yi − ȳ) / (n − 1) = −1,356,256
s²x = 43,528,688

b1 = cov(X, Y) / s²x = −1,356,256 / 43,528,688 = −0.0312
b0 = ȳ − b1x̄ = 5,411.41 − (−0.0312)(36,009.45) = 6,535

Estimated regression line: ŷ = b0 + b1x = 6,535 − 0.0312x
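The example's slope and intercept can be reproduced from its summary statistics. This sketch rounds the slope to four decimals, as the notes do, before computing the intercept.

```python
cov_xy = -1_356_256          # cov(X, Y), as given in the example
var_x = 43_528_688           # sample variance of X, as given
x_bar, y_bar = 36_009.45, 5_411.41

b1 = round(cov_xy / var_x, 4)   # slope = Cov(X, Y) / Var(X), rounded to -0.0312
b0 = y_bar - b1 * x_bar         # intercept = y_bar - b1 * x_bar
print(b1, round(b0))
```

The rounded slope reproduces b1 = −0.0312 and b0 ≈ 6,535, matching the estimated regression line ŷ = 6,535 − 0.0312x.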
3.2 Assumptions of the Linear Regression Model

1. The regression model is linear in its parameters b0 and b1, i.e. b0 and b1 are raised to the power 1 only, and neither b0 nor b1 is multiplied or divided by another regression parameter (e.g. b0/b1).
2. The independent variable, x, is not random.
3. The expected value of the error term is 0.
4. The variance of the error term is the same for all observations. (This is known as the homoskedasticity assumption.)
5. Error values (ε) are statistically independent, i.e. the error for one observation is not correlated with any other observation.
6. Error values are normally distributed for any given value of x.

3.3 The Standard Error of Estimate

The Standard Error of Estimate (SEE) measures the degree of variability of the actual y-values relative to the estimated (predicted) y-values from a regression equation. The smaller the SEE, the better the fit.

SEE = √MSE = √(SSE / (n − k − 1)) = √( Σ(yi − ŷi)² / (n − k − 1) )

where,
SSE = sum of squares error
n = sample size
k = number of independent variables in the model

Example:
n = 100 and SSE = 2,252,363. Thus, with k = 1:

SEE = √(2,252,363 / (100 − 1 − 1)) ≈ 151.6

3.4 The Coefficient of Determination

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by the independent variable. The coefficient of determination is also called R-squared and is denoted as R².

In the case of a single independent variable, the coefficient of determination is: R² = r²

where,
R² = coefficient of determination
r = simple correlation coefficient

Example:
Suppose the correlation coefficient between the returns of two assets is +0.80; then the coefficient of determination will be 0.64. The interpretation of this number is that approximately 64 percent of the variability in the returns of one asset (the dependent variable) can be explained by the returns of the other asset (the independent variable). If the returns on the two assets are perfectly correlated (r = +/−1), the coefficient of determination will equal 100%, meaning that if changes in the returns of one asset are known, we can exactly predict the returns of the other asset.

NOTE:
Multiple R is the correlation between the actual values and the predicted values of Y. The coefficient of determination is the square of multiple R.

Total variation is made up of two parts:
SST = SSE + SSR (or RSS)
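The SEE and R² calculations can be sketched together, using the example figures n = 100 and SSE = 2,252,363 with k = 1 independent variable, and r = 0.80 from the coefficient-of-determination example.

```python
import math

n, k = 100, 1
sse = 2_252_363

see = math.sqrt(sse / (n - k - 1))   # SEE = sqrt(SSE / (n - k - 1))

r = 0.80                             # simple correlation coefficient
r2 = r ** 2                          # R-squared = r^2 for one independent variable

print(round(see, 1), r2)
```

This yields SEE ≈ 151.6 and R² = 0.64, i.e. about 64% of the variation in the dependent variable is explained by the independent variable.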
3.5 Hypothesis Testing

In order to determine whether there is a linear relationship between x and y or not, a significance test (i.e. t-test) is used instead of just relying on the b1 value. The t-statistic is used to test the significance of the individual coefficients (e.g. slope) in a regression.

Null and Alternative hypotheses:
H0: b1 = 0 (no linear relationship)
H1: b1 ≠ 0 (linear relationship does exist)

Test statistic: t = (b̂1 − b1) / s_b̂1

where,
b̂1 = sample regression slope coefficient
b1 = hypothesized slope
s_b̂1 = standard error of the slope

NOTE:
A higher level of confidence or a lower level of significance results in higher values of critical t, i.e. tc. This implies that:

o Confidence intervals will be larger.
o The probability of rejecting H0 decreases, i.e. Type-II error increases.
o The probability of Type-I error decreases.

Stronger regression results lead to smaller standard errors of an estimated parameter and result in a tighter confidence interval. As a result, the probability of rejecting H0 increases (or the probability of Type-I error increases).

p-value: The p-value is the smallest level of significance at which the null hypothesis can be rejected.

Decision Rule: If p < significance level, H0 can be rejected. If p > significance level, H0 cannot be rejected.
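The slope t-test can be sketched with the estimated slope from the earlier regression example; the standard error of the slope (s_b1) and the critical value here are hypothetical numbers added for illustration.

```python
b1_hat = -0.0312   # estimated slope (from the earlier example)
b1_hyp = 0.0       # hypothesized slope under H0: b1 = 0
s_b1 = 0.0087      # standard error of the slope (hypothetical)
tc = 1.98          # two-tailed 5% critical t (hypothetical df)

t = (b1_hat - b1_hyp) / s_b1   # t = (b1_hat - b1) / s_b1
reject_h0 = abs(t) > tc        # reject H0 if |t| exceeds the critical value
print(round(t, 2), reject_h0)
```

With these assumed inputs |t| ≈ 3.59 > 1.98, so H0 would be rejected and the slope judged statistically significant.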
Analysis of Variance (ANOVA) table:

Source of Variability     DoF     Sum of Squares     Mean Sum of Squares
Regression (Explained)    1       RSS                MSR = RSS/1
Error (Unexplained)       n−2     SSE                MSE = SSE/(n−2)
Total                     n−1     SST = RSS + SSE

The F-Statistic or F-Test evaluates how well a set of independent variables, as a group, explains the variation in the dependent variable. In multiple regression, the F-statistic is used to test whether at least one independent variable, in a set of independent variables, explains a significant portion of the variation of the dependent variable. The F-statistic is calculated as the ratio of the average regression sum of squares to the average sum of the squared errors:

F = MSR / MSE = (RSS / k) / (SSE / (n − k − 1))

Prediction Interval: A prediction interval for the dependent variable Y, given a forecasted value of the independent variable X, is

Ŷ ± tc × sf

where,
s²f = s²[1 + 1/n + (X − X̄)² / ((n − 1)s²X)]   and   sf = √s²f
s² = squared SEE
n = number of observations
X = value of independent variable
X̄ = estimated mean of X
s²X = variance of independent variable
tc = critical t-value for n − k − 1 degrees of freedom

Example:
Calculate a 95% prediction interval on the predicted value of Y. Assume the standard error of the forecast is 3.50%, the forecasted value of X is 8%, and n = 36. Assume: Y = 3% + (0.50)(X).

The predicted value for Y is: Ŷ = 3% + (0.50)(8%) = 7%

The 5% two-tailed critical t-value with 34 degrees of freedom is 2.03. The prediction interval at the 95% confidence level is:

7% +/− (2.03 × 3.50%) = −0.105% to 14.105%

This range can be interpreted as: given a forecasted value for X of 8%, we can be 95% confident that the dependent variable Y will be between −0.105% and 14.105%.
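The prediction interval example above can be reproduced step by step; all inputs (coefficients, forecast, standard error of the forecast, critical t) are taken from the example, with values expressed in percent.

```python
b0, b1 = 3.0, 0.50    # regression: Y = 3% + 0.50 * X (percent units)
x_forecast = 8.0      # forecasted value of X (%)
s_f = 3.50            # standard error of the forecast (%)
tc = 2.03             # two-tailed 5% critical t with 34 df

y_hat = b0 + b1 * x_forecast   # predicted Y = 7%
lower = y_hat - tc * s_f       # lower bound of the 95% prediction interval
upper = y_hat + tc * s_f       # upper bound of the 95% prediction interval
print(y_hat, round(lower, 3), round(upper, 3))
```

This reproduces the interval −0.105% to 14.105% around the predicted value of 7%.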