ANALYSIS
By: Amisha
Pragya
Harshit
Pratishta
Shivangi
Pooja
Akash
DISCLAIMER!!
› All facts, ideas and thoughts mentioned throughout this
presentation are definitely not our own. They have been picked
from Google Baba and various books with long,
unpronounceable names. So if you want to blame
anyone, BLAME THE MATH GENIUSES who are putting us
through this ordeal.
› Also this presentation will be long, so GIDDY UP!!!!!!!!
WHAT IS REGRESSION?
›Correlation: a change in one variable corresponding with a change in another
variable.
›Regression is a statistical measurement used in social science, finance,
investing and other disciplines that attempts to determine the strength of the
relationship between one dependent variable (usually denoted by Y) and a
series of other changing variables (known as independent variables).
›The term regression was first used by Francis Galton.
›The dictionary meaning of the word regression is ‘stepping back’ or ‘going
back’.
›Regression is the measure of the average relationship between two or more
variables in terms of the original units of the data.
Simple regression equation:
Y(pre) = (A + BX) + E
where A = intercept, B = slope, E = error term
MULTIPLE REGRESSION
› Multiple regression analysis is a powerful technique used for predicting the
unknown value of a variable from the known value of two or more variables.
› The variable whose value is to be predicted is known as the dependent
variable, also called the criterion variable.
› The variables whose known values are used for prediction are known as
independent (explanatory) variables, also called predictors.
Multiple regression allows us to:
› Use several variables at once to explain the variation in a continuous dependent
variable.
› Isolate the unique effect of one variable on the continuous dependent variable
while taking into consideration that other variables are affecting it too.
› It expresses the criterion variable as a function of the predictor variables.
› The predictor variables can be “fixed” treatment variables or classification
variables.
› For example:
Y(pre) = a + b1X1 + b2X2 + … + bnXn
where,
a = constant/intercept value
bn = partial regression coefficients
Xn = predictor variables
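Once the intercept and partial regression coefficients have been estimated, prediction is just a weighted sum. A minimal sketch for a two-predictor model — the coefficient values below are made up for illustration, not taken from any real fit:

```python
# Hypothetical two-predictor equation: Y(pre) = a + b1*X1 + b2*X2
# (a, b1, b2 are illustrative values, not estimates from real data)
a, b1, b2 = 24.6, 5.4, 2.3

def predict(x1, x2):
    """Predicted criterion (DV) value from known predictor (IV) values."""
    return a + b1 * x1 + b2 * x2

print(predict(1.0, 2.0))  # 24.6 + 5.4*1.0 + 2.3*2.0 = 34.6
```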
Standardized Regression Equation
● Based on the standard score.
● Weights of IVs in the unstandardized regression equation are in different units.
● To put those units on a common scale we use standard scores, which are free of units.
● Standard score:
Z = (X − X̄) / SD
Y(pre) = β1Z1 + β2Z2 + … + βnZn
● Here, a one standard-score change in an IV is associated with β standard-score changes in the DV.
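The standard-score conversion above can be sketched in a few lines of plain Python; after the conversion every variable has mean 0 and SD 1, which is what makes the β weights comparable across IVs:

```python
import statistics

def standard_scores(xs):
    """Z = (X - mean) / SD, so the variable becomes unit-free."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population SD; use stdev() for the sample SD
    return [(x - mean) / sd for x in xs]

zs = standard_scores([2, 4, 6, 8])
```

Whichever SD convention you pick, the resulting z-scores sum to zero and have an SD of exactly 1.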
PARTIAL CORRELATION
AND
SEMI-PARTIAL
CORRELATION
Partial Correlation
A direct procedure for controlling the influence of a third variable is
partial correlation.
It allows the researcher to measure the relationship between two
variables while eliminating, or holding constant, the effect of the
third variable.
With three variables X, Y and Z it is possible to compute three
individual Pearson correlations:
rXY, the correlation between X and Y;
rXZ, the correlation between X and Z;
rYZ, the correlation between Y and Z.
Formula Of Partial Correlation
r12 r23r13
r2 (1.3)
1 r132
Cont.
Where,
r12 = correlation coefficient between var 1 and 2
r12 = correlation coefficient between var 1 and 3
r23 = correlation coefficient between var 2 and 3
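The formula translates directly into code. A minimal sketch, with the three pairwise correlations passed in as plain numbers (the example values are made up):

```python
import math

def partial_corr_12_3(r12, r13, r23):
    """r12.3: correlation between variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Illustrative values: r12 = 0.6, r13 = 0.5, r23 = 0.4
print(partial_corr_12_3(0.6, 0.5, 0.4))  # (0.6 - 0.2) / sqrt(0.75 * 0.84) ≈ 0.504
```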
Multiple correlation coefficient
› The multiple correlation coefficient denotes the correlation of one
variable with multiple other variables, and is denoted by “R”.
Coefficient of Determination (R 2 )
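R² is the proportion of variance in the DV that the predictors account for, computed as one minus the ratio of residual to total sum of squares. A minimal sketch from observed and model-predicted values (the numbers are made up):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # ≈ 0.98
```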
3. SEQUENTIAL
REGRESSION METHOD
THE SEQUENTIAL REGRESSION METHOD
Also known as the researcher-controlled regression method,
covariance analysis, hierarchical analysis, and block-entry analysis.
The researcher-controlled regression methods are really variations
on a theme.
It is the researcher who specifies the order of entry of predictors
into the equation.
The main issue the researcher faces is determining how many
variables to instruct to enter the equation at any one time.
ASSUMPTIONS
Normality of distribution of errors:
○ Assessed using:
■ Graphical methods:
● Q-Q plots
● Histograms
Linearity:
○ Assessed using:
■ Graphical method:
● Scatterplots of observed vs. predicted values or residuals
vs. predicted values.
○ Fig. 1 shows a non-linear (curvilinear) relationship.
○ Fig. 2 shows a linear relationship.
Homogeneity of variance assumption (homoscedasticity): checked in the residual
vs. predicted scatterplot.
Independence of errors (no autocorrelation):
■ Assessed through:
● Durbin-Watson test:
○ This tests the null hypothesis that the residuals are
not linearly auto-correlated.
○ While d can take values between 0 and 4,
values around 2 indicate no autocorrelation.
○ As a rule of thumb, values of 1.5 < d < 2.5 show that
there is no autocorrelation in the data.
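The Durbin-Watson statistic itself is simple to compute from a list of residuals: it is the sum of squared successive differences divided by the sum of squared residuals. A minimal sketch with made-up residuals:

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    d near 2 -> no autocorrelation; d -> 0 positive, d -> 4 negative autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Perfectly alternating residuals are negatively autocorrelated, so d is well above 2:
print(durbin_watson([1, -1, 1, -1]))  # 3.0
```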
DATA REQUIREMENTS
● The criterion variable must be on a continuous scale, i.e., the criterion
variable cannot be dichotomous or categorical.
• Combined variance:38.6%
• Examination of B and β values (regression coefficients):
Depression, hostility and psychoticism correlate positively and
anxiety correlates negatively; but since the standardized coefficients
show that anxiety and interpersonal sensitivity contribute very little,
their correlations are neglected.
• Level of significance: p < 0.05; depression, hostility and psychoticism
were found to be significant.
• Regression Equation:
• In raw score form:
Suc Cog =24.630 +1.203IS + 5.368Dp – 0.625Anx + 2.284 Hos. + 4.349psh
• In standard score form:
Suc Cog= 0.063 IS + 0.316Dp + 0.035Anx + 0.142 Hos. + 0.216psh
STEPWISE REGRESSION:
• COMBINED VARIANCE: 38.5%
• Contribution to prediction:
Depression predicts the most,
then psychoticism,
and lastly hostility.
(all correlate positively)
• All significant.
•Equation:
• Raw score form:
Suc Cog= 24.883 + 5.621Dp + 4.518Psh + 2.275Hos.
• Standard score form:
Suc Cog= .331Dp + .214psh + .142Hos.
ADVANTAGES AND
DISADVANTAGES
ADVANTAGES
Ability to determine the relative influence of one or more predictor
variables on the criterion value.
It provides a functional relationship between two or more related
variables.
It provides a measure of the errors of estimates made through
the regression line.
Ability to identify outliers.
This technique is widely used in day-to-day life, e.g. birth
rate, death rate, tax rate, etc.
DISADVANTAGES
It is assumed that the cause-and-effect relationship between
the variables remains unchanged, and this may lead to
erroneous and misleading results.
Limited data may lead to misleading results.
It involves a very lengthy and complicated procedure of
calculation and analysis.
It cannot be used in the case of qualitative phenomena like
honesty, crime, etc.
ISSUES
Adding more IVs to a multiple regression procedure does not mean
the regression will be “better” or offer better prediction; in fact it can
make things worse. That is called “OVERFITTING”.
The addition of more IVs creates more relationships among them, so
not only are the IVs potentially related to the DV, they are also
potentially related to each other. When this happens, it is called
“multicollinearity”.
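A quick first check for multicollinearity is to correlate the IVs with each other: pairwise r values near ±1 are a warning sign. A minimal sketch of that check with made-up data (more thorough diagnostics, such as variance inflation factors, build on the same idea):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two IVs that move almost in lockstep -> r near 1, flagging multicollinearity
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 2.0, 3.1, 4.0, 5.1]
print(pearson_r(x1, x2))
```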
Raw score:
Y(pre) = a + b1X1 + b2X2 + … + bnXn
Standardized score:
Y(pre) = β1Z1 + β2Z2 + … + βnZn
• The goal of multiple regression is to produce a model in the
form of a linear equation that identifies the best weighted
combination of independent variables in the study to
optimally predict the criterion variable.