
Regression explained in simple terms

A Vijay Gupta Publication


SPSS for Beginners Vijay Gupta 2000. All rights reside with author.

vjbooks.net

Regression explained
Copyright 2000 Vijay Gupta. Published by VJ Books Inc.

All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system, without prior written permission of the publisher except in the case of brief quotations embodied in reviews, articles, and research papers. Making copies of any part of this book for any purpose other than personal use is a violation of United States and international copyright laws. For information contact Vijay Gupta at vgupta1000@aol.com.

Library of Congress Catalog No.: Pending
ISBN: Pending
First year of printing: 2000
Date of this copy: April 22, 2000

This book is sold as is, without warranty of any kind, either express or implied, respecting the contents of this book, including but not limited to implied warranties for the book's quality, performance, merchantability, or fitness for any particular purpose. Neither the author, the publisher and its dealers, nor distributors shall be liable to the purchaser or any other person or entity with respect to any liability, loss, or damage caused or alleged to be caused directly or indirectly by the book.

Publisher: VJBooks Inc.
Editor: Vijay Gupta
Author: Vijay Gupta


About the Author


Vijay Gupta has taught statistics and econometrics to graduate students at Georgetown University. A Georgetown University graduate with a Masters degree in economics, he has a vision of making the tools of econometrics and statistics easily accessible to professionals and graduate students. In addition, he has assisted the World Bank and other organizations with statistical analysis, design of international investments, cost-benefit and sensitivity analysis, and training and troubleshooting in several areas.

He is currently working on:
- a package of SPSS Scripts "Making the Formatting of Output Easy"
- a manual on Word
- a manual for Excel
- a tutorial for E-Views
- an Excel add-in "Tools for Enriching Excel's Data Analysis Capacity"

Expect them to be available during fall 2000. Early versions can be downloaded from www.vgupta.com.

LINEAR REGRESSION
Interpretation of regression output is discussed in section 1[1]. Our approach might conflict with practices you have employed in the past, such as always looking at the R-square first. As a result of our vast experience in using and teaching econometrics, we are firm believers in our approach. You will find the presentation to be quite simple - everything is in one place and displayed in an orderly manner.

The acceptance (as being reliable/true) of regression results hinges on diagnostic checking for the breakdown of classical assumptions[2]. If there is a breakdown, then the estimation is unreliable, and thus the interpretation from section 1 is unreliable. The table in section 2 succinctly lists the various possible breakdowns and their implications for the reliability of the regression results[3].

Why is the result not acceptable unless the assumptions are met? The reason is that the strong statements inferred from a regression (i.e. - "an increase in one unit of the value of variable X causes an increase in the value of variable Y by 0.21 units") depend on the presumption that the variables used in a regression, and the residuals from the regression, satisfy certain statistical properties. These are expressed in the properties of the distribution of the residuals (that explains why so many of the diagnostic tests shown in sections 3-4 and the corrective methods are based on the use of the residuals). If these properties are satisfied, then we can be confident in our interpretation of the results. The above statements are based on complex formal mathematical proofs. Please check your textbook if you are curious about the formal foundations of the statements.

Section 3 provides a brief schema for checking for the breakdown of classical assumptions. The testing usually involves informal (graphical) and formal (distribution-based hypothesis tests like the F and T) testing, with the latter involving the running of other regressions and computing of variables.

[1] Even though interpretation precedes checking for the breakdown of classical assumptions, it is good practice to first check for the breakdown of classical assumptions (section 4), then to correct for the breakdowns, and then, finally, to interpret the results of a regression analysis.

[2] We will use the phrase "Classical Assumptions" often. Check your textbook for details about these assumptions. In simple terms, regression is a statistical method. The fact that this generic method can be used for so many different types of models and in so many different fields of study hinges on one area of commonality - the model rests on the bedrock of the solid foundations of well-established and proven statistical properties/theorems. If the specific regression model is in concordance with the certain assumptions required for the use of these properties/theorems, then the generic regression results can be inferred. The classical assumptions constitute these requirements.

[3] If you find any breakdown(s) of the classical assumptions, then you must correct for it by taking appropriate measures. Chapter 8 looks into these measures. After running the "corrected" model, you again must perform the full range of diagnostic checks for the breakdown of classical assumptions. This process will continue until you no longer have a serious breakdown problem, or the limitations of data compel you to stop.

1. Interpretation of regression results

Assume you want to run a regression of wage on age, work experience, education, gender, and a dummy for sector of employment (whether employed in the public sector).

wage = function(age, work experience, education, gender, sector)

or, as your textbook will have it,

$$\text{wage} = \beta_1 + \beta_2 \cdot \text{age} + \beta_3 \cdot \text{work experience} + \beta_4 \cdot \text{education} + \beta_5 \cdot \text{gender} + \beta_6 \cdot \text{sector}$$

Always look at the model fit ("ANOVA") first. Do not make the mistake of looking at the R-square before checking the goodness of fit.

Significance of the model ("Did the model explain the deviations in the dependent variable?")
The last column shows the goodness of fit of the model. The lower this number, the better the fit. Typically, if "Sig" is greater than 0.05, we conclude that our model could not fit the data[4].

ANOVA(a)

Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    54514.39         5      10902.88      414.262   .000(b)
Residual      52295.48         1987   26.319
Total         106809.9         1992

a. Dependent Variable: WAGE
b. Independent Variables: (Constant), WORK_EX, EDUCATION, GENDER, PUB_SEC, AGE

[4] If Sig < .01, then the model is significant at 99%; if Sig < .05, then the model is significant at 95%; and if Sig < .1, the model is significant at 90%. Significance implies that we can accept the model. If Sig > .1, then the model was not significant (a relationship could not be found) or "R-square is not significantly different from zero."

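The F is comparing the two models below:

1. $\text{wage} = \beta_1 + \beta_2 \cdot \text{age} + \beta_3 \cdot \text{work experience} + \beta_4 \cdot \text{education} + \beta_5 \cdot \text{gender} + \beta_6 \cdot \text{sector}$
2. $\text{wage} = \beta_1$

(In formal terms, the F is testing the hypothesis: $\beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$. The constant $\beta_1$ is not part of the hypothesis because it appears in both models.)

If the F is not significant, then we cannot say that model 1 is any better than model 2. The implication is obvious - the use of the independent variables has not assisted in predicting the dependent variable.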

Sum of squares
The TSS (Total Sum of Squares) is the total deviations in the dependent variable. The aim of the regression is to explain these deviations (by finding the best betas that can minimize the sum of the squares of these deviations). The ESS (Explained Sum of Squares) is the amount of the TSS that could be explained by the model. The R-square, shown in the next table, is the ratio ESS/TSS. It captures the percent of deviation from the mean in the dependent variable that could be explained by the model. The RSS is the amount that could not be explained (TSS minus ESS).

In the previous table, the column "Sum of Squares" holds the values for TSS, ESS, and RSS. The row "Total" is TSS (106809.9 in the example), the row "Regression" is ESS (54514.39 in the example), and the row "Residual" contains the RSS (52295.48 in the example).
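The relations among the sums of squares can be checked by hand. The sketch below is only an illustration: it takes the ESS, RSS, and degrees of freedom from the ANOVA table and recomputes the R-square, the F, and the standard error of the estimate using the usual textbook formulas. The variable names are our own, and the adjusted R-square formula is the standard one rather than something stated in this section.

```python
# Values read from the ANOVA table in this section.
ess = 54514.39     # "Regression" row: explained sum of squares
rss = 52295.48     # "Residual" row: unexplained sum of squares
df_model = 5       # degrees of freedom of the regression
df_resid = 1987    # degrees of freedom of the residual

tss = ess + rss                                # "Total" row: TSS = ESS + RSS
r_square = ess / tss                           # share of the variation explained
f_stat = (ess / df_model) / (rss / df_resid)   # ratio of the two mean squares
std_error_estimate = (rss / df_resid) ** 0.5   # square root of the residual mean square

# Standard adjusted R-square formula (an assumption: not spelled out in the text).
adj_r_square = 1 - (rss / df_resid) / (tss / (df_model + df_resid))

print(f"TSS                = {tss:.1f}")
print(f"R-square           = {r_square:.3f}")
print(f"Adjusted R-square  = {adj_r_square:.3f}")
print(f"F                  = {f_stat:.3f}")
print(f"Std. error of est. = {std_error_estimate:.4f}")
```

The results should agree, up to rounding, with the F in the ANOVA table and with the R-square, adjusted R-square, and standard error of the estimate reported in the "Model Summary" table.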


Adjusted R-square
Measures the proportion of the variance in the dependent variable (wage) that was explained by variations in the independent variables. In this example, the "Adjusted R-Square" shows that 50.9% of the variance was explained.

R-square
Measures the proportion of the variation in the dependent variable (wage) that was explained by variations in the independent variables. In this example, the "R-Square" tells us that 51% of the variation (and not the variance) was explained.

Model Summary(a,b)

Variables Entered             WORK_EX, EDUCATION, GENDER, PUB_SEC, AGE(c,d)
Variables Removed             .
R Square                      .510
Adjusted R Square             .509
Std. Error of the Estimate    5.1302

a. Dependent Variable: WAGE
b. Method: Enter
c. Independent Variables: (Constant), WORK_EX, EDUCATION, GENDER, PUB_SEC, AGE
d. All requested variables entered.

Std Error of Estimate
Std error of the estimate measures the dispersion of the dependent variable's estimate around its mean (in this example, the "Std. Error of the Estimate" is 5.13). Compare this to the mean of the "Predicted" values of the dependent variable. If the Std. Error is more than 10% of the mean, it is high.

The reliability of individual coefficients
The table "Coefficients" provides information on the confidence with which we can support each estimate (see the columns "T" and "Sig."). If the value in "Sig." is less than 0.05, then we can assume that the estimate in column "B" can be asserted as true with a 95% level of confidence[5]. Always interpret the "Sig" value first. If this value is more than 0.1, then the coefficient estimate is not reliable because it has "too" much dispersion/variance.

The individual coefficients
The table "Coefficients" provides information on the effect of individual variables (the "Estimated Coefficients" or "beta" - see column "B") on the dependent variable.

Confidence Interval
The confidence intervals provide a range of values within which we can assert with a 95% level of confidence that the estimated coefficient in "B" lies. For example, "The coefficient for age lies in the range .091 and .145 with a 95% level of confidence, while the coefficient for gender lies in the range -2.597 and -1.463 at a 95% level of confidence."

[5] If the value is greater than 0.05 but less than 0.1, we can only assert the veracity of the value in "B" with a 90% level of confidence. If "Sig" is above 0.1, then the estimate in "B" is unreliable and is said to not be statistically significant.

Coefficients(a)

              Unstandardized Coefficients                      95% Confidence Interval for B
Model         B         Std. Error    t         Sig.    Lower Bound    Upper Bound
(Constant)    -1.820    .420          -4.339    .000    -2.643         -.997
AGE           .118      .014          8.635     .000    .091           .145
EDUCATION     .777      .025          31.622    .000    .729           .825
GENDER        -2.030    .289          -7.023    .000    -2.597         -1.463
PUB_SEC       1.741     .292          5.957     .000    1.168          2.314
WORK_EX       .100      .017          5.854     .000    .067           .134

a. Dependent Variable: WAGE
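The confidence bounds in the "Coefficients" table follow from the columns "B" and "Std. Error". The sketch below is a rough illustration, not the exact SPSS computation: it assumes the 95% critical value can be approximated by 1.96 (reasonable with nearly 2000 residual degrees of freedom), and the function and variable names are ours.

```python
# (B, Std. Error) pairs copied from the "Coefficients" table.
coefficients = {
    "(Constant)": (-1.820, 0.420),
    "AGE": (0.118, 0.014),
    "EDUCATION": (0.777, 0.025),
    "GENDER": (-2.030, 0.289),
    "PUB_SEC": (1.741, 0.292),
    "WORK_EX": (0.100, 0.017),
}

# Assumption: large-sample approximation to the two-tailed 95% critical value of T.
T_CRITICAL = 1.96


def confidence_interval(b, std_error, t_critical=T_CRITICAL):
    """Return (lower, upper) bounds: B minus/plus t_critical times Std. Error."""
    margin = t_critical * std_error
    return b - margin, b + margin


intervals = {
    name: confidence_interval(b, se) for name, (b, se) in coefficients.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name:<11} {lower:8.3f} {upper:8.3f}")
```

Because the table shows rounded inputs and the critical value is approximate, the bounds may differ from the SPSS output in the third decimal place.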

Plot of residual versus predicted dependent variable
This is the plot for the standardized predicted variable and the standardized residuals. The pattern in this plot indicates the presence of mis-specification[6] and/or heteroskedasticity[7].

[Figure: Scatterplot, Dependent Variable: WAGE. Axes: Regression Standardized Predicted Value and Regression Standardized Residual.]

[6] This includes the problems of incorrect functional form, omitted variable, or a mis-measured independent variable. A formal test such as the RESET Test is required to conclusively prove the existence of mis-specification. Review your textbook for the step-by-step description of the RESET test.

[7] A formal test like the White's Test is necessary to conclusively prove the existence of heteroskedasticity.

Plot of residuals versus independent variables

The definite positive pattern indicates[8] the presence of heteroskedasticity caused, at least in part, by the variable education.

[Figure: Partial Residual Plot, Dependent Variable: WAGE. Axes: WAGE and EDUCATION.]

The plot of age and the residual has no pattern[9], which implies that no heteroskedasticity is caused by this variable.

[Figure: Partial Residual Plot, Dependent Variable: WAGE. Axes: WAGE and AGE.]

[8] A formal test like the White's Test is required to conclusively prove the existence and structure of heteroskedasticity.

[9] Sometimes these plots may not show a pattern. The reason may be the presence of extreme values that widen the scale of one or both of the axes, thereby "smoothing out" any patterns. If you suspect this has happened, as would be the case if most of the graph area were empty save for a few dots at the extreme ends of the graph, then rescale the axes. This is true for all scatter graphs.

Plots of the residuals
The histogram and the P-P plot of the residual suggest that the residual is probably normally distributed[10]. You can also use other tests to check for normality.

[Figure: Histogram, Dependent Variable: WAGE. Axes: Frequency and Regression Standardized Residual. Std. Dev = 1.00, Mean = 0.00, N = 1993.00. Annotation: Idealized Normal Curve. In order to meet the classical assumptions, the residuals should, roughly, follow this curve's shape.]

[Figure: Normal P-P Plot of Regression Standardized Residual, Dependent Variable: WAGE. Axes: Expected Cum Prob and Observed Cum Prob. Annotation: The thick curve should lie close to the diagonal.]

[10] The residuals should be distributed normally. If not, then some classical assumption has been violated.

Regression output interpretation guidelines

Name Of Statistic/Chart: Sig.-F
What Does It Measure Or Indicate? Whether the model as a whole is significant. It tests whether R-square is significantly different from zero.
Critical Values:
- below .01 for 99% confidence in the ability of the model to explain the dependent variable
- below .05 for 95% confidence in the ability of the model to explain the dependent variable
- below 0.1 for 90% confidence in the ability of the model to explain the dependent variable
Comment: The first statistic to look for in SPSS output. If Sig.-F is insignificant, then the regression as a whole has failed. No more interpretation is necessary (although some statisticians disagree on this point). You must conclude that the "Dependent variable cannot be explained by the independent/explanatory variables." The next steps could be rebuilding the model, using more data points, etc.

Name Of Statistic/Chart: RSS, ESS & TSS
What Does It Measure Or Indicate? The main function of these values lies in calculating test statistics like the F-test, etc.
Critical Values: The ESS should be high compared to the TSS (the ratio equals the R-square). Note for interpreting the SPSS table, column "Sum of Squares": "Total" = TSS, "Regression" = ESS, and "Residual" = RSS.
Comment: If the R-squares of two models are very similar or rounded off to zero or one, then you might prefer to use the F-test formula that uses RSS and ESS.

Name Of Statistic/Chart: SE of Regression
What Does It Measure Or Indicate? The standard error of the estimate of the predicted dependent variable.
Critical Values: There is no critical value. Just compare the std. error to the mean of the predicted dependent variable. The former should be small (<10%) compared to the latter.
Comment: You may wish to comment on the SE, especially if it is too large or small relative to the mean of the predicted/estimated values of the dependent variable.

Name Of Statistic/Chart: R-Square
What Does It Measure Or Indicate? Proportion of variation in the dependent variable that can be explained by the independent variables.
Critical Values: Between 0 and 1. A higher value is better.
Comment: This often mis-used value should serve only as a summary measure of Goodness of Fit. Do not use it blindly as a criterion for model selection.

Name Of Statistic/Chart: Adjusted R-square
What Does It Measure Or Indicate? Proportion of variance in the dependent variable that can be explained by the independent variables, or R-square adjusted for the number of independent variables.
Critical Values: Below 1. A higher value is better.
Comment: Another summary measure of Goodness of Fit. Superior to R-square because it is sensitive to the addition of irrelevant variables.

Name Of Statistic/Chart: T-Ratios
What Does It Measure Or Indicate? The reliability of our estimate of the individual beta.
Critical Values: Look at the p-value (in the column "Sig."); it must be low:
- below .01 for 99% confidence in the value of the estimated coefficient
- below .05 for 95% confidence in the value of the estimated coefficient
- below .1 for 90% confidence in the value of the estimated coefficient
Comment: For a one-tailed test (at 95% confidence level), the critical value is (approximately) 1.65 for testing if the coefficient is greater than zero and (approximately) -1.65 for testing if it is below zero.

Name Of Statistic/Chart: Confidence Interval for beta
What Does It Measure Or Indicate? The 95% confidence band for each beta estimate.
Critical Values: The upper and lower values give the 95% confidence limits for the coefficient.
Comment: Any value within the confidence interval cannot be rejected (as the true value) at 95% degree of confidence.

Name Of Statistic/Chart: Charts: Scatter of predicted dependent variable and residual
What Does It Measure Or Indicate? Mis-specification and/or heteroskedasticity.
Critical Values: There should be no discernible pattern. If there is a discernible pattern, then do the RESET and/or DW test for mis-specification or the White's test for heteroskedasticity.
Comment: Extremely useful for checking for breakdowns of the classical assumptions, i.e. - for problems like mis-specification and/or heteroskedasticity. At the top of this table, we mentioned that the F-statistic is the first output to interpret. Some may argue that the ZPRED-ZRESID plot is more important (their rationale will become apparent as you read through the rest of this chapter and chapter 8).

Name Of Statistic/Chart: Charts: plots of residuals against independent variables
What Does It Measure Or Indicate? Heteroskedasticity.
Critical Values: There should be no discernible pattern. If there is a discernible pattern, then perform White's test to formally check.
Comment: Common in cross-sectional data. If a partial plot has a pattern, then that variable is a likely candidate for the cause of heteroskedasticity.

Name Of Statistic/Chart: Charts: Histograms of residuals
What Does It Measure Or Indicate? Provides an idea about the distribution of the residuals.
Critical Values: The distribution should look like a normal distribution.
Comment: A good way to observe the actual behavior of our residuals and to observe any severe problem in the residuals (which would indicate a breakdown of the classical assumptions).

Problems caused by breakdown of classical assumptions

The fact that we can make bold statements on causality from a regression hinges on the classical linear model. If its assumptions are violated, then we must re-specify our analysis and begin the regression anew. It is very unsettling to realize that a large number of institutions, journals, and faculties allow this fact to be overlooked.

When using the table below, remember the ordering of the severity of an impact.
- The worst impact is a bias in the F (then the model can't be trusted).
- A second disastrous impact is a bias in the betas (the coefficient estimates are unreliable).
- Compared to the above, biases in the standard errors and T are not so harmful (these biases only affect the reliability of our confidence about the variability of an estimate, not the reliability about the value of the estimate itself).

Summary of impact of a breakdown of a classical assumption on the reliability with which regression output can be interpreted

[Table: one row per violation, one column per affected statistic; cell symbols not recoverable.]

Violation (rows):
- Measurement error in dependent variable
- Measurement error in independent variable
- Irrelevant variable
- Omitted variable
- Incorrect functional form
- Heteroskedasticity
- Collinearity
- Simultaneity Bias

Impact (columns): R2; Std error (of estimate); Std error (of beta)

Legend for understanding the table (one entry per symbol):
- The statistic is still reliable and unbiased.
- The statistic is biased, and thus cannot be relied upon.
- Upward bias in estimation.
- Downward bias in estimation.

Diagnostics
This section lists some methods of detecting breakdowns of the classical assumptions.

Why is the result not acceptable unless the assumptions are met? The reason is simple - the strong statements inferred from a regression (e.g. - "an increase in one unit of the value of variable X causes an increase of the value of variable Y by 0.21 units") depend on the presumption that the variables used in a regression, and the residuals from that regression, satisfy certain statistical properties. These are expressed in the properties of the distribution of the residuals. That explains why so many of the diagnostic tests shown in sections 7.4-7.5 and their relevant corrective methods, shown in this chapter, are based on the use of the residuals. If these properties are satisfied, then we can be confident in our interpretation of the results. The above statements are based on complex, formal mathematical proofs. Please refer to your textbook if you are curious about the formal foundations of the statements.

With experience, you should develop the habit of doing the diagnostics before interpreting the model's significance, explanatory power, and the significance and estimates of the regression coefficients. If the diagnostics show the presence of a problem, you must first correct the problem and then interpret the model. Remember that the power of a regression analysis (after all, it is extremely powerful to be able to say that "data shows that X causes Y by this slope factor") is based upon the fulfillment of certain conditions that are specified in what have been dubbed the "classical" assumptions.

Refer to your textbook for a comprehensive listing of methods and their detailed descriptions.

If a formal[11] diagnostic test confirms the breakdown of an assumption, then you must attempt to correct for it. This correction usually involves running another regression on a transformed version of the original model, with the exact nature of the transformation being a function of the classical regression assumption that has been violated[12].

Collinearity[13]
Collinearity between variables is always present. A problem occurs if the degree of collinearity is high enough to bias the estimates.

Note: Collinearity means that two or more of the independent/explanatory variables in a regression have a linear relationship. This causes a problem in the interpretation of the regression results. If the variables have a close linear relationship, then the estimated regression coefficients and T-statistics may not be able to properly isolate the unique effect/role of each variable and the confidence with which we can presume these effects to be true. The close relationship of the variables makes this isolation difficult. Our explanation may not satisfy a statistician, but we hope it conveys the fundamental principle of collinearity.

Summary measures for testing and detecting collinearity include:
- Running bivariate and partial correlations (see section 5.2). A bivariate or partial correlation coefficient greater than 0.8 (in absolute terms) between two variables indicates the presence of significant collinearity between them.
- Collinearity is indicated if the R-square is high (greater than 0.75[14]) and only a few T-values are significant.
- Check your textbook for more on collinearity diagnostics.

[11] Usually, a "formal" test uses a hypothesis testing approach. This involves the use of testing against distributions like the T, F, or Chi-Square. An "informal" test typically refers to a graphical test.

[12] Don't worry if this line confuses you at present - its meaning and relevance will become apparent as you read through this chapter.

[13] Also called Multicollinearity.
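The bivariate-correlation check for collinearity can be sketched as follows. This is a minimal illustration under stated assumptions: the data are hypothetical (they are not the data set used in this book), the helper names `pearson` and `flag_collinear` are ours, and only the 0.8 threshold comes from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_collinear(variables, threshold=0.8):
    """Return (name, name, r) for every pair whose |r| exceeds the threshold."""
    names = list(variables)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(variables[first], variables[second])
            if abs(r) > threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical data: work experience rises almost in step with age.
data = {
    "age": [22, 25, 30, 35, 40, 45, 50, 55],
    "work_ex": [1, 3, 8, 12, 18, 22, 27, 33],
    "education": [12, 16, 10, 14, 18, 11, 15, 13],
}

flagged = flag_collinear(data)
print(flagged)
```

A flagged pair is only a warning sign; as the text notes, some collinearity is always present, and the question is whether it is strong enough to bias the estimates.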

Mis-specification
Mis-specification of the regression model is the most severe problem that can befall an econometric analysis. Unfortunately, it is also the most difficult to detect and correct.

Note: Mis-specification covers a list of problems. These problems can cause moderate or severe damage to the regression analysis. Of graver importance is the fact that most of these problems are caused not by the nature of the data/issue, but by the modeling work done by the researcher. It is of the utmost importance that every researcher realise that the responsibility of correctly specifying an econometric model lies solely on them. A proper specification includes determining curvature (linear or not), functional form (whether to use logs, exponentials, or squared variables), and the accuracy of measurement of each variable, etc.

Mis-specification can be of several types: incorrect functional form, omission of a relevant independent variable, and/or measurement error in the variables. Sections 7.4.c to 7.4.f list a few summary methods for detecting mis-specification. Refer to your textbook for a comprehensive listing of methods and their detailed descriptions.

Simultaneity bias
Simultaneity bias may be seen as a type of mis-specification. This bias occurs if one or more of the independent variables is actually dependent on other variables in the equation. For example, we are using a model that claims that income can be explained by investment and education. However, we might believe that investment, in turn, is explained by income. If we were to use a simple model in which income (the dependent variable) is regressed on investment and education (the independent variables), then the specification would be incorrect because investment would not really be "independent" to the model - it is affected by income. Intuitively, this is a problem because the simultaneity implies that the residual will have some relation with the variable that has been incorrectly specified as "independent" - the residual is capturing (more in a metaphysical than formal mathematical sense) some of the unmodeled reverse relation between the "dependent" and "independent" variables.

[14] Some books advise using 0.8.

Incorrect functional form

If the correct relation between the variables is non-linear but you use a linear model and do not transform the variables, then the results will be biased.

Why should an incorrect functional form lead to severe problems? Regression is based on finding coefficients that minimize the "sum of squared residuals." Each residual is the difference between the predicted value (the regression line) of the dependent variable and the realized value in the data. If the functional form is incorrect, then each point on the regression "line" is incorrect because the line is based on an incorrect functional form.

A simple example: assume Y has a log relation with X (a log curve represents their scatter plot) but a linear relation with "Log X." If we regress Y on X (and not on "Log X"), then the estimated regression line will have a systematic tendency for a bias because we are fitting a straight line on what should be a curve. The residuals will be calculated from the incorrect "straight" line and will be wrong. If they are wrong, then the entire analysis will be biased because everything hinges on the use of the residuals.

Listed below are methods of detecting incorrect functional forms:
- Perform a preliminary visual test. Any pattern in a plot of the predicted variable and the residuals implies mis-specification (and/or heteroskedasticity) due to the use of an incorrect functional form or due to omission of a relevant variable.
- If the visual test indicates a problem, perform a formal diagnostic test like the RESET test or the DW test.
- Check the mathematical derivation (if any) of the model.
- Determine whether any of the scatter plots have a non-linear pattern. If so, is the pattern log, square, etc?
- The nature of the distribution of a variable may provide some indication of the transformation that should be applied to it. For example, section 2.2 showed that wage is non-normal but that its log is normal. This suggests re-specifying the model by using the log of wage instead of wage.
- Check your textbook for more methods.
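The "Y on X" versus "Y on Log X" example can be made concrete with a small sketch. The data below are hypothetical and constructed so that Y is exactly linear in Log X; the helper `ols_line` is our own one-variable least-squares fit, not an SPSS procedure.

```python
from math import log


def ols_line(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


# Hypothetical data: Y = 2 + 3*Log X, with no noise at all.
x = [1, 2, 4, 8, 16, 32, 64]
y = [2 + 3 * log(xi) for xi in x]

# Incorrect functional form: regress Y on X.
_, _, resid_linear = ols_line(x, y)

# Correct functional form: regress Y on Log X.
log_x = [log(xi) for xi in x]
a_log, b_log, resid_log = ols_line(log_x, y)

rss_linear = sum(e ** 2 for e in resid_linear)
rss_log = sum(e ** 2 for e in resid_log)

print("Residual signs, Y on X:", "".join("+" if e > 0 else "-" for e in resid_linear))
print(f"RSS, Y on X:     {rss_linear:.4f}")
print(f"RSS, Y on Log X: {rss_log:.4f}")
```

With the straight line, the residuals are negative at both ends of the X range and positive in the middle. That run of same-signed residuals is the kind of pattern the visual test looks for; regressing on Log X removes it.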

Omitted variable
Not including a variable that actually plays a role in explaining the dependent variable can bias the regression results. Methods of detection[15] include:
- Perform a preliminary visual test. Any pattern in a plot of the predicted variable and the residuals implies mis-specification (and/or heteroskedasticity) due to the use of an incorrect functional form or due to the omission of a relevant variable.
- If the visual test indicates a problem, perform a formal diagnostic test such as the RESET test.
- Apply your intuition, previous research, hints from preliminary bivariate analysis, etc. For example, in the model we ran, we believe that there may be an omitted variable bias because of the absence of two crucial variables for wage determination - whether the labor is unionized and the professional sector of work (medicine, finance, retail, etc.).
- Check your textbook for more methods.

[15] The first three tests are similar to those for Incorrect Functional form.

Inclusion of an irrelevant variable

This mis-specification occurs when a variable that is not actually relevant to the model is included[16]. To detect the presence of irrelevant variables:
- Examine the significance of the T-statistics. If the T-statistic is not significant at the 10% level (usually if T < 1.64 in absolute terms), then the variable may be irrelevant to the model.
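As a quick illustration of this rule of thumb, the sketch below applies the 1.64 cut-off to the T values reported in the "Coefficients" table of the wage regression. The variable names are ours; the cut-off is the approximate 10% critical value quoted above.

```python
# T values copied from the "Coefficients" table of the wage regression.
t_values = {
    "AGE": 8.635,
    "EDUCATION": 31.622,
    "GENDER": -7.023,
    "PUB_SEC": 5.957,
    "WORK_EX": 5.854,
}

T_THRESHOLD = 1.64  # approximate 10% critical value quoted in the text

# A variable is only a *candidate* for irrelevance if |T| is below the cut-off.
possibly_irrelevant = [
    name for name, t in t_values.items() if abs(t) < T_THRESHOLD
]

print(possibly_irrelevant)
```

In this example no variable falls below the cut-off. Even when one does, treat the result as a hint rather than a verdict: dropping variables only to obtain significant T-statistics risks creating an omitted variable problem.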

Measurement error
This is not a very severe problem if it only afflicts the dependent variable, but it may bias the T-statistics. Methods of detecting this problem include:
- Knowledge about problems/mistakes in data collection.
- There may be a measurement error if the variable you are using is a proxy for the actual variable you intended to use. In our example, the wage variable includes the monetized values of the benefits received by the respondent. But this is a subjective monetization of respondents and is probably undervalued. As such, we can guess that there is probably some measurement error.
- Check your textbook for more methods.

Measurement errors causing problems can be easily understood. Omitted variable bias is a bit more complex. Think of it this way - the deviations in the dependent variable are in reality explained by the variable that has been omitted. Because the variable has been omitted, the algorithm will, mistakenly, apportion what should have been explained by that variable to the other variables, thus creating the error(s).

Remember: our explanations are too informal and probably incorrect by strict mathematical proof for use in an exam. We include them here to help you understand the problems a bit better.

Ceteros!edasticity
9eteroskedasti$ity i#plies that the varian$es =i.e. 8 the dispersion around the e!pe$ted #ean of 7ero? of the residuals are not $onstant% but that they are different for different observations. 3his $auses a proble#/ if the varian$es are une&ual% then the relative reliability of ea$h observation =used in the regression analysis? is une&ual. 3he larger the varian$e% the lower should be the i#portan$e =or weight? atta$hed to that observation. As you will see in se$tion C.2% the $orre$tion for this proble# involves the downgrading in relative i#portan$e of those observations with higher varian$e. 3he proble# is #ore apparent when the value of the varian$e has so#e relation to one or #ore of the independent variables. 8ntuitively, this is a problem because the distribution of the residuals should have no relation with any of the variables (a basic assumption of the classical model). 1ete$tion involves two steps/ -ooking for patterns in the plot of the predi$ted dependent variable and the residual

[16] By dropping it, we improve the reliability of the T-statistics of the other variables (which are relevant to the model). But, we may be causing a far more serious problem - an omitted variable! An insignificant T is not necessarily a bad thing - it is the result of a "true" model. Trying to remove variables to obtain only significant T-statistics is bad practice.


- If the graphical inspection hints at heteroskedasticity, you must conduct a formal test like the White's test. Section 7.5 teaches you how to conduct a White's test [17].
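The idea of attaching a lower weight to observations with a higher variance can be sketched as a weighted least squares fit. This is a minimal illustration, not the procedure of section 8.2: the function name weighted_ls, the data, and the assumption that the residual variance grows with the square of x are all made up for the example.

```python
def weighted_ls(x, y, w):
    """Intercept a and slope b minimising sum(w_i * (y_i - a - b * x_i) ** 2)."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up data in which the scatter around the line widens as x grows
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.9, 7.2, 8.7, 11.9, 12.4]

# Assumption for this sketch: the residual variance is proportional to x squared
variances = [xi ** 2 for xi in x]
weights = [1.0 / v for v in variances]  # higher variance -> lower weight

a_ols, b_ols = weighted_ls(x, y, [1.0] * len(x))  # equal weights = ordinary LS
a_wls, b_wls = weighted_ls(x, y, weights)
print(f"equal weights:    intercept={a_ols:.3f}, slope={b_ols:.3f}")
print(f"variance weights: intercept={a_wls:.3f}, slope={b_wls:.3f}")
```

With equal weights the function reduces to ordinary least squares, so the two printed lines show how much the estimates move once the noisier observations count for less.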

Checking formally for heteroskedasticity: White's test

The White's test is usually used as a test for heteroskedasticity. In this test, a regression of the squares of the residuals is run on the variables suspected of causing the heteroskedasticity, their squares, and cross products.

(residuals)² = b0 + b1 educ + b2 work_ex + b3 (educ)² + b4 (work_ex)² + b5 (educ*work_ex)
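The auxiliary regression above can be sketched in plain Python. Everything here is an assumption made for illustration: the data are simulated (not the book's data set), the helper names solve_linear, ols_r_squared and white_regressors are hypothetical, and the regression is solved through the normal equations only to keep the example self-contained.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_r_squared(rows, y):
    """R-square from regressing y on the columns in rows (constant added)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def white_regressors(educ, work_ex):
    """Levels, squares and cross product, as in the equation above."""
    return [[e, w, e * e, w * w, e * w] for e, w in zip(educ, work_ex)]


random.seed(1)
n = 500
educ = [random.uniform(0, 20) for _ in range(n)]
work_ex = [random.uniform(0, 40) for _ in range(n)]
# Made-up residuals whose spread grows with educ (heteroskedastic by design)
residuals = [random.gauss(0, 0.1 + 0.05 * e) for e in educ]
sq_res = [r * r for r in residuals]

r2_aux = ols_r_squared(white_regressors(educ, work_ex), sq_res)
print(f"R-square of the auxiliary regression: {r2_aux:.3f}")
```

The R-square printed at the end is the quantity that is multiplied by the sample size in the test.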

Model Summary (a)

Variables Entered: SQ_WORK, SQ_EDUC, EDU_WORK, Work Experience, EDUCATION
R Square: .037
Adjusted R Square: .035
Std. Error of the Estimate: .2102

a. Dependent Variable: SQ_RES

Calculate n*R².

White's Test: R² = 0.037, n = 2016

Thus, n*R² = .037*2016 = 74.6.

Compare this value with χ²(n), i.e. with χ²(2016) (χ² is the symbol for the Chi-Square distribution).

χ²(2016) = 124 obtained from χ² table. (For 95% confidence)

As n*R² < χ², heteroskedasticity can not be confirmed.
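The arithmetic of this comparison can be written out as a short sketch. The function names are hypothetical. The critical value of 124 is the one quoted in the text for degrees of freedom equal to n. Many textbooks instead take the degrees of freedom to be the number of regressors in the auxiliary regression (5 here), for which the 95% Chi-Square critical value is about 11.07, so it is worth checking which convention your textbook follows.

```python
def white_statistic(n, r_squared):
    """Sample size multiplied by the R-square of the auxiliary regression."""
    return n * r_squared


def exceeds_critical(statistic, critical_value):
    """True when the statistic is higher than the table value."""
    return statistic > critical_value


n = 2016
r_squared = 0.037
stat = white_statistic(n, r_squared)

SOURCE_CRITICAL = 124.0        # value quoted in the text (df taken as n)
TEXTBOOK_CRITICAL_DF5 = 11.07  # Chi-Square, 5 df, 95%: common textbook convention

print(f"n*R-square = {stat:.1f}")
print("exceeds value quoted in the text:",
      exceeds_critical(stat, SOURCE_CRITICAL))
print("exceeds 5-df critical value:",
      exceeds_critical(stat, TEXTBOOK_CRITICAL_DF5))
```

The two comparisons lead to different conclusions, which is why the choice of degrees of freedom matters when you look up the table value.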

"ote/ Please refer to your te!tbook for further infor#ation regarding the interpretation of the 6hite4s test. 0f you have not en$ountered the .hi8S&uare distribution>test before% there is no need to pani$V 3he sa#e rules apply for testing using any distribution 8 the 3% )% U% or .hi8S&uare. )irst% $al$ulate the re&uired value fro# your results. 9ere the re&uired value is the sa#ple si7e =:n:? #ultiplied by the 8s&uare. ,ou #ust deter#ine whether this value is higher than that in the standard table for the relevant distribution =here the .hi8S&uare? at the re$o##ended level of $onfiden$e =usually NGO? for the appropriate degrees of freedo# =for the 6hite4s test% this e&uals the sa#ple si7e :n:? in the table for the distribution =whi$h you will find in the ba$k of #ost e$ono#etri$s>statisti$s te!tbooks?. 0f the for#er is higher% then the hypothesis is reje$ted. (sually the reje$tion i#plies that the test $ould not find a proble#*C.

[17] Other tests: Park, Glejser, Goldfeld-Quandt. Refer to your textbook for a comprehensive listing of methods and their detailed descriptions.

[18] We use the phraseology "Confidence Level of 95%." Many professors may frown upon this, instead preferring to use "Significance Level of 5%." Also, our explanation is simplistic. Do not use it in an exam! Instead, refer to the chapter on "Hypothesis Testing" or "Confidence Intervals" in your textbook. A clear understanding of these concepts is essential.


ANSWERS TO CONCEPTUAL QUESTIONS ON REGRESSION ANALYSIS

1. Why is the regression method you use called 'Least Squares'? Can you justify the use of such a method?
Ans: The method minimises the squares of the residuals. The formulas for obtaining the estimates of the beta coefficients, std errors, etc. are all based on this principle. Yes, we can justify the use of such a method: the aim is to minimise the error in our prediction of the dependent variable, and by minimising the residuals we are doing just that. By using the "squares" we are precluding the problem of signs, thereby giving positive and negative prediction errors the same importance.

2. The classical assumptions mostly hinge on the properties of the residuals. Why should it be so?
Ans: this is linked to question 1. The estimation method is based on minimising the sum of the squared residuals. As such, all the powerful inferences we draw from the results [like R², betas, T, F, etc.] are based on assumed properties of the residuals. Any deviations from these assumptions can cause major problems.

3. Prior to running a regression, you have to create a model. What are the important steps and considerations in creating such a model?
Ans: the most important consideration is the theory you want to test/support/refute using the estimation. The theory may be based on theoretical/analytical research and derivations, previous work by others, intuition, etc. The limitations of data may constrain the model. Scatter plots may provide an indication of the transformations needed. Correlations may tell you about the possibility of collinearity and the modeling ramifications thereof.

4. What role may correlations, scatter plots and other bivariate and multivariate analysis play in the specification of a regression model?
Ans: prior to any regression analysis, it is essential to run some descriptives and basic bivariate and multivariate analysis. Based on the inferences from these, you may want to construct a model which can answer the questions raised by the initial analysis and/or can incorporate the insights from the initial analysis.

5. After running a regression, in what order should you interpret the results and why?
Ans: first, check for the breakdown of classical assumptions [collinearity, heteroskedasticity, etc.]. Then, once you are sure that no major problem is present, interpret the results in roughly the following order: Sig F, Adj R², Std error of estimate, Sig-T, beta, Confidence interval of beta

6. In the regression results, are we confident about the coefficient estimates? If not, what additional information [statistic] are we using to capture our degree of un-confidence about the estimate we obtain?
Ans: no, we are not confident about the estimated coefficient. The std error is being used to

FLOW DIAGRAM FOR REGRESSION ANALYSIS

Problem/issue: Clearly define the issue and the results being sought

Method: OLS, MLE, Logit, Time Series, etc.

Model: [Y = a + b X] or other

Final Dataset: Prepare data for analysis

Read data into software program

Descriptives, correlations, scatter charts: Histogram to see distribution, normality tests, correlations

Interpretation: Obtain some intuitive understanding of the data series, identify outliers, etc.

Run Regression

Test for Breakdown of Classical Assumptions (for Linear Regression)
- No Breakdown
- Breakdown:
  Measurement Error
  Heteroskedasticity, Autocorrelation, Misspecification, Collinearity, Simultaneity Bias
  Omitted Variable, Irrelevant Variable, Incorrect Functional Form

Create a new model (see list below)
- Heteroskedasticity: WLS
- Autocorrelation: GLS, ARIMA
- Simultaneity Bias: IV, 2SLS, 3SLS
- Collinearity: IV, dropping a variable
- May need to create new variables

Running Correct Model and Method

Diagnostics

Hypothesis Tests

Regression schematic flow chart, 1999 Vijay Gupta
