
Professor Robert Y. Shapiro

Analysis of Political Data

Sample Assignment # 1

Prepared by TA: Narayani Lasala, January 29, 2009

Assignment 1. Using any data you wish, examine and write up a four or more variable causal model by estimating multiple regression equations. Present the structural equations and the path diagram. Interpret the regression coefficients (focus on the usual unstandardized coefficients; it is not required to interpret the standardized coefficients nor to decompose any zero-order relationships into direct, indirect, and noncausal [spurious] effects). Test for first-order interactions (and, if you think necessary, any theoretically compelling higher order interactions). Examine possible multicollinearity (in the correlations matrix) and provide some analysis of residuals (i.e., heteroskedasticity, as shown below), and, as needed, outliers if you have a small data set.

Step 1: Theory, Path Diagram, and Recoding

1) Develop a simple theoretical relationship between four (or more) variables and present it in a flow chart.

For this assignment, we examine the relationship between four (or more) variables, i.e.

three (or more) independent variables and one dependent variable. In this example, we

estimate the following relationships.

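[Path diagram: Race (X1, White), Education (X2), Income (X3), Party ID (X4, Republican), and Y, the attitude regarding the responsibility of government for poverty alleviation (high value = not its responsibility). Each variable has an arrow to every variable that follows it in this order. The arrows are labeled 1a to 1d (from X1), 2a to 2c (from X2), 3a and 3b (from X3), and 4a (from X4 to Y).]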

To do this we selected the following variables from GSS 2006: “race” (white, black, other), “educ” (highest year of school completed), “income06” (total family income), “partyid” (political party affiliation) and “helppoor” (self placement on a five point scale that goes from “I strongly agree the government should improve living standards” to “I strongly agree that people should take care of themselves”).1

2) Obtain the frequency distribution for the original five variables (check the missing values)

. tab race, miss

race of |

respondent | Freq. Percent Cum.

------------+-----------------------------------

white | 3,284 72.82 72.82

black | 634 14.06 86.87

other | 592 13.13 100.00

------------+-----------------------------------

Total | 4,510 100.00

. tab educ, miss

highest |

year of |

school |

completed | Freq. Percent Cum.

------------+-----------------------------------

0 | 22 0.49 0.49

1 | 4 0.09 0.58

2 | 28 0.62 1.20

3 | 13 0.29 1.49

4 | 11 0.24 1.73

5 | 23 0.51 2.24

6 | 69 1.53 3.77

7 | 32 0.71 4.48

8 | 85 1.88 6.36

9 | 127 2.82 9.18

10 | 152 3.37 12.55

11 | 215 4.77 17.32

12 | 1,204 26.70 44.01

13 | 422 9.36 53.37

14 | 628 13.92 67.29

15 | 212 4.70 72.00

16 | 687 15.23 87.23

17 | 167 3.70 90.93

18 | 208 4.61 95.54

19 | 78 1.73 97.27

20 | 112 2.48 99.76

dk | 2 0.04 99.80

. | 9 0.20 100.00

------------+-----------------------------------

Total | 4,510 100.00

1. Question wording for “helppoor”: I'd like to talk with you about issues some people tell us are important. Please look at CARD BC. Some people think that the government in Washington should do everything possible to improve the standard of living of all poor Americans; they are at Point 1 on this card. Other people think it is not the government's responsibility, and that each person should take care of himself; they are at Point 5.

. tab income06, miss

total family |

income | Freq. Percent Cum.

-------------------+-----------------------------------

under $1 000 | 43 0.95 0.95

$1 000 to 2 999 | 38 0.84 1.80

$3 000 to 3 999 | 29 0.64 2.44

$4 000 to 4 999 | 27 0.60 3.04

$5 000 to 5 999 | 40 0.89 3.92

$6 000 to 6 999 | 45 1.00 4.92

$7 000 to 7 999 | 48 1.06 5.99

$8 000 to 9 999 | 83 1.84 7.83

$10000 to 12499 | 142 3.15 10.98

$12500 to 14999 | 145 3.22 14.19

$15000 to 17499 | 126 2.79 16.98

$17500 to 19999 | 102 2.26 19.25

$20000 to 22499 | 157 3.48 22.73

$22500 to 24999 | 125 2.77 25.50

$25000 to 29999 | 212 4.70 30.20

$30000 to 34999 | 231 5.12 35.32

$35000 to 39999 | 217 4.81 40.13

$40000 to 49999 | 394 8.74 48.87

$50000 to 59999 | 332 7.36 56.23

$60000 to 74999 | 360 7.98 64.21

$75000 to $89999 | 284 6.30 70.51

$90000 to $109999 | 229 5.08 75.59

$110000 to $129999 | 162 3.59 79.18

$130000 to $149999 | 89 1.97 81.15

$150000 or over | 213 4.72 85.88

refused | 442 9.80 95.68

dk | 195 4.32 100.00

-------------------+-----------------------------------

Total | 4,510 100.00

. tab partyid, miss

political party |

affiliation | Freq. Percent Cum.

-------------------+-----------------------------------

strong democrat | 700 15.52 15.52

not str democrat | 736 16.32 31.84

ind,near dem | 527 11.69 43.53

independent | 997 22.11 65.63

ind,near rep | 327 7.25 72.88

not str republican | 637 14.12 87.01

strong republican | 495 10.98 97.98

other party | 65 1.44 99.42

. | 26 0.58 100.00

-------------------+-----------------------------------

Total | 4,510 100.00

. tab helppoor, miss

should govt |

improve standard |

of living? | Freq. Percent Cum.

-------------------+-----------------------------------

govt action | 369 8.18 8.18

2 | 204 4.52 12.71

agree with both | 915 20.29 32.99

4 | 261 5.79 38.78

people help selves | 209 4.63 43.41

dk | 30 0.67 44.08

. | 2,522 55.92 100.00

-------------------+-----------------------------------

Total | 4,510 100.00

3) Recode the variables if necessary and obtain the frequency distribution of the

recoded variables.

You are advised not to collapse categories of any variable unless you have a compelling reason to do so. Recode so that the values start from “0” while retaining the original number of categories. This makes the regression results easier to interpret: the constant is then the predicted value of the dependent variable when all independent variables take their lowest value, 0.

Race

We recode this variable so that the larger value (1) is assigned to whites, because we expect being white to have a positive effect on the dependent variable. We also combine “other” and “black” into a non-white category, which is coded “0”.

. recode race (1=1) (2/3=0), gen (RACE)

(1226 differences between race and RACE)

. tab RACE

RECODE of |

race (race |

of |

respondent) | Freq. Percent Cum.

------------+-----------------------------------

0 | 1,226 27.18 27.18

1 | 3,284 72.82 100.00

------------+-----------------------------------

Total | 4,510 100.00

Education

For this variable, we retain the original values, because they already begin with “0”, and treat the values 22 through 98 as missing.

. recode educ (22/98=.), gen(EDUC)

(2 differences between educ and EDUC)

. tab EDUC

RECODE of |

educ |

(highest |

year of |

school |

completed) | Freq. Percent Cum.

------------+-----------------------------------

0 | 22 0.49 0.49

1 | 4 0.09 0.58

2 | 28 0.62 1.20

3 | 13 0.29 1.49

4 | 11 0.24 1.73

5 | 23 0.51 2.24

6 | 69 1.53 3.78

7 | 32 0.71 4.49

8 | 85 1.89 6.38

9 | 127 2.82 9.20

10 | 152 3.38 12.58

11 | 215 4.78 17.36

12 | 1,204 26.76 44.12

13 | 422 9.38 53.50

14 | 628 13.96 67.46

15 | 212 4.71 72.17

16 | 687 15.27 87.44

17 | 167 3.71 91.15

18 | 208 4.62 95.78

19 | 78 1.73 97.51

20 | 112 2.49 100.00

------------+-----------------------------------

Total | 4,499 100.00

Income

We recode this variable so that the values start from “0” while retaining the original number of categories. Values 26 through 98 (refused, dk) are treated as missing.

. recode income06 (1=0)(2=1)(3=2)(4=3)(5=4)(6=5)(7=6)(8=7)(9=8)(10=9)
(11=10)(12=11)(13=12)(14=13)(15=14)(16=15)(17=16)(18=17)(19=18)(20=19)
(21=20)(22=21)(23=22)(24=23)(25=24)(26/98=.), gen (INCOM)

(4510 differences between income06 and INCOM)

. tab (INCOM)

RECODE of |

income06 |

(total |

family |

income) | Freq. Percent Cum.

------------+-----------------------------------

0 | 43 1.11 1.11

1 | 38 0.98 2.09

2 | 29 0.75 2.84

3 | 27 0.70 3.54

4 | 40 1.03 4.57

5 | 45 1.16 5.73

6 | 48 1.24 6.97

7 | 83 2.14 9.11

8 | 142 3.67 12.78

9 | 145 3.74 16.52

10 | 126 3.25 19.78

11 | 102 2.63 22.41

12 | 157 4.05 26.47

13 | 125 3.23 29.69

14 | 212 5.47 35.17

15 | 231 5.96 41.13

16 | 217 5.60 46.73

17 | 394 10.17 56.91

18 | 332 8.57 65.48

19 | 360 9.30 74.77

20 | 284 7.33 82.11

21 | 229 5.91 88.02

22 | 162 4.18 92.20

23 | 89 2.30 94.50

24 | 213 5.50 100.00

------------+-----------------------------------

Total | 3,873 100.00
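
Because the income recode only shifts codes 1 through 25 down by one, the same variable could probably be built arithmetically. The lines below are a minimal sketch and not part of the original handout: the name INCOM_alt is hypothetical, and the sketch assumes that every income06 value outside 1 to 25 should become missing, as in the recode above.

```stata
* Hypothetical shortcut: shift the valid codes 1-25 down to 0-24.
* Any value outside 1-25 (refused, dk, missing) is left missing.
generate INCOM_alt = income06 - 1 if inrange(income06, 1, 25)

* Stop with an error if the shortcut and the recoded variable disagree
* on any case where at least one of them is nonmissing.
assert INCOM_alt == INCOM if !missing(INCOM) | !missing(INCOM_alt)
```

If the assert line runs silently, the two versions agree; the long recode remains the more explicit way to document each category.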

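Party ID

For this variable, we retain the original values (0 = strong democrat to 6 = strong republican) and treat 7 (other party) and 8 as missing.

. recode partyid (7/8=.), gen(REPUBLICAN)

(65 differences between partyid and REPUBLICAN)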

. tab REPUBLICAN

RECODE of |

partyid |

(political |

party |

affiliation |

) | Freq. Percent Cum.

------------+-----------------------------------

0 | 700 15.84 15.84

1 | 736 16.66 32.50

2 | 527 11.93 44.42

3 | 997 22.56 66.98

4 | 327 7.40 74.38

5 | 637 14.42 88.80

6 | 495 11.20 100.00

------------+-----------------------------------

Total | 4,419 100.00

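Attitude regarding government responsibility (helppoor)

We recode this variable so that the high value (5, recoded into 4) is assigned to those who think people should help themselves, and those who agree that it is the government’s responsibility are coded “0”. Also, “dk” is recoded as missing (“.”).

. recode helppoor (1=0)(2=1)(3=2)(4=3)(5=4)(8=.), gen(GOVRES)

(1988 differences between helppoor and GOVRES)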

. tab GOVRES

RECODE of |

helppoor |

(should |

govt |

improve |

standard of |

living?) | Freq. Percent Cum.

------------+-----------------------------------

0 | 369 18.85 18.85

1 | 204 10.42 29.26

2 | 915 46.73 76.00

3 | 261 13.33 89.33

4 | 209 10.67 100.00

------------+-----------------------------------

Total | 1,958 100.00

4) Filter observations with missing values on any variable in the model, if you are estimating a set of equations and want all equations based on the same cases.

When you estimate a regression, Stata automatically drops observations with missing values in any of the variables included in the model. But when you estimate more than one regression, different observations may be dropped because different variables are included in the different models. To make sure that exactly the same sample is used in all regressions, follow either of the following two methods.

Method #1 Drop the cases with missing values in any of the five variables

Normally, dropping observations with missing values in any of the newly recoded variables is not recommended, because once you drop them you cannot recover them. But, for the sake of simplicity in this exercise, you can choose this method.

. drop if RACE==.| EDUC==.| INCOM==.| REPUBLICAN==.|GOVRES==.

(2849 observations deleted)

Method #2 Mark the cases with no missing values

This second method is highly recommended for real data analysis. The first command, “mark newvariable”, creates a new variable named newvariable that equals 1 for all cases. The second command, “markout newvariable variablelist”, changes the value of newvariable from 1 to 0 for the cases in which the value of any of the variables in variablelist (in this case RACE, EDUC, INCOM, REPUBLICAN and GOVRES) is missing. Here we name the new variable “nomiss”.

. mark nomiss

. markout nomiss RACE EDUC INCOM REPUBLICAN GOVRES

Then include “ if nomiss ==1” at the end of the regression models you estimate.

Observations will be used in the estimate only if they have no missing values in any of

the variables that are used in this analysis. We will use method #2 in this handout.

Examples are shown below.
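
As a quick check on the marker variable, one can count the flagged cases and compare them with the estimation sample of the full model. This is only a sketch: the variable name insample is hypothetical, and it assumes that nomiss has been created with the two commands above.

```stata
* Count the complete cases flagged by mark/markout.
count if nomiss == 1

* Cross-check against the cases Stata actually uses in the full model.
* e(sample) equals 1 for observations used in the last estimation.
quietly regress GOVRES RACE EDUC INCOM REPUBLICAN
generate byte insample = e(sample)
tabulate nomiss insample
```

If the marker is set up correctly, every case should fall on the diagonal of the two-way table (0/0 or 1/1), because the full model uses all five variables.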

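First, obtain the correlation matrix of the five variables (to examine possible multicollinearity).

. corr RACE EDUC INCOM REPUBLICAN GOVRES if nomiss==1

(obs=1661)

| RACE EDUC INCOM REPUBL~N GOVRES

-------------+---------------------------------------------

RACE | 1.0000

EDUC | 0.1997 1.0000

INCOM | 0.2259 0.3931 1.0000

REPUBLICAN | 0.2566 -0.0004 0.1191 1.0000

GOVRES | 0.2043 0.1178 0.1889 0.2716 1.0000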
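
The correlation matrix only shows pairwise relationships. As a supplementary check that is not part of the original handout, variance inflation factors can be requested after the final regression. The sketch below assumes the recoded variables and the nomiss marker exist as defined above.

```stata
* Variance inflation factors for the final equation, on the same cases.
quietly regress GOVRES RACE EDUC INCOM REPUBLICAN if nomiss == 1
estat vif
```

A common rule of thumb treats VIF values above 10 as a warning sign; with pairwise correlations no larger than 0.39, values well below that would be expected here.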

For this model we run four regressions. Write up the regression equation for each of them

using the estimated coefficients and t-values. Use the “beta” option to obtain the

standardized coefficients which you will use to write up a path diagram. (Note that we

include “if nomiss==1” at the end of each command.)

1) Regress X2 on X1

. reg EDUC RACE if nomiss==1, beta

Source | SS df MS Number of obs = 1661

-------------+------------------------------ F( 1, 1659) = 68.90

Model | 674.593006 1 674.593006 Prob > F = 0.0000

Residual | 16243.1216 1659 9.79091117 R-squared = 0.0399

-------------+------------------------------ Adj R-squared = 0.0393

Total | 16917.7146 1660 10.1913944 Root MSE = 3.129

------------------------------------------------------------------------------

EDUC | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

RACE | 1.44842 .1744958 8.30 0.000 .1996871

_cons | 12.40138 .149854 82.76 0.000 .

------------------------------------------------------------------------------

2) Regress X3 on X1 and X2

. reg INCOM RACE EDUC if nomiss==1, beta

Source | SS df MS Number of obs = 1661

-------------+------------------------------ F( 2, 1658) = 178.46

Model | 9339.41411 2 4669.70705 Prob > F = 0.0000

Residual | 43385.5371 1658 26.1673927 R-squared = 0.1771

-------------+------------------------------ Adj R-squared = 0.1761

Total | 52724.9512 1660 31.7620188 Root MSE = 5.1154

------------------------------------------------------------------------------

INCOM | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

RACE | 1.966103 .2911319 6.75 0.000 .1535411

EDUC | .6397802 .0401371 15.94 0.000 .3624045

_cons | 5.483277 .5547762 9.88 0.000 .

------------------------------------------------------------------------------

3) Regress X4 on X1, X2 and X3

. reg REPUBLICAN RACE EDUC INCOM if nomiss==1, beta

Source | SS df MS Number of obs = 1661

-------------+------------------------------ F( 3, 1657) = 45.66

Model | 492.36588 3 164.12196 Prob > F = 0.0000

Residual | 5956.35537 1657 3.59466226 R-squared = 0.0764

-------------+------------------------------ Adj R-squared = 0.0747

Total | 6448.72125 1660 3.88477184 Root MSE = 1.896

------------------------------------------------------------------------------

REPUBLICAN | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

RACE | 1.130313 .1093783 10.33 0.000 .2523994

EDUC | -.0549386 .0159755 -3.44 0.001 -.0889839

INCOM | .0339408 .0091024 3.73 0.000 .0970496

_cons | 2.219035 .2115915 10.49 0.000 .

------------------------------------------------------------------------------

4) Regress Y on X1, X2, X3 and X4

. reg GOVRES RACE EDUC INCOM REPUBLICAN if nomiss==1, beta

Source | SS df MS Number of obs = 1661

-------------+------------------------------ F( 4, 1656) = 52.59

Model | 257.513083 4 64.3782709 Prob > F = 0.0000

Residual | 2027.19011 1656 1.22414862 R-squared = 0.1127

-------------+------------------------------ Adj R-squared = 0.1106

Total | 2284.70319 1660 1.37632722 Root MSE = 1.1064

------------------------------------------------------------------------------

GOVRES | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

RACE | .2903471 .0658539 4.41 0.000 .1089254

EDUC | .0183701 .0093559 1.96 0.050 .0499881

INCOM | .0244208 .0053341 4.58 0.000 .1173151

REPUBLICAN | .1367426 .014336 9.54 0.000 .2297342

_cons | .6329858 .1275091 4.96 0.000 .

------------------------------------------------------------------------------

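Write up the regression equations using the estimated unstandardized coefficients, and interpret those coefficients. (It is not required to interpret the standardized coefficients nor to decompose any zero-order relationships into direct, indirect, and noncausal [spurious] effects; the standardized coefficients are only included for reference.)

EDUC = 12.40138 + 1.44842*RACE

INCOM = 5.483277 + 1.966103*RACE + .6397802*EDUC

REPUBLICAN = 2.219035 + 1.130313*RACE - .0549386*EDUC + .0339408*INCOM

GOVRES = .6329858 + .2903471*RACE + .0183701*EDUC + .0244208*INCOM + .1367426*REPUBLICAN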

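[Path diagram with the standardized (beta) coefficients on the arrows:]

Race (X1) to Education (X2): .1997

Race (X1) to Income (X3): .1535

Race (X1) to Party ID (X4): .2524

Race (X1) to Government’s Responsibility (Y): .1089

Education (X2) to Income (X3): .3624

Education (X2) to Party ID (X4): -.0890

Education (X2) to Government’s Responsibility (Y): .0500

Income (X3) to Party ID (X4): .0970

Income (X3) to Government’s Responsibility (Y): .1173

Party ID (X4) to Government’s Responsibility (Y): .2297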

We can decompose the total effect of each of the independent variables on the dependent variable. Calculate the decomposition tables for each of X2, X3, X4 and Y, according to the following rules. Do not round when doing the calculations. (You may round when presenting the final result.)

Total effect = Zero-order correlation between the two variables (from the correlation matrix)

Direct effect = Standardized beta coefficient from the relevant regression equation

Indirect effect = Sum, over all possible indirect paths to the dependent variable, of the product of the beta coefficients of all arrows along that path

Spurious effect = Total effect - Direct effect - Sum of all indirect effects

Decomposition of Effects for x2 (EDUC)

Variables | Total Effects | Direct Effects | Indirect Effects | Calculation of Indirect Effects | Spurious Effects

Decomposition of Effects for x3 (INCOM)

Variables | Total Effects | Direct Effects | Indirect Effects | Calculation of Indirect Effects | Spurious Effects

Educ | 0.3931 | .3624045 | 0 | | 0.030696

Decomposition of Effects for x4 (REPUBLICAN)

Variables | Total Effects | Direct Effects | Indirect Effects | Calculation of Indirect Effects | Spurious Effects

Race | | | | (0.1997*.3624*.09704) + (.15354*.09704) |

Educ | -0.0004 | -.08899 | 0.035167 | .3624*.09704 | 0.053423

Decomposition of Effects for Y (GOVRES)

Variables | Total Effects | Direct Effects | Indirect Effects | Calculation of Indirect Effects | Spurious Effects

Race | 0.2043 | .1089254 | 0.09521 | 0.1997*0.04998 + 0.1997*(-0.0889)*0.2297 + 0.1997*.3624*0.09704*0.2297 + 0.1997*.3624*(0.1173) + 0.15354*0.09704*0.2297 + (0.15354)*(0.1173) |

Educ | 0.1178 | .04999 | 0.0301595 | (-.0889)*(0.2297) + 0.3624*0.09704*(0.2297) + 0.3624*(0.1173) | 0.037652

Income | 0.1889 | 0.117315 | 0.02229 | 0.09704*0.2297 | 0.04929

Party ID | 0.2716 | 0.2297 | 0 | | 0.0419
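
The decomposition arithmetic can be checked with Stata's display command. The lines below are a sketch, not part of the original handout, for the effect of EDUC on GOVRES only; the scalar name ind_educ is hypothetical, and the betas are copied unrounded from the regression output above.

```stata
* Indirect effect of EDUC on GOVRES: sum over the three indirect paths
* (EDUC-PARTY-Y, EDUC-INCOM-PARTY-Y, EDUC-INCOM-Y).
scalar ind_educ = (-.0889839)*(.2297342) ///
    + (.3624045)*(.0970496)*(.2297342)   ///
    + (.3624045)*(.1173151)
display "Indirect effect: " ind_educ

* Spurious part = zero-order correlation - direct effect - indirect effect.
display "Spurious effect: " .1178 - .0499881 - ind_educ
```

The two results should land close to the 0.0302 and 0.0377 shown in the Educ row of the table for Y; small differences come from the rounded coefficients used there.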

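NEXT: Test for interactions

A) Generate the interaction terms. With four independent variables there are six possible pairs (all first-order interactions):

1) X1 and X2

2) X1 and X3

3) X1 and X4

4) X2 and X3

5) X2 and X4

6) X3 and X4

1) . gen raceduc = RACE*EDUC

2) . gen racINCOM = RACE*INCOM

3) . gen racrepub = RACE*REPUBLICAN

4) . gen eduINCOM = EDUC*INCOM

5) . gen edurepub = EDUC*REPUBLICAN

6) . gen incomrepub = INCOM*REPUBLICAN

B) Estimate the regression with all the interaction terms.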

. reg GOVRES RACE EDUC INCOM REPUBLICAN raceduc racINCOM racrepub edurepub

eduINCOM incomrepub, beta

Source | SS df MS Number of obs = 1661

-------------+------------------------------ F( 10, 1650) = 24.10

Model | 291.20039 10 29.120039 Prob > F = 0.0000

Residual | 1993.5028 1650 1.20818352 R-squared = 0.1275

-------------+------------------------------ Adj R-squared = 0.1222

Total | 2284.70319 1660 1.37632722 Root MSE = 1.0992

------------------------------------------------------------------------------

GOVRES | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

RACE | .9902724 .2931196 3.38 0.001 .3715064

EDUC | .0239648 .0287538 0.83 0.405 .0652123

INCOM | .0287287 .0229921 1.25 0.212 .1380098

REPUBLICAN | -.1350117 .0698544 -1.93 0.053 -.2268262

raceduc | -.0352751 .0202833 -1.74 0.082 -.1978869

racINCOM | -.0131167 .011881 -1.10 0.270 -.094953

racrepub | -.032165 .0367194 -0.88 0.381 -.0600898

edurepub | .013037 .0051297 2.54 0.011 .3244292

eduINCOM | -.0011666 .0015857 -0.74 0.462 -.1041171

incomrepub | .0075223 .0027743 2.71 0.007 .2464588

_cons | .7505942 .368533 2.04 0.042 .

------------------------------------------------------------------------------

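GOVRES = .7505942 + .9902724*RACE + .0239648*EDUC + .0287287*INCOM - .1350117*REPUBLICAN - .0352751*raceduc - .0131167*racINCOM - .032165*racrepub + .013037*edurepub - .0011666*eduINCOM + .0075223*incomrepub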

C) Estimate the regression without the insignificant interaction terms, i.e., with only the significant interactions.

In this section, run the regression retaining only those interaction terms that turned out to be statistically significant in the previous section. For this example, we will see whether omitting raceduc, racINCOM, racrepub and eduINCOM makes the fit of the regression significantly different.

. reg GOVRES RACE EDUC INCOM REPUBLICAN edurepub incomrepub, beta

Source | SS df MS Number of obs = 1661

-------------+------------------------------ F( 6, 1654) = 38.55

Model | 280.290217 6 46.7150361 Prob > F = 0.0000

Residual | 2004.41297 1654 1.2118579 R-squared = 0.1227

-------------+------------------------------ Adj R-squared = 0.1195

Total | 2284.70319 1660 1.37632722 Root MSE = 1.1008

------------------------------------------------------------------------------

GOVRES | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

RACE | .2845684 .0655477 4.34 0.000 .1067575

EDUC | -.0122706 .0164101 -0.75 0.455 -.0333903

INCOM | .0059682 .0091095 0.66 0.512 .0286707

REPUBLICAN | -.1310843 .0678247 -1.93 0.053 -.220228

edurepub | .0115822 .0050109 2.31 0.021 .2882262

incomrepub | .0067193 .0026649 2.52 0.012 .2201474

_cons | 1.348882 .2181173 6.18 0.000 .

------------------------------------------------------------------------------

GOVRES = 1.348882 + .2845684*RACE - .0122706*EDUC + .0059682*INCOM - .1310843*REPUBLICAN + .0115822*edurepub + .0067193*incomrepub
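
In Stata 11 and later, the product terms do not have to be generated by hand; factor-variable notation can build them inside the regression command. This is a sketch of that alternative, not the method used in this handout, and it assumes the recoded variables and the nomiss marker defined earlier.

```stata
* Reduced model with the two retained interactions built on the fly.
* c. marks a variable as continuous; # requests the product term only.
regress GOVRES RACE EDUC INCOM REPUBLICAN ///
    c.EDUC#c.REPUBLICAN c.INCOM#c.REPUBLICAN if nomiss == 1, beta
```

The coefficients should match those reported for edurepub and incomrepub, since the same products enter the equation; only the row labels in the output differ.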

8) Chow test

The Chow test tests whether a regression equation with interaction terms explains a significantly greater amount of variance than a regression equation without interaction terms. The null hypothesis is that the difference between the explained variance of the two equations is zero in the population.

Rejecting the null hypothesis tells you that the interaction terms bring additional explanatory power in a statistically significant way. The Chow test formula produces an F-statistic to be compared to the critical values in the F-statistic table. If the F-statistic of the Chow test exceeds the critical value found in the table, you can reject the null hypothesis.

We will compare the following regression models using the Chow test.

Model ➀: The Regression Model with no interaction terms.

GOVRES = .6329858 + .2903471*RACE + .0183701*EDUC + .0244208*INCOM + .1367426*REPUBLICAN

Model ➁: The Regression Model with six interaction terms.

GOVRES = .7505942 + .9902724*RACE + .0239648*EDUC + .0287287*INCOM - .1350117*REPUBLICAN - .0352751*raceduc - .0131167*racINCOM - .032165*racrepub + .013037*edurepub - .0011666*eduINCOM + .0075223*incomrepub

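The following is the formula for the Chow Test:

\[ F = \frac{(R^2_{K+M} - R^2_K)/M}{(1 - R^2_{K+M})/(N - K - M)} \]

K = Number of original independent variables +1 (for the constant)

M = Number of interaction terms added

N = Number of observations

R^2_K = R2 for the original regression equation (with no interaction terms)

R^2_{K+M} = R2 for the regression equation with interaction terms

For Model ➀ against Model ➁ (K = 5, M = 6, N = 1661):

\[ F = \frac{(0.1275 - 0.1127)/6}{(1 - 0.1275)/(1661 - 5 - 6)} = 4.67 \]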

The critical value of the F statistic with df1=6 (degrees of freedom of the numerator, M), df2=1650 (degrees of freedom of the denominator, N-K-M), and α = 0.05 is 2.09. Since 4.67>2.09, we reject the null hypothesis that the equation with six interaction terms and the equation with no interaction terms explain the same amount of variance. (Remember: rejecting the null hypothesis tells you that the interaction terms bring additional explanatory power in a statistically significant way.)
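
The hand calculation can also be reproduced from Stata's stored results, which avoids rounding the R-squared values. The lines below are a sketch rather than part of the original handout; the scalar names r2_k, r2_km and Fchow are hypothetical, and the sketch assumes the interaction terms and the nomiss marker have already been created.

```stata
* Model 1: no interaction terms.
quietly regress GOVRES RACE EDUC INCOM REPUBLICAN if nomiss == 1
scalar r2_k = e(r2)

* Model 2: all six interaction terms.
quietly regress GOVRES RACE EDUC INCOM REPUBLICAN raceduc racINCOM ///
    racrepub edurepub eduINCOM incomrepub if nomiss == 1
scalar r2_km = e(r2)

* K = 5 (four variables plus the constant), M = 6 interaction terms.
scalar Fchow = ((r2_km - r2_k)/6) / ((1 - r2_km)/(e(N) - 5 - 6))
display "F = " Fchow
display "5% critical value = " invFtail(6, e(N) - 5 - 6, 0.05)
display "p-value = " Ftail(6, e(N) - 5 - 6, Fchow)
```

Here invFtail() returns the critical value that would otherwise be looked up in the F table, and Ftail() returns the upper-tail probability of the computed statistic.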

The equivalent of the Chow test can be done in Stata by typing the “test” command right after executing the regression command. See the following example. Compare this F-statistic with the hand-calculated one in the previous section. (Small differences may result from rounding.)

To test the larger model (➁ in this case, with six interaction terms) against model ➀ (the model with no interaction terms), run the regression with the larger model and then test the terms that are NOT in the smaller model.

. reg GOVRES RACE EDUC INCOM REPUBLICAN raceduc racINCOM racrepub edurepub

eduINCOM incomrepub, beta

(Output omitted)

. test raceduc racINCOM racrepub edurepub eduINCOM incomrepub

( 1) raceduc = 0

( 2) racINCOM = 0

( 3) racrepub = 0

( 4) edurepub = 0

( 5) eduINCOM = 0

( 6) incomrepub = 0

F( 6, 1650) = 4.65

Prob > F = 0.0001

Estimated F-value (6, 1650) is 4.65. Since 4.65>2.09 critical value, we reject the null

hypothesis that the equation with six interaction terms and the equation with no

interaction term explain just the same amount of variance.

***YOU CAN REPEAT VARIATIONS OF THIS TEST FOR SUBSET OF THE

INTERACTION TERMS TO HELP DETERMINE WHICH ONES TO KEEP***
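
One such variation, sketched here under the assumption that the interaction terms and the nomiss marker exist as defined above, tests only the four interaction terms that were dropped in section C.

```stata
* Re-estimate the full model, then test the four dropped terms jointly.
quietly regress GOVRES RACE EDUC INCOM REPUBLICAN raceduc racINCOM ///
    racrepub edurepub eduINCOM incomrepub if nomiss == 1

* H0: the coefficients on these four interaction terms are all zero.
test raceduc racINCOM racrepub eduINCOM
```

Failing to reject this null hypothesis would support keeping only edurepub and incomrepub, as in the reduced model.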

a) After estimating the regression, obtain the predicted values, residuals, and (less important) standardized residuals.

The “predict” command applies to the regression estimated right before typing it into the

command window. In this section, we will give new names for the predicted values and

estimated residuals after executing the “predict” command. When you type “predict

newvariable” without adding any option, you will obtain the predicted values of your

dependent variable. In this exercise we have named this newvariable, yhat.

So, right after the estimated regression equation that you are focusing on:

. predict yhat

(option xb assumed; fitted values)

This command can also be used to obtain residuals and standardized residuals, as shown below, by adding the options “, resid” and “, rstandard”, respectively, to “predict newvariable”. We have named the variable which contains the residuals “e” and the variable containing the standardized residuals “std_e”.

. predict e, resid

. predict std_e, rstandard

b) Obtain the histogram and the normal probability plot of the standardized residuals.

The command for obtaining the histogram is “hist” followed by the variable name. The

command “qnorm” followed by the variable name will give you a normal probability

plot of this variable. The option “saving (name for graph, replace)” saves the

generated images to the working directory. For example, we named the file containing

the histogram of the standard residuals, “Histogram_std_e”

Histogram

To save and display the histogram

. hist std_e, saving(Histogram_std_e, replace)

(bin=32, start=-2.2421422, width=.15336815)

(note: file Histogram_std_e.gph not found)

(file Histogram_std_e.gph saved)

[Figure: histogram of the standardized residuals (x axis: Standardized residuals; y axis: Density)]

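Normal probability plot

To save and display the normal probability plot

. qnorm std_e, saving(NPP_std_e, replace)

[Figure: normal probability plot of the standardized residuals (x axis: Inverse Normal; y axis: Standardized residuals)]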

c) Plot the (unstandardized) residuals against each of the independent variables.

This is useful for examining heteroskedasticity and other possible abnormalities. The unstandardized residuals should be on the y axis and the independent variables should be on the x axis.

RACE

. graph twoway scatter e RACE, saving(e_RACE, replace)

(file e_RACE.gph saved)

[Figure: scatter plot of the residuals (y axis: Residuals) against RACE (x axis: RECODE of race (race of respondent))]

EDUC

. graph twoway scatter e EDUC, saving(e_EDUC, replace)

(file e_EDUC.gph saved)

[Figure: scatter plot of the residuals (y axis: Residuals) against EDUC (x axis: RECODE of educ (highest year of school completed))]

INCOM

. graph twoway scatter e INCOM, saving(e_INCOM, replace)

[Figure: scatter plot of the residuals (y axis: Residuals) against INCOM (x axis: RECODE of income06 (total family income))]

REPUBLICAN

. graph twoway scatter e REPUBLICAN, saving(e_REPUBLICAN, replace)

[Figure: scatter plot of the residuals (y axis: Residuals) against REPUBLICAN (x axis: RECODE of partyid (political party affiliation))]
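
As a supplement to the four scatter plots, Stata offers a residual-versus-fitted plot and a formal test for heteroskedasticity after a regression. Neither is used in the original handout; the sketch below assumes the same final model and the nomiss marker.

```stata
* Re-estimate the final model so the postestimation commands apply to it.
quietly regress GOVRES RACE EDUC INCOM REPUBLICAN if nomiss == 1

* Residuals against fitted values, with a reference line at zero.
rvfplot, yline(0)

* Breusch-Pagan / Cook-Weisberg test; H0 is constant residual variance.
estat hettest
```

A small p-value from estat hettest would point in the same direction as a fan-shaped pattern in the plots, namely that the residual variance changes with the predicted value.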

d) Generate the squared value and the absolute value of the (unstandardized) residual. Focus on the squared residual. Why?

The variable containing the squared residuals will be named “squared_e” and the variable containing the absolute values of the residuals will be named “absolute_e”. The function “abs(variable)” gives you the absolute value of variable.

. gen squared_e=e*e

. gen absolute_e= abs(e)

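e) Correlate the squared and absolute values of the (unstandardized) residuals with all other variables in the model. What do we find?

. corr GOVRES RACE EDUC INCOM REPUBLICAN squared_e absolute_e

(obs=1661)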

| GOVRES RACE EDUC INCOM REPUBL~N square~e absolu~e

-------------+---------------------------------------------------------------

GOVRES | 1.0000

RACE | 0.2043 1.0000

EDUC | 0.1178 0.1997 1.0000

INCOM | 0.1889 0.2259 0.3931 1.0000

REPUBLICAN | 0.2716 0.2566 -0.0004 0.1191 1.0000

squared_e | 0.0354 -0.0420 -0.1545 -0.1468 -0.0440 1.0000

absolute_e | -0.0434 -0.0858 -0.1544 -0.1632 -0.0591 0.9565 1.0000

f) Examine the means of the residual (in absolute values, not shown here) and squared residuals by different categories of independent variables. Why?

This can be done using the command “tab variableA, sum(variableB)”. However, in order to make this analysis clearer, we collapse the independent variables that have many categories (EDUC, INCOM and REPUBLICAN) into three categories each and leave RACE (and GOVRES) intact.

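. recode EDUC (0/12=0)(13/16=1)(17/20=2), gen(EDUC2)

(1655 differences between EDUC and EDUC2)

. recode INCOM (0/10=0)(11/20=1)(21/24=2), gen(INCOM2)

(1641 differences between INCOM and INCOM2)

. recode REPUBLICAN (0 1=0) (2 3 4=1) (5 6=2), gen(REPUBLICAN2)

(1411 differences between REPUBLICAN and REPUBLICAN2)

Then obtain the mean of the squared residuals for each category of each independent variable. Focus on the mean of each category. Why? What do we find?

. tab RACE, summ(squared_e)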

RECODE of |

race (race |

of | Summary of squared_e

respondent) | Mean Std. Dev. Freq.

------------+------------------------------------

0 | 1.3321228 1.6271939 436

1 | 1.1807221 1.5715731 1225

------------+------------------------------------

Total | 1.2204636 1.5872673 1661

. tab EDUC2, summ(squared_e)

RECODE of |

EDUC |

(RECODE of |

educ |

(highest |

year of |

school | Summary of squared_e

completed)) | Mean Std. Dev. Freq.

------------+------------------------------------

0 | 1.4125591 1.7571281 687

1 | 1.1320202 1.4776772 737

2 | .9386629 1.3135448 237

------------+------------------------------------

Total | 1.2204636 1.5872673 1661

. tab INCOM2, summ(squared_e)

RECODE of |

INCOM |

(RECODE of |

income06 |

(total |

family | Summary of squared_e

income)) | Mean Std. Dev. Freq.

------------+------------------------------------

0 | 1.5703851 1.8339021 337

1 | 1.1768049 1.5589693 1020

2 | .97904393 1.3033466 304

------------+------------------------------------

Total | 1.2204636 1.5872673 1661

. tab REPUBLICAN, summ(squared_e)

RECODE of |

partyid |

(political |

party |

affiliation | Summary of squared_e

) | Mean Std. Dev. Freq.

------------+------------------------------------

0 | 1.3544698 1.754 250

1 | 1.1022653 1.5308627 283

2 | 1.2853162 1.6759844 189

3 | 1.4062153 1.6491756 350

4 | 1.2662294 1.6828067 127

5 | .99803069 1.3768972 273

6 | 1.101894 1.398698 189

------------+------------------------------------

Total | 1.2204636 1.5872673 1661
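
The same comparison can be produced for the squared and the absolute residuals in one table with tabstat. This is a sketch, not part of the original handout; it assumes squared_e, absolute_e and EDUC2 exist as created above.

```stata
* Mean, standard deviation and N of both residual measures by education group.
tabstat squared_e absolute_e, by(EDUC2) statistics(mean sd n)
```

If the group means differ noticeably, as they do for squared_e in the tables above, the residual variance is not constant across categories, which is the pattern heteroskedasticity produces.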

Sample do-file

*Obtain the frequency distribution
tab race, miss
tab educ, miss
tab income06, miss
tab partyid, miss
tab helppoor, miss

*Recoding variables
recode race (1=1) (2/3=0), gen(RACE)
tab RACE
recode educ (22/98=.), gen(EDUC)
tab EDUC
recode income06 (1=0)(2=1)(3=2)(4=3)(5=4)(6=5)(7=6)(8=7)(9=8)(10=9) ///
    (11=10)(12=11)(13=12)(14=13)(15=14)(16=15)(17=16)(18=17)(19=18)(20=19) ///
    (21=20)(22=21)(23=22)(24=23)(25=24)(26/98=.), gen(INCOM)
tab INCOM
recode partyid (7/8=.), gen(REPUBLICAN)
tab REPUBLICAN
recode helppoor (1=0)(2=1)(3=2)(4=3)(5=4)(8=.), gen(GOVRES)
tab GOVRES

*Dropping missing cases (either method)
*Method #1 cannot be undone. To use it, remove the * from the next line.
*drop if RACE==. | EDUC==. | INCOM==. | REPUBLICAN==. | GOVRES==.
** or, Method #2 (used in this handout)
mark nomiss
markout nomiss RACE EDUC INCOM REPUBLICAN GOVRES

*Obtain correlations
corr RACE EDUC INCOM REPUBLICAN GOVRES if nomiss==1

**Regressions
reg EDUC RACE if nomiss==1, beta
reg INCOM RACE EDUC if nomiss==1, beta
reg REPUBLICAN RACE EDUC INCOM if nomiss==1, beta
reg GOVRES RACE EDUC INCOM REPUBLICAN if nomiss==1, beta

**Interaction terms
gen raceduc = RACE*EDUC
gen racINCOM = RACE*INCOM
gen racrepub = RACE*REPUBLICAN
gen edurepub = EDUC*REPUBLICAN
gen eduINCOM = EDUC*INCOM
gen incomrepub = INCOM*REPUBLICAN
reg GOVRES RACE EDUC INCOM REPUBLICAN raceduc racINCOM racrepub edurepub ///
    eduINCOM incomrepub, beta
reg GOVRES RACE EDUC INCOM REPUBLICAN edurepub incomrepub, beta

**Chow Test
reg GOVRES RACE EDUC INCOM REPUBLICAN raceduc racINCOM racrepub edurepub ///
    eduINCOM incomrepub, beta
test raceduc racINCOM racrepub edurepub eduINCOM incomrepub

**Run the regression from which you want to obtain predicted values first.
reg GOVRES RACE EDUC INCOM REPUBLICAN
predict yhat
predict e, resid
predict std_e, rstandard
hist std_e, saving(Histogram_std_e, replace)
qnorm std_e, saving(NPP_std_e, replace)

**Plot the (unstandardized) residuals with each of the independent variables
graph twoway scatter e RACE, saving(e_RACE, replace)
graph twoway scatter e EDUC, saving(e_EDUC, replace)
graph twoway scatter e INCOM, saving(e_INCOM, replace)
graph twoway scatter e REPUBLICAN, saving(e_REPUBLICAN, replace)

**Generate the squared value and the absolute value of the (unstandardized) residual.
gen squared_e = e*e
gen absolute_e = abs(e)

**Correlate the squared and absolute values of the residuals to all other variables in the model
corr GOVRES RACE EDUC INCOM REPUBLICAN squared_e absolute_e

**Examine the means of the residual (in absolute values, not shown here) and
**squared residuals by different categories of independent variables.
**First collapse indep. var. into fewer categories.
recode EDUC (0/12=0)(13/16=1)(17/20=2), gen(EDUC2)
recode INCOM (0/10=0)(11/20=1)(21/24=2), gen(INCOM2)
recode REPUBLICAN (0 1=0) (2 3 4=1) (5 6=2), gen(REPUBLICAN2)
tab RACE, summ(squared_e)
tab EDUC2, summ(squared_e)
tab INCOM2, summ(squared_e)
tab REPUBLICAN, summ(squared_e)
