Tan
201379280
IE 230
Regression Analysis Problem Set
1 Draw a random sample of 40 observations from the set of 50 observations given.
Indicate the corresponding observation numbers not drawn. Describe briefly how
you performed the random sampling.
The observations not drawn are numbers 50, 11, 48, 44, 21, 29, 22, 3, 40 and 1.
The sample was drawn in Excel: a random-number function was created for the
observations, and the RANK and INDEX functions were used to map each random
value to its corresponding observation number. Finally, the formula was copied
down to select 40 observations.
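The same kind of draw can be sketched outside Excel. The snippet below is only an illustration in Python, not the procedure used for this problem set; the function name `draw_sample` and the seed value are assumptions made for the example, so it will not reproduce the ten observation numbers listed above.

```python
import random

def draw_sample(population_size=50, sample_size=40, seed=None):
    """Draw observation numbers without replacement and report the ones left out."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    observations = list(range(1, population_size + 1))
    drawn = sorted(rng.sample(observations, sample_size))
    not_drawn = sorted(set(observations) - set(drawn))
    return drawn, not_drawn

# The seed is arbitrary (hypothetical); any integer gives a valid random draw.
drawn, not_drawn = draw_sample(seed=230)
print("Observations not drawn:", not_drawn)
```

Sampling without replacement guarantees that no observation is selected twice, which is what the rank-based Excel approach also achieves.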
2 Compute the descriptive statistics (mean, median, minimum and maximum, range,
etc.) of the quantitative variables. Produce individual scatter plots with PROD
as the dependent variable. Comment on the plots generated.
Descriptive Statistics: PROD, COST FERT, TYPE FERT, COST CHEM, TYPE CHEM, ...
             Total
Variable     Count   N  N*  CumN  Percent  CumPct     Mean  SE Mean   TrMean
PROD            40  40   0    40      100     100    13213      877    13159
COST FERT       40  40   0    40      100     100     3507      267     3499
TYPE FERT       40  40   0    40      100     100   0.6500   0.0764   0.6667
COST CHEM       40  40   0    40      100     100   1142.8     89.5   1118.0
TYPE CHEM       40  40   0    40      100     100   0.6750   0.0750   0.6944
LABOR COST      40  40   0    40      100     100      800      131      727
OTHER COST      40  40   0    40      100     100    123.1     13.3    114.7
EXP             40  40   0    40      100     100    23.98     1.49    23.86
ELEM LEVEL      40  40   0    40      100     100   0.5750   0.0792   0.5833
HIGH LEVEL      40  40   0    40      100     100   0.1500   0.0572   0.1111
LOAN            40  40   0    40      100     100    12233     1108    11647

Variable      StDev  Variance  CoefVar       Sum  Sum of Squares  Minimum      Q1
PROD           5544  30735718    41.96    528527      8182206933     4800    8000
COST FERT      1686   2843671    48.08    140280       602867149      683    2229
TYPE FERT    0.4830    0.2333    74.31   26.0000         26.0000   0.0000  0.0000
COST CHEM     565.9  320242.0    49.52   45712.5      64730347.8    248.0   804.0
TYPE CHEM    0.4743    0.2250    70.27   27.0000         27.0000   0.0000  0.0000
LABOR COST      828    685617   103.56     31982        52310996       40     155
OTHER COST     84.1    7077.4    68.36    4922.8        881867.2     28.3    59.0
EXP            9.41     88.64    39.27    959.00        26449.00    10.00   15.25
ELEM LEVEL   0.5006    0.2506    87.07   23.0000         23.0000   0.0000  0.0000
HIGH LEVEL   0.3616    0.1308   241.08    6.0000          6.0000   0.0000  0.0000
LOAN           7005  49069878    57.26    489305      7899209825     3000    7175

                                                                          N for
Variable     Median      Q3  Maximum   Range     IQR  Mode                 Mode
PROD          12800   18000    24000   19200   10000  5000                    5
COST FERT      3303    4766     6460    5777    2538  2736                    2
TYPE FERT    1.0000  1.0000   1.0000  1.0000  1.0000  1                      26
COST CHEM     993.2  1439.3   2540.0  2292.0   635.3  900, 1100, 2160
TYPE CHEM                                             1                      27
LABOR COST                                            *                       0
OTHER COST                                            *                       0
EXP                                                   30                     11
ELEM LEVEL                                            1                      23
HIGH LEVEL                                            0                      34
LOAN                                                  10000                   7
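Several of the reported columns follow directly from others, which gives a quick consistency check on the output. The sketch below recomputes SE Mean, CoefVar, Range and IQR for PROD from the reported mean, standard deviation, quartiles and extremes. The helper name `derived_stats` is an assumption for illustration, and small differences from the printed values are expected because the inputs are already rounded.

```python
import math

# Values reported for PROD in the descriptive statistics output.
prod = {"n": 40, "mean": 13213, "stdev": 5544,
        "minimum": 4800, "q1": 8000, "q3": 18000, "maximum": 24000}

def derived_stats(s):
    """Recompute the statistics that are functions of the basic ones."""
    return {
        "se_mean": s["stdev"] / math.sqrt(s["n"]),   # StDev / sqrt(N)
        "coef_var": 100 * s["stdev"] / s["mean"],    # StDev as a percent of Mean
        "range": s["maximum"] - s["minimum"],        # Maximum - Minimum
        "iqr": s["q3"] - s["q1"],                    # Q3 - Q1
    }

checks = derived_stats(prod)
for name, value in checks.items():
    print(f"{name}: {value:.2f}")
```

The recomputed values agree with the reported SE Mean (877), CoefVar (41.96), Range (19200) and IQR (10000) for PROD.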
Source          DF          SS         MS      F      P
Regression       1   313238900  313238900  13.44  0.001
Residual Error  38   885454104   23301424
Total           39  1198693004

Unusual Observations

     COST
Obs  FERT   PROD   Fit  SE Fit  Residual  St Resid
 18  1500  24000  9840    1195     14160     3.03R

R denotes an observation with a large standardized residual.
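The mean squares and the F statistic in an ANOVA table are simple ratios of the sums of squares and degrees of freedom. The sketch below recomputes them from the reported values. The output above appears to belong to the simple regression of PROD on COST FERT, since its regression sum of squares equals the sequential sum of squares reported for COST FERT in the full model, but that attribution is an inference. The function name `anova_summary` is an assumption for illustration.

```python
import math

def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Mean squares, F statistic and residual standard error from an ANOVA table."""
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "ms_regression": ms_regression,
        "ms_error": ms_error,
        "f": ms_regression / ms_error,  # F = MSR / MSE
        "s": math.sqrt(ms_error),       # S = sqrt(MSE)
    }

# Sums of squares and degrees of freedom as reported in the ANOVA rows above.
simple = anova_summary(ss_regression=313238900, ss_error=885454104,
                       df_regression=1, df_error=38)
print(f"MSE = {simple['ms_error']:.0f}, F = {simple['f']:.2f}")
```

The recomputed F agrees with the reported value of 13.44.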
St Resid
   2.74R
  -0.77 X
  -1.36 X

Source      DF        SS        MS     F      P
Regression   1  66367279  66367279  2.23  0.144
Full Model:
Regression Analysis: PROD versus COST FERT, TYPE FERT, ...
The regression equation is
PROD = 6120 + 0.274 COST FERT + 5972 TYPE FERT - 1.41 COST CHEM - 1456 TYPE CHEM
+ 3.80 LABOR COST + 1.99 OTHER COST + 50.9 EXP - 1164 ELEM LEVEL
+ 1766 HIGH LEVEL + 0.0606 LOAN
Predictor       Coef  SE Coef      T      P
Constant        6120     2883   2.12  0.042
COST FERT     0.2744   0.5369   0.51  0.613
TYPE FERT       5972     1310   4.56  0.000
COST CHEM     -1.406    1.148  -1.22  0.231
TYPE CHEM      -1456     1251  -1.16  0.254
LABOR COST     3.797    1.031   3.68  0.001
OTHER COST     1.992    7.128   0.28  0.782
EXP            50.93    75.85   0.67  0.507
ELEM LEVEL     -1164     1381  -0.84  0.406
HIGH LEVEL      1766     2005   0.88  0.385
LOAN         0.06057  0.08397   0.72  0.477
S = 3403.74 R-Sq = 72.0% R-Sq(adj) = 62.3%
PRESS = 643139707 R-Sq(pred) = 46.35%
Analysis of Variance

Source          DF          SS        MS     F      P
Regression      10   862714254  86271425  7.45  0.000
Residual Error  29   335978750  11585474
Total           39  1198693004
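The three R-Sq figures reported for the full model can be recovered from the ANOVA sums of squares and the PRESS statistic. The sketch below shows the standard formulas; the function name `r_squared_family` is an assumption for illustration.

```python
def r_squared_family(ss_total, ss_error, df_total, df_error, press):
    """R-Sq, adjusted R-Sq and predicted R-Sq, each as a percentage."""
    r_sq = 1 - ss_error / ss_total
    # Adjusted R-Sq compares mean squares, penalizing extra predictors.
    r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
    # Predicted R-Sq replaces the residual SS with PRESS (leave-one-out errors).
    r_sq_pred = 1 - press / ss_total
    return 100 * r_sq, 100 * r_sq_adj, 100 * r_sq_pred

# Full-model values as reported in the output above.
r_sq, r_sq_adj, r_sq_pred = r_squared_family(
    ss_total=1198693004, ss_error=335978750,
    df_total=39, df_error=29, press=643139707)
print(f"R-Sq = {r_sq:.1f}%  R-Sq(adj) = {r_sq_adj:.1f}%  R-Sq(pred) = {r_sq_pred:.2f}%")
```

The gap between R-Sq (72.0%) and R-Sq(pred) (46.35%) suggests that the full model fits the sample noticeably better than it would predict new observations, which is consistent with the many insignificant predictors in the coefficient table.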
Source      DF     Seq SS
COST FERT    1  313238900
TYPE FERT    1  259617883
COST CHEM    1   70350045
TYPE CHEM    1   10557017
LABOR COST   1  164864264
OTHER COST   1    5557800
EXP          1    1859277
ELEM LEVEL   1   22045316
HIGH LEVEL   1    8596654
LOAN         1    6027098
Unusual Observations

     COST
Obs  FERT   PROD    Fit  SE Fit  Residual  St Resid
 18  1500  24000  16123    1637      7877     2.64R
 40  3550  21000  13471    2059      7529     2.78R

R denotes an observation with a large standardized residual.

     COST
Obs  FERT   PROD    Fit  SE Fit  Residual  St Resid
 18  1500  24000  16073    1602      7927     2.69R
 40  3550  21000  13563    2002      7437     2.77R

R denotes an observation with a large standardized residual.
Predictor      Coef  SE Coef      T      P
Constant       8477     1781   4.76  0.000
TYPE FERT      6469     1160   5.58  0.000
COST CHEM   -1.1351   0.9778  -1.16  0.254
TYPE CHEM     -1575     1149  -1.37  0.180
LABOR COST   4.4817   0.6693   6.70  0.000
ELEM LEVEL    -1587     1235  -1.29  0.208
HIGH LEVEL     1476     1844   0.80  0.429
S = 3269.28 R-Sq = 70.6% R-Sq(adj) = 65.2%
PRESS = 544797820 R-Sq(pred) = 54.55%
Analysis of Variance

Source          DF          SS         MS      F      P
Regression       6   845982018  140997003  13.19  0.000
Residual Error  33   352710986   10688212
Total           39  1198693004

Source      DF     Seq SS
TYPE FERT    1  320649696
COST CHEM    1    5673547
TYPE CHEM    1     311077
LABOR COST   1  472094611
ELEM LEVEL   1   40409379
HIGH LEVEL   1    6843708
Unusual Observations

     TYPE
Obs  FERT   PROD    Fit  SE Fit  Residual  St Resid
 18  1.00  24000  15604    1329      8396     2.81R
 27  1.00  18000  24041    1514     -6041    -2.08R
 40  0.00  21000  14436    1738      6564     2.37R

R denotes an observation with a large standardized residual.
Regression Analysis: PROD versus TYPE FERT, LABOR COST, ELEM LEVEL
The regression equation is
PROD = 7089 + 5901 TYPE FERT + 4.14 LABOR COST - 1778 ELEM LEVEL
Predictor     Coef  SE Coef      T      P
Constant      7089     1139   6.22  0.000
TYPE FERT     5901     1108   5.32  0.000
LABOR COST  4.1406   0.6429   6.44  0.000
ELEM LEVEL   -1778     1069  -1.66  0.105
S = 3320.76 R-Sq = 66.9% R-Sq(adj) = 64.1%
PRESS = 487537954 R-Sq(pred) = 59.33%
Analysis of Variance

Source          DF          SS         MS      F      P
Regression       3   801705802  267235267  24.23  0.000
Residual Error  36   396987202   11027422
Total           39  1198693004

Source      DF     Seq SS
TYPE FERT    1  320649696
LABOR COST   1  450577679
ELEM LEVEL   1   30478427
Unusual Observations

     TYPE
Obs  FERT   PROD    Fit  SE Fit  Residual  St Resid
 18  1.00  24000  13548     790     10452     3.24R
 27  1.00  18000  24019    1520     -6019    -2.04R
 40  0.00  21000  12255    1087      8745     2.79R

R denotes an observation with a large standardized residual.
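One way to check whether a group of predictors can be dropped together is a partial F-test that compares the residual sums of squares of the larger and smaller models. The sketch below applies it to the full model (10 predictors) and the three-predictor model reported above. It is offered as an illustration of the technique, not as the selection procedure actually used here, and the function name `partial_f` is an assumption.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic for the predictors dropped from the full model."""
    extra_ss = sse_reduced - sse_full   # fit lost by removing the predictors
    dropped = df_reduced - df_full      # number of predictors removed
    return (extra_ss / dropped) / (sse_full / df_full)

# Residual Error rows: full model (SS 335978750, DF 29) and the
# TYPE FERT + LABOR COST + ELEM LEVEL model (SS 396987202, DF 36).
f_stat = partial_f(sse_reduced=396987202, df_reduced=36,
                   sse_full=335978750, df_full=29)
print(f"Partial F for the 7 dropped predictors = {f_stat:.3f}")
```

A partial F below 1 means the dropped predictors explain less variation per degree of freedom than the residual noise, so removing them as a group costs essentially nothing in fit.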
Final Model: PROD = 6225 + 5698 TYPE FERT + 4.11 LABOR COST
5 Once the final model is obtained, determine R2, R2 (adj), s2(b) and the
confidence interval for the estimates.
Final Model: PROD = 6225 + 5698 TYPE FERT + 4.11 LABOR COST
R2 = 64.3%
R2 (adj) = 62.3%
S2(b) =
Confidence Interval =
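The coefficient table for the final two-predictor model is not reproduced in the text, so s2(b) and the confidence intervals cannot be filled in from what is shown. The sketch below only illustrates the calculation, using the Coef and SE Coef values of the three-predictor model (TYPE FERT, LABOR COST, ELEM LEVEL) reported earlier. The results are therefore not the answers for the final model. The critical value 2.028 is an assumption taken from a t table (two-sided 95%, 36 degrees of freedom), and the helper name `coef_summary` is hypothetical.

```python
# Assumption: two-sided 95% critical value of t with 36 residual degrees of freedom.
T_CRIT_36 = 2.028

# (Coef, SE Coef) pairs from the three-predictor model output.
coefficients = {
    "Constant": (7089, 1139),
    "TYPE FERT": (5901, 1108),
    "LABOR COST": (4.1406, 0.6429),
    "ELEM LEVEL": (-1778, 1069),
}

def coef_summary(coef, se, t_crit):
    """Return s2(b) = SE^2 and the interval coef +/- t_crit * SE."""
    half_width = t_crit * se
    return se ** 2, (coef - half_width, coef + half_width)

summary = {name: coef_summary(b, se, T_CRIT_36)
           for name, (b, se) in coefficients.items()}
for name, (variance, (low, high)) in summary.items():
    print(f"{name}: s2(b) = {variance:.4g}, 95% CI = ({low:.4g}, {high:.4g})")
```

In this illustration the interval for ELEM LEVEL contains zero, in line with its p-value of 0.105. For the final model the same steps would use its own SE Coef values and 37 residual degrees of freedom.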
6