HW#9 PDF

Martin De Haro Garcia
STATS 212
Professor: Dr. Meiers
Lab Professor: Micheal
6 December 2018
HW #9: Linear Regression
Part 1: Ch.16 – Question 2 & 7
2.
a.
REGRESSION
REGRESSION
/VARIABLES= Correct
/DEPENDENT= Time
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (Time)
R     R Square   Adjusted R Square   Std. Error of the Estimate
.18   .03        -.09                3.63
ANOVA (Time)
             Sum of Squares   df     Mean Square   F       Sig.
Regression   3.34             1      3.34          .25     .628
Residual     105.33           8      13.17
Total        108.67           9
Coefficients (Time)
             Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)   17.20              2.45         .00                 7.01    .000
Correct      -.21               .42          -.18                -.50    .628
The regression equation is Y’ = -0.21 (# correct) + 17.20
The slope and intercept are taken from the B column of the Coefficients table.

b. Y’ = -0.21 (8) + 17.20 = 15.52
According to the book, the correct answer is 15.49. The difference arises because PSPP reports
the coefficients rounded to two decimal places; with three decimal places,
Y’ = -.214 (8) + 17.202 = 15.49.
c.
Time (Y)   Correct (x)   Y’ = -.214x + 17.202   Y’ - Y (difference between predicted and actual)
14.50      5             16.132                 1.632
13.40      7             15.704                 2.304
12.70      6             15.918                 3.218
16.40      2             16.774                 .374
21.00      4             16.346                 -4.654
13.90      3             16.56                  2.66
17.30      12            14.634                 -2.666
12.50      5             16.132                 3.632
16.70      4             16.346                 -.354
22.70      3             16.56                  -6.14
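The table above can be reproduced with a short script. This is only a sketch in Python (the homework itself used PSPP); the names `predict_time` and `rows` are illustrative and not part of the original analysis. It applies the three-decimal equation to each observation and takes predicted minus actual.

```python
# Observed data copied from the table above.
times = [14.50, 13.40, 12.70, 16.40, 21.00, 13.90, 17.30, 12.50, 16.70, 22.70]
correct = [5, 7, 6, 2, 4, 3, 12, 5, 4, 3]

# Three-decimal coefficients from the table header
# (the PSPP output itself shows them rounded to -.21 and 17.20).
SLOPE = -0.214
INTERCEPT = 17.202


def predict_time(num_correct):
    """Predicted time Y' for a given number of correct answers."""
    return SLOPE * num_correct + INTERCEPT


# Each row holds (Y, x, Y', Y' - Y), rounded to three decimals like the table.
rows = []
for y, x in zip(times, correct):
    y_hat = predict_time(x)
    rows.append((y, x, round(y_hat, 3), round(y_hat - y, 3)))

for y, x, y_hat, diff in rows:
    print(f"{y:6.2f} {x:3d} {y_hat:8.3f} {diff:8.3f}")

# Part b with the same three-decimal coefficients (x = 8).
print(round(predict_time(8), 2))
```

With the three-decimal coefficients, the last line gives the book's answer for part b rather than the value obtained from the two-decimal PSPP output.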
7.
DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= Years_of_Experience Level_of_Education
Score_on_Great_Chef_Test Number_Position
/STATISTICS=MEAN STDDEV.
Valid cases = 19; cases with missing value(s) = 0.
Variable                   N    Mean    Std Dev
Years of Experience        19   14.84   7.92
Level of Education         19   2.16    .69
Score on Great Chef Test   19   84.47   11.32
# Posito                   19   7.42    3.01
REGRESSION
REGRESSION
/VARIABLES= Years_of_Experience Level_of_Education Number_Position
/DEPENDENT= Score_on_Great_Chef_Test
/METHOD=ENTER
Model Summary (Score on Great Chef Test)
R     R Square   Adjusted R Square   Std. Error of the Estimate
.47   .22        .06                 10.96
ANOVA (Score on Great Chef Test)
             Sum of Squares   df     Mean Square   F       Sig.
Regression   504.08           3      168.03        1.40    .282
Residual     1800.66          15     120.04
Total        2304.74          18
Coefficients (Score on Great Chef Test)
                      Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)            96.38              10.07        .00                 9.57    .000
Years of Experience   .96                .55          .67                 1.74    .102
Level of Education    -5.79              3.93         -.35                -1.47   .162
# Posito              -1.84              1.41         -.49                -1.30   .212
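a. Of the three predictors, years of experience is the best: it has the largest standardized
coefficient (Beta = .67) and the smallest significance value (.102). However, none of the three
is significant at the .05 level, so none of them is a reliable predictor.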
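b. The regression equation is:
Y’ = 0.96 (X1) – 5.79 (X2) – 1.84 (X3) + 96.38
where X1 = years of experience, X2 = level of education, and X3 = number of positions
(Number_Position).
Y’ = 0.96(12) – 5.79(2) – 1.84(5) + 96.38 = 87.12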
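The substitution above can be checked with a few lines of Python. This is a minimal sketch: the dictionary keys and the function name `predict_score` are illustrative, and the coefficients are the two-decimal values from the PSPP output.

```python
# Two-decimal coefficients from the Coefficients table.
INTERCEPT = 96.38
COEFFICIENTS = {
    "years_of_experience": 0.96,
    "level_of_education": -5.79,
    "number_position": -1.84,
}


def predict_score(values):
    """Predicted Great Chef Test score: intercept plus each slope times its predictor."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


# Predictor values used in part b.
chef = {"years_of_experience": 12, "level_of_education": 2, "number_position": 5}
predicted = predict_score(chef)
print(round(predicted, 2))
```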

Part 2
1. Is the fixed acidity of a wine a valid predictor of the alcohol content? Please explain
how you came to that conclusion.
DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= alcohol fixedacidity
Variable       N      Mean    Std Dev
alcohol        1599   10.42   1.07
fixedacidity   1599   8.32    1.74
CORRELATIONS
CORRELATION
/VARIABLES = alcohol fixedacidity
/PRINT = ONETAIL SIG.
Correlations
                                     alcohol   fixedacidity
alcohol        Pearson Correlation   1.00      -.06
               Sig. (1-tailed)                 .007
               N                     1599      1599
fixedacidity   Pearson Correlation   -.06      1.00
               Sig. (1-tailed)       .007
               N                     1599      1599
REGRESSION
REGRESSION
/VARIABLES= fixedacidity
/DEPENDENT= alcohol
/METHOD=ENTER
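Model Summary (alcohol)
R     R Square   Adjusted R Square   Std. Error of the Estimate
.06   .00        .00                 1.06
ANOVA (alcohol)
             Sum of Squares   df     Mean Square   F       Sig.
Regression   6.90             1      6.90          6.10    .014
Residual     1807.86          1597   1.13
Total        1814.76          1598
Coefficients (alcohol)
               Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)     10.74              .13          .00                 82.63   .000
fixedacidity   -.04               .02          -.06                -2.47   .014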
Fixed acidity is a valid predictor of alcohol content because the significance value is .014 for
both the ANOVA and the fixedacidity coefficient, which is lower than the .05 alpha level. Even
though it is valid, it is not a strong predictor: the correlation is extremely weak (r = -.062)
and R Square rounds to .00, so fixed acidity explains less than 1% of the variance in alcohol
content.
2. Construct a scatter plot with a best fit regression line for the analysis above
GRAPH
/SCATTERPLOT(BIVAR)=fixedacidity WITH alcohol
/MISSING=LISTWISE.
I used SPSS for this graph because PSPP cannot do this.

[Graph: scatter plot of alcohol against fixedacidity with the best-fit regression line]
3. What is the predicted alcohol content of a wine that has a fixed acidity of 6.1?
I used the regression equation from Question 1 and plugged in 6.1 for x (fixed acidity):
Y’ = 10.74 - .04(6.1)
Y’ = 10.74 - .244
Y’ = 10.496 (This is the predicted alcohol content)
4. Determine if fixed acidity and freesulfurdioxide levels are a valid predictor of alcohol
content.
DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= alcohol fixedacidity freesulfurdioxide
Variable            N      Mean    Std Dev
alcohol             1599   10.42   1.07
fixedacidity        1599   8.32    1.74
freesulfurdioxide   1599   15.87   10.46
CORRELATIONS
CORRELATION
/VARIABLES = alcohol fixedacidity freesulfurdioxide
/PRINT = ONETAIL SIG.
Correlations
                                          alcohol   fixedacidity   freesulfurdioxide
alcohol             Pearson Correlation   1.00      -.06           -.07
                    Sig. (1-tailed)                 .007           .003
                    N                     1599      1599           1599
fixedacidity        Pearson Correlation   -.06      1.00           -.15
                    Sig. (1-tailed)       .007                     .000
                    N                     1599      1599           1599
freesulfurdioxide   Pearson Correlation   -.07      -.15           1.00
                    Sig. (1-tailed)       .003      .000
                    N                     1599      1599           1599
REGRESSION
REGRESSION
/VARIABLES= fixedacidity freesulfurdioxide
/DEPENDENT= alcohol
/METHOD=ENTER
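Model Summary (alcohol)
R     R Square   Adjusted R Square   Std. Error of the Estimate
.10   .01        .01                 1.06
ANOVA (alcohol)
             Sum of Squares   df     Mean Square   F       Sig.
Regression   18.47            2      9.24          8.21    .000
Residual     1796.29          1596   1.13
Total        1814.76          1598
Coefficients (alcohol)
                    Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)          10.93              .14          .00                 76.45   .000
fixedacidity        -.05               .02          -.07                -2.94   .003
freesulfurdioxide   -.01               .00          -.08                -3.21   .001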
Fixed acidity and free sulfur dioxide are both valid predictors of alcohol content because their
p values (.003 and .001) are both lower than the .05 alpha level. This means that each is
significantly related to alcohol content. The ANOVA agrees: its significance value is .000, which
is also lower than the .05 alpha level. However, both correlation coefficients are extremely low
(r = -.062 and r = -.069), which signifies that the two variables are only very weakly correlated
with alcohol content.

5. Did adding freesulfurdioxide to the prediction model with fixed acidity improve or not
improve the prediction model? Please explain how you made this determination.
Adding free sulfur dioxide to the prediction model with fixed acidity improved the prediction
model. As noted in Question 4, both variables are significantly correlated with alcohol content,
so both can be entered into a regression equation to predict alcohol content. The ANOVA
significance value was .014 with fixed acidity alone and dropped to .000 once free sulfur dioxide
was added, meaning that there is an even lower chance of making a Type I error. Adding free
sulfur dioxide also lets the model account for more shared variance (R Square rises from .00
to .01).
Y’ = -.05x - .01z + 10.93
where Y’ = predicted alcohol content, x = fixed acidity, and z = free sulfur dioxide
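Because PSPP rounds R Square to two decimals, the gain in shared variance is hard to see in the Model Summary tables. As a rough check, R Square can be recomputed as the regression sum of squares divided by the total sum of squares from each ANOVA table. The sketch below is in Python with illustrative names; it uses the rounded sums of squares from the output, so the results are approximate.

```python
def r_squared(ss_regression, ss_total):
    """Proportion of variance explained: SS(regression) / SS(total)."""
    return ss_regression / ss_total


SS_TOTAL = 1814.76  # same total for both models (same dependent variable)

r2_fixed_acidity = r_squared(6.90, SS_TOTAL)    # model from Question 1
r2_both = r_squared(18.47, SS_TOTAL)            # model from Question 4
r2_gain = r2_both - r2_fixed_acidity

print(f"fixed acidity only:       {r2_fixed_acidity:.4f}")
print(f"with free sulfur dioxide: {r2_both:.4f}")
print(f"gain in R Square:         {r2_gain:.4f}")
```

Both values stay at or below about 1% of the variance, so the improvement is real but small.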
6. For a wine that has an alcohol level of 9.1 and fixed acidity of 5.0, what
would be the anticipated freesulfurdioxide level of the wine?
9.1 = -.05(5.0) - .01z + 10.93
9.1 = -.25 - .01z + 10.93
9.1 = -.01z +10.68
-1.58 = -.01z
z = 158 (anticipated free sulfur dioxide level)
• PSPP rounds to 2 decimal places
This value is far higher than the free sulfur dioxide levels in the data set (mean = 15.87,
SD = 10.46). Therefore I ran another multiple regression in PSPP, this time with free sulfur
dioxide as the dependent variable:
REGRESSION
REGRESSION
/VARIABLES= alcohol fixedacidity
/DEPENDENT= freesulfurdioxide
/METHOD=ENTER
Model Summary (freesulfurdioxide)
R     R Square   Adjusted R Square   Std. Error of the Estimate
.17   .03        .03                 10.31
ANOVA (freesulfurdioxide)
             Sum of Squares   df     Mean Square   F       Sig.
Regression   5227.94          2      2613.97       24.60   .000
Residual     169617.04        1596   106.28
Total        174844.98        1598
Coefficients (freesulfurdioxide)
               Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)     31.91              2.89         .00                 11.03   .000
alcohol        -.78               .24          -.08                -3.21   .001
fixedacidity   -.95               .15          -.16                -6.42   .000
Y’ = -.953x -.777z + 31.908, where x = fixed acidity, z = alcohol, and Y’ = anticipated/predicted
free sulfur dioxide level
Y’ = -.95(5.0) - .78(9.1) + 31.91
Y’ = -4.75 – 7.098 + 31.91
Y’ = 20.062 (This is the final anticipated free sulfur dioxide level)
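The two approaches to Question 6 can be compared side by side. The sketch below is in Python with illustrative function names. The first function rearranges the alcohol equation from Question 5 to solve for z; the second uses the regression with free sulfur dioxide as the dependent variable. Both use the two-decimal coefficients from the PSPP output.

```python
def invert_alcohol_model(alcohol, fixed_acidity):
    """Solve alcohol = -.05x - .01z + 10.93 for z (free sulfur dioxide)."""
    return (alcohol - 10.93 + 0.05 * fixed_acidity) / -0.01


def predict_free_sulfur_dioxide(alcohol, fixed_acidity):
    """Direct regression: Y' = -.95x - .78z + 31.91 (x = fixed acidity, z = alcohol)."""
    return -0.95 * fixed_acidity - 0.78 * alcohol + 31.91


z_inverted = invert_alcohol_model(alcohol=9.1, fixed_acidity=5.0)
z_direct = predict_free_sulfur_dioxide(alcohol=9.1, fixed_acidity=5.0)

print(round(z_inverted, 3))
print(round(z_direct, 3))
```

The inverted estimate divides by the rounded slope of -.01, so a small rounding error in that slope changes the result by a large amount. This is one reason the direct regression gives a more plausible value.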

7. In terms of the two variables above, which variable contributes more to the
predictive variability in alcohol content? Please explain how you came to this
conclusion.
Of the two variables above, free sulfur dioxide contributes more to the predictive
variability in alcohol content. The slopes cannot be used to determine which variable is
the better predictor because the two variables are measured on different scales, so we
must compare the magnitudes of the Standardized Coefficients (Beta) instead. Free sulfur
dioxide has the larger magnitude (Beta = -.08 versus -.07 for fixed acidity) and the
smaller significance value (.001 versus .003).

8. Run a regression analysis using all the variables to determine if the variables combined are a
valid predictor of wine quality. Please explain how you came to that conclusion and report the
shared variance explained by the prediction model.
REGRESSION
REGRESSION
/VARIABLES= alcohol sulphates pH density totalsulfurdioxide
freesulfurdioxide chlorides residualsugar citricacid volatileacidity
fixedacidity
/DEPENDENT= quality
/METHOD=ENTER
Model Summary (quality)
R     R Square   Adjusted R Square   Std. Error of the Estimate
.60   .36        .36                 .65
ANOVA (quality)
             Sum of Squares   df     Mean Square   F       Sig.
Regression   375.75           11     34.16         81.35   .000
Residual     666.41           1587   .42
Total        1042.17          1598
Coefficients (quality)
                     Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)           21.97              21.19        .00                 1.04    .300
alcohol              .28                .03          .36                 10.43   .000
sulphates            .92                .11          .19                 8.01    .000
pH                   -.41               .19          -.08                -2.16   .031
density              -17.88             21.63        -.04                -.83    .409
totalsulfurdioxide   .00                .00          -.13                -4.48   .000
freesulfurdioxide    .00                .00          .06                 2.01    .045
chlorides            -1.87              .42          -.11                -4.47   .000
residualsugar        .02                .02          .03                 1.09    .276
citricacid           -.18               .15          -.04                -1.24   .215
volatileacidity      -1.08              .12          -.24                -8.95   .000
fixedacidity         .02                .03          .05                 .96     .336
The variables combined are a valid predictor of wine quality because the significance value
obtained from the ANOVA is .000, which is lower than the .05 alpha level. Not every variable
contributes individually: citric acid, fixed acidity, residual sugar, and density are not
significant predictors of wine quality, whereas other variables such as chlorides and volatile
acidity are. The shared variance explained by the prediction model is .36, or 36%.
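The shared variance and the F value reported above can be recomputed from the ANOVA table as a consistency check. The sketch below is in Python with illustrative variable names, and it uses the rounded sums of squares from the PSPP output. R Square is the regression sum of squares over the total, and F is the regression mean square over the residual mean square.

```python
# Values from the ANOVA (quality) table.
ss_regression, df_regression = 375.75, 11
ss_residual, df_residual = 666.41, 1587

ss_total = ss_regression + ss_residual

# Shared variance explained by the model.
r_square = ss_regression / ss_total

# F = mean square (regression) / mean square (residual).
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

print(f"R Square = {r_square:.2f}")
print(f"F({df_regression}, {df_residual}) = {f_value:.2f}")
```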
9. What are the two variables from the analysis directly above that contribute the most weight
to the prediction model? How did you come to that conclusion?
The two variables from the analysis directly above that contributed the most weight to the
prediction model are density and chlorides because they have the greatest slope magnitudes,
with values of -17.88 and -1.87. This means that a one-unit change in density or chlorides
produces the largest change in the predicted wine quality.
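Question 7 notes that slopes measured on different scales cannot be compared directly. As a cross-check under that reasoning, the sketch below ranks the Question 8 predictors by the magnitude of their Standardized Coefficients (Beta) instead of their slopes. It is written in Python with illustrative names, and it is an alternative way of reading the table rather than the selection used in this homework.

```python
# Standardized coefficients (Beta) from the Coefficients (quality) table.
betas = {
    "alcohol": 0.36,
    "sulphates": 0.19,
    "pH": -0.08,
    "density": -0.04,
    "totalsulfurdioxide": -0.13,
    "freesulfurdioxide": 0.06,
    "chlorides": -0.11,
    "residualsugar": 0.03,
    "citricacid": -0.04,
    "volatileacidity": -0.24,
    "fixedacidity": 0.05,
}

# Sort by absolute Beta, largest first; the sign only gives the direction.
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
top_two = ranked[:2]

for name in ranked:
    print(f"{name:20s} {betas[name]:+.2f}")
print("largest |Beta|:", top_two)
```

Ranked this way, the top two variables differ from the two chosen by slope magnitude, which is worth keeping in mind when reading Question 10.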
10. Run another regression analysis like #8 but only use the two variables that you selected in
#9. Based on your analysis, do you think any of the omitted variables combined helped or
hurt the predictive validity of predicting wine quality? Please explain how you came to that
conclusion.
DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= quality density chlorides.
Variable    N      Mean   Std Dev   Minimum   Maximum
quality     1599   5.64   .81       3.00      8.00
density     1599   1.00   .00       .99       1.00
chlorides   1599   .09    .05       .01       .61

CORRELATIONS
CORRELATION
/VARIABLES = density quality chlorides
/PRINT = TWOTAIL SIG.
Correlations
                                  density   quality   chlorides
density     Pearson Correlation   1.00      -.17      .20
            Sig. (2-tailed)                 .000      .000
            N                     1599      1599      1599
quality     Pearson Correlation   -.17      1.00      -.13
            Sig. (2-tailed)       .000                .000
            N                     1599      1599      1599
chlorides   Pearson Correlation   .20       -.13      1.00
            Sig. (2-tailed)       .000      .000
            N                     1599      1599      1599
REGRESSION
REGRESSION
/VARIABLES= chlorides density
/DEPENDENT= quality
/METHOD=ENTER
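Model Summary (quality)
R     R Square   Adjusted R Square   Std. Error of the Estimate
.20   .04        .04                 .79
ANOVA (quality)
             Sum of Squares   df     Mean Square   F       Sig.
Regression   41.44            2      20.72         33.05   .000
Residual     1000.72          1596   .63
Total        1042.17          1598
Coefficients (quality)
             Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)   72.02              10.67        .00                 6.75    .000
chlorides    -1.68              .43          -.10                -3.90   .000
density      -66.45             10.71        -.16                -6.20   .000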
I think that omitting the other variables neither helped nor hurt the predictive validity of the
model because the ANOVA significance value is .000 in both analyses. Density is now a much
stronger predictor of wine quality, with a slope of -66.45 compared with -17.88 in the full model
from Question 8. This means that density has a larger effect on the predicted wine quality once
the other variables are removed.
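The ANOVA significance value shows only that each model predicts better than chance, so it cannot by itself show whether the omitted variables mattered. One common way to compare a full model with a reduced model nested inside it is to look at the change in R Square and at the F statistic for that change. The sketch below does this in Python with the rounded sums of squares from the two ANOVA tables. The variable names are illustrative, and this comparison goes beyond what the PSPP output reports.

```python
SS_TOTAL = 1042.17

# Full model (Question 8): 11 predictors.
ss_residual_full, df_residual_full = 666.41, 1587
# Reduced model (this question): chlorides and density only.
ss_residual_reduced, df_residual_reduced = 1000.72, 1596

r2_full = 1 - ss_residual_full / SS_TOTAL
r2_reduced = 1 - ss_residual_reduced / SS_TOTAL

# F for the change: extra residual variance per dropped predictor,
# divided by the residual mean square of the full model.
dropped = df_residual_reduced - df_residual_full  # 9 omitted predictors
f_change = ((ss_residual_reduced - ss_residual_full) / dropped) / (
    ss_residual_full / df_residual_full
)

print(f"R Square, full model:    {r2_full:.2f}")
print(f"R Square, reduced model: {r2_reduced:.2f}")
print(f"F change ({dropped}, {df_residual_full}) = {f_change:.2f}")
```

On these numbers the reduced model explains far less shared variance than the full model, which suggests that the omitted variables, taken together, helped the prediction.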
