
Martin De Haro Garcia

STATS 212

Professor: Dr. Meiers

Lab Professor: Micheal

6 December 2018

HW #9: Linear Regression

Part 1: Ch.16 – Question 2 & 7

a.

REGRESSION
REGRESSION
/VARIABLES= Correct
/DEPENDENT= Time
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (Time)
R R Square Adjusted R Square Std. Error of the Estimate
.18 .03 -.09 3.63
ANOVA (Time)
Sum of Squares df Mean Square F Sig.
Regression 3.34 1 3.34 .25 .628
Residual 105.33 8 13.17
Total 108.67 9
Coefficients (Time)
Unstandardized Coefficients Standardized Coefficients
B Std. Error Beta t Sig.
(Constant) 17.20 2.45 .00 7.01 .000
Correct -.21 .42 -.18 -.50 .628

The Regression equation is Y’ = -0.21 (# correct) + 17.20

Gathered from the highlighted coefficients.


b. Y’ = -0.21 (8) + 17.20 = 15.52

According to the book, the correct answer should be 15.49. The difference arises because PSPP
reported the coefficients rounded to only two decimal places.
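The rounding effect can be checked with a short sketch. It assumes the three-decimal coefficients shown in the table header of part c (-.214 and 17.202); the function name `predict_time` is only illustrative.

```python
def predict_time(correct, slope, intercept):
    """Predicted time Y' for a given number correct."""
    return slope * correct + intercept

# Coefficients as PSPP printed them (two decimals).
rounded = predict_time(8, slope=-0.21, intercept=17.20)

# Coefficients with three decimals, as used in part c.
precise = predict_time(8, slope=-0.214, intercept=17.202)

print(f"two-decimal coefficients:   {rounded:.2f}")
print(f"three-decimal coefficients: {precise:.2f}")
```

With the extra decimal place the prediction matches the book's 15.49.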

c.

Time (Y)   Correct (x)   Y’ = -.214x + 17.202   Y’ - Y (Difference Between Predicted and Actual)
14.50      5             16.132                 1.632
13.40      7             15.704                 2.304
12.70      6             15.918                 3.218
16.40      2             16.774                 .374
21.00      4             16.346                 -4.654
13.90      3             16.56                  2.66
17.30      12            14.634                 -2.666
12.50      5             16.132                 3.632
16.70      4             16.346                 -.354
22.70      3             16.56                  -6.14
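The table values can be reproduced with a few lines of Python. This is a minimal sketch that takes the ten (Time, Correct) pairs from the table and the three-decimal coefficients in its header; the variable names are illustrative.

```python
times = [14.50, 13.40, 12.70, 16.40, 21.00, 13.90, 17.30, 12.50, 16.70, 22.70]
correct = [5, 7, 6, 2, 4, 3, 12, 5, 4, 3]
SLOPE, INTERCEPT = -0.214, 17.202

# Predicted time for each participant, then predicted minus actual.
predicted = [SLOPE * x + INTERCEPT for x in correct]
differences = [y_hat - y for y_hat, y in zip(predicted, times)]

for y, x, y_hat, diff in zip(times, correct, predicted, differences):
    print(f"{y:6.2f} {x:3d} {y_hat:7.3f} {diff:7.3f}")

# For a least-squares line the differences should sum to roughly zero;
# a small remainder is expected because the coefficients are rounded.
print(f"sum of differences: {sum(differences):.3f}")
```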

7.

DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= Years_of_Experience Level_of_Education
Score_on_Great_Chef_Test Number_Position
/STATISTICS=MEAN STDDEV.
Valid cases = 19; cases with missing value(s) = 0.
Variable N Mean Std Dev
Years of Experience 19 14.84 7.92
Level of Education 19 2.16 .69
Score on Great Chef Test 19 84.47 11.32
# Posito 19 7.42 3.01

REGRESSION
REGRESSION
/VARIABLES= Years_of_Experience Level_of_Education Number_Position
/DEPENDENT= Score_on_Great_Chef_Test
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (Score on Great Chef Test)
R R Square Adjusted R Square Std. Error of the Estimate
.47 .22 .06 10.96
ANOVA (Score on Great Chef Test)
Sum of Squares df Mean Square F Sig.
Regression 504.08 3 168.03 1.40 .282
Residual 1800.66 15 120.04
Total 2304.74 18
Coefficients (Score on Great Chef Test)
Unstandardized Coefficients Standardized Coefficients
B Std. Error Beta t Sig.
(Constant) 96.38 10.07 .00 9.57 .000
Years of Experience .96 .55 .67 1.74 .102
Level of Education -5.79 3.93 -.35 -1.47 .162
# Posito -1.84 1.41 -.49 -1.30 .212

a. Of the three predictors, years of experience is the best (it has the largest standardized
coefficient, Beta = .67), but because none of them is significant (the lowest p-value is .102),
they are all about equally good or bad.

b. The regression equation is:

Y’ = 0.96 (X1) – 5.79 (X2) – 1.84 (X3) + 96.38

Y’ = 0.96(12) – 5.79(2) -1.84(5) + 96.38 = 87.12
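The same substitution can be written as a small function. This sketch assumes X1, X2, and X3 are years of experience, level of education, and number of positions, in the order of the coefficients table; the dictionary keys are hypothetical labels.

```python
INTERCEPT = 96.38
COEFFICIENTS = {
    "years_of_experience": 0.96,   # X1
    "level_of_education": -5.79,   # X2
    "number_of_positions": -1.84,  # X3
}

def predict_score(values):
    """Predicted Great Chef Test score from the unstandardized coefficients."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)

chef = {"years_of_experience": 12, "level_of_education": 2, "number_of_positions": 5}
score = predict_score(chef)
print(f"predicted score: {score:.2f}")
```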


Part 2

1. Is the fixed acidity of a wine a valid predictor of the alcohol content? Please explain
how you came to that conclusion.
DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= alcohol fixedacidity
/STATISTICS=MEAN STDDEV.
Valid cases = 1599; cases with missing value(s) = 0.
Variable N Mean Std Dev
alcohol 1599 10.42 1.07
fixedacidity 1599 8.32 1.74

CORRELATIONS
CORRELATION
/VARIABLES = alcohol fixedacidity
/PRINT = ONETAIL SIG.
Correlations
alcohol fixedacidity
alcohol Pearson Correlation 1.00 -.06
Sig. (1-tailed) .007
N 1599 1599
fixedacidity Pearson Correlation -.06 1.00
Sig. (1-tailed) .007
N 1599 1599

REGRESSION
REGRESSION
/VARIABLES= fixedacidity
/DEPENDENT= alcohol
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (alcohol)
R R Square Adjusted R Square Std. Error of the Estimate
.06 .00 .00 1.06
ANOVA (alcohol)
Sum of Squares df Mean Square F Sig.
Regression 6.90 1 6.90 6.10 .014
Residual 1807.86 1597 1.13
Total 1814.76 1598
Coefficients (alcohol)
Unstandardized Coefficients Standardized Coefficients
B Std. Error Beta t Sig.
(Constant) 10.74 .13 .00 82.63 .000
fixedacidity -.04 .02 -.06 -2.47 .014

Fixed acidity of wine is a valid predictor of the alcohol content because the significance value
is .014 for both the ANOVA and the coefficient, which is lower than the .05 alpha level. Even
though it is valid, it is not a great predictor, since the correlation coefficient is extremely
weak (r = -.062).
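To put a number on how weak the relationship is, the shared variance can be computed two ways: by squaring r, or as the regression sum of squares over the total sum of squares from the ANOVA table. A minimal sketch using only the values reported above:

```python
r = -0.062              # Pearson correlation between alcohol and fixed acidity
ss_regression = 6.90    # from the ANOVA table
ss_total = 1814.76

r_squared_from_r = r ** 2
r_squared_from_anova = ss_regression / ss_total

print(f"r squared:              {r_squared_from_r:.4f}")
print(f"SS regression / total:  {r_squared_from_anova:.4f}")
```

Both routes give well under 1% of the variance in alcohol content, which is consistent with the R Square of .00 in the Model Summary.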

2. Construct a scatter plot with a best fit regression line for the analysis above

GRAPH
/SCATTERPLOT(BIVAR)=fixedacidity WITH alcohol
/MISSING=LISTWISE.

I used SPSS to do this; PSPP cannot do this.

[Graph: scatter plot of alcohol against fixed acidity with best-fit regression line]

3. What is the predicted alcohol content of a wine that has a fixed acidity of 6.1?

I used the regression formula from above and plugged in 6.1 for x:

Y’ = 10.74 - .04(6.1)

Y’ = 10.74 - .244

Y’ = 10.496 (This is the predicted alcohol content)

4. Determine if fixed acidity and freesulfurdioxide levels are a valid predictor of alcohol
content.

DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= alcohol fixedacidity freesulfurdioxide
/STATISTICS=MEAN STDDEV.
Valid cases = 1599; cases with missing value(s) = 0.
Variable N Mean Std Dev
alcohol 1599 10.42 1.07
fixedacidity 1599 8.32 1.74
freesulfurdioxide 1599 15.87 10.46

CORRELATIONS
CORRELATION
/VARIABLES = alcohol fixedacidity freesulfurdioxide
/PRINT = ONETAIL SIG.
Correlations
alcohol fixedacidity freesulfurdioxide
alcohol Pearson Correlation 1.00 -.06 -.07
Sig. (1-tailed) .007 .003
N 1599 1599 1599
fixedacidity Pearson Correlation -.06 1.00 -.15
Sig. (1-tailed) .007 .000
N 1599 1599 1599
freesulfurdioxide Pearson Correlation -.07 -.15 1.00
Sig. (1-tailed) .003 .000
N 1599 1599 1599
REGRESSION
REGRESSION
/VARIABLES= fixedacidity freesulfurdioxide
/DEPENDENT= alcohol
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (alcohol)
R R Square Adjusted R Square Std. Error of the Estimate
.10 .01 .01 1.06
ANOVA (alcohol)
Sum of Squares df Mean Square F Sig.
Regression 18.47 2 9.24 8.21 .000
Residual 1796.29 1596 1.13
Total 1814.76 1598
Coefficients (alcohol)
Unstandardized Coefficients Standardized Coefficients
B Std. Error Beta t Sig.
(Constant) 10.93 .14 .00 76.45 .000
fixedacidity -.05 .02 -.07 -2.94 .003
freesulfurdioxide -.01 .00 -.08 -3.21 .001

Fixed acidity and free sulfur dioxide are both valid predictors of alcohol content because their
p-values are .003 and .001, both lower than the .05 alpha level. This means that each is
significantly related to alcohol content. The ANOVA agrees: its significance value is .000, which
is also lower than the .05 alpha level. However, both correlation coefficients are extremely low
(r = -.062 and r = -.069), which signifies that the variables are very weakly correlated with
alcohol content.


5. Did adding freesulfurdioxide to the prediction model with fixed acidity improve or not
improve the prediction model? Please explain how you made this determination.

Adding free sulfur dioxide to the prediction model with fixed acidity improved the prediction
model. As noted in question 4, both variables are significantly related to alcohol content, so we
can plug both into a regression equation to predict alcohol content. The ANOVA significance value
was .014 with fixed acidity alone and dropped to .000 after adding free sulfur dioxide, meaning
that there is an even lower chance of making a Type I error. Adding free sulfur dioxide also lets
us account for more shared variance: R Square rose from .00 to .01, although the improvement is
small.

Y’ = -.05x - .01z + 10.93

Where Y’ = predicted alcohol content, x = fixed acidity, z = free sulfur dioxide

6. For a wine that has an alcohol level of 9.1 and fixed acidity of 5.0, what
would be the anticipated freesulfurdioxide level of the wine?

9.1 = -.05(5.0) - .01z + 10.93

9.1 = -.25 - .01z + 10.93

9.1 = -.01z +10.68

-1.58 = -.01z

z = 158 (anticipated free sulfur dioxide level)

• PSPP rounds to 2 decimal places
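The sketch below repeats the algebra and then illustrates how much the two-decimal rounding could matter here. It assumes the printed slope of -.01 could stand for any value between -.015 and -.005, and it holds the other rounded values fixed; the function name is illustrative.

```python
def solve_free_so2(alcohol, fixed_acidity, b_acidity, b_so2, intercept):
    """Rearrange alcohol = b_acidity*x + b_so2*z + intercept to solve for z."""
    return (alcohol - intercept - b_acidity * fixed_acidity) / b_so2

# Value obtained with the coefficients exactly as PSPP printed them.
reported = solve_free_so2(9.1, 5.0, b_acidity=-0.05, b_so2=-0.01, intercept=10.93)

# Assumed rounding bounds for the free sulfur dioxide slope only.
steepest = solve_free_so2(9.1, 5.0, b_acidity=-0.05, b_so2=-0.015, intercept=10.93)
shallowest = solve_free_so2(9.1, 5.0, b_acidity=-0.05, b_so2=-0.005, intercept=10.93)

print(f"z with printed slope (-.01): {reported:.1f}")
print(f"z if slope were -.015:       {steepest:.1f}")
print(f"z if slope were -.005:       {shallowest:.1f}")
```

Because the slope sits in the denominator and is so close to zero, small rounding differences move the answer by a large amount.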

This value is far higher than the typical free sulfur dioxide levels in the data (mean = 15.87,
SD = 10.46). Therefore, I ran another multiple regression in PSPP with free sulfur dioxide as the
dependent variable:
REGRESSION
REGRESSION
/VARIABLES= alcohol fixedacidity
/DEPENDENT= freesulfurdioxide
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (freesulfurdioxide)
R R Square Adjusted R Square Std. Error of the Estimate
.17 .03 .03 10.31
ANOVA (freesulfurdioxide)
Sum of Squares df Mean Square F Sig.
Regression 5227.94 2 2613.97 24.60 .000
Residual 169617.04 1596 106.28
Total 174844.98 1598
Coefficients (freesulfurdioxide)
Unstandardized Coefficients Standardized Coefficients
B Std. Error Beta t Sig.
(Constant) 31.91 2.89 .00 11.03 .000
alcohol -.78 .24 -.08 -3.21 .001
fixedacidity -.95 .15 -.16 -6.42 .000

Y’ = -.953x -.777z + 31.908, where x = fixed acidity, z = alcohol, and Y’ = anticipated/predicted

free sulfur dioxide level

Y’ = -.95(5.0) - .78(9.1) + 31.91

Y’ = -4.75 – 7.098 + 31.91

Y’ = 20.062 (This is the final anticipated free sulfur dioxide level)


7. In terms of the two variables above, which variable contributes more to the
predictive variability in alcohol content? Please explain how you came to this
conclusion.

Of the two variables above, the one that contributes more to the predictive variability in
alcohol content is free sulfur dioxide. Its standardized coefficient has the larger magnitude
(Beta = -.08 versus -.07 for fixed acidity), and its p-value is lower (.001 versus .003). The
slope cannot be used to determine which variable is the better predictor because the variables
are measured on different scales, so we must look at the magnitudes of the Standardized
Coefficients (Beta).
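The comparison of Beta magnitudes can be sketched as a simple ranking. The values come from the Coefficients (alcohol) table in question 4; note that they are rounded to two decimals, so a gap this small should be read with caution.

```python
# Standardized coefficients (Beta) from the two-predictor alcohol model.
betas = {"fixedacidity": -0.07, "freesulfurdioxide": -0.08}

# Rank predictors by absolute Beta, largest first; the sign only gives direction.
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

for name in ranked:
    print(f"{name:18s} |Beta| = {abs(betas[name]):.2f}")
```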


8. Run a regression analysis using all the variables to determine if the variables combined are a
valid predictor of wine quality. Please explain how you came to that conclusion and report the
shared variance explained by the prediction model.

REGRESSION
REGRESSION
/VARIABLES= alcohol sulphates pH density totalsulfurdioxide
freesulfurdioxide chlorides residualsugar citricacid volatileacidity
fixedacidity
/DEPENDENT= quality
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (quality)

R R Square Adjusted R Square Std. Error of the Estimate

.60 .36 .36 .65

ANOVA (quality)

Sum of Squares df Mean Square F Sig.

Regression 375.75 11 34.16 81.35 .000

Residual 666.41 1587 .42

Total 1042.17 1598

Coefficients (quality)

Unstandardized Coefficients Standardized Coefficients

B Std. Error Beta t Sig.

(Constant) 21.97 21.19 .00 1.04 .300

alcohol .28 .03 .36 10.43 .000

sulphates .92 .11 .19 8.01 .000

pH -.41 .19 -.08 -2.16 .031

density -17.88 21.63 -.04 -.83 .409

totalsulfurdioxide .00 .00 -.13 -4.48 .000


freesulfurdioxide .00 .00 .06 2.01 .045

chlorides -1.87 .42 -.11 -4.47 .000

residualsugar .02 .02 .03 1.09 .276

citricacid -.18 .15 -.04 -1.24 .215

volatileacidity -1.08 .12 -.24 -8.95 .000

fixedacidity .02 .03 .05 .96 .336

The variables combined are a valid predictor of wine quality because the significance value
obtained from the ANOVA is .000, which is lower than the .05 alpha level. Not every variable
contributes, however: some, such as citric acid and fixed acidity, are not significant predictors
of wine quality, while others, such as chlorides and volatile acidity, are significant. The shared
variance explained by the prediction model is .36, or 36%.
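The Model Summary and ANOVA figures can be cross-checked from the sums of squares. This is a minimal sketch using the standard formulas for R Square, Adjusted R Square, and F, with the values printed in the ANOVA (quality) table; small discrepancies are expected because PSPP's output is rounded.

```python
ss_regression, df_regression = 375.75, 11
ss_residual, df_residual = 666.41, 1587
ss_total, df_total = 1042.17, 1598

# Share of variance in quality explained by all 11 predictors.
r_squared = ss_regression / ss_total

# Adjusted R Square penalizes the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual

# F is the ratio of the regression and residual mean squares.
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R Square:          {r_squared:.2f}")
print(f"Adjusted R Square: {adjusted_r_squared:.2f}")
print(f"F:                 {f_statistic:.2f}")
```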

9. What are the two variables from the analysis directly above that contribute the most weight
to the prediction model? How did you come to that conclusion?

The two variables from the analysis directly above that contributed the most weight to the
prediction model are density and chlorides because they have the greatest slope magnitudes, with
values of -17.88 and -1.87, respectively. This means that a one-unit change in density or
chlorides produces the largest change in the predicted wine quality.

10. Run another regression analysis like #8 but only use the two variables that you selected in
#9. Based on your analysis, do you think any of the omitted variables combined helped or
hurt the predictive validity of predicting wine quality? Please explain how you came to that
conclusion.

DESCRIPTIVES
DESCRIPTIVES
/VARIABLES= quality density chlorides.
Valid cases = 1599; cases with missing value(s) = 0.

Variable N Mean Std Dev Minimum Maximum

quality 1599 5.64 .81 3.00 8.00

density 1599 1.00 .00 .99 1.00

chlorides 1599 .09 .05 .01 .61


CORRELATIONS
CORRELATION
/VARIABLES = density quality chlorides
/PRINT = TWOTAIL SIG.
Correlations

density quality chlorides

density Pearson Correlation 1.00 -.17 .20

Sig. (2-tailed) .000 .000

N 1599 1599 1599

quality Pearson Correlation -.17 1.00 -.13

Sig. (2-tailed) .000 .000

N 1599 1599 1599

chlorides Pearson Correlation .20 -.13 1.00

Sig. (2-tailed) .000 .000

N 1599 1599 1599

REGRESSION
REGRESSION
/VARIABLES= chlorides density
/DEPENDENT= quality
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Model Summary (quality)

R R Square Adjusted R Square Std. Error of the Estimate

.20 .04 .04 .79

ANOVA (quality)

Sum of Squares df Mean Square F Sig.

Regression 41.44 2 20.72 33.05 .000

Residual 1000.72 1596 .63


Total 1042.17 1598

Coefficients (quality)

Unstandardized Coefficients Standardized Coefficients

B Std. Error Beta t Sig.

(Constant) 72.02 10.67 .00 6.75 .000

chlorides -1.68 .43 -.10 -3.90 .000

density -66.45 10.71 -.16 -6.20 .000

I think that omitting the other variables did not help or hurt the predictive validity of
predicting wine quality because in both ANOVAs the significance level is .000. Density is now a
much bigger predictor of wine quality, with a slope of -66.45 versus -17.88 in the full model
from question 8, meaning that density has more effect when predicting wine quality once the
other variables are omitted.
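Another way to compare the two models, alongside the ANOVA significance values, is the share of variance each one explains. The sketch below uses only the regression and total sums of squares printed in the two ANOVA (quality) tables; the dictionary labels are illustrative.

```python
SS_TOTAL = 1042.17  # same total in both ANOVA (quality) tables

ss_regression = {
    "all 11 predictors (question 8)": 375.75,
    "chlorides + density (question 10)": 41.44,
}

# R Square for each model: regression sum of squares over total sum of squares.
r_squared = {name: ss / SS_TOTAL for name, ss in ss_regression.items()}

for name, value in r_squared.items():
    print(f"{name:36s} R Square = {value:.2f}")
```

These values agree with the R Square entries in the two Model Summary tables (.36 and .04).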
