INFERENCE IN REGRESSION COEFFICIENTS

- tests whether $\beta_i \neq 0$; $i = 1, 2, 3, \ldots, k$
SIMPLE LINEAR REGRESSION
- used to estimate the dependent variable Y for a given value of the independent variable X.
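$Y = a + bX + \varepsilon$ or
$Y = \beta_0 + \beta_1 X + \varepsilon$; where the least-squares estimates are

$b = \hat{\beta}_1 = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$

and

$a = \hat{\beta}_0 = \bar{Y} - b\bar{X}$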
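inference on $\beta_1$ may be performed to determine if it is significantly different from zero ($\beta_1 \neq 0$), using

$t = \dfrac{b}{SE(b)}$ with $n - 2$ degrees of freedom, where $b$ is the estimated slope

a linear relationship (linearity) exists between Y and X if the p-value of $\beta_1$ (using the t-test) < $\alpha$.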
$R^2$ is the proportion of the total variance ($s^2$) of Y that can be explained by the linear regression of Y on X.
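As a rough check of the slope t-test and $R^2$ described above, the following Python sketch computes both by hand for the ten HCT/RBC pairs of the example in this section. The function name slope_t_test is made up for illustration, and 2.306 is the tabulated two-sided critical value of t at $\alpha$ = 0.05 with n - 2 = 8 degrees of freedom; statistical software would report an exact p-value instead.

```python
import math

# HCT (% vol) and RBC (x 10^12/L) of the ten patients in this section's example
hct = [40.7, 40.3, 40.9, 38.7, 38.2, 39.4, 38.0, 38.2, 43.4, 38.3]
rbc = [4.4, 4.3, 4.4, 4.1, 4.1, 4.2, 4.1, 4.0, 4.6, 4.1]


def slope_t_test(x, y):
    """Return (b0, b1, t statistic for Ho: beta1 = 0, R^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = s_xy / s_xx                          # estimated slope
    b0 = y_bar - b1 * x_bar                   # estimated intercept
    sse = s_yy - b1 * s_xy                    # residual (error) sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)   # standard error of the slope
    t_stat = b1 / se_b1
    r_squared = 1 - sse / s_yy
    return b0, b1, t_stat, r_squared


b0, b1, t_stat, r_squared = slope_t_test(hct, rbc)
T_CRIT = 2.306  # tabulated critical value: two-sided, alpha = 0.05, df = 8
print(f"t = {t_stat:.2f}, R^2 = {r_squared:.3f}, reject Ho: {abs(t_stat) > T_CRIT}")
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with $\alpha$.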

Example:
Using the data in the file HCTRBC.sav, find the linear
regression model that estimates the RBC (Y, in $\times 10^{12}$/L), given the
hematocrit (X, in % vol) of a patient.
Find $Y = \beta_0 + \beta_1 X + \varepsilon$.

Estimate the RBC of a patient with a hematocrit of 43.2%.

Find the residual of the simple linear regression model for a patient with an HCT of 40.7%.

ID      HCT (% vol)   RBC ($\times 10^{12}$/L)   $X^2$       $Y^2$     $XY$
        X             Y
1       40.7          4.4                        1656.49     19.36     179.08
2       40.3          4.3                        1624.09     18.49     173.29
3       40.9          4.4                        1672.81     19.36     179.96
4       38.7          4.1                        1497.69     16.81     158.67
5       38.2          4.1                        1459.24     16.81     156.62
6       39.4          4.2                        1552.36     17.64     165.48
7       38            4.1                        1444        16.81     155.8
8       38.2          4                          1459.24     16        152.8
9       43.4          4.6                        1883.56     21.16     199.64
10      38.3          4.1                        1466.89     16.81     157.03
SUMS:   396.1         42.3                       15716.37    179.25    1678.37

$\sum X = 396.1$, $\sum Y = 42.3$, $\sum X^2 = 15716.37$, $\sum Y^2 = 179.25$, $\sum XY = 1678.37$
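
One possible way to work the example from the column totals is sketched below in Python. It assumes the usual computational formulas for the least-squares slope and intercept; the variable names and the helper predict_rbc are illustrative only, and the residual is taken as observed minus estimated RBC for patient 1 (HCT 40.7, RBC 4.4).

```python
# Column totals copied from the HCT/RBC table (n = 10 patients)
n = 10
sum_x, sum_y = 396.1, 42.3
sum_x2, sum_xy = 15716.37, 1678.37

# Least-squares slope and intercept from the totals
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n


def predict_rbc(hct):
    """Estimated RBC (x 10^12/L) for a hematocrit given in % vol."""
    return b0 + b1 * hct


estimate = predict_rbc(43.2)
residual = 4.4 - predict_rbc(40.7)  # observed minus estimated, patient 1

print(f"RBC-hat = {b0:.4f} + {b1:.4f} * HCT")
print(f"estimate at HCT 43.2: {estimate:.3f}")
print(f"residual at HCT 40.7: {residual:.3f}")
```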

MULTIPLE LINEAR REGRESSION

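$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$, or in matrix form $Y = X\beta + \varepsilon$

a linear relationship (linearity) exists between Y and $X_i$ if the p-value of $\beta_i$ < $\alpha$, using the individual t-tests in the
coefficients table of the regression output.
- Hypotheses are as follows:
Ho: $\beta_i = 0$.
Ha: $\beta_i \neq 0$.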
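The individual t-tests above can be sketched in Python as follows. This is a minimal illustration assuming NumPy and SciPy are available; the function name ols_t_tests and the data are made up (the outcome depends on x1 but not on x2), so the printed numbers say nothing about the worksheet files.

```python
import numpy as np
from scipy import stats


def ols_t_tests(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares.

    Returns coefficients, standard errors, t statistics and
    two-sided p-values (intercept first).
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    df = n - k - 1                                # residual degrees of freedom
    mse = resid @ resid / df
    cov = mse * np.linalg.inv(design.T @ design)  # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), df)
    return beta, se, t, p


# Made-up data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

beta, se, t, p = ols_t_tests(X, y)
alpha = 0.05
for name, b, ti, pi in zip(["intercept", "x1", "x2"], beta, t, p):
    print(f"{name}: b = {b:.3f}, t = {ti:.2f}, p = {pi:.4f}, reject Ho: {pi < alpha}")
```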
Diagnostic checking of the linear regression model may be applied by checking if:
- the residuals are normally distributed (Kolmogorov-Smirnov Test of Normality)
  Ho: The residuals are normally distributed.
  Ha: The residuals are not normally distributed.
- the residuals have constant variance (by using Levene's test or Bartlett's test)
  Ho: The variances are equal.
  Ha: The variances are not equal.
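A hedged Python sketch of these diagnostic checks, assuming SciPy is available. Two choices here are assumptions rather than part of the notes: the residuals are standardized with their own mean and standard deviation before the Kolmogorov-Smirnov test (which makes the plain K-S p-value conservative; some packages apply the Lilliefors correction instead), and the equal-variance tests compare residuals split at the median fitted value. The function name and the data are made up.

```python
import numpy as np
from scipy import stats


def residual_diagnostics(residuals, fitted):
    """Return p-values of the K-S, Levene and Bartlett tests on residuals."""
    residuals = np.asarray(residuals, dtype=float)
    fitted = np.asarray(fitted, dtype=float)

    # Normality: K-S test of the standardized residuals against N(0, 1)
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    ks = stats.kstest(z, "norm")

    # Constant variance: compare residual spread for low vs high fitted values
    # (assumed grouping rule: split at the median fitted value)
    cut = np.median(fitted)
    low, high = residuals[fitted <= cut], residuals[fitted > cut]
    lev = stats.levene(low, high)
    bart = stats.bartlett(low, high)

    return {"ks": ks.pvalue, "levene": lev.pvalue, "bartlett": bart.pvalue}


# Made-up residuals and fitted values, for illustration only
rng = np.random.default_rng(1)
fitted = rng.uniform(60, 75, size=40)
residuals = rng.normal(scale=2.0, size=40)

results = residual_diagnostics(residuals, fitted)
for test, p in results.items():
    print(f"{test}: p = {p:.3f}, reject Ho: {p < 0.05}")
```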
Examples:
1. A researcher wants to determine which among the variables (mother's height, father's height, and taller grandfather's height)
determine a son's height (expressed in inches). The data are in heights.sav. Test all hypotheses at $\alpha$ = 0.05.

Linear Regression Results:

$R^2$ = _________________
Do the linear regression results show that at
least one of the coefficients differs significantly
from zero?
Ho: _______________________________
Ha: _______________________________
Test statistic: _______ p-value: ________
Conclusion: ________________________
Which of the variables' coefficients significantly
differ from zero?
Mother's height:
Ho: ____________________________
Ha: ____________________________
Regression coefficient: ____________
Test statistic: _______ p-value: ____
Father's height:
Ho: ____________________________
Ha: ____________________________
Regression coefficient: ____________
Test statistic: _______ p-value: ____
Taller grandfather's height:
Ho: ____________________________
Ha: ____________________________
Regression coefficient: ____________
Test statistic: _______ p-value: ____
Are the residuals normally distributed?
Ho: ____________________________
Ha: ____________________________
Test statistic: _______ p-value: ____

Summary of the Findings:

_________________________________________________________
_________________________________________________________
_________________________________________________________
_________________________________________________________
2. (bloodlead.sav) A group of researchers wanted to determine the factors that contribute to the blood lead level
(in µg/dL) of radiator repair workers. Data such as number of radiators repaired per day, years of employment, and renal
function tests [FBS (in mmol/L), creatinine (in µmol/L), crea (in mg/dL), BUN (in mmol/L), presence of protein in urine,
and eGFR (in mL/min/1.73 m²)] were gathered. Fit a multiple linear regression model to determine the factors that
contribute to the blood lead level of radiator repair workers. Use a 5% level of significance.
Linear Regression Results:
$R^2$ = _________________
Regression equation: ________________________________________________________________________________
Do the linear regression results show that at least one of the coefficients differs significantly from zero?

Ho: _______________________________________________________________________________________________

Ha: _______________________________________________________________________________________________

Test statistic: _______ p-value: ________

Conclusion: ________________________________________________________________________________________
Which of the variables' coefficients significantly differ from zero?
Number of radiators repaired per day
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Years of employment (yrs)

Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Renal function tests

FBS (in mmol/L)
Ho: ______________________________________ Ha: ___________________________________________

Regression coefficient: ____________

Test statistic: _______ p-value: _______

Conclusion: ________________________________________________________________________________
Creatinine (in µmol/L)
Ho: ______________________________________ Ha: ___________________________________________

Regression coefficient: ____________

Test statistic: _______ p-value: _______

Conclusion: ________________________________________________________________________________
Crea (in mg/dL)
Ho: ______________________________________ Ha: ___________________________________________

Regression coefficient: ____________

Test statistic: _______ p-value: _______

Conclusion: ________________________________________________________________________________
BUN (in mmol/L)
Ho: ______________________________________ Ha: ___________________________________________

Regression coefficient: ____________

Test statistic: _______ p-value: _______

Conclusion: ________________________________________________________________________________

Presence or Absence of Protein
Ho: ______________________________________ Ha: ___________________________________________

Regression coefficient: ____________

Test statistic: _______ p-value: _______

Conclusion: ________________________________________________________________________________

eGFR (in mL/min/1.73 m²)
Ho: ______________________________________ Ha: ___________________________________________

Regression coefficient: ____________

Test statistic: _______ p-value: _______

Conclusion: _____________________________________________________________________________

Are the residuals normally distributed?

Ho: ______________________________________ Ha: ___________________________________________
Test statistic: _______ p-value: _______
Conclusion: _____________________________________________________________________________

Summarize your findings using the table below:

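Variables                                Coefficient    t stat     p-value
Number of radiators repaired per day     ___________    _______    _______
Years of employment                      ___________    _______    _______
Renal function tests
  FBS (mmol/L)                           ___________    _______    _______
  Creatinine (µmol/L)                    ___________    _______    _______
  Crea (mg/dL)                           ___________    _______    _______
  BUN (mmol/L)                           ___________    _______    _______
  eGFR (mL/min/1.73 m²)                  ___________    _______    _______
Multiple linear regression $R^2$ = ___________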

__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________

MULTIPLE LOGISTIC REGRESSION

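$\text{logit}(p) = \ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$, where $p = P(Y = 1)$

Consider that

$p = \dfrac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}$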

used when the dependent variable Y is a dichotomous variable and at least one of the independent variables $X_i$,
$i = 1, 2, \ldots, k$, is interval/ratio.
validity of the model may be tested using the Hosmer and Lemeshow test, in which:
Ho: The data fit the model.
Ha: The data do not fit the model.
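
To make the model and its fit test concrete, here is a minimal Python sketch assuming NumPy and SciPy are available. It fits the logistic model by Newton-Raphson, reports Wald z statistics (some packages print the Wald chi-square, which is z squared) and odds ratios $e^{\beta_i}$, then computes a Hosmer and Lemeshow statistic over ten groups of sorted fitted probabilities with groups - 2 degrees of freedom. The function names, the grouping rule, and the data are assumptions for illustration, not the output of any particular package.

```python
import numpy as np
from scipy import stats


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (intercept first).

    Returns coefficients, Wald z statistics, two-sided p-values,
    odds ratios and fitted probabilities.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-design @ beta))
        w = p_hat * (1.0 - p_hat)
        info = design.T @ (design * w[:, None])       # information matrix
        beta = beta + np.linalg.solve(info, design.T @ (y - p_hat))
    p_hat = 1.0 / (1.0 + np.exp(-design @ beta))
    w = p_hat * (1.0 - p_hat)
    cov = np.linalg.inv(design.T @ (design * w[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.norm.sf(np.abs(z))
    return beta, z, p_values, np.exp(beta), p_hat


def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow chi-square and p-value (df = groups - 2)."""
    order = np.argsort(p_hat)
    chi2 = 0.0
    for idx in np.array_split(order, groups):
        observed = y[idx].sum()       # observed events in the group
        expected = p_hat[idx].sum()   # expected events in the group
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / len(idx)))
    return chi2, stats.chi2.sf(chi2, groups - 2)


# Made-up data: one interval/ratio predictor and one dichotomous predictor
rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x1)))
y = (rng.random(n) < true_p).astype(float)

beta, z, p_values, odds_ratios, p_hat = fit_logistic(np.column_stack([x1, x2]), y)
hl_chi2, hl_p = hosmer_lemeshow(y, p_hat)
print("odds ratios:", np.round(odds_ratios, 3))
print(f"Hosmer-Lemeshow chi2 = {hl_chi2:.3f}, p = {hl_p:.3f}")
```

A Hosmer and Lemeshow p-value above $\alpha$ means Ho is not rejected, that is, there is no evidence that the data fail to fit the model.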

Example 1: An oncologist is interested in determining the variables that lead to papillary tumor growth (cancerous cells
found in the throat). Data from 40 patients who may have lived with exposure to radioactive iodine in the last 5 years and
who have had thyroiditis in the last six months are in thyroiditis.sav.
Model Fit Test:
Ho: ________________________
Ha: ________________________
Test Statistic: __________
p-value: ______________
Conclusion: __________________

Which of the variables' coefficients significantly differ from zero?
Nuclear Location (in km)
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Gender
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Hashimoto's Thyroiditis
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Dental or Chest X-ray in the last 2 years
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
High Dosage of X-ray in the last 2 years
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Immediate and second-degree family history of thyroid cancer
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________

Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Summarize your findings using the table below:

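Variables                                    Coefficient    Test stat    p-value    Odds Ratio estimate
Nuclear Location                             ___________    _________    _______    ___________________
Gender                                       ___________    _________    _______    ___________________
Hashimoto's Thyroiditis                      ___________    _________    _______    ___________________
Dental or chest X-ray in the last 2 years    ___________    _________    _______    ___________________
High dosage of X-ray in the last 2 years     ___________    _________    _______    ___________________
Family history                               ___________    _________    _______    ___________________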

The data fit the logistic regression model ($\chi^2$ = 7.013, p = 0.535).

__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________

Example 2: (renalcast.sav) A group of researchers wanted to determine the variables that lead to renal cast formation in
construction workers. Years in the occupation, whether painting is included in the occupation, and urinary findings such as BUN,
uric acid, pH, and presence of bacteria were recorded. Fit a multiple logistic regression model to determine the
variables that lead to renal cast formation in construction workers. Use a 5% level of significance.
Model Fit Test:
Ho: ________________________ Ha: ________________________
Test Statistic: __________ p-value: ______________
Conclusion: __________________
Summarize your findings using the table below:

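Variables              Coefficient    Test stat    p-value    Odds Ratio estimate
Years in Occupation    ___________    _________    _______    ___________________
Painting               ___________    _________    _______    ___________________
BUN                    ___________    _________    _______    ___________________
Uric Acid              ___________    _________    _______    ___________________
pH                     ___________    _________    _______    ___________________
Bacteria               ___________    _________    _______    ___________________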

The data fit the logistic regression model ($\chi^2$ = _________________, p = __________________).

__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
