Inferential Statistics Project1

inferential_statistics_project1
July 3, 2017
1 What is the True Normal Human Body Temperature?

Background The mean normal body temperature was held to be 37 C or 98.6 F for more than
120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868
book. But, is this value statistically correct?
In [36]: # libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pylab
# colors
green = '#7fc97f'
purple = '#beaed4'
orange = '#fdc086'
yellow = '#ffff99'
blue = '#386cb0'
df = pd.read_csv('data/human_body_temperature.csv')
In [16]: df.head()
Out[16]:    temperature gender  heart_rate
0         99.3      F        68.0
1         98.4      F        81.0
2         97.8      M        73.0
3         99.2      F        66.0
4         98.0      F        73.0
1.0.1 1. Check Normality

In [33]: # histogram
plt.hist(df.temperature, color = blue)
plt.xlabel("Body Temperature")
plt.title("Histogram of Body Temperature")
plt.show()
plt.clf()
According to the histogram, the distribution is approximately normal, with a slight skew to the
right. We can check this further with a Q-Q plot.
In [40]: # use qqplot to check normality

stats.probplot(df.temperature, dist="norm", plot=pylab)
pylab.title("QQplot of Body Temperature")
pylab.show()
According to the Q-Q plot, the points follow the reference line closely, so the distribution can be
treated as approximately normal.
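A numerical test can complement the two plots. The following is a minimal sketch using the Shapiro-Wilk test from scipy. It runs on a synthetic sample (an assumption, drawn with roughly the mean and standard deviation of this data set) so that it is self-contained; in the notebook, `df.temperature` would be passed instead.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for df.temperature (assumption: not the real data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=98.25, scale=0.73, size=130)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution.
statistic, p_value = stats.shapiro(sample)
print("W statistic:", statistic)
print("p-value:", p_value)

# A p-value above the chosen significance level means normality is not rejected.
alpha = 0.05
print("Reject normality?", p_value < alpha)
```

A large p-value does not prove normality; it only means the test found no evidence against it.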
1.0.2 2. Is the sample size large? Are the observations independent?

In [47]: # test sample size
len(df.temperature)
Out[47]: 130
Since the sample size is 130 > 30, we conclude that it is large enough.
We treat the observations as independent, assuming each row is a measurement from a different person.
1.0.3 3. Is the true population mean really 98.6 degrees F?

H0 : The true population mean is 98.6 degrees F. (μ = 98.6)
We will use a two-tailed test because the alternative hypothesis is "not equal to 98.6" rather than
"greater than" or "smaller than".
Both the t test and the Z test work for a large dataset. In this case, since we do not know the
variance of the population, we will use the t test.
In [51]: sample_mean = df.temperature.mean()

sample_std = df.temperature.std()
[sample_mean, sample_std]
Out[51]: [98.24923076923078, 0.7331831580389454]
In [104]: # ttest_1samp returns both the t statistic and the p-value
result = stats.ttest_1samp(df.temperature, popmean = 98.6)
result
Out[104]: Ttest_1sampResult(statistic=-5.4548232923645195, pvalue=2.410632041556127
The t statistic is -5.45 with 129 degrees of freedom, so the p-value is far below 0.05. We reject the
null hypothesis and conclude that the true population mean is unlikely to be 98.6.
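The t statistic can be reproduced by hand from the summary statistics in Out[51]. This is a minimal sketch: the variable names `mu0`, `standard_error`, `t_stat` and `p_two_sided` are illustrative and not part of the original notebook.

```python
import numpy as np
import scipy.stats as stats

# Summary statistics copied from Out[51] and Out[47].
sample_mean = 98.24923076923078
sample_std = 0.7331831580389454
n = 130
mu0 = 98.6  # hypothesized population mean under H0

# t = (sample mean - hypothesized mean) / standard error of the mean
standard_error = sample_std / np.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom.
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print("t statistic:", t_stat)
print("two-tailed p-value:", p_two_sided)
```

The t statistic agrees with the value reported by `stats.ttest_1samp` in Out[104].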
1.0.4 4. At what temperature should we consider someone's temperature to be abnormal?

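If we use the 95% confidence interval from the t distribution, temperatures < 98.122 or > 98.376 would be considered abnormal. Note that this interval measures the uncertainty of the population mean, so a range meant to cover individual readings would be wider.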
In [113]: CI = stats.t.interval(0.95, len(df.temperature)-1, sample_mean, stats.sem

[round(item, 3) for item in CI]
Out[113]: [98.122, 98.376000000000005]
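The interval in Out[113] can be rebuilt step by step as mean ± critical t value × standard error. The sketch below does this from the summary statistics in Out[51]. For contrast, it also computes a rough range for individual readings (mean ± 1.96 standard deviations, a normal approximation); that second range is an addition of this sketch and not part of the original analysis.

```python
import numpy as np
import scipy.stats as stats

# Summary statistics copied from Out[51] and Out[47].
sample_mean = 98.24923076923078
sample_std = 0.7331831580389454
n = 130

# 95% confidence interval for the population mean.
standard_error = sample_std / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed critical value
margin = t_crit * standard_error
ci = (sample_mean - margin, sample_mean + margin)
print("95% CI for the mean:", [round(x, 3) for x in ci])

# Assumption of this sketch: an approximate 95% range for individual
# readings, treating temperatures as normally distributed.
individual_range = (sample_mean - 1.96 * sample_std,
                    sample_mean + 1.96 * sample_std)
print("Approximate range for individuals:",
      [round(x, 3) for x in individual_range])
```

The first interval shrinks as the sample grows, while the second reflects the spread between people and does not.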
1.0.5 5. Is there a significant difference between males and females in normal temperature?
We can test this using:
* Overlap
* Probability of superiority
* Pooled variance
In [71]: female_temp = df.temperature[df.gender == "F"]

male_temp = df.temperature[df.gender == "M"]
female_mean = female_temp.mean()
male_mean = male_temp.mean()
female_var = female_temp.var()
male_var = male_temp.var()
n1 = len(female_temp)
n2 = len(male_temp)
print("Female: (mean, var, len)", female_mean, female_var, n1)
print("Male: (mean, var, len)", male_mean, male_var, n2)
Female: (mean, var, len) 98.39384615384613 0.5527740384615375 65

Male: (mean, var, len) 98.1046153846154 0.488259615384615 65
In [86]: # overlaid histograms of the two groups
plt.hist(female_temp, alpha = 0.5, label = "Female")
plt.hist(male_temp, alpha = 0.5, label = "Male")
plt.legend()
plt.show()
Overlap
In [79]: threshold = (female_mean * n1 + male_mean * n2)/(n1 + n2) # need need to w
threshold
Out[79]: 98.24923076923076
In [88]: overlap_rate = sum(female_temp < threshold)/n1 + sum(male_temp > threshold

overlap_rate
misclassification_rate = overlap_rate / 2
misclassification_rate
Out[88]: 0.42307692307692313
The misclassification rate is high (about 42%, where 50% is what random guessing would give), which
means the 2 distributions overlap heavily and there is not much difference between them.
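The overlap calculation can be packaged as a small function. This is a sketch under assumptions: the In [88] line is cut off in this export, so dividing the second count by the second group's size is a guess at its missing end, and the function name and sample values are hypothetical.

```python
import numpy as np

def misclassification_rate(group_hi, group_lo):
    """Share of points on the 'wrong' side of a single threshold.

    The threshold is the size-weighted mean of the two group means,
    as in In [79]. group_hi is the group with the higher mean.
    """
    group_hi = np.asarray(group_hi, dtype=float)
    group_lo = np.asarray(group_lo, dtype=float)
    n_hi, n_lo = len(group_hi), len(group_lo)
    threshold = (group_hi.mean() * n_hi + group_lo.mean() * n_lo) / (n_hi + n_lo)
    # Assumed completion of the truncated line: each count is divided
    # by its own group size.
    overlap_rate = (np.sum(group_hi < threshold) / n_hi
                    + np.sum(group_lo > threshold) / n_lo)
    return overlap_rate / 2

# Hypothetical values, not taken from the data set.
higher = [98.8, 98.4, 98.0, 98.6]
lower = [98.0, 98.2, 97.8, 98.6]
rate = misclassification_rate(higher, lower)
print(rate)
```

In the notebook this would be called as `misclassification_rate(female_temp, male_temp)`.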
Probability of Superiority
In [96]: new_female_temp = np.random.choice(female_temp, n1, replace=True)
new_male_temp = np.random.choice(male_temp, n2, replace=True)
sum(x > y for x,y in zip(new_female_temp, new_male_temp))/n1
Out[96]: 0.53846153846153844
Since the probability of superiority (about 54%) is close to 50% and far from a high value such as
90%, we cannot tell that there is a difference between the body temperature of females and males.
However, a formal test using the pooled variance is still needed to settle this rigorously.
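The cell above pairs one random resample of each group, so its result changes from run to run. A deterministic alternative is to compare every value in one group with every value in the other. This is a sketch of that variant, not the notebook's method; the function name and sample values are hypothetical.

```python
import numpy as np

def probability_of_superiority(a, b):
    """Fraction of all (x, y) pairs, x from a and y from b, with x > y.

    Ties are counted as 'not superior' in this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Broadcasting builds a len(a) x len(b) table of pairwise comparisons.
    return np.mean(a[:, None] > b[None, :])

# Hypothetical values, not taken from the data set.
group_a = [98.8, 98.4, 98.0, 98.6]
group_b = [98.0, 98.2, 97.8, 98.6]
ps = probability_of_superiority(group_a, group_b)
print(ps)
```

In the notebook this would be called as `probability_of_superiority(female_temp, male_temp)`. A value near 0.5 means neither group tends to be higher.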
Pooled Variance
H0 : The difference between the means of the 2 distributions is 0.
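In [114]: diff = female_mean - male_mean
# pooled variance, weighted by degrees of freedom
pooled_var = ((n1 - 1) * female_var + (n2 - 1) * male_var)/(n1 + n2 - 2)
# standardized mean difference (Cohen's d): an effect size, not a p-value
effect_size = diff/np.sqrt(pooled_var)
effect_size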
Out[114]: 0.40089173785982207
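The value 0.40 is the difference in means divided by the pooled standard deviation, which is a
standardized effect size and not a p-value, so it cannot be compared with the significance level 0.05.
Converting it to a t statistic gives t = 0.40 / sqrt(1/65 + 1/65) ≈ 2.29 with 128 degrees of freedom,
which exceeds the two-tailed critical value of about 1.98 at the 0.05 level. We therefore reject H0:
the difference between the mean body temperature of females and males is statistically significant,
although the effect is small, which agrees with the heavy overlap seen above.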
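The pooled-variance comparison can be carried through to a t statistic and a p-value. The sketch below does this from the summary statistics printed after In [71], then cross-checks the result with `scipy.stats.ttest_ind_from_stats`. The variable names `t_stat`, `p_value` and `check` are illustrative.

```python
import numpy as np
import scipy.stats as stats

# Summary statistics copied from the output of In [71].
female_mean, female_var, n1 = 98.39384615384613, 0.5527740384615375, 65
male_mean, male_var, n2 = 98.1046153846154, 0.488259615384615, 65

# Pooled variance, weighted by degrees of freedom.
pooled_var = ((n1 - 1) * female_var + (n2 - 1) * male_var) / (n1 + n2 - 2)

# Two-sample t statistic: difference in means over its standard error.
t_stat = (female_mean - male_mean) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-tailed p-value with n1 + n2 - 2 degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)
print("t statistic:", t_stat)
print("p-value:", p_value)

# Cross-check with scipy's equal-variance test on summary statistics.
check = stats.ttest_ind_from_stats(
    female_mean, np.sqrt(female_var), n1,
    male_mean, np.sqrt(male_var), n2,
    equal_var=True,
)
print(check)
```

The p-value from this test, not the standardized difference, is the quantity to compare with the significance level.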
