
Math 1040 Skittles Term Project-Parts 2-6

DeAnna Brewster
Math 1040
Professor Hilton
12/5/2016
Math 1040 Skittles Term Project-Part 2

Candy Color is a Qualitative Variable because it describes the color (a characteristic) of the candy. The level of
measurement is Nominal because color describes a category: there is no natural order, and neither differences
nor ratios make sense.
Number of Candies per bag is a Quantitative-Discrete Variable because it is a numerical measure and is
countable without skipping any values. The level of measurement is Ratio because it is quantitative data, there
is a natural order, differences do make sense and the natural zero means none.
Total number of Skittles Counted = Sample Size = 3551
Summary:

Color     Count   Percent
Red       716     20.16%
Orange    698     19.66%
Yellow    726     20.44%
Green     710     19.99%
Purple    701     19.74%
Total     3551    100%
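The percentages in the summary can be reproduced from the raw counts. The following is a minimal sketch; the dictionary and variable names are my own and are not part of the original project.

```python
# Color counts copied from the class summary.
counts = {"Red": 716, "Orange": 698, "Yellow": 726, "Green": 710, "Purple": 701}

total = sum(counts.values())  # should equal the sample size

# Relative frequency of each color as a percentage, rounded to two decimals.
percents = {color: round(100 * n / total, 2) for color, n in counts.items()}

for color, n in counts.items():
    print(f"{color:<8}{n:>5}{percents[color]:>8.2f}%")
print(f"{'Total':<8}{total:>5}")
```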
Pie Chart

Pareto Chart

Summary: candies per bag, per student in the class.

Summary statistics:
Column: Total Candies Per Bag

n    Mean        Std. dev.   Median  Range  Min  Max  Q1  Q3    IQR  Mode
60   59.183333   3.1108976   59      14     50   64   58  61.5  3.5  59

Fences:
Lower Fence = 58 − 1.5(3.5) = 52.75
Upper Fence = 61.5 + 1.5(3.5) = 66.75
Outliers = 50, 52
The Total Number of Candies from my bag was 61 therefore my bag was not one of the outliers.
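As a quick check of the fence arithmetic, here is a small sketch that applies the 1.5 × IQR rule to the quartiles from the summary statistics. The function name is illustrative only.

```python
q1, q3 = 58, 61.5           # quartiles from the summary statistics
iqr = q3 - q1               # interquartile range, 3.5

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(x):
    """Return True when x lies outside the fences."""
    return x < lower_fence or x > upper_fence

# 50 and 52 are the low values reported as outliers; 61 is my bag; 64 is the maximum.
for value in (50, 52, 61, 64):
    print(value, "outlier" if is_outlier(value) else "not an outlier")
```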
Shape: For the qualitative data, the graphs don't show dispersion (how spread out the data is), and
on a Pareto chart the categories are ordered from highest to lowest frequency. The color counts
look roughly uniform.
For the quantitative data, the mean (59.18) is a little larger than the median (59), which would
suggest a slight right skew. Just by looking at the graphs, however, the data look a little skewed
left, probably because of the two low outliers.

Math 1040 Skittles Term Project-Part 3


Before beginning, I think that the results will show that height cannot be used to predict the
number of candies that will be in a bag of skittles purchased. A person's height has
nothing to do with how the company packages a bag of skittles; they package by weight.
The explanatory variable is Height of Purchaser (X) and the response variable is # of candies that
are in the bag of skittles purchased (Y).

R (correlation coefficient) = 0.17042887


o Critical value for determining whether there is a significant relationship = 0.361
o Since |r| = 0.170 is less than the critical value of 0.361, there is no significant linear
relationship. In the beginning I thought the data would show this, and yes, this is the case.

Regression Equation: ŷ = 0.1287705x + 50.713668


o According to the regression equation, the number of candies in a bag purchased by
someone who is 63.5 inches tall would be 58.9.
o It is really not appropriate to use the regression equation because there is no
significant relationship between the height of the purchaser and the number of candies in the bag.
R-sq = 0.029046. This is the coefficient of determination: 2.9% of the variation in the number of
candies per bag can be explained by the regression line relationship with height of the
purchaser. The stronger the correlation, the higher the r-sq will be. Because the r-sq value
is 0.029046, close to 0, there is very little relationship between the height of
the purchaser and how many candies are in a bag of skittles.
Even if there were a significant relationship between height of the purchaser and candies per
bag, it would be inappropriate to predict the number of candies per bag for a bag
purchased by the retired Houston Rockets player Yao Ming, who is 90 inches tall, because
his height is outside the scope of the collected data. It would be extrapolation.
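The prediction and the coefficient of determination follow directly from the reported slope, intercept, and r. A minimal sketch, with a hypothetical helper named `predict`:

```python
r = 0.17042887                        # correlation coefficient for the class data
slope, intercept = 0.1287705, 50.713668

def predict(height_in):
    """Predicted candies per bag from the fitted line: slope * height + intercept."""
    return slope * height_in + intercept

r_squared = r ** 2                    # coefficient of determination

print(round(predict(63.5), 1))        # prediction for a 63.5-inch purchaser
print(round(r_squared, 6))
```

Calling `predict(90)` would return a number, but as noted above it would be extrapolation, so the result should not be trusted.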

Systematic sample:

Height   # Candies
64       52
70       57
61       58
80       59
65       61
66       62

Correlation Coefficient r = 0.1456714081


Regression Equation: ŷ = 0.0769230769x + 52.96153846
Critical value = 0.811
Based on the data, |r| = 0.146 is less than the critical value of 0.811, so there is not a significant
linear relationship between X and Y for this smaller data set.
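The correlation coefficient and least-squares line for the six sampled bags can be recomputed by hand from the sums of squares. This is a sketch using only the standard library; the variable names are my own. With these six pairs the slope works out to 1/13, about 0.0769.

```python
from math import sqrt

heights = [64, 70, 61, 80, 65, 66]   # X, inches
candies = [52, 57, 58, 59, 61, 62]   # Y, candies per bag

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(candies) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, candies))
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in candies)

r = sxy / sqrt(sxx * syy)            # correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = mean_y - slope * mean_x  # least-squares intercept

critical_value = 0.811               # table value for n = 6, taken from the text
print(f"r = {r:.10f}")
print(f"y-hat = {slope:.10f}x + {intercept:.8f}")
print("significant" if abs(r) > critical_value else "not significant")
```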

Math 1040 Skittles Project Part 4: Probability

Problem 1: Suppose you are going to randomly select two Skittles from the bag YOU purchased.
(a) What is the probability that both Skittles are purple if you select them with replacement?
9/61 * 9/61 = .0218
(b) What is the probability that both Skittles are purple if you select them without replacement?
9/61 *8/60 = .0197
(c) What is the probability that at least one Skittle is purple if you select them with replacement?
1 − P(neither is purple) = 1 − (52/61)(52/61) = .2733
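The three answers to Problem 1 can be checked with exact fractions. A minimal sketch, assuming 9 purple Skittles in a bag of 61 as used in the answers:

```python
from fractions import Fraction

purple, total = 9, 61   # purple count and bag size used in the answers

# (a) Both purple, with replacement: the two draws are independent.
p_with = Fraction(purple, total) ** 2

# (b) Both purple, without replacement: one fewer purple and one fewer candy on draw 2.
p_without = Fraction(purple, total) * Fraction(purple - 1, total - 1)

# (c) At least one purple, with replacement: complement of "neither is purple".
p_at_least_one = 1 - Fraction(total - purple, total) ** 2

for label, p in (("a", p_with), ("b", p_without), ("c", p_at_least_one)):
    print(label, p, round(float(p), 4))
```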
Problem 2: Suppose all of the Skittles in the class data set are combined into one large bowl and you are going
to randomly select one Skittle.
(a) What is the probability that you select a green Skittle?
710/3551 = .1999
(b) What is the probability that you select a Skittle that is NOT green?
1 − P(green) = 1 − 710/3551 = .8001
(c) What is the probability that you select a Skittle that is red OR yellow?
716/3551 +726/3551 = .4061
(d) What is the probability that you select a Skittle that is orange GIVEN that it is a secondary color
(secondary colors are green, orange and purple)?
698/2109 = .3310
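The Problem 2 answers all come from the class color counts. The sketch below recomputes them; the names are illustrative.

```python
from fractions import Fraction

counts = {"Red": 716, "Orange": 698, "Yellow": 726, "Green": 710, "Purple": 701}
total = sum(counts.values())

p_green = Fraction(counts["Green"], total)                 # (a)
p_not_green = 1 - p_green                                  # (b) complement rule

# (c) Red and yellow are disjoint, so the probabilities add.
p_red_or_yellow = Fraction(counts["Red"] + counts["Yellow"], total)

# (d) Conditional probability: restrict the sample space to secondary colors.
secondary = counts["Green"] + counts["Orange"] + counts["Purple"]
p_orange_given_secondary = Fraction(counts["Orange"], secondary)

for p in (p_green, p_not_green, p_red_or_yellow, p_orange_given_secondary):
    print(round(float(p), 4))
```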

Problem 3: Suppose all of the Skittles in the class data set are combined into one large bowl and you are going
to randomly select ten Skittles with replacement and count how many are yellow.
(a) Show that this meets the requirements of the binomial probability distribution and identify n and p.
n = 10, p = .2044
1. There are a fixed number of trials, 10.
2. There are two disjoint outcomes, yellow or not yellow.
3. The probability of success is constant at .2044.
4. The trials are independent: because each Skittle is replaced after it is drawn, the outcome of
one trial does not affect the outcome of the other trials.

(b) What is the probability that exactly 4 of the 10 Skittles are yellow? Calc>Vars>Binompdf
Trials = 10, P = .2044, x = 4 P(4 of 10 will be yellow) = .0930
(c) For samples of size 10, what is the expected value and standard deviation for the number of yellow
skittles that will be included?
n = 10, p = .2044
Expected value = n*p = 10*.2044 = 2.044
Standard deviation = √(n*p*(1 − p)) = √(10*.2044*.7956) = 1.2752
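The calculator's Binompdf result, the expected value, and the standard deviation can be reproduced from the binomial formulas. A minimal sketch; `binom_pmf` is my own helper, not a calculator or library function.

```python
from math import comb, sqrt

n, p = 10, 0.2044   # number of trials and P(yellow) from the class data

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_four = binom_pmf(4, n, p)          # (b) exactly 4 yellow out of 10
expected = n * p                     # (c) mean of the binomial distribution
std_dev = sqrt(n * p * (1 - p))      # (c) standard deviation

print(round(p_four, 4), round(expected, 3), round(std_dev, 4))
```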

Problem 4: For this problem, treat a 2.17 ounce bag of Skittles as an individual. Suppose the values for our class
data are the parameter values for all 2.17 ounce bags of Skittles. In other words, assume μ = mean number of
candies per bag in our class data set and σ = standard deviation of number of candies per bag in our class data.
μ = 59.18, σ = 3.111
(a) Describe the sampling distribution for the mean number of candies per bag for samples of 32 bags.
Include center, spread and shape. Note: The shape of the SAMPLING DISTRIBUTION is different
from the shape of the population, which you determined in Part 2 of the project.
Center: The mean number of candies per bag for the sample size of 32 bags equals the mean of the
population at 59.18. The balancing point stays the same.
Spread: The standard deviation of the distribution of the sample mean = σ/√n = 3.111/√32 = .5499522991,
less than the standard deviation of the population. The spread will get smaller as the sample size gets bigger.
Shape: The shape is approximately normal since the sample size is greater than 30; in this case the
sample size is 32.
(b) What is the probability that the mean number of candies per bag for a sample of 32 bags is greater than
58.5? Calc>2nd>Vars>normalcdf
lower 58.5, upper 1E99, μ = 59.18, σ = 3.111/√32
P(x̄ > 58.5) = 1 − .1081418575 = .8919 or approximately 89.2%
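The normalcdf step can be sketched with Python's standard-library `statistics.NormalDist`, treating the sampling distribution of the mean as normal with standard deviation σ/√n. The variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 59.18, 3.111, 32       # class values treated as population parameters

standard_error = sigma / sqrt(n)      # spread of the sampling distribution of the mean
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# P(sample mean > 58.5) is the area to the right of 58.5.
p_greater = 1 - sampling_dist.cdf(58.5)

print(round(standard_error, 7), round(p_greater, 4))
```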

Math 1040 Skittles Project Part 5: Confidence Intervals

Explain in general the purpose and meaning of a confidence interval. (5 points)


o A confidence interval is an interval of numbers based on a point estimate that gives a range of
likely values for an unknown parameter. Confidence intervals are used to measure uncertainty. A
higher confidence level means that there is a greater degree of certainty that the parameter
falls within the bounds of the interval. Therefore, for the same sample, a higher
confidence level requires a wider interval to ensure that level of confidence.
Identify the requirements for computing confidence intervals. List the requirements separately for a
confidence interval for a population proportion and for a population mean. (5 points)
Population Proportion
Verify that n·p̂(1 − p̂) ≥ 10 (the normality condition).
Verify that n ≤ 0.05N (the sample size is no more than 5% of the population size, the
independence condition).

Population Mean
Sample data came from a Simple Random Sample or randomized experiment.
Sample size is small relative to the population size (n ≤ 0.05N).
The data came from a population that is normally distributed, or the sample size is large.
A (1 − α)·100% confidence interval for μ is given by
Lower Bound: x̄ − t(α/2) · s/√n
Upper Bound: x̄ + t(α/2) · s/√n
where t(α/2) is the critical value with n − 1 degrees of freedom.

Using values for the class data that you computed in Part 2 of the project, construct a 99% confidence
interval estimate for the true proportion of yellow candies using the class data as your sample.
Remember that for this computation, n is the number of CANDIES for the entire class data. Include all
your work, showing the formula used and appropriate values inserted (neatly written and scanned or
typed). (10 points) Calc>Stats>Tests> 1-PropZInt
X = 726 # of yellow Skittles in class data
n= 3551 total # of all skittles in class data
C= .99
Interval: (.18702, .22188), so Lower Bound = .18702 and Upper Bound = .22188

Give an appropriate interpretation of your interval. (5 points)


With 99% confidence the true proportion of yellow skittles is between .187 and .222.
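The 1-PropZInt result can be reproduced with the usual large-sample formula p̂ ± z·√(p̂(1 − p̂)/n). This is a sketch under that assumption, using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n, confidence = 726, 3551, 0.99    # yellow count, total candies, confidence level

p_hat = x / n                         # sample proportion of yellow

# Normality condition from the requirements list: n * p_hat * (1 - p_hat) >= 10.
assert n * p_hat * (1 - p_hat) >= 10

# Critical z value leaving (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(round(lower, 5), round(upper, 5))
```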

Based on your interval for the true proportion of yellow candies, was the proportion of yellow candies in
the single bag of candy you purchased a likely value for the true population proportion? Explain how
you know using actual values from your data and computations. (5 points)
No, from the bag I purchased, I had 11 yellow candies out of a total of 61 for a proportion of .180. So
my bag falls a little short of the .187 needed and does not fall within the range of .187 and .222.

Using values you computed in Part 2 of the project, construct a 95% confidence interval estimate for the
true mean number of candies per bag using the class data as your sample, but for this computation, n is
the number of BAGS. Include all your work, showing the formula used and appropriate values inserted
(neatly written and scanned or typed). (10 points)
Calc>Stats> Tests> Tinterval> Stats
n = 60, which is greater than 30, so I did not take out the outliers
x̄ = 59.183333
Sx = 3.1108976
C-Level = .95
Interval: (58.38, 59.987), so Lower Bound = 58.38 and Upper Bound = 59.987

Give an appropriate interpretation of your interval. (5 points)


With 95% confidence the true mean number of candies per bag is between 58.38 and 59.987.
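The TInterval output can be approximated by hand with x̄ ± t·s/√n. The standard library has no t distribution, so the critical value below is hard-coded as an assumption: 2.001 is the approximate table value for 95% confidence with 59 degrees of freedom.

```python
from math import sqrt

n, x_bar, s = 60, 59.183333, 3.1108976   # bags, sample mean, sample standard deviation

# ASSUMPTION: approximate t critical value for 95% confidence and n - 1 = 59
# degrees of freedom, read from a t table rather than computed.
t_star = 2.001

margin = t_star * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 3), round(upper, 3))
```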

Based on your interval for the true mean number of candies per bag, was the total number of candies in
the single bag you purchased a likely value for the population mean? Explain how you know using
actual values from your data and computations. (5 points)
My bag of skittles contained 61 candies, so it does not fall within the likely values of 58.38 to 59.987
for the population mean.

Math 1040 Skittles Project Part 6: Hypothesis Testing

Explain, in general, the purpose and meaning of a hypothesis test. (4 points)

A hypothesis test is a procedure based on sample results and probability that tests hypotheses about a
population. It is used to determine whether there is enough evidence in a sample of data to infer that a certain
condition is true for the entire population.
A hypothesis test examines two opposing hypotheses about a population: the null hypothesis and the alternative
hypothesis. The null hypothesis is the statement being tested indicating no change, effect, difference or
relationship in the population. It is assumed to be true until evidence indicates otherwise. The alternative
hypothesis is the statement that you are trying to find evidence to support.
Based on the sample data, the test determines whether to reject the null hypothesis.

Using values for the class data that you computed in Part 2 of the project and a 0.05 significance level,
test the claim that 20% of all Skittles candies are red. Show all the steps (neatly written and scanned,
typed, or copied from StatCrunch) including:
1. The hypotheses with correct notation (4 points)
H0: p = 0.20
H1: p ≠ 0.20

2. The conditions for performing the hypothesis test, along with checking that they are met (hint:
they are not all met!) (5 points)
1. Simple Random Sample: This requirement was not met. Our entire class was assigned
to purchase a bag of skittles; we did not use chance or an objective device to select
bags of skittles from the population to be included in the sample. This was a
convenience sample.
2. n·p0(1 − p0) ≥ 10: 3551(.20)(1 − .20) = 568.16 ≥ 10. This requirement was met.

3. n ≤ .05N: The sample size of 3551 skittles is less than 5% of all of the skittles in the population.
This requirement was met.

3. The test statistic (2 points)
Calc>Stat>Tests> 1-PropZTest (p0: .20, x: 716, n: 3551, Prop: ≠ p0)
z = .2433
4. The p-value (2 points)
Calc>Stat>Tests> 1-PropZTest (same inputs)
P = .8078
5. The appropriate decision about the null hypothesis and an appropriate conclusion (4 points)
P = .8078, which is greater than .05.
We do not reject the null hypothesis because there is insufficient evidence to conclude that H1 is
true.
There is insufficient evidence to conclude that the proportion of red skittles does not equal .20.
6. Also describe the Type I and Type II errors for this test. (8 points)
Type I Error- A Type I Error would conclude that the proportion of red skittles is not equal to .20
when it really is.
Type II Error- A Type II Error would be that we fail to conclude that the proportion of red skittles
does not equal .20 when the proportion really does not equal .20.
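The test statistic and two-tailed p-value for the red-proportion test can be checked with the one-proportion z formula, z = (p̂ − p0)/√(p0(1 − p0)/n). A minimal sketch with illustrative names:

```python
from math import sqrt
from statistics import NormalDist

x, n = 716, 3551          # red count and total candies in the class data
p0, alpha = 0.20, 0.05    # claimed proportion and significance level

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value: area in both tails beyond |z| under the standard normal curve.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(round(z, 4), round(p_value, 4), "reject H0" if reject_null else "do not reject H0")
```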

Using values for the class data that you computed in Part 2 of the project and a 0.01 significance level,
test the claim that the mean number of candies in a bag of Skittles is more than 58. Show all the steps
(neatly written and scanned, typed, or copied from StatCrunch) including:
1. The hypotheses with correct notation (4 points)
H0: μ = 58
H1: μ > 58

2. The conditions for performing the hypothesis test, along with checking that they are met (hint:
they are not all met!) (5 points)
1. The sample is obtained using a Simple Random Sample or from a randomized experiment.
This requirement was not met. Our entire class was assigned to purchase a bag of
skittles; we did not use chance or an objective device to select bags of
skittles from the population to be included in the sample. This is a convenience sample.
2. The sample has no outliers and comes from a normal population, or the sample size (n) is
at least 30. There are outliers in our sample, but we have a sample size of 60 ≥ 30.
3. The sample values are independent of each other.
3. The test statistic (2 points) Calc>Stats>Tests> T-Test
(μ0: 58, x̄: 59.183333, Sx: 3.1108976, n: 60, μ: > μ0)
t = 2.9464
4. The p-value (2 points) Calc>Stats>Tests> T-Test
(μ0: 58, x̄: 59.183333, Sx: 3.1108976, n: 60, μ: > μ0)
P = .0023
5. The appropriate decision about the null hypothesis and an appropriate conclusion (4 points)
.0023 < 0.01 therefore we reject the null hypothesis.
There is sufficient evidence to conclude that the mean number of candies per bag is more than 58.
6. Also interpret the p-value for this test. (4 points)
If the true mean number of candies per bag is 58, then the probability of getting a sample mean of
59.183 or more from a sample of 60 bags is .0023.
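The T-Test output can be approximated without a statistics package. The sketch below computes t = (x̄ − μ0)/(s/√n) and then estimates the right-tail p-value by numerically integrating the Student t density with Simpson's rule. The helper functions are my own; a calculator or statistics library would normally do this step.

```python
from math import exp, lgamma, log, pi, sqrt

n, x_bar, s = 60, 59.183333, 3.1108976   # bags, sample mean, sample standard deviation
mu0, alpha = 58, 0.01                    # claimed mean and significance level
df = n - 1

t = (x_bar - mu0) / (s / sqrt(n))        # test statistic

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule area from 0 to t (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_value = upper_tail(t, df)              # right-tailed test, H1: mu > 58
reject_null = p_value < alpha
print(round(t, 4), round(p_value, 4), "reject H0" if reject_null else "do not reject H0")
```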
