Math 1040
April 14, 2018
Spring 2018 Term Project – Skittles Candy Analysis
To facilitate the class’s understanding of concepts taught throughout the semester, our
class performed several statistical analyses on Skittles candies. This work was performed both
individually and in groups, which let each student practice the concepts covered in
the lesson material and also encouraged discussion and validation with other members of the
class. The steps of the project were staggered so that they corresponded to the ideas,
methodologies, and graphs taught up to that point in the semester. For example, after we
covered different methods of sampling, the first part of the project involved class members
obtaining a simple random sample of candies from 2.17-ounce bags of Skittles (Original). This
resulted in 74 good samples (two were removed since they were clear outliers in the data) and
Part 1
The second part involved a discussion as a group of the expected results, a comparison
of this expectation to the observed results, graphing of the data using both a Pareto chart and a
Pie chart, as well as a discussion of whether the data represents a simple random sampling:
Part 2 - Group
The expected proportions/percentages for Red, Orange, Yellow, Green, and Purple are 20%
each. This is based on the assumption the colors have even chances of appearing. In reality,
even though the Skittles are distributed by standardized processes and machinery, variability
will, to some extent, still be introduced. Therefore, it is highly unlikely each color will account for
exactly 20% in each bag.
                      Red     Orange  Yellow  Green   Purple
Expected Proportion   20.0%   20.0%   20.0%   20.0%   20.0%
Observed Proportion   20.3%   19.9%   20.5%   20.2%   19.1%
Adam Sweeney
Math 1040
April 14, 2018
1) Pareto Chart
Pie Chart
Yes, the data represents a random sampling of 2.17-ounce bags of Skittles, at least within the
Salt Lake City, Utah area. The population represented by this sample is all 2.17-ounce bags of
Skittles available for purchase. The bags were presumably purchased from various (and
somewhat unique) stores by each member of the class, though likely these stores were
conveniently accessible for each student. The results could perhaps be distorted if the production
process, delivery process, or availability of 2.17-ounce bags of Skittles were different for this
geographic region and, in particular, for the stores that were most convenient to the students. A
better representation of the population would likely come from purchasing 2.17-ounce bags of Skittles
from different geographical locations and from different stores, on varying days and at varying
times leading up to the assignment, since this increases the chances of purchasing bags of
Skittles from different production groups.
This was followed by an individual comparison of the class’s data to each student’s individual
Count          Red           Orange        Yellow        Green         Purple        Total
My Bag         13 (21.0%)    8 (12.9%)     14 (22.6%)    17 (27.4%)    10 (16.1%)    62
Class Counts   893 (20.3%)   874 (19.9%)   900 (20.5%)   889 (20.2%)   838 (19.1%)   4394
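The class proportions in the table can be reproduced from the raw counts. The short Python sketch below is only an illustration (the variable names are my own, not part of the assignment); it divides each class color count by the class total and reports how far each proportion sits from the 20% expectation, in percentage points.

```python
# Class color counts, copied from the table above.
class_counts = {"Red": 893, "Orange": 874, "Yellow": 900, "Green": 889, "Purple": 838}
expected = 0.20  # assumption stated in Part 2: five colors with equal chances

total = sum(class_counts.values())
observed = {color: count / total for color, count in class_counts.items()}

for color, p in observed.items():
    # Difference from the 20% expectation, expressed in percentage points.
    print(f"{color:<7} {p:6.1%}  diff = {(p - expected) * 100:+.1f} points")
```

Every class proportion lands within one percentage point of 20%, which is the comparison discussed in the paragraph that follows.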
The graphs (and information presented in the tables) of the class data essentially match what I
expected to see: each color’s count approximates 20% of the total (within about one percentage
point). I note that the sample from my personal bag of Skittles varied quite a bit more, by
over 7 percentage points in a couple of cases. I believe that the class counts benefit from a wider
sampling, especially since one sample is almost certainly not sufficient for gathering the
appropriate data. The class counts appear to have fairly consistent numbers, with the exception
of the two entries where a significant variance occurred (~630% more than “usual” in one of the
cases). These outliers would potentially skew the proportions if included, especially the case
with 106 Skittles, 58 of which were purple. This proportion of ~55% is nearly triple the class
average (not including this case) and so would inflate the purple Skittle proportion. This
emphasizes to me the importance of doing everything possible to eliminate “bad data” that
could skew results, as well as the importance of acquiring a good sample of data to better
illustrate the behavior of the population.
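To get a feel for how much a single bad bag can move a proportion, the sketch below adds the 106-candy bag (58 purple) back into the class totals. It assumes the class totals of 838 purple out of 4394 already exclude both outlier bags, as described above; the variable names are illustrative only.

```python
# Class totals with the outlier bags removed (from the table above).
purple, total = 838, 4394
# The excluded bag described in the text: 106 Skittles, 58 of them purple.
outlier_purple, outlier_total = 58, 106

without_outlier = purple / total
with_outlier = (purple + outlier_purple) / (total + outlier_total)

print(f"Outlier bag purple share: {outlier_purple / outlier_total:.1%}")
print(f"Class purple share without it: {without_outlier:.2%}")
print(f"Class purple share with it:    {with_outlier:.2%}")
```

Even against more than four thousand candies, this one bag shifts the purple proportion by most of a percentage point, which is comparable to the spread between colors in the class data.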
Later in the semester, the project groups performed a more detailed statistical breakdown of
the class totals. This included determining the mean, standard deviation, minimum, median,
and maximum values of the data, as well as identification of the first and third quartiles. This
breakdown was accompanied by a frequency histogram and box plot of candy counts per bag:
Part 3 – Group
Individually, students were asked to answer a question regarding the findings of the variable
“Total candies in each bag” as well as to write a paragraph explaining the difference between
quantitative data and qualitative (or categorical) data. My responses are below:
Part 3 – Individual
The findings regarding the variable “Total candies in each bag” generally follow what I would
expect: the total Skittles in each bag, while having some variance/outliers in count, would most
frequently be near the average (~59 Skittles per bag). This is represented in both the bell-shaped
frequency histogram and the “centered” (i.e. approximately equal whisker length, bell-shaped data)
box plot. This is also represented by the mean and median number of candies per bag being very
close to equal in value (less than one Skittle difference). The histogram shows the majority of bags
contain a number of candies within one standard deviation (~3 Skittles) of the average. This is
corroborated by the box plot, which shows that the majority of bags fall very close to the
mean and median with only two outliers falling outside the lower fence. These findings are
supported by the 62 Skittles found in my own bag, a count that is within one standard deviation of
the average produced by the 74 total bag counts of the class.
Quantitative data can have arithmetic operations performed on it to provide further meaning.
Examples could include a grade point average, salary earned, number of cats owned, or counts of
Skittles in specific size bags. A mean, median, mode, standard deviation, and quartile can be
calculated for this data. For example, the total number of Skittles in a bag and for several bags could
be averaged (mean) or observed to determine if one (or several) totals are repeated more than
others (mode). This kind of data lends itself to scatter plots (e.g. time-series), histograms, and box
plots where trends can be observed. The Skittles example can be plotted in a histogram to show
whether the totals are spread evenly (uniformly distributed) or whether a majority falls near a
consistent number with a minority on either side (bell-shaped), and so on. Presenting this data in a
pie chart would not present any meaningful information as there is no “grand total” to compare the
bag totals against.
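As a small illustration of the arithmetic that quantitative data allows, the sketch below computes the summary statistics named above with Python's standard `statistics` module. The bag totals in it are hypothetical values made up for the example; they are not the class data, which is not listed in this report.

```python
import statistics

# Hypothetical bag totals for illustration only; these are NOT the class data.
bag_totals = [56, 58, 59, 59, 60, 60, 60, 61, 62, 63]

mean = statistics.mean(bag_totals)
median = statistics.median(bag_totals)
mode = statistics.mode(bag_totals)
stdev = statistics.stdev(bag_totals)  # sample standard deviation (n - 1 divisor)
# quantiles() with n=4 returns the three quartile cut points; textbooks and
# software use slightly different quartile conventions, so values may differ.
q1, q2, q3 = statistics.quantiles(bag_totals, n=4)

print(f"mean={mean}, median={median}, mode={mode}, stdev={stdev:.2f}")
print(f"Q1={q1}, Q3={q3}")
```

None of these operations would make sense for a qualitative variable such as candy color, where only counts and proportions per category can be reported.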
A few weeks later, after the class discussed the concept of confidence intervals, groups were
asked to construct and interpret a 99% confidence interval estimate for the population
proportion of yellow candies. They were also asked to construct and interpret a 90% confidence
interval estimate for the population mean number of candies per bag.
Part 4 – Group
2. Upper: $0.205 + 2.5758 \times \sqrt{\frac{0.205(1 - 0.205)}{4394}} = 0.221$
e. The margin of error is equal to $\frac{\text{upper limit} - \text{lower limit}}{2}$
xii. $\frac{0.221 - 0.189}{2} = 0.016$ or 1.6%
This confidence interval, $0.205 \pm 0.016$, indicates that if a large number of different
samples is obtained, we expect 99% of the resulting intervals to contain the population
proportion of yellow candies out of all candies.
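The proportion interval above can be checked numerically. The following Python sketch is an illustration rather than the group's actual worksheet: it uses the rounded sample proportion 0.205 and n = 4394 from the text, and obtains the critical value from the standard library's `NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.205      # sample proportion of yellow candies (900 / 4394, rounded)
n = 4394           # total candies counted by the class
confidence = 0.99

# Two-sided critical value z_{alpha/2} from the standard normal distribution.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"z = {z:.4f}, margin = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimals, this agrees with the limits 0.189 and 0.221 and the 1.6% margin of error used above.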
90% Confidence Interval for the Population Mean Number of Candies per Bag:
a. Sample mean Number of Candies per Bag (x̄):
xiii. $\bar{x} = \frac{\sum \text{Candies in each bag}}{\text{Number of bags}} = \frac{4394}{74} = 59.4$ Candies per Bag
b. Since we have the sample mean, we will construct a confidence interval for a
population mean (μ).
c. The two requirements that must be met to construct a confidence interval for a
population mean are:
xiv. The sample was obtained through a simple random sample since
several students obtained a 2.17-ounce bag of Skittles from various and
(at least somewhat) unique locations.
xv. $n = 74 \geq 30$ Verified
d. 90% Confidence Interval is (58.8, 59.9).
xvi. Lower and Upper bounds: $\bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$ where α = 0.10, $t_{0.10/2} = 1.6660$, and $s = 2.812412$
1. Lower: $59.4 - 1.6660 \times \frac{2.812412}{\sqrt{74}} = 58.8$
2. Upper: $59.4 + 1.6660 \times \frac{2.812412}{\sqrt{74}} = 59.9$
e. The margin of error is equal to $\frac{\text{upper limit} - \text{lower limit}}{2}$
xvii. $\frac{59.9 - 58.8}{2} = 0.55$ or 0.55 Candies per Bag
This confidence interval, $59.4 \pm 0.55$, indicates that if a large number of different samples is
obtained, we expect 90% of the resulting intervals to contain the population mean number of
candies per bag.
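The interval for the mean can be checked the same way. The sketch below is illustrative only: it takes the sample standard deviation and the t critical value (1.6660, for 73 degrees of freedom) directly from the text rather than recomputing them, and it carries the unrounded mean 4394 / 74 through the calculation.

```python
from math import sqrt

total_candies, n = 4394, 74
s = 2.812412       # sample standard deviation reported above
t_crit = 1.6660    # t value for 90% confidence with 73 d.f., taken from the text

x_bar = total_candies / n        # unrounded sample mean, reported as 59.4
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.3f}, margin = {margin:.3f}")
print(f"90% interval = ({lower:.1f}, {upper:.1f})")
```

With the unrounded mean, the limits round to 58.8 and 59.9 as reported. The margin computed directly comes out near 0.54, slightly below the 0.55 obtained by halving the difference of the rounded limits.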
Individuals were asked to generally explain the purpose and meaning of a confidence interval.
Part 4 – Individual
Ultimately, the term project culminated in each student writing a reflection essay about
concepts learned throughout the semester. This reflection could cover topics including what
the student learned, how mathematics and statistics skills will impact future classes in the
student’s school career, how the project helped to develop the student’s problem solving skills,
Part 5 – Reflection
I believe one of the key takeaways from this semester is that statistical analysis can be applied to
a very diverse repertoire of problems or situations. As demonstrated in the variety of examples provided
in this course’s material, statistical analysis is used in an attempt to provide insight into the populations
we are a part of. This includes everything from the likelihood of a candy bar being within a margin of
error of average weight to correlating gun-related incidents during a period of change in firearm
legislation. This information can influence decisions that have significant impact on people’s lives.
Examples include new initiatives a business is considering (and thereby an employer’s potential success
or failure), the lawmakers we vote for (and thereby the laws and policies we abide by), or even the
viewership of television shows (and thereby the longevity of one of our recreational avenues). I have
been aware of the use of statistics throughout my life, but I believe I better understand the breadth of
Another takeaway from this semester is how these analyses are performed and how inferences
are made. I have previously been rather dubious about the authenticity or accuracy of statistics as they
often appear skewed to sell an argument. I would recall the old joke, “[Insert random percentage here]%
of all statistics are made up”. I believe this course has prepared me both to better recognize potentially
skewed data and to better understand and trust carefully performed analyses. In particular,
the group project throughout the semester has made me more aware of the need for consistent scales,
appropriate graphical representations of different types of data, the influence of outliers, and the
importance of proper modeling of data distributions. I believe I can leverage this information in my own
work and better appreciate the need for and use of it in the world around me.
This project really helped reinforce the concepts taught this semester. We were able to
obtain a sample and walk through the different levels of analysis taught, including identifying
the mean and median (and understanding when to use which), and developing confidence
intervals. The group work encouraged discussion of topics and provided opportunities for
students to clarify concepts to each other. This practice, in particular, helped me to ensure I
questions for others. I believe this project was a practical way of ensuring students remained