Basic Statistics Lecture Notes

STATISTICS
- the practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
The Merriam-Webster dictionary defines statistics as "classified facts representing the conditions of a people in a state, especially the facts that can be stated in numbers or any other tabular or classified arrangement."
Statistician Sir Arthur Lyon Bowley defines statistics as "Numerical statements of facts in any department of inquiry placed in relation to each other."
TWO MAJOR AREAS
Descriptive Statistics.
- It comprises those methods concerned with collecting and describing a set of numerical data so as to yield meaningful information. This branch of statistics provides information only about the collected data and in no way draws inferences. It can be either graphical or computational, such as the construction of tables, charts, and graphs, and other relevant computations. It may also include the study of relationships between and among variables.
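As a minimal sketch of the computational side of descriptive statistics, the Python below builds a frequency table, a crude text bar chart, and a few summary numbers for one hypothetical set of quiz scores. Everything it reports describes only those ten scores; nothing is generalized to a larger population.

```python
from collections import Counter
from statistics import mean, median, stdev

def describe(data):
    """Summarize a collected data set without inferring anything beyond it."""
    return {
        "frequency": dict(sorted(Counter(data).items())),  # frequency table
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),  # sample standard deviation
    }

# Hypothetical quiz scores collected from one class.
scores = [7, 9, 6, 8, 9, 10, 7, 9, 5, 8]
summary = describe(scores)

# Graphical summary: a simple text bar chart built from the frequency table.
for value, count in summary["frequency"].items():
    print(value, "*" * count)

print("mean:", summary["mean"], "median:", summary["median"])  # mean: 7.8 median: 8.0
```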
Inferential Statistics.
- If descriptive statistics is concerned only with the presentation of data, inferential statistics comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. It involves all the techniques by which decisions about a statistical population are made based only on a sample having been observed or a judgment having been obtained. It is concerned more with generalizing information or making inferences about the population. Considered the central function of modern statistics, inferential statistics is concerned with two types of problems: (a) estimation of population parameters, and (b) tests of hypotheses.
Variable
- A variable is an attribute that describes a person, place, thing, or idea. The value of the variable can "vary" from one entity to another. For example, a person's hair color is a potential variable, which could have the value of "blond" for one person and "brunette" for another.
Qualitative vs. Quantitative Variables
Variables can be classified as qualitative (aka, categorical) or quantitative (aka, numeric).
Qualitative
- Qualitative variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of qualitative or categorical variables.
Quantitative
- Quantitative variables are numeric. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population would be a quantitative variable.
A constant, or controlled variable, is a variable that is kept the same in both conditions and the same throughout the experiment. A constant is valuable to an experiment because it ensures that both groups are receiving the same treatment, except for the manipulated variable, and that the variable is not changing with time. It is important that the constant variable doesn't change, because such a change may directly cause a change in the dependent variable. Thus, if there were a change in the dependent variable without a constant in the experiment, the researcher wouldn't know whether it was due to the manipulated variable or to a change in a different variable. The constant allows the researcher to see the impact of an independent variable on the dependent variable.
Types of Data & Measurement Scales:

1. Nominal
- Nominal scales are used for labeling variables, without any quantitative value. Nominal scales could simply be called labels. Here are some examples, below. Notice that all of these scales are mutually exclusive (no overlap) and none of them have any numerical significance. A good way to remember all of this is that nominal sounds a lot like name, and nominal scales are kind of like names or labels.
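A short sketch of why nominal values carry no numerical significance, using hypothetical dog-breed data: counting labels is meaningful, but if the labels are coded as numbers, the "average code" changes whenever a different (equally arbitrary) coding is chosen.

```python
from collections import Counter
from statistics import mean

# Hypothetical nominal data: breeds of dogs seen at a clinic.
breeds = ["collie", "terrier", "shepherd", "terrier", "collie", "terrier"]

# Counting labels (frequencies) is meaningful for nominal data.
counts = Counter(breeds)
most_common_label = counts.most_common(1)[0][0]

# Two equally valid, arbitrary numeric codings of the same labels.
coding_a = {"collie": 1, "shepherd": 2, "terrier": 3}
coding_b = {"collie": 2, "shepherd": 3, "terrier": 1}

# The "average code" depends on which coding was chosen, so it means nothing.
mean_a = mean([coding_a[b] for b in breeds])
mean_b = mean([coding_b[b] for b in breeds])

print(most_common_label)  # terrier
print(mean_a, mean_b)     # two different numbers for the same data
```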
5021 MATH 7 BASIC STATISTICS

LARA, KIT B.
2. Ordinal
- Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc. Ordinal is easy to remember because it sounds like order, and that's the key to remember with ordinal scales: it is the order that matters, but that's all you really get from these.
Advanced note: The best way to determine central
tendency on a set of ordinal data is to use the mode or
median; the mean cannot be defined from an ordinal set.
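The advanced note can be illustrated with a small sketch. Assuming hypothetical five-level satisfaction responses, the code maps each label to its rank, reports the median with `statistics.median_low` (so the result is always an actual category rather than an in-between value), and reports the mode; it deliberately computes no mean.

```python
from collections import Counter
from statistics import median_low

# Hypothetical ordered categories for a satisfaction survey (lowest to highest).
LEVELS = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]

def ordinal_summary(responses):
    """Return (mode, median) for ordinal responses; no mean is computed."""
    # Map each label to its rank so the responses can be ordered.
    ranks = [LEVELS.index(r) for r in responses]
    # median_low always returns one of the observed ranks, never a midpoint.
    median_label = LEVELS[median_low(ranks)]
    mode_label = Counter(responses).most_common(1)[0][0]
    return mode_label, median_label

responses = ["very satisfied", "dissatisfied", "neutral", "very satisfied",
             "neutral", "dissatisfied", "very satisfied"]
print(ordinal_summary(responses))  # ('very satisfied', 'neutral')
```

Here the mode and the median disagree, which is one reason the note suggests looking at both.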
10:30-12:00 DAILY S508
3. Interval
- Interval scales are numeric scales in which we know not only the order, but also the exact differences between the values. The classic example of an interval scale is Celsius temperature because the difference between each value is the same. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Time of day is another good example of an interval scale in which the increments are known, consistent, and measurable.
4. Ratio
- Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero, which allows for a wide range of both descriptive and inferential statistics to be applied. At the risk of repeating myself, everything above about interval data applies to ratio scales, plus ratio scales have a clear definition of zero. Good examples of ratio variables include height and weight.
Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be meaningfully added, subtracted, multiplied, and divided (ratios). Central tendency can be measured by mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation, can also be calculated from ratio scales.

Sampling Methods can be classified into one of two categories:
Probability Sampling: Sample has a known probability of being selected
Non-probability Sampling: Sample does not have a known probability of being selected, as in convenience or voluntary response surveys
Probability Sampling
In probability sampling it is possible to determine both which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling:
1. Simple Random Sampling (SRS)
2. Stratified Sampling
3. Cluster Sampling
4. Systematic Sampling
5. Multistage Sampling (in which some of the methods above are combined in stages)
Of the five methods listed above, students have the most trouble distinguishing between stratified sampling and cluster sampling.
Stratified Sampling is possible when it makes sense to partition the population into groups based on a factor that may influence the variable that is being measured. These groups are then called strata. An individual group is called a stratum. With stratified sampling one should:
1. partition the population into groups (strata)
2. obtain a simple random sample from each group (stratum)
3. collect data on each sampling unit that was randomly sampled from each group (stratum)
Stratified sampling works best when a heterogeneous population is split into fairly homogeneous groups. Under these conditions, stratification generally produces more precise estimates of the population percents than estimates that would be found from a simple random sample. Table 3.2 shows some examples of ways to obtain a stratified sample.

Table 3.2. Examples of Stratified Samples

Example 1
Population: All people in U.S.
Groups (Strata): 4 Time Zones in the U.S. (Eastern, Central, Mountain, Pacific)
Obtain a Simple Random Sample: 500 people from each of the 4 time zones
Sample: 4 × 500 = 2000 selected people

Example 2
Population: All PSU intercollegiate athletes
Groups (Strata): 26 PSU intercollegiate teams
Obtain a Simple Random Sample: 5 athletes from each of the 26 PSU teams
Sample: 26 × 5 = 130 selected athletes

Example 3
Population: All elementary students in the local school district
Groups (Strata): 11 different elementary schools in the local school district
Obtain a Simple Random Sample: 20 students from each of the 11 elementary schools
Sample: 11 × 20 = 220 selected students
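The first two steps of stratified sampling can be sketched in Python using Example 3 from Table 3.2. The district below is hypothetical (11 schools of 300 students each); the function partitions units into strata and draws a simple random sample of the same size from each stratum with `random.Random.sample`. Step 3, collecting data on the selected students, is outside the sketch.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    # Step 1: partition the population into strata.
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Step 2: obtain a simple random sample from each stratum.
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

# Hypothetical district: 11 schools with 300 students each, students as (school, id).
students = [(school, sid) for school in range(1, 12) for sid in range(1, 301)]
sample = stratified_sample(students, stratum_of=lambda s: s[0], n_per_stratum=20, seed=1)
print(len(sample))  # 220, i.e. 11 × 20
```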
Cluster Sampling is very different from Stratified Sampling. With cluster sampling one should:
1. divide the population into groups (clusters).
2. obtain a simple random sample of so many clusters from all possible clusters.
3. obtain data on every sampling unit in each of the randomly selected clusters.
It is important to note that, unlike with the strata in stratified sampling, the clusters should be microcosms, rather than subsections, of
the population. Each cluster should be heterogeneous. Additionally, the statistical analysis used with cluster sampling is not only
different, but also more complicated than that used with stratified sampling.
Table 3.3. Examples of Cluster Samples

Example 1
Population: All people in U.S.
Groups (Clusters): 4 Time Zones in the U.S. (Eastern, Central, Mountain, Pacific)
Obtain a Simple Random Sample: 2 time zones from the 4 possible time zones
Sample: every person in the 2 selected time zones

Example 2
Population: All PSU intercollegiate athletes
Groups (Clusters): 26 PSU intercollegiate teams
Obtain a Simple Random Sample: 8 teams from the 26 possible teams
Sample: every athlete on the 8 selected teams

Example 3
Population: All elementary students in a local school district
Groups (Clusters): 11 different elementary schools in the local school district
Obtain a Simple Random Sample: 4 elementary schools from the 11 possible elementary schools
Sample: every student in the 4 selected elementary schools
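For contrast with the stratified sketch, here is a minimal sketch of cluster sampling using Example 2 from Table 3.3: a simple random sample of 8 teams is drawn from 26, and every athlete on each selected team is kept. The team rosters are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters clusters, then keep every unit in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of clusters
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical rosters: 26 teams, each a list of athlete IDs (team sizes vary).
teams = {
    f"team{t:02d}": [f"team{t:02d}-athlete{a}" for a in range(1, 21 + t % 5)]
    for t in range(1, 27)
}
selected = cluster_sample(teams, n_clusters=8, seed=5)
athletes = [a for roster in selected.values() for a in roster]
print(len(selected), "teams,", len(athletes), "athletes surveyed")
```

Unlike the stratified version, the total sample size here depends on which teams happen to be chosen.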
Each of the three examples found in Tables 3.2 and 3.3 was used to illustrate how both stratified and cluster sampling could be accomplished. However, there are obviously times when one sampling method is preferred over the other. The following explanations add some clarification about when to use which method.
With Example 1: Stratified sampling would be preferred over cluster sampling, particularly if the questions of interest are affected by time zone. For example, the percentage of people watching a live sporting event on television might be highly affected by the time zone they are in. Cluster sampling really works best when there is a reasonably large number of clusters relative to the entire population. In this case, selecting 2 clusters from 4 possible clusters really does not provide much advantage over simple random sampling.

With Example 2: Either stratified sampling or cluster sampling could be used. It would depend on what questions are being asked. For instance, consider the question "Do you agree or disagree that you receive adequate attention from the team of doctors at the Sports Medicine Clinic when injured?" The answer to this question would probably not be team dependent, so cluster sampling would be fine. In contrast, if the question of interest is "Do you agree or disagree that weather affects your performance during an athletic event?", the answer would probably be influenced by whether the sport is played outside or inside. Consequently, stratified sampling would be preferred.
With Example 3: Cluster sampling would probably be better than stratified sampling if each individual elementary school appropriately represents the entire population, as in a school district where students from throughout the district can attend any school. Stratified sampling could be used if the elementary schools had very different locations and served only their local neighborhood (e.g., one elementary school is located in a rural setting while another is located in an urban setting). Again, the questions of interest would affect which sampling method should be used.
The most common method of carrying out a poll today is Random Digit Dialing, in which a machine randomly dials phone numbers. Some polls go even further and have a machine conduct the interview itself rather than just dialing the number! Such "robo call polls" can be very biased because they have extremely low response rates (most people don't like speaking to a machine) and because federal law prevents such calls to cell phones. Since people who have landline phone service tend to be older than people who have cell phone service only, another potential source of bias is introduced. National polling organizations that use random digit dialing in conducting interviewer-based polls are very careful to match the number of landline versus cell phones to the population they are trying to survey.
Non-probability Sampling
The following sampling methods that are listed in your text are types of non-probability sampling that should be avoided:
1. volunteer samples
2. haphazard (convenience) samples
Since such non-probability sampling methods are based on human choice rather than random selection, statistical theory cannot
explain how they might behave and potential sources of bias are rampant. In your textbook, the two types of non-probability samples
listed above are called "sampling disasters."
Read the article "How Polls are Conducted" by the Gallup organization, available in Canvas.
The article provides great insight into how major polls are conducted. When you are finished reading this article you may want to go to the Gallup Poll Web site, http://www.gallup.com, and see the results from recent Gallup polls. Another excellent source of public opinion polls on a wide variety of topics using solid sampling methodology is the Pew Research Center website at http://www.pewresearch.org. When you read one of the summary reports on the Pew site, there is a link (in the upper right corner) to the complete report giving more detailed results and a full description of their methodology, as well as a link to the actual questionnaire used in the survey, so you can judge whether there might be bias in the wording of their survey.
It is important to be mindful of the margin of error as discussed in this article. We all need to remember that public opinion on a given topic cannot be appropriately measured with one question that is only asked on one poll. Such results only provide a snapshot at that moment under certain conditions. The concept of repeating procedures over different conditions and times leads to more valuable and durable results. Within this section of the Gallup article, there is also an error: "in 95 out of those 100 polls, his rating would be between 46% and 54%." This should instead say that in an expected 95 out of those 100 polls, the true population percent would be within the confidence interval calculated. In the remaining 5 or so surveys, the confidence interval would not contain the population percent.
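The corrected interpretation can be checked with a small simulation. Assuming a true population percent of 50%, polls of 1,000 respondents, and the usual normal-approximation interval p_hat ± 1.96 × sqrt(p_hat(1 − p_hat)/n), the sketch counts how many simulated intervals contain the true percent. Over many repetitions roughly 95% do; any single run of 100 polls lands near 95, not necessarily exactly on it.

```python
import math
import random

def poll_ci(p_true, n, rng, z=1.96):
    """Simulate one poll of n respondents and return its approximate 95% CI."""
    yes = sum(rng.random() < p_true for _ in range(n))
    p_hat = yes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - margin, p_hat + margin

def coverage(p_true=0.50, n=1000, num_polls=100, seed=None):
    """Count how many of num_polls confidence intervals contain p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_polls):
        low, high = poll_ci(p_true, n, rng)
        if low <= p_true <= high:
            hits += 1
    return hits

print(coverage(num_polls=100, seed=7), "of 100 intervals contain the true percent")
```

Note that it is the intervals that vary from poll to poll; the true percent stays fixed.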
Why Sample?
Sometimes "measuring" or "testing" something destroys it. The government requires automakers who want to sell cars in the
U.S. to demonstrate that their cars can survive certain crash tests. Obviously, the company can't be expected to crash every car
to see if it survives! So the company crashes only a sample of cars.
Another reason for sampling is that not all units in the population can be identified, such as all the air molecules in the LA
basin. So to measure air pollution, you take a sample of air molecules. Also, even if all those air molecules could be identified, it
would be too expensive and too time consuming to measure them all.
Types of Samples:

Non-probability (non-random) samples:

These samples focus on volunteers, easily available units, or those that just happen to be present when the research is done.
Non-probability samples are useful for quick and cheap studies, for case studies, for qualitative research, for pilot studies, and
for developing hypotheses for future research.
Convenience sample: also called an "accidental" sample or a "man-in-the-street" sample. The researcher selects units that are
convenient, close at hand, easy to reach, etc.
Purposive sample: the researcher selects the units with some purpose in mind, for example, students who live in dorms on
campus, or experts on urban development.
Quota sample: the researcher constructs quotas for different types of units. For example, to interview a fixed number of
shoppers at a mall, half of whom are male and half of whom are female.
Other samples that are usually constructed with non-probability methods include library research, participant observation,
marketing research, consulting with experts, and comparing organizations, nations, or governments.
Probability-based (random) samples:
These samples are based on probability theory. Every unit of the population of interest must be identified, and all units must
have a known, non-zero chance of being selected into the sample.
Simple random sample: Each unit in the population is identified, and each unit has an equal chance of being in the sample.
The selection of each unit is independent of the selection of every other unit. Selection of one unit does not affect the chances of
any other unit.
For example, to select a sample of 25 people who live in your college dorm, make a list of all the 250 people who live in the
dorm. Assign each person a unique number, between 1 and 250. Then refer to a table of random numbers. Starting at any point
in the table, read across or down and note every number that falls between 1 and 250. Use the numbers you have found to pull
the names from the list that correspond to the 25 numbers you found. These 25 people are your sample. This is called the table
of random numbers method.
Another way to select this simple random sample is to take 250 ping-pong balls and number them from 1 to 250. Put them into
a large barrel and mix them up, and then grab 25 balls. Read off the numbers. Those are the 25 people in your sample. This is
called the lottery method.
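Both the table of random numbers and the lottery method amount to drawing 25 distinct numbers from 1 to 250, each with the same chance. In Python this can be sketched with `random.Random.sample`, which draws without replacement; the numbered roster below is hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical dorm roster: residents numbered 1 to 250.
residents = list(range(1, 251))
chosen = simple_random_sample(residents, 25, seed=42)
print(sorted(chosen))  # the 25 residents in the sample
```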
Systematic random sampling: Each unit in the population is identified, and each unit has an equal chance of being in the
sample.
For example, to select a sample of 25 dorm rooms in your college dorm, make a list of all the room numbers in the dorm. Say
there are 100 rooms. Divide the total number of rooms (100) by the number of rooms you want in the sample (25). The answer is
4. This means that you are going to select every fourth dorm room from the list. But you must first consult a table of random
numbers. Pick any point on the table, and read across or down until you come to a number between 1 and 4. This is your
random starting point. Say your random starting point is "3". This means you select dorm room 3 as your first room, and then
every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25 rooms selected.
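The dorm-room procedure can be sketched as follows. The interval is k = N // n (here 100 // 25 = 4), the random start is drawn from 1 to k, and every k-th unit is taken from that point. When N is not an exact multiple of n, this floor-division version simply stops after n units; that is one of several possible conventions and an assumption of the sketch, not something the notes specify.

```python
import random

def systematic_sample(units, n, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    k = len(units) // n  # sampling interval, e.g. 100 // 25 = 4
    if start is None:
        start = random.Random(seed).randint(1, k)  # random starting point
    # Python indexes from 0, so list position `start` is index start - 1.
    return units[start - 1::k][:n]

rooms = list(range(1, 101))  # hypothetical room numbers 1..100
print(systematic_sample(rooms, 25, start=3))  # [3, 7, 11, 15, 19, ..., 99]
```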
This method is useful for selecting large samples, say 100 or more. It is less cumbersome than a simple random sample using
either a table of random numbers or a lottery method. For example, you might have to sample files in a large filing cabinet. It is
easier to select every 17th file than to pull out all the files and number them, etc.
However, you must be aware of problems that can arise in systematic random sampling. If the selection interval matches
some pattern in the list (e.g., each 4th dorm room is a single unit, where all the others are doubles) you will introduce systematic
bias into your sample.
Stratified random sampling: Each unit in the population is identified, and each unit has a known, non-zero chance of being in
the sample. This is used when the researcher knows that the population has sub-groups (strata) that are of interest.

For example, if you wanted to find out the attitudes of students on your campus about immigration, you may want to be sure
to sample students who are from every region of the country as well as foreign students. Say your student body of 10,000
students is made up of 8,000 - West; 1,000 - East; 500 - Midwest; 300 - South; 200 - Foreign.
If you select a simple random sample of 500 students, you might not get any from the Midwest, South, or Foreign. To make
sure that you get some students from each group, you can divide the students into these five groups, and then select the same
percentage of students from each group using a simple random sampling method. This is proportional stratified random
sampling.
However, you may still have too few of some types of students. Instead, you may divide students into the five groups and then
select the same number of students from each group using a simple random sampling method. This is disproportionate stratified
random sampling. This allows you to have enough students in each sub-group so that you can perform some meaningful
statistical analyses of the attitudes of students in each sub-group. In order to say something about the attitudes of the total
student population of the university, however, you will have to apply weights to the findings for each sub-group, proportional to its
presence in the total student body.
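Using the student-body numbers from the example, the sketch below computes both allocations for a total sample of 500 and shows how findings from a disproportionate sample are reweighted by each group's share of the student body. The per-group results are invented purely to illustrate the weighting.

```python
# Student body from the example: 10,000 students in five strata.
STRATA = {"West": 8000, "East": 1000, "Midwest": 500, "South": 300, "Foreign": 200}

def proportional_allocation(strata, n):
    """Same percentage from every stratum (floor division; shares divide evenly here)."""
    total = sum(strata.values())
    return {name: n * size // total for name, size in strata.items()}

def equal_allocation(strata, n):
    """Disproportionate allocation: the same number from every stratum."""
    return {name: n // len(strata) for name in strata}

def weighted_estimate(strata, stratum_results):
    """Weight each stratum's result by that stratum's share of the student body."""
    total = sum(strata.values())
    return sum(strata[name] / total * result for name, result in stratum_results.items())

print(proportional_allocation(STRATA, 500))
# {'West': 400, 'East': 50, 'Midwest': 25, 'South': 15, 'Foreign': 10}
print(equal_allocation(STRATA, 500))
# {'West': 100, 'East': 100, 'Midwest': 100, 'South': 100, 'Foreign': 100}

# Hypothetical proportions favoring some policy, measured within each stratum.
results = {"West": 0.40, "East": 0.60, "Midwest": 0.50, "South": 0.30, "Foreign": 0.80}
print(round(weighted_estimate(STRATA, results), 3))  # 0.43
```

With these made-up results, a plain average of the five group results would be 0.52, while the weighted estimate is 0.43, because the West group makes up 80% of the student body.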
Cluster sampling: cluster sampling views the units in a population as not only being members of the total population but also as members of naturally occurring clusters within the population. For example, city residents are also residents of neighborhoods, blocks, and housing structures.
Cluster sampling is used in large geographic samples where no list is available of all the units in the population but the
population boundaries can be well-defined. For example, to obtain information about the drug habits of all high school students
in a state, you could obtain a list of all the school districts in the state and select a simple random sample of school districts.
Then, within each selected school district, list all the high schools and select a simple random sample of high schools. Within
each selected high school, list all high school classes, and select a simple random sample of classes. Then use the high school
students in those classes as your sample.
Cluster sampling must use a random sampling method at each stage. This may result in a somewhat larger sample than
using a simple random sampling method, but it saves time and money. It is also cheaper to administer than a statewide sample
of high school seniors, because there are many fewer sites to obtain information from.
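The statewide example is a multistage design: a random sample is taken at each stage (districts, then high schools, then classes), and every student in the selected classes is kept. The sketch below assumes a hypothetical, perfectly balanced state (30 districts, 5 high schools each, 12 classes per school, 25 students per class) so the sample size is easy to check.

```python
import random

def multistage_sample(state, n_districts, n_schools, n_classes, seed=None):
    """Random samples of districts, then schools, then classes; keep all students."""
    rng = random.Random(seed)
    students = []
    for district in rng.sample(sorted(state), n_districts):
        schools = state[district]
        for school in rng.sample(sorted(schools), n_schools):
            classes = schools[school]
            for cls in rng.sample(sorted(classes), n_classes):
                students.extend(classes[cls])  # every student in the class
    return students

# Hypothetical state: 30 districts x 5 high schools x 12 classes x 25 students.
state = {
    f"D{d}": {
        f"D{d}-S{s}": {
            f"D{d}-S{s}-C{c}": [f"D{d}-S{s}-C{c}-st{i}" for i in range(25)]
            for c in range(12)
        }
        for s in range(5)
    }
    for d in range(30)
}
sample = multistage_sample(state, n_districts=4, n_schools=2, n_classes=3, seed=3)
print(len(sample))  # 4 x 2 x 3 classes x 25 students = 600
```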
The differences between Probability (Random) Sampling and Non-Probability (Non-Random) Sampling are summarized below.

Probability (Random) Sampling
- Allows use of statistics, tests hypotheses
- Can estimate population parameters
- Eliminates selection bias
- Must have random selection of units

Non-Probability (Non-Random) Sampling
- Exploratory research, generates hypotheses
- Population parameters are not of interest
- Adequacy of the sample can't be known
- Cheaper, easier, quicker to carry out
