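LARA, KIT B.
Inferential Statistics
-If descriptive statistics is concerned only with the presentation of data, inferential statistics goes further: it uses a sample to draw conclusions about the population the sample came from.
Levels of Measurement:
1. Nominal: the values are simply labels.
2. Ordinal
3. Interval: we know not only the order, but also the exact differences between the values.
4. Ratio: they tell us the order and the exact value between units, AND they also have an absolute (true) zero.
Probability Sampling
1. Simple Random Sampling
2. Stratified Sampling
3. Cluster Sampling
4. Systematic Sampling
5. …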
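Table 3.2. Examples of Stratified Samples
Example 1
Groups (Strata): time zones (Eastern, Central, Mountain, Pacific)
Obtain a Simple Random Sample: within each time zone
Example 2
Population: All PSU intercollegiate athletes
Groups (Strata): 26 PSU teams
Obtain a Simple Random Sample: 5 athletes from each of the 26 PSU teams
Sample: 26 × 5 = 130 selected athletes
Example 3
Population: All elementary students in a local school district
Groups (Strata): elementary schools
Obtain a Simple Random Sample: within each elementary school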
Cluster Sampling is very different from Stratified Sampling. With cluster sampling, one should:
1. obtain a simple random sample of a chosen number of clusters from all possible clusters; and
2. obtain data on every sampling unit in each of the randomly selected clusters.
It is important to note that, unlike with the strata in stratified sampling, the clusters should be microcosms, rather than subsections, of
the population. Each cluster should be heterogeneous. Additionally, the statistical analysis used with cluster sampling is not only
different, but also more complicated than that used with stratified sampling.
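The two steps above can be sketched in Python. This is a minimal, hypothetical illustration: the `cluster_sample` function, the team names, and the athlete IDs are invented for the example, and it covers only one-stage cluster sampling (every unit in each chosen cluster is kept).

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample (hypothetical helper).

    clusters: dict mapping a cluster name to the list of units it contains.
    n_clusters: how many clusters to select, without replacement.
    """
    rng = random.Random(seed)
    # Step 1: simple random sample of clusters from all possible clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 2: obtain data on every sampling unit in each selected cluster.
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical teams and athlete IDs, for illustration only.
teams = {
    "team_A": ["a1", "a2", "a3"],
    "team_B": ["b1", "b2"],
    "team_C": ["c1", "c2", "c3", "c4"],
    "team_D": ["d1"],
}
chosen, cluster_units = cluster_sample(teams, n_clusters=2, seed=1)
print(chosen, cluster_units)
```

Whole clusters are either in or out of the sample; a stratified design would instead draw some units from every group.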
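Table 3.3. Examples of Cluster Samples
Example 1
Groups (Clusters): time zones (Eastern, Central, Mountain, Pacific)
Obtain a Simple Random Sample: 2 of the 4 time zones
Sample: everyone in the selected time zones
Example 2
Population: All PSU intercollegiate athletes
Groups (Clusters): teams
Obtain a Simple Random Sample: of teams
Sample: every athlete on the selected teams
Example 3
Population: All elementary students in a local school district
Groups (Clusters): elementary schools
Obtain a Simple Random Sample: of elementary schools
Sample: every student in the selected elementary schools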
Each of the three examples found in Tables 3.2 and 3.3 was used to illustrate how both stratified and cluster sampling could be
accomplished. However, there are times when one sampling method is preferred over the other. The following explanations clarify
when to use which method.
With Example 1: Stratified sampling would be preferred over cluster sampling, particularly if the questions of interest are
affected by time zone. For example, the percentage of people watching a live sporting event on television might be highly affected by
the time zone they are in. Cluster sampling really works best when there are a reasonable number of clusters relative to the entire
population. In this case, selecting 2 clusters from 4 possible clusters really does not provide much advantage over simple random
sampling.
With Example 2: Either stratified sampling or cluster sampling could be used, depending on what questions are being asked. For
instance, consider the question "Do you agree or disagree that you receive adequate attention from the team of doctors at the Sports
Medicine Clinic when injured?" The answer to this question would probably not be team dependent, so cluster sampling would be
fine. In contrast, consider the question "Do you agree or disagree that weather affects your performance during an athletic event?"
The answer to this question would probably be influenced by whether the sport is played outdoors or indoors, so stratified sampling
would be preferred.
With Example 3: Cluster sampling would probably be better than stratified sampling if each individual elementary school
appropriately represents the entire population, as in a school district where students from throughout the district can attend any school.
Stratified sampling could be used if the elementary schools had very different locations and served only their local neighborhoods (e.g.,
one elementary school is located in a rural setting while another is located in an urban setting). Again, the questions of interest would
affect which sampling method should be used.
The most common method of carrying out a poll today is Random Digit Dialing, in which a machine randomly dials phone
numbers. Some polls go even further and have a machine conduct the interview itself rather than just dialing the number! Such
"robocall polls" can be very biased because they have extremely low response rates (most people don't like speaking to a machine)
and because federal law prevents such calls to cell phones. Since people who have landline phone service tend to be older than
people who have cell phone service only, this introduces another potential source of bias. National polling organizations that use
random digit dialing in conducting interviewer-based polls are very careful to match the proportions of landline and cell phone
respondents to the population they are trying to survey.
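A rough sketch of the "random digits" part of Random Digit Dialing, assuming hypothetical area codes `555` and `556` and a made-up helper `random_digit_dial`. Real RDD designs typically restrict the random digits to area-code and exchange combinations known to be in service and handle landline and cell phone numbers separately; none of that is modeled here.

```python
import random

def random_digit_dial(area_codes, n, seed=None):
    """Generate n phone numbers: a randomly chosen area code plus seven random digits.

    Hypothetical helper; the area codes passed in are assumptions for illustration.
    """
    rng = random.Random(seed)
    numbers = []
    for _ in range(n):
        area = rng.choice(area_codes)
        # Each of the remaining 7 digits is drawn uniformly from 0-9.
        rest = "".join(rng.choice("0123456789") for _ in range(7))
        numbers.append(f"({area}) {rest[:3]}-{rest[3:]}")
    return numbers

print(random_digit_dial(["555", "556"], n=3, seed=2024))
```

Because every digit is random, unlisted numbers are as likely to be dialed as listed ones, which is the point of the method.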
Non-probability Sampling
The following sampling methods that are listed in your text are types of non-probability sampling that should be avoided:
1. volunteer samples
2. haphazard (convenience) samples
Since such non-probability sampling methods are based on human choice rather than random selection, statistical theory cannot
explain how they might behave and potential sources of bias are rampant. In your textbook, the two types of non-probability samples
listed above are called "sampling disasters."
Read the article: "How Polls are Conducted" by the Gallup organization available in Canvas.
The article provides great insight into how major polls are conducted. When you are finished reading this article you may want to go to
the Gallup Poll Web site, http://www.gallup.com, and see the results from recent Gallup polls. Another excellent source of public
opinion polls on a wide variety of topics using solid sampling methodology is the Pew Research Center website
at http://www.pewresearch.org. When you read one of the summary reports on the Pew site, there is a link (in the upper right corner) to
the complete report giving more detailed results and a full description of their methodology, as well as a link to the actual questionnaire
used in the survey, so you can judge whether there might be bias in the wording of their survey.
It is important to be mindful of the margin of error, as discussed in this article. We all need to remember that public opinion on a given
topic cannot be appropriately measured with one question that is asked on only one poll. Such results provide only a snapshot at that
moment under certain conditions. Repeating procedures over different conditions and times leads to more valuable and durable
results. Within this section of the Gallup article, there is also an error: "in 95 out of those 100 polls, his rating would be between
46% and 54%." This should instead say that in an expected 95 out of those 100 polls, the true population percent would be within the
confidence interval calculated. In the remaining 5 or so surveys, the confidence interval would not contain the population percent.
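The corrected interpretation can be checked by simulation. This sketch assumes a hypothetical true percent of 50% and polls of n = 600 people, chosen because the usual 95% margin of error, 1.96 × sqrt(0.5 × 0.5 / 600), is about 4 percentage points, matching the 46% to 54% range in the quote. It uses the normal-approximation (Wald) interval for a proportion; `poll_coverage` is a made-up helper name.

```python
import math
import random

def poll_coverage(p_true=0.50, n=600, n_polls=2000, z=1.96, seed=0):
    """Simulate n_polls independent polls of n people and return the fraction
    whose 95% confidence interval contains the true proportion p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_polls):
        # One poll: each of the n respondents says "yes" with probability p_true.
        yes = sum(rng.random() < p_true for _ in range(n))
        p_hat = yes / n
        # Normal-approximation (Wald) margin of error for a sample proportion.
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p_true <= p_hat + moe:
            hits += 1
    return hits / n_polls

# Margin of error when the sample percent is 50%: about 0.04 (4 points).
print(round(1.96 * math.sqrt(0.5 * 0.5 / 600), 3))
print(poll_coverage())
```

The returned fraction should land close to 0.95: across many repeated polls, roughly 95 out of every 100 intervals contain the true 50% and the rest miss it. It is the interval that changes from poll to poll, not the true percent.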
Why Sample?
Sometimes "measuring" or "testing" something destroys it. The government requires automakers who want to sell cars in the
U.S. to demonstrate that their cars can survive certain crash tests. Obviously, the company can't be expected to crash every car
to see if it survives! So the company crashes only a sample of cars.
Another reason for sampling is that not all units in the population can be identified, such as all the air molecules in the LA
basin. So to measure air pollution, you take a sample of air molecules. Also, even if all those air molecules could be identified, it
would be too expensive and too time consuming to measure them all.
Types of Samples:
For example, if you wanted to find out the attitudes of students on your campus about immigration, you may want to be sure
to sample students who are from every region of the country as well as foreign students. Say your student body of 10,000
students is made up of 8,000 from the West, 1,000 from the East, 500 from the Midwest, 300 from the South, and 200 foreign students.
If you select a simple random sample of 500 students, you would expect only about 25 from the Midwest, 15 from the South, and
10 foreign students, and by chance you could get even fewer. To make sure that you get students from each group, you can divide
the students into these five groups and then select the same percentage of students from each group (here, 5%) using a simple
random sampling method. This is proportional stratified random sampling.
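Proportional stratified random sampling can be sketched as follows, using the group sizes from the student-body example. The helper names and the student IDs are hypothetical; each group gets the same 5% sampling fraction (500 out of 10,000).

```python
import random

# Student counts by group, taken from the example above.
strata_sizes = {"West": 8000, "East": 1000, "Midwest": 500, "South": 300, "Foreign": 200}

def proportional_allocation(sizes, n):
    """Give every stratum the same sampling fraction n / N.

    Integer division happens to be exact for these numbers; other group sizes
    would need a rounding rule so the shares still add up to n.
    """
    total = sum(sizes.values())
    return {name: n * size // total for name, size in sizes.items()}

def stratified_sample(frames, allocation, seed=None):
    """Draw a simple random sample (without replacement) of the allotted size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(frames[name], k) for name, k in allocation.items()}

prop_alloc = proportional_allocation(strata_sizes, 500)
print(prop_alloc)  # {'West': 400, 'East': 50, 'Midwest': 25, 'South': 15, 'Foreign': 10}

# Hypothetical sampling frames: one list of made-up student IDs per group.
frames = {name: [f"{name}-{i}" for i in range(size)] for name, size in strata_sizes.items()}
strat_sample = stratified_sample(frames, prop_alloc, seed=42)
```

Each stratum is sampled separately with `random.Random.sample`, which draws without replacement, so no student can be selected twice and every group is guaranteed its share.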
However, you may still have too few of some types of students. Instead, you may divide students into the five groups and then
select the same number of students from each group using a simple random sampling method. This is disproportionate stratified
random sampling. This allows you to have enough students in each sub-group so that you can perform some meaningful
statistical analyses of the attitudes of students in each sub-group. In order to say something about the attitudes of the total
student population of the university, however, you will have to apply weights to the findings for each sub-group, proportional to its
presence in the total student body.
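Disproportionate (equal) allocation and the weighting step can be sketched the same way. The per-group results below are invented purely to show the arithmetic; the weights are each group's share of the 10,000 students (0.80, 0.10, 0.05, 0.03, 0.02), and the helper names are hypothetical.

```python
# Group sizes from the student-body example.
strata_sizes = {"West": 8000, "East": 1000, "Midwest": 500, "South": 300, "Foreign": 200}

def equal_allocation(sizes, n):
    """Disproportionate allocation: the same number of students from every group."""
    return {name: n // len(sizes) for name in sizes}

def weighted_estimate(sizes, group_results):
    """Combine sub-group results, weighting each by its share N_h / N of the student body."""
    total = sum(sizes.values())
    return sum(sizes[g] / total * group_results[g] for g in sizes)

equal_alloc = equal_allocation(strata_sizes, 500)  # 100 students per group

# Hypothetical findings: proportion of each sampled sub-group holding some attitude.
results = {"West": 0.40, "East": 0.60, "Midwest": 0.50, "South": 0.30, "Foreign": 0.80}

overall = weighted_estimate(strata_sizes, results)
unweighted = sum(results.values()) / len(results)
print(round(overall, 2), round(unweighted, 2))  # 0.43 0.52
```

The unweighted average (0.52) would misstate the campus-wide figure, because each small group, sampled at 100 students, would count as much as the West, which makes up 80% of the student body. The weighted estimate (0.43) corrects for that.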
Cluster sampling: cluster sampling views the units in a population not only as members of the total population but also as
members of naturally occurring clusters within the population. For example, city residents are also residents of
neighborhoods, blocks, and housing structures.
Cluster sampling is used in large geographic samples where no list is available of all the units in the population but the
population boundaries can be well-defined. For example, to obtain information about the drug habits of all high school students
in a state, you could obtain a list of all the school districts in the state and select a simple random sample of school districts.
Then, within each selected school district, list all the high schools and select a simple random sample of high schools. Within
each selected high school, list all high school classes, and select a simple random sample of classes. Then use the high school
students in those classes as your sample.
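The stages described above can be sketched as a multistage sample. The state, districts, schools, classes, and student IDs are all hypothetical (10 districts, 3 high schools per district, 6 classes per school, 25 students per class), as are the numbers chosen at each stage; every stage uses a simple random sample, and every student in a selected class is included.

```python
import random

def multistage_sample(districts, n_districts, n_schools, n_classes, seed=None):
    """SRS of districts, then SRS of high schools within each chosen district,
    then SRS of classes within each chosen school; all students in a chosen
    class are kept.

    districts: {district: {school: {class: [student, ...]}}}
    """
    rng = random.Random(seed)
    students = []
    for d in rng.sample(sorted(districts), n_districts):
        schools = districts[d]
        # min() guards against a district with fewer schools than requested.
        for s in rng.sample(sorted(schools), min(n_schools, len(schools))):
            classes = schools[s]
            for c in rng.sample(sorted(classes), min(n_classes, len(classes))):
                students.extend(classes[c])
    return students

# Hypothetical state: 10 districts x 3 schools x 6 classes x 25 students.
state = {
    f"D{d}": {
        f"D{d}-S{s}": {
            f"D{d}-S{s}-C{c}": [f"D{d}-S{s}-C{c}-st{i}" for i in range(25)]
            for c in range(6)
        }
        for s in range(3)
    }
    for d in range(10)
}
students = multistage_sample(state, n_districts=3, n_schools=2, n_classes=2, seed=7)
print(len(students))  # 300
```

With 3 districts, 2 schools per district, and 2 classes per school, the sample always contains 3 × 2 × 2 × 25 = 300 students, drawn from only 6 schools rather than from every school in the state.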
Cluster sampling must use a random sampling method at each stage. This may result in a somewhat larger sample than
using a simple random sampling method, but it saves time and money. It is also cheaper to administer than a statewide simple
random sample of high school students, because there are far fewer sites to obtain information from.
The differences between Probability (Random) Sampling and Non-Probability (Non-Random) Sampling are summarized below.
Probability (Random) Sampling
Eliminates selection bias