
Bio 180 Course Notes

MODULE 1: DESCRIPTIVE STATISTICS
CHAPTER 1. INTRODUCTION

Why Study Statistics?
1. Data, or numerical information, are everywhere. These data may take the form of
contamination levels in water samples, survival rates of patients undergoing medical
therapy, census figures, or information that helps determine which brand of milk to
purchase.

2. Statistical techniques are used to make decisions that affect our daily lives.
For example, the Environmental Protection Agency is interested in water quality, so
water samples are periodically taken to establish the level of contamination and
maintain the level of quality.

The Food and Drug Administration has placed stringent requirements on
pharmaceutical firms to establish the effectiveness of proposed new drug
products. Statistics has therefore played an important role in the development and
testing of the Salk vaccine for polio, of chemotherapeutic agents in the treatment of
cancer, and of many other preparations.

3. No matter what your future line of work, you will make decisions that involve the
analysis of data. To make an informed decision, you need to:
a. determine whether the existing information is adequate or additional information is required;
b. gather additional information, if it is needed, to avoid misleading results;
c. summarize the information in a useful and informative manner;
d. analyze the available information; and
e. draw conclusions and make inferences while assessing the risk of an incorrect
conclusion.

Statistics provides tools for designing experiments or surveys and for analyzing data.
Statistics can help answer questions such as:
1. Which option should we choose?
2. How do we make a choice?
3. Why choose that option?

Meaning of Statistics
For the purposes of this course, we will use the word statistics in two ways: in the
singular sense and in the plural sense.
In the singular sense, statistics is the science of collecting, organizing, presenting,
analyzing, and interpreting data to assist in making more effective decisions.

In the plural sense, statistics are numerical facts or summary measures that enhance our
understanding of data.
Example: the average starting salary of college graduates
the average rice yields of the three treatments


Two Major Branches of Statistics
1. Descriptive statistics: techniques used to describe a mass of data in a clear,
concise, and informative way. It deals with the methods of organizing, summarizing,
and presenting data. It also makes apparent the relationships between variables.
Examples:
a. What is the relationship between fish migrations in a river at certain times of the
year and the water level in that river at the same time?
(Figure: simple dual-axis histogram of fish catches and water levels in the river.)

b. What is the relationship between fish food regime, fish growth, maximal length, and
meat quality?
(Figure: factorial map based on a multivariate analysis; it highlights the relationship
(= correlation) between herbivory and high growth rate, and the high meat quality of
carnivorous fishes.)
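The factorial map above comes from a multivariate analysis, which is beyond this chapter. As a much simpler stand-in for "relationship (= correlation)" between two variables, the sketch below computes Pearson's correlation coefficient r. The function name `pearson_r` and the water-level and catch values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical water levels (m) and fish catches on six sampling dates
water_level = [1.2, 1.5, 1.9, 2.4, 2.1, 1.4]
fish_catch = [30, 41, 55, 70, 62, 38]
print(round(pearson_r(water_level, fish_catch), 3))
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.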



2. Inferential statistics: techniques that use sample data to make general statements
about a population. Generally, they consist of quantifying a dependent variable as a
function of driving (explanatory) variables.

Example:
Does the survival rate S of fish in an aquaculture pond depend upon (= is it a function
of) the stocking density D?
Protocol: 40 fish ponds with different stocking densities are monitored, and the
survival rate of fish in each pond is recorded. The relationship between S and D is
estimated by a linear regression: survival rate = f(stocking density) + error.
Once this regression is calculated, it allows predicting (= inferring) the survival rate of
fish in any new pond, given its stocking density.
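A minimal sketch of the regression step, assuming the simplest case of a straight line S = a + b·D fitted by ordinary least squares. The five ponds and their values are hypothetical (the protocol above uses 40 ponds), and `fit_line` and `predict` are illustrative names, not part of any particular statistics package.

```python
def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = a + b*x + error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # b: change in S per unit of D
    intercept = mean_y - slope * mean_x  # a: line passes through (mean_x, mean_y)
    return intercept, slope

def predict(intercept, slope, x):
    """Infer y for a new x from the fitted line."""
    return intercept + slope * x

# Hypothetical data: stocking density D (fish per m^2) and survival rate S (%)
density = [2, 4, 6, 8, 10]
survival = [92, 88, 81, 77, 70]

a, b = fit_line(density, survival)
print(f"S = {a:.2f} + ({b:.2f}) * D")
print("Predicted S for a new pond with D = 5:", round(predict(a, b, 5), 1))
```

The prediction is only trustworthy within the range of densities actually observed; extrapolating far outside it assumes the line keeps the same shape.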


Populations and Samples
A population includes all members of some defined group: the entire set of individuals
or objects of interest, or the measurements obtained from all of them.
A measure or characteristic of the population is called a parameter.
A sample, on the other hand, is a portion or part of the population. In many
research situations, it is not feasible to involve or measure all members of the population, so
a sample is selected, and only members of the sample are included in the research study.
A measure or characteristic of the sample is called a statistic.
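To make the parameter/statistic distinction concrete, the sketch below builds a hypothetical population of fish lengths, computes its mean (a parameter), then draws a simple random sample and computes the sample mean (a statistic). The population size, sample size, and normal distribution are assumptions made only for illustration.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: body lengths (cm) of all 500 fish in one pond
population = [round(random.gauss(25.0, 3.0), 1) for _ in range(500)]

# Parameter: computed from every member of the population
population_mean = sum(population) / len(population)

# Statistic: computed from a simple random sample of 30 fish only
sample = random.sample(population, 30)
sample_mean = sum(sample) / len(sample)

print(f"parameter (population mean) = {population_mean:.2f} cm")
print(f"statistic (sample mean)     = {sample_mean:.2f} cm")
```

The two means are usually close but not identical; a different sample would give a slightly different statistic, while the parameter stays fixed.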

Variables
In any study, we always focus on a particular group of subjects. These subjects can
be individuals, plants, animals, or other entities. Data are then taken on some
characteristic of the subjects (for example, of a group of individuals). If the characteristic can
take on different values for different subjects, then that characteristic is referred to as a
variable. For example, a group of Bio 180 students in UP Cebu will be found to differ in sex,
height, weight, BMI, and in many other ways. These characteristics are thus called variables
as far as this group is concerned. On the other hand, if a characteristic is the same for every
member of the group, this characteristic is referred to as a constant as far as the group is
concerned. If all the students in this example are single and sophomores, then the
characteristics civil status and year level are constants.
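The rule above (a characteristic is a variable if its values differ within the group, and a constant otherwise) can be sketched as a small check over a list of records. The student records and the function name `classify_characteristics` are hypothetical.

```python
# Hypothetical records for a small group of students
students = [
    {"sex": "F", "height_cm": 158, "civil_status": "single", "year_level": "sophomore"},
    {"sex": "M", "height_cm": 171, "civil_status": "single", "year_level": "sophomore"},
    {"sex": "F", "height_cm": 163, "civil_status": "single", "year_level": "sophomore"},
]

def classify_characteristics(records):
    """Label each characteristic 'variable' if its values differ within the group, else 'constant'."""
    result = {}
    for key in records[0]:
        distinct_values = {record[key] for record in records}
        result[key] = "variable" if len(distinct_values) > 1 else "constant"
    return result

print(classify_characteristics(students))
# {'sex': 'variable', 'height_cm': 'variable', 'civil_status': 'constant', 'year_level': 'constant'}
```

Note that the label depends on the group: adding a married student to the list would turn civil status into a variable.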

Types of Variables

There are two basic types of variables:
1. Qualitative variables have observations or values that represent attributes or
qualities with no inherent meaning as numbers.
2. Quantitative variables take numeric values that are counts or measures of a
quantity.

Classification of Quantitative Variables
Discrete variables have a finite number of values between any two values, or can
take only designated values, and there are usually gaps between possible values.
They are usually associated with counting.

Continuous variables can take an infinite number of values; they can assume any value
within a specific range.
They are usually associated with measurements.

Example: Identify each of the following as examples of qualitative or quantitative variables.
1. The temperature in Cebu at 2:00 pm on any given day
2. Whether or not a mobile phone battery is defective
3. The weight of a lead pencil
4. The length of time billed for a long-distance telephone call
5. The brand of energy drink
6. The type of book taken out of the library

Levels of Measurement
Data collection requires that we make measurements of our observations. In this
process, it is necessary to give some attention to different levels of measurement.
Measurement, in the broadest sense, is the process of assigning numbers to characteristics
according to a defined rule. The level of measurement of data dictates the calculations that
can be done to summarize and present the data. It also determines the appropriate
statistical methods that should be used to analyze the data of a particular research study.
Measurement scales are differentiated according to the degree of precision in the
measurement (the more precise or sensitive the method of measurement, the better). If we
say an individual is tall, that is not as precise as saying the individual is five feet, nine inches tall.
(Figure: Classification of variables. Qualitative variables, e.g. vegetation in the forest,
taxon of an animal, socio-economic status, religion. Quantitative discrete variables, e.g.
number of study sites, number of casualties. Quantitative continuous variables, e.g. size,
age of the trees, oxygen rate = 4.6 mg·l⁻¹.)

There are four levels of measurement: nominal, ordinal, interval, and ratio. They are
arranged in hierarchical order, such that each higher level carries all the properties of the
lower levels plus some additional properties.

1. Nominal level. This is the lowest level of measurement. Nominal variables take
values that give names or labels to various categories with no logical ordering.
Information that can be obtained from processing data on these variables is limited to
frequency counts and percentages. Some examples of nominal variables are gender
(male or female), genotype (AA, Aa, or aa), taxon (e.g., belonging to the genus Pinus or
to the genus Abies), and place of origin. These are often called categorical data because
you categorize the data elements according to the category each one belongs to.
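Since nominal data support only frequency counts and percentages, a sketch of that processing is short. The genotype observations below are hypothetical; `collections.Counter` is from the Python standard library.

```python
from collections import Counter

# Hypothetical genotypes observed in a sample of 10 plants
genotypes = ["AA", "Aa", "aa", "Aa", "AA", "Aa", "Aa", "aa", "AA", "Aa"]

counts = Counter(genotypes)  # frequency count per category
total = len(genotypes)

for category, count in counts.most_common():
    # Percentage of all observations falling in this category
    print(f"{category}: {count} ({100 * count / total:.0f}%)")
# Aa: 5 (50%)
# AA: 3 (30%)
# aa: 2 (20%)
```

The listing order here is by frequency, not by any property of the categories themselves, which have no logical ordering.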

2. Ordinal level. Variables measured on the ordinal scale are basically nominal, with the
categories having an inherent ordering. However, the difference between categories
cannot be measured and has no meaning. Information that can be obtained from
processing data on these variables is limited to frequency counts, with additional
insight on the rank or order of the categories specified. Examples would be rankings
based on the size of animals, how fast a viral disease spreads, or how deep an orange
color a shirt is.
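A sketch of what ordering adds beyond nominal data: categories can be sorted by their position on the scale, and a middle (median) category can be reported. The four-level scale for disease spread and the observations are hypothetical.

```python
# Hypothetical ordinal scale for how fast a viral disease spreads (order matters)
SCALE = ["slow", "moderate", "fast", "very fast"]
RANK = {label: position for position, label in enumerate(SCALE)}

observations = ["fast", "slow", "moderate", "fast", "very fast"]

# Ordering is meaningful: sort by position on the scale, not alphabetically
ordered = sorted(observations, key=RANK.__getitem__)

# With an odd number of observations, the middle element is the median category
median_category = ordered[len(ordered) // 2]

print(ordered)
print("median:", median_category)

# The rank codes 0, 1, 2, 3 only encode order; the "distance" from slow to
# moderate need not equal the distance from fast to very fast, so arithmetic
# on the codes (differences, means) has no guaranteed meaning.
```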

3. Interval level. It includes all the characteristics of the ordinal level, but in addition, the
difference between two consecutive values is constant. Thus, intervals between
categories can be quantified and have meaning. However, it does not have a true
starting or zero point, so ratios are not meaningful, and a value of zero
does not necessarily mean absence of the attribute being measured. Examples of
interval variables are temperature measured in Celsius or Fahrenheit (it makes no
sense to say that 40 degrees is twice as hot as 20 degrees) and intelligence
quotient (IQ).
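The temperature example can be checked numerically. The sketch below uses the standard Celsius-to-Fahrenheit and Celsius-to-Kelvin conversions: a difference converts consistently between interval scales, but the "ratio" 40/20 changes with the arbitrary zero point, and on the Kelvin scale (which has a true zero) 40 °C is only about 1.07 times 20 °C.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

low_c, high_c = 20.0, 40.0

# Differences are meaningful: a 20-degree C gap is always a 36-degree F gap
diff_f = celsius_to_fahrenheit(high_c) - celsius_to_fahrenheit(low_c)
print("difference in F:", diff_f)

# Ratios are not meaningful on an interval scale: they depend on where zero sits
ratio_c = high_c / low_c
ratio_f = celsius_to_fahrenheit(high_c) / celsius_to_fahrenheit(low_c)
print("ratio in C:", ratio_c)
print("ratio in F:", round(ratio_f, 3))

# Kelvin has a true zero, so its ratio reflects actual amounts of heat energy level
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)
print("ratio in K:", round(ratio_k, 3))
```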

4. Ratio level. This is the highest level of measurement. It has all the properties of the
interval level variable in addition to having a true zero point which reflects an
absence of the characteristic measured. With this additional property, statements can
be made relative not only to the equality of differences between any two points on the
scale, but also to the proportional amounts of the characteristic two different objects
possess. For example, the difference between 45 and 50 kilos is the same as the
difference between 90 and 95 kilos. Additionally, however, a basket of mangoes
weighing 4 kilos weighs twice as much as one weighing 2 kilos. Other examples of
ratio variables are height, width, area, length, and price.
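A sketch of why ratio statements survive a change of units when there is a true zero: the basket weights are converted from kilos to pounds, and the 2:1 ratio is unchanged. The conversion factor 2.20462 lb per kg is an approximation used only for illustration.

```python
KG_TO_LB = 2.20462  # approximate number of pounds per kilogram

basket_a_kg, basket_b_kg = 4.0, 2.0

# Equal differences: 50 - 45 and 95 - 90 are the same amount of weight
print(50 - 45 == 95 - 90)

# Ratios are meaningful and do not depend on the unit chosen
ratio_kg = basket_a_kg / basket_b_kg
ratio_lb = (basket_a_kg * KG_TO_LB) / (basket_b_kg * KG_TO_LB)
print(ratio_kg, round(ratio_lb, 10))
```

Both ratios equal 2: the heavier basket weighs twice as much whether measured in kilos or in pounds.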


Consider the following table as a summary.

Summation Notation

The summation notation is used to express the sum of numbers in its simplest
form. It is used to express relationships among variables and to write them in a more concise
form. The expression ΣX means "add all the scores for variable X." Formally, if there are N
observations on X represented by $X_1, X_2, \ldots, X_N$, we express their sum as

$$\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N$$

where: i is the index of the summation
1 is the lower limit of the summation
N is the upper limit of the summation
$X_i$ represents the ith addend
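The definition can be read as a loop that adds one term for each value of the index i. In the sketch below, `summation` is an illustrative name; Python's built-in `sum` does the same job. Python lists are indexed from 0, so the loop's i = 0, ..., N-1 corresponds to i = 1, ..., N in the notation.

```python
def summation(values):
    """Return X_1 + X_2 + ... + X_N by adding one term per value of the index."""
    total = 0
    for i in range(len(values)):  # Python's i = 0..N-1 plays the role of i = 1..N
        total += values[i]        # add the ith addend
    return total

X = [1, 3, 5]
print(summation(X), sum(X))
```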





Properties of Summation Notation:

1. The sum of N terms of a constant c is equal to N times the constant, that is,

$$\sum_{i=1}^{N} c = Nc$$

where c is a constant.


2. The sum of N terms of a constant c multiplied by a variable X is equal to the
constant multiplied by the sum of the variable, that is,

$$\sum_{i=1}^{N} cX_i = c\sum_{i=1}^{N} X_i$$

where c is a constant.



Example: (c = 2)
i    X    2X
1    1     2
2    3     6
3    5    10
    ΣX = 9    Σ2X = 18

3. The sum of the sums or differences of two variables X and Y is equal to the sum or
difference of the sums of each variable, that is,

$$\sum_{i=1}^{N} (X_i \pm Y_i) = \sum_{i=1}^{N} X_i \pm \sum_{i=1}^{N} Y_i$$

Example:
i    X    Y    X + Y
1    2    4      6
2    3    1      4
3    4    3      7
    ΣX = 9    ΣY = 8    Σ(X + Y) = 17

4. If a and b are constants, then

$$\sum_{i=1}^{N} (aX_i + bY_i) = a\sum_{i=1}^{N} X_i + b\sum_{i=1}^{N} Y_i$$

Example: (a = 5, b = 10)
i    X    Y    5X    10Y    5X + 10Y
1    2    4    10     40       50
2    3    1    15     10       25
3    4    3    20     30       50
    ΣX = 9    ΣY = 8    Σ(5X + 10Y) = 125

Σ(5X + 10Y) = 5ΣX + 10ΣY
125 = 5(9) + 10(8)
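The four properties can be verified mechanically on the example data. This sketch reuses X = 2, 3, 4 and Y = 4, 1, 3 from the Property 3 and Property 4 tables (not X = 1, 3, 5 from the Property 2 table), with c = 2, a = 5, and b = 10; each `assert` stops the script if a property failed to hold.

```python
X = [2, 3, 4]
Y = [4, 1, 3]
N = len(X)
c, a, b = 2, 5, 10

# Property 1: the sum of N copies of a constant c equals N * c
assert sum(c for _ in range(N)) == N * c

# Property 2: the sum of c * X_i equals c times the sum of X_i
assert sum(c * x for x in X) == c * sum(X)

# Property 3: sums and differences split term by term
assert sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)
assert sum(x - y for x, y in zip(X, Y)) == sum(X) - sum(Y)

# Property 4: a linear combination with constants a and b
total = sum(a * x + b * y for x, y in zip(X, Y))
assert total == a * sum(X) + b * sum(Y)

print(total)  # 125, matching the table above
```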

Note (Property 2 example): Σ2X = 2ΣX
18 = 2(9)
18 = 18
Note (Property 3 example): Σ(X + Y) = ΣX + ΣY
17 = 9 + 8
17 = 17

A variable is a characteristic that varies if measured several times.
Examples: temperature; number of fish species caught; number of people; etc.

Variables can be:
1. quantitative
   a. continuous (expressed in real numbers or in decimals)
   b. discontinuous (expressed as integers), also called discrete or "in classes"
      e.g. number of children per family = 0 / 1 / 2 / 3 / etc.
2. semi-quantitative (expressed in ordered classes), also called ordinal or semi-qualitative
   e.g. water current = slow / medium / strong
3. qualitative (expressed in words), also called nominal
   e.g. patient = smoker / non-smoker or man / woman

Repetitions are repeated measures of the same variable.
Examples: several sampling dates; several sampling sites; several fishing sessions
analyzed; several individuals measured. With only one measurement, one could
not see any variation in a variable.
