
Introduction to Basic Statistics
Siti Noormi Alias
Ph.D Candidate
Department of Professional Development and Continuing Education
Faculty of Educational Studies
Universiti Putra Malaysia

Scales of Measurement
1. Nominal
The lowest scale
Numbers assigned to identify attributes
No order/sequence
2. Ordinal
Numbers assigned in ranking order
Arrange from lowest to highest or vice versa
3. Interval
Arbitrary zero (no absolute zero)
Zero does not represent absence of the characteristic
4. Ratio
The highest scale
True zero (represents absence of the characteristic)
Types of Statistics
Depends on:
1. Purpose
Descriptive vs. Inferential
2. Assumption of normality
Parametric vs. nonparametric
3. Number of variables
Univariate vs. Bivariate vs. Multivariate
Normality

Assessing normality for univariate statistics can be done in three ways:

1. Graphically
   P-P plot
   Q-Q plot
   Boxplots
2. Descriptively
   Skewness
   Kurtosis
   Interquartile range
3. Statistical tests
   Kolmogorov-Smirnov (large sample size, n > 50)
   Shapiro-Wilk (small sample size, n < 50)

Rough decision rules: for the statistical tests, a non-significant p indicates normality; skewness and kurtosis values near 0 (roughly within ±1/2), or falling within the 95% confidence interval (CI), also suggest normality.
Limitation: each indicator is uncertain on its own and needs support from the other sources.
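A minimal sketch of these checks in Python, using hypothetical generated data (the descriptive and test-based approaches above map onto scipy's `skew`, `kurtosis`, `shapiro`, and `kstest`):

```python
import numpy as np
from scipy import stats

# Hypothetical sample, for illustration only.
rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=40)

# Descriptive checks: values near 0 suggest normality.
print("skewness:", stats.skew(sample))
print("excess kurtosis:", stats.kurtosis(sample))

# Shapiro-Wilk, suited to small samples (n < 50).
w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk: W={w:.3f}, p={p:.3f}")

# Kolmogorov-Smirnov against a normal with estimated parameters
# (strictly, estimating parameters calls for the Lilliefors correction).
d, p_ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
print(f"K-S: D={d:.3f}, p={p_ks:.3f}")
```

A non-significant p (e.g., p > .05) means the test found no evidence against normality, which, as noted above, should still be corroborated by plots and descriptives.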
Skewness and Kurtosis

Boxplots

Summary Table of Statistical Tests (Plonsky, 2001)

Categorical or Nominal:
  1 sample: χ² or binomial
  2 independent samples: χ²
  2 dependent samples: McNemar's χ²
  K (>2) independent samples: χ²
  K dependent samples: Cochran's Q

Rank or Ordinal:
  2 independent samples: Mann-Whitney U
  2 dependent samples: Wilcoxon matched-pairs signed-ranks
  K (>2) independent samples: Kruskal-Wallis H
  K dependent samples: Friedman's ANOVA
  Correlation: Spearman's rho

Parametric (Interval & Ratio):
  1 sample: z test or t test
  2 independent samples: independent-samples t-test
  2 dependent samples: paired-samples t-test
  K (>2) independent samples: 1-way ANOVA between groups
  K dependent samples: 1-way ANOVA (within or repeated measures)
  Correlation: Pearson's r
  More than one IV: factorial (2-way) ANOVA
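Two of the nonparametric tests from the table, sketched in Python with hypothetical data (scipy's `mannwhitneyu` for two independent ordinal samples, `chi2_contingency` for two nominal variables):

```python
import numpy as np
from scipy import stats

# Two independent groups of ordinal ratings (hypothetical data).
group_a = [3, 4, 2, 5, 4, 3, 4]
group_b = [2, 1, 3, 2, 2, 1, 3]
u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U={u}, p={p:.4f}")

# Chi-square test of independence for a 2x2 table of nominal counts.
table = np.array([[20, 10],
                  [12, 18]])
chi2, p2, dof, expected = stats.chi2_contingency(table)
print(f"chi-square={chi2:.3f}, df={dof}, p={p2:.4f}")
```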
Introduction to T-Test

Types of t-test:
1. One-sample t-test
2. Paired or dependent-samples t-test
3. Independent-samples t-test

Requirements:
DV: interval or ratio
IV: nominal or ordinal (k = 2 groups)
Assumptions: randomization, normal distribution

Paired Sample t Test
The paired-samples t-test is used in repeated-measures designs (aka within-subjects designs), commonly before-after designs.
Two observations are taken from each participant.
The second observation is dependent on the first because both come from the same person.
The test compares the mean of the difference scores to a distribution of means of difference scores.
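A minimal sketch of a paired-samples t-test in Python, with hypothetical before/after scores for the same eight participants (`scipy.stats.ttest_rel`):

```python
from scipy import stats

# Hypothetical before/after scores for the same 8 participants.
before = [70, 65, 80, 72, 68, 75, 77, 66]
after  = [74, 68, 85, 75, 70, 79, 80, 69]

# ttest_rel pairs the observations row by row, matching the
# within-subjects logic described above.
t, p = stats.ttest_rel(after, before)
print(f"paired t={t:.3f}, p={p:.4f}")
```

A significant p here would indicate that the mean difference score differs reliably from zero.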


Independent t Test
Compares the difference between the means of two independent groups.
A single observation is taken from each participant, in one of two independent groups.
The observations in the second group are independent of those in the first because they come from different subjects.
The test compares the difference between the two means to a distribution of differences between mean scores.
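The same comparison for two independent groups, sketched with hypothetical scores (`scipy.stats.ttest_ind`):

```python
from scipy import stats

# Hypothetical scores from two independent groups of 7 subjects each.
group_1 = [82, 90, 78, 85, 88, 91, 80]
group_2 = [70, 72, 68, 75, 71, 69, 74]

# ttest_ind compares the two group means; by default it assumes
# equal variances (pass equal_var=False for Welch's t-test).
t, p = stats.ttest_ind(group_1, group_2)
print(f"independent t={t:.3f}, p={p:.4f}")
```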


Introduction to Analysis of Variance (ANOVA)
To compare differences between group means.
ANOVA allows for 3 or more groups.

Assumptions of ANOVA
Each group is approximately normal: check this by looking at histograms and/or normal quantile plots.
ANOVA can handle some non-normality, but not severe outliers.
Standard deviations of each group are approximately equal.
Rule of thumb: the ratio of the largest to the smallest sample standard deviation must be less than 2:1.
ANOVA Test Hypotheses
H0: μ1 = μ2 = μ3 (all of the means are equal)
HA: Not all of the means are equal

For our example:
H0: μMid-size = μSUV = μPickup
The mean mileages of mid-size vehicles, sport utility vehicles, and pickup trucks are all equal.
HA: Not all of the mean mileages of mid-size vehicles, sport utility vehicles, and pickup trucks are equal.
F Statistic
Like any other test, the ANOVA test has its own test statistic
The statistic for ANOVA is called the F statistic, which we get from the
F Test
The F statistic takes into consideration:
number of samples taken
sample size of each sample
means of the samples
standard deviations of each sample
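The mileage example above can be sketched in Python with hypothetical mpg figures for the three vehicle classes (`scipy.stats.f_oneway` computes the one-way between-groups F statistic):

```python
from scipy import stats

# Hypothetical mileage (mpg) for three vehicle classes.
midsize = [28, 30, 27, 31, 29]
suv     = [20, 22, 19, 21, 23]
pickup  = [17, 18, 16, 19, 18]

# f_oneway uses the group sizes, means, and variances to form F.
f, p = stats.f_oneway(midsize, suv, pickup)
print(f"F={f:.2f}, p={p:.5f}")
```

A significant p leads us to reject H0 that all three mean mileages are equal; a follow-up post-hoc test would identify which pairs differ.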
Introduction to Pearson Product-Moment Correlation
Purpose: to determine the relationship between two metric variables.
Requirements:
DV: interval/ratio
IV: interval/ratio
Assumptions:
Randomization
Normally distributed


Definition
A correlation is a statistical method used to measure and describe the
relationship between two variables.
A relationship exists when changes in one variable tend to be
accompanied by consistent and predictable changes in the other
variable.

Positive Linear Correlation
[Figure 9-2 Scatter Plots: (a) positive, (b) strong positive, (c) perfect positive]

Negative Linear Correlation
[Figure 9-2 Scatter Plots: (d) negative, (e) strong negative, (f) perfect negative]

No Linear Correlation
[Figure 9-2 Scatter Plots: (g) no correlation, (h) nonlinear correlation]

Correlation Coefficient r
A correlation typically evaluates three aspects of the relationship:
the direction
the form
the degree

Correlations: Measuring and Describing Relationships
The direction of the relationship is indicated by the sign of the correlation (+ or -). A positive correlation means that the two variables tend to change in the same direction: as one increases, the other also tends to increase. A negative correlation means that the two variables tend to change in opposite directions: as one increases, the other tends to decrease.
The most common form of relationship is a straight-line, or linear, relationship, which is measured by the Pearson correlation.
The degree of relationship (the strength or consistency of the relationship) is measured by the numerical value of the correlation. A value of ±1.00 indicates a perfect relationship and a value of zero indicates no relationship.
Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y: interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
Common Errors Involving Correlation
1. Causation: It is incorrect to conclude that
correlation implies causality.
2. Averages: Averages suppress individual
variation and may inflate the correlation
coefficient.
3. Linearity: There may be some relationship
between x and y even when there is no
significant linear correlation.
Formal Hypothesis Test
To determine whether there is a significant linear correlation between two variables.
Two methods are available; both test:
H0: ρ = 0 (no significant linear correlation)
H1: ρ ≠ 0 (significant linear correlation)
Guilford Rule of Thumb

Correlation coefficient, r   Strength of relationship
< .20                        Negligible relationship
.20 - .40                    Low relationship
.41 - .70                    Moderate relationship
.71 - .90                    High relationship
> .90                        Very high relationship
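A minimal sketch of the Pearson correlation and its significance test in Python, using hypothetical paired interval-scale data (`scipy.stats.pearsonr` returns both r and the p-value for testing ρ = 0):

```python
from scipy import stats

# Hypothetical paired interval-scale measurements.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

# pearsonr returns the coefficient r and the two-sided p-value
# for H0: rho = 0.
r, p = stats.pearsonr(x, y)
print(f"r={r:.3f}, p={p:.5f}")
```

Here y rises nearly linearly with x, so r falls in Guilford's "very high relationship" band (> .90) and the test rejects H0.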
