Lecture13 Statistics

ME 327: Design and Control of Haptic Systems
Autumn 2015
Lecture 13:
Statistics for user studies
Allison M. Okamura
Stanford University
experiment design
Main design types
Within Subjects: Each subject does each experimental
condition, in pseudo-random order. Because individuals vary
significantly in their haptic capabilities, most haptic human
factors experiments are done this way. Include a practice
session to minimize effects of order.
Between Subjects: Different subjects experience different
conditions. This is often useful when novelty or experience is
highly important, prohibiting subjects from doing the
experiment more than once, or when each subject's
participation time is limited.
Mixed: a combination of the above.
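A minimal Python sketch of how a pseudo-random condition order could be drawn for each subject in a within-subjects design. The function name, condition labels, and fixed seed are illustrative assumptions, not part of the lecture.

```python
import random

def condition_orders(conditions, n_subjects, seed=0):
    """Return one independently shuffled condition order per subject."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    orders = []
    for _ in range(n_subjects):
        order = list(conditions)
        rng.shuffle(order)
        orders.append(order)
    return orders

# Hypothetical study: three conditions, six subjects.
orders = condition_orders(["A", "B", "C"], n_subjects=6)
for subject, order in enumerate(orders, start=1):
    print(subject, order)
```

Independent shuffling does not guarantee that every order appears equally often; a fully counterbalanced design would assign the orders explicitly.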
Main design types
After independent variables are selected, you need to select
your dependent variables, the quantities you will measure.
Look at both mean values and variations across trials and
across subjects. Use statistics to determine the probability that
observed trends were due to chance. We often say that a
finding is significant if we find p < 0.05.
Objective vs. Subjective: both are often used in haptics.
Design steps & considerations
1. Define hypothesis and objectives
2. Define factors to be studied
3. Define dependent variable
4. Define population and sampling method
   Randomization
   Non-representative population
   Balanced vs. Unbalanced samples
5. Define appropriate approach for data analysis
statistical analysis
statistics topics
basic statistics
ANOVA
post-hoc tests
case study
t-test
the t-test assesses whether the means of two
groups are statistically different from each other
variance
variance is important!
variance = σ²
standard deviation = σ = √(variance)
difference between means
it is easier to tell two groups apart
when there is low variability
t-test
numerator is the difference between the means
denominator is a measure of variability
computing the t-value
numerator: difference between the means
denominator: standard error of the difference
final formula for the t-test:
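One standard form that matches this description is the pooled-variance (Student) two-sample statistic, sketched below. The symbols are illustrative choices rather than the slide's notation: \bar{x}_1 and \bar{x}_2 are the sample means, s_1^2 and s_2^2 the sample variances, and n_1 and n_2 the group sizes. Its degrees of freedom are n_1 + n_2 - 2.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```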
interpreting the results
the t-value will be positive or negative depending on which
mean is larger
look up the t-value in a table of significance (if you don't
have a computer program that does this!)
set a risk level (called the alpha level; the rule of thumb is
α = 0.05, so a result is called significant when its p-value
is below 0.05)
need to know the degrees of freedom (e.g., sum of the
persons in both groups minus 2)
for significant difference, the absolute value of the
calculated t-value must be greater than the one found
from the table
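The lookup procedure can be sketched in Python with the standard library. The data are made up, the pooled-variance form of the t-value is assumed, and the critical value is the usual two-tailed table entry for α = 0.05 with 10 degrees of freedom.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Two-sample t-value with pooled variance, plus degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / std_err
    return t, df

# Made-up measurements for two groups of six subjects each.
group1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
group2 = [4.1, 4.5, 4.0, 4.8, 4.3, 4.4]

t_value, df = pooled_t(group1, group2)
T_CRITICAL = 2.228  # two-tailed table value for alpha = 0.05, df = 10
significant = abs(t_value) > T_CRITICAL
print(f"t = {t_value:.2f}, df = {df}, significant: {significant}")
```

Swapping the two groups flips the sign of the t-value but not the conclusion, which is why the absolute value is compared with the table entry.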
meaning of significance level (α)
[Figure: overlapping Group 1 and Group 2 distributions, with the significance level marked]
A 5% significance level means accepting a 5% risk of
concluding that Group 1 and Group 2 differ when the
observations actually come from the same distribution
ANOVA (analysis of variance)
also known as F-test
allows comparisons of the means of two or more
groups (unlike only two in t-test)
tells only that there is a significant difference among
the groups but not which groups are significantly
different from one another
needs post-hoc test for comparison between groups
ANOVA assumptions
Normal distributions and homogeneity of variance.
Therefore, in a one-factor ANOVA, it is assumed
that each of the populations is normally distributed
with the same variance (σ²).
In between-subjects analyses, it is assumed that
each score is sampled randomly and independently.
Research has shown that ANOVA is "robust" to
some violations of its assumptions.
ANOVA tends to be conservative when its
assumptions are violated.
ANOVA: Terminology
response variable (dependent variable): the primary variable of
interest measured in the experiment
factor (independent variable, predictor variable): variable that has
an effect on the measurement of the response variable
factor levels (treatment level): the particular values that a factor
can have
two types of factors:
fixed-effect: a factor whose levels included in the study are the only levels of
interest, or perhaps the only possible levels (e.g., gender, marital status)
random-effect: a factor whose levels included in the study are not the only
ones we are interested in making inferences about (e.g., samples of
merchandise, users)
error types
[Figure: overlapping Group 1 and Group 2 distributions, with the significance level marked]
Type I error (α error): finding a significant effect by chance
when there is no real effect in the data
Type II error (β error): failing to find a significant effect when in
fact there is a real effect in the data
what are we testing?
null hypothesis: all group means are equal
similar to the t-value in t-test, in ANOVA we calculate
an F-value
the null hypothesis is rejected if the F-value is above
the critical F-value at a chosen level of significance (α)
=> at least one mean is significantly different.
we normally choose α = 0.05
what is the F-value?
Similar to the t-value, the F-value measures the signal-to-noise
ratio in terms of variance
if the null hypothesis is true
(no difference in the means), then F is expected to be close to 1
for a significant difference, F must exceed the critical F-value, which is greater than 1
F is never negative
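A minimal Python sketch of the F-value for a one-factor, between-subjects ANOVA, computed as between-group mean square over within-group mean square. The data are made up, and the critical value is the usual table entry for α = 0.05 with (2, 6) degrees of freedom.

```python
import statistics

def one_way_f(groups):
    """F-value for a one-factor, between-subjects ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Signal: variability of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Noise: variability of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up data
f_value, df_b, df_w = one_way_f(groups)
F_CRITICAL = 5.14  # table value for alpha = 0.05 with (2, 6) degrees of freedom
print(f"F({df_b}, {df_w}) = {f_value:.2f}, significant: {f_value > F_CRITICAL}")
```

Because both mean squares are sums of squares divided by positive counts, the ratio cannot be negative.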
statistics software
S-Plus, SPSS
Matlab (statistics toolbox)
Excel (Additional installation may be required)
case study 1:
Augmentation of Stiffness
Perception using Skin
Stretch Feedback
Zhan Fan Quek, Samuel Schorr, Ilana Nisky,
Allison Okamura (Stanford),
and William Provancher (Utah)
Z. F. Quek, S. B. Schorr, I. Nisky, A. M. Okamura and W. R. Provancher. Sensory Augmentation of
Stiffness using Fingerpad Skin Stretch. In IEEE World Haptics Conference, pages 467-472, 2013.
Motivation
Interaction with objects of different stiffness using a stylus
results in different amounts of fingerpad skin stretch.
Can we increase the perception of stiffness of an object by
rendering additional skin stretch cues?
Experiment 1
Hypothesis
Rendering skin stretch in conjunction with force
feedback can increase the perception of rendered
stiffness
Procedure
Method of Constant Stimuli
1-DoF Skin Stretch Device
Experiment Procedure
[Figure: reference conditions and comparison conditions]
Each comparison condition is repeated 12 times, for a total of 144
trials per reference condition (total of 576 trials)
Psignifit (an externally downloaded MATLAB toolbox) is used to
generate the psychometric curve
Point of Subjective Equality (PSE) is used to
determine the shift in stiffness perception
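As a rough illustration of what the PSE represents, the Python sketch below finds the comparison level at which the "comparison feels stiffer" response rate crosses 50%. It uses straight-line interpolation between data points, which is a deliberate simplification and not the psychometric-function fit that Psignifit performs. The stiffness levels and response proportions are made up.

```python
def pse_by_interpolation(levels, proportions, criterion=0.5):
    """Comparison level where the response proportion crosses the criterion.

    Uses straight-line interpolation between neighboring data points,
    not a fitted psychometric function.
    """
    points = list(zip(levels, proportions))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= criterion <= p1 and p1 > p0:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("proportions never cross the criterion")

# Made-up data: comparison stiffness levels (arbitrary units) and the
# fraction of the 12 repetitions judged "stiffer than the reference".
levels = [60, 80, 100, 120, 140]
proportions = [1 / 12, 3 / 12, 5 / 12, 9 / 12, 11 / 12]

pse = pse_by_interpolation(levels, proportions)
print(f"PSE = {pse:.1f}")
```

A PSE that differs from the reference stiffness indicates a shift in perceived stiffness for that condition.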
Results
12 Subjects (9 males, 3 females), Age 18-41
Results - Analysis
Model
% Main-effects ANOVA on PSE; factor 2 (subject) is treated as a random effect
anovan(PSE, {Factor_SSR, Factor_Subj}, 'random', 2, ...
       'model', [1 0; 0 1]);
Results - Analysis
Source    Sum of Squares   DoF   Mean Squares   F-Number   P
SSR       4027             2     2013.7         9.54       0.001
Subject   4070             11    370.05         1.75       0.1263
Error     4643             22    211.05         -          -
The Mean Squares value in the Error row (211.05) is the Mean Square Error (MSE)
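The arithmetic behind the table can be checked with a few lines of Python: each mean square is the sum of squares divided by its degrees of freedom, and each F-number is a mean square divided by the MSE. The recomputed values agree with the table up to rounding of the reported sums of squares.

```python
# Sums of squares and degrees of freedom copied from the table above.
table = {
    "SSR": (4027, 2),
    "Subject": (4070, 11),
    "Error": (4643, 22),
}

mean_squares = {name: ss / dof for name, (ss, dof) in table.items()}
mse = mean_squares["Error"]  # mean square error
f_numbers = {name: ms / mse for name, ms in mean_squares.items() if name != "Error"}

for name, f in f_numbers.items():
    print(f"{name}: mean square = {mean_squares[name]:.2f}, F = {f:.2f}")
```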
Results - Analysis
Performed post-hoc comparison of means between groups
Using Bonferroni correction - effect is significant if p < 0.05/m, where m is the number of pairwise comparisons (three here)
[Equation: test statistic for each pair of conditions (e.g., 0.2 vs. 0.0), computed with the MSE from the ANOVA results]
Post-hoc analysis   Size of effect   t     p
0.2 > 0.0           11.33            2.7   0.0065
0.4 > 0.2           14.52            3.5   0.0011
0.4 > 0.0           25.84            6.2   < 0.001
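A short Python check of the post-hoc numbers, under two stated assumptions. First, the Bonferroni threshold is taken as 0.05 divided by the three comparisons. Second, the test statistic is assumed to be the size of effect divided by sqrt(MSE / n) with n = 12 subjects; this form is inferred from the tabulated values, which it reproduces, and is not taken from the slide.

```python
import math

MSE = 211.05       # mean square error from the ANOVA table
N_SUBJECTS = 12    # subjects in Experiment 1
ALPHA = 0.05

# (comparison, size of effect, reported p-value); "< 0.001" is entered
# as its upper bound 0.001.
comparisons = [
    ("0.2 > 0.0", 11.33, 0.0065),
    ("0.4 > 0.2", 14.52, 0.0011),
    ("0.4 > 0.0", 25.84, 0.001),
]

threshold = ALPHA / len(comparisons)   # Bonferroni-corrected threshold
std_err = math.sqrt(MSE / N_SUBJECTS)  # assumed form of the standard error

results = []
for name, effect, p in comparisons:
    stat = effect / std_err
    results.append((name, round(stat, 1), p < threshold))
    print(f"{name}: statistic = {stat:.1f}, p < {threshold:.4f}: {p < threshold}")
```

All three reported p-values fall below the corrected threshold, so each pairwise difference remains significant after the correction.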
case study 2:
Evaluation of Tactile
Feedback Methods for
Wrist Rotation Guidance
Andrew A. Stanley and Katherine J. Kuchenbecker
(UPenn)
Andrew A. Stanley and Katherine J. Kuchenbecker. Evaluation of tactile feedback methods for
wrist rotation guidance. IEEE Transactions on Haptics, 5(3):240-251, July-September 2012.
Motivation
Low-cost tactile motion guidance for stroke patient upper limb
rehabilitation
Vibration feedback proved challenging for wrist rotation guidance
Bark et al. 2011
Kapur et al. 2010
Tactile Actuator (Tactor) Design
Algorithms for Guiding Motion
1-DOF wrist rotation
Two algorithms per tactor
Experimental Setup
N = 10 subjects (9 right-handed, 1 left-handed; 7 male, 3 female;
age 20-30, mean = 22.2 years)
2-hour study compensated by $25 gift cards
Experimental Setup
Calibrated target angles to each subject's range of max wrist
pronation/supination
Subjects wore noise-canceling headphones and kept eyes closed
during trials
Approved under UPenn's IRB
Experimental Setup
10 feedback conditions (5 actuators, 2 algorithms each) presented in
pseudo-random order across subjects
3 tasks always presented in order of increasing complexity
Direction Response Task
Random delay 2-5 seconds
Magnitude of cue 75% across all trials
Headphone beep to signal end of trial after turning 45 degrees
Metrics:
Median reaction time and IQR for each subject's 12 trials
Proportion of trials in which subject initially and
ultimately moved in the correct direction
Angle Targeting Task
Move to target angle and stay within 15° dead band for 1 second
Magnitude/frequency of cue scales in proportion to error
Headphone beep to signal end of trial
Metrics:
Rise time (10-90%)
Max overshoot
Settling time (within dead band)
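The three targeting metrics can be sketched in Python for a single trial. This assumes the response has been normalized by the target angle, and it treats the settling time as the first sample after the last excursion outside the dead band, which simplifies the one-second hold criterion. The function name, sample data, and band width are hypothetical.

```python
def step_metrics(time, response, band):
    """Rise time, max overshoot, and settling time of a normalized response.

    `response` is the wrist angle divided by the target angle, so the
    target is 1.0; `band` is the dead band half-width in the same units.
    """
    # First times at which the response reaches 10% and 90% of the target.
    t10 = min(t for t, y in zip(time, response) if y >= 0.1)
    t90 = min(t for t, y in zip(time, response) if y >= 0.9)
    overshoot = max(0.0, max(response) - 1.0)
    # Settling time: first sample after the last excursion outside the band.
    outside = [i for i, y in enumerate(response) if abs(y - 1.0) > band]
    settle_index = outside[-1] + 1 if outside else 0
    if settle_index >= len(time):
        raise ValueError("response never settles inside the dead band")
    return t90 - t10, overshoot, time[settle_index]

# Made-up trial sampled every 0.1 s.
time = [i / 10 for i in range(11)]
response = [0.0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.1, 1.05, 0.98, 1.0, 1.0]

rise, overshoot, settle = step_metrics(time, response, band=0.12)
print(f"rise = {rise:.2f} s, overshoot = {overshoot:.2f}, settle = {settle:.2f} s")
```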
Trajectory Following Task
Continuously varying trajectory of random combination of sine waves, 30
seconds per trajectory
Magnitude of cues and dead band same as targeting task
Metric:
RMS error between user's trajectory and edges of
dead band tolerance
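One plausible reading of this metric, sketched in Python: samples inside the dead band contribute zero error, and samples outside contribute their distance to the nearest band edge. That reading is an assumption, and the angles below are made up.

```python
import math

def rms_deadband_error(target, actual, band):
    """RMS of the distance from each sample to the nearest dead band edge.

    Samples inside the dead band (|actual - target| <= band) count as zero
    error; this reading of the metric is an assumption.
    """
    errors = [max(0.0, abs(a - t) - band) for t, a in zip(target, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Made-up angles in degrees.
target = [0.0, 0.0, 0.0, 0.0]
actual = [0.0, 20.0, -25.0, 10.0]
rms = rms_deadband_error(target, actual, band=15.0)
print(f"RMS error = {rms:.2f} degrees")
```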
Statistical Methods
Three-way ANOVA with first order interactions
Factor 1: Device Type (tapper, dragger, vibration, etc.)
Factor 2: Algorithm Type (steady/pulsing)
Factor 3: Subject Number
Factors 1&2 fixed effects, Factor 3 random effect
Formatting data to work with ANOVA can get messy for multiple factors:
% Rise time: first times at which the angle reaches 0.1 and 0.9
tenPercentPt = min(timePlot(anglePlot >= .1));
ninetyPercentPt = min(timePlot(anglePlot >= .9));
riseTime(i,scripts,tactors) = ninetyPercentPt - tenPercentPt;
% Reshape the metric into 10 rows
medianAbsoluteError = reshape(medianAbsoluteError, 10, nUsers*nTrials);
dataBeingTested = testMetrics(j,:);
% Three-way ANOVA with two-factor interactions; factor 3 (User) is random
[pIJ, tableIJ, statsIJ, termsIJ] = anovan(dataBeingTested, ...
    {Tactor Script User}, 'random', 3, 'model', 'interaction', ...
    'varnames', {'Tactor';'Script';'User'}, 'display', 'off');
Statistical Methods
ANOVA requires that data is sampled from a normal distribution
Use the Lilliefors test to see whether ANOVA can be applied; may need to
transform data
[Figure: data distribution before and after transformation]
For timed data, variance typically increases with
magnitude; take the logarithm to help normalize
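A small Python sketch of the effect of the logarithm on right-skewed timing data. Sample skewness is used here only as a rough indicator of asymmetry and is not a substitute for the Lilliefors test. The reaction times are made up.

```python
import math
import statistics

def skewness(values):
    """Population skewness: third central moment over cubed standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((x - mean) ** 3 for x in values) / (len(values) * sd ** 3)

# Made-up reaction times in seconds, with a long right tail.
times = [0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.80, 1.20, 2.00]
log_times = [math.log(t) for t in times]

skew_raw = skewness(times)
skew_log = skewness(log_times)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

For this sample the log-transformed data are noticeably less skewed, which is the behavior the transformation is meant to produce before running ANOVA.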
Statistical Methods
Subjects completed questionnaire after each form of feedback and after
completing full experiment
Continuous scales are more likely to provide data from a normal distribution
than heavily discretized scales
Matlab Image Processing Toolbox can help with grunt work:
Statistical Methods
[Figure: two cases, labeled "Need to run multiple comparison tests for each pair" and "No multiple comparison tests necessary"]
ANOVA only tells you whether any of the feedback types differs from at least
one other feedback type, does NOT specify which pairs differ
Multiple Comparison Tests
If you have a relatively simple ANOVA and aren't worried about
interactions, fixed vs. random effects etc.:
USE multcompare.m
If you run an overly complicated user study:
Consult someone with a PhD in statistics and write your own
custom script to run multiple comparison tests taking
confounding factors, degrees of freedom, etc. into account (it
gets messy):
[Figure: custom multiple comparison test script, page 1 of 5]
Pro Tip: Design a simple user study
so that you can use multcompare.m
Results:
Multiple pairwise comparison tests show:
Both Squeezer algorithms fastest
Vibration Pulsing, Twister Pulsing and both Dragger
algorithms slowest
Squeezer fastest of devices, followed by Tapper
Twister faster than Dragger
Steady faster than Pulsing (0.27 sec on average)