
Dr Asna M. Zain, RSci AMIChemE


CDB3093
Analytical Chemistry

Data Handling, Statistics and Errors

Outline
Sample handling and management
QC and QA
Errors in analysis
Statistical analysis parameters
Descriptive statistics
Inferential statistics

Example questions

Nature and Scope

Chemical analysis (subject): solved using chemical or physico-chemical processes as the underlying principles of the technique
Analytical problem
Method: a set of instructions
Procedures
Validate: reliability in accuracy and reproducibility
Based on purpose and intended quality

Techniques and methods of analysis

Techniques:
Radiochemical
Electrochemical
Thermal
Chromatography (HPLC, GC)
Mass spectrometry
Gravimetry
Atomic and molecular spectrometry (AAS, FTIR)

Validation method

1. Determine the performance characteristics of the detector for single-analyte calibration standards.
2. Repeat the process for mixed-analyte calibration standards.
3. Repeat the process for analyte calibration standards with possible interfering substances, and for a reagent blank.
4. Repeat the process for analyte calibration standards with the anticipated matrix components, to evaluate matrix interference.
5. Analyse a spiked simulated matrix (matrix with a known amount of added analyte) to test recoveries.
6. Carry out field trials in a routine lab with more junior personnel to test ruggedness.

Sampling and sample handling

The sample must reflect the real composition of the material.
Because composition can vary with time, and time elapses between sample collection and analysis, proper storage is required to prevent loss of analyte.
A preservative may be added to maintain the sample condition for storage or for analysis.
Pre-treatment prior to analysis includes extraction, grinding, concentration or dissolution.

Analysis

Representative sample:
Coning and quartering for solids
Grab samples, or composites of grab samples, for water/liquids
Random picking

Quality control and quality assurance

QC: ensures that the operational techniques and activities in the analytical lab provide results suitable for the intended purpose.
Meets specific requirements in the context of a defined problem, e.g. accuracy and precision, calibration.

QA: the managerial component/responsibility of an analytical lab, ensuring that all QC procedures are in place.
Builds confidence through lab participation in inter-laboratory studies.
Proficiency tests of lab or analyst performance.

Confidence in validity
Cost effective

Method performance and certification studies undertaken

Errors in analytical measurement

Measurement error is assessed with statistical methods and minimized by careful experimental design and control.
Absolute and relative error
Absolute error: $E_a = X_m - X_t$
Relative error: $E_r = (X_m - X_t)/X_t$
where $X_m$ is the measured value and $X_t$ is the true value.

Determinate errors
Systematic errors lead to bias in the measured value. They arise from the analyst, the equipment or the procedure, and are controlled by record keeping, training or equipment maintenance.

Indeterminate error
Random errors arise from random fluctuations in measured quantities and occur even in a closely controlled environment.
Minimize them by careful experimental design and control of environmental factors.

Accumulated error
The aggregate of the errors in every measurement made during an analytical procedure, all of which contribute to the final calculated result.

Determinate and indeterminate error

Determinate error
Instrumental errors: instrument faults, uncalibrated weights and uncalibrated glassware
Operative errors: due to lack of skill and training
Errors in methods: sources include coprecipitation, slight solubility, side reactions, incomplete reactions and impurities in reagents

Indeterminate error
Accidental or random error

Probability and statistics are used to draw conclusions about the error.
Indeterminate errors should follow the normal distribution, or Gaussian curve. Its σ represents the standard deviation of an infinite population, and the spread of the normal population distribution measures the precision (Fig. 3.2).

Gaussian distribution
Random errors follow a Gaussian or normal distribution.

We are 95% certain that the true value falls within ±2σ (infinite population),
IF there is no systematic error.
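To make the "95% within ±2σ" statement concrete, the short Python sketch below computes the fraction of a normal population lying within ±kσ of the mean using the standard identity erf(k/√2). The function name is illustrative only. It shows that ±2σ actually covers about 95.4% (exactly 95% corresponds to ±1.96σ), so the slide's 95% is a rounded figure.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a normal (Gaussian) population within ±k standard deviations."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"±{k}σ: {fraction_within(k) * 100:.1f}%")
```

The same ±2σ and ±3σ fractions underlie the warning and control limits used on QC control charts.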


Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)

Fig. 3.2 Normal error curve.

Ways to express accuracy: absolute error and relative error

Absolute error
The difference between the measured value and the true value.

The absolute or mean error expressed as a percentage of the true value is the relative error.

If the true value is 2.62 g and the measured value is 2.52 g, the absolute error is $E_a = 2.52 - 2.62 = -0.10$ g.

Based on the same measurement, the relative error is $E_r = (-0.10/2.62) \times 100\% = -3.8\%$.

If $X_m$ is the average of several measurements, the value is called the mean error.

Relative error

The relative accuracy is the measured value or mean expressed as a percentage of the true value: (2.52/2.62) × 100% = 96.2%.

Example 3.6
The result of an analysis is 36.97 g, compared with the accepted value of 37.06 g.
What is the relative error in parts per thousand (ppt)?
Absolute error = 36.97 g - 37.06 g = -0.09 g
Relative error = (-0.09/37.06) × 1000 ppt = -2.4 ppt
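As a cross-check of the arithmetic above, here is a minimal Python sketch of the error definitions, reusing the slide's own numbers. The function names are illustrative, not from any library; the `scale` parameter is an assumption added here so one function covers both percent (100) and parts per thousand (1000).

```python
def absolute_error(measured: float, true: float) -> float:
    """E_a = X_m - X_t, in the same units as the data."""
    return measured - true


def relative_error(measured: float, true: float, scale: float = 100.0) -> float:
    """E_r = (X_m - X_t) / X_t, scaled: 100 for percent, 1000 for ppt."""
    return (measured - true) / true * scale


def relative_accuracy(measured: float, true: float) -> float:
    """Measured value (or mean) as a percentage of the true value."""
    return measured / true * 100.0


# Slide example: true value 2.62 g, measured value 2.52 g
print(round(absolute_error(2.52, 2.62), 2))
print(round(relative_error(2.52, 2.62), 1))
print(round(relative_accuracy(2.52, 2.62), 1))

# Example 3.6: 36.97 g against an accepted 37.06 g, expressed in ppt
print(round(relative_error(36.97, 37.06, scale=1000.0), 1))
```

Rounding is applied only when printing, matching the way the worked examples round the final answer rather than the intermediate values.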


Statistical analysis
The statistical models used here assume the data follow a normal (Gaussian) distribution.
If the data set is small, average or normalize the data before applying the Gaussian distribution.

A batch may contain one or more samples that differ in some respect, e.g. parameters or holding time.

Accuracy and precision

You can't have accuracy without good precision.
But a precise result can still contain a determinate or systematic error.

Fig. 3.1. Accuracy and precision.

Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)

R chart and X chart

Control charts are used to present and evaluate batches of QC samples. Each chart records the property of interest in a running sequence and shows the centerline (average), the standard deviation and the warning and control limits.
The R chart is used to monitor precision.
The X chart requires results from a sample of known composition and is used to evaluate accuracy.

Warning limits are set at ±2 standard deviations and control limits at ±3 standard deviations.
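The limits described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the centerline and standard deviation are estimated from a run of earlier QC results for the same check standard; the QC values and the helper names `control_limits` and `flag` are hypothetical, not taken from the source.

```python
from statistics import mean, stdev


def control_limits(qc_results):
    """Centerline, ±2s warning limits and ±3s control limits for an X chart."""
    center = mean(qc_results)
    s = stdev(qc_results)  # sample standard deviation (N - 1 denominator)
    return {
        "center": center,
        "warning": (center - 2 * s, center + 2 * s),
        "control": (center - 3 * s, center + 3 * s),
    }


def flag(value, limits):
    """Classify a new QC result against the warning and control limits."""
    lo_c, hi_c = limits["control"]
    lo_w, hi_w = limits["warning"]
    if not lo_c <= value <= hi_c:
        return "out of control"
    if not lo_w <= value <= hi_w:
        return "warning"
    return "in control"


# Hypothetical QC history for a check standard (mg/L), for illustration only
history = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(history)
for new in (10.05, 10.35, 10.6):
    print(new, flag(new, limits))
```

An R chart would be built the same way, but by plotting the range of replicate results instead of the individual values.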

Statistical parameters
Software: Excel, SPSS, Minitab, SYSTAT

Descriptive statistics
Check the data for problems such as non-normality (departure from the bell shape) or outliers, using a frequency chart or a normal plot.

Mean, $\bar{x}$
Standard deviation, σ (population) or s (small data set, N < 10)
Relative standard deviation (RSD) or coefficient of variation (CV), $s/\bar{x} \times 100\%$
Variance, $s^2$
Skewness and kurtosis, to reveal trends in the data such as clustering or a particular pattern:
Skewness: an asymmetric distribution with high frequencies on one side and a long tail of low frequencies on the other
Kurtosis: a kurtotic distribution has a high peak and long tails on both sides
Confidence limit, CL
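The main descriptive statistics listed above can be computed with Python's standard `statistics` module. The sketch below borrows the Folin-Wu results from Example 3.16 later in this section purely as sample input; `stdev` and `variance` use the N - 1 (sample) denominator, which matches s rather than σ.

```python
from statistics import mean, stdev, variance

folin_wu = [130, 128, 131, 129, 127, 125]  # mg/dL, Example 3.16 reference method

x_bar = mean(folin_wu)
s = stdev(folin_wu)       # sample standard deviation, N - 1 degrees of freedom
s2 = variance(folin_wu)   # sample variance, equal to s**2
rsd = s / x_bar * 100     # relative standard deviation (CV), in %

print(f"mean = {x_bar:.2f}, s = {s:.2f}, s^2 = {s2:.2f}, RSD = {rsd:.2f}%")
```

Excel's AVERAGE, STDEV and VAR functions give the same sample statistics.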

Data distribution


Confidence limit
Estimates the range within which the true value might fall with a given probability, defined by the experimental mean and standard deviation.
The range is called the confidence interval and its limits are the confidence limits.
The likelihood that the true value falls within the range is called the probability or confidence level.
Select a confidence level (95% is good) and read t from the table for the number of samples analyzed, N = (degrees of freedom + 1).
Confidence limit = $\bar{x} \pm ts/\sqrt{N}$
It depends on the precision, s, and the confidence level you select.
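A minimal Python sketch of the confidence-limit formula follows. It assumes the t value is still read from a t table (the standard two-tailed 95% value for 5 degrees of freedom is 2.571), rather than computed by a statistics library. The Folin-Wu data from Example 3.16 are reused only as sample input, and `confidence_interval` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(data, t_value):
    """Return (mean, half_width) for x_bar ± t*s/sqrt(N).

    t_value must come from a t table for N - 1 degrees of freedom
    at the chosen confidence level.
    """
    n = len(data)
    half_width = t_value * stdev(data) / sqrt(n)
    return mean(data), half_width


folin_wu = [130, 128, 131, 129, 127, 125]  # N = 6, so 5 degrees of freedom
t_95_df5 = 2.571                            # two-tailed t at 95%, 5 d.f.
x_bar, h = confidence_interval(folin_wu, t_95_df5)
print(f"{x_bar:.1f} ± {h:.1f} mg/dL")
```

A higher confidence level (a larger t) or poorer precision (a larger s) widens the interval, while more replicates narrow it through the √N term.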


Inferential statistics
Researchers need to make inferences about a population from a sample.
Types of inferential statistics:
Significance tests: F-test and t-test
Analysis of variance (ANOVA)
Q-test (to discard bad data)

Significance test
Compares the results of a method with those of an accepted method to decide whether one data set is significantly different from another (in the mean, or in the variability and spread).
Uses statistical tables such as those for the F-test or t-test.
The F-test indicates whether there is a significant difference between two methods based on their standard deviations.
F is defined in terms of the variances of the two methods, where the variance is the square of the standard deviation:
$F = s_1^2/s_2^2$ (Eq. 3.10)
where $s_1^2 > s_2^2$.
If the F value calculated from Eq. 3.10 exceeds the tabulated F value at the selected confidence level (e.g. Table 3.2 at the 95% confidence level), there is a significant difference between the variances of the two methods.

F value
$F = s_1^2/s_2^2$

You compare the variances of two different methods to see whether there is a significant difference between the methods at the 95% confidence level.

Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)

Example 3.16
You are developing a new colorimetric procedure for determining the glucose content of blood serum. You have chosen the standard Folin-Wu procedure with which to compare your results. From the following two sets of replicate analyses on the same sample, determine whether the variance of your method differs significantly from that of the standard method, using the F-test.

Your method (mg/dL) | Folin-Wu method (mg/dL)
127 | 130
125 | 128
123 | 131
130 | 129
131 | 127
126 | 125
129 |
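The F-test for Example 3.16 can be sketched as below. The larger variance goes in the numerator, as Eq. 3.10 requires. The critical value 4.95 is the commonly tabulated 95% F value for 6 (numerator) and 5 (denominator) degrees of freedom; treat it as an assumption to confirm against Table 3.2. `f_ratio` is an illustrative helper name.

```python
from statistics import variance


def f_ratio(a, b):
    """F = s1^2 / s2^2 with the larger variance in the numerator (Eq. 3.10).

    Returns (F, df_numerator, df_denominator).
    """
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1


your_method = [127, 125, 123, 130, 131, 126, 129]
folin_wu = [130, 128, 131, 129, 127, 125]

F, df1, df2 = f_ratio(your_method, folin_wu)
F_TABLE = 4.95  # assumed tabulated F at 95% for (6, 5) d.f.
print(f"F = {F:.2f} with ({df1}, {df2}) d.f.; significant: {F > F_TABLE}")
```

Because the calculated F is well below the tabulated value, the two variances would not be judged significantly different at the 95% level.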

t-Test
Tests whether the means of two groups differ significantly.
The test requires several assumptions:

The samples follow a normal distribution. If the sample is small, this cannot be checked reliably and the test may be incorrect; a moderate sample size of 40-100 is needed for accuracy.
The variances of the two groups are about the same. Check the homogeneity-of-variance assumption; violating it can lead to inaccurate results, particularly for small groups with unequal sample sizes.
The observations are independent, such that one subject does not influence another subject's score.
The t statistic is the difference between the sample means divided by its standard error. It is compared with the critical value from a probability table at the selected p value (0.05, 0.01 or 0.001).
If the t statistic equals or exceeds the critical value, the difference between the two group means is significant at the chosen level of alpha.
The test can be one-sided or two-sided. A one-sided test is used when the mean of a particular group is hypothesized to be higher than the mean of the other group; a two-sided test is used when the means are simply expected to be different.

Example 3.18
A new gravimetric method is developed for iron(III) in which the iron is precipitated in crystalline form with an organoboron cage compound. The accuracy of the method is checked by analyzing the iron in an ore sample and comparing with the results using the standard precipitation with ammonia and weighing of Fe2O3. The results, reported as % Fe for each analysis, were as follows. Find the F and t values.

Test method (% Fe) | Reference method (% Fe)
20.10 | 18.89
20.50 | 19.20
18.65 | 19.00
19.25 | 19.70
19.40 | 19.40
19.99 |
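A minimal Python sketch of Example 3.18 follows. It first computes F, then a two-sample t with a pooled standard deviation. The pooled form is only appropriate if the F-test shows no significant difference in variances. The commonly tabulated 95% F value for (5, 4) d.f. is about 6.26, and the two-tailed 95% t value for 9 d.f. is 2.262; confirm both against the course tables. `pooled_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t with pooled standard deviation (assumes equal variances).

    t = |x1_bar - x2_bar| / s_p * sqrt(N1*N2 / (N1 + N2)), with N1 + N2 - 2 d.f.
    """
    n1, n2 = len(a), len(b)
    # pooled variance: each set's variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = abs(mean(a) - mean(b)) / sqrt(sp2) * sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


test_method = [20.10, 20.50, 18.65, 19.25, 19.40, 19.99]
reference = [18.89, 19.20, 19.00, 19.70, 19.40]

va, vr = variance(test_method), variance(reference)
F = max(va, vr) / min(va, vr)
t, dof = pooled_t(test_method, reference)
print(f"F = {F:.2f}, t = {t:.2f} with {dof} d.f.")
```

With F below its tabulated value and t below 2.262, neither the precision nor the mean of the new method would be judged significantly different from the reference method at 95%.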

ANOVA
Used instead of multiple t-tests when there are more than two groups.

Compares group means with no limitation on the number of groups compared.

ANOVA examines the variability of scores within and between groups.

Subject scores within groups vary because of individual differences and random error.
ANOVA assumes that the observations are independent and normal and that the group variances are equal.
The overall F-test determines whether any group mean is significantly different from any other group mean.
If there is no difference (i.e. the F-test is not significant), there is no point in comparing individual groups; retain the null hypothesis.
If the F-test is significant, at least one group mean differs significantly from at least one other group mean; investigate the hypotheses for the individual groups.
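The overall ANOVA F-test can be sketched from its sums of squares. The example below is illustrative only: it reuses the four analysts' results from the Q-test example later in this section, treating each analyst as a group. The commonly tabulated 95% F value for (3, 8) d.f. is about 4.07; confirm it against an F table. `one_way_anova` is a hypothetical helper name.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # between-group sum of squares: spread of group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # within-group sum of squares: spread of each value about its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


analysts = {
    "Sydney": [10.2, 10.8, 11.6],
    "Cherry": [9.9, 9.4, 7.8],
    "Tien": [10.0, 9.2, 11.3],
    "Dick": [9.5, 10.6, 11.6],
}
F, df_b, df_w = one_way_anova(list(analysts.values()))
print(f"F = {F:.2f} with ({df_b}, {df_w}) d.f.")
```

Here F is below the tabulated value, so on these numbers the null hypothesis of equal analyst means would be retained and no pairwise comparisons would follow.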

Q-test
$Q_{calc}$ = (difference between the suspect value and its nearest neighbour) / range
If $Q_{calc} > Q_{table}$, reject the outlier as due to a systematic error.

Example of Q-test
Perform a Q-test to find outliers in the following measurements and draw your conclusion about the data.

Sydney | Cherry | Tien | Dick
10.2 | 9.9 | 10.0 | 9.5
10.8 | 9.4 | 9.2 | 10.6
11.6 | 7.8 | 11.3 | 11.6
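One way to carry out the Q-test is sketched below in Python. It checks whichever extreme value (lowest or highest) has the larger gap to its neighbour. The critical value 0.970 is the commonly tabulated 95% Q value for n = 3; it is an assumption here and should be checked against the course's Q table. `dixon_q` is an illustrative name.

```python
def dixon_q(values):
    """Return (suspect, Q_calc) for the more extreme of the lowest/highest values.

    Q_calc = |suspect - nearest neighbour| / (largest - smallest)
    """
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("the Q-test needs at least 3 values")
    spread = data[-1] - data[0]
    if spread == 0:
        raise ValueError("all values identical; Q is undefined")
    q_low = (data[1] - data[0]) / spread
    q_high = (data[-1] - data[-2]) / spread
    if q_low >= q_high:
        return data[0], q_low
    return data[-1], q_high


Q_TABLE_N3_95 = 0.970  # assumed tabulated Q for n = 3 at 95% confidence

for name, results in {"Cherry": [9.9, 9.4, 7.8], "Tien": [10.0, 9.2, 11.3]}.items():
    suspect, q = dixon_q(results)
    verdict = "reject" if q > Q_TABLE_N3_95 else "retain"
    print(f"{name}: suspect {suspect}, Q = {q:.3f} -> {verdict}")
```

With only three values per analyst the critical Q is very high, so even Cherry's low 7.8 would be retained on this basis.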

Correlation
An association between two variables that takes a value between +1.0 and -1.0.

If the two variables are positively correlated, then as one increases, the other increases.
If the two variables are negatively correlated, then as one variable increases, the other decreases.
If they are not associated at all, the correlation is zero.
A scatter plot with zero correlation shows a circular field of points on the x-y axes, i.e. no particular relationship between x and y.
A positive correlation appears as an increasing linear trend, and a negative correlation as a decreasing linear trend.
Inferences about the association between two variables in a population assume the data are normally distributed.
Pearson correlation coefficient, r
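The Pearson correlation coefficient can be computed directly from deviations about the means, as in the sketch below. The data are hypothetical, chosen only to show a perfectly positive, a perfectly negative and a zero correlation. Python 3.10+ also offers `statistics.correlation`, but the explicit form shows the arithmetic.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient r between paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data illustrating the three cases
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # increasing linear trend
print(pearson_r(x, [10, 8, 6, 4, 2]))  # decreasing linear trend
print(pearson_r(x, [3, 1, 4, 1, 3]))   # no linear trend
```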

Regression
Regression treats an independent variable such as age as continuous, rather than dividing the subjects into groups.
Regression creates a linear equation that predicts the score of a dependent variable.
The equation represents the line that best fits a scatter plot of points, describing the relationship between the dependent variable and one or more independent variables.
The beta weights, or coefficients, of the independent variables in the equation give information on the relationships between the independent and dependent variables.
For a single line of best fit on x-y axes, the slope is the beta weight: it reflects the change in the dependent variable associated with each one-unit change in the independent variable.
Regression analysis assumes independence, normality, constant variance and a linear relationship between the independent and dependent variables.

Regression
Simple linear regression: a single independent variable is used to estimate the score of the dependent variable.
Multiple linear regression: determines whether the amount of variance in the dependent variable explained by a set of independent variables is significantly different from zero.
The R² value indicates the proportion of the variance in the dependent variable explained by its independent variable(s); e.g. R² = 0.16 means 16% of the variance is explained.

A least-squares plot gives the best straight line through experimental points.
Excel will do this for you.

Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)

Fig. 3.7. Straight-line plot.

This Excel plot gives the same results for the slope and intercept as calculated in the example.

Riboflavin (vitamin B2) is determined in a cereal sample by its fluorescence intensity in 5% HAc solution. A calibration curve was prepared by measuring the fluorescence intensities of a series of standards of increasing concentration. The following data were obtained. Use the method of least squares to obtain the best straight line for the calibration curve and to calculate the concentration of riboflavin in the sample.
$m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
$b = \bar{y} - m\bar{x}$

Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)

Fig. 3.8. Least-squares plot of data from Example 3.21.
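The slope and intercept formulas above translate directly into Python. This sketch also computes R² and inverts the calibration line to estimate an unknown concentration. The Example 3.21 data table is not reproduced in this text, so the calibration points and the unknown signal below are hypothetical, chosen only to exercise the formulas. `least_squares` is an illustrative name.

```python
def least_squares(x, y):
    """Slope m, intercept b and R^2 for the best straight line y = m*x + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx
    b = y_bar - m * x_bar
    # R^2: fraction of the variance in y explained by the fitted line
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return m, b, 1 - ss_res / ss_tot


# Hypothetical calibration standards (not the Example 3.21 data)
conc = [0.0, 1.0, 2.0, 3.0, 4.0]     # standard concentration, arbitrary units
signal = [1.0, 3.1, 4.9, 7.2, 8.8]   # measured response, e.g. fluorescence
m, b, r2 = least_squares(conc, signal)

unknown_signal = 6.0
unknown_conc = (unknown_signal - b) / m  # read the sample off the calibration line
print(f"m = {m:.2f}, b = {b:.2f}, R^2 = {r2:.4f}, sample = {unknown_conc:.3f}")
```

Excel's LINEST (or SLOPE and INTERCEPT) should return the same slope and intercept for the same data.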

Manual solution for Example 3.21

Excel spreadsheet solution for Example 3.21

Select LINEST from the statistical function list (click fx on the toolbar to open the Paste Function window).
LINEST calculates key statistics for a straight-line fit to a set of data.

Fig. 3.10. Using LINEST for statistics.

Use of spreadsheets in analytical chemistry
We often use relative cell references in formulas.

If a number in a given cell is to be a constant in a formula, place $ in front of that cell's descriptors (e.g. $B$2).

Fig. 3.5. Relative and absolute cell references.

Excel mathematical functions

Excel has a number of mathematical and statistical functions.

Click fx on the toolbar to open the Paste Function window.

Math & trig functions: LOG10, PRODUCT, POWER, SQRT
Statistical functions: AVERAGE, MEDIAN, STDEV, TTEST, VAR

References
Gary D. Christian, Analytical Chemistry, 6th Ed., Wiley, 2003. QD101.2 C57 2003
Daniel C. Harris, Exploring Chemical Analysis, 2nd Ed., W.H. Freeman and Company, 2000. QD75.2 H368
Seamus P.J. Higson, Analytical Chemistry, Oxford University Press, 2004. QD101.2 H54
