
Reliability Analysis of Non-Parametric Statistical Tests for the Evaluation of Linear Drift in Experimental Data
P. Cappa, S. A. Sciuto and S. Silvestri
Department of Mechanics and Aeronautics - University of Rome "La Sapienza" - Via Eudossiana 18, 00184 - Rome, Italy.
Clinical Engineering Service - Children's Hospital "Bambino Gesù" of Rome - Piazza S. Onofrio 4, 00165 - Rome, Italy.
Department of Mechanical and Industrial Engineering - University of Roma Tre - Via della Vasca Navale 79, 00146 - Rome, Italy.
Abstract: During routine data gathering, the efficiency of statistical analysis strongly depends on the noise level superimposed on the signal. It has been found that some well known statistical tests, commonly utilised in data acquisition to detect the presence of drift, can fail under some conditions. Thus, a statistical procedure for the predictive reliability estimation of the utilised statistical method could be useful in the design of experimental analysis. This paper reports the results of a simulation study carried out to evaluate the performance in drift detection of non-parametric tests such as the Wald-Wolfowitz run test, in comparison with the Mann-Whitney reverse arrangement test. In order to assess the sensitivity of the tests to a monotonous drift, a simulation program was developed. In the program a Gaussian raw data sequence with a linear pattern of variable slope and with variable variance was simulated and given as the input to the tests. The capability to detect the presence of drift as a function of the angular coefficient and of the variance of the noise superimposed on the signal was verified. The obtained data were synthesised in graphs so that the experimentalist could determine preliminarily the effectiveness of each of the considered statistical methods in terms of percentage of success in detecting the presence of drift phenomena as a function of drift relevance and noise amplitude. Finally, the graphs permitted the elucidation of the causes of contradictory failing results observed in long term experimental analysis.
Key Words: Drift, zero-shift, non parametric tests, statistical reliability, pulmonary ventilators
Notation:
$A$: value of reverse arrangements in the Mann-Whitney reverse arrangement test obtained examining the whole data set;
$A_i$: generic reverse arrangement value in the Mann-Whitney reverse arrangement test;
$h_{ij}$: reverse arrangement obtained examining $x_i$ and $x_j$;
$N$: total number of observations;
$N_1$: number of (+) observations in the Wald-Wolfowitz run test;
$N_2$: number of (-) observations in the Wald-Wolfowitz run test;
$r$: random variable of run distribution;
RAT: reverse arrangement test;
RT: run test;
$x$: random variable;
$x_m$: mean value of a random variable;
$\mu_A$: mean of arrangements in the Mann-Whitney reverse arrangement test;
$\mu_r$: mean of runs in the Wald-Wolfowitz run test;
$\sigma_A^2$: variance of arrangements in the Mann-Whitney reverse arrangement test;
$\sigma_r^2$: variance of runs in the Wald-Wolfowitz run test;
$\sigma_y^2$: variance associated to y where y = a + bx.
Strain 2001 Vol. 37 No. 2 67
Introduction
Pulmonary ventilators, as is well known, are commonly utilised even for long periods of time [1, 2] and, unfortunately, ventilatory parameter drift is a common problem with which clinical technicians often have to deal. Drift of these parameters could obviously be very harmful to the patient's health [3-5]. Nevertheless, in spite of their importance, drift tests are not currently prescribed by ventilator manufacturers. Furthermore, the "Standard specification for ventilators intended for use in critical care" [6] (the only available reference) describes experimental methods to conduct endurance tests that seem too rigorous to be practised in common maintenance procedures and offers no standard procedure for statistical data analysis. For this reason, in a previous phase of research, a PC-based automatic procedure with a user friendly interface for ventilator drift tests [7] was designed. This is in use at the Clinical Engineering Service+ (CES) of the Children's Hospital "Bambino Gesù"++ and has helped technicians with maintenance procedures for the 68 pulmonary ventilators currently installed (about 1.5 M$ total value). In particular, the methodology proposed to verify possible ventilator drift could also be extended to all fields where a zero shift analysis is required.
During the verification phase of the proposed method, some ventilation parameters, such as airway pressure, tidal volume and respiratory flow, were continuously acquired for a 20 day time period and the collected data were post-processed by means of statistical tools [8-17] in order to evaluate the possible drift of the examined ventilators. In particular, non-parametric tests, i.e. the Wald-Wolfowitz run test (RT) and the Mann-Whitney reverse arrangement test (RAT), were used to process data in order to highlight the presence of a systematic trend in the observed results as a function of time. The results were more than satisfactory even though in some cases different tests provided different responses for the same raw data and the same confidence coefficient ($\alpha$ level). As it is useful to know the reliability that can be expected from such methods before their application, an investigation of the causes of non-parametric test failure was of interest. It was observed that discordance mostly appeared when the noise in the acquired data was particularly high, and it is well known that the efficiency of statistical analysis strongly depends not only on the noise level superimposed on the signal but also on the accuracy of the measurement set-up. We therefore decided to implement a procedure for the predictive estimation of the reliability of the utilised non-parametric tests as a function of noise level and linear drift slope. In order to achieve this aim and to identify the reasons for the different responses, a simulation study was carried out by applying the statistical methods to a linear drift of variable slope with Gaussian white noise of variable variance superimposed on the data set.
Non-Parametric Test Description
Statistical procedures which do not assume a specific distribution function for the original random variable of interest are called distribution free or non-parametric procedures. One of the best known distribution free procedures used for data evaluation is the chi-square goodness of fit test, but RT and RAT are also widely utilised and valuable data processing techniques for drift detection. Every statistical test gives its response of acceptance or rejection of the starting hypothesis at a certain level of confidence or significance, also called the $\alpha$ level. A level equal to 95%, which means that there is a 5% probability of rejecting the starting hypothesis when it is in fact true, is commonly accepted for experimental data processing.
To better understand the considerations that follow, the Wald-Wolfowitz and Mann-Whitney tests are briefly described.
+ The Clinical Engineering Service was established in 1980 and manages about 5000 electro-medical devices for a global value of about 40 million US$.
++ The Children's Hospital "Bambino Gesù" (about 730 bed medical facility) is a private and non-profit-making hospital located in the Vatican City, i.e. the independent Papal state within the city of Rome (Italy), and is officially recognised by the Italian Government as a "Research and Care Institute of a Scientific Nature".

Wald-Wolfowitz run test
Let us consider a sequence of N observed values of a random variable x where each observation can be classified into one of two mutually exclusive categories, which may be identified simply by a plus (+) or a minus (-). For example, in the case of a sequence of measured values $x_i$, i = 1, 2, 3, ..., N with a mean value $x_m$, we will count a (+) for each $x_i \geq x_m$ and a (-) for each $x_i < x_m$. A run is defined as a sequence of identical observations, positive if referred to (+) observations or negative if vice versa, that is followed and preceded by a different observation, (-) or (+). The number of runs occurring in the whole sequence of observations gives an indication as to whether or not data are independent observations of the same random variable. More specifically, if a sequence of N observations of the same random variable are independent, the probability of a (+) or a (-) result does not change from one observation to the next and, as a consequence, the sampling distribution of the number of runs occurring in the sequence is a random variable r with a mean value and a variance evaluated as follows:

$$\mu_r = \frac{2 N_1 N_2}{N} + 1 \qquad (1)$$

$$\sigma_r^2 = \frac{2 N_1 N_2 \,(2 N_1 N_2 - N)}{N^2 (N - 1)} \qquad (2)$$

where $N_1$ is the number of (+) observations and $N_2$ the number of (-) observations, so that $N = N_1 + N_2$. Then, a normalised Gaussian curve is obtained by means of equations (1) and (2). If the observed number of runs lies in the confidence interval defined by the $\alpha$ level then the response is positive and the two categories have the same distribution, indicating an absence of drift. A limited tabulation of percentage points for the distribution function of runs is also available in the literature [16].

Mann-Whitney reverse arrangement test
Given a sequence of N observed values of a random variable x where the observations are denoted by $x_i$, i = 1, 2, 3, ..., N, each time that $x_i > x_j$ for $i < j$ it must be counted as a reverse arrangement and the total number of reverse arrangements is denoted by A. A general definition for A is as follows. For a set of observations $x_1, x_2, \ldots, x_N$ we can define

$$A = \sum_{i=1}^{N-1} A_i \qquad (3)$$

as the total number of reverse arrangements, where any element $A_i$ of the sum is defined by

$$A_i = \sum_{j=i+1}^{N} h_{ij} \qquad (4)$$

and

$$h_{ij} = \begin{cases} 1 & \text{if } x_i > x_j \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

Each occurrence of $x_i > x_j$, for any $i < j$, is termed a reverse arrangement.
If the sequence of N observations are independent observations of the same population, i.e. no drift is present, then the number of reverse arrangements is a random variable A, with a mean value and a variance as follows.

$$\mu_A = \frac{N (N - 1)}{4} \qquad (6)$$

$$\sigma_A^2 = \frac{N (2N + 5)(N - 1)}{72} \qquad (7)$$

Then a normalised Gaussian curve is developed with a mean value and variance calculated according to equations (6) and (7), respectively. If A lies in the confidence interval defined by the $\alpha$ level, then the response is positive and drift is not present. Also in this case, a limited tabulation of percentage points for the distribution function is available in the literature [16].

As already observed, both tests allow the estimation of the linear drift tendency of a data set with a certain level of confidence, but they do not take into account in any way the effect of the data variance, which could significantly affect the reliability of their response. However, in order to evaluate the "level of dispersion" of data, simple calculation of the variance of the whole data set was not found useful because, as is well known, the variance value can be strongly dependent on the slope of the underlying linear tendency. Therefore, it is necessary to identify a statistical index able to estimate the data dispersion due to the effects induced by a reduced measurement system accuracy, independently of the slope of the data trend. In order to obtain objective diagrams for the reliability evaluation of the two considered tests, the determination of $\sigma_y^2$ [8], i.e. the variance associated with y where y is linearly related to the input x, seemed to be an appropriate index to attain our aim, i.e. to separate the noise floor contribution from the drift tendency.

In order to check the independence of $\sigma_y^2$ from the slope of the data set, a simulation was carried out by randomly applying Gaussian noise with a variance $\sigma^2$ equal to a set value of 25 to data sets of different linear slope and calculating the global variance $\sigma^2$ and $\sigma_y^2$. Results are shown in Figure 1, where $\sigma^2$ and $\sigma_y^2$ are represented as a function of the angular coefficient expressed in percentage. A first sight examination of this figure confirms that $\sigma_y^2$ is an effective indicator of data dispersion, sufficiently independent of the angular coefficient of the data trend, i.e. of monotonous instrumentation shift.

Figure 1: Comparison between $\sigma^2$ and $\sigma_y^2$ as a function of the linear slope with Gaussian noise superimposed

Figure 2: Flow chart of software for the determination of test reliability graphs
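The comparison summarised in Figure 1 can be reproduced in outline with a few lines of Python. The sketch below is an illustration and not the authors' software: it assumes that $\sigma_y^2$ is the residual variance about the least-squares line y = a + bx with N - 2 degrees of freedom (the usual definition in error-analysis texts such as [8]), it interprets the slope percentage as signal units per sample times 100, and the function names, record length and slope values are hypothetical choices.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept a and slope b of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residual_variance(x, y):
    """Assumed sigma_y^2: variance of y about the fitted line (N - 2 d.o.f.)."""
    a, b = fit_line(x, y)
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (len(x) - 2)


def global_variance(y):
    """Ordinary sample variance of the whole data set (N - 1 d.o.f.)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / (len(y) - 1)


def noisy_line(n, slope, noise_var, seed):
    """Linear trend slope*i plus Gaussian white noise of variance noise_var."""
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)
    x = list(range(n))
    return x, [slope * i + rng.gauss(0.0, sd) for i in x]


if __name__ == "__main__":
    # Noise variance 25 as in the text; the slope grid is illustrative only.
    for slope_pct in (0, 1, 2, 4, 6):
        x, y = noisy_line(1000, slope_pct / 100.0, 25.0, seed=slope_pct)
        print(slope_pct, round(global_variance(y), 1), round(residual_variance(x, y), 1))
```

Under these assumptions the global variance grows with the slope, because it absorbs the spread of the trend itself, while the residual variance stays close to the imposed noise variance.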
Simulation and Graph Description
To determine the reliability of the two tests for drift detection as a function of the level of dispersion of the collected data, simulation software was designed in LabVIEW™. With reference to Figure 2, the program is composed of two main modules. The first generates a linear function with variable angular coefficient, on which Gaussian white noise of variable variance can be superimposed. The second module performs RT and RAT on the data provided by the first part of the software, and in addition calculates $\sigma_y^2$ (Figure 2).
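The two modules just described can be mimicked in Python as a rough sketch. This is not the authors' LabVIEW program: the function names are hypothetical, the run test dichotomises the data about the sample mean, $N_1$ and $N_2$ are taken as the numbers of (+) and (-) observations (the standard form of the run test), and the 95% level is applied as a two-sided limit |z| ≤ 1.96 on the normalised statistic.

```python
import math
import random

Z_95 = 1.96  # two-sided 95 % limit of the standard normal distribution


def generate(n, slope, noise_var, seed=0):
    """Module 1: linear function slope*i with Gaussian white noise added."""
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)
    return [slope * i + rng.gauss(0.0, sd) for i in range(n)]


def run_test_z(x):
    """Normalised number of runs about the mean, using equations (1) and (2)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(x) / n
    signs = [v >= mean for v in x]  # True = (+), False = (-)
    n1 = sum(signs)
    n2 = n - n1
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        raise ValueError("all observations fall in one category")
    return (runs - mu) / math.sqrt(var)


def reverse_arrangement_z(x):
    """Normalised number of reverse arrangements, equations (3) to (7)."""
    n = len(x)
    a = sum(1 for i in range(n - 1) for j in range(i + 1, n) if x[i] > x[j])
    mu = n * (n - 1) / 4.0
    var = n * (2 * n + 5) * (n - 1) / 72.0
    return (a - mu) / math.sqrt(var)


def drift_detected(z):
    """Module 2 decision: drift is reported when z leaves the 95 % interval."""
    return abs(z) > Z_95


if __name__ == "__main__":
    data = generate(n=200, slope=0.05, noise_var=1.0)
    for name, z in (("RT", run_test_z(data)), ("RAT", reverse_arrangement_z(data))):
        print(name, round(z, 2), "drift" if drift_detected(z) else "no drift")
```

A rising trend produces few runs and few reverse arrangements, so both statistics become strongly negative; the reverse arrangement count is written as a plain double loop for clarity rather than speed.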
Once the level of reliability ($\alpha$ level) is set to 95%, the software provides a graph in which the success or failure of the specific test is represented as a function of $\sigma_y^2$, which, as previously mentioned, is related to the system accuracy, and of the angular coefficient of the linear data trend, which is related to the drift tendency of the utilised set-up.
From an overall analysis of the diagrams provided by the simulation software for the above mentioned tests in the case of monotonous drift, it is possible to observe that, as expected, an increase in noise amplitude produces an increase in test unreliability. Furthermore, three main zones have been outlined (Figures 3 and 4): (a) a success zone, where the test gives a reliable response, (b) a failure zone, inside which the test is completely unreliable, and (c) an uncertainty zone, i.e. data sets with unstable results. In particular, it is possible to observe that the zone of unreliability grows in direct dependence on the data dispersion level.
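A success percentage of the kind plotted in Figures 3 and 4 can be estimated by repeating the tests on many simulated records. The sketch below is only indicative: the record length, number of trials, grid values and seed are hypothetical, the imposed noise variance is used as a stand-in for $\sigma_y^2$, the run test again counts (+) and (-) observations about the sample mean, and no thresholds are assumed for the three zones since the text does not state them.

```python
import math
import random

Z_95 = 1.96  # two-sided 95 % limit of the standard normal distribution


def rt_detects(x):
    """Run test about the sample mean; True when drift is reported."""
    n = len(x)
    mean = sum(x) / n
    signs = [v >= mean for v in x]
    n1 = sum(signs)
    n2 = n - n1
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        return False  # degenerate record, no decision possible
    return abs(runs - mu) / math.sqrt(var) > Z_95


def rat_detects(x):
    """Reverse arrangement test; True when drift is reported."""
    n = len(x)
    a = sum(1 for i in range(n - 1) for j in range(i + 1, n) if x[i] > x[j])
    mu = n * (n - 1) / 4.0
    var = n * (2 * n + 5) * (n - 1) / 72.0
    return abs(a - mu) / math.sqrt(var) > Z_95


def success_rate(test, slope, noise_var, n=100, trials=100, seed=1):
    """Percentage of simulated records in which `test` reports the drift."""
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)
    hits = 0
    for _ in range(trials):
        record = [slope * i + rng.gauss(0.0, sd) for i in range(n)]
        hits += test(record)
    return 100.0 * hits / trials


if __name__ == "__main__":
    for noise_var in (1.0, 9.0, 25.0):
        for slope in (0.005, 0.02, 0.08):
            rt = success_rate(rt_detects, slope, noise_var)
            rat = success_rate(rat_detects, slope, noise_var)
            print(f"var={noise_var:5.1f} slope={slope:.3f} RT={rt:5.1f}% RAT={rat:5.1f}%")
```

Scanning a finer grid of slopes and variances and marking where the percentage is close to 100, close to 0, or in between would give a map comparable in spirit to the success, failure and uncertainty zones.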
The identification of limit lines for test reliability allows one to determine the angular coefficient of the minimal noticeable drift once the value of $\sigma_y^2$ for the collected data set is calculated. Furthermore, the diagrams also show that in the case of monotonous drift RAT proves more efficient than RT.
Consequently, once the experimentalist determines $\sigma_y^2$, the minimal angular coefficient, i.e. the minimal instrumentation drift, detectable by each of the mentioned tests can be estimated.
Application to Experimental Data and Discussion
In order to validate the results provided by the examined drift test analysis, it was decided to apply it to the data sets previously acquired during pulmonary ventilator parameter drift analysis [7]. The measured fundamental parameters (tidal volume, airway pressure, percentage of oxygen, etc.) were acquired for a twenty day time period by connecting a ventilator to a patient simulator, the physical characteristics of which were assumed constant as a function of time. The mechanical characteristics of the patient simulator are guaranteed by the manufacturer to be stable over a period of time much longer than 20 days, which is the duration we chose for data acquisition.

Figure 3: Zones of reliability, unreliability and uncertainty for run test as a function of $\sigma_y^2$, and minimal appreciable drift slope with $\alpha$ level equal to 95%

Figure 4: Zones of reliability, unreliability and uncertainty for reverse arrangement test as a function of $\sigma_y^2$, and minimal appreciable drift slope with $\alpha$ level equal to 95%

The analysis of the obtained results revealed a generally noticeable drift in the airway pressure values. This phenomenon was evident for peak pressure values (Figure 5), for which an increase of 18 mmH2O appeared over the duration of the test, i.e. the airway peak pressure increased with a slope of 0.037 mmH2O/hour, and both the statistical tests identified that tendency. However, with reference to other pressure values, such as end inspiration (Figure 6) or mean pressures, despite an expected drift tendency, the applied statistical tests provided discordant responses. In this particular case the presence of drift, for the same $\alpha$ level, emerged by utilising RAT, while RT excluded it.
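As a quick arithmetic check of the quoted peak pressure slope, assuming that the 18 mmH2O increase accumulated approximately linearly over the whole 20 day acquisition:

$$\frac{18\ \mathrm{mmH_2O}}{20\ \mathrm{days} \times 24\ \mathrm{h/day}} = \frac{18}{480}\ \mathrm{mmH_2O/h} = 0.0375\ \mathrm{mmH_2O/h}$$

which agrees with the stated value of 0.037 mmH2O/hour.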
The calculation of $\sigma_y^2$ on the data set relative to end inspiration pressure provided a value of $\sigma_y^2$ = 8.51 mm²H2O. From the examination of Figures 3 and 4 it emerges that the minimal noticeable drift for $\sigma_y^2$ = 8.51 mm²H2O must have an angular coefficient higher than 0.13% for RAT and higher than 0.42% for RT. As the parameter was acquired at a frequency of 4 samples per hour, the minimal noticeable drift turns out to be equal to 0.04 mmH2O/hour, a value that lies outside the zone of reliability for RT but inside the zone of reliability for RAT. Thus, only the RAT result has to be taken into consideration because it is capable of identifying the system instability.
The magnitude of ventilator parameter variation revealed by the examined case study does not seem, at first sight, to be relevant in comparison with actual pathophysiological changes in human beings. Besides, pulmonary ventilators were taken as a case study for their supposed stability. In fact, as is well known, they are usually very expensive medical devices specifically devoted to high risk utilisation and, as expected, their functioning is remarkably stable. As a consequence, even though drift sometimes appears, it is generally not relevant in adult applications. Nevertheless, there are two main aspects that must be taken into account. First of all, the clinical relevance of a ventilator drift cannot be stated a priori. Furthermore, the present study has been conducted with a view to the real application of the devices examined at our Children's Hospital. Small variations in the parameters can be hazardous due to the mechanical characteristics of neonatal lungs, which strongly depend on age, sex and physiology of the patient. More specifically, in newborn infants an error of a few cmH2O in Peak Inspiratory Pressure (PIP) can cause barotrauma, a result definitely dangerous to the patient's health.

Figure 5: Example of raw data acquired from a pulmonary mechanical ventilator where drift is present and was identified by both the examined statistical tests

With reference to the constancy of ventilator settings, it must be considered that, whereas in an Intensive Care Unit (ICU) environment, as well as in an operating theatre, patient condition can change even very suddenly, in long term treatments, i.e. in the case of home life support or chronic diseases, settings can be left unchanged for months. Furthermore, this study was conducted with the main aim of isolating the drift component which can be attributed to the ventilator alone, as a single device, in order to evaluate its reliability of use.
The results reported here can ultimately be useful to CES technicians for the objective comparison of different ventilator performances or for checking the ageing process of the same ventilator during maintenance procedures.

Figure 6: Example of raw data acquired from a pulmonary mechanical ventilator where drift was identified by only one of the two examined statistical tests

Conclusions
To provide acceptance conditions and outline reliability for some widely used statistical tests that can be applied to any kind of experimental data where a constant output is expected, tests have been carried out by means of a patient simulator specifically designed by the manufacturer for ventilator calibration to guarantee stable "patient" conditions during device parameter testing. Therefore, the observed drift amount can be attributed to the examined ventilator.
The reported analysis allows the determination of the confidence level associated with the run test and the reverse arrangement test when they are utilised to evaluate a monotonous drift in an experimental data set. Thus, the experimentalist can, in an a priori approach, evaluate the minimum noticeable drift when the overall accuracy associated with the measurement set-up is known. The method proposed here was validated with experimental data and identified, in an objective manner, the discordance of the results obtained by means of the two tests when applied to the same data set.
References
1. Tobin, M.J., Jubran, A. and Hines, E. Jr. (1994) Pathophysiology of failure to wean from mechanical ventilation. Schweiz Med Wochenschr 124, 2139-2145.
2. Nava, S. et al. (1994) Survival and prediction of successful ventilator weaning in COPD patients requiring mechanical ventilation for more than 21 days. Eur Respir J 7, 1645-1652.
3. Johnson, B. et al. (1985) Pathophysiological considerations on special modes of ventilation in severe respiratory distress syndrome. Excerpta Medica Int. Congr., Rome (Italy), Gasparetto.
4. Calon, B., Clever, B. and Urli, D. (1989) An unusual failure of the 900C Siemens Servo Ventilator. An. Fr. Anesthetics, Strasbourg (France).
5. Hartopp, I.K. (1994) Incorrect settings on Manley ventilators [letter]. Anaesthesia 49, 916-917.
6. ASTM F 1100-90 (1990) Standard Specification for Ventilators Intended for Use in Critical Care. West Conshohocken, PA, USA.
7. Branca, F.P., Cappa, P., Sciuto, S.A. and Silvestri, S. (1997) A novel methodology for the experimental evaluation of pulmonary ventilator performance drift. Journal of Clinical Engineering 22, 163-170.
8. Taylor, J.R. (1982) An Introduction to Error Analysis. University Science Books, Mill Valley.
9. Draper, N.R. and Smith, H. (1981) Applied Regression Analysis. John Wiley & Sons, New York.
10. Brownlee, K.A. (1965) Statistical Theory and Methodology in Science and Engineering. John Wiley & Sons, New York.
11. Wald, A. and Wolfowitz, J. (1940) On a test whether two samples are from the same population. Ann of Math Statist 11, 147-162.
12. Conover, W.J. (1980) Practical Nonparametric Statistics, 2nd ed. John Wiley & Sons, New York.
13. Blalock, H.M. (1979) Social Statistics, 2nd ed. McGraw-Hill, New York.
14. Stewart, J.Q. and Warntz, W. (1958) Physics of Population Distribution. Journal of Regional Science 1, 90-123.
15. Lehmann, E.L. (1975) Nonparametrics: Statistical Methods Based on Ranks. Holden Day, San Francisco.
16. Wayne, D.W. (1978) Applied Nonparametric Statistics. Houghton Mifflin, Boston, MA, USA.
17. Hollander, M. and Wolfe, D.A. (1973) Nonparametric Statistical Methods. John Wiley & Sons, New York.