
CHAPTER 1

INTRODUCTION
Although there is a large literature on outlier detection, most existing techniques are
suited to symmetric distributions, as discussed in detail in Chapter 2. Some authors have
proposed outlier detection techniques for skewed data, but the performance of these
techniques needs improvement. The major problem with the existing techniques is that they
work in symmetric distributions and fail in asymmetric ones. Some techniques assume
normality, whereas most real data do not follow a normal distribution. The literature
therefore needs techniques that work equally well in symmetric and asymmetric
distributions. This thesis proposes a new technique for measuring skewness and a new
technique for detecting outliers in skewed data. The outlier detection technique works
well in both symmetric and skewed distributions. Its performance has been shown to be
better than that of existing techniques by comparing the constructed fences with the true
lower and upper boundaries of the central 95 percent of each distribution. These
calculations are analytical and easy to understand. The study is organized as follows.
Chapter 2 reviews the literature on skewness and its measurement, and on the existence of
outliers in real data sets, which arise from natural effects and sometimes from errors and
contamination. The benefits and deleterious effects of outliers in data are discussed,
along with applications of outlier detection in real life. Existing outlier detection
techniques are also discussed.
Since this study concerns skewed distributions, it is important to have robust tests for
measuring the skewness of a given data set. Chapter 3 reviews techniques for measuring
skewness in data. It also introduces a new measure of skewness, the Split Sample Skewness
(henceforth SSS), which, as its name suggests, splits the sample at the median. SSS is
compared with earlier nonparametric measures such as quartile skewness, octile skewness
and the medcouple. A new methodology based on bootstrapping has been developed to compare
these measures. Since all of them except the moment measure of skewness are designed to be
robust, the robust measures are compared by matching their size in symmetric distributions
and then comparing their power in skewed distributions, using bootstrap simulation. The
simulation results show the superiority of the proposed technique.
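As an illustration of the quantile-based measures named above, the following Python sketch
computes quartile and octile skewness. It assumes the usual textbook form
((Q_hi - M) - (M - Q_lo)) / (Q_hi - Q_lo), with the quartiles or the octiles as the outer
quantiles and M as the median. The helper names, the sample data and the
linear-interpolation quantile rule are illustrative assumptions, not taken from the thesis.

```python
import math


def quantile(sorted_x, p):
    """Quantile by linear interpolation on pre-sorted data (one common convention)."""
    pos = p * (len(sorted_x) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] * (1 - frac) + sorted_x[hi] * frac


def quantile_skewness(x, p):
    """Skewness from the p-th, 50th and (1 - p)-th quantiles; value lies in [-1, 1]."""
    s = sorted(x)
    q_lo, med, q_hi = quantile(s, p), quantile(s, 0.5), quantile(s, 1 - p)
    return ((q_hi - med) - (med - q_lo)) / (q_hi - q_lo)


def quartile_skewness(x):
    """Quartile skewness: outer quantiles at 25% and 75%."""
    return quantile_skewness(x, 0.25)


def octile_skewness(x):
    """Octile skewness: outer quantiles at 12.5% and 87.5%."""
    return quantile_skewness(x, 0.125)


# Hypothetical data: one symmetric sample and one right-skewed sample.
symmetric = list(range(1, 10))
right_skewed = [math.exp(i / 4) for i in range(-8, 9)]

print(quartile_skewness(symmetric), octile_skewness(symmetric))
print(quartile_skewness(right_skewed), octile_skewness(right_skewed))
```

Both measures are zero for the symmetric sample and positive for the right-skewed one;
the octile version reaches further into the tails and so reacts more strongly here.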
In Chapter 4, a new technique based on the split sample methodology is developed to
detect outliers in skewed distributions. The technique is applied to different
distributions (χ², , and lognormal) with different parameters, and the results are
compared with the very popular box plot method developed by Tukey (1977).
Applications of the proposed technique show that it outperforms Tukey's and Kimber's
techniques in constructing fences around the true central 95% boundaries of the
different distributions, and also in real data sets.
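The comparison of fences with the true central 95% boundaries can be illustrated
analytically for a lognormal distribution. The sketch below assumes Tukey's usual
1.5 x IQR rule and log-scale parameters mu = 0, sigma = 1; these parameters are an
illustrative choice, not necessarily a setting used in Chapter 4, and the function names
are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p, mu=0.0, sigma=1.0):
    """p-th quantile of a lognormal distribution with log-scale parameters mu, sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def tukey_fences(q1, q3, k=1.5):
    """Tukey box plot fences: k times the interquartile range beyond each quartile."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Population quartiles, so no sampling noise enters the comparison.
q1 = lognormal_quantile(0.25)
q3 = lognormal_quantile(0.75)
fence_lo, fence_hi = tukey_fences(q1, q3)

# True boundaries of the central 95% of the distribution.
true_lo = lognormal_quantile(0.025)
true_hi = lognormal_quantile(0.975)

print(f"Tukey fences:     ({fence_lo:.3f}, {fence_hi:.3f})")
print(f"True central 95%: ({true_lo:.3f}, {true_hi:.3f})")
```

Under these assumptions the lower fence is negative, well below the 2.5th percentile,
while the upper fence (about 4.14) falls short of the 97.5th percentile (about 7.10), so
the symmetric rule misplaces both boundaries in this skewed case.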

In Chapter 5, a modification is proposed to the HV box plot (HVBP) technique introduced
by Mia Hubert and Ellen Vandervieren (2008), which is specially designed for detecting
outliers in skewed distributions. The main problem with the HV box plot is that it
generates a larger fence around the 95% boundary of the distribution, which increases the
chance of a type II error. A simulation study is carried out on skewed distributions
(χ² with different degrees of freedom, , and lognormal with different parameters) at
different sample sizes, and the results show the superiority of the proposed modification
(MHVBP) over HVBP.
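For reference, the fences of the adjusted box plot as published by Hubert and
Vandervieren (2008) can be sketched as follows. The function takes the quartiles and a
medcouple value MC as inputs; the function name and the example numbers are hypothetical,
and the modification proposed in Chapter 5 is not reproduced here.

```python
import math


def hv_adjusted_fences(q1, q3, mc, k=1.5):
    """Adjusted box plot fences of Hubert and Vandervieren (2008).

    The medcouple mc (between -1 and 1) stretches the fence on the long-tailed
    side and shrinks it on the short-tailed side; mc = 0 gives Tukey's fences.
    """
    iqr = q3 - q1
    if mc >= 0:
        lower = q1 - k * math.exp(-4 * mc) * iqr
        upper = q3 + k * math.exp(3 * mc) * iqr
    else:
        lower = q1 - k * math.exp(-3 * mc) * iqr
        upper = q3 + k * math.exp(4 * mc) * iqr
    return lower, upper


# Hypothetical quartiles with three medcouple values.
print(hv_adjusted_fences(1.0, 3.0, 0.0))   # symmetric case, same as Tukey
print(hv_adjusted_fences(1.0, 3.0, 0.3))   # right-skewed: upper fence moves out
print(hv_adjusted_fences(1.0, 3.0, -0.3))  # left-skewed: mirror image
```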
In Chapter 6, the medcouple, a robust measure of skewness introduced by G. Brys,
M. Hubert and A. Struyf (2004), is incorporated into the technique developed in
Chapter 4. Again, a simulation study is carried out on the previously tested
distributions in a similar fashion.
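The medcouple can be illustrated with a naive O(n^2) sketch of its definition: the median
of the kernel h(x_i, x_j) = ((x_j - m) - (m - x_i)) / (x_j - x_i) over all pairs with
x_i <= m <= x_j, where m is the sample median. The version below assumes the data contain
no repeated values, so that the special tie rule of Brys, Hubert and Struyf (2004) reduces
to assigning 0 to the single pair (m, m). The function name and the sample data are
hypothetical, and faster algorithms exist.

```python
from statistics import median


def medcouple_naive(x):
    """Naive O(n^2) medcouple; assumes no repeated values in x."""
    s = sorted(x)
    m = median(s)
    lower = [v for v in s if v <= m]
    upper = [v for v in s if v >= m]
    kernel = []
    for xi in lower:
        for xj in upper:
            if xi == xj:
                # Only reachable when both values equal the median (odd n, no ties).
                kernel.append(0.0)
            else:
                kernel.append(((xj - m) - (m - xi)) / (xj - xi))
    return median(kernel)


# Hypothetical samples: symmetric, right-skewed, and the mirrored right-skewed one.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 5, 9, 17, 33]
left_skewed = [-v for v in right_skewed]

print(medcouple_naive(symmetric))
print(medcouple_naive(right_skewed))
print(medcouple_naive(left_skewed))
```

The result is 0 for the symmetric sample, positive for the right-skewed sample, and of
the same magnitude with the opposite sign for its mirror image.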
Chapter 7 applies Tukey's technique, the SSSBB technique introduced in Chapter 4, HVBP
(2008), the MHVBP technique proposed in Chapter 5 and the MCSSSBB technique proposed in
Chapter 6 to real data sets: stock returns of the United Trust of Pakistan (UTP-2008) and
baby birth weight data followed up to the 28th day. Chapter 8 presents the conclusions
and recommendations based on the theoretical and empirical evidence, along with
directions for future work.
