INTRODUCTION
Although the literature on outlier detection is extensive, most existing techniques are
suited to symmetric distributions, as discussed in detail in Chapter 2. Some authors have
proposed outlier detection techniques for skewed data, but the performance of these
techniques needs improvement. The major problem with existing outlier detection
techniques is that they work for symmetric distributions and fail for asymmetric
ones. Some techniques assume normality, whereas most real data do not follow a normal
distribution. The literature therefore needs techniques that work equally well for
symmetric and asymmetric distributions. This thesis proposes a new technique for
measuring skewness and a new technique for detecting outliers in skewed data. The
proposed outlier detection technique works well for both symmetric and skewed
distributions. Its performance is shown to be better than that of existing techniques by
comparing the fences each technique constructs with the true lower and upper boundaries
of the central 95 percent of the distributions. These calculations are analytical and
easy to understand.
The study is organized as follows. Chapter 2 reviews the literature on various aspects of
skewness and its measurement, and on the existence of outliers in real data sets, which
arise from natural effects and sometimes from errors and contamination. The benefits and
deleterious effects of outliers in data are discussed, along with applications of
outlier detection in real life. Existing outlier detection techniques are also
discussed.
Since this study concerns skewed distributions, it is important to have robust tests
for measuring the skewness of a given data set. Chapter 3 reviews techniques for
measuring skewness in data and introduces a new measure, the Split Sample Skewness
(henceforth abbreviated as SSS), which, as its name suggests, splits the sample at the
median. SSS is compared with existing nonparametric measures: quartile skewness, octile
skewness, and the medcouple. A new methodology based on bootstrapping has been developed
to compare these measures. Since all of them except the moment measure of skewness are
designed to be robust, the robust measures are compared by matching their size under
symmetric distributions and then comparing their power under skewed distributions, using
bootstrap simulation. The simulation results demonstrate the superiority of the proposed
measure.
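The size-and-power comparison described above can be illustrated with a minimal sketch. It is not the thesis's procedure: it assumes a bootstrap percentile interval as the test of symmetry, uses a normal sample for size and a Lognormal(0, 1) sample for power, and covers only quartile and octile skewness in their usual quantile-based forms. The medcouple and SSS are omitted, the latter because it is only defined in Chapter 3. All function names, sample sizes, and replication counts are illustrative assumptions.

```python
import random


def quantile(sorted_xs, p):
    """Linearly interpolated sample quantile of sorted data, 0 <= p <= 1."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def quantile_skewness(xs, p):
    """(Q(1-p) + Q(p) - 2*median) / (Q(1-p) - Q(p)).

    p = 0.25 gives quartile skewness; p = 0.125 gives octile skewness.
    """
    s = sorted(xs)
    lower, med, upper = quantile(s, p), quantile(s, 0.5), quantile(s, 1 - p)
    if upper == lower:  # degenerate sample: treat as symmetric
        return 0.0
    return (upper + lower - 2 * med) / (upper - lower)


def bootstrap_rejects_symmetry(xs, p, n_boot, rng, alpha=0.05):
    """Assumed test: reject symmetry when the bootstrap percentile
    interval for the skewness measure excludes zero."""
    stats = sorted(
        quantile_skewness(rng.choices(xs, k=len(xs)), p) for _ in range(n_boot)
    )
    return not (quantile(stats, alpha / 2) <= 0.0 <= quantile(stats, 1 - alpha / 2))


def rejection_rate(sampler, p, n=100, n_boot=200, n_rep=100, seed=0):
    """Share of simulated samples for which symmetry is rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        sample = [sampler(rng) for _ in range(n)]
        hits += bootstrap_rejects_symmetry(sample, p, n_boot, rng)
    return hits / n_rep


# Size: rejection rate under a symmetric distribution (standard normal).
size = rejection_rate(lambda r: r.gauss(0.0, 1.0), p=0.25)
# Power: rejection rate under a skewed distribution (Lognormal(0, 1)).
power = rejection_rate(lambda r: r.lognormvariate(0.0, 1.0), p=0.25)
print(f"quartile skewness: size={size:.2f}, power={power:.2f}")
```

Passing `p=0.125` gives the same comparison for octile skewness. In a fuller study the critical values would first be calibrated so that every measure has the same size, and only then would the power figures be compared.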
In Chapter 4, a new technique based on the split sample methodology is developed to
detect outliers in skewed distributions. This technique has been applied to different
distributions (χ², , and Lognormal) with different parameters, and the results are
compared with the widely used box plot method developed by Tukey (1977).
These applications show that the proposed technique outperforms Tukey's and Kimber's
techniques in constructing fences around the true central 95% boundaries of the
different distributions, and also on real data sets.
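To make the benchmark concrete, the sketch below computes Tukey's box plot fences for a simulated lognormal sample and sets them beside the true central 95% boundaries. It assumes that these boundaries are the 2.5th and 97.5th percentiles of the distribution and uses the conventional fence multiplier of 1.5. The Lognormal(0, 1) parameters, the sample size, and the function names are illustrative choices, and the proposed split sample technique is not implemented here.

```python
import math
import random
from statistics import NormalDist


def quantile(sorted_xs, p):
    """Linearly interpolated sample quantile of sorted data, 0 <= p <= 1."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def tukey_fences(xs, k=1.5):
    """Tukey's fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    s = sorted(xs)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def lognormal_central_bounds(mu, sigma, coverage=0.95):
    """Quantiles enclosing the central `coverage` mass of Lognormal(mu, sigma)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return math.exp(mu - sigma * z), math.exp(mu + sigma * z)


rng = random.Random(1)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

fence_lo, fence_hi = tukey_fences(sample)
true_lo, true_hi = lognormal_central_bounds(0.0, 1.0)

print(f"Tukey fences:     [{fence_lo:.3f}, {fence_hi:.3f}]")
print(f"True 95% bounds:  [{true_lo:.3f}, {true_hi:.3f}]")
```

For Lognormal(0, 1) the population fences are roughly -1.67 and 4.14, while the true boundaries are roughly 0.14 and 7.10. The lower fence therefore falls below the support of the distribution and can never flag a low outlier, and the upper fence sits well inside the central 95% region. This is the kind of mismatch on skewed data that the chapter's comparison is designed to measure.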