3 First Approach

Figure 1
Traditional pixel-wise HSI classification is based on the fact that different
materials have different spectral reflectance, and it identifies each material
by its spectral curve.
In other words, each pixel is classified by its digital numbers across the different bands.
Given a set of observations (i.e. pixel vectors in a hyperspectral image), the
goal of classification is to assign a unique label to each pixel vector so that it
is well represented by a given class.
The availability of hyperspectral data with high spatial resolution has been
quite important for classification techniques (i.e. such data mostly contains pure
pixels, each represented by a single predominant spectral signature).
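As a rough illustration of this pixel-wise labeling, the sketch below assigns each pixel vector to the class with the nearest mean spectrum (a simple minimum-distance rule; the cube shape and class means are toy assumptions, not data from this work).

```python
# Minimal pixel-wise classification sketch: each pixel spectral vector gets
# the label of the closest class mean (nearest-centroid rule).
import numpy as np

def classify_pixels(cube, class_means):
    """cube: (H, W, B) hyperspectral cube; class_means: (C, B) mean spectra."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b)                      # one spectral vector per pixel
    # Euclidean distance of every pixel to every class mean: (H*W, C)
    dists = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                     # unique label per pixel
    return labels.reshape(h, w)

# Toy example: a 4x4 cube with 10 bands and 3 hypothetical classes
rng = np.random.default_rng(0)
cube = rng.random((4, 4, 10))
means = rng.random((3, 10))
print(classify_pixels(cube, means))
```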
Figure 2
Classification Process
The discrimination of materials based on their spectral profile can be
considered a classification task, where groups of pixels are labeled as
belonging to a particular class based on their reflectance properties,
exploiting training examples to model each class.
The classification process has two main stages:
1 The number and nature of the categories are determined
2 Every unknown or unseen element is assigned to one of the categories
according to its level of resemblance or similarity to the basic patterns
Figure 3: Classification Process
Feature Extraction/Selection

Feature Extraction
Feature Selection
In feature selection, the idea is to select a set of spectral bands from the initial
pool of bands available prior to classification.
A particular characteristic of feature selection methods is that they tend to
retain the spectral meaning, while reducing the number of bands.
In unsupervised feature selection, the goal is to automatically find statistically
important features. The advantage of unsupervised methods is that they do
not need training data.
In contrast, supervised feature selection is based on general/expert
knowledge and requires labeled training samples.
Techniques in the latter category comprise methods based on class-separability
measures using standard distance metrics (e.g. Euclidean, mutual
information, Bhattacharyya, Mahalanobis).
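As a hedged sketch of such separability-based selection, the snippet below ranks bands by a per-band Bhattacharyya distance between two classes, assuming each band is Gaussian within a class; the top-k selection rule is illustrative, not a method prescribed here.

```python
# Supervised band selection sketch: score each band by the Bhattacharyya
# distance between two classes (univariate Gaussian assumption per band),
# then keep the k most separable bands.
import numpy as np

def bhattacharyya_per_band(x1, x2):
    """x1, x2: (n_samples, n_bands) labeled spectra for two classes."""
    m1, m2 = x1.mean(0), x2.mean(0)
    v1, v2 = x1.var(0) + 1e-12, x2.var(0) + 1e-12   # small floor for stability
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2 * np.sqrt(v1 * v2))))

def select_bands(x1, x2, k=10):
    scores = bhattacharyya_per_band(x1, x2)
    return np.argsort(scores)[::-1][:k]             # indices of the k best bands
```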
Introduction
Figure 4
Training Data
Figure 5
K-Means Clustering

Figure 6
The major advantage of this process is that the method is robust, efficient and
easy to understand.
When the number of variables is large, K-means is usually computationally
faster than other clustering methods, provided k is kept small.
A drawback of the K-means algorithm is that the number of clusters k is an
input parameter.
An inappropriate choice of k may yield poor results.
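A minimal sketch with scikit-learn, where k must indeed be supplied up front (the data and the choice k = 5 are toy assumptions):

```python
# K-means sketch: k is a required input parameter, which is exactly the
# drawback noted above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = rng.random((1000, 50))             # 1000 pixel spectra, 50 bands (toy data)

k = 5                                       # must be chosen in advance
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
labels = km.labels_                         # cluster index per pixel
centers = km.cluster_centers_               # mean spectrum of each cluster
```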
ISODATA Clustering
Figure 7
Artificial Neural Networks
Figure 8
The most widely used model is the multi-layered feed-forward ANN. Its design
consists of one input layer, at least one hidden layer and one output layer.
This algorithm is a promising technique for a number of situations such as
non-normality, complex feature spaces and multivariate data types, where
traditional methods fail to give accurate results.
One of the most notable features of a neural network that motivates its
adoption in hyperspectral imaging classification is its robustness when
presented with partially incomplete or incorrect input patterns, together with
its ability to generalize from the input.
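A minimal sketch of such a multi-layered feed-forward network follows; the layer sizes are illustrative assumptions (e.g. one input per spectral band, nine output classes), not values from this work.

```python
# Feed-forward ANN sketch: one input layer, one hidden layer, one output layer.
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(200, 64),   # input layer -> hidden layer (200 bands assumed)
    nn.Sigmoid(),         # classic sigmoidal activation
    nn.Linear(64, 9),     # hidden layer -> output layer (9 classes assumed)
)
```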
CNNs
Figure 9
CNNs
Pooling is used to make the features invariant from the location, and it
summarizes the output of multiple neurons in convolutional layers through a
pooling function.
A typical pooling function is the maximum.
A max pooling function simply returns the maximum value from its input.
Max pooling partitions the input data into a set of non-overlapping windows
and outputs the maximum value for each subregion. This reduces the
computational complexity for upper layers and provides a form of translation
invariance.
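A small NumPy illustration of this windowed maximum (the window size of 2 is an arbitrary choice):

```python
# Max pooling sketch: partition the input into non-overlapping windows and
# keep the maximum of each window.
import numpy as np

def max_pool_2d(x, win=2):
    h, w = x.shape
    h2, w2 = h // win * win, w // win * win          # crop so windows tile exactly
    blocks = x[:h2, :w2].reshape(h2 // win, win, w2 // win, win)
    return blocks.max(axis=(1, 3))                   # max over each win x win block

x = np.arange(16).reshape(4, 4)
print(max_pool_2d(x))   # [[ 5  7]
                        #  [13 15]]
```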
The computation chain of a CNN ends in a fully connected network that
integrates information across all locations in all feature maps of the layer
below.
Figure 10
We can see that the curve of each class has its own visual shape, different
from the other classes, although it is relatively difficult to distinguish
some classes with the human eye.
Figure 11
In the above CNN classifier, the input represents a pixel spectral vector,
followed in turn by a convolution layer and a max pooling layer that compute
20 feature maps, which are classified with a fully connected network.
Layers C1 and M2 can be viewed as a trainable feature extractor for the input
HSI data, and layer F3 as a trainable classifier on top of the extracted features.
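A sketch of this C1-M2-F3 architecture in PyTorch; only the 20 feature maps come from the text, while the band count, kernel sizes, and class count are illustrative assumptions.

```python
# 1-D CNN over a pixel spectral vector: C1 (conv), M2 (max pool), F3 (fully
# connected classifier).
import torch
import torch.nn as nn

n_bands, n_classes = 200, 9                  # assumed, not from the source

model = nn.Sequential(
    nn.Conv1d(1, 20, kernel_size=11),        # C1: 20 feature maps over the spectrum
    nn.Tanh(),
    nn.MaxPool1d(kernel_size=3),             # M2: translation-invariant downsampling
    nn.Flatten(),
    nn.Linear(20 * ((n_bands - 11 + 1) // 3), n_classes),  # F3: trainable classifier
)

x = torch.randn(8, 1, n_bands)               # batch of 8 pixel spectral vectors
logits = model(x)                            # (8, n_classes)
```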
The training process of the CNN classifier contains two steps: forward
propagation and backward propagation.
The forward propagation aims to compute the actual classification result of
the input data with current parameters.
The backward propagation is employed to update the trainable parameters in
order to make the discrepancy between the actual classification output and
the desired classification output as small as possible.
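Continuing the sketch above (reusing `model`, `x` and `n_classes`), a minimal version of these two steps; the loss function, optimizer, and learning rate are assumptions, not choices stated in the text.

```python
# Training sketch: forward propagation computes the actual output, backward
# propagation updates the parameters to shrink the discrepancy with the
# desired output.
import torch

targets = torch.randint(0, n_classes, (8,))        # toy desired labels
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(10):
    optimizer.zero_grad()
    logits = model(x)                  # forward propagation: actual output
    loss = criterion(logits, targets)  # discrepancy with the desired output
    loss.backward()                    # backward propagation: gradients
    optimizer.step()                   # update the trainable parameters
```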
Confusion Matrix
Kappa Statistics
3 First Approach
First, we have to acquire the spectral cube. Then we’ll apply the K-means
clustering algorithm on that cube.
The K-means algorithm calculates initial class means evenly distributed in the
data space, then iteratively clusters the pixels into the nearest class using a
minimum-distance technique.
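A hedged sketch of these two steps, flattening a toy spectral cube into a pixel-by-band matrix before clustering (the cube shape and cluster count are illustrative; in practice the cube comes from the acquired HSI data):

```python
# First approach sketch: acquire (here, simulate) the spectral cube, then
# run k-means on the per-pixel spectral vectors.
import numpy as np
from sklearn.cluster import KMeans

cube = np.random.rand(100, 100, 120)           # (rows, cols, bands) toy cube
pixels = cube.reshape(-1, cube.shape[-1])      # one spectral vector per pixel

km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(pixels)
cluster_map = km.labels_.reshape(cube.shape[:2])   # per-pixel cluster image
```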
Choosing the optimal number of clusters:
Elbow method
Silhouette analysis
Figure 12
Silhouette analysis (SA) is another way to measure how close each point in a
cluster is to the points in its neighboring clusters.
A key advantage of using the SA score for finding the optimal number of
clusters is that it can be applied to an unlabelled data set.
The silhouette ranges from -1 to 1. Silhouette coefficients near +1 indicate
that the sample is far away from the neighboring clusters.
A value of 0 indicates that the sample is on or very close to the decision
boundary between two neighboring clusters.
Negative values indicate that those samples might have been assigned to the
wrong cluster.
The silhouette can be calculated with any distance metric, such as Euclidean
distance.
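A sketch of silhouette-based model selection, scanning a range of k values on toy data and keeping the k with the highest mean silhouette coefficient (the scan range 2-10 is an arbitrary choice):

```python
# Choose k by silhouette analysis: fit k-means for each candidate k and keep
# the k with the highest mean silhouette coefficient (Euclidean distance).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

pixels = np.random.rand(2000, 120)            # toy pixel spectra

best_k, best_score = None, -1.0
for k in range(2, 11):                        # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    score = silhouette_score(pixels, labels, metric="euclidean")
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score)
```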
Figure 13