

Feature Extraction Using Attraction Points for Classification of Hyperspectral Images in a Small Sample Size Situation

Maryam Imani and Hassan Ghassemian, Senior Member, IEEE

Abstract—Hyperspectral images provide a large volume of spectral bands. Feature extraction (FE) is an important preprocessing step for classification of high-dimensional data. Supervised FE methods such as linear discriminant analysis, generalized discriminant analysis, and nonparametric weighted FE use the criteria of class separability. These methods maximize the between-class scatter matrix and minimize the within-class scatter matrix. We propose a supervised FE method in this letter that uses no statistical moments. Thus, it works well using limited training samples. The proposed FE method consists of two important phases. In the first phase, an attraction point for each class is found. In the second phase, by using an appropriate transformation, the samples of each class move toward the attraction point of their class. The experimental results on two real hyperspectral images demonstrate that FE using attraction points has better performance in comparison with some other supervised FE methods in a small sample size situation.

Index Terms—Attraction points, feature extraction (FE), hyperspectral image, limited training sample.

Manuscript received January 26, 2014; revised February 28, 2014; accepted March 24, 2014. Date of publication May 1, 2014; date of current version May 22, 2014. The authors are with the Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran 14155-4843, Iran (e-mail: maryam.imani@modares.ac.ir; ghassemi@modares.ac.ir). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2014.2316134

I. INTRODUCTION

HYPERSPECTRAL sensors collect hundreds of spectral bands that offer rich spectral information. This large volume of spectral bands creates a challenge for classification of data due to the Hughes phenomenon [1]. Thus, feature reduction plays an important role as a preprocessing step before the classification of hyperspectral images. Feature reduction approaches are divided into two general groups: feature selection (FS) and feature extraction (FE). The former approach identifies the best subset of features from all spectral bands based on an adopted selection criterion [2]–[5]. The latter approach generates a new feature space with a small dimension via data transformation using a projection matrix [6]–[10]. A full review of feature reduction methods is given in [11]. In this letter, we focus on FE. So far, many FE methods have been proposed in the literature.

FE methods are categorized as supervised or unsupervised. There are also semisupervised FE methods such as [12] that aim at improved classification by utilizing both unlabeled and limited labeled data. Unsupervised FE methods such as [13]–[15] do not require any prior knowledge and do not consider the separability of classes. Thus, they may not be sufficiently efficient in classification applications. Supervised FE methods rely on the training data. They use the between-class discrimination information. Thus, they are appropriate for classification of high-dimensional data. Linear discriminant analysis (LDA) is a traditional supervised FE method [16]. LDA is based on the mean vector and covariance matrix of the classes. It maximizes the ratio of the between-class to the within-class scatter matrix. An advantage of LDA is that it is distribution free. LDA has three disadvantages. The first is that the performance of LDA depends on the distribution of the data: when the distribution of the classes is non-Gaussian-like or the class data come from a multimodal mixture distribution, the performance of LDA is not satisfactory. The second disadvantage of LDA is the limitation in the number of extracted features. The maximum rank of the between-class scatter matrix is the number of classes n_c minus one; consequently, LDA can extract at most n_c − 1 features, which may not be sufficient for classification of hyperspectral data. The third limitation of LDA is singularity of the within-class scatter matrix, which causes LDA to fail in a small sample size situation.

Linear methods such as LDA can be extended to nonlinear methods using the kernel trick, and the class separability may be increased in the kernel space. Generalized discriminant analysis (GDA) is the nonlinear extension of LDA [17]. A suitable kernel function should be selected to obtain good efficiency. As with LDA, GDA can extract at most n_c − 1 features; however, GDA has better performance than LDA. Nonparametric discriminant analysis (NDA) was proposed in [18] to solve the problems of LDA. A new nonparametric between-class scatter matrix is defined in NDA that uses the boundary points and local means. Because of the parametric form of the within-class scatter matrix in NDA, it has the same singularity problem in a small sample size situation as LDA. Nonparametric weighted FE (NWFE) was developed in light of NDA [19]. NWFE puts different weights on every sample to compute the weighted means and defines new within- and between-class scatter matrices. NWFE uses regularization techniques to achieve better performance than LDA and NDA. The main disadvantage of NWFE is its large computation time, particularly when the number of training samples is increased.

All of the mentioned supervised FE methods use the scatter matrices to create the separability criterion.


The supervised FE methods such as LDA, GDA, NDA, and NWFE maximize the between-class scatter and minimize the within-class scatter. We propose a supervised FE method in this letter that uses a new idea for discrimination between classes. The basis of the idea is as follows: if we consider an appropriate attraction point for each class, then, by using a proper transformation, the samples of each class can move toward the attraction point of their class. If the attraction points of different classes are chosen away from each other, then, by aggregation of the samples of each class around the attraction point of the same class, different classes become separable. Our proposed method is done in two basic phases. The first phase is to obtain appropriate attraction points. The second phase is to achieve the proper transformation to move toward the attraction points. We propose two approaches for obtaining the attraction points. FE using attraction points (FEUAP) has no need to estimate the statistical moments (mean vector or scatter matrix). Thus, it works well using limited training samples. We compare FEUAP with LDA, GDA, and NWFE using two real hyperspectral data sets (an agriculture image and an urban image). The experimental results demonstrate the good efficiency of FEUAP in a small sample size situation.

The remainder of this letter is organized as follows: the proposed method is introduced in Section II. The experimental results are represented in Section III. Finally, Section IV concludes this letter.

II. FEUAP

The high-dimensional data and its low-dimensional representation are denoted by {x_i}_{i=1}^{N}, x_i ∈ R^d, and {y_i}_{i=1}^{N}, y_i ∈ R^m, respectively. d is the number of spectral bands of the hyperspectral image, and m denotes the dimensionality of the projected subspace (m ≤ d). The transformation matrix A_{m×d} maps every original data point x_i to y_i, i.e., y_i = A x_i.

The proposed FE method is represented here, and its performance is compared with conventional supervised FE methods in Section III.

Most supervised FE methods such as LDA, GDA, and NWFE use the scatter matrices for discrimination between classes. In FEUAP, which is proposed in this letter, a new idea is introduced to provide discrimination between classes. In FEUAP, we choose an attraction point for each class from the training samples of that class. If the samples of each class move toward the attraction point of their class, the samples of different classes become far apart. Thus, by proper selection of the attraction points, the samples of each class aggregate around the attraction point of their class. In other words, the samples of each class are attracted to the attraction point of their class and are repelled by the attraction points of other classes. Thus, different classes become separable in the new feature space. Fig. 1 illustrates the idea of moving toward attraction points. FEUAP is done in two phases: 1) obtaining the attraction points and 2) obtaining the transformation matrix for FE.

Fig. 1. Samples of each class move toward the attraction point of their class.

A. Obtaining the Attraction Points

The attraction point of each class is selected among the training samples of the same class. We propose two approaches to select the attraction points of the classes: 1) selection based on a distance measure and 2) selection based on a dense measure.

Selection of attraction points using the distance measure: In this approach, the attraction point of each class is selected from the training samples of that class so that it has the minimum distance from the training samples of its own class and the maximum distance from the training samples of the other classes. To obtain the attraction point of the cth class, i.e., x_{ac}, we solve the following optimization problem:

s = \arg\min_{i=1,\ldots,n_{tc}} \left( \sum_{j=1}^{n_{tc}} \| x_{ic} - x_{jc} \|^2 - \sum_{k=1, k \neq c}^{n_c} \sum_{j=1}^{n_{tk}} \| x_{ic} - x_{jk} \|^2 \right).    (1)

Thus, x_{ac} = x_{sc}. In (1), x_{ic} denotes the ith training sample of class c, n_{tc} is the number of training samples of the cth class, and n_c is the number of classes.
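The selection rule in (1) only needs the pairwise squared distances among the training samples. A minimal NumPy sketch of (1) is given below; the function name and the list-of-arrays input format are illustrative assumptions, not part of the letter.

```python
import numpy as np

def attraction_point_distance(class_samples, c):
    """Pick the attraction point of class c with the distance measure in (1).

    class_samples : list of arrays, class_samples[k] has shape (n_tk, d).
    Returns the index s and the attraction point x_ac = x_sc.
    """
    Xc = class_samples[c]                                   # (n_tc, d)
    # Sum of squared distances from each candidate to the samples of its own class.
    within = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1).sum(axis=1)
    # Sum of squared distances from each candidate to the samples of every other class.
    between = np.zeros(Xc.shape[0])
    for k, Xk in enumerate(class_samples):
        if k != c:
            between += ((Xc[:, None, :] - Xk[None, :, :]) ** 2).sum(axis=-1).sum(axis=1)
    s = int(np.argmin(within - between))                    # criterion (1)
    return s, Xc[s]
```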
Selection of attraction points using the dense measure: In this approach, we define a dense function for each sample of the data as follows:

F(x_{ic}) = \sum_{j=1}^{n_{tc}} e^{-\| x_{ic} - x_{jc} \|^2}.    (2)

Then, we solve the following optimization problem to obtain x_{ac}:

s = \arg\max_{i=1,\ldots,n_{tc}} F(x_{ic}).    (3)

Thus, x_{sc} (the sth training sample of class c) is selected as the attraction point of the cth class, i.e., x_{ac} = x_{sc}. If a training sample from class c has the highest dense function value, this means that there are more samples of class c around this sample than around the other samples of class c. In other words, training samples located in a dense region can be accepted as attraction points.
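A corresponding sketch for the dense measure in (2) and (3), under the same illustrative assumptions, is as follows.

```python
import numpy as np

def attraction_point_dense(Xc):
    """Pick the attraction point of one class with the dense measure in (2)-(3).

    Xc : array of shape (n_tc, d) holding the training samples of the class.
    Returns the index s of the sample with the largest dense function value.
    """
    sq_dists = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)  # (n_tc, n_tc)
    F = np.exp(-sq_dists).sum(axis=1)                                 # dense function (2)
    s = int(np.argmax(F))                                             # criterion (3)
    return s, Xc[s]
```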
B. Transformation Matrix for FE

In FEUAP, the samples in the reduced feature space are arranged in such a way that 1) each sample has the minimum distance from the attraction point of its own class (attraction) and 2) each sample has the maximum distance from the attraction points of the other classes (repulsion).

Fig. 2. AA versus the number of extracted features using FEUAP, LDA, GDA, and NWFE for the Indian data set.

TABLE I
Obtained Accuracy Values in Each Class of the Indian Data Set for Different FE Methods

TABLE II
Statistical Significance of Differences in Classification Z. Each Case of the Table Represents Z_rc, Where r Is the Row and c Is the Column

Fig. 3. AA versus the number of extracted features using FEUAP, LDA, GDA, and NWFE for the University of Pavia data set.

Based on the represented idea, we define two functions, the attraction function ψ_1 and the repulsion function ψ_2, as follows:

ψ_1 = \sum_{c=1}^{n_c} \sum_{i=1}^{n_{tc}} \| y_{ic} - y_{ac} \|^2    (4)

ψ_2 = -\sum_{c=1}^{n_c} \sum_{i=1}^{n_{tc}} \sum_{k=1, k \neq c}^{n_c} \| y_{ic} - y_{ak} \|^2    (5)

where y_{ac} = A x_{ac} is the attraction point of the cth class in the new feature space, and y_{ic} = A x_{ic}. To find the transformation matrix A, we should solve the following optimization problem:

\min_A (ψ = ψ_1 + ψ_2).    (6)

Function ψ can be written in another way as follows:

ψ = \sum_{j=1}^{n} \sum_{i=1}^{n} w_{ij} \| y_i - y_j \|^2    (7)

where

w_{ij} = 1 if C1, −1 if C2, and 0 if C3.

Conditions C1, C2, and C3 are as follows:
C1: y_i and y_j belong to the same class, and one of them is the attraction point.
C2: y_i and y_j belong to different classes, and at least one of them is the attraction point.
C3: neither y_i nor y_j is the attraction point.

In (7), n denotes the total number of training samples (n = \sum_{c=1}^{n_c} n_{tc}), and y_i = A x_i, where x_i is the ith training sample. We can rewrite ψ in (7) in matrix form as follows:

ψ = 2 tr(Y G Y^T)    (8)

where tr(·) denotes the trace operator, Y is an m × n matrix, i.e., Y = [y_1, y_2, ..., y_n], and Y = A X (X = [x_1, x_2, ..., x_n]). In addition, G = D − W, where the entry in the ith row and jth column of matrix W is w_{ij}, and D is a diagonal matrix with D_{ii} = \sum_j w_{ij}.
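Given the class labels of the training samples and the indices of the chosen attraction points, W and G = D − W can be assembled directly from conditions C1–C3. The following sketch is one illustrative reading of these definitions; the function and argument names are assumptions.

```python
import numpy as np

def build_G(labels, attraction_idx):
    """Assemble W from conditions C1-C3 and return W and G = D - W."""
    n = len(labels)
    att = np.zeros(n, dtype=bool)
    att[list(attraction_idx)] = True          # one attraction point per class
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if not (att[i] or att[j]):
                continue                      # C3: neither sample is an attraction point
            if labels[i] == labels[j]:
                W[i, j] = 1.0                 # C1: same class, one of them is the attraction point
            else:
                W[i, j] = -1.0                # C2: different classes, at least one attraction point
    D = np.diag(W.sum(axis=1))                # D_ii = sum_j w_ij
    return W, D - W
```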

TABLE III
Highest Average Classification Accuracy Values Achieved by Using 16 Training Samples

TABLE IV
Highest Average Classification Accuracy Values Achieved by Using 32 Training Samples

Thus, the optimization problem in (8) can be rewritten as follows:

\min_A ψ = 2 tr(A X G X^T A^T).    (9)

The above generalized eigenvalue problem has to be solved to find the transformation matrix A. The optimal linear projection matrix A, which projects the samples into an m-dimensional feature space and satisfies A A^T = I_{m×m}, is composed of the eigenvectors of X G X^T corresponding to its first m largest eigenvalues.
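A can therefore be obtained from an eigendecomposition of X G X^T. The sketch below follows the statement above (the eigenvectors associated with the m largest eigenvalues are retained); it is an illustrative implementation rather than the authors' code.

```python
import numpy as np

def feuap_projection(X, G, m):
    """Compute the m x d projection matrix A from the eigenvectors of X G X^T.

    X : (d, n) training samples stacked as columns.
    G : (n, n) matrix G = D - W from (7)-(8).
    m : target dimensionality of the reduced feature space.
    """
    S = X @ G @ X.T                        # d x d matrix of the criterion in (9)
    S = (S + S.T) / 2.0                    # symmetrize against round-off error
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # keep the m largest eigenvalues, per the letter
    A = eigvecs[:, order[:m]].T            # rows of A are the selected eigenvectors
    return A

# y_i = A @ x_i then projects each pixel into the m-dimensional feature space.
```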
III. EXPERIMENTAL RESULTS

Here, we evaluate the performance of FEUAP in comparison with some supervised FE methods such as LDA, GDA, and NWFE in a small sample size situation. We use two real hyperspectral images for our experiments. The first data set is an agriculture image that was captured by the Airborne Visible/Infrared Imaging Spectrometer over the northwest Indiana Indian Pines site in June 1992. This hyperspectral image consists of 145 × 145 pixels and contains 16 classes, of which 10 classes are chosen for our experiments. The number of spectral bands is 220, which is initially reduced to 200 by removing 20 water absorption bands. The second data set, i.e., the University of Pavia, is an urban image that was captured by the Reflective Optics System Imaging Spectrometer. This hyperspectral image contains 103 bands, 9 classes, and 610 × 340 pixels.

Average accuracy (AA), which is the mean of the class-specific accuracy values, is used to evaluate the FE results. We use two supervised classifiers for our experiments: maximum likelihood (ML) and support vector machine (SVM) [20]. The radial basis function (RBF) kernel is used for SVM classification with the Library for Support Vector Machines (LIBSVM) tool [21]. The one-against-one multiclass classification algorithm is used for our experiments. The penalty parameter C of SVM is tested in the range [10, 1000] with a step size of 20, and the γ parameter of the RBF kernel is tested in the range [0.1, 2] with a step size of 0.1. The best values are obtained using a fivefold cross-validation approach. The training samples are randomly chosen from the entire scene, and the remaining samples are used as test data. For a fair comparison, in each experiment, the same training samples are used for all FE methods. In addition, because of the random selection of training samples, each experiment was repeated 100 times, and the average of the 100 trials was reported. FEUAP is done by using two different ways for obtaining the attraction points: FEUAP-dist and FEUAP-dense, which use the distance and dense measures to obtain the attraction points, respectively.
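As an illustration of this parameter search, the following sketch uses scikit-learn (whose SVC classifier is built on LIBSVM) with the grid and fivefold cross-validation described above; the library choice and names are assumptions and not the authors' setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_rbf_svm(train_features, train_labels):
    """Grid search for the SVM parameters with fivefold cross-validation."""
    param_grid = {
        "C": np.arange(10, 1001, 20),        # C tested in [10, 1000] with a step of 20
        "gamma": np.arange(0.1, 2.01, 0.1),  # RBF parameter tested in [0.1, 2] with a step of 0.1
    }
    # SVC uses the RBF kernel here and a one-against-one multiclass scheme by default.
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(train_features, train_labels)
    return search.best_estimator_
```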
In the first experiment, we compare FEUAP with the other supervised FE methods in a small sample size situation for the Indian data set. This experiment is done using 16 and 32 training samples, and the results are shown in Fig. 2. For instance, the mean and standard deviation (std) of the classification accuracy in each class of the data set, obtained over 100 iterations, are represented in Table I. These values are obtained by using the SVM classifier, 16 training samples, and six extracted features. We use McNemar's test to assess the statistical significance of differences in the classification results [22]. The difference in accuracy between classifiers 1 and 2 is said to be statistically significant if |Z_12| > 1.96. The sign of Z_12 indicates whether classifier 1 is more accurate than classifier 2 (Z_12 > 0), or vice versa (Z_12 < 0). The results of McNemar's test for the Indian data set by using the SVM classifier, 16 training samples, and six extracted features are represented in Table II.
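The letter does not spell out the form of the statistic; a commonly used version of McNemar's Z, based on the counts of test samples on which the two classifiers disagree, is sketched below as an assumption following the usual application of [22].

```python
import numpy as np

def mcnemar_z(y_true, pred1, pred2):
    """McNemar's Z for two classifiers evaluated on the same test samples.

    f12: samples correct under classifier 1 but wrong under classifier 2.
    f21: samples wrong under classifier 1 but correct under classifier 2.
    """
    c1 = (pred1 == y_true)
    c2 = (pred2 == y_true)
    f12 = np.sum(c1 & ~c2)
    f21 = np.sum(~c1 & c2)
    return (f12 - f21) / np.sqrt(f12 + f21)   # |Z| > 1.96 -> significant at about the 5% level
```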
In the second experiment, we compare the FE methods on the University of Pavia data set using 16 training samples. Fig. 3 shows the average classification accuracy versus the number of extracted features obtained using the SVM and ML classifiers. The highest average classification accuracy values achieved by using 16 and 32 training samples are represented in Tables III and IV, respectively.
The numbers in parentheses represent the number of features achieving the highest AA values in our experiments. As seen from the obtained results, FEUAP-dist and FEUAP-dense provide the highest classification accuracy values using limited training samples in comparison with the other supervised FE methods. FEUAP-dist uses both within-class and between-class information and generally provides better performance than FEUAP-dense, which uses only the within-class information.

IV. CONCLUSION

We have represented a new idea for supervised FE using attraction points (FEUAP). The proposed method does not need to estimate any statistical moment. Thus, it works well using limited training samples. FEUAP is done in two phases: 1) obtaining an appropriate attraction point for each class and 2) obtaining the proper transformation for moving toward the attraction points. The experimental results using two real hyperspectral images show the better performance of FEUAP in comparison with other supervised FE methods such as LDA, GDA, and NWFE in a small sample size situation.

REFERENCES

[1] G. F. Hughes, "On the mean accuracy of statistical pattern recognition," IEEE Trans. Inf. Theory, vol. IT-14, no. 1, pp. 55–63, Jan. 1968.
[2] S. Li, H. Wu, D. Wan, and J. Zhu, "An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine," Knowl.-Based Syst., vol. 24, no. 1, pp. 40–48, Feb. 2011.
[3] H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, Aug. 2005.
[4] J. Yin, Y. Wang, and J. Hu, "A new dimensionality reduction algorithm for hyperspectral image data using evolutionary strategy," IEEE Trans. Ind. Inf., vol. 8, no. 4, pp. 935–943, Nov. 2012.
[5] L. Bruzzone and C. Persello, "A novel approach to the selection of spatially invariant features for the classification of hyperspectral images with improved generalization capability," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 9, pp. 3180–3191, Sep. 2009.
[6] I. Dópido, A. Villa, A. Plaza, and P. Gamba, "A quantitative and comparative assessment of unmixing-based feature extraction techniques for hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 2, pp. 421–435, Apr. 2012.
[7] M. Imani and H. Ghassemian, "Band clustering-based feature extraction for classification of hyperspectral images using limited training samples," IEEE Geosci. Remote Sens. Lett., vol. 11, no. 8, pp. 1325–1329, Aug. 2014.
[8] J. Wen, Z. Tian, and X. Liu, "Neighborhood preserving orthogonal PNMF feature extraction for hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 759–768, Apr. 2013.
[9] M. J. Mendenhall and E. Merényi, "Relevance-based feature extraction for hyperspectral images," IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 658–672, Apr. 2008.
[10] M. Kamandar and H. Ghassemian, "Linear feature extraction for hyperspectral images based on information theoretic learning," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 4, pp. 702–706, Jun. 2013.
[11] X. Jia, B. C. Kuo, and M. Crawford, "Feature mining for hyperspectral image classification," Proc. IEEE, vol. 101, no. 3, pp. 676–697, Mar. 2013.
[12] W. Liao, A. Pižurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 184–198, Jan. 2013.
[13] J. Yin, C. Gao, and X. Jia, "Using Hurst and Lyapunov exponent for hyperspectral image feature extraction," IEEE Geosci. Remote Sens. Lett., vol. 9, no. 4, pp. 705–709, Jul. 2012.
[14] M. Zortea, V. Haertel, and R. Clarke, "Feature extraction in remote sensing high-dimensional image data," IEEE Geosci. Remote Sens. Lett., vol. 4, no. 1, pp. 107–111, Jan. 2007.
[15] J. Yin, C. Gao, and X. Jia, "Wavelet packet analysis and gray model for feature extraction of hyperspectral data," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 4, pp. 682–686, Jul. 2013.
[16] K. Fukunaga, Introduction to Statistical Pattern Recognition. San Diego, CA, USA: Academic, 1990.
[17] G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Comput., vol. 12, no. 10, pp. 2385–2404, Oct. 2000.
[18] K. Fukunaga and M. Mantock, "Nonparametric discriminant analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 5, no. 6, pp. 671–678, Nov. 1983.
[19] B. C. Kuo and D. A. Landgrebe, "Nonparametric weighted feature extraction for classification," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 5, pp. 1096–1105, May 2004.
[20] S. A. Hosseini and H. Ghassemian, "A new fast algorithm for multiclass hyperspectral image classification with SVM," Int. J. Remote Sens., vol. 32, no. 23, pp. 8657–8683, Dec. 2011.
[21] C.-C. Chang and C.-J. Lin, LIBSVM—A Library for Support Vector Machines, 2008. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[22] G. M. Foody, "Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy," Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, 2004.
