
Science in China Ser. D Earth Sciences 2004 Vol. 47 No. 7: 651-658

Self-organizing feature map neural network classification of the ASTER data based on wavelet fusion
HASI Bagan, MA Jianwen, LI Qiqing, HAN Xiuzhen & LIU Zhili
Laboratory of Remote Sensing Information Science, Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing 100101, China
Correspondence should be addressed to Hasi Bagan (email: hasibagan@263.net)

Received June 15, 2003

Abstract Most methods for the classification of remote sensing data are based on statistical parameter estimation under the assumption that the samples obey a normal distribution. However, more accurate classification results can be obtained with the neural network method, which acquires knowledge from the environment and adjusts the parameters (weights) step by step according to a specific measure. This paper focuses on the double-layer Kohonen self-organizing feature map (SOFM), in which the neurons of the two layers are fully linked to one another and the neurons of the competition layer are also laterally linked. Therefore, the self-adapting learning ability is improved by the effective competition and suppression in this method. The SOFM has become a hot topic in research on remote sensing data classification. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) is a new satellite-borne remote sensing instrument with three 15-m resolution bands in the visible and near infrared and 30-m resolution bands in the short-wave infrared. ASTER data of the Dagang district, Tianjin Municipality are used as the test data in this study. First, wavelet fusion is carried out to make the spatial resolutions of the ASTER bands identical; then, the SOFM method is applied to classify the land cover types. The classification results are compared with those of the maximum likelihood method (MLH). As a consequence, the overall classification accuracy of SOFM is about 7% higher and, for the town class in particular, it is almost twice that of the MLH method.
Keywords: classification, wavelet fusion, self-organizing feature map (SOFM), ASTER data.

DOI: 10.1360/03yd0411

Remote sensing classification methods can be divided into supervised and unsupervised categories. The maximum likelihood method (MLH) is a supervised classification method which is widely used in remote sensing data classification and produces good results[1]. In the MLH, the parameters are estimated under the assumption that the samples are normally distributed in spectral space, in order to determine the mean vector and covariance matrix of each class. In most cases, however, the samples are not normally distributed.
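Since MLH serves as the baseline in the comparison below, the following is a minimal sketch of a per-pixel Gaussian maximum likelihood classifier (equal priors assumed); the array layout and function names are illustrative only and are not taken from the paper.

```python
import numpy as np

def train_mlh(samples_by_class):
    """Estimate the mean vector and covariance matrix of every class from its
    training samples (each entry: array of shape [n_samples, n_bands])."""
    params = []
    for samples in samples_by_class:
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)
        params.append((mean, np.linalg.inv(cov), np.log(np.linalg.det(cov))))
    return params

def classify_mlh(pixels, params):
    """Assign each pixel (shape [n_pixels, n_bands]) to the class with the
    largest Gaussian log-likelihood, assuming equal class priors."""
    scores = []
    for mean, inv_cov, log_det in params:
        d = pixels - mean
        # log-likelihood up to a constant: -0.5 * (log|C| + d^T C^-1 d)
        maha = np.einsum('ij,jk,ik->i', d, inv_cov, d)
        scores.append(-0.5 * (log_det + maha))
    return np.argmax(np.stack(scores, axis=1), axis=1)
```

When the class samples are far from Gaussian, the estimated mean and covariance no longer describe the class well, which is the weakness the neural network approach below avoids.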

Therefore, the artificial neural network method, which does not need the above-mentioned assumption, has become an important new classification method. The artificial neural network method, which has already become a hot topic in classification studies, has the following advantages over the traditional methods: no need for parameter estimation or an assumed data model, relatively strong noise resistance, and the ability to learn multiple complex models.


The Kohonen self-organizing feature map (SOFM), also called the Kohonen feature map or topology-preserving map, has already been widely used in many research areas, such as sound recognition and image compression[2]. For the application of this method to remote sensing data classification, the selection of the optimal network structure using Landsat Thematic Mapper (TM) data has been investigated in ref. [3]. In this work, wavelet fusion was used in the first step to unify the spatial resolutions of the ASTER data and strengthen the signals of interest. Then, the SOFM was applied to land cover classification using the fused ASTER data. Finally, the SOFM classification result was compared with that of the MLH classification.

1 Basic principle and algorithm

1.1 Wavelet fusion

The wavelet theory was developed in the late 1980s[4] and has provided a new signal-processing tool. Owing to its effective local time-frequency representation, scale selection and orientation, the wavelet method has been extensively applied in many research areas, such as image processing, pattern recognition, computer vision and fractal analysis. A prominent feature of wavelet analysis is its adjustable time-frequency window: a high frequency resolution and low time resolution are used for the low frequency part of a signal, while a low frequency resolution and high time resolution are used for the high frequency part. Thus, the algorithm can enhance the texture information of an image by manifesting different structural features at different image resolutions.

The data used in this study are ASTER data acquired on August 20, 2000. Bands 1, 2, 3N and bands 5, 7, 9 of the data have different spatial resolutions. If the ASTER data were used directly for the SOFM classification, only one group of bands with the same spatial resolution could be applied, so it would be impossible to obtain adequate information from the full set of bands, bearing in mind that the more bands are used in classification, the higher the accuracy that can be achieved. In order to take advantage of all bands with different resolutions, it is necessary to fuse them into bands with the same resolution. To do so, the 30-m resolution bands 5-9 were fused to the 15-m resolution of bands 1, 2, 3N by using the wavelet transformation, which is known as a powerful tool for fusing images with different spatial resolutions. In practice, the multi-resolution Mallat algorithm is used to carry out the wavelet transformation on both the low and high resolution images. Specifically, the high frequency parts of the high and low resolution images are compared with each other, and a new high frequency image is produced according to the local maximum variance in all scales and directions. Finally, the inverse wavelet transform is applied to the low frequency parts of the low resolution data together with the newly generated high frequency parts, resulting in a new image with high spatial resolution and enhanced information[5,6].

The procedure is as follows. Let F be the high frequency part of the fused image, and denote by A and B the high frequency parts of the low resolution and high resolution images, respectively. The central point (i, j) of a high frequency component and its 8 adjacent points are

(i-1, j-1)  (i, j-1)  (i+1, j-1)
(i-1, j)    (i, j)    (i+1, j)
(i-1, j+1)  (i, j+1)  (i+1, j+1)

First, the mean variances at the corresponding points (i, j) of A and B (the mean variance is calculated from 9 points, i.e. the 8 surrounding points and the central point (i, j)) are compared with each other, and the points with the larger mean variance are tentatively selected for F. If the points from A prevail, then all of the 9 points centered at point (i, j) in F are replaced by the corresponding points of A; otherwise, all corresponding points of B are taken into F. Finally, the inverse wavelet transform is applied to the low frequency parts of the low resolution image and the locally replaced high frequency parts (F), producing a new wavelet-fused image. Thus, the fused image effectively combines the spatial information of the high resolution image with the spectral and texture information of the multi-band images.
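To make the neighborhood-variance selection rule concrete, the following is a minimal sketch of the substitution step on one pair of high frequency sub-bands. It assumes the wavelet decomposition has already been computed (e.g. with a library such as PyWavelets), and the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def local_variance(band, i, j):
    """Variance of the 3x3 neighbourhood centred at (i, j)."""
    patch = band[i - 1:i + 2, j - 1:j + 2]
    return patch.var()

def fuse_high_freq(A, B, step=3):
    """Build the fused high frequency sub-band F from A (low resolution image)
    and B (high resolution image): for every 3x3 block, keep the block whose
    centre point has the larger local mean variance.  Border pixels outside a
    full block are left as in A for simplicity."""
    F = A.copy()
    rows, cols = A.shape
    for i in range(1, rows - 1, step):
        for j in range(1, cols - 1, step):
            if local_variance(B, i, j) > local_variance(A, i, j):
                F[i - 1:i + 2, j - 1:j + 2] = B[i - 1:i + 2, j - 1:j + 2]
    return F
```

In a complete fusion chain this substitution would be applied to every detail sub-band at every decomposition level, after which the inverse wavelet transform is taken together with the low frequency approximation of the low resolution band.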

1.2 SOFM and vector quantification

The establishment of the SOFM neural network model was stimulated by modeling studies of biological systems. There is a small area in the visual layer of the human brain that responds to external stimulation. It is called the side feedback (excitation or inhibition) vicinity and consists of three distinct lateral reaction zones. One zone reacts with excitation to stimulation at a close distance, another reacts with inhibition to stimulation at a farther distance, and, for stimulation at an even farther distance, the remaining zone reacts with weak excitation. According to this feature of the human brain, Kohonen built the SOFM network model to mimic the feedback behavior of the visual cells. He believed that neighboring cells in the neural network interact and compete with one another and finally self-adapt to the external environment to become special detectors capable of measuring different signals. This is exactly the content of the SOFM. After learning from samples, the SOFM neural network procedure gives a neuron output array, from which a two-dimensional feature map can be produced. The more the neuron functions resemble one another, the closer the distance between the neurons; conversely, the more the neuron functions differ, the longer the distance between them. This process is realized through competition among neurons. As a consequence, randomly arranged inputs are reorganized in order. In the adjustment of the linkage weights, the distribution of the weights becomes similar to that of the input samples[7,8].

In data clustering, the SOFM net carries out an online clustering process on the input patterns, in which the SOFM net exerts a side-feedback-vicinity constraint on the neurons of the output layer, so that the topological nature of the input multi-band data is clustered into the output-layer neuron weights. The cluster centers are represented by their weights. Through the competition and learning rules, the weights are updated. In this clustering process, both the winning neuron's weights and those in its side feedback vicinity are updated. The size of the side feedback vicinity decreases gradually during the iteration. The SOFM net structure is presented in fig. 1. When the process is finished, the input data are partitioned into non-intersecting classes, such that the similarity of individuals within the same class is stronger than that between different classes.

Fig. 1. Self-organizing feature map structure.

The SOFM net training steps are as follows. Let $w_j$ be the weight vector linking neuron $j$ of the output layer to the nodes of the input vector, let $x = \{x_1, x_2, \ldots, x_n\}$ be the input vector, and let $n$ be its dimension (the number of input satellite bands). Network training comprises two steps, namely the coarse tune and the fine tune. The coarse tune is the self-organizing competition and learning process and belongs to the unsupervised clustering process. It is accomplished by the following steps:

Step 1. For each neuron, the weights are randomly initialized to real values within the range from 0.0 to 255.0.

Step 2. Feed the network with an input vector $x$; the distances of the vector $x$ to all neurons are computed by

$$a_j = \Big[\sum_{i=1}^{n} (x_i - w_{ij})^2\Big]^{1/2} = \| x - w_j \|. \qquad (1)$$

Step 3. Find the neuron that has the minimum distance to the input vector $x$, and update the weights by

$$w_{ij}(t+1) = \begin{cases} w_{ij}(t) + \alpha(t)\,[x_i(t) - w_{ij}(t)], & j \in N_c(t), \\ w_{ij}(t), & j \notin N_c(t), \end{cases} \qquad (2)$$

where $N_c(t)$ is the side feedback vicinity of the winner neuron at time $t$, $j$ is a neuron of the output layer, and $\alpha(t)$ is the learning rate that decreases with $t$, with its initial value set so that $0.0 < \alpha(t) < 1.0$.

Step 4. Feed new inputs and repeat steps 2 and 3 until the network converges.

Step 5. Label each input vector with its class and label each neuron of the output layer with its cluster by the majority voting principle.

The learning rate $\alpha(t)$ decreases gradually with time, $\alpha(t) = \beta\,\alpha(t-1)$ with $\beta < 1$, and the initial value of $\alpha$ can be chosen from 0.5 to 0.9. After the competition and learning process is finished, the input data are grouped into non-intersecting clusters. Every cluster is represented by its centroid. Such centers are called template vectors or coding vectors. For each input vector, the corresponding coding vector, instead of the input vector itself, is used in the next step, i.e. the fine tune. This method is called vector quantification. The fine tune is accomplished by the learning vector quantification (LVQ) algorithm, which consists of the following steps:

Step 1. After a training input vector $x$ is randomly selected, the winner $c$ is identified under the condition of minimum $\| x - w_c \|$.

Step 2. If $x$ and $w_c$ belong to the same class, $w_c$ is updated by

$$w_c(t+1) = w_c(t) + \alpha(t)\,[x(t) - w_c(t)]; \qquad (3)$$

otherwise,

$$w_c(t+1) = w_c(t) - \alpha(t)\,[x(t) - w_c(t)]. \qquad (4)$$

For every $i \neq c$, $w_i$ is left unchanged:

$$w_i(t+1) = w_i(t). \qquad (5)$$

The learning rate $\alpha$ is a small positive value that decreases with the iterations down to 0.001.

Step 3. The iteration stops when the maximum number of iterations is reached; otherwise, return to step 1.
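To make the coarse tune concrete, the following is a minimal sketch of eqs. (1) and (2) and the majority-vote labeling of step 5. It is an illustration under assumed details (a rectangular neighborhood, multiplicative decay of the learning rate and neighborhood radius, random sampling of training vectors, integer class labels); the function and parameter names are ours, not the authors' implementation.

```python
import numpy as np

def train_sofm(X, grid=(25, 25), alpha0=0.9, alpha_min=0.001,
               decay=0.995, radius0=14, n_iter=2000, seed=0):
    """Coarse tune of a Kohonen SOFM: X has shape [n_samples, n_bands].
    Returns the weight grid of shape [grid[0], grid[1], n_bands]."""
    rng = np.random.default_rng(seed)
    n_bands = X.shape[1]
    # Step 1: random weights in [0, 255], matching 8-bit band values
    W = rng.uniform(0.0, 255.0, size=(grid[0], grid[1], n_bands))
    rows, cols = np.indices(grid)
    alpha, radius = alpha0, float(radius0)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # Step 2: distances of x to all neurons (eq. (1))
        dist = np.linalg.norm(W - x, axis=2)
        # Step 3: winner neuron and neighbourhood update (eq. (2))
        ci, cj = np.unravel_index(np.argmin(dist), grid)
        in_hood = (np.abs(rows - ci) <= radius) & (np.abs(cols - cj) <= radius)
        W[in_hood] += alpha * (x - W[in_hood])
        # learning rate and neighbourhood radius shrink with time
        alpha = max(alpha * decay, alpha_min)
        radius = max(radius * decay, 1.0)
    return W

def label_neurons(W, X_train, y_train):
    """Step 5: label every output neuron by majority vote of the training
    vectors mapped to it; neurons with no input vector get label -1."""
    flat = W.reshape(-1, W.shape[2])
    votes = [{} for _ in range(len(flat))]
    for x, y in zip(X_train, y_train):
        k = np.argmin(np.linalg.norm(flat - x, axis=1))
        votes[k][y] = votes[k].get(y, 0) + 1
    labels = [max(v, key=v.get) if v else -1 for v in votes]
    return np.array(labels).reshape(W.shape[:2])
```

The LVQ fine tune of eqs. (3)-(5) would then nudge the labeled weight vectors towards training vectors of the same class and away from those of other classes; it is omitted here for brevity.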

2 ASTER data and preprocessing

The band numbers and spectral ranges of the ASTER and ETM+ data are compared in table 1. The spatial resolutions of the ASTER data bands 1, 2, 3N, bands 4-9 and bands 10-14 are 15 m, 30 m and 90 m, respectively.
Table 1  Comparison of ASTER data and ETM+ data

  ASTER band   Spectral region   ASTER spectral range/um   ETM+ band   ETM+ spectral range/um
  --           Visible           --                        1           0.45-0.52
  1            Visible           0.52-0.60                 2           0.52-0.60
  2            Visible           0.63-0.69                 3           0.63-0.69
  3N           Near infrared     0.76-0.86                 4           0.76-0.90
  4            Short infrared    1.60-1.70                 5           1.55-1.75
  5            Short infrared    2.145-2.185               7           2.08-2.35
  6            Short infrared    2.185-2.225
  7            Short infrared    2.235-2.285
  8            Short infrared    2.295-2.365
  9            Short infrared    2.36-2.43
  10           Thermal           8.125-8.475               6           10.4-12.5
  11           Thermal           8.475-8.825
  12           Thermal           8.925-9.275
  13           Thermal           10.25-10.95
  14           Thermal           10.95-11.65

The selected test area in this study is located along the Duliu-Jian River in the Dagang District, Tianjin Municipality. The ASTER data for the study area were acquired on August 20, 2000, with 4096 x 4096 pixels. The data from bands 1, 2, 3N with 15-m resolution and bands 5, 7 and 9 with 30-m resolution are currently available. Fig. 2 is a composite map of bands 3N, 2 and 1. In order to enhance the spectral and spatial resolution of bands 5, 7 and 9, the data of these bands (30-m resolution) were fused with the first principal component (PC) of the data of bands 1 and 2 (15-m resolution) using the wavelet fusion algorithm.

The wavelet fusion procedure is as follows. First, the data of bands 5, 7 and 9 are re-sampled to 15-m resolution[9] by the nearest neighbor method; then, the wavelet transformation is applied to the first PC and to the re-sampled data of bands 5, 7 and 9. For the wavelet transformation, the high frequency part of the high resolution data (the first PC) and that of the low resolution data are compared, in all scales and directions, using the maximum local variance principle, and the one with the larger local variance is taken to form the new high frequency component. The inverse wavelet transformation is then performed on the low frequency components of the low resolution data together with the newly formed high frequency components. Consequently, we obtain the data of bands 5, 7 and 9 with enhanced spatial and spectral resolutions. It can be seen that there is no change in the low frequency part of bands 5, 7 and 9 before and after the image fusion, but some of the high frequency parts have been replaced by the high frequency parts of the first PC of bands 1 and 2. So, when bands 1, 2 and 3N and the fused bands 5, 7 and 9 are used in the classification, the weights of bands 1 and 2 are not dominant. Figs. 3 and 4 show the area marked by the black box in fig. 2 using the data of bands 5, 7 and 9 before and after the wavelet fusion. It is clearly seen that the spatial and spectral resolutions are greatly improved after the wavelet fusion.

Fig. 2. Composite of ASTER bands 3N, 2 and 1 of the Duliu-Jian River in the Dagang District, Tianjin.

Fig. 3. Composite image of bands 9, 7 and 5 before wavelet fusion.

Fig. 4. Composite image of bands 9, 7 and 5 after wavelet fusion.
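The two preprocessing steps named above, nearest neighbor resampling of the 30-m bands to 15 m and extraction of the first principal component of bands 1 and 2, can be sketched as follows; the array and function names are illustrative and the routines assume co-registered arrays.

```python
import numpy as np

def nearest_neighbor_upsample(band, factor=2):
    """Resample a 30-m band to 15-m pixels by nearest neighbour,
    i.e. repeat every pixel 'factor' times in each direction."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

def first_principal_component(band1, band2):
    """First PC of two co-registered 15-m bands, returned as an image."""
    stacked = np.stack([band1.ravel(), band2.ravel()], axis=1).astype(float)
    centered = stacked - stacked.mean(axis=0)
    # eigenvector of the 2x2 covariance matrix with the largest eigenvalue
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    pc1 = centered @ eigvecs[:, np.argmax(eigvals)]
    return pc1.reshape(band1.shape)
```

The resampled 30-m bands and this PC image would then be decomposed with the Mallat wavelet transform and fused by the local-variance rule sketched in sec. 1.1.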


Several field observations and tests were carried out along the Duliu-Jian River, Dagang District, Tianjin during the locust outbreaks in June 2001 and June 2002. During the field observations, training sites and testing data sets were selected on the ASTER composite image according to the land use map. Among the training data sets, some data for the classes of suspended materials in sea water were selected according to the report of the International Ocean Colour Coordinating Group (IOCCG)[10]. Table 2 describes the training and testing data sets from the field observations and test sites.
Table 2  Descriptions of training data sets and testing data sets

  Class   Cover type       Description                            Training data set   Testing data set
  1       river, ponds     river, ponds                                2898                622
  2       sea              sea water, salt ponds                       4832                745
  3       dense suspends   dense sea suspends                          4234                776
  4       light suspends   light sea suspends                          4200                648
  5       wet land         wet land                                    4039                654
  6       towns            buildings, houses, roads, revetment         3886                773
  7       vegetation       reeds, crops, trees                         2300                846
          Total                                                       26398               5064

3 Experiment and results

The Kohonen self-organizing feature map structure used in the experiment is as follows: the input layer has 6 nodes, each corresponding uniquely to one of the ASTER bands 1, 2 and 3N and bands 5, 7 and 9, the latter three bands being produced by the wavelet fusion. The 2-dimensional structure of the output layer neurons is set to 25 x 25, since, according to previous studies, such a structure gives a good classification result in terms of accuracy[3]. The initial value of the learning rate is set to 0.9, and it reduces gradually with the training time down to 0.001. The maximum number of iterations is 2000. In the output layer, the initial size of the side feedback vicinity is set to 14.

Fig. 5 shows the distribution of the 7 classes over the 25 x 25 output layer generated by the SOFM procedure. In the figure, the symbol o means that there is no corresponding input vector. It is seen in fig. 5 that the input training data of the 6 bands are effectively clustered into the output layer. Some wrongly classified points are shown in the bottom right corner; note that F (town) is located in the C (dense suspended materials) area. This is because there are many types of land cover in towns, which makes the spectra complex, and also because the spectra of some materials in towns are very similar to those of suspended materials in sea water.

Fig. 5. The 25 x 25 SOFM structure showing 7 land cover types.
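Under the hyperparameters reported above, the training and per-pixel classification could look like the following usage sketch. It reuses the illustrative train_sofm and label_neurons functions from sec. 1.2 and assumes the six 15-m bands are stacked into one array; it is not the authors' code.

```python
import numpy as np

def classify_image(bands, X_train, y_train):
    """bands: [rows, cols, 6] stack of ASTER bands 1, 2, 3N and the
    wavelet-fused bands 5, 7, 9 (all 15-m resolution)."""
    W = train_sofm(X_train, grid=(25, 25), alpha0=0.9,
                   alpha_min=0.001, radius0=14, n_iter=2000)
    neuron_labels = label_neurons(W, X_train, y_train)
    flat_w = W.reshape(-1, W.shape[2])
    pixels = bands.reshape(-1, bands.shape[2]).astype(float)
    # assign every pixel to its nearest neuron, then to that neuron's class
    # (for a full 4096 x 4096 scene this step would be done in chunks)
    nearest = np.argmin(
        ((pixels[:, None, :] - flat_w[None, :, :]) ** 2).sum(axis=2), axis=1)
    return neuron_labels.ravel()[nearest].reshape(bands.shape[:2])
```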

The classification was conducted after the SOFM net training. No threshold was set during the process, so no unclassified pixel occurs. For the purpose of comparison, the same training data sets were classified with the maximum likelihood method (MLH). The input bands are the same bands 1, 2, 3N and the bands 5, 7, 9 produced by the wavelet fusion. Likewise, no threshold was set during the processing. The resultant images are shown in figs. 6 and 7. To test the classification accuracies of the two methods, a set of independently selected test data was used. The test results are presented as confusion matrices in tables 3 and 4. The total accuracy of SOFM is 90.36%, whereas that of MLH is only 83.21%. Clearly, the accuracy of the SOFM method is higher than that of the MLH method. The MLH method obviously overestimates the extent of the town class. For example, 666 pixels that should have been assigned to other clusters are labeled as town. Such wrongly clustered pixels in the MLH classification account for 46.38% of all 1436 town pixels. Compared with the result of the SOFM method (see fig. 6), the overestimation of the town class in fig. 7 is evident.
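The overall, user and ground (producer) accuracies quoted from tables 3 and 4 follow directly from the confusion matrix; a minimal sketch of that bookkeeping (array names are illustrative) is:

```python
import numpy as np

def accuracy_report(cm):
    """cm[i, j] = number of test pixels of ground-truth class j assigned to
    class i (rows: classified labels, columns: ground truth), as in tables 3 and 4."""
    overall = np.trace(cm) / cm.sum()          # e.g. 4576 / 5064 = 0.9036 for SOFM
    user = np.diag(cm) / cm.sum(axis=1)        # per classified class (row totals)
    ground = np.diag(cm) / cm.sum(axis=0)      # per ground-truth class (column totals)
    return overall, user, ground
```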


Fig. 6. SOFM classification image.

Fig. 7. MLH classification image.

Table 3  Confusion matrix for the SOFM classification result a)

  Class                 River, pond   Sea   Dense suspend   Light suspend   Wetland   Town   Vegetation   User accuracy (%)
  River, pond               478         6         0               0            7        5         1            96.18
  Sea                       140       737         0               0            2        0         0            83.85
  Dense suspend               0         1       740             138            0        0         1            84.09
  Light suspend               0         0         0             510            0       16         0            96.96
  Wetland                     4         0         0               0          640        0       114            84.43
  Town                        0         1        36               0            0      752        11            94.00
  Vegetation                  0         0         0               0            5        0       719            99.31
  Total                     622       745       776             648          654      773       846
  Ground accuracy (%)     76.80     98.93     95.36           78.70        97.86    97.28     84.99

a) Total samples, 5064; correctly classified, 4576; total accuracy, 90.36%.

Table 4  Confusion matrix for the MLH classification result a)

  Class                 River, pond   Sea   Dense suspend   Light suspend   Wetland   Town   Vegetation   User accuracy (%)
  River, pond               589        10         0               0           70        1         0            87.91
  Sea                         3       632         0               0            0        0         0            99.53
  Dense suspend               0         0       573               0            0        0         0           100.00
  Light suspend               0         0         0             504            0        2         0            99.60
  Wetland                     0         0         0               0          478        0        82            85.36
  Town                       30       103       203             144           90      770        96            53.62
  Vegetation                  0         0         0               0           16        0       668            97.66
  Total                     622       745       776             648          654      773       846
  Ground accuracy (%)     94.69     84.83     73.84           77.78        73.09    99.61     78.96

a) Total samples, 5064; correctly classified, 4214; total accuracy, 83.21%.

4 Conclusion and discussion

This article introduces a classification method combining wavelet fusion and neural network methods. The wavelet transformation is used to fuse the ASTER 30-m resolution and 15-m resolution image data. As a consequence, the spatial resolutions of the resulting images are unified to 15 m for the visible, near infrared and short-wave infrared bands. Then, the Kohonen self-organizing neural network is applied to the classification with 26398 input training points at 6 input nodes. The training areas cover 7 different land cover types. The total accuracy reaches 90.36%. Compared with the result of the MLH classification (total accuracy 83.21%), the accuracy of the Kohonen classification increases by about 7% and, in particular, is almost twice that of the MLH result for the town class.

With the development of satellite technology, the effective loading of bands with different resolutions on a sensor has become a very important research topic. The combination of the wavelet fusion and the SOFM network methods presented in this study has the potential to deal with this sort of sensor data.
Acknowledgements  This work was supported by China's Special Funds for the 2008 Olympic Games (Grant No. 2002BA904B07), the Knowledge Innovation Project of IRSA (Grant No. CX020014), and China's Special Funds for the State Key Basic Research Project (Grant No. 2002CB412500). The authors would like to thank Dr. Buheaosier from the Hokkaido Institute of Environmental Sciences in Japan for providing the ASTER data and for his kind help, and to thank Dr. Wang Changlin and Dr. Bueh Cholaw for correcting the English version of the paper. Finally, the authors would like to thank the editors for improving the article.

References
1. Zhang Xiaocan, Huang Zichai, Zhao Yuanhong, Remote Sensing Image Processing (in Chinese), Hangzhou: Zhejiang University Press, 1997, 222-232.
2. Kohonen, T., Self-organized formation of topologically correct feature maps, Biological Cybernetics, 1982, 43: 59-69.
3. Ji, C. Y., Land-use classification of remotely sensed data using Kohonen self-organizing feature map neural networks, Photogrammetric Engineering & Remote Sensing, 2000, 66(12): 1451-1460.
4. Chui, C. K., An Introduction to Wavelets (in Chinese), Xi'an: Xi'an Jiaotong University Press, 1995, 198-242.
5. Mallat, S. G., A theory for multiresolution signal decomposition: The wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989, 11(7): 674-693.
6. Jorge, N., Xavier, O., Octavi, F. et al., Multiresolution-based image fusion with additive wavelet decomposition, IEEE Transactions on Geoscience and Remote Sensing, 1999, 37(3): 1204-1211.
7. Kohonen, T., The self-organizing map, Proceedings of the IEEE, 1990, 78(9): 1464-1480.
8. Yuan Zengren, Artificial Neural Network and Application (in Chinese), Beijing: Tsinghua University Press, 1999, 319-329.
9. Liu Qiang, Liu Qinhuo, Xiao Qing et al., Study on geometric correction of airborne multi-angular imagery, Science in China, Ser. D, 2002, 45(12): 1075-1086.
10. Li Sihai, The Report of the International Ocean Colour Coordinating Group (IOCCG) (in Chinese), Beijing: Ocean Press, 2002, 72-96.
