
VNU Journal of Science: Comp. Science & Com. Eng., Vol. 32, No. 3 (2016) 32-38

Some Improvements of Fuzzy Clustering Algorithms Using Picture Fuzzy Sets
and Applications for Geographic Data Clustering

Nguyen Dinh Hoa1,*, Le Hoang Son2, Pham Huy Thong2


1 VNU Information Technology Institute, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
2 VNU University of Science, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam

Abstract

This paper summarizes the major findings of the research project QG.14.60. The research aims to enhance
some fuzzy clustering methods by means of more generalized fuzzy sets. The main results are: (1) An improved
distributed fuzzy clustering method for big data using picture fuzzy sets: we design a novel method called DPFCM
that reduces communication cost by using the facilitator model (instead of the peer-to-peer model) together with
picture fuzzy sets. Experimental evaluations show that the clustering quality of DPFCM is better than that of the
original algorithm while the computational time remains reasonable. (2) An application of picture fuzzy clustering
to weather nowcasting: a novel method called PFC-STAR integrates the STAR technique with picture fuzzy
clustering to enhance forecast accuracy. Experimental results on satellite image sequences show that the proposed
method outperforms related works, especially in rain prediction. (3) A GIS plug-in that implements some of the
improved fuzzy clustering algorithms. The tool supports access to spatial databases and visualization of clustering
results in thematic map layers.
Received 20 June 2016, Revised 04 October 2016, Accepted 18 October 2016
Keywords: Spatial clustering, fuzzy clustering, distributed clustering, picture fuzzy set, weather nowcasting,
spatio-temporal regression.

* Corresponding author. E-mail: hoand@vnu.edu.vn

1. Introduction

Geographic data clustering problems work with spatial data. These problems have many important applications in economic development and social activities, from geo-economic analysis, marketing analysis and environmental resources management to processing satellite remote sensing images, weather forecasting, pollution prediction, disease prevention, etc. However, mining geographic data to extract information from the database of a geographic information system (GIS) poses many challenges. The database of a GIS contains large amounts of data, which grow day by day; the data volume to be processed is often large, even very large [3]. Attribute data fields are often multi-dimensional and correlated. Clustering multi-dimensional data, especially in the case of large data sets, is a difficult problem.

Attribute data in GIS are varied: they may be collected from various sources and have different forms and representations. Data can be quantitative or qualitative (classified in categories), or multimedia data (meteorological images, remote sensing images). Classification
in categories is inherently fuzzy. We want to classify, for example, a region as "flat", "moderate slope", or "very steep". The interpretation of remote sensing images based on different colors is another example of the fuzzy nature of clustering geographic data.

It is difficult in general to obtain a consistent clustering of geographic data and a unique interpretation of the results. The fuzzy approach aims to overcome some disadvantages of crisp (hard) clustering and to achieve better quality. Using fuzzy sets, we can make suitable modifications to traditional crisp clustering methods and apply them to processing geographical data.

Recently, much research has focused on fuzzy clustering for handling geographic data (see the reviews in [5, 11, 13]). Several research groups in Vietnam, and particularly at VNU Hanoi, have published works on data clustering, some of which address the clustering of geographical data. Promising results on fuzzy clustering of geographic data have been published by the research team at the Center for High Performance Computing, University of Science, VNU [7, 8, 9]. The authors improved fuzzy clustering algorithms by extending the fuzzy set concept: instead of the classical fuzzy set, the clustering process uses newer fuzzy concepts such as the intuitionistic fuzzy set [1, 16] and, more recently, the picture fuzzy set [4].

The research project "Development of advanced data clustering algorithms for geographic information systems and applications", code QG.14.60, aims to continue the research in this direction. Applying extended fuzzy concepts such as intuitionistic fuzzy sets and picture fuzzy sets allows the quality of clustering to be enhanced. On the other hand, to handle large data sets when clustering geographic data in real-life applications, it is necessary to improve the performance of the algorithms and, in particular, to increase the speed of convergence in the distributed clustering scenario. Developing a data clustering tool and integrating it into geographic information systems as a utility to assist users is also a task of the project team.

The rest of this paper is organized as follows. Section 2 describes DPFCM, the distributed fuzzy clustering method for big data using picture fuzzy sets. Section 3 presents PFC-STAR, a novel method that applies picture fuzzy clustering to weather nowcasting. Section 4 introduces the GIS plug-in tool SpatialClust, which implements some improved fuzzy clustering algorithms. Section 5 gives the summary and conclusions.

2. Distributed Clustering Method Using Picture Fuzzy Sets - DPFCM

2.1. Fuzzy clustering with picture fuzzy sets

The concept of picture fuzzy sets [4] is motivated by opinion polls. A voter's opinion on the decision in question can be one of four types: yes, no, abstain, and refusal to answer. A picture fuzzy set is then defined as a collection of elements x, each associated with three measures $\mu_S(x)$, $\eta_S(x)$, $\gamma_S(x)$, as follows:

$$S = \{(x, \mu_S(x), \eta_S(x), \gamma_S(x))\}.$$

These measures are subject to the constraints:

$$\mu_S(x) \in [0,1], \quad \eta_S(x) \in [0,1], \quad \gamma_S(x) \in [0,1],$$
$$\mu_S(x) + \eta_S(x) + \gamma_S(x) \in [0,1].$$

$\mu_S(x)$ is called the positive degree of membership of x, $\eta_S(x)$ is the neutral degree and $\gamma_S(x)$ is the negative degree. The refusal degree of an element is calculated as $\xi_S(x) = 1 - (\mu_S(x) + \eta_S(x) + \gamma_S(x))$.

In [15] the authors proposed a picture fuzzy clustering algorithm that uses picture fuzzy sets instead of classical fuzzy sets. The algorithm is based on the well-known fuzzy clustering algorithm FCM [2], but besides
the positive factors $u_{kj}$, the neutral factors $\eta_{kj}$ and the refusal factors $\xi_{kj}$ are also included in each step of calculating the membership degree of the data point k to the cluster j. The objective function to minimize is the following:

$$J = \sum_{k=1}^{N}\sum_{j=1}^{C} \left(u_{kj}(2-\xi_{kj})\right)^{m} \|X_k - V_j\|^{2} + \sum_{k=1}^{N}\sum_{j=1}^{C} \eta_{kj}\left(\log \eta_{kj} + \xi_{kj}\right) \to \min \quad (1)$$

Here $X_k$ ($k = \overline{1,N}$) are the data points, $V_j$ ($j = \overline{1,C}$) are the cluster centers and m is the fuzzifier inherited from FCM. The variables $u_{kj}$, $\eta_{kj}$, $\xi_{kj}$ are subject to the constraints:

$$u_{kj}, \eta_{kj}, \xi_{kj} \in [0,1], \quad (2)$$
$$u_{kj} + \eta_{kj} + \xi_{kj} \le 1, \quad (3)$$
$$\sum_{j=1}^{C} u_{kj}(2-\xi_{kj}) = 1, \quad (4)$$
$$\sum_{j=1}^{C} \left(\eta_{kj} + \frac{\xi_{kj}}{C}\right) = 1, \quad k = \overline{1,N},\ j = \overline{1,C}. \quad (5)$$

The steps of the algorithm are as follows:
- Initial step: t = 0; randomly initialize the variables $u_{kj}^{(t)}$, $\eta_{kj}^{(t)}$, $\xi_{kj}^{(t)}$ ($k = \overline{1,N}$, $j = \overline{1,C}$) so that conditions (2)-(3) are satisfied;
- Step 1: t = t + 1; calculate the cluster centers $V_j$ using the formula

$$V_j = \frac{\sum_{k=1}^{N} \left(u_{kj}(2-\xi_{kj})\right)^{m} X_k}{\sum_{k=1}^{N} \left(u_{kj}(2-\xi_{kj})\right)^{m}}, \quad j = \overline{1,C}; \quad (6)$$

- Step 2: update $u_{kj}$, $\eta_{kj}$, $\xi_{kj}$ by formulas (7)-(9):

$$u_{kj} = \frac{1}{\sum_{i=1}^{C} (2-\xi_{kj}) \left(\frac{\|X_k - V_j\|}{\|X_k - V_i\|}\right)^{\frac{2}{m-1}}}, \quad k = \overline{1,N},\ j = \overline{1,C}, \quad (7)$$

$$\eta_{kj} = \frac{e^{-\xi_{kj}}}{\sum_{i=1}^{C} e^{-\xi_{ki}}} \left(1 - \frac{1}{C}\sum_{i=1}^{C} \xi_{ki}\right), \quad k = \overline{1,N},\ j = \overline{1,C}, \quad (8)$$

$$\xi_{kj} = 1 - (u_{kj} + \eta_{kj}) - \left(1 - (u_{kj} + \eta_{kj})^{\alpha}\right)^{\frac{1}{\alpha}}, \quad k = \overline{1,N},\ j = \overline{1,C}, \quad (9)$$

where the exponent $\alpha \in (0,1]$ is a parameter of the method;
- Step 3: stop the loop if the total change of the variables in the updating step is less than the predefined threshold $\varepsilon$:

$$\|u^{(t)} - u^{(t-1)}\| + \|\eta^{(t)} - \eta^{(t-1)}\| + \|\xi^{(t)} - \xi^{(t-1)}\| < \varepsilon,$$

or the step counter is greater than maxSteps; otherwise, return to Step 1.

2.2. DPFCM - Distributed fuzzy clustering using picture fuzzy sets

In [17] the authors proposed CDFCM, a fuzzy clustering algorithm for distributed computing environments with the peer-to-peer (P2P) communication model. In this algorithm, the cluster centers and the fuzzy membership factors of the data points are calculated at every peer site and then updated in each iteration using only the results of the neighboring peers. This process is repeated until a stopping criterion is satisfied. CDFCM is considered one of the most effective fuzzy clustering algorithms for distributed computing environments.

A detailed analysis shows that the communication cost of each iteration of CDFCM is high, approximately $p \cdot n_{loc}$, where p is the number of peers and $n_{loc}$ is the average number of neighbors of one peer. Also, because the algorithm uses only nearby local results for the update in each iteration, the final clustering result may not be of the highest quality.

Our idea for improving CDFCM is to reduce communication costs and improve the quality of the clustering results by using picture fuzzy clustering and the facilitator model instead of the peer-to-peer communication model. The proposed method is called DPFCM (distributed picture fuzzy clustering method).
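The picture fuzzy clustering iteration of Section 2.1 is the building block on which DPFCM rests. As a minimal sketch, assuming plain Python lists as the data layout, the updates (6)-(9) and the stopping rule of Step 3 might look as follows. The function name fc_pfs, the default values of m, alpha, eps and max_steps, the Frobenius norm in the stopping rule, the floor on distances and the clamping of u + eta to at most 1 before applying (9) are assumptions made for illustration; they are not taken from [10] or [15].

```python
import math
import random


def fc_pfs(X, C, m=2.0, alpha=0.6, eps=1e-4, max_steps=100, seed=0):
    """Illustrative picture fuzzy clustering loop; returns (V, u, eta, xi, steps)."""
    rng = random.Random(seed)
    N, dim = len(X), len(X[0])
    # Initial step: random degrees normalised so that u + eta + xi = 1,
    # which satisfies conditions (2)-(3).
    u, eta, xi = [], [], []
    for _ in range(N):
        triples = []
        for _ in range(C):
            a, b, c = (rng.random() + 1e-9 for _ in range(3))
            triples.append((a / (a + b + c), b / (a + b + c), c / (a + b + c)))
        u.append([t[0] for t in triples])
        eta.append([t[1] for t in triples])
        xi.append([t[2] for t in triples])
    V, steps = [], 0
    for steps in range(1, max_steps + 1):
        # Step 1, formula (6): centers weighted by (u_kj * (2 - xi_kj))^m.
        w = [[(u[k][j] * (2 - xi[k][j])) ** m for j in range(C)] for k in range(N)]
        V = [[sum(w[k][j] * X[k][d] for k in range(N)) / sum(w[k][j] for k in range(N))
              for d in range(dim)] for j in range(C)]
        new_u, new_eta, new_xi = [], [], []
        for k in range(N):
            # Distances are floored to avoid division by zero (assumption).
            dist = [max(math.dist(X[k], V[j]), 1e-12) for j in range(C)]
            exps = [math.exp(-x) for x in xi[k]]
            scale = (1 - sum(xi[k]) / C) / sum(exps)
            row_u, row_eta, row_xi = [], [], []
            for j in range(C):
                ratio = sum((dist[j] / dist[i]) ** (2 / (m - 1)) for i in range(C))
                u_kj = 1.0 / ((2 - xi[k][j]) * ratio)      # formula (7)
                eta_kj = exps[j] * scale                   # formula (8)
                s = min(1.0, u_kj + eta_kj)                # numerical guard (assumption)
                rest = max(0.0, 1 - s ** alpha)
                xi_kj = max(0.0, 1 - s - rest ** (1 / alpha))  # formula (9)
                row_u.append(u_kj)
                row_eta.append(eta_kj)
                row_xi.append(xi_kj)
            new_u.append(row_u)
            new_eta.append(row_eta)
            new_xi.append(row_xi)
        # Step 3: total change of the three variable sets (Frobenius norms).
        change = sum(
            math.sqrt(sum((p - q) ** 2 for rn, ro in zip(new, old) for p, q in zip(rn, ro)))
            for new, old in ((new_u, u), (new_eta, eta), (new_xi, xi)))
        u, eta, xi = new_u, new_eta, new_xi
        if change < eps:
            break
    return V, u, eta, xi, steps


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, u, eta, xi, steps = fc_pfs(points, C=2)
```

The sketch updates u and eta from the refusal degrees of the previous iteration and only then recomputes xi, which is one possible reading of Step 2; a different update order would be equally compatible with the description above.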
DPFCM works at two levels:
- At the local level, each peer site performs picture fuzzy clustering in each iteration;
- At the global level, all the peer sites transfer their results to a unique master site, which plays the role of a facilitator in the communication process. Thus, in one updating step at the global level, the cost of completing the communication process is of order p. Moreover, the global information allows the quality of clustering to be improved.

The experimental evaluation was conducted on benchmark datasets from the UCI Machine Learning Repository, namely IRIS, GLASS, IONOSPHERE, HABERMAN and HEART. The speed of convergence and cluster validity measures were evaluated. The average iteration number (AIN) is better when smaller, whereas the average classification rate (ACR) and the average normalized mutual information (ANMI) [6] are better when larger.

Table 1 compares the clustering quality of DPFCM with that of some other algorithms.

Table 1. Clustering quality of algorithms [10]
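The ANMI measure used in the evaluation is built on the normalized mutual information (NMI) between the known class labels and the cluster labels, presumably averaged over repeated runs. A minimal sketch of one common NMI definition follows. The geometric-mean normalization, the function names and the conversion of a fuzzy partition to hard labels by taking the cluster with the largest positive degree are assumptions here, since the exact variant used in [6] and [10] is not restated in this paper.

```python
import math
from collections import Counter


def hard_labels(u):
    """Assign each data point to the cluster with the largest positive degree."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


def nmi(true_labels, cluster_labels):
    """NMI = I(T; C) / sqrt(H(T) * H(C)); values near 1 mean matching partitions."""
    n = len(true_labels)
    count_t = Counter(true_labels)
    count_c = Counter(cluster_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    # Mutual information between the two labelings.
    mutual = sum(n_tc / n * math.log(n * n_tc / (count_t[t] * count_c[c]))
                 for (t, c), n_tc in joint.items())
    # Entropies of the class labels and of the cluster labels.
    h_t = -sum(v / n * math.log(v / n) for v in count_t.values())
    h_c = -sum(v / n * math.log(v / n) for v in count_c.values())
    if h_t == 0.0 or h_c == 0.0:
        return 0.0  # degenerate case: a single class or a single cluster
    return mutual / math.sqrt(h_t * h_c)


# Hypothetical example: four points, two classes, two clusters.
truth = [0, 0, 1, 1]
memberships = [[0.1, 0.8], [0.2, 0.7], [0.9, 0.1], [0.6, 0.3]]
score = nmi(truth, hard_labels(memberships))
```

NMI does not depend on how the clusters are numbered, so a clustering that recovers the classes under different label names still scores close to 1.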

The results presented in the table show that the clustering quality of DPFCM is mostly better than that of three distributed clustering algorithms, namely CDFCM, Soft-DKM and PFCM. It is also better than the traditional centralized clustering algorithm FCM, and a little worse than the centralized weighted clustering algorithm WEFCM. In some cases, for example on the IONOSPHERE and HEART datasets, the clustering quality of DPFCM is of the same order as, or a little worse than, that of CDFCM.

As for the speed of convergence, comparing the AIN of DPFCM with those of the other algorithms shows, as expected, a disadvantage for DPFCM, but the differences in AIN are small.

The above results were published in the international scientific journal "Expert Systems with Applications" [10].

3. Application of picture fuzzy clustering in analysis of meteorological images for weather nowcasting

One of the methods of predicting the weather, called weather nowcasting, is based on analyzing a sequence of satellite images by combining the spatio-temporal autoregressive (STAR) model with fuzzy clustering. There are a number of publications in this research domain. Recently, Shukla and
colleagues [14] proposed a number of technical improvements to raise the accuracy. However, because classical fuzzy sets are used, image areas that are ambiguous or unclear have a negative impact on the prediction result. Picture fuzzy clustering [15], which uses a more advanced fuzzy concept, has been shown to be better than traditional fuzzy clustering. Our idea is to advance the research of Shukla et al. by combining the primary STAR techniques with picture fuzzy clustering to create a new weather prediction method, called Picture Fuzzy Clustering - Spatiotemporal Autoregressive (PFC-STAR). We expect this combination to improve the quality of the prediction results. The proposed PFC-STAR method involves three steps:
- The pixels of the satellite images (training samples) are divided into groups using the picture fuzzy clustering algorithm proposed in [15].
- All the elements of these clusters in the training samples are then labeled and filtered using the Discrete Fourier Transform to remove non-predictable scales and thereby increase the time range of predictability.
- Finally, the next images of the sequence are predicted by the spatio-temporal autoregression method, which gives the weather forecast for the chosen geographic area a short time ahead.

The experimental evaluation of the proposed method was conducted on a personal computer with 2 GB RAM and a 2.13 GHz Core 2 Duo processor, on data sets that are sequences of satellite images of the Southeast Asia region. Each data set includes 5 satellite images of 100 x 100 pixels, taken over the period from 9:30 to 13:30. Comparison of the results showed that the proposed method is better than the relevant weather nowcasting methods, especially in the precision of the rain-rate regression.

The above results were presented and published in the Proceedings of the "International Symposium on Geo-informatics for Spatial Infrastructure Development in Earth and Allied Sciences (GIS-IDEAS)" [12].

Table 2. Comparison of RMSE and computational time of PFC-STAR and the method of Shukla et al. [12]

                      RMSE (%)                    Computational time (sec)
Data                  PFC-STAR   Shukla et al.    PFC-STAR   Shukla et al.
                                 (2014)                      (2014)
Malaysia              26.77      27.11            362.745    359.88
Luzon, Philippines    33.61      33.45            345.672    343.43
Jakarta, Indonesia    30.12      32.04            342.76     339.97

4. Developing data clustering tool as a plug-in for GIS

For the convenience of users in mining geographical data, a data clustering engine should be developed and integrated into GIS to support direct access to the spatial database for reading input data and displaying the results on the map layers.

MapWindow is an open source GIS software that Windows users are familiar with; it is under active development and new versions are released continuously. MapWindow supports plug-ins in the form of dynamic link libraries (*.dll), and a development environment such as Visual Studio Community Edition is available for free download. This environment supports the C# language and the .NET Framework. Our implementation of the proposed algorithms for the experimental evaluation was written in C/C++; therefore, the Visual Studio development environment is the most suitable choice for hosting our source code.

The plug-in, named SpatialClust, is a clustering module for geographical data that implements several fuzzy clustering algorithms with the improvements proposed by our team, as presented above. Restrictions on the computational resources of a plug-in do not allow implementing the distributed
algorithms or processing large data sets. Hence, only some appropriate algorithms are included in the tool, namely FCM, NE, FGWC, CFGWC, IPFGWC and MIPFGWC. The plug-in supports direct access to the spatial database for reading attribute values and displaying the resulting clusters in different colors on the map.

Input: the data file format is *.csv (comma-separated values). All GIS software should support importing and exporting the data of one map layer between the *.shp format and the *.csv format.

Picture 1. Dialog box for choosing input data and algorithm.

Output: there are two types:
1. Output as a text file (*.txt or plain text), providing enough detail for the analysis and evaluation of the algorithms or for subsequent processing, if any.
2. Visual display on the map: in parallel with writing the results to a text file, the tool writes the cluster labels directly to the cluster column of the underlying database, so that users can visualize the clusters on maps by configuring the GIS functionalities. For this purpose, the attribute table of the map layer must have a last column named CLUSTER.

5. Summary and conclusions

The research carried out in this project has contributed to improving fuzzy clustering algorithms and distributed fuzzy clustering for processing large data sets, with application to geographical data clustering. The results help to better address real-world problems encountered in many application areas.

The distributed fuzzy clustering algorithm for large data sets using picture fuzzy sets, called DPFCM, improves the overall clustering quality in comparison with the algorithm of Zhou and colleagues [17]. The clustering quality of DPFCM is better than that of some clustering algorithms of the same type, while the computational time does not increase much. The new weather nowcasting method PFC-STAR, which uses picture fuzzy sets instead of classical fuzzy sets, raises the quality of predictions in comparison with the method of Shukla et al. [14], especially in predicting rain rate. We can conclude that the use of picture fuzzy clustering has a positive impact on the quality of the clustering results for problems related to inherently fuzzy concepts.

The software tool for data clustering, integrated into MapWindow as a plug-in that performs typical fuzzy clustering algorithms and the improvements proposed in our research, will help to promote practical applications of geographic data mining in various domains.

Acknowledgements

The authors would like to thank their colleagues for comments in discussions at the scientific seminars, which helped to correct errors and to complete the results achieved. We also express our sincere thanks to VNU Hanoi for funding the research project QG.14.60 and for other support in conducting the research.

References

[1] Atanassov, K. T. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20, 87-96.
[2] Bezdek, J.C., R. Ehrlich, et al (1984), FCM: the fuzzy c-means clustering algorithm, Computers and Geosciences, 10, pp. 191-203.
[3] Brinkhoff, T., Kriegel, H.-P. (1994), The Impact of Global Clustering on Spatial Database Systems, Proceedings of the 20th VLDB Conference, Santiago, Chile, pp. 168-179.
[4] Bui Cong Cuong, Vladik Kreinovich, Picture Fuzzy Sets - a new concept for computational intelligence problems, Proceedings of the 2013 Third World Congress on Information and Communication Technologies (WICT 2013), 1-6.
[5] Deepti Joshi, Polygonal Spatial Clustering, Ph.D. Dissertation, University of Nebraska, 2011.
[6] Huang, H. C., Chuang, Y. Y., & Chen, C. S. (2012), Multiple kernel fuzzy clustering, IEEE Transactions on Fuzzy Systems, 20(1), 120-134.
[7] Le Hoang Son, Bui Cong Cuong, Pier Luca Lanzi, Hoang Anh Hung (2011), Data Mining in GIS: A Novel Context-Based Fuzzy Geographically Weighted Clustering Algorithm, International Journal of Machine Learning and Computing.
[8] Le Hoang Son, Nguyen Dinh Hoa, Pier Luca Lanzi, and Bui Thi Huong Lan (2011), A Combination of Clustering Techniques and Fuzzy Control in 2D Polygon Determination for the Terrain Splitting and Mapping Problem, International Journal of Computer and Electrical Engineering, 3(5), pp. 682-689.
[9] Le Hoang Son, Bui Cong Cuong, Pier Luca Lanzi, Nguyen Tho Thong (2012), A Novel Intuitionistic Fuzzy Clustering Method for Geo-Demographic Analysis, Expert Systems with Applications.
[10] Le Hoang Son (2015), DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets, Expert Systems with Applications, 42 (2015), pp. 51-66.
[11] Neethu C V, Subu Surendran, Review of Spatial Clustering Methods, International Journal of Information Technology Infrastructure, Volume 2, No. 3, May - June 2013.
[12] Nguyen Dinh Hoa, Pham Huy Thong, Le Hoang Son, Weather Nowcasting from Satellite Image Sequences Using Picture Fuzzy Clustering and Spatial-temporal Regression, International Symposium on Geoinformatics for Spatial Infrastructure Development in Earth and Allied Sciences (GIS-IDEAS), Danang, Vietnam, December 7th-9th, 2014, pp. 137-142.
[13] M. Perumal, B. Velumani, A. Sadhasivam, and K. Ramaswamy (2015), Spatial Data Mining Approaches for GIS - A Brief Review, Conference paper, January 2015, Springer International Publishing Switzerland.
[14] Shukla, B. P., Kishtawal, C. M., & Pal, P. K. (2014), Prediction of Satellite Image Sequence for Weather Nowcasting Using Cluster-Based Spatiotemporal Regression, IEEE Transactions on Geoscience and Remote Sensing, 52(7), 4155-4160.
[15] Thong, P.H., Son, L.H. (2014), A new approach to multi-variables fuzzy forecasting using picture fuzzy clustering and picture fuzzy rules interpolation method, Proceedings of the 6th International Conference on Knowledge and Systems Engineering (KSE 2014), October 9-11, 2014, Hanoi, Vietnam, 679-690.
[16] Visalakshi, N. K., Thangavel, K., & Parvathi, R. (2010), An intuitionistic fuzzy approach to distributed fuzzy clustering, International Journal of Computer Theory and Engineering, 2 (2), 1793-8201.
[17] Zhou, J., Chen, C., Chen, L., & Li, H. (2013), A collaborative fuzzy clustering algorithm in distributed network environments, IEEE Transactions on Fuzzy Systems. http://dx.doi.org/10.1109/TFUZZ.2013.2294205.
