Professional Documents
Culture Documents
Viorica Rozina Chifu, Cristina Bianca Pop, Ioan Salomie, Mihaela Dinsoreanu,
Tudor David, Vlad Acretoaie
Department of Computer Science
Technical University of Cluj-Napoca
26-28 Baritiu str., Cluj-Napoca, Romania
Email: {Viorica.Chifu, Cristina.Pop, Ioan.Salomie, Mihaela.Dinsoreanu}@cs.utcluj.ro
ABSTRACT
The behavior of biological swarms offers us many clues regarding the design of
efficient optimization methods applicable in various domains of computer science.
Inspired by the behavior of ants, we propose in this paper two methods for
semantic Web service discovery and composition. For efficiently discovering the
services that will be involved in the composition process we choose to organize the
available services in clusters according to the similarity between their semantic
descriptions. To semantically compose services we propose a graph model that
stores all the composition solutions for a specified user request. We further apply
an ant-inspired selection method on the composition graph to identify the optimal
composition solution according to user constraints. We have tested and validated
the two ant-inspired methods on an extended version of the SAWSDL-TC
benchmark collection.
Pdrop (ant, ant.s) min(1.0, f 4 ) (7) If the ant decides to drop the service
(Decide_Drop), it places the service in a grid element
(Drop_Service) and then moves in the grid (Move)
In Eq. (7) we have adapted the definition of f for until it decides to pick-up another service
the case of semantic Web services: (Decide_Pick). Just as in the case of dropping a
service, the decision to pick-up a service is taken
1 ( s, s j ) according to the probability Ppick defined as follows
* (1 ) ( s, s j ) (8)
f N jL , if (1 ) 0 [9]:
0, otherwise
1 graph which contains the service of the input node, a
Ppick (ant, s) min( 1.0, ) (11) set of services (one service from each cluster in the
f2 graph) and the service of the output node.
where f is defined as in Eq. (8). At the end of 4.2 The Composition Algorithm
Algorithm_2, once similar services are within a
neighborhood, a cluster retrieval algorithm is run, The composition algorithm (Algorithm_3) takes
which identifies the neighborhood of services and as inputs (i) the user request, (ii) the set of services
organizes them into clusters (Retrieve_Clusters). available in a repository and (iii) two threshold
The neighborhood is established using a cluster values, one for intra-cluster matching and another
retrieval distance (CRD) between two services. one for concept matching. First the composition
graph is initialized (Initialize_Graph) based on the
4 THE COMPOSITION GRAPH user request so that it contains only the input node.
Then, the algorithm iteratively performs the
In this section we propose a composition graph following two operations: (1) a set CL of service
model which represents the search space for a clusters are discovered (Discovery) based on the
selection method that identifies the optimal semantic matching between the concepts describing
composition according to the user constraints. the inputs of these services and the ones describing
the outputs of the services that are already in the
4.1 The Composition Graph Model graph, (2) for each discovered cluster c a
corresponding node is created and added to the
The composition graph is a directed acyclic composition graph and as a result the set of graph
graph in which a node is represented as a cluster of edges is updated (Update_Edges). A new edge is
similar services and an edge denotes a semantic added between two nodes if there is a semantic
relationship between two clusters. matching between the clusters associated to these
In our approach, we consider three types of nodes.
graph nodes: several service nodes, one input node
and one output node. A service node contains a -----------------------------------------------------------------------
service cluster, with the property that there is a high Algorithm_3: Build_Composition_Graph
degree of semantic match (DoM) between the -----------------------------------------------------------------------
services in the cluster. We allow services with Input: req = (in, out) - the user request; CLSR - set of
similar functionalities, inputs and outputs to be part available service clusters; clTh – intra-cluster matching
threshold; cTh - concept matching threshold;
of the same cluster. The services of a cluster may
Output: G = (V, E) - the composition graph containing
have the same functionality and also may differ with a set of nodes V and a set of edges E;
at most one input/output parameter. begin
The input node contains a cluster with a single vin = Generate_Input_Node(req.in)
service which only has outputs representing a set of vout = Generate_Output_Node(req.out)
ontology concepts describing the user provided G = Initialize_Graph(vin)
inputs. The output node on the other hand, contains while ((vout G.V ) or (!Fixed_Point(G))) do
a cluster with a single service having just inputs CL = Discovery(CLSR, G.V, cTh, clTh)
representing the concepts describing the user foreach c CL do
requested outputs. G.V = G.V U {c}
A directed edge links a pair of service clusters if G.E = Update_Edges(G.V, clTh)
there is a high degree of semantic match between the end for
outputs of one of the clusters and the inputs of the end while
if (vout G.V ) then return null
other cluster.
G = Prune(G)
The construction of the composition graph starts return G
with the input node. Then, the graph is expanded end
with new clusters of services which have a high -----------------------------------------------------------------------
degree of match with the cluster contained in the
input node. This way, the expansion of the graph is The algorithm ends either when the user
performed until all the inputs of the output node are requested output parameters are provided by the
satisfied. In the construction process, a service services added to the composition graph or when no
cluster is added only if the clusters already present in new service clusters have been added to the
the graph provide all the outputs required for its composition graph for a predefined number of
execution. The service clusters are provided by a iterations (Fixed_Point). If at least one composition
discovery method which is detailed in the following solution has been found we employ a pruning
sub-section. process.
A composition solution is a directed acyclic sub-
The discovery algorithm (Algorithm_4) takes as discovered. If the matching value is below the
input the following parameters: (i) the set CLSR of concept matching threshold cTh and none of the other
available service clusters returned by the ant-based conditions are met, the tested input is marked as
clustering algorithm, (ii) a set CLSG of already missing so as to not allow a second missing input for
discovered clusters that are part of the composition this cluster.
graph – initially, the list contains a generic cluster
with no inputs, (iii) a concept matching threshold cTh
and (iv) an intra-cluster matching threshold clTh. 5 SELECTING THE OPTIMAL SOLUTION
IN WEB SERVICE COMPOSITION
-----------------------------------------------------------------------
Algorithm_4: Cluster_Discovery
----------------------------------------------------------------------- For finding the optimal composition we adapted
Inputs: CLSR – set of available service clusters; CLSG – set the Ant Colony Optimization (ACO) meta-heuristic
of service clusters part of the composition graph; cTh – [5]. Our ant-inspired selection method identifies the
concept matching threshold; clTh – intra-cluster matching optimal composition solution encoded in the
threshold; composition graph. To establish whether a solution is
Output: CLSG – the updated set of service clusters part of optimal according to QoS user preferences and
the composition graph; semantic quality we define an appropriate fitness
Comments: function.
begin
foreach cl in CLSR do 5.1 Mapping ACO to the Problem of Selecting
if (cl not in CLSG) then the Optimal Composition Solution
add = true
missing = false
foreach cl.ini in cl.in do The ACO meta-heuristic relies on a set of
maxDoM = 0 artificial ants which communicate with each other to
foreach cl’ in CLSG do solve combinatorial optimization problems. Just as in
foreach cl’.outj in cl’.out do the ACO meta-heuristic we consider a set of artificial
if (DoM(cl.ini, cl’.outj)>maxDoM) then ants that traverse the composition graph in order to
maxDoM = DoM(cl.ini, cl’.outj)
find the optimal composition solution. In its search,
end if
end foreach an artificial ant has to choose at each step a new edge
end foreach that links the service where the ant is currently
if (maxDoM < cTh) then positioned with a new service. The choice is
if (missing == true or stochastically determined with the probability p [5]
DoM_In_Cluster(cl.in, cl) > clTh) then defined as follows:
add = false
else missing = true
end if ij ij
end if k
if c pq N ( s p ),
end foreach p i, j c pq N ( s P ) pq pq
if (add == true) then CLSG = CLSG U cl
end if 0 otherwise.
(12)
end foreach
return CLSG where N(sp) is the set of candidate edges (which may
end
------------------------------------------------------------------------ be added to the partial solution), ij represents some
The algorithm iterates the set CLSR provided by heuristic information and ij is the level of
the ant-based clustering algorithm, considering those pheromone on the edge leading to a candidate service.
clusters that are not already present in the The heuristic information associated to the edge
composition graph. A service cluster can be added to linking the service s1 where the ant is currently
the set CLSG even if it has at most one input that has positioned and the candidate service s2 is computed
no corresponding output in CLSG. The condition for with the following equation:
adding the discovered cluster to CLSG is that the
missing input is not representative for all the services
in the tested cluster. Once a candidate cluster from wQoS QoS ( s2 ) wMatch DoM services(s1 , s2 )
ij
CLSR has been selected, the best matching output wQoS wMatch
provided by the clusters in CLSG is selected for each
input of the candidate cluster. If the matching value is (13)
below the concept matching threshold cTh and either where:
the tested input is representative for the cluster or wQoS and wMatch are the weights representing the
there has already been an unmatched input for this importance of the QoS score compared to the
cluster (DoM_In_Cluster), the cluster will not be degree of match;
QoS represents the score of the candidate service algorithm returns a ranked set of composition
and is computed as a weighted mean of the QoS solutions, the first being the optimal one.
attributes considered in the underlying QoS
model; -----------------------------------------------------------------------
DoMservices represents the degree of match Algorithm_5: Ant_based_Selection
-----------------------------------------------------------------------
between services s1 and s2 (see sub-section 3.1). Input: noIt - the number of iterations; noAnts - the number
of artificial ants; G = (V, E) - the composition graph;
Initially, the level of pheromone corresponding to Output: SOL - the set of the best composition solutions;
each edge is assigned a constant value 0 . Within the Comments: Find_Min_Score – determines the lowest
score of a solution from SOL; Find_Max_Score –
search process the level of pheromone is updated in
determines the highest score of a solution from SOL; Score
the following two situations: (1) at each step made by
– returns the score of a composition solution;
an artificial ant in the composition graph – local Remove_Worse – removes a solution from SOL which has
update and (2) at the end of an iteration when an ant the lowest score; Add – adds a solution to SOL.
identifies a composition solution – offline update. begin
The local update is performed using the equation SOL =
defined in [5]. For performing the offline update we noItModif = 0
adapt the formula defined in [5] as follows: Ants = Initialize_Ants(noAnts)
while (noItModif < noIt) do
i, j (1 ) i, j i, j (14) noItModif = noItModif + 1
max = 0
1 foreach a Ants do
where ij , where Sbest is the best score of a result = Find_Solution(G, a)
S best minScore = Find_Min_Score(SOL)
composition solution found so far and is the maxScore = Find_Max_Score(SOL)
pheromone evaporation rate. if (minScore < Score(result)) then
SOL = Remove_Worse(SOL)
5.2 The Ant-based Selection Algorithm SOL = Add(SOL, result)
noItModif = 0
The selection algorithm (Algorithm_5) uses a end if
set of artificial ants to iteratively search for the if (maxScore < Score(result)) then
optimal composition solution in the composition max = Score(result)
maxSol = result
graph. Initially, all ants are placed in randomly
end if
chosen graph nodes (Initialize_Ants). Then, in their end for
search, each ant tries to identify a composition G = Apply_Offline_Pheromone(G, maxSol)
solution (Find_Solution) by performing two types of end while
searches: backward and forward. return SOL
A backward search is performed by the ant to end
identify the services that provide the inputs of the ------------------------------------------------------------------------
service where the ant is currently positioned. This
process is repeated until there are no more services 6 FRAMEWORK IMPLEMENTATION
whose inputs have no correspondents to outputs of AND EXPERIMENTAL RESULTS
other services.
In the forward search, the ant continues to To validate our approach to automatically
expand the partial solution identified so far by discover and compose semantic Web services, we
iteratively picking forward edges towards the output have developed an experimental framework that
node. Once a new node is picked, the ant will again integrates the ant-based clustering method, the
go on back edges searching for services that provide composition method as well as the ant-inspired
all its inputs. selection method presented above.
Both in the backward and the forward searches,
a service is chosen stochastically with the probability 6.1 Framework Architecture
defined in Eq. (12). When an ant is searching for a
composition solution, several local pheromone The architecture of the framework is presented
updates are performed. In addition, an offline in Figure 1. In what follows, we describe the
pheromone update (according to Eq. (14)) is functionality of each framework component.
performed after all ants have identified a composition The Ontology Driven Graphical User Interface
solution aiming to promote the best solution found so guides the user in the composition process by
far. The algorithm runs until no changes of the providing a controlled language that uses the
optimal solution found so far have been detected for a ontology concepts. It enables the user to specify
predefined number of iterations noIt. In the end, the composition requirements in terms of functional and
non functional parameters. The SWS Repository is a
repository of semantically annotated services based SAWSDL-TC collection.
on the SAWSDL standard [11]. The semantic 6.2.1 Results Evaluation for the Ant-based Service
descriptions generated for each service are based on Clustering Method
a set of domain ontologies, which serve as a common
vocabulary used by the descriptions. Experiments were carried out to test the
behavior of the Web service clustering and retrieval
algorithm under different settings for the algorithm's
parameters. As the total number of Web services in
the considered set is 737, the parameters presented in
Table 1 were theoretically defined and maintained
constant throughout the performed experiments.
Parameter Value
Grid width 120
Grid height 120
Figure 1: Framework architecture Number of ant agents 10
Agent memory size 10
The Service Matching Module is responsible for
evaluating the semantic similarity between two The clustering parameters with the highest level
services. The module interacts directly with the of impact on the final results were found to be the
ontology repository. The Service Clustering Module number of iterations of the clustering algorithm and
searches the Semantic Web Service Repository for the cluster retrieval distance. The first parameter
services with high semantic similarity between them, influences the final positions of the services on the
and groups them together. The Service Discovery grid, while the second one is crucial in determining
Module searches for clusters of services based on how these positions are interpreted when performing
some available inputs (given as ontology concepts). the cluster retrieval operation. To be more precise,
The Service Composition Module interacts with the the clustering algorithm will stop after a given
Service Discovery Module to build the composition number of iterations. The cluster retrieval algorithm
graph which stores all the possible solutions for a will stop once the distance on the grid between any
specified user request. The Selection Module pair of clusters is greater or equal to the cluster
interacts with the Service Composition Module to retrieval distance. The theoretical recommended
find the optimal composition solution according to number of iterations for the clustering algorithm is
QoS attributes and semantic quality. The Selection 2000 X NumberOfServices. Given that a set of 737
Module returns a ranked set of composition solutions, service descriptions has been used, a recommended
the first one being the optimal solution. The user is number of iterations would be around 1.5 million.
allowed to choose the solution he prefers. After a Also, the recommended value for the cluster retrieval
service composition solution is selected, the Service distance is 5. The theoretical minimum number of
Invocation Module will actually execute the services clustering iterations allowable regardless of the data
part of the composition, and present the user the final set on which clustering is performed is 1 million.
output. Overall, the results obtained for 1 million, 1.5
million and 2 million iterations are within the same
6.2 Experimental Results ranges, showing that as long as the iterations count is
kept above the theoretical minimum, the variations in
We have tested and evaluated our framework on the results are not very significant. For comparison
the SAWSDL-TC service test collection [12]. In our purposes, the graphical results obtained for 1 million
experiments we have used 737 Web services which iterations and 1.5 million iterations are presented in
are semantically annotated using concepts from Figures 2(a) and 2(b). The points in the images
several domain ontologies according to the represent Web services. They are colored differently
SAWSDL specification. The services belong to the according to the cluster they belong to, and the
following domains: education, medical care, food, clusters are also outlined with red rectangles. On the
travel, communication, economy and weapon. For other hand, the cluster retrieval distance has a
composition we have extended the SAWSD-TC significant impact on the final results. A value of 3
standard collection with new services from the for this parameter leads to the creation of very
medical care domain. In this section we present and restrictive small clusters with an average size of 2, a
analyze the experimental results obtained by small intra-cluster variance and a high average intra-
applying our ant-inspired methods on the extended cluster similarity. These clusters are hardly usable, as
there are very similar services placed in different The two usable values for the cluster retrieval
clusters. distance that were derived from these experiments
are 4 and 5. Choosing between them depends on the
purpose for which the clustering operation is
performed. If smaller and more coherent clusters are
desired, the value of 4 is recommended. If larger and
somewhat more diverse clusters are needed, a value
of 5 is best suited for this parameter. However, both
of these values perform well in terms of intra-cluster
variance and average intra-cluster similarity.
8 REFERENCES
[1] W. Liu, W. Wong: Web Service Clustering
using Text Mining Techniques, International
Journal of Agent-Oriented Software Engineering,
Vol. 3, Issue 1, pp. 6-26 (2009).
[2] C. Platzer, F. Rosenberg, S. Dustdar: Web
(b) Service Clustering using Multidimensional
Angles as Proximity Measures, ACM
Transactions on Internet Technologies Vol. 9,
Issue 3, Article 11 (2009).
[3] R. Nayak, B. Lee: Web Service Discovery with
Additional Semantics and Clustering,
Proceedings of the International Conference on
Web Intelligence, pp. 555-558 (2007).
[4] J. Ma, Y. Zhang, J. He: Efficiently Finding Web
(c) Services Using a Clustering Semantic Approach,
Proceedings of the International Workshop on
Context Enabled Source and Service Selection,
Integration and Adaptation (2008).
[5] M. Dorigo, M. Birattari, T. Stutzle: Ant Colony
Optimization - Artificial Ants as a
(d) Computational Intelligence Technique, IEEE
Computational Intelligence Magazine (2006).
[6] W. Zhang, C. K. Chang, T. Feng, H. Jiang: QoS-
based Dynamic Web Service Composition with
Ant Colony Optimization, Proceedings of the
34th Annual IEEE Computer Software and
Applications Conference, pp. 493-502 (2010).
[7] Y. Xia, J. Chen, X. Meng: On the Dynamic Ant
(e) Colony Algorithm Optimization Based on Multi-
pheromones, Proceedings of the Seventh
Figure 4: Highest ranked composition solutions for International Conference on Computer and
the medical care scenario; (a) Solution 1 with score = Information Science, pp. 630-635 (2008).
0.8459; (b) Solution 2 with score = 0.8453; (c) [8] W. Li, H. Yan-xiang: A Web Service
Solution 3 with score = 0.8433; (d) Solution 4 with Composition Algorithm Based on Global QoS
score = 0.8422; (e) Solution 5 with score = 0.8422. Optimizing with MOCACO, Algorithms and
Architectures for Parallel Processing, LNCS vol.
7 CONCLUSIONS AND FUTURE WORK 6082/2010, pp. 218-224 (2010).
[9] J. Handl, J. Knowles, M. Dorigo: Ant-Based
This paper presented two ant-inspired methods Clustering and Topographic Mapping. In
applied in two phases of the automatic Web service Artificial Life 12(1), pp. 35-61. MIT Press,
composition process, namely the discovery and the Cambridge MA, USA (2006).
selection of the optimal composition phases. To [10] D. Skoutas, D. Sacharidis, V. Kantere, T. Sellis:
improve the efficiency and accuracy of services Efficient Semantic Web Service Discovery in
discovery process we have organized the set of Centralized and P2P Environments, LNCS vol.
available services into service clusters. Our 5318/2010, pp. 583-598 (2010).
clustering method was inspired by the way ants [11] SAWSDL Standard,
cluster dead corpses to build cemeteries and sort http://www.w3.org/TR/2007/REC-sawsdl-20070828/.
larvae. The clustering method is based on a set of [12] SAWSDL-TC,
metrics for evaluating the similarity/dissimilarity http://projects.semwebcentral.org/projects/sawsdl-tc/.