[IEEE Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007) - Haikou,...
Workflow Similarity Measure for Process Clustering in Grid
Yi Wang, Minglu Li, Jian Cao, Xinhua Lin, Feilong Tang Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai
200240, China {wangsuper, li-ml, cao-jian, lin-xh, tang-fl}@cs.sjtu.edu.cn
Abstract
In a grid environment, a workflow process can be seen not only as a cooperative arrangement of grid services and resources, but also as reusable and sharable knowledge for solving a specific problem. Research on grid workflow process clustering can promote knowledge discovery and reuse in the grid. In this paper, we put forward a grid workflow process design method using Event-Condition-Action (ECA) rules and propose a new process similarity measure. We then use a case to demonstrate the feasibility of the approach and briefly show how to adapt an existing clustering algorithm to the similarity measure.
1. Introduction
Grid workflow plays an increasingly important role in emerging grid technology. With the trend of merging grid and service-oriented technology, the grid has become a distributed Problem Solving Environment (PSE) shared among different users, and a grid workflow process can be seen not only as a cooperative arrangement of grid services and resources, but also as sharable knowledge for solving a specific problem. It is therefore necessary to cluster grid workflow processes, reducing the large number of raw processes by categorizing them into smaller sets of similar items. We give a novel approach for calculating the process similarity of ECA rule-based grid workflows and introduce a similarity-based algorithm for grid workflow clustering.
The remainder of this paper is organized as follows. Section 2 overviews related work. Section 3 briefly introduces ECA rules and analyzes how they support typical workflow patterns. Section 4 proposes a new process similarity measure based on the comparison of ECA rules. Section 5 uses a similarity measure case to demonstrate the feasibility of our approach. Section 6 concludes the paper and briefly points out future work.
2. Related Works
The approach proposed in [1] clusters the execution traces or logs of processes using k-means clustering. Ref. [2] puts forward a process similarity measure based on both domain classification and pattern analysis. Ref. [3] converts each workflow dependency graph into a binary branch vector and takes the distance between the binary branch vectors as the distance between two processes. An inexact process matching approach is introduced in [4]; it uses the ontology path to calculate the distance between two activities and gives some rules for similarity comparison. A weighted graph is introduced in [5] for comparing processes, where the graph similarity is the weighted sum of the similarity between sets of services and between sets of service links.
3. Process Design Based on ECA Rule
The Event-Condition-Action (ECA) rule was put forward in the research field of active databases [6]. Such rules make data repositories react to internal or external events and trigger a chain of activities that includes notifying users and applications or performing database updates. This behavior is similar to that of a business process, so we developed a grid workflow management system based on ECA rules [7].
Figure 1. ECA Rule
As shown in Figure 1, an ECA rule essentially consists of two parts: an event and a list of condition-action pairs. When the event occurs, the conditions in the list are evaluated; if a condition is satisfied, the corresponding action is executed. A formal definition of ECA rule-based workflow can be found in [7].
Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007)0-7695-2874-0/07 $25.00 © 2007
Table 1 shows the ECA rules for the typical workflow patterns. It summarizes only the ECA rules triggered by the EndOf(a) event. We also defined other events, such as “BeginOf(a)” and “ErrorOf(a)”.
Table 1. ECA Rules for Basic Workflow Pattern
4. Process Similarity Measure
4.1. Activity Similarity Measure
The activity distance measure estimates the functional dissimilarity between two activities. Here we borrow the idea from [4] with some modifications. We use $\mathrm{ADis}(a, a')$ to represent the distance between activities $a$ and $a'$.
Table 2. Activity Distance Measure

| Cate(a) | Cate(a') | LinkNumber(a, a') | ADis(a, a') |
|---|---|---|---|
| Start | Start | no ontology | 0 |
| End | End | no ontology | 0 |
| Delay | Delay | no ontology | 0 |
| Assign | Assign | no ontology | 0 |
| Service | Service | n | n |
| Other cases | | | +∞ |
Table 2 shows the value of $\mathrm{ADis}(a, a')$ in the different cases. If the categories of $a$ and $a'$ are the same and are not Service, $\mathrm{ADis}(a, a')$ is 0. If $a$ and $a'$ are both service activities, we count the minimal link number $n$ from $a$ to $a'$ in the ontology tree; if $a$ is the same as $a'$, $\mathrm{ADis}(a, a') = 0$. In all other cases, $\mathrm{ADis}(a, a') = +\infty$. After obtaining the distance between activities $a$ and $a'$, we can calculate the activity similarity as:

$$\mathrm{ASim}(a, a') = \frac{1}{\mathrm{ADis}(a, a') + 1} \tag{1}$$
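The rules of Table 2 and formula (1) can be sketched in Python as follows. This is only an illustrative sketch: the ontology tree `PARENT` is hypothetical (the real tree is the one in Figure 4), and the helper names `link_number`, `adis` and `asim` are assumptions introduced here. Only the link number 3 between "Reverse" and "BorLap" is taken from the case study in Section 5.1.

```python
import math

# Hypothetical ontology tree stored as child -> parent links.
# Only the Reverse/BorLap link number (3) mirrors the paper's case study.
PARENT = {
    "Reverse": "ImageProcessing",
    "Border": "ImageProcessing",
    "BorLap": "Border",
    "BorSobel": "Border",
}

def ancestors(concept):
    """Return the chain [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def link_number(x, y):
    """Minimal number of links between two concepts of the ontology tree."""
    up_x, up_y = ancestors(x), ancestors(y)
    for steps_x, node in enumerate(up_x):
        if node in up_y:                      # lowest common ancestor found
            return steps_x + up_y.index(node)
    return math.inf                           # concepts are not connected

def adis(cate_a, cate_b, onto_a=None, onto_b=None):
    """Activity distance following Table 2."""
    if cate_a != cate_b:
        return math.inf                       # "other cases"
    if cate_a != "Service":
        return 0                              # Start, End, Delay, Assign
    return link_number(onto_a, onto_b)        # both are service activities

def asim(cate_a, cate_b, onto_a=None, onto_b=None):
    """Activity similarity, formula (1): 1 / (ADis + 1)."""
    return 1.0 / (adis(cate_a, cate_b, onto_a, onto_b) + 1)

print(asim("Service", "Service", "Reverse", "BorLap"))  # 0.25
```

An infinite distance maps to a similarity of 0, which is why `math.inf` is used for the "other cases" row.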
4.2. Event Similarity Measure
4.2.1 Atomic Event Similarity Measure
Events can be divided into atomic events and composite events. Table 3 shows how the similarity between two atomic events $ae$ and $ae'$ is evaluated, where $a$ and $a'$ are the activities related to $ae$ and $ae'$ respectively, and $\mathrm{AESim}(ae, ae')$ represents the similarity of $ae$ and $ae'$. If the category (such as Begin, End, Error) of $ae$ is the same as the category of $ae'$, $\mathrm{AESim}(ae, ae') = \mathrm{ASim}(a, a')$; otherwise $\mathrm{AESim}(ae, ae') = 0$.
Table 3. Atomic Event Similarity Measure

| Cate(ae) is same as Cate(ae') | AESim(ae, ae') |
|---|---|
| Yes | ASim(a, a') |
| No | 0 |
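Table 3 amounts to a single guard on the event category. The sketch below assumes a hypothetical `AtomicEvent` record and takes the activity similarity as a callable `asim`; the toy `toy_asim` used in the example is a stand-in, not the ontology-based measure itself.

```python
from typing import Callable, NamedTuple

class AtomicEvent(NamedTuple):
    category: str   # e.g. "Begin", "End" or "Error"
    activity: str   # identifier of the activity the event refers to

def aesim(ae: AtomicEvent, ae_prime: AtomicEvent,
          asim: Callable[[str, str], float]) -> float:
    """Atomic event similarity following Table 3."""
    if ae.category != ae_prime.category:
        return 0.0                       # different event categories
    return asim(ae.activity, ae_prime.activity)

def toy_asim(a: str, a_prime: str) -> float:
    """Stand-in activity similarity: 1 for identical activities, else 0.25."""
    return 1.0 if a == a_prime else 0.25

print(aesim(AtomicEvent("End", "WSReverse"), AtomicEvent("End", "WSBorLap"), toy_asim))    # 0.25
print(aesim(AtomicEvent("End", "WSReverse"), AtomicEvent("Error", "WSReverse"), toy_asim)) # 0.0
```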
4.2.2 Complex Event Similarity Measure
A complex event can be expressed as a sequence of atomic events connected by the logic nodes “∧” and “∨”. A complex event has only one principal disjunctive normal form (PDNF), so before calculating the similarity between two complex events we first transform them into atomic event sequences in PDNF. Assume $ce$ and $ce'$ are two complex events, and let $\mathrm{CESim}(ce, ce')$ represent the similarity of $ce$ and $ce'$. It is calculated with the following steps:

a. Transform $ce$ and $ce'$ into PDNF of atomic event sequences. Assume that $ce$ and $ce'$ have $n_{cc}$ and $n'_{cc}$ clauses respectively, with $n_{cc} \ge n'_{cc}$.

$$ce = \bigvee_{i=1}^{n_{cc}} \bigwedge_{j=1}^{k_i} ce.cc_i.ae_j \tag{2}$$

$$ce' = \bigvee_{i=1}^{n'_{cc}} \bigwedge_{j=1}^{k'_i} ce'.cc'_i.ae'_j \tag{3}$$

In (2), $ce.cc_i$ is the $i$th clause of $ce$, $ce.cc_i.ae_j$ is the $j$th atomic event in $ce.cc_i$, and $k_i$ is the number of atomic events in $ce.cc_i$. The symbols in (3) have similar meanings.

b. Take $n'_{cc}$ clauses from $ce$ arbitrarily; we get $P_{n_{cc}}^{n'_{cc}}$ disjunctive forms

$$ce^*[p] = \bigvee_{i=1}^{n'_{cc}} \bigwedge_{j=1}^{k^*_i} ce^*[p].cc_i.ae_j \tag{4}$$

where $P_{n_{cc}}^{n'_{cc}}$ is the number of permutations of $n'_{cc}$ clauses taken from $n_{cc}$ clauses, $p$ is an integer with $1 \le p \le P_{n_{cc}}^{n'_{cc}}$, and $ce^*[p]$ is the $p$th permutation of $ce$.

c. For each $ce^*[p]$, calculate the similarity $\mathrm{TCESim}(ce^*[p], ce')$ of $ce^*[p]$ and $ce'$ as the average similarity of the clauses in $ce^*[p]$ and $ce'$:

$$\mathrm{TCESim}(ce^*[p], ce') = \frac{1}{n'_{cc}} \sum_{i=1}^{n'_{cc}} \mathrm{CCSim}(ce^*[p].cc_i,\ ce'.cc'_i) \tag{5}$$

For any two clauses $cc_i$ and $cc'_i$, if $k^*_i \ne k'_i$, then $\mathrm{CCSim}(cc_i, cc'_i) = 0$, where $k^*_i$ and $k'_i$ are the numbers of atomic events in $cc_i$ and $cc'_i$ respectively. Otherwise, writing $k'$ for this common number, we calculate $\mathrm{CCSim}(cc_i, cc'_i)$ with the following steps:

(i) Permute the atomic events of $cc_i$ to get $P_{k'}^{k'} = k'!$ atomic event sequences connected with “∧”:

$$cc_i^{\#}[p^{\#}] = \bigwedge_{j=1}^{k'} cc_i^{\#}.ae_j$$

where $p^{\#}$ is an integer with $1 \le p^{\#} \le k'!$, and $cc_i^{\#}[p^{\#}]$ is the $p^{\#}$th permutation of $cc_i$.

(ii) For each $cc_i^{\#}[p^{\#}]$, calculate the similarity of $cc_i^{\#}[p^{\#}]$ and $cc'_i$ as

$$\mathrm{TCCSim}(cc_i^{\#}[p^{\#}], cc'_i) = \frac{1}{k'} \sum_{j=1}^{k'} \mathrm{AESim}(cc_i^{\#}.ae_j,\ cc'_i.ae'_j) \tag{6}$$

(iii) Take the maximum over all permutations:

$$\mathrm{CCSim}(cc_i, cc'_i) = \mathrm{Max}\big(\mathrm{TCCSim}(cc_i^{\#}[p^{\#}], cc'_i)\big) \tag{7}$$

d. The similarity of the two complex events is

$$\mathrm{CESim}(ce, ce') = \frac{n'_{cc}}{n_{cc}}\, \mathrm{Max}\big(\mathrm{TCESim}(ce^*[p], ce')\big) \tag{8}$$
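Steps a to d can be sketched as a brute-force search over permutations. The sketch below assumes both events are already in PDNF and represented as a list of clauses, each clause being a non-empty list of atomic events; the function names and the toy `toy_aesim` are assumptions made for illustration. Swapping the arguments when the first event has fewer clauses is also an assumption, since the text only treats the case $n_{cc} \ge n'_{cc}$.

```python
from itertools import permutations

def ccsim(clause, clause_prime, aesim):
    """Clause similarity, formulas (6) and (7)."""
    if len(clause) != len(clause_prime):
        return 0.0                                   # different clause sizes
    k = len(clause_prime)
    return max(
        sum(aesim(x, y) for x, y in zip(perm, clause_prime)) / k
        for perm in permutations(clause)             # all k! orderings
    )

def cesim(ce, ce_prime, aesim):
    """Complex event similarity, formulas (5) and (8)."""
    if len(ce) < len(ce_prime):                      # assumption: enforce n >= n'
        ce, ce_prime = ce_prime, ce
    n, n_prime = len(ce), len(ce_prime)
    best = max(
        sum(ccsim(c, cp, aesim) for c, cp in zip(chosen, ce_prime)) / n_prime
        for chosen in permutations(ce, n_prime)      # ordered choices of n' clauses
    )
    return n_prime / n * best

def toy_aesim(ae, ae_prime):
    """Stand-in for AESim: events are (category, activity) tuples."""
    if ae[0] != ae_prime[0]:
        return 0.0
    return 1.0 if ae[1] == ae_prime[1] else 0.25

# (EndOf(a) AND EndOf(b)) OR EndOf(c)   versus   EndOf(b) AND EndOf(a)
ce = [[("End", "a"), ("End", "b")], [("End", "c")]]
ce_prime = [[("End", "b"), ("End", "a")]]
print(cesim(ce, ce_prime, toy_aesim))  # 0.5
```

The search enumerates every permutation, so its cost grows factorially with the number of clauses and events; that is acceptable for the small rules shown in the paper but would need an assignment-style optimization for large ones.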
4.3. Condition-Action Similarity Measure
A condition expresses a constraint on the relationship between objects, or between objects and constants. Since a condition and an action always appear in pairs, we consider the similarity of condition and action together, i.e., the condition-action similarity. Here we assume that:

If the condition is “null”, $\mathrm{Pro}(ac) = 1$, so “null” is also seen as a special condition;

Otherwise, consider an ECA rule that has $k$ condition-action pairs whose conditions are not null; for each such condition-action pair, $\mathrm{Pro}(ac) = 1/k$. $\mathrm{Pro}(ac)$ is the probability that action $ac$ is executed.

Assume $con\text{-}ac$ and $con'\text{-}ac'$ are two condition-action pairs, and the numbers of activities in $ac$ and $ac'$ are $n_a$ and $n'_a$ respectively, with $n_a \ge n'_a$, so $ac$ and $ac'$ can be represented as:

$$ac = \bigwedge_{i=1}^{n_a} ac.a_i \tag{9}$$

$$ac' = \bigwedge_{i=1}^{n'_a} ac'.a'_i \tag{10}$$

We use $\mathrm{CASim}(con\text{-}ac, con'\text{-}ac')$ to represent the similarity of $con\text{-}ac$ and $con'\text{-}ac'$. It is calculated with the following steps:

a. Calculate the execution probabilities of $ac$ and $ac'$, obtaining $\mathrm{Pro}(ac)$ and $\mathrm{Pro}(ac')$ respectively.

b. Take $n'_a$ activities from $ac$ arbitrarily; we get $P_{n_a}^{n'_a}$ actions

$$ac^*[p] = \bigwedge_{i=1}^{n'_a} ac^*[p].a_i \tag{11}$$

where $P_{n_a}^{n'_a}$ is the number of permutations of $n'_a$ activities taken from action $ac$, and $p$ is an integer with $1 \le p \le P_{n_a}^{n'_a}$.

c. For each $ac^*[p]$, calculate the similarity $\mathrm{TCASim}(ac^*[p], ac')$ of $ac^*[p]$ and $ac'$ as

$$\mathrm{TCASim}(ac^*[p], ac') = \frac{1}{n'_a} \sum_{i=1}^{n'_a} \mathrm{ASim}(ac^*[p].a_i,\ ac'.a'_i) \tag{12}$$

d. Let $minp = \mathrm{Min}(\mathrm{Pro}(ac), \mathrm{Pro}(ac'))$ and $maxp = \mathrm{Max}(\mathrm{Pro}(ac), \mathrm{Pro}(ac'))$; then

$$\mathrm{CASim}(con\text{-}ac, con'\text{-}ac') = \frac{minp}{maxp} \cdot \frac{n'_a}{n_a} \cdot \mathrm{Max}\big(\mathrm{TCASim}(ac^*[p], ac')\big) \tag{13}$$
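A minimal sketch of steps a to d, under the assumption that an action is given as a non-empty list of activity identifiers and that the activity similarity is supplied as a callable. The names `pro`, `casim` and `toy_asim` are hypothetical, and the argument swap for the case $n_a < n'_a$ is an assumption, not something the text specifies.

```python
from itertools import permutations

def pro(condition, k_non_null):
    """Execution probability of an action: 1 for a null condition, else 1/k."""
    return 1.0 if condition is None else 1.0 / k_non_null

def casim(ac, pro_ac, ac_prime, pro_ac_prime, asim):
    """Condition-action similarity, formulas (12) and (13)."""
    if len(ac) < len(ac_prime):                      # assumption: enforce n_a >= n'_a
        ac, ac_prime = ac_prime, ac
    n, n_prime = len(ac), len(ac_prime)
    best = max(
        sum(asim(x, y) for x, y in zip(chosen, ac_prime)) / n_prime
        for chosen in permutations(ac, n_prime)      # ordered choices of n'_a activities
    )
    minp, maxp = min(pro_ac, pro_ac_prime), max(pro_ac, pro_ac_prime)
    return minp / maxp * n_prime / n * best

def toy_asim(a, a_prime):
    """Stand-in activity similarity: 1 for identical activities, else 0.25."""
    return 1.0 if a == a_prime else 0.25

# Same single activity, but one action is guarded by one of two non-null conditions.
print(casim(["WSReverse"], pro(None, 0), ["WSReverse"], pro("x > 0", 2), toy_asim))  # 0.5
# Two activities versus one: the size ratio n'_a / n_a halves the score.
print(casim(["WSReverse", "WSBorLap"], 1.0, ["WSBorLap"], 1.0, toy_asim))            # 0.5
```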
4.4. Rule Similarity Measure
Assume we have two ECA rules, $r = (E, L(con_i\text{-}ac_i))$ and $r' = (E', L'(con'_j\text{-}ac'_j))$. We can measure the similarity of the two ECA rules $r$ and $r'$ with the following steps:

a. Measure $\mathrm{CESim}(E, E')$; note that if $E$ or $E'$ is an atomic event, it is also treated as a composite event.

b. Assume the lists $L$ and $L'$ have $n_{ca}$ and $n'_{ca}$ condition-action pairs respectively, with $n_{ca} \ge n'_{ca}$. Take $n'_{ca}$ condition-action pairs from the list $L$ arbitrarily; we get $P_{n_{ca}}^{n'_{ca}}$ condition-action lists

$$L^*[p] = (L^*[p].con_1\text{-}ac_1,\ \ldots,\ L^*[p].con_i\text{-}ac_i,\ \ldots,\ L^*[p].con_{n'_{ca}}\text{-}ac_{n'_{ca}}) \tag{14}$$

where $P_{n_{ca}}^{n'_{ca}}$ is the number of permutations of $n'_{ca}$ condition-action pairs taken from list $L$, $p$ is an integer with $1 \le p \le P_{n_{ca}}^{n'_{ca}}$, and $L^*[p]$ is the $p$th permutation of $L$.

c. For each $L^*[p]$, calculate the similarity $\mathrm{TLSim}(L^*[p], L')$ of $L^*[p]$ and $L'$ as the average similarity of the condition-action pairs in $L^*[p]$ and $L'$:

$$\mathrm{TLSim}(L^*[p], L') = \frac{1}{n'_{ca}} \sum_{i=1}^{n'_{ca}} \mathrm{CASim}(L^*[p].con_i\text{-}ac_i,\ L'.con'_i\text{-}ac'_i) \tag{15}$$

d. The similarity of the condition-action lists of the two ECA rules is

$$\mathrm{LSim}(L, L') = \mathrm{Max}\big(\mathrm{TLSim}(L^*[p], L')\big) \tag{16}$$

e. The similarity of ECA rules $r$ and $r'$ is

$$\mathrm{RSim}(r, r') = \frac{1}{2}\big(\mathrm{ESim}(E, E') + \mathrm{LSim}(L, L')\big) \tag{17}$$

where $\mathrm{ESim}(E, E')$ denotes the event similarity $\mathrm{CESim}(E, E')$ measured in step a.
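Formulas (15) to (17) can be sketched on top of precomputed values. The sketch below assumes the event similarity is already known and that the pairwise condition-action similarities are given as a matrix `ca_matrix`, where `ca_matrix[i][j]` compares the i-th pair of L with the j-th pair of L' and L has at least as many pairs as L'. The names `lsim` and `rsim` are introduced here for illustration only.

```python
from itertools import permutations

def lsim(ca_matrix):
    """Condition-action list similarity, formulas (15) and (16)."""
    n, n_prime = len(ca_matrix), len(ca_matrix[0])
    return max(
        sum(ca_matrix[i][j] for j, i in enumerate(chosen)) / n_prime
        for chosen in permutations(range(n), n_prime)   # ordered choices from L
    )

def rsim(event_sim, ca_matrix):
    """Rule similarity, formula (17): mean of event and list similarity."""
    return 0.5 * (event_sim + lsim(ca_matrix))

# Event similarity 1 and list similarity 0.5 give 0.75, the same values that
# appear for the first rule pair in the case study of Section 5.1.
print(rsim(1.0, [[0.5]]))  # 0.75

# Three pairs in L against two in L': the best ordered matching is used.
print(lsim([[0.25, 0.0], [1.0, 0.25], [0.0, 0.5]]))  # 0.75
```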
4.5. Process Similarity Measure
In our system, a process is built up from ECA rules. So, the similarity of processes $Pr$ and $Pr'$ can be measured as follows:

a. Assume $Pr$ and $Pr'$ have $n_r$ and $n'_r$ ECA rules respectively, with $n_r \ge n'_r$. Take $n'_r$ rules from $Pr$ arbitrarily; we get $P_{n_r}^{n'_r}$ rule permutations

$$Pr^*[p] = (Pr^*[p].r_1,\ \ldots,\ Pr^*[p].r_i,\ \ldots,\ Pr^*[p].r_{n'_r}) \tag{18}$$

where $P_{n_r}^{n'_r}$ is the number of permutations of $n'_r$ rules taken from process $Pr$, $p$ is an integer with $1 \le p \le P_{n_r}^{n'_r}$, and $Pr^*[p]$ is the $p$th permutation of $Pr$.

b. For each $Pr^*[p]$, calculate the similarity $\mathrm{TPrSim}(Pr^*[p], Pr')$ of $Pr^*[p]$ and $Pr'$ as the average similarity of the rules in $Pr^*[p]$ and $Pr'$:

$$\mathrm{TPrSim}(Pr^*[p], Pr') = \frac{1}{n'_r} \sum_{i=1}^{n'_r} \mathrm{RSim}(Pr^*[p].r_i,\ Pr'.r'_i) \tag{19}$$

c. The similarity of the two processes is

$$\mathrm{PrSim}(Pr, Pr') = \frac{n'_r}{n_r}\, \mathrm{Max}\big(\mathrm{TPrSim}(Pr^*[p], Pr')\big) \tag{20}$$
5. Case Study
5.1. Similarity of Image Processing Workflows
Figure 2 and Figure 3 show two image processing workflow processes designed with our grid workflow system [7]. The icon is the symbol of an activity that invokes a grid service; the ontology of the activity is shown above the symbol and its name below it. The conditions are shown alongside the control flows.
Figure 2. Image Processing Workflow A
Figure 3. Image Processing Workflow B
Figure 4. Image Process Ontology
Figure 4 is an example of an ontology tree for sharing the concepts of image processing. The ontology of service activity “WSReverse” is “Reverse”, and the ontology of service activity “WSBorLap” is “Border with Laplace”. The link number between “Reverse” and “BorLap” is 3, so by (1) the similarity of the two activities is 1/(3+1) = 0.25. The rules created by workflows A and B are shown in Figure 5 and Figure 6 respectively.
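Then we can measure the similarity of the two workflow processes A and B. Following the steps proposed in Section 4, we get

$$\mathrm{RSim}(P_A.Rule1, P_B.Rule1) = \frac{1}{2}(\mathrm{ESim} + \mathrm{LSim}) = \frac{1}{2}\left(1 + \frac{0.5}{1}\,\mathrm{Max}\left(1, \frac{1}{4}\right)\right) = \frac{3}{4}$$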
In the same way, $\mathrm{RSim}(P_A.Rule1, P_B.Rule2) = \frac{1}{18}$, $\mathrm{RSim}(P_A.Rule2, P_B.Rule1) = \frac{7}{48}$, and $\mathrm{RSim}(P_A.Rule2, P_B.Rule2) = \frac{7}{12}$.
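So, the similarity of processes A and B is

$$\mathrm{PrSim}(P_A, P_B) = \frac{2}{2}\,\mathrm{Max}\left(\frac{1}{2}\big(\mathrm{RSim}(P_A.Rule1, P_B.Rule1) + \mathrm{RSim}(P_A.Rule2, P_B.Rule2)\big),\ \frac{1}{2}\big(\mathrm{RSim}(P_A.Rule1, P_B.Rule2) + \mathrm{RSim}(P_A.Rule2, P_B.Rule1)\big)\right)$$

$$= \frac{2}{2}\,\mathrm{Max}\left(\frac{1}{2}\left(\frac{3}{4} + \frac{7}{12}\right),\ \frac{1}{2}\left(\frac{1}{18} + \frac{7}{48}\right)\right) = \frac{2}{3} \approx 0.67$$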
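The arithmetic of this case can be checked with a short sketch of formulas (18) to (20). It assumes the four rule similarities are given as a matrix `rule_sim`, where `rule_sim[i][j]` compares rule i+1 of workflow A with rule j+1 of workflow B; the function name `prsim` is introduced here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F
from itertools import permutations

def prsim(rule_sim):
    """Process similarity, formulas (19) and (20), from pairwise rule similarities."""
    n, n_prime = len(rule_sim), len(rule_sim[0])      # assumes n >= n'
    best = max(
        sum(rule_sim[i][j] for j, i in enumerate(chosen)) / n_prime
        for chosen in permutations(range(n), n_prime)  # ordered choices of n' rules
    )
    return F(n_prime, n) * best

# Rule similarities reported for workflows A and B in the case study.
rule_sim = [
    [F(3, 4), F(1, 18)],    # A.Rule1 vs B.Rule1, A.Rule1 vs B.Rule2
    [F(7, 48), F(7, 12)],   # A.Rule2 vs B.Rule1, A.Rule2 vs B.Rule2
]

result = prsim(rule_sim)
print(result)  # 2/3, i.e. about 0.67
```

The alternative matching (Rule1 with Rule2 and Rule2 with Rule1) only reaches 29/288, so the maximum is attained by pairing the rules in order.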
Figure 5. Rules Created by Workflow A
Figure 6. Rules Created by Workflow B
5.2. Process Clustering
Based on the similarity measure discussed above, clustering processes is straightforward. As shown in Figure 7, we can use DBSCAN [8] as the clustering algorithm; note that we use DisSim(Pr, Pr') = 1 - PrSim(Pr, Pr') as the distance between two objects in the algorithm.
Figure 7. ProcessClustering(Pr[n], ε, MinPts)
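The idea of running DBSCAN on the dissimilarity 1 - PrSim can be sketched as follows. This is a compact, textbook-style DBSCAN over a precomputed distance matrix, not a reproduction of the algorithm in Figure 7; the function name `dbscan`, the similarity values in `sim`, and the parameter choices are all hypothetical.

```python
NOISE = -1

def dbscan(dist, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(dist)
    labels = [None] * n                       # None means "not visited yet"

    def region(p):
        # eps-neighbourhood of p, including p itself
        return [q for q in range(n) if dist[p][q] <= eps]

    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbours = region(p)
        if len(neighbours) < min_pts:
            labels[p] = NOISE                 # may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        seeds = [q for q in neighbours if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster           # border point joins the cluster
            if labels[q] is not None:
                continue                      # already assigned
            labels[q] = cluster
            q_neighbours = region(q)
            if len(q_neighbours) >= min_pts:  # q is a core point: keep expanding
                seeds.extend(q_neighbours)
    return labels

# Hypothetical pairwise process similarities PrSim for four processes.
sim = [
    [1.00, 0.67, 0.10, 0.05],
    [0.67, 1.00, 0.10, 0.05],
    [0.10, 0.10, 1.00, 0.08],
    [0.05, 0.05, 0.08, 1.00],
]
# DisSim(Pr, Pr') = 1 - PrSim(Pr, Pr') replaces the usual distance.
dis_sim = [[1.0 - s for s in row] for row in sim]

print(dbscan(dis_sim, eps=0.4, min_pts=2))  # [0, 0, -1, -1]
```

Because only pairwise dissimilarities are needed, no coordinates or vector representation of the processes are required.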
6. Conclusion and Future Works
This paper proposes a novel approach for calculating process similarity based on ECA rules in order to collect and cluster grid workflow processes. The approach can deal with different types of events and activities, which were not considered in previous literature.
We have only demonstrated the feasibility of our approach. In future work, we will implement it as a grid workflow recommendation module in our system.
Acknowledgement
This paper is supported by National Scientific Fund of China (No. 60503041), National High Technology Research and Development Program of China (No. 2006AA04Z152, No. 2006AA01A124, No. 2006AA01Z247, No. 2006AA01Z172), ShanghaiGrid grand project of Science and Technology Commission of Shanghai Municipality (05DZ15005), and Natural Science Foundation of Shanghai (05ZR14081).
References
[1] Gianluigi Greco, Antonella Guzzo, Luigi Pontieri, et al, “Mining Expressive Process Models by Clustering Workflow Traces”, In Proc. of PAKDD2004, 2004, pp. 52-62
[2] Jae-Yoon Jung, Joonsoo Bae, “Workflow Clustering Method Based on Process Similarity”, In Proc. of ICCSA2006, 2006, pp. 379-389
[3] J. Bae, J. Caverlee, L. Liu, et al, “Process Mining by Measuring Process Block Similarity”, In Proc. of Intl. Workshop on Business Process Intelligence, 2006
[4] Hai Zhuge, “A process matching approach for flexible workflow process reuse”, Information and Software Technology, vol. 44, 2002, pp. 445-450
[5] Kui Huang, Zhaotao Zhou, Yanbo Han, et al, “An Algorithm for Calculating Process Similarity to Cluster Open-Source Process Designs”, In Proc. of the Third International Conference on Grid and Cooperative Computing, 2004, pp. 107-114
[6] Dayal, U., Buchmann, A. P., McCarthy, D. R., “Rules are Objects Too: A Knowledge Model For An Active, Object-Oriented Database System”, In Proc. of the 2nd Intl. Workshop on Advances in Object-Oriented Database System, 1988, pp. 129-143
[7] Lin Chen, Minglu Li, Jian Cao, “ECA Rule-Based Workflow Modeling and Implementation for Service Composition”, IEICE Transactions on Information and Systems, vol. E89-D, no. 2, 2006, pp. 624-630
[8] M. Ester, H.-P. Kriegel, J. Sander, et al, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases”, In Proc. of the Second International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226-231