Handling Duplicated Tasks in Process Discovery by Refining...

33
Handling Duplicated Tasks in Process Discovery by Refining Event Labels Xixi Lu ([email protected]) 1 Dirk Fahland 1 Frank van den Biggelaar 2 Wil van der Aalst 1 1 Eindhoven University of Technology 2 Maastricht University Medical Center 1

Transcript of Handling Duplicated Tasks in Process Discovery by Refining...

Page 1: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Handling Duplicated Tasks in Process Discovery by Refining Event Labels

Xixi Lu ([email protected])1

Dirk Fahland1

Frank van den Biggelaar2

Wil van der Aalst1

1Eindhoven University of Technology2Maastricht University Medical Center 1

Presenter
Presentation Notes
14:00 – 14.30.
Page 2: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Overview

• (5 min) Research Problem• (10 min) Approach• (5 min) Evaluation and Conclusion

2

Page 3: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

VA CB A

Duplicated Tasks –Events Having Imprecise Labels

3

CA

BS+

X X

A B ACV S

C B SAC A B

VV S

Discover VABC

S

Record

+

Presenter
Presentation Notes
Root cause analysis, request for accuracy, detecting deviating events.
Page 4: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

4

A B ACV S

C B SAC A B

VV S

Input:Imprecise log

Research Problem

Presenter
Presentation Notes
Root cause analysis, request for accuracy, detecting deviating events.
Page 5: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Research Problem

5

So that1. all traces & events preserved (no filtering)2. model is more precise (than without refining)3. we can explore different refinement of labels interactively

(because we don’t know correct label refinement from given log)

Discover(IM, ILP, …)

Model

Refine labels of events

Input:Imprecise log

Output:Refined log

Presenter
Presentation Notes
(5 min) Research Problem and Motivation (10 min) Approach (5 min) Evaluation
Page 6: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Refining Event Labels

6

imprecise Precise

DiscoverDiscover Discover

C B AC A B

A B AC C2 B2 A2

C2 A2 B2

A1 B1 A3C1

C2 B2 A2

C3 A4 B3

A1 B1 A3C1

C2 B2A2

A1 B1 A3C1

C2 B2 A2

C3 A4 B3

A1 B1 A3C1

C BA

A B AC

Discover

C2 B2 A2

C3 A4 B3

A1 B1 A3C1Refining Label

ABC

Presenter
Presentation Notes
Given a log, there is could much more models that are fitting. - Operating this on log, we could have more
Page 7: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Refining Event Labels

7

Precise

Discover Discover

C B AC A B

A B AC C2 B2 A2

C2 A2 B2

A1 B1 A3C1

C2 B2 A2

C3 A4 B3

A1 B1 A3C1

C2 B2A2

A1 B1 A3C1

C2 B2 A2

C3 A4 B3

A1 B1 A3C1

C BA

A B ACSystem unknown!

We don’t know what is optimal!

Discover

C2 B2 A2

C3 A4 B3

A1 B1 A3C1Refining Label

ABC

imprecise

Discover

Presenter
Presentation Notes
Given a log, there is could much more models that are fitting. - Operating this on log, we could have more
Page 8: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Related Work

8

Discover(IM, ILP, …)

Process Model

Refine Labelsof Events

Imprecise log Refined log

(c) Cluster traces

(b) New Discovery algorithm

(a) Relabelmodel elements

(d) Filter

Filtered log

Clusters …

Page 9: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

1. Detectimprecise

labels3. Refine

horizontally4. Refinevertically

Imprecise label candidates

Horizontal clusters of events

Vertical clusters of events

Approach

9

Refined log

2. Mapsimilar events

{x, …}

Mappings

Imprecise log

a) Manual by user inputb) Automated detection

Page 10: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

1. Detectimprecise

labels3. Refine

horizontally4. Refinevertically

Imprecise label candidates

Horizontal clusters of events

Vertical clusters of events

Approach

10

Refined log

2. Mapsimilar events

{x, …}

Mappings

Imprecise log

Page 11: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Mapping Between Events

11

Trace 1

Trace 2

Mapping

AV SA CB

SV C AB

Page 12: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Quantify Similarity of Mapped Events

12

… based on “structural context of events”

AV SA CB

SV C AB

Presenter
Presentation Notes
Quantify how similar are the two traces are according to this mapping
Page 13: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Quantify Similarity of Mapped Events

13

… based on “structural context of events”(1) Differences in neighbors

Different Same

Cost = 2

AV SA CB

SV C AB

Presenter
Presentation Notes
Quantify how similar are the two traces are according to this mapping
Page 14: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Quantify Similarity of Mapped Events

14

… based on “structural context of events”(1) Differences in neighbors(2) Differences in structure

Distance = 4

Distance = 3

Cost = 2+1

AV SA CB

SV C AB

Page 15: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Quantify Similarity of Mapped Events

15

… based on “structural context of events”(1) Differences in neighbors(2) Differences in structure

Distance = 2

Distance = 1

Cost = 2+1+1+1+0

AV SA CB

SV C AB

Page 16: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Quantify Similarity of Mapped Events

16

… based on “structural context of events”(1) Differences in neighbors(2) Differences in structure

Cost = 2+3Cost = 4+6

Cost = 4+4 Cost = 0+3

Cost = 2+4 Cost so far = 6+10+8+5+3

AV SA CB

SV C AB

Page 17: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Quantify Similarity of Mapped Events

17

… based on “structural context of events”(1) Differences in neighbors(2) Differences in structure(3) #Dissimilar events

Xixi Lu, Dirk Fahland, Frank J.H.M. van den Biggelaar, and Wil M.P. van der Aalst. Detecting Deviating Behavior without Models. In BPM Workshops 2015, volume 256, pp.126-139, Springer International Publishing, 2015.

There is an algorithm to compute an optimal mapping :

AS SA CB

SS C AB

Total Cost = 6+10+8+5+3 +1

Page 18: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

1. Detectimprecise

labels3. Refine

horizontally4. Refinevertically

Imprecise label candidates

Horizontal clusters of events

Vertical clusters of events

Approach

18

Refined log

2. Mapsimilar events

{x, …}

Mappings

Imprecise log

Page 19: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

3. Refine horizontally using cost

19

AV SA CB

SV C AB

Page 20: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

3. Refine horizontally using cost

20

AV SA CB

SV C AB

Trace 1

Trace 2

SV C BATrace 3

20

V SA CBTrace 0

10 10 10

Imprecise label candidates

20 20

15 15 15

a) Normalize costs w.r.t. maximal cost seen in the log

Page 21: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

3. Refine horizontally using cost

21

1

0.5 0.5 0.5

Imprecise label candidates

1 1

0.75 0.75 0.75

a) Normalize costs w.r.t. maximal cost seen in the log

b) Set variant threshold,say, 0.8

c) Remove edges if cost > variant threshold

AV SA CB

SV C AB

SV C BA

V SA CB

Trace 1

Trace 2

Trace 3

Trace 0

Presenter
Presentation Notes
Explain, if labels are precise, not splits If labels are precise and inbetween, not splits.
Page 22: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

3. Refine horizontally using cost

22

0.5 0.5 0.5

0.75 0.75 0.75

a) Normalize costs w.r.t. maximal cost seen in the log

b) Set variant thresholdsay, 0.8

c) Remove edges if cost > variant threshold

A1V SA1 C1B1

SV C2 A2B2

SV C2 B2A2

V SA1 C1B1

Trace 1

Trace 2

Trace 3

Trace 0

Page 23: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

4. Refine vertically using frequency

23

I) Set unfolding threshold,say, 60%

II) Refine labels if freq > unfolding threshold

Is this A1 a 2nd iteration of a loop?

A1V SA1 C1B1

V SA1 C1B1

Trace 1

Trace 0

2 1

Presenter
Presentation Notes
It is clear b and c stay as-is. The question now is should “A” be refined. In essence, Mappings have formed different groups of “A”s. We now have to decide are they different As, or are they different iterations of the same A. In some sense, it is not just for loop, but also for Xor, and other distinct way of grouping… In other words, do we want to see a loop or distinct transitions in the model. Is it a repeatitive behavior, and the repetition is below a certain threshold, then we they is it a loop. Otherwise, we assume it is two distinct tasks.
Page 24: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

4. Refine vertically using frequency

24

I) Set unfolding threshold,say, 40%

II) Refine labels if freq > unfolding threshold

1

Or is this a different task?

A3V SA1 C1B1

V SA1 C1B1

Trace 1

Trace 0

2

Page 25: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

1. Detectimprecise

labels3. Refine

horizontally4. Refinevertically

Imprecise label candidates

Approach

25

Refined log

2. Mapsimilar events

{x, …}

Mappings

Imprecise log

Variant threshold

Unfolding threshold

Iterative

Page 26: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Implemented in ProM

• There will be a demo in the demo track.

26

Detect imprecise labels

Refine horizontally

Refine vertically

Page 27: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

• 3 experiments, each 600 models, k transitions w/ same labelE1) Default parameters, k = 4E2) Adaptive parameters, k = 4E3) 1 in loop, k = 2

Evaluation

27

Refining Label

Discover

Discover

Generate

Improved?Imprecise log Refined log

Compute qualities

Presenter
Presentation Notes
(A1) help discover a model closer to the system (A2) help discover more precise and structured models � yet preserve fitness (A3) provide users with more options and different views w.r.t. a log
Page 28: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Experiment Result - Example

28

System precision improved by 0.68System recall is 1

Precision w.r.t. log improved by 0.55

B

B B B B

𝑡𝑡1𝑡𝑡2

𝑡𝑡3 𝑡𝑡4B

BB B

Presenter
Presentation Notes
file:///D:/workspace/TraceMatching/dupResult/resultDupTaskmrt05-1559/Result_modelJ.html The refined models (c)(e) shows that the duplicated tasks were rediscovered �in their respective positions, but unable to identify the concurrency between�two consecutive duplicated tasks.
Page 29: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

E2) Adaptive

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

E1) Default

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

E3) 1 in loop

Experiment Result -F1-score of and w.r.t.

29

Improved 35% ~ 46% Models Improved 89% Models Improved 60% Models

Page 30: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Experiment Results and Limitations

30

Presenter
Presentation Notes
(1) Loops too large or too many iterations (2) Duplicated tasks concurrent to many other tasks
Page 31: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Real-life Example : Hospital Data

31

1st Iteration : refined “surgery” events

2nd & 3rd Iterations: refine “consultation” events

Page 32: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Conclusion and Future Work

• Improved logs (and process discovery) by refining labels for up to 87%• Interactive and explorative

• Future work• Different ways to compute similar events using context of events• Log preprocessing framework

• Integrated in “Log to Model Explorer”• Supporting clustering, filtering and label refining

• Come to the Demo!

32

Page 33: Handling Duplicated Tasks in Process Discovery by Refining ...processmining.org/old-version/files/2016bpm_lu.pdf · Is this A1 a 2. nd. iteration of a loop? V. A1. B1. C1. A1. S.

Questions?

33