Data Conformance Checking using Optimal Alignments Felix Mannhardt, Massimiliano de Leoni, Hajo A....

20
Data Conformance Checking using Optimal Alignments Felix Mannhardt , Massimiliano de Leoni, Hajo A. Reijers

Transcript of Data Conformance Checking using Optimal Alignments Felix Mannhardt, Massimiliano de Leoni, Hajo A....

Data Conformance Checking using Optimal Alignments

Felix Mannhardt, Massimiliano de Leoni, Hajo A. Reijers

(a; {A = 3000;R = Michael; E = Pete}); (b; {V = OK;E = Sue});(c; {I = 530;D = OK;E = Sue});(f; {E = Pete});

(a; {A = 3000;R = Michael; E = Pete}); (b; {V = OK;E = Sue});(c; {I = 530;D = OK;E = Sue}); (f; {E = Pete});

Problem (Adapted from Massimiliano de Leoni)

PAGE 2 / 18

Activity d should have occurred, since amount < 5000

«Sue» not authorized to

perform b: is not Assistant

Activity h hasn’t been executed: D cannot be «OK»

(a; {A = 5001;R = Michael; E = Pete}); (b; {V = OK;E = Pete});(c; {I = 530;D = NOK;E = Sue}); (f; {E = Pete});

Department of Mathematics and Computer Science

Department of Mathematics and Computer Science

How Does Data Alignment Work?

• Petri Nets with Data:

• Two new “Moves” with associated “Costs”:• Move with incorrect write operation• Move with missing write operation

• Formulation of an MILP problem for CF Alignment:

PAGE 3 / 18

[1] M. de Leoni, W. M. P. van der Aalst (2103). Aligning Event Logs and Process Models for Multi-Perspective Conformance Checking: An Approach Based on Integer Linear Programming.[2] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst (2011). Conformance checking using cost-based fitness analysis.

D

B

C

A

𝑋<5000

𝑋 ≥5000

X

+𝑐𝑜𝑠𝑡 (𝑚𝑖𝑠𝑠𝑖𝑛𝑔)=𝜅𝐷𝐴

Department of Mathematics and Computer Science

Current Data Conformance Checker in ProM

PAGE 4 / 18

Data Conformance Checking

Petri Net with Data

Input Output

Event Log

DataAlignment

Control Flow Alignment

DataAlignment

Cost

Shortcomings of the Current Solution

PAGE 5 / 18Department of Mathematics and Computer Science

(a; {A = 3000;R = Michael}); (b; {V = NOK});(c; {I = 530;D = OK}); (f);

Perfect CF Alignment ()

L a b c f

P a b c f

(a; {A = 5001; R = Michael}); (b; {V = OK});(c; {I = 530;D = NOK});(f);

Resulting DF Alignment ()

(a; {A = 3000;R = Michael}); (b; {V = NOK});(c; {I = 530;D = OK}); (f);

Better DF Alignment ()

Department of Mathematics and Computer Science

First Idea (Multi-Alignment Approach)

PAGE 6 / 18

Data Conformance CheckingPetri Net with

Data

Input Output

Event Log

Optimal DataAlignment

Control Flow Alignment

DataAlignment

Cost

Optimal?

Yes

No

Image source: http://commons.wikimedia.org/wiki/File:Pictofigo_-_Idea.png

OR

Cache

Department of Mathematics and Computer Science

(a; {A = 3000;R = Michael}); (b; {V = NOK});(c; {I = 530;D = OK}); (f);

Second Idea (Single-Alignment Approach)

PAGE 7 / 17

<>

<a> <a> <a>

Move in Log Move in ModelMove in Both

<a,b> <a,b>

Move in Log

Move in Both

… …

<a,b,c>

Move in BothMove in

Log

<a,b,c>

Move in Both

<a,b,c,f>

(0,0)

(,)

(0,2)(1,0)

(1,0)

Image source: http://commons.wikimedia.org/wiki/File:Pictofigo_-_Idea.png

Department of Mathematics and Computer Science

Single-Alignment Approach I

• For each node in the search space• Compute a Data Alignment (MILP) for the prefix• Remember and the variable assignment• Use the variable assignment to check if an MILP needed

• A best-first search on the overall cost () returns one optimal Data Alignment• Use of ILP heuristic for A* [2] still possible• never gets better (no negative edges!)• But, our search space is bigger!

PAGE 8 / 18

[2] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst (2011). Conformance checking using cost-based fitness analysis.

Department of Mathematics and Computer Science

Single-Alignment Approach: Search Space

PAGE 9 / 18

[2] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst (2011). Conformance checking using cost-based fitness analysis.

vs.

• Two states are equivalent iff• Same marking of “Event Net” & Process Model as in [2]• Same variable assignment wrt. all guards

(a; {A = 5001;R = Michael}); (b; {V = OK});(c; {I = 530;D = OK}); (c; {I = 530;D = NOK});(f);

(a; {A = 5001;R = Michael}); (b; {V = OK});(c; {I = 530;D = OK}); (c; {I = 530;D = NOK});(f);

Department of Mathematics and Computer Science

Comparison: Improvement of Fitness

Average Max0%

10%

20%

30%

40%

50%

60%

Insurance Institute (12,000 Traces)Synthetic Example (1,200 Traces, Length 4-15, 10% Noise)

Ch

ang

e in

Fit

nes

s

PAGE 10 / 18

2 Traces254 Traces

Department of Mathematics and Computer Science

Comparison: Dutch Insurance Institute

Running Time0

20

40

60

80

100

120

Old Multi Single

Sec

on

ds

PAGE 11/ 18

Department of Mathematics and Computer Science

Comparison: Dutch Insurance Institute [1]

PAGE 12 / 18

Max0

20

40

60

80

100

120

Multi Single

# M

ILP

Average0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Multi Single

# M

ILP

[1] M. de Leoni, W. M. P. van der Aalst (2103). Aligning Event Logs and Process Models for Multi-Perspective Conformance Checking: An Approach Based on Integer Linear Programming.

Department of Mathematics and Computer Science

Comparison: Dutch Insurance Institute

PAGE 13 / 18

Average0

5

10

15

20

25

30

35

40

45

50

Old Multi Single

# Q

ueu

ed S

tate

s

Max0

500

1,000

1,500

2,000

2,500

3,000

3,500

4,000

64

Old Multi Single

# Q

ueu

ed S

tate

s

Department of Mathematics and Computer Science

Comparison: Synthetic Model (10% Noise)

Running Time0

10

20

30

40

50

60

70

80

Old Multi Single

Sec

on

ds

PAGE 14 / 18

Department of Mathematics and Computer Science

Comparison: Synthetic Model (10% Noise)

PAGE 15 / 18

Max0

200

400

600

800

1,000

1,200

1,400

1,600

Multi Single

# M

ILP

Average0

2

4

6

8

10

12

14

16

Multi Single

# M

ILP

Department of Mathematics and Computer Science

Comparison: Synthetic Model (10% Noise)

PAGE 16 / 18

Average0

100

200

300

400

500

600

700

800

2666

Old Multi Single

# Q

ueu

ed S

tate

s

Max0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,000

Old Multi Single

# Q

ueu

ed S

tate

s

Department of Mathematics and Computer Science

Comparison Wrap-up

PAGE 17 / 18

• Multi-Alignment Approach• Building CF Alignments (sorted) up to a certain is not

feasible for certain models/traces• Though faster in some cases (Good Fitness)

• Single-Alignment Approach• Again, solving many (smaller) MILPs• Integrated Optimizations:

− Check if guards already fulfilled No MILP− Check if only write operations missing No MILP− Re-use calculated Data Alignments 1 x

MILP

Department of Mathematics and Computer Science

What Next?

• Improve the Implementation• Faster MILP solving by re-use the lpsolve instance?• Reduce memory footprint of both approaches?

• Will a Decomposition of the process model help?

• Case study with Event Log from Italian local police • Event Log about the management of road-traffic fines• Process with multiple decision points• Process with non-trivial guards• Event Log contains data attributes

• Submit Paper to FASE 2014

PAGE 18 / 18

Department of Mathematics and Computer Science

Summary

• Current Data Alignment could be sub-optimal• Two approaches for an optimal Data Alignment

• Multi-Alignment Approach

• Single-Alignment Approach

• Both implemented in ProM• Soon to be integrated in Data Aware Replayer• Which one to use depends on the case

LAST PAGE

MILP

CFAlignment

MILP

CFAlignment

MILP

CFAlignment

…MILP

CFAlignment

Optimal? DataAlignment

Best First SearchData

Alignment

Department of Mathematics and Computer Science

Image source: http://commons.wikimedia.org/wiki/File:Pictofigo_-_Idea.png