Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
description
Transcript of Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Process Mining: Data Science in ActionProcess Mining as a Tool to Find Out What People and Organizations Really Do
prof.dr.ir. Wil van der Aalstwww.processmining.org
PAGE 2
How many students will take my class
next year?
What is the
real curriculum?
When, where and why do
students drop out?
When, where and why do students
deviate?
Will I graduate or drop out?
When will I graduate?
Should I take this course if I want to graduate in two
years from now?
PAGE 3
It's the process, stupid!
(not the data, not the system, and not a particular decision)
The data scientist …
PAGE 4
Data Science Center Eindhoven (DSC/e)
PAGE 5
28 research groups are
involved
Data Science Center Eindhoven (DSC/e)
PAGE 67 research programs
Process Mining
www.olifantenpaadjes.nl
Process discovery
abcdacbdabcdabcefcbdacbefcbefbcdetc
Process Discovery
10
Conformance checking
acbdabdacbefbcdacbefebcdddfeabb
ConformanceChecking
PAGE 12
Let's play
PAGE 13
Play-Out
Play Out: A possible scenario
PAGE 14
a b d e gCase Activity Timestamp Resource432 register travel request (a) 18-3-2014:9.15 John432 get support from local manager (b) 18-3-2014:9.25 Mary432 check budget by finance (d) 19-3-2014:8.55 John432 decide (e) 19-3-2014:9.36 Sue432 accept request (g) 19-3-2014:9.48 Mary
Play Out: Another scenario
PAGE 15
a cd e hf b d e
Play Out: Process model allows for many more scenarios
PAGE 16
abdeg
abcefbdehadceg adbeh
acbefbdehacdefcdefbdeh
adcefcdefbdefbdegabdegadcehadbeh
acbefbdegacdefcdefbdehadcefcdefbdefbdeg
abdegadceh
adbeh
acbefbdegacdefcdefbdeh
adcefcdefbdefbdeg
abdeg
PAGE 17
Play-In
Play In:Simple process allowing for 4 traces
PAGE 18
abdeg adbeg
abdehadbeh
abdegadbeg
abdehadbehabdehabdeh
abdehadbeh adbeh
Play In: Process allowing for more traces
PAGE 19
abdeg
abcefbdehadcegadbeh
acbefbdehacdefcdefbdehadcefcdefbdefbdeg abdeg
adcehadbehacbefbdeg
acdefcdefbdeh
adcefcdefbdefbdeg
abdegadceh
adbehacbefbdeg
acdefcdefbdeh
adcefcdefbdefbdeg
abdeg
No modeling needed!
PAGE 20
No modeling needed!
Example Process Discovery(Dutch housing agency, 208 cases, 5987 events)
PAGE 21
Example process discovery for hospital(627 gynecological oncology patients, 24331 events)
PAGE 22
Process discovery algorithms (small selection)
PAGE 23
α algorithm
α++ algorithm
α# algorithm
language-based regions
state-based regionsgenetic mining
heuristic mining
hidden Markov models
neural networks
automata-based learning
stochastic task graphs
conformal process graph
mining block structures
multi-phase miningpartial-order based mining
fuzzy mining
LTL mining
ILP mining
distributed genetic mining
ETM genetic algorithmInductive Miner (infrequent)
Language based regions
PAGE 24
Region R = (X,Y,c) corresponding to place pR: X = {a1,a2,c1} = transitions producing a token for pR, Y = {b1,b2,c1} = transitions consuming a token from pR, and c is the initial marking of pR.
Basic idea: enough tokens should be present when consuming
PAGE 25
A place is feasible if it can be added without disabling any of thetraces in the event log.
PAGE 26
Replay
Replay
PAGE 27
a c d e g
Replay
PAGE 28
a c
check budget (d) is missing!
e g?
Alignments: Relating reality and model
PAGE 29
a c » e ga c d e g
check budget (d) did not happen but should have according to the model
Replay
PAGE 30
a c d e gh
reject request (h) is impossible
?
Alignments: Relating reality and model
PAGE 31
a c h d e ga c » d e g
reject request (h) happened but could not happen according to the model
Any trace in reality can be related to a path in the model
PAGE 32
a c e f d d b c e h
Any trace in reality can be related to a path in the model
PAGE 33
a c » e f d d b c e ha c d e f d » b » e h
check is missing
one check too many
cannot both be done
optimization problem using a cost fu
nction
PAGE 34
process model
event log
synchronous move
move on model only
move on log only
Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)
PAGE 35
Replay with timestamps
PAGE 36
a9.15 c9.20 d9.35 e10.15 g11.30
9.159.20
9.35
10.15
11.30
5 55
20 40
75
Replay with timestamps for many traces
PAGE 37
frequencies of activities
frequencies of paths
durations of activities
waiting times and other delays between
activities
PAGE 38
• conformance checking to diagnose deviations• squeezing reality into the model to do model-based
analysis
Alignments are essential!
Example: BPI Challenge 2012(Dutch financial institute, doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f)
PAGE 39Work of Arya Adriansyah (Replay project)
PAGE 40
Auditor's toolbox
PAGE 41
Business analyst's toolbox
Software
600+ plug-ins available covering the whole process mining spectrum
Disco
PAGE 45
Overview: Role of process models
PAGE 46
Play-Out
Play-In
Replay
decomposed/distributed process mining
data Big data
http://www.multivu.com/assets/58095/photos/Data-is-the-new-oil-infographic-Nigel-Holmes-2012-from-The-Human-Face-of-Big-Data-original.jpg
What if?
PAGE 50
there are more than 1000 different
activities?
there are more than 1.000.000
cases?
there are more than 100.000.000
events?
Decompose event log! vertical or horizontal
PAGE 51
sets of cases
sets of activities
Vertical distribution: Split cases
PAGE 52
sets of cases
Horizontal distribution
PAGE 53
sets of activities
{a,b,e,f,g}
{b,c,d,e}{a,b,e,f,g}
{b,c,d,e}
Horizontal distribution: The key idea
PAGE 54
projected on {a,b,e,f,g}
projected on {b,c,d,e}
Two foundational ways of spitting event data: horizontal or vertical
PAGE 55
Decomposing Conformance Checking
PAGE 57See "divide and conquer" framework by Eric Verbeek.
Example of a valid decomposition
PAGE 58
Log can be split in the same way!
Example of an alignment for observed trace a,b,c,d,e,c,d,g,f
PAGE 59
Etc.
a,b,c,d,e,c,d,g,f
Conformance checking can be decomposed !!!
• General result for any valid decomposition: Any event log or trace is perfectly fitting the overall model if and only if it is also fitting all the individual fragments
PAGE 60
for a
ny Petri
net
(not j
ust WF-n
ets)
for a
ny valid
decompsiti
on
Wil van der Aalst, Decomposing Petri nets for process mining: A generic approach. Distributed and Parallel Databases, Volume 31, Issue 4, pp 471-507, 2013
also for d
ata
Petri n
ets
Example (work with Jorge Munoz-Gama and Josep Carmona)
PAGE 61
prFm6
Decomposing Process Discovery
PAGE 62See "divide and conquer" framework by Eric Verbeek.
conclusion
Don't just check the temperature …
PAGE 64
X-ray your processes!
PAGE 65
PAGE 66
www.processmining.org
www.win.tue.nl/ieeetfpm/
Process Mining: Data Science in Actionhttps://www.coursera.org/course/procmin
First Massive Open Online Course (MOOC) on Process Mining