ATAED2016 Montali - Marrying data and processes: from model to event data analysis
Transcript of ATAED2016 Montali - Marrying data and processes: from model to event data analysis
From Model to Event Data Analysis
Marco Montali Free University of Bozen-Bolzano
ATAED 2016
Marrying Data and Processes
Our Starting Point
Marrying processes and data is a must if we want to really understand
how complex dynamic systems operate
Dynamic systems of interest:
• business processes
• multiagent systems
• distributed systems
Our Thesis
Knowledge representation and computational logics are a Swiss Army knife to understand data-aware dynamic systems, and provide automated reasoning and verification capabilities along their entire lifecycle
Formal Verification
Automated analysis of a formal model of the system against a property of interest,
considering all possible system behaviors
picture by Wil van der Aalst
Process Mining
Extraction of valuable, process-related information
from event logs, i.e., the footprint of reality
picture by Wil van der Aalst
Data/Process Fragmentation
• A business process consists of a set of activities that are performed in coordination in an organizational and technical environment [Weske, 2007]
• Activities change the real world
• The corresponding updates are reflected into the organizational information system(s)
• Data trigger decision-making, which in turn determines the next steps to be taken in the process
• Survey by Forrester [Karel et al, 2009]: lack of interaction between data and process experts
Experts Dichotomy
• BPM professionals: data are subsidiary to processes
• Master data managers: data are the main driver for the company’s existence
• Forrester: in 83/100 companies, no interaction at all between these two groups
• This isolation propagates to languages and tools, which never properly account for the process-data connection
Conventional Data Modeling
Focus: relevant entities, relations, static constraints
[Figure: UML class diagram with the labels Supplier, Manufacturing, Procurement/Supplier, and Sales, and the classes Customer PO, Line Item, Work Order, Material PO, and Material; one association is labeled "spawns" (multiplicities * and 0..1).]
But… how do data evolve? Where can we find the “state” of a purchase order?
Conventional Process Modeling
Focus: control-flow of activities in response to events
But… how do activities update data? What is the impact of canceling an order?
Do you like Spaghetti?
[Figure: five process silos (Manage Cancelation, Ship, Assemble, Manage Material POs, Decompose Customer PO), each with its own Activities, Process, and Data layers, on top of the data stores Customers, Suppliers & Catalogues, Customer POs, Work Orders, and Material POs.]
IT integration: difficult to manage, understand, evolve
Too Late…
Too late to reconstruct the missing pieces
• Where is our data? Part is in the DBs, part is hidden in the process execution engine.
• Where are the relevant business rules, and how are they modeled? At the DB level? Which DB? How to import the process data? (Also) in the business model? How to import data from the DBs?
[Figure: on the data side, the purchase order UML class diagram (Customer PO, Line Item, Work Order, Material PO, Material); on the process side, the activities "Determine cancelation penalty" and "Notify penalty", run by a Process Engine that holds the Process State.]
Business rules:
For each work order W
  For each material PO M in W
    if M has been shipped
      add returnCost(M) to penalty
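The business rule above can be sketched in a few lines of Python. This is only an illustration: the classes `WorkOrder` and `MaterialPO`, the field `return_cost` (standing in for returnCost(M)), and the sample amounts are assumptions, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MaterialPO:
    po_id: str
    shipped: bool
    return_cost: float  # hypothetical stand-in for returnCost(M)


@dataclass
class WorkOrder:
    order_id: str
    material_pos: list = field(default_factory=list)


def cancelation_penalty(work_orders):
    """Sum the return cost of every shipped material PO in every work order."""
    penalty = 0.0
    for w in work_orders:
        for m in w.material_pos:
            if m.shipped:
                penalty += m.return_cost
    return penalty


# Hypothetical sample data: two work orders, two shipped material POs.
orders = [
    WorkOrder("W1", [MaterialPO("M1", True, 40.0), MaterialPO("M2", False, 15.0)]),
    WorkOrder("W2", [MaterialPO("M3", True, 10.0)]),
]
print(cancelation_penalty(orders))
```

The rule needs both the process state (is this order being canceled?) and the data (which material POs were shipped), which is the point of the slide.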
Diego Calvanese (FUB) Foundations of Data-Aware Process Analysis INRIA Saclay Paris – 18/3/2016 (10/1)
How is Research Reacting?
A recent review…
Verification typically takes place at the design stage of a business process type. However, at this stage, required knowledge about data (database schema, integrity constraints) is typically not yet available.
…But There is Hope!
• [Meyer et al, 2011]: data-process integration crucial to assess the value of processes and evaluate KPIs
• [Dumas, 2011]: data-process integration crucial to aggregate all relevant information, and to suitably inject business rules into the system
• [Reichert, 2012]: “Process and data are just two sides of the same coin”
Formal Verification: The Conventional, Propositional Case
• Process control-flow: finite-state transition system
• (Un)desired property: propositional temporal formula Φ
• Question: does the transition system satisfy the formula (⊨ Φ)?
• Verification via model checking (2007 Turing award: Clarke, Emerson, Sifakis)
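As a minimal sketch of the propositional case, the following Python code checks a simple safety property ("no bad state is ever reached") by exhaustive breadth-first exploration of a finite-state transition system. The state names and the `FLOW` graph are hypothetical; full model checkers handle arbitrary temporal formulas, which this sketch does not.

```python
from collections import deque


def check_invariant(initial, successors, is_bad):
    """Explore all reachable states; return a path to a bad state, or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            # Rebuild the counterexample by walking parent links backwards.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Hypothetical control flow of an order-handling process.
FLOW = {
    "received": ["assembled", "canceled"],
    "assembled": ["shipped", "canceled"],
    "shipped": ["paid"],
    "canceled": [],
    "paid": [],
}

safe = check_invariant("received", lambda s: FLOW[s], lambda s: s == "lost")
witness = check_invariant("received", lambda s: FLOW[s], lambda s: s == "paid")
```

Termination relies on the state space being finite, which is exactly what is lost once data are added to the picture.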
Formal Verification: The Data-Aware Case
• Process+Data: infinite-state, relational transition system [Vardi 2005]
• (Un)desired property: first-order temporal formula Φ
• Question: can we still check whether the transition system satisfies the formula (⊨ Φ)?
Why FO Temporal Logics
• To inspect data: FO queries
• To capture system dynamics: temporal modalities
• To track the evolution of objects: FO quantification across states
• Example: It is always the case that every order is eventually either cancelled or paid
$\mathbf{G}\,\bigl(\forall x.\ \mathit{Order}(x) \rightarrow \mathbf{F}\,(\mathit{State}(x, \mathit{cancelled}) \lor \mathit{State}(x, \mathit{paid}))\bigr)$
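To make the example property concrete, here is a small Python sketch that evaluates it over a finite sequence of database states. Two assumptions are made for illustration only: the property is read under finite-trace semantics (the slides talk about infinite-state systems), and each state is encoded as a dictionary from relation names to sets of tuples.

```python
def always_eventually_closed(trace):
    """Check G(forall x. Order(x) -> F(State(x, cancelled) or State(x, paid)))
    on a finite trace of database states."""
    for i, state in enumerate(trace):
        # Quantify over the orders present in the current state...
        for (order,) in state.get("Order", set()):
            # ...and track the same object across the current and later states.
            closed = any(
                (order, "cancelled") in later.get("State", set())
                or (order, "paid") in later.get("State", set())
                for later in trace[i:]
            )
            if not closed:
                return False
    return True


# Hypothetical run: o1 gets paid, o2 gets cancelled.
good_run = [
    {"Order": {("o1",)}, "State": {("o1", "open")}},
    {"Order": {("o1",), ("o2",)}, "State": {("o1", "paid"), ("o2", "open")}},
    {"Order": {("o1",), ("o2",)}, "State": {("o1", "paid"), ("o2", "cancelled")}},
]
bad_run = good_run[:2]  # the run stops while o2 is still open
```

The inner loop is where first-order quantification across states shows up: the value bound to x in one state is looked up again in later states.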
Problem Dimensions
Data component: Relational DB | Description logic KB | OBDA system | Inconsistency-tolerant KB | …
Process component: condition-action rules | BPMN | Golog program | Petri nets | …
Task modeling: Conditional effects | Add/delete assertions | Programs | User forms | …
External inputs: None | External services | Input DB | Fixed input | …
Network topology: Single orchestrator | Full mesh | Connected, fixed graph | Ring | …
Interaction mechanism: None | Synchronous | Asynchronous and ordered | Asynchronous lossy | …
Colored Petri Nets
• Where is the data model?
• Formal analysis only with a-priori propositionalization
Tools (e.g. BizAgi)
[Figure: process model with the tasks Review Request, Fill Reimbursement, and Review Reimbursement, ending in Rejected or Accepted.]
• Which formal semantics?
• Analysable?
RAW-SYS
• Integrated data+process modeling
  • Standard relational model for capturing data
  • Standard workflow nets (or other types of Petri nets) for capturing processes
• Net transitions interplay with data
  • Conditionally enabled by FO queries over the data
  • Described in terms of full-fledged CRUD operations over the data
• Bridge between theory and practice
  • Mimics how BPMS actually work
  • Has unambiguous execution semantics
Example: User Cart
Shared DB: Customer(Id, …), Product(name, …) (read-only)
Local DB: InCart(BarCode, Product), Owner(CustId)
Net transitions, with their guards and effects:
• create case
• open cart: guard Customer(x,…); effect ADD Owner(x)
• insert item(p): guard Product(p,…); effect ADD InCart(getBC(),p)
• empty cart: guard Exist x,p. InCart(x,p); effect Forall x,p. InCart(x,p) -> DEL InCart(x,p)
• close cart
• close case
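The cart example can be read as a set of guarded updates over relations. The Python sketch below mimics three of the transitions for a single case; it is not the RAW-SYS formalism itself. The class name `CartCase`, the dictionary encoding of the shared DB, and the counter standing in for getBC() are assumptions made for illustration.

```python
import itertools


class CartCase:
    """One running case with its local DB; the shared DB is passed in."""

    def __init__(self, shared_db):
        self.shared = shared_db              # e.g. {"Customer": {...}, "Product": {...}}
        self.in_cart = set()                 # local relation InCart(BarCode, Product)
        self.owner = set()                   # local relation Owner(CustId)
        self._barcodes = itertools.count(1)  # hypothetical stand-in for getBC()

    def open_cart(self, x):
        # Guard: Customer(x, ...) holds in the shared DB. Effect: ADD Owner(x).
        if x not in self.shared["Customer"]:
            return False
        self.owner.add(x)
        return True

    def insert_item(self, p):
        # Guard: Product(p, ...). Effect: ADD InCart(getBC(), p).
        if p not in self.shared["Product"]:
            return False
        self.in_cart.add((next(self._barcodes), p))
        return True

    def empty_cart(self):
        # Guard: some InCart tuple exists. Effect: delete all InCart tuples.
        if not self.in_cart:
            return False
        self.in_cart.clear()
        return True


shared = {"Customer": {"c1"}, "Product": {"pasta", "wine"}}
case = CartCase(shared)
```

Each method returns False when its guard fails, which corresponds to the transition not being enabled in the current state.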
Execution Semantics
Relational transition system. Each state is labeled by:
• Instance of the shared DB
• Case IDs of running cases, together with corresponding
  • Instances of local DBs
  • Markings of their nets
Successors constructed considering all possible ground executable actions and all possible input configurations (s.t. the resulting state satisfies the schema constraints) -> infinite-state transition system
The Good…
RAW-SYS are:
• Markovian: Next state only depends on the current state + input. Two states with identical DBs are bisimilar.
• Generic: FO/SQL (as all query languages) does not distinguish structures which are identical modulo uniform renaming of data objects. -> Two isomorphic states are bisimilar
… and the Bad
Reachability undecidable even with a single safe net
• Counter -> “size” of a unary relation
• Test counter for zero: check whether counter relation is empty
• What matters is the # of tuples, not the actual values
• Can be reconstructed also without negation in the queries
[Figure: net with the transitions New, Increment, and Decrement.]
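The counter encoding behind this undecidability argument can be illustrated as follows. The sketch only shows how a unary relation simulates one counter; the class name `RelationCounter` and the use of integers as fresh values are assumptions, and the actual reduction (simulating a counter machine with such relations) is not reproduced here.

```python
import itertools


class RelationCounter:
    """A counter whose value is the number of tuples in a unary relation."""

    def __init__(self):
        self.rel = set()                 # the unary relation
        self._fresh = itertools.count()  # hypothetical source of fresh values

    def increment(self):
        # Insert a value not yet in the relation; which value does not matter.
        self.rel.add(next(self._fresh))

    def decrement(self):
        # Delete an arbitrary tuple; not enabled when the relation is empty.
        if not self.rel:
            return False
        self.rel.pop()
        return True

    def is_zero(self):
        # Zero test: check whether the relation is empty.
        return not self.rel

    @property
    def value(self):
        return len(self.rel)


counter = RelationCounter()
for _ in range(3):
    counter.increment()
```

Since increments can go on forever, the relation can grow without bound, which is the source of the problem that state-boundedness addresses.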
State-Boundedness [PODS 2013]
Put a pre-defined bound on the DB size (not the size of the data domain!)
• Resulting transition system: still infinite-state
• But: infinitely-many encountered values along a run cannot be “accumulated” in a single state
RAW-SYS, Boundedness, and Reachability
Reachability undecidable as soon as one of the following conditions holds:
• Shared DB with unbounded size
• Local DB with unbounded size
• Unboundedly many simultaneously running cases
What happens if all these three sources are “bounded in size”?
Magic!
Infinite-state transition system ⊨ Φ, where Φ is a first-order temporal formula (FO-CTL or FO-LTL with persistent quantification)
if and only if
Finite-state abstraction ⊨ Φ′, where Φ′ is a propositional temporal formula
Towards Implementations
• [IJCAI 2015] Planning can be lifted to deal with this infinite-state setting
• Ongoing implementation effort using DLVk and state-of-the-art ADL planners
• [SEBD 2015, AMW 2015] Ongoing effort for implementing model checking techniques based on our abstraction natively in relational technology
• Goal: combine the best of databases and formal methods
Expected Reality
XES standard for event logs
<log xes.version="1.0" xes.features="nested-attributes">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2010-12-30T11:02:00.000+01:00"/>
    </event>
  </trace>
</log>
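As a rough illustration of how such a log is consumed, the sketch below reads the XES fragment with Python's standard `xml.etree.ElementTree` module and flattens it into (case, activity, timestamp) rows. It assumes the document has no XML namespace, as in the fragment on the slide; real XES files usually declare one, which would require namespace-qualified tag names. The function name `read_events` is hypothetical.

```python
import xml.etree.ElementTree as ET

XES = """<log xes.version="1.0" xes.features="nested-attributes">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2010-12-30T11:02:00.000+01:00"/>
    </event>
  </trace>
</log>"""


def read_events(xes_text):
    """Return one (case id, activity name, timestamp) row per event."""
    rows = []
    root = ET.fromstring(xes_text)
    for trace in root.findall("trace"):
        # Trace-level attributes are the children that are not events.
        case = {a.get("key"): a.get("value") for a in trace if a.tag != "event"}
        for event in trace.findall("event"):
            attrs = {a.get("key"): a.get("value") for a in event}
            rows.append((
                case.get("concept:name"),
                attrs.get("concept:name"),
                attrs.get("time:timestamp"),
            ))
    return rows


rows = read_events(XES)
```

The keys `concept:name` and `time:timestamp` are the ones used in the fragment itself; anything else in a trace or event is simply ignored by this sketch.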
Understanding Reality…
[Figure: UML class diagram of a conference submission system.
Classes: Conference (creation time: DateTime, conf name: String); User (creation time: DateTime, user name: String); Paper (creation time: DateTime, title: String); ReviewRequest (invitation time: DateTime); Review (submission time: DateTime); Decision (decision time: DateTime, outcome: Bool); UploadSubmitted (upload time: DateTime); UploadAccepted (upload time: DateTime); AcceptedPaper <<no time>>.
Associations: submitted to, organizer of, reviewer, PhasD, RhasR, corresponds to, UhasP, AhasU, for, author, by, USuploadbyU, creator, UAuploadbyU.]
From here…
Impedance Mismatch
[Figure: the conference submission UML class diagram (Conference, User, Paper, ReviewRequest, Review, Decision, UploadSubmitted, UploadAccepted, AcceptedPaper).]
Key Issues
• How to resolve the “impedance mismatch”?
• How to get a “view” of the data tailored to process mining?
[Figure: the relational table PaperInfo next to the conference submission UML class diagram.]
Impedance Mismatch is Really an Issue
Crompton (2008): domain experts lose too much time digging into data and turning them into knowledge
• Engineers in the oil/gas industry: 30-70% of their working time spent on data searching and data quality
Optique
Scalable, End-User Access to Big Data
• http://optique-project.eu
• Goal: engineering techniques for enabling end users to access data through domain ontologies
• Case studies: Statoil, Siemens
Facts on Statoil
• 1000 TB of data inside relational DBMSs
• Schemas not aligned
• More than 2000 tables, in a plethora of different DBs
• 900 experts part of “Statoil Exploration”
• Up to 4 days to formulate queries and encode them in SQL
Query Example
How much time/money is spent searching for data?
A user query at Statoil
Show all Norwegian wellbores with some additional attributes (wellbore id, completion date, oldest penetrated age, result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key attributes in a table. After connecting to EPDS (slegge) we could for instance limit further to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, others not.
SELECT [...]
FROM
  db_name.table1 table1, db_name.table2 table2a, db_name.table2 table2b,
  db_name.table3 table3a, db_name.table3 table3b, db_name.table3 table3c, db_name.table3 table3d,
  db_name.table4 table4a, db_name.table4 table4b, db_name.table4 table4c,
  db_name.table4 table4d, db_name.table4 table4e, db_name.table4 table4f,
  db_name.table5 table5a, db_name.table5 table5b,
  db_name.table6 table6a, db_name.table6 table6b,
  db_name.table7 table7a, db_name.table7 table7b,
  db_name.table8 table8, db_name.table9 table9,
  db_name.table10 table10a, db_name.table10 table10b, db_name.table10 table10c,
  db_name.table11 table11, db_name.table12 table12, db_name.table13 table13,
  db_name.table14 table14, db_name.table15 table15, db_name.table16 table16
WHERE [...]
  table2a.attr1='keyword' AND table3a.attr2=table10c.attr1 AND
  table3a.attr6=table6a.attr3 AND table3a.attr9='keyword' AND
  table4a.attr10 IN ('keyword') AND table4a.attr1 IN ('keyword') AND
  table5a.kinds=table4a.attr13 AND table5b.kinds=table4c.attr74 AND
  table5b.name='keyword' AND
  (table6a.attr19=table10c.attr17 OR (table6a.attr2 IS NULL AND table10c.attr4 IS NULL)) AND
  table6a.attr14=table5b.attr14 AND table6a.attr2='keyword' AND
  (table6b.attr14=table10c.attr8 OR (table6b.attr4 IS NULL AND table10c.attr7 IS NULL)) AND
  table6b.attr19=table5a.attr55 AND table6b.attr2='keyword' AND
  table7a.attr19=table2b.attr19 AND table7a.attr17=table15.attr19 AND
  table4b.attr11='keyword' AND table8.attr19=table7a.attr80 AND
  table8.attr19=table13.attr20 AND table8.attr4='keyword' AND
  table9.attr10=table16.attr11 AND table3b.attr19=table10c.attr18 AND
  table3b.attr22=table12.attr63 AND table3b.attr66='keyword' AND
  table10a.attr54=table7a.attr8 AND table10a.attr70=table10c.attr10 AND
  table10a.attr16=table4d.attr11 AND table4c.attr99='keyword' AND
  table4c.attr1='keyword' AND
  table11.attr10=table5a.attr10 AND table11.attr40='keyword' AND
  table11.attr50='keyword' AND table2b.attr1=table1.attr8 AND
  table2b.attr9 IN ('keyword') AND table2b.attr2 LIKE 'keyword'% AND
  table12.attr9 IN ('keyword') AND table7b.attr1=table2a.attr10 AND
  table3c.attr13=table10c.attr1 AND table3c.attr10=table6b.attr20 AND
  table3c.attr13='keyword' AND table10b.attr16=table10a.attr7 AND
  table10b.attr11=table7b.attr8 AND table10b.attr13=table4b.attr89 AND
  table13.attr1=table2b.attr10 AND table13.attr20=''keyword'' AND
  table13.attr15='keyword' AND table3d.attr49=table12.attr18 AND
  table3d.attr18=table10c.attr11 AND table3d.attr14='keyword' AND
  table4d.attr17 IN ('keyword') AND table4d.attr19 IN ('keyword') AND
  table16.attr28=table11.attr56 AND table16.attr16=table10b.attr78 AND
  table16.attr5=table14.attr56 AND table4e.attr34 IN ('keyword') AND
  table4e.attr48 IN ('keyword') AND table4f.attr89=table5b.attr7 AND
  table4f.attr45 IN ('keyword') AND table4f.attr1='keyword' AND
  table10c.attr2=table4e.attr19 AND
  (table10c.attr78=table12.attr56 OR (table10c.attr55 IS NULL AND table12.attr17 IS NULL))
At Statoil, it takes up to 4 days to formulate a query in SQL.
Statoil loses up to 50,000,000 € per year because of this!
Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)
Ontology-Based Data Access
Ontology-based data integration framework
• Ontology: provides global vocabulary and conceptual view (a “lightweight” conceptual model)
• Mappings: semantically link sources and ontology
• Data sources: external and heterogeneous
[Figure: layered architecture with Query and Result at the ontology level, above the mappings and the data sources.]
We achieve logical transparency in accessing data. The user:
• does not know where and how the data is stored.
• can only see a conceptual view of the data.
Ontop
• Open-source OBDA technology developed at UNIBZ (supervisor: Diego Calvanese)
• Fully supports semantic web standards (OWL/SPARQL)
• Integrates with a plethora of relational DBMSs
• Apache open license
• http://ontop.inf.unibz.it
Resolving the Impedance Mismatch
Relational table: PaperInfo
Ontology class: FullPaper (creationTime: DateTime, title: String)
mappingId  fp-mapping
target     paper{ID} a :FullPaper ; :title {Title} ; :creationTime {CT}
source     select I.ID, I.Title, I.CT from PaperInfo I where I.Type = 'FP'
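To see what the mapping expresses, the sketch below evaluates its source query on an in-memory SQLite table and instantiates the target template once per answer row. This materializes triples for illustration only; it is not how Ontop works internally, and the column types, the sample rows, and the type code 'SP' are assumptions.

```python
import sqlite3


def apply_fp_mapping(conn):
    """Run the source query and build one group of triples per answer row."""
    triples = []
    rows = conn.execute(
        "select I.ID, I.Title, I.CT from PaperInfo I where I.Type = 'FP'"
    )
    for paper_id, title, ct in rows:
        subject = f":paper{paper_id}"  # target template paper{ID}
        triples.append((subject, "a", ":FullPaper"))
        triples.append((subject, ":title", title))
        triples.append((subject, ":creationTime", ct))
    return triples


# Hypothetical PaperInfo table with one full paper and one other paper.
conn = sqlite3.connect(":memory:")
conn.execute("create table PaperInfo (ID integer, Title text, CT text, Type text)")
conn.executemany(
    "insert into PaperInfo values (?, ?, ?, ?)",
    [
        (1, "Data-aware processes", "2016-01-10T09:00:00", "FP"),
        (2, "A short note", "2016-01-11T10:30:00", "SP"),
    ],
)
triples = apply_fp_mapping(conn)
```

Only the row with Type 'FP' contributes triples, so the ontology-level class FullPaper is populated from a filtered view of the table.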
What if my DB is Very Nice?
Ontology bootstrapping automatically creates
• a conceptual model that mirrors 1-1 the relational DB
• identity mappings
Useful for “small” case studies
OBDA for Process Mining
• Need to resolve a second impedance mismatch problem!
• From here…
[Figure: the conference submission UML class diagram (Conference, User, Paper, ReviewRequest, Review, Decision, UploadSubmitted, UploadAccepted, AcceptedPaper).]
OBDA for Process Mining
• …To there!
[Figure: UML class diagram of the XES event log schema.
Classes: Log (xes.version: xs:decimal, xes.features: xs:token); Classifier (name, keys: xs:string); Extension (prefix, uri, key, type: xs:string); Trace; Event; Attribute (key, value, type: xs:string); GlobalAttribute (scope: {event, trace}).
Associations: declare, define, and contain, with multiplicities such as 1, 0..1, 0..*, and 1..*.]
Log Annotations
[Figure: the conference submission UML class diagram, annotated for log extraction.]
• trace annotation, with condition submittedto = BPM2015
• event annotation, activity name “decision”: trace: follow has; timestamp: decisiontime; resource: follow by; type: complete; attributes: outcome
• event annotation, activity name “review”: trace: follow has & for; timestamp: submissiontime; resource: follow RhasR & reviewer; type: complete
• event annotation, activity name “uploadsubmitted”: trace: follow has; timestamp: uploadtime; resource: follow USuploadbyU; type: complete
• event annotation, activity name “uploadaccepted”: trace: follow has & corr. to; timestamp: uploadtime; resource: follow UAuploadbyU; type: complete
Multiple Log Views
[Figure: the conference submission UML class diagram, annotated for a different log view.]
• trace annotation
• event annotation, activity name “decisionauthor”: trace: follow has author; timestamp: decisiontime; resource: follow PhasD; type: complete
• event annotation, activity name “decisionchair”: trace: follow by; timestamp: decisiontime; resource: follow PhasD; type: complete; attributes: outcome
• event annotation, activity name “review”: trace: follow has & reviewer; timestamp: submissiontime; resource: follow RhasR & for; type: complete
• event annotation, activity name “uploadsubmitted”: trace: follow uploadby; timestamp: uploadtime; resource: follow UhasP; type: complete
And Now?
[Figure: three databases, a Domain Ontology, and an Event Ontology, with edges labeled Mapping and Annotation; the links needed for XES Log Extraction are marked “?”.]
Mapping Synthesis
[Figure: three databases, a Domain Ontology, and an Event Ontology, with edges labeled Mapping and Annotation; the link used for XES Log Extraction is labeled “Automatically synthesized”.]
1. Annotation transformed into an ontology-to-ontology mapping M’
2. M’ is “rewritten” using the data-to-domain ontology mapping
3. The result is a mapping connecting the XES ontology directly to the data
Log Materialization
[Figure: pipeline in four numbered steps, from the databases through Mapping, Domain Ontology, Annotation, Event Ontology, and Log Mapping to XES Event Data, an XES File, and Process Mining Tools.]
SPARQL queries:
SELECT DISTINCT ?t ?v ?e WHERE { ?t :TcontainsA ?ta . ?ta :valueA ?v . ?t :TcontainsE ?e . }
SELECT DISTINCT ?e ?t WHERE { ?e :EcontainsA ?a . ?a :typeA ?t . }
SELECT DISTINCT ?e ?t WHERE { ?e :EcontainsA ?a . ?a :keyA ?t . }
SELECT DISTINCT ?e ?t WHERE { ?e :EcontainsA ?a . ?a :valueA ?t . }
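The last step of materialization, turning query answers into an XES document, might look like the following sketch. It assumes the answers have already been combined into two plain Python lists: (trace, event) pairs and (event, type, key, value) rows. The function name `build_xes` and the sample rows are hypothetical, and the sketch does not talk to a SPARQL endpoint.

```python
import xml.etree.ElementTree as ET


def build_xes(trace_events, event_attrs):
    """Build an XES-like XML tree from already-joined query answers."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces, events = {}, {}
    for trace_id, event_id in trace_events:
        if trace_id not in traces:
            # First time this trace is seen: create it and record its name.
            traces[trace_id] = ET.SubElement(log, "trace")
            ET.SubElement(
                traces[trace_id], "string",
                {"key": "concept:name", "value": trace_id},
            )
        events[event_id] = ET.SubElement(traces[trace_id], "event")
    for event_id, attr_type, key, value in event_attrs:
        # One child element per attribute; the tag is the attribute type.
        ET.SubElement(events[event_id], attr_type, {"key": key, "value": value})
    return log


# Hypothetical answers for one paper with two events.
trace_events = [("paper1", "e1"), ("paper1", "e2")]
event_attrs = [
    ("e1", "string", "concept:name", "review"),
    ("e2", "string", "concept:name", "decision"),
]
log = build_xes(trace_events, event_attrs)
xes_text = ET.tostring(log, encoding="unicode")
```

Writing `xes_text` to a file would give process mining tools a log they can load, at the price of re-running the extraction whenever the data change.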
Log Virtualization
[Figure: pipeline in two numbered steps, from the databases through Mapping, Domain Ontology, Annotation, Event Ontology, and Log Mapping to an On-Demand XES Loader that serves Process Mining Tools.]
On-demand classes: XFactoryOnDemandImpl, XLogOnDemandImpl, XTraceOnDemandImpl, XEventOnDemandImpl, XLogOnDemandIterator, XTraceOnDemandIterator
xlog.get(7).get(90) retrieves the event at index 90 inside the trace at index 7 of a log
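The idea behind on-demand access can be sketched with two small Python classes that fetch the events of a trace only when they are first requested. This is an analogy, not the Java classes listed above: `OnDemandLog`, `OnDemandTrace`, and the `fake_fetch` backend are hypothetical names, and a real loader would issue queries against the data sources instead.

```python
class OnDemandTrace:
    """A trace whose events are fetched from the backend on first access."""

    def __init__(self, trace_id, fetch_events):
        self.trace_id = trace_id
        self._fetch_events = fetch_events
        self._events = None  # nothing loaded yet

    def get(self, index):
        if self._events is None:
            self._events = self._fetch_events(self.trace_id)
        return self._events[index]


class OnDemandLog:
    """A log that creates trace objects lazily and caches them."""

    def __init__(self, trace_ids, fetch_events):
        self._trace_ids = trace_ids
        self._fetch_events = fetch_events
        self._traces = {}

    def get(self, index):
        if index not in self._traces:
            self._traces[index] = OnDemandTrace(
                self._trace_ids[index], self._fetch_events
            )
        return self._traces[index]


calls = []


def fake_fetch(trace_id):
    # Hypothetical backend; records each call so laziness can be observed.
    calls.append(trace_id)
    return [f"{trace_id}-event{i}" for i in range(3)]


xlog = OnDemandLog(["paper1", "paper2"], fake_fetch)
event = xlog.get(1).get(2)
```

Only the trace that is actually accessed triggers a fetch, and the cached events are reused on later accesses, which is one simple form of the caching strategies raised in the questions that follow.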
Questions
• How to optimize and test the scalability of the approach? Fine-tuning is a must!
• Real vs simulated data? (Benchmarking OBDA)
• Initial benchmarking using CPN tools
• Is the “virtual” approach useful? How do process mining algorithms access the data?
• Hybrid virtual approach with caching strategies?
KAOS Project
Knowledge-Aware Operational Support
• Goal: Empowering process mining and online operational support with domain knowledge
• Euregio project: Trento + Bolzano + Innsbruck
• Mix of expertise from AI, BPM, database theory, formal methods, formal ontology, conceptual modeling, process mining, machine learning, software engineering
• Just started: we are hiring!!!
Acknowledgments
All coauthors of this research, in particular
Diego Calvanese (UNIBZ)
Giuseppe De Giacomo (UNIROMA)
Riccardo De Masellis (FBK-Trento)
Alin Deutsch (UCSD)
Chiara Di Francescomarino (FBK-Trento)
Chiara Ghidini (FBK-Trento)
Fabio Patrizi (UNIBZ)
Sergio Tessaris (UNIBZ)
Alifah Syamsiyah (TU/e)
Wil van der Aalst (TU/e)