ATAED2016 Montali - Marrying data and processes: from model to event data analysis

87
From Model to Event Data Analysis Marco Montali Free University of Bozen-Bolzano ATAED 2016 Marrying Data and Processes

Transcript of ATAED2016 Montali - Marrying data and processes: from model to event data analysis

From Model to Event Data Analysis

Marco Montali Free University of Bozen-Bolzano

ATAED 2016

Marrying Data and Processes

Our Starting Point

Marrying processes and data is a must if we want to really understand

how complex dynamic systems operate

Dynamic systems of interest: • business processes• multiagent systems • distributed systems

2

Our ThesisKnowledge representation and

computational logics

is a swiss-army knife to

understand data-aware dynamic systems, and

provide automated reasoning and verification capabilities along their entire lifecycle

3

Business Process Lifecycle

4

picture by Wil van der Aalst

Formal Verification

Automated analysis of a formal model of the system against a property of interest,

considering all possible system behaviors5

picture by Wil van der Aalst

Process Mining

Extraction of valuable, process-related information

from event logs, i.e., the footprint of reality

6

picture by Wil van der Aalst

7

Loneliness

Data/Process Fragmentation• A business process consists of a set of activities that

are performed in coordination in an organizational and technical environment [Weske, 2007]

• Activities change the real world• The corresponding updates are reflected into the

organizational information system(s) • Data trigger decision-making, which in turn determines

the next steps to be taken in the process

• Survey by Forrester [Karel et al, 2009]: lack of interaction between data and process experts

9

Experts Dichotomy• BPM professionals: data are subsidiary to

processes

• Master data managers: data are the main driver for the company’s existence

• Forrester: in 83/100 companies, no interaction at all between these two groups • This isolation propagates to languages and tools,

which never properly account for the process-data connection

10

Conventional Data ModelingFocus: revelant entities, relations, static constraints

Supplier ManufacturingProcurement/Supplier

Sales

Customer PO Line Item

Work OrderMaterial PO

*

*

spawns0..1

Material

But… how do data evolve? Where can we find the “state” of a purchase order?

11

Conventional Process ModelingFocus: control-flow of activities in response to events

But… how do activities update data? What is the impact of canceling an order?

12

A Deployed Process

13

Do you like Spaghetti?Manage

CancelationShipAssembleManage

Material POsDecompose

Customer PO

Activities

Process

Data

Activities

Process

Data

Activities

Process

Data

Activities

Process

Data

Activities

Process

Data

Customers Suppliers&CataloguesCustomer POs Work Orders Material POs

IT integration: difficult to manage, understand, evolve14

Too Late…• Where are the data?

• Where shall we model relevant business rules?

15

Too late to reconstruct the missing pieces

Where is our data?part is in the DBs,part is hidden in the process execution engine.

Where are the relevant business rules, and how are they modeled?At the DB level? Which DB? How to import the process data?(Also) in the business model? How to import data from the DBs?

DataProcess

Supplier ManufacturingProcurement/Supplier

Sales

Customer PO Line Item

Work OrderMaterial PO

*

*

spawns0..1

Determine cancelation

penaltyNotify penalty

Material

Process Engine

Process State

Business rulesFor each work order W For each material PO M in W if M has been shipped add returnCost(M) to penalty

Diego Calvanese (FUB) Foundations of Data-Aware Process Analysis INRIA Saclay Paris – 18/3/2016 (10/1)

““

How is Research Reacting?

A recent review…

Verification typically takes place at the design stage of a business process type. However, at this stage, required knowledge about data (database schema, integrity constraints) is typically not yet available.

16

…But There is Hope!• [Meyer et al, 2011]: data-process integration

crucial to assess the value of processes and evaluate KPIs

• [Dumas, 2011]: data-process integration crucial to aggregate all relevant information, and to suitably inject business rules into the system

• [Reichert, 2012]: “Process and data are just two sides of the same coin”

17

Formal Verification The Conventional, Propositional Case

Process control-flow

(Un)desired property18

(Un)desired property

Finite-statetransition system

Propositionaltemporal formula|= �

Formal Verification The Conventional, Propositional Case

Process control-flow

19

(Un)desired property

Finite-statetransition system

Propositionaltemporal formula|= �

Verification via model checking2007 Turing award:

Clarke, Emerson, Sifakis

Formal Verification The Conventional, Propositional Case

Process control-flow

20

Marriage

(Un)desired property

Formal Verification The Data-Aware Case

22

Process+Data

(Un)desired property

First-ordertemporal formula|= �

Process+Data

Formal Verification The Data-Aware Case

Infinite-state, relational transition system [Vardi 2005] 23

(Un)desired property

First-ordertemporal formula|= �

?Formal Verification

The Data-Aware Case

24

Process+Data

Infinite-state, relational transition system [Vardi 2005]

Why FO Temporal Logics• To inspect data: FO queries• To capture system dynamics: temporal

modalities• To track the evolution of objects: FO

quantification across states • Example:

It is always the case that every order is eventually either cancelled or paid

25

Why FO Temporal Logics• To inspect data: FO queries• To capture system dynamics: temporal

modalities• To track the evolution of objects: FO

quantification across states • Example:

It is always the case that every order is eventually either cancelled or paid

26

G

✓8x.Order(x)

! F�State(x, cancelled) _ State(x, paid)

�◆

Problem DimensionsData

component Relational DB Description logic KB OBDA system Inconsistency

tolerant KB …

Process component

condition-action rules BPMN Golog program Petri nets …

Task modeling

Conditional effects

Add/delete assertions Programs User forms …

External inputs None External

services Input DB Fixed input …

Network topology

Single orchestrator Full mesh Connected, fixed

graph Ring …

Interaction mechanism None Synchronous Asynchronous

and orderedAsynchronous

lossy …27

Colored Petri Nets

28

Colored Petri Nets

• Where is the data model?• Formal analysis only with a-priori propositionalization

29

Tools (e.g. BizAgi)

30

ReviewRequest

Fill Reim-bursement

Review Reim-bursement

Rejected

Accepted

• Which formal semantics? • Analysable?

Atrue

false

BPMS and Data

Correct?

• BizAgi…

• YAWL…

31

Atrue

false

BPMS and Data

Correct?

• BizAgi… not sure…

• YAWL… YES!

32

RAW-SYS• Integrated data+process modeling

• Standard relational model for capturing data • Standard workflow nets (or other types of Petri nets) for capturing

processes

• Net transitions interplay with data • Conditionally enabled by FO queries over the data • Described in terms of full-fledged CRUD operations over the data

• Bridge between theory and practice • Mimics how BPMS actually work • Has unambiguous execution semantics

33

RAW-SYS

34

Task

local

Task

local

TaskCase

new case close case archive

shared

local

archive

Example: User Cart

35

CustomerId …

Product (read-only)name …

InCartBarCode Product

OwnerCustId

Shared DBLocal DB

create case

close case

Example: User Cart

36

CustomerId …

Product (read-only)name …

InCartBarCode Product

OwnerCustId

Shared DBLocal DB

Customer(x,…)

ADD Owner(x)

create case

close case

Example: User Cart

37

CustomerId …

Product (read-only)name …

InCartBarCode Product

OwnerCustId

Shared DBLocal DB

Customer(x,…)

ADD Owner(x)

open cart…

create case

close case

Example: User Cart

38

CustomerId …

Product (read-only)name …

InCartBarCode Product

OwnerCustId

Shared DBLocal DB

Customer(x,…)

ADD Owner(x)

open cart

insert item(p)

create case

Product(p,…)ADD InCart(getBC(),p)

close case

Example: User Cart

39

CustomerId …

Product (read-only)name …

InCartBarCode Product

OwnerCustId

Shared DBLocal DB

Customer(x,…)

ADD Owner(x)

open cart

empty cart

insert item(p)

create case

Exist x,p. InCart(x,p)

Product(p,…)

Forall x,p. InCart(x,p)->DEL InCart(x,p)

ADD InCart(getBC(),p)

close case

Example: User Cart

40

CustomerId …

Product (read-only)name …

InCartBarCode Product

OwnerCustId

Shared DBLocal DB

Customer(x,…)

ADD Owner(x)

open cart…

close cart

empty cart

insert item(p)

create case

Exist x,p. InCart(x,p)

Product(p,…)

Forall x,p. InCart(x,p)->DEL InCart(x,p)

ADD InCart(getBC(),p)

close case

Execution SemanticsRelational transition system. Each state is labeled by: • Instance of the shared DB • Case IDs of running cases, together with corresponding

• Instances of local DBs • Markings of their nets

Successors constructed considering all possible ground executable actions and all possible input configurations (s.t. the resulting state satisfies the schema constraints) —> infinite-state transition system

The Good…RAW-SYS are:

• Markovian: Next state only depends on the current state + input. Two states with identical DBs are bisimilar.

• Generic: FO/SQL (as all query languages) does not distinguish structures which are identical modulo uniform renaming of data objects.

—> Two isomorphic states are bisimilar42

… and the BadReachability undecidable even with a single safe net• Counter —> “size” of a unary relation

• Test counter for zero: check whether counter relation is empty • What matters is the # of tuples, not the actual values • Can be reconstructed also without negation in the queries

43

New

Increment Decrement

State-Boundedness [PODS 2013]

Put a pre-defined bound on the DB size (not the size of the data domain!)

• Resulting transition system: still infinite-state • But: infinitely-many encountered values along a

run cannot be “accumulated” in a single state44

RAW-SYS, Boundedness, and Reachability

Reachability undecidable as soon as one of the following conditions holds:

• Shared DB with unbounded size

• Local DB with unbounded size

• Unboundedly many simultaneously running cases

What happens if all these three sources are “bounded in size”?

45

Magic!

46

First-ordertemporal formula

(FO-CTL or FO-LTL withpersistent quantification)

|= �Infinite-state

transition system

Magic!

47

First-ordertemporal formula

(FO-CTL or FO-LTL withpersistent quantification)

|= �Infinite-state

transition system

|= � Propositionaltemporal formula

Finite-stateabstraction

Magic!

48

First-ordertemporal formula

(FO-CTL or FO-LTL withpersistent quantification)

|= �Infinite-state

transition system

|= � Propositionaltemporal formula

‘If and only if

Finite-stateabstraction

Towards Implementations• [IJCAI 2015] Planning can be lifted to deal with this

infinite-state setting • Ongoing implementation effort using DLVk and

state-of-the-art ADL planners

• [SEBD 2015, AMW 2015] Ongoing effort for implementing model checking techniques based on our abstraction natively in relational technology

• Goal: combine the best of databases and formal methods

49

Process Mining

50 picture by Wil van der Aalst

Process Mining

51 picture by Wil van der Aalst

Expected Reality

52

log

traceevent

Expected Reality

XES standard for event logs

53

<logxes.version="1.0"xes.features="nested-attributes"><trace><stringkey=“concept:name”value=“1”/><event> <stringkey=“concept:name”value=“registerrequest”/> <datekey=“time:timestamp”value=“2010-12-30T11:02:00.000+01:00”/></event></trace>

Actual Reality

54

Actual Reality

55

PaperInfo

Understanding Reality…

56

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

uploadaccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

From here…

Impedance Mismatch

57

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

uploadaccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

…to there!

Impedance Mismatch

58

Key Issues

• How to resolve the “impedance mismatch”?

• How to get a “view” of the data tailored to process mining?

59

PaperInfo

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

uploadaccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

Impedance Mismatch is Really an Issue

Crompton (2008): domain experts loose too much time to big into data and turn them into knowledge

• Engineers in the oil/gas industry: 30-70% of their working time spent for data searching and data quality

60

Optique

Scalable, End-User Access to Big Data

• http://optique-project.eu

• Goal: engineering techniques for enabling end-users accessing data through domain ontologies

• Case studies: Statoil, Siemens

61

Facts on Statoil• 1000 TB of data inside relational DBMSs

• Schemas not aligned

• More than 2000 tables, in a plethora of different DBs

• 900 experts part of “Statoil Exploration” • Up to 4 days to formulate queries and encode

them in SQL

62

Query Example

63

OBDI framework Query answering Ontology languages Mappings Identity Conclusions

How much time/money is spent searching for data?

A user query at Statoil

Show all norwegian wellbores with some aditional attributes(wellbore id, completion date, oldest penetrated age,result). Limitto all wellbores with a core and show attributes like (wellbore id,core number, top core depth, base core depth, intersectingstratigraphy). Limit to all wellbores with core in Brentgruppen andshow key atributes in a table. After connecting to EPDS (slegge)we could for instance limit futher to cores in Brent with measuredpermeability and where it is larger than a given value, for instance 1mD. We could also find out whether there are cores in Brent whichare not stored in EPDS (based on NPD info) and where there couldbe permeability values. Some of the missing data we possibly own,other not.

At Statoil, it takes up to 4 days to formulate a query in SQL.

Statoil loses up to 50.000.000e per year because of this!!

Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)

64

OBDI framework Query answering Ontology languages Mappings Identity Conclusions

How much time/money is spent searching for data?

A user query at Statoil

Show all norwegian wellbores with some aditional attributes(wellbore id, completion date, oldest penetrated age,result). Limitto all wellbores with a core and show attributes like (wellbore id,core number, top core depth, base core depth, intersectingstratigraphy). Limit to all wellbores with core in Brentgruppen andshow key atributes in a table. After connecting to EPDS (slegge)we could for instance limit futher to cores in Brent with measuredpermeability and where it is larger than a given value, for instance 1mD. We could also find out whether there are cores in Brent whichare not stored in EPDS (based on NPD info) and where there couldbe permeability values. Some of the missing data we possibly own,other not.

SELECT [...]FROMdb_name.table1 table1,db_name.table2 table2a,db_name.table2 table2b,db_name.table3 table3a,db_name.table3 table3b,db_name.table3 table3c,db_name.table3 table3d,db_name.table4 table4a,db_name.table4 table4b,db_name.table4 table4c,db_name.table4 table4d,db_name.table4 table4e,db_name.table4 table4f,db_name.table5 table5a,db_name.table5 table5b,db_name.table6 table6a,db_name.table6 table6b,db_name.table7 table7a,db_name.table7 table7b,db_name.table8 table8,db_name.table9 table9,db_name.table10 table10a,db_name.table10 table10b,db_name.table10 table10c,db_name.table11 table11,db_name.table12 table12,db_name.table13 table13,db_name.table14 table14,db_name.table15 table15,db_name.table16 table16WHERE [...]

table2a.attr1=‘keyword’ ANDtable3a.attr2=table10c.attr1 ANDtable3a.attr6=table6a.attr3 ANDtable3a.attr9=‘keyword’ ANDtable4a.attr10 IN (‘keyword’) ANDtable4a.attr1 IN (‘keyword’) ANDtable5a.kinds=table4a.attr13 ANDtable5b.kinds=table4c.attr74 ANDtable5b.name=‘keyword’ AND(table6a.attr19=table10c.attr17 OR(table6a.attr2 IS NULL ANDtable10c.attr4 IS NULL)) ANDtable6a.attr14=table5b.attr14 ANDtable6a.attr2=‘keyword’ AND(table6b.attr14=table10c.attr8 OR(table6b.attr4 IS NULL ANDtable10c.attr7 IS NULL)) ANDtable6b.attr19=table5a.attr55 ANDtable6b.attr2=‘keyword’ ANDtable7a.attr19=table2b.attr19 ANDtable7a.attr17=table15.attr19 ANDtable4b.attr11=‘keyword’ ANDtable8.attr19=table7a.attr80 ANDtable8.attr19=table13.attr20 ANDtable8.attr4=‘keyword’ ANDtable9.attr10=table16.attr11 ANDtable3b.attr19=table10c.attr18 ANDtable3b.attr22=table12.attr63 ANDtable3b.attr66=‘keyword’ ANDtable10a.attr54=table7a.attr8 ANDtable10a.attr70=table10c.attr10 ANDtable10a.attr16=table4d.attr11 ANDtable4c.attr99=‘keyword’ ANDtable4c.attr1=‘keyword’ AND

table11.attr10=table5a.attr10 ANDtable11.attr40=‘keyword’ ANDtable11.attr50=‘keyword’ ANDtable2b.attr1=table1.attr8 ANDtable2b.attr9 IN (‘keyword’) ANDtable2b.attr2 LIKE ‘keyword’% ANDtable12.attr9 IN (‘keyword’) ANDtable7b.attr1=table2a.attr10 ANDtable3c.attr13=table10c.attr1 ANDtable3c.attr10=table6b.attr20 ANDtable3c.attr13=‘keyword’ ANDtable10b.attr16=table10a.attr7 ANDtable10b.attr11=table7b.attr8 ANDtable10b.attr13=table4b.attr89 ANDtable13.attr1=table2b.attr10 ANDtable13.attr20=’‘keyword’’ ANDtable13.attr15=‘keyword’ ANDtable3d.attr49=table12.attr18 ANDtable3d.attr18=table10c.attr11 ANDtable3d.attr14=‘keyword’ ANDtable4d.attr17 IN (‘keyword’) ANDtable4d.attr19 IN (‘keyword’) ANDtable16.attr28=table11.attr56 ANDtable16.attr16=table10b.attr78 ANDtable16.attr5=table14.attr56 ANDtable4e.attr34 IN (‘keyword’) ANDtable4e.attr48 IN (‘keyword’) ANDtable4f.attr89=table5b.attr7 ANDtable4f.attr45 IN (‘keyword’) ANDtable4f.attr1=‘keyword’ ANDtable10c.attr2=table4e.attr19 AND(table10c.attr78=table12.attr56 OR(table10c.attr55 IS NULL ANDtable12.attr17 IS NULL))

At Statoil, it takes up to 4 days to formulate a query in SQL.

Statoil loses up to 50.000.000e per year because of this!!

Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)

65

OBDI framework Query answering Ontology languages Mappings Identity Conclusions

How much time/money is spent searching for data?

A user query at Statoil

Show all norwegian wellbores with some aditional attributes(wellbore id, completion date, oldest penetrated age,result). Limitto all wellbores with a core and show attributes like (wellbore id,core number, top core depth, base core depth, intersectingstratigraphy). Limit to all wellbores with core in Brentgruppen andshow key atributes in a table. After connecting to EPDS (slegge)we could for instance limit futher to cores in Brent with measuredpermeability and where it is larger than a given value, for instance 1mD. We could also find out whether there are cores in Brent whichare not stored in EPDS (based on NPD info) and where there couldbe permeability values. Some of the missing data we possibly own,other not.

SELECT [...]FROMdb_name.table1 table1,db_name.table2 table2a,db_name.table2 table2b,db_name.table3 table3a,db_name.table3 table3b,db_name.table3 table3c,db_name.table3 table3d,db_name.table4 table4a,db_name.table4 table4b,db_name.table4 table4c,db_name.table4 table4d,db_name.table4 table4e,db_name.table4 table4f,db_name.table5 table5a,db_name.table5 table5b,db_name.table6 table6a,db_name.table6 table6b,db_name.table7 table7a,db_name.table7 table7b,db_name.table8 table8,db_name.table9 table9,db_name.table10 table10a,db_name.table10 table10b,db_name.table10 table10c,db_name.table11 table11,db_name.table12 table12,db_name.table13 table13,db_name.table14 table14,db_name.table15 table15,db_name.table16 table16WHERE [...]

table2a.attr1=‘keyword’ ANDtable3a.attr2=table10c.attr1 ANDtable3a.attr6=table6a.attr3 ANDtable3a.attr9=‘keyword’ ANDtable4a.attr10 IN (‘keyword’) ANDtable4a.attr1 IN (‘keyword’) ANDtable5a.kinds=table4a.attr13 ANDtable5b.kinds=table4c.attr74 ANDtable5b.name=‘keyword’ AND(table6a.attr19=table10c.attr17 OR(table6a.attr2 IS NULL ANDtable10c.attr4 IS NULL)) ANDtable6a.attr14=table5b.attr14 ANDtable6a.attr2=‘keyword’ AND(table6b.attr14=table10c.attr8 OR(table6b.attr4 IS NULL ANDtable10c.attr7 IS NULL)) ANDtable6b.attr19=table5a.attr55 ANDtable6b.attr2=‘keyword’ ANDtable7a.attr19=table2b.attr19 ANDtable7a.attr17=table15.attr19 ANDtable4b.attr11=‘keyword’ ANDtable8.attr19=table7a.attr80 ANDtable8.attr19=table13.attr20 ANDtable8.attr4=‘keyword’ ANDtable9.attr10=table16.attr11 ANDtable3b.attr19=table10c.attr18 ANDtable3b.attr22=table12.attr63 ANDtable3b.attr66=‘keyword’ ANDtable10a.attr54=table7a.attr8 ANDtable10a.attr70=table10c.attr10 ANDtable10a.attr16=table4d.attr11 ANDtable4c.attr99=‘keyword’ ANDtable4c.attr1=‘keyword’ AND

table11.attr10=table5a.attr10 ANDtable11.attr40=‘keyword’ ANDtable11.attr50=‘keyword’ ANDtable2b.attr1=table1.attr8 ANDtable2b.attr9 IN (‘keyword’) ANDtable2b.attr2 LIKE ‘keyword’% ANDtable12.attr9 IN (‘keyword’) ANDtable7b.attr1=table2a.attr10 ANDtable3c.attr13=table10c.attr1 ANDtable3c.attr10=table6b.attr20 ANDtable3c.attr13=‘keyword’ ANDtable10b.attr16=table10a.attr7 ANDtable10b.attr11=table7b.attr8 ANDtable10b.attr13=table4b.attr89 ANDtable13.attr1=table2b.attr10 ANDtable13.attr20=’‘keyword’’ ANDtable13.attr15=‘keyword’ ANDtable3d.attr49=table12.attr18 ANDtable3d.attr18=table10c.attr11 ANDtable3d.attr14=‘keyword’ ANDtable4d.attr17 IN (‘keyword’) ANDtable4d.attr19 IN (‘keyword’) ANDtable16.attr28=table11.attr56 ANDtable16.attr16=table10b.attr78 ANDtable16.attr5=table14.attr56 ANDtable4e.attr34 IN (‘keyword’) ANDtable4e.attr48 IN (‘keyword’) ANDtable4f.attr89=table5b.attr7 ANDtable4f.attr45 IN (‘keyword’) ANDtable4f.attr1=‘keyword’ ANDtable10c.attr2=table4e.attr19 AND(table10c.attr78=table12.attr56 OR(table10c.attr55 IS NULL ANDtable12.attr17 IS NULL))

At Statoil, it takes up to 4 days to formulate a query in SQL.

Statoil loses up to 50.000.000e per year because of this!!

Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)

50.000.000 €/year

Ontology-Based Data Access

66

OBDI framework Query answering Ontology languages Mappings Identity Conclusions

Ontology-based data integration framework

. . .

. . .

. . .

. . .

Query

Result

Ontologyprovides

global vocabulary

and

conceptual view

Mappingssemantically link

sources and

ontology

Data Sourcesexternal and

heterogeneous

We achieve logical transparency in accessing data:

does not know where and how the data is stored.

can only see a conceptual view of the data.

Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (7/52)

data sources

“lightweight” conceptual model

mapping

Ontop• Open-source OBDA technology developed at

UNIBZ (supervisor: Diego Calvanese)

• Fully supports semantic web standards (OWL/SPARQL)

• Integrates with a plethora of relational DBMSs

• Apache open license

• http://ontop.inf.unibz.it

67

Resolving the Impedance Mismatch

68

PaperInfo

FullPapercreationTime: DateTime title: String

mappingId fp-mapping

target paper{ID} a :FullPaper; :title {Title}; :creationTime{CT}

source select I.ID, I.Title, I.CT from PaperInfo I where I.Type = “FP”

What if my DB is Very Nice?

Ontology bootstrapping automatically creates

• a conceptual model that mirrors 1-1 the relational DB

• identity mappings

Useful for “small” case studies

69

OBDA for Process Mining• Need to resolve a second impedance mismatch

problem!

• From here…

70

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

uploadaccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

OBDA for Process Mining• …To there!

71

-xes.version:xs:decimal-xes.features:xs:token

Log

-name:xs:string-keys:xs:string

Classifier-prefix:xs:string-uri:xs:string-key:xs:string-type:xs:string

Extension

Trace

Event

-key:xs:string-value:xs:string-type:xs:string

Attribute

-scope:{event,trace}

GlobalAttribute

0..*

-declare1

1

-define1

1 1..*

-contain11

0..*

-contain21

0..*

-contain31

0..*

-contain5

1

0..*

-define21..*

1..*

-declare20..1

1

-contain4

1

0..*

-contain60..*

*

OBDA for Process Mining• From here…

72

PaperInfo

OBDA for Process Mining• …To there!

73

log

traceevent

Log Annotations

74

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

UploadAccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

trace

event

event

eventevent

trace:followhasactivityname:“decision”timestamp:decisiontime

resource:followbytype:complete

attributes:outcome

trace:followhas&foractivityname:“review”

timestamp:submissiontimeresource:followRhasR&reviewer

type:complete

trace:followhasactivityname:“uploadsubmitted”

timestamp:uploadtimeresource:followUSuploadbyU

type:complete

trace:followhas&corr.toactivityname:“uploadaccepted”

timestamp:uploadtimeresource:followUAuploadbyU

type:complete

submittedto=BPM2015

75

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

UploadAccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

trace

event

event

eventevent

trace:followhasactivityname:“decision”timestamp:decisiontime

resource:followbytype:complete

attributes:outcome

trace:followhas&foractivityname:“review”

timestamp:submissiontimeresource:followRhasR&reviewer

type:complete

trace:followhasactivityname:“uploadsubmitted”

timestamp:uploadtimeresource:followUSuploadbyU

type:complete

trace:followhas&corr.toactivityname:“uploadaccepted”

timestamp:uploadtimeresource:followUAuploadbyU

type:complete

submittedto=BPM2015

Multiple Log Views

76

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

UploadAccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

trace

event

trace:followhasauthoractivityname:“decisionauthor”

timestamp:decisiontimeresource:followPhasD

type:complete

eventtrace:followby

activityname:“decisionchair”timestamp:decisiontimeresource:followPhasD

type:completeattributes:outcome

eventtrace:followhas&revieweractivityname:“review”

timestamp:submissiontimeresource:followRhasR&for

type:complete

eventtrace:followuploadby

activityname:“uploadsubmitted”timestamp:uploadtimeresource:followUhasP

type:complete

77

1..*

*

Conferencecreationtime:DateTime

confname:String

Usercreationtime:DateTime

username:String

Papercreationtime:DateTime

title:String

ReviewRequestinvitationtime:DateTime

Reviewsubmissiontime:DateTime

Decisiondecisiontime:DateTime

outcome:Bool

UploadSubmitteduploadtime:DateTime

UploadAccepteduploadtime:DateTime

submittedto

1

*

organizerof

AcceptedPaper<<notime>>

*

reviewer

1

0..1

PhasD

1

0..1

RhasR

1

10..1 correspondsto

*

UhasP

1

*

AhasU

1

*1 for

author

1..*

*

by

1

*

USuploadbyU

creator

1

*

1*

UAuploadbyU

1

*

trace

event

trace:followhasauthoractivityname:“decisionauthor”

timestamp:decisiontimeresource:followPhasD

type:complete

eventtrace:followby

activityname:“decisionchair”timestamp:decisiontimeresource:followPhasD

type:completeattributes:outcome

eventtrace:followhas&revieweractivityname:“review”

timestamp:submissiontimeresource:followRhasR&for

type:complete

eventtrace:followuploadby

activityname:“uploadsubmitted”timestamp:uploadtimeresource:followUhasP

type:complete

And Now?

78

database

database

database

DomainOntology EventOntology

XESLogExtraction

Mapping

Annota+on

?

?

Mapping Synthesis

79

database

database

database

DomainOntology EventOntology

XESLogExtraction

Mapping

Annota+on

?

?

Automatically synthesized

1. Annotation transformed into an ontology-to-ontology mapping M’

2. M’ is “rewritten” using the data-to-domain ontology mapping

3. The result is a mapping connecting the XES ontology directly to the data

Log Materialization

80

database

database

database

DomainOntology EventOntology

Mapping

Annotation

LogMapping

XESEventData

XESFile

ProcessMiningTools

1

2 3 4

SELECT DISTINCT ?t ?v ?e WHERE {?t :TcontainsA ?ta . ?ta :valueA ?v.

?t :TcontainsE ?e.} SELECT DISTINCT ?e ?t WHERE {?e :EcontainsA ?a . ?a :typeA ?t.} SELECT DISTINCT ?e ?t WHERE {?e :EcontainsA ?a . ?a :keyA ?t.} SELECT DISTINCT ?e ?t WHERE {?e :EcontainsA ?a . ?a :valueA ?t.}

81

82

Log Virtualization

83

database

database

database

DomainOntology EventOntology

Mapping

AnnotationProcess

MiningTools

OnDe

mand

XESLoader

LogMapping

1

2

XFactoryOnDemandImpl XLogOnDemandImpl XTraceOnDemandImpl XEventOnDemandImpl XLogOnDemandIterator XTraceOnDemandIterator

xlog.get(7).get(90) toretrieveteeventinindex7thinsidethe90thtraceinalog

Questions• How to optimize and test the scalability of the

approach? Fine-tuning is a must!

• Real vs simulated data? (Benchmarking OBDA)

• Initial benchmarking using CPN tools

• Is the “virtual” approach useful? How do process mining algorithms access the data?

• Hybrid virtual approach with caching strategies?84

KAOS ProjectKnowledge-Aware Operational Support

• Goal: Empowering process mining and online operational support with domain knowledge

• Euregio project: Trento + Bolzano + Innsbruck

• Mix of expertise from AI, BPM, database theory, formal methods, formal ontology, conceptual modeling, process mining, machine learning, software engineering

• Just started: we are hiring!!!

85

86

Conclusion

AcknowledgmentsAll coauthors of this research,

in particular

Diego Calvanese (UNIBZ)Giuseppe De Giacomo (UNIROMA)Riccardo De Masellis (FBK-Trento)

Alin Deutsch (UCSD)Chiara Difrancescomarino (FBK-Trento)

Chiara Ghidini (FBK-Trento)Fabio Patrizi (UNIBZ)

Sergio Tessaris (UNIBZ)Alifah Syamsiyah (TU/e)Wil van der Aalst (TU/e)

87