Semantic Meta-Mining of Knowledge Discovery Processes

Agnieszka Lawrynowicz, in collaboration with Jedrzej Potoniec, Maria C. Keet, Melanie Hilario, Claudia d’Amato, Raul Palma and others (see acknowledgements). Poznan University of Technology. June 11, 2015, ADAA Seminar, Silesian University of Technology.

Transcript of Semantic Meta-Mining of Knowledge Discovery Processes

Page 1: Semantic Meta-Mining of Knowledge Discovery Processes

Semantic Meta-Mining of Knowledge Discovery Processes

Agnieszka Lawrynowicz, in collaboration with Jedrzej Potoniec, Maria C. Keet, Melanie Hilario, Claudia d’Amato, Raul Palma and others - see acknowledgements

Poznan University of Technology

June 11, 2015, ADAA Seminar

Silesian University of Technology


Page 2: Semantic Meta-Mining of Knowledge Discovery Processes

Outline

Semantic data mining

Pattern discovery with Fr-ONT-Qu

Meta-mining of KD processes
▸ e-LICO Intelligent Discovery Assistant
▸ Data Mining OPtimization Ontology
▸ Semantic meta-mining

Summary and future work


Page 3: Semantic Meta-Mining of Knowledge Discovery Processes

Introduction: data mining

Input: a data table, text documents, ...

Output: a model, a pattern set

[Figure: data → DATA MINING → model, patterns]


Page 4: Semantic Meta-Mining of Knowledge Discovery Processes

Introduction: using background knowledge in data mining

Using background knowledge in data mining has been extensively researched

hierarchy/taxonomy of attributes (Michalski et al., 1986; Srikant and Agrawal, 1995)

Inductive Logic Programming (Muggleton, 1991; Lavrac and Dzeroski, 1994)

relational learning (Quinlan, 1993; de Raedt, 2008)

semantic data mining tutorial @ ECML/PKDD’2011 (Lavrac, Vavpetic, Lawrynowicz, Potoniec, Hilario, Kalousis)


Page 5: Semantic Meta-Mining of Knowledge Discovery Processes

Introduction: relational data mining

Input: a relational database, a graph, a set of logical facts, ...

Output: a model, a pattern set

[Figure: RELATIONAL DATA MINING → model, patterns]


Page 6: Semantic Meta-Mining of Knowledge Discovery Processes

Semantic data mining

Input:

a data table, text documents, Web pages, a relational database, a graph, a set of logical facts, ...

one or more ontologies

Output: a model, a pattern set

[Figure: data and ontologies (linked by annotations, mappings, vocabulary re-use) → SEMANTIC DATA MINING → model, patterns]


Page 7: Semantic Meta-Mining of Knowledge Discovery Processes

Fr-ONT-Qu

algorithm for mining patterns in RDF(S) data

patterns expressed as SPARQL queries

consists of: a refinement operator and a strategy to select the best patterns for further refinement


Page 8: Semantic Meta-Mining of Knowledge Discovery Processes

Overview

Input of the algorithm:

a declarative bias (B) to limit the search space (i.e. classes and properties to use) and a maximal number of iterations

2 thresholds: one for keeping good enough patterns and one for refining the best patterns

several quality measures to choose from for the thresholds (e.g. support on the KB)

beam search size


Page 9: Semantic Meta-Mining of Knowledge Discovery Processes

Example

B: classes: PassengerTrain, CargoTrain; property: hasEngine

1 Refine every pattern from the previous iteration by adding a single restriction for a variable already existing in the pattern. E.g. for the pattern {?x a :Train .}, its refinements are:

▸ {?x a :Train . ?x a :CargoTrain .}
▸ {?x a :Train . ?x a :PassengerTrain .}
▸ {?x a :Train . ?x :hasEngine ?y .}

2 Evaluate the patterns (with some quality measure, such as support on a data set) and select only the best ones

3 Repeat steps 1-2 as long as there are patterns to refine and the maximal number of iterations is not exceeded
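
The three steps above can be sketched in Python. This is only an illustration under stated assumptions, not the Fr-ONT-Qu implementation: patterns are modelled as frozensets of triple strings, the toy `DATA` dictionary and the `matches` helper are hypothetical stand-ins for an RDF store, support is the fraction of trains a pattern covers, and a single threshold plus a beam width replace the two thresholds mentioned earlier.

```python
# Toy knowledge base (hypothetical): each train maps to its classes and to
# whether it has an engine. A real run would query an RDF store instead.
DATA = {
    "t1": {"classes": {"Train", "CargoTrain"}, "hasEngine": True},
    "t2": {"classes": {"Train", "PassengerTrain"}, "hasEngine": True},
    "t3": {"classes": {"Train", "CargoTrain"}, "hasEngine": False},
}

# Declarative bias B from the example: two classes and one property.
BIAS = ["?x a :CargoTrain", "?x a :PassengerTrain", "?x :hasEngine ?y"]


def matches(train, restriction):
    """Check one restriction against one train of the toy data."""
    if restriction == "?x :hasEngine ?y":
        return DATA[train]["hasEngine"]
    cls = restriction.split(":")[1]  # "?x a :CargoTrain" -> "CargoTrain"
    return cls in DATA[train]["classes"]


def support(pattern):
    """Fraction of trains that satisfy every restriction of the pattern."""
    covered = [t for t in DATA if all(matches(t, r) for r in pattern)]
    return len(covered) / len(DATA)


def refine(pattern):
    """Step 1: add a single restriction from the bias to the pattern."""
    return {pattern | {r} for r in BIAS if r not in pattern}


def mine(start, min_support=0.3, beam=2, max_iter=3):
    kept, frontier = set(), {start}
    for _ in range(max_iter):  # step 3: bounded number of iterations
        candidates = {q for p in frontier for q in refine(p)} - kept
        # Step 2: evaluate and keep only the good enough patterns.
        good = {q for q in candidates if support(q) >= min_support}
        if not good:
            break
        kept |= good
        # Only the best `beam` patterns are refined in the next iteration.
        frontier = set(sorted(good, key=support, reverse=True)[:beam])
    return kept


patterns = mine(frozenset({"?x a :Train"}))
```

On this toy data the pattern combining all three restrictions is never kept, because no train is both a cargo and a passenger train.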


Page 10: Semantic Meta-Mining of Knowledge Discovery Processes

Trie data structure


Page 11: Semantic Meta-Mining of Knowledge Discovery Processes

Pattern based classification 1/2


Page 12: Semantic Meta-Mining of Knowledge Discovery Processes

Pattern based classification 2/2

We learn features that are optimized with regard to the (classification) task


Page 13: Semantic Meta-Mining of Knowledge Discovery Processes

Propositionalisation 1/2

Patterns
1) ?x a :Train . ?x :hasCar ?y
2) ?x a :Train . ?x :hasCar ?y . ?y :hasShape :rectangle
3) ?x a :Train . ?x :hasCar ?y . ?y :wheels :three
4) …

Dataset (Michalski’s train problem, 1977)


Page 14: Semantic Meta-Mining of Knowledge Discovery Processes

Propositionalisation 2/2

In this way, the learned features may be consumed by any off-the-shelf ’attribute-value’ classification algorithm
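
A minimal sketch of propositionalisation, assuming a hypothetical toy version of the trains data: each pattern becomes one binary column, and each train becomes one row that any attribute-value learner could consume. The `TRAINS` dictionary and the predicate encoding of the patterns are illustrative assumptions, not the format used by Fr-ONT-Qu or RMonto.

```python
# Hypothetical toy data: each train is a list of cars.
TRAINS = {
    "east1": [{"shape": "rectangle", "wheels": 2}, {"shape": "ellipse", "wheels": 3}],
    "west1": [{"shape": "rectangle", "wheels": 2}],
    "west2": [],
}

# Patterns 1-3 written as predicates over the cars of one train.
PATTERNS = [
    lambda cars: len(cars) > 0,                                 # ?x :hasCar ?y
    lambda cars: any(c["shape"] == "rectangle" for c in cars),  # ... ?y :hasShape :rectangle
    lambda cars: any(c["wheels"] == 3 for c in cars),           # ... ?y :wheels :three
]


def propositionalise(trains, patterns):
    """Return one binary feature vector per train, one column per pattern."""
    return {name: [int(p(cars)) for p in patterns] for name, cars in trains.items()}


table = propositionalise(TRAINS, PATTERNS)
```

Each row of `table` can then be paired with a class label and passed to an ordinary classifier.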


Page 15: Semantic Meta-Mining of Knowledge Discovery Processes

What is RapidMiner? 1/2


Page 16: Semantic Meta-Mining of Knowledge Discovery Processes

What is RapidMiner? 2/2


Page 18: Semantic Meta-Mining of Knowledge Discovery Processes

RMonto - plugin to RapidMiner


Page 19: Semantic Meta-Mining of Knowledge Discovery Processes

Comparative experiments on classification of semantic data 1/2

we considered published work with available results and datasets (including ESWC 2008 best paper, ESWC 2012 best paper)

various types of methods: kernel methods, statistical relational classifier, concept learning algorithms

we strictly followed the tasks, protocols and experimental setups of the methods


Page 20: Semantic Meta-Mining of Knowledge Discovery Processes

Comparative experiments on classification of semantic data 2/2

For the classification task, Fr-ONT-Qu outperformed state-of-the-art approaches to classification of Semantic Web data (see: ”Pattern based feature construction in semantic data mining” by A. Lawrynowicz, J. Potoniec, IJSWIS 10(1), 2014):

kernel methods by Bloehdorn et al. (2007) and Loesch et al. (ESWC 2012 best paper) on the SWRC AIFB dataset,

statistical relational classifier SPARQL-ML by Kiefer et al. (ESWC 2008 best paper) on the SWRC AIFB dataset and the OWLS-TC v2.1 dataset,

concept learning algorithms DL-FOIL by Fanizzi et al. (2008) and the cutting-edge CELOE variant of DL-Learner by Lehmann (2009), on all measures on the datasets BioPax, NTN, Financial


Page 21: Semantic Meta-Mining of Knowledge Discovery Processes

Overview of meta-learning

Meta-learning: learning to learn

application of machine learning techniques to meta-data about past machine learning experiments;

the goal: to modify some aspect of the learning process to improve the performance of the resulting model;

meta-mining: meta-learning applied to the full data mining process
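
As a rough illustration of meta-learning, the sketch below recommends an algorithm for a new data set by looking up the most similar past experiment. Everything in it is a hypothetical assumption made for the example: the three meta-data records, the two data set characteristics, and the nearest-neighbour rule. It is not the method used in e-LICO.

```python
import math

# Hypothetical meta-data about past experiments:
# (number of instances, number of attributes) -> algorithm that worked best.
PAST_EXPERIMENTS = [
    ((150, 4), "decision tree"),
    ((60, 2000), "linear SVM"),
    ((100000, 20), "naive Bayes"),
]


def meta_features(n_instances, n_attributes):
    """Describe a data set on a log scale so large sizes do not dominate."""
    return (math.log10(n_instances), math.log10(n_attributes))


def recommend(n_instances, n_attributes):
    """Return the algorithm of the nearest past experiment in meta-feature space."""
    query = meta_features(n_instances, n_attributes)

    def distance(record):
        (n, d), _ = record
        return math.dist(query, meta_features(n, d))

    return min(PAST_EXPERIMENTS, key=distance)[1]


suggestion = recommend(80, 1500)
```

Meta-mining extends this idea from single algorithms to whole workflows, which is where descriptions of operators and workflows become useful as additional meta-features.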


Page 22: Semantic Meta-Mining of Knowledge Discovery Processes

Overview of the e-LICO system

[Excerpt from the e-LICO Final Report, FP7-ICT 231519, section “Main S&T results/foregrounds”]

The scenario outlined in Section 2 yields a set of requirements on the underlying data mining platform. This section presents the different components of the e-LICO architecture (Figure 1) and shows how they interact to achieve the user's knowledge discovery goal.

The e-LICO infrastructure (depicted in the figure under the dashed line) is the means by which the data-mining platform is delivered to scientists. The innovative core of the e-LICO platform is the Intelligent Discovery Assistant (IDA, above the dashed line) with its planner and meta-learner. However, to deliver the data-mining platform to its scientist users, there are several other services and components. Figure 1 shows an overview of e-LICO's components and how they interact with each other.

Figure 1. Overview of the e-LICO system.

There are two user-facing components for the e-LICO platform; these allow scientists to access data-mining operators and/or other data processing services, to compose them into workflows and execute them, collecting the results for interpretation or further analysis. These two central infrastructure components are:

1. RapidMiner: An application that gives access to a wide variety of data-mining operators, together with the means to compose them into workflows.

2. Taverna: A workflow creation and enactment workbench that gives access to arbitrary Web services and many other kinds of services. It is widely used in bioinformatics, but also in many other disciplines.


Page 23: Semantic Meta-Mining of Knowledge Discovery Processes

IDA architecture

[Figure: the INTELLIGENT DISCOVERY ASSISTANT. Components: AI Planner, Probabilistic Ranker, Semantic Meta-Miner, DMEX DB, DM Workflow Ontology (DMWF), DM Optimization Ontology (DMOP). Labelled flows: goal and data, planned workflows, ranked workflows, top ranked workflows, training meta-data, meta-mined model.]


Page 24: Semantic Meta-Mining of Knowledge Discovery Processes

Ontology in computer science

“engineering artefact [...]” (Guarino 98)

“An ontology is a
formal specification → machine interpretation
of a shared → group of people, consensus
conceptualization → abstract model of phenomena, concepts
of a domain of interest” → domain knowledge
(Gruber 93)

Ontology = formal specification of a terminology (from a particular domain)


Page 26: Semantic Meta-Mining of Knowledge Discovery Processes

Data Mining OPtimization Ontology (DMOP)

the primary goal of DMOP is to support all decision-making steps that determine the outcome of the data mining process;

development started in EU FP7 project e-LICO (2009-2012);

DMOP v5.5: 723 classes, 111 properties, 4291 axioms;

highly axiomatized;

represented in Web Ontology Language (OWL 2);


Page 27: Semantic Meta-Mining of Knowledge Discovery Processes

Competency questions

”Given a data mining task/data set, which of the valid or applicable workflows/algorithms will yield optimal results (or at least better results than the others)?”

”Given a set of candidate workflows/algorithms for a given task/data set, which data set/workflow/algorithm characteristics should be taken into account in order to select the most appropriate one?”

and other, more fine-grained ones, e.g.:

”Which induction algorithms should I use (or avoid) when my dataset has many more variables than instances?”


Page 28: Semantic Meta-Mining of Knowledge Discovery Processes

Architecture of DMOP knowledge base and its satellite triple stores

[Figure: layers of the DMOP knowledge base. Boxes: TBox (DMOP), ABox (Operator DB), DMEX-DB1, DMEX-DB2, …, DMEX-DBk. Format labels: OWL 2, RDF, triple store. Content labels: formal conceptual framework of the data mining domain; accepted knowledge of DM tasks, algorithms, operators; specific DM applications (datasets, workflows, results). Role labels: meta-miner’s prior DM knowledge; meta-miner’s training data.]


Page 29: Semantic Meta-Mining of Knowledge Discovery Processes

The core concepts of DMOP

Fig. 1. The core concepts of DMOP.

more than specify their input/output types; only processes called DM-Operations have actual inputs and outputs. A process that executes a DM-Operator also realizes the DM-Algorithm implemented by the operator and achieves the DM-Task addressed by the algorithm. Finally, a DM-Workflow is a complex structure composed of DM operators, a DM-Experiment is a complex process composed of operations (or operator executions). An experiment is described by all the objects that participate in the process: a workflow, data sets used and produced by the different data processing phases, the resulting models, and meta-data quantifying their performance. In the following, the basic elements of DMOP are detailed.

DM Tasks: The top-level DM tasks are defined by their inputs and outputs. A DataProcessingTask receives and outputs data. Its three subclasses produce new data by cleansing (DataCleaningTask), reducing (DataReductionTask), or otherwise transforming the input data (DataTransformationTask). These classes are further articulated in subclasses representing more fine-grained tasks for each category. An InductionTask consumes data and produces hypotheses. It can be either a ModelingTask or a PatternDiscoveryTask, based on whether it generates hypotheses in the form of global models or local pattern sets. Modeling tasks can be predictive (e.g., classification) or descriptive (e.g., clustering), while pattern discovery tasks are further subdivided into classes based on the nature of the extracted patterns: associations, dissociations, deviations, or subgroups. A HypothesisProcessingTask consumes hypotheses and transforms (e.g., rewrites or prunes) them to produce enhanced (less complex or more readable) versions of the input hypotheses.

Data: As the primary resource that feeds the knowledge discovery process, data have been a natural research focus for data miners. Over the past decades meta-learning researchers have actively investigated data characteristics that might explain generalization success or failure. Fig. 2 shows the characteristics associated with the different Data subclasses (shaded boxes). Most of these are statistical measures, such as the number of


Page 30: Semantic Meta-Mining of Knowledge Discovery Processes

DMOP: algorithm representation


Page 31: Semantic Meta-Mining of Knowledge Discovery Processes

Alignment of DMOP with DOLCE 1/3

Two main reasons to align DMOP with a foundational ontology:

considerations about attributes and data properties; extant non-foundational ontology solutions were partial re-inventions of how they are treated in a foundational ontology;

reuse of the ontology’s object properties;


Page 32: Semantic Meta-Mining of Knowledge Discovery Processes

Alignment of DMOP with DOLCE 2/3


Page 33: Semantic Meta-Mining of Knowledge Discovery Processes

Alignment of DMOP with DOLCE 3/3

Perdurant: DM-Experiment and DM-Operation are subclasses of dolce:process;

Endurant: most DM classes, such as algorithm, software, strategy, task, and optimization problem, are subclasses of dolce:non-physical-endurant;

Quality: characteristics and parameters of DM entities are made subclasses of dolce:abstract-quality;

Abstract: for identifying discrete values, classes are added as subclasses of dolce:abstract-region;

object properties: DMOP reuses mainly DOLCE’s parthood, quality, and quale relations;

each of the four main DOLCE branches has been used.


Page 34: Semantic Meta-Mining of Knowledge Discovery Processes

Qualities and attributes 1/4

How to handle ’attributes’ in OWL ontologies, and, in a broader context, measurements?

easy way: an attribute is a binary functional relation between a class and a datatype

Elephant ⊑ =1 hasWeight.integer
Elephant ⊑ =1 hasWeightPrecise.real
Elephant ⊑ =1 hasWeightImperial.integer (in lbs)

drawback: this builds application decisions about how to store the data (and in which unit) into one’s ontology


Page 35: Semantic Meta-Mining of Knowledge Discovery Processes

Qualities and attributes 2/4

How to handle ’attributes’ in OWL ontologies, and, in a broader context, measurements?

more elaborate way: unfold the notion of an object’s property (e.g. weight) from one attribute/OWL data property into at least two properties: one OWL object property from the object to the ’reified attribute’ (a “quality property” represented as an OWL class) and another property to the value(s)

▸ favoured in foundational ontologies;
▸ solves the problem of non-reusability of the ’attribute’ and prevents duplication of data properties;
▸ neither ontology has any solution to represent actual values and units of measurements;

measurements in DMOP are more like values for parameters;


Page 36: Semantic Meta-Mining of Knowledge Discovery Processes

Qualities and attributes 3/4

[Figure: DMOP qualities and attributes aligned with DOLCE. Classes shown: dolce:particular, dolce:non-physical-endurant, dolce:abstract, dolce:quality, dolce:abstract-quality, dolce:region, dolce:abstract-region, dolce:quale, DM-Data, DataTable, DataType, DataFormat, TableFormat, Characteristic, Parameter, DataCharacteristic, anyType. Properties shown: dolce:has-quality, dolce:has-quale, dolce:q-location, has-quality, hasDataType, hasDataValue, hasTableFormat.]


Page 37: Semantic Meta-Mining of Knowledge Discovery Processes

Qualities and attributes 4/4

ModelingAlgorithm ⊑ =1 has-quality.LearningPolicy

LearningPolicy is a dolce:quality

LearningPolicy ⊑ =1 has-quale.Eager-Lazy

Eager-Lazy is a subclass of dolce:abstract-region

Eager-Lazy ⊑ ≤ 1 hasDataValue.anyType

In this way, the ontology can be linked to many different applications, which may even use different data types, yet still agree on the meaning of the characteristics and parameters (’attributes’) of the algorithms, tasks, and other DM endurants.
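
The chain of axioms above (entity, then quality, then quale, then data value) can be illustrated with plain Python tuples standing in for RDF triples. The individual names such as `C4.5-learning-policy` and the stored values are hypothetical; only the property names has-quality, has-quale and hasDataValue come from the slide.

```python
# Hypothetical instance data following the has-quality / has-quale pattern.
TRIPLES = {
    ("C4.5", "has-quality", "C4.5-learning-policy"),
    ("C4.5-learning-policy", "has-quale", "eager"),
    ("eager", "hasDataValue", "Eager"),
    ("kNN", "has-quality", "kNN-learning-policy"),
    ("kNN-learning-policy", "has-quale", "lazy"),
    ("lazy", "hasDataValue", "Lazy"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def data_value(entity):
    """Follow entity -> quality -> quale -> data value; None if the chain breaks."""
    for quality in objects(entity, "has-quality"):
        for quale in objects(quality, "has-quale"):
            for value in objects(quale, "hasDataValue"):
                return value
    return None
```

Because the value sits at the end of the chain, an application could store it with a different datatype without touching the quality or the quale.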


Page 38: Semantic Meta-Mining of Knowledge Discovery Processes

Meta-modeling in DMOP 1/4

only processes (executions of workflows) and operations (executions of operators) consume inputs and produce outputs

DM algorithms (as well as operators and workflows) can only specify the type of input or output

inputs and outputs (DM-Dataset and DM-Hypothesis class hierarchy, respectively) are modeled as subclasses of IO-Object class


Page 39: Semantic Meta-Mining of Knowledge Discovery Processes

Meta-modeling in DMOP 2/4

DM algorithms: classes or individuals? Individuals.

Problem: expressing types of inputs/outputs associated with an algorithm

”C4.5 specifiesInputClass CategoricalLabeledDataSet” ✗

C4.5: Individual (instance of DM-Algorithm)
CategoricalLabeledDataSet: Class (subclass of DM-Hypothesis)


Page 40: Semantic Meta-Mining of Knowledge Discovery Processes

Meta-modeling in DMOP 3/4

Initial solution: one artificial class per algorithm, with a single instance corresponding to this particular algorithm

Problem: hasInput, hasOutput, specifiesInputClass, specifiesOutputClass are all assigned a common range, IO-Object

”C4.5 specifiesInputClass Iris” ?

C4.5: Individual (instance of DM-Algorithm)
Iris: Individual (instance of DM-Hypothesis)

Iris is a concrete dataset. Clearly, no DM algorithm is designed to handle only one particular dataset.


Page 41: Semantic Meta-Mining of Knowledge Discovery Processes

Meta-modeling in DMOP 4/4

Final solution: weak form of punning available in OWL 2

IO-Class: meta-class, the class of all classes of input and output objects

”C4.5 specifiesInputClass CategoricalLabeledDataSet” ✓

C4.5: Individual (instance of DM-Algorithm)
CategoricalLabeledDataSet: Individual (instance of IO-Class)

”DM-Process hasInput some CategoricalLabeledDataSet” ✓

DM-Process: Class (subclass of dolce:process)
CategoricalLabeledDataSet: Class (subclass of IO-Object)
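
The effect of punning can be shown with a small sketch in which the same name appears both as an individual and as a class. The triples below are a hypothetical illustration built from the names on this slide (the `Iris` typing triple is an added assumption), and `roles` is a helper invented for the example; it does not reproduce how an OWL 2 reasoner treats punned names.

```python
# The same name, CategoricalLabeledDataSet, is used in two roles.
TRIPLES = [
    ("C4.5", "rdf:type", "DM-Algorithm"),
    ("CategoricalLabeledDataSet", "rdf:type", "IO-Class"),          # as an individual
    ("CategoricalLabeledDataSet", "rdfs:subClassOf", "IO-Object"),  # as a class
    ("C4.5", "specifiesInputClass", "CategoricalLabeledDataSet"),
    ("Iris", "rdf:type", "CategoricalLabeledDataSet"),              # assumed for illustration
]


def roles(name):
    """Report whether a name is used as an individual, as a class, or both."""
    found = set()
    for s, p, o in TRIPLES:
        if s == name and p == "rdf:type":
            found.add("individual")  # the name is typed, so it acts as an individual
        if (s == name and p == "rdfs:subClassOf") or (o == name and p == "rdf:type"):
            found.add("class")       # the name has superclasses or instances
    return found
```

Here `C4.5` can point at `CategoricalLabeledDataSet` as an individual, while `Iris` remains an ordinary instance of the class with the same name.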


Page 42: Semantic Meta-Mining of Knowledge Discovery Processes

DMOP: further details

Data Mining Optimization Ontology. C. Maria Keet, Agnieszka Lawrynowicz, Claudia d’Amato, Alexandros Kalousis, Phong Nguyen, Raul Palma, Robert Stevens, and Melanie Hilario, Journal of Web Semantics, DOI: 10.1016/j.websem.2015.01.001


Page 43: Semantic Meta-Mining of Knowledge Discovery Processes

Recap: Propositionalisation

Patterns

1) ?x a :Train . ?x :hasCar ?y
2) ?x a :Train . ?x :hasCar ?y . ?y :hasShape :rectangle
3) ?x a :Train . ?x :hasCar ?y . ?y :wheels :three
4) …

Dataset (Michalski’s train problem, 1977)
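
As a rough illustration of the recap, the sketch below turns patterns into binary features. The three trains are hypothetical toy data, not Michalski's original examples, and the patterns are written as plain Python predicates instead of SPARQL queries.

```python
# Hypothetical toy data in the spirit of the trains problem:
# each train is described by the list of its cars.
TRAINS = {
    "t1": [{"shape": "rectangle", "wheels": 2}],
    "t2": [{"shape": "oval", "wheels": 3}],
    "t3": [],
}

# Patterns 1-3 from the slide, rewritten as predicates over a train's cars.
PATTERNS = [
    lambda cars: len(cars) > 0,                                  # 1) has a car
    lambda cars: any(c["shape"] == "rectangle" for c in cars),  # 2) has a rectangular car
    lambda cars: any(c["wheels"] == 3 for c in cars),           # 3) has a car with three wheels
]

def propositionalise(trains, patterns):
    """Map every example to a 0/1 feature vector, one feature per pattern."""
    return {name: [int(p(cars)) for p in patterns] for name, cars in trains.items()}

features = propositionalise(TRAINS, PATTERNS)
```

Each row of `features` can then be fed to an ordinary attribute-value learner, which is the point of propositionalisation.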

Page 44: Semantic Meta-Mining of Knowledge Discovery Processes

RapidMiner XML-based workflow representation

Page 45: Semantic Meta-Mining of Knowledge Discovery Processes

Importing RapidMiner workflows to DMOP-based RDF format

Page 46: Semantic Meta-Mining of Knowledge Discovery Processes

Propositionalisation

Workflow patterns

Dataset

DMOP-based RDF repository of DM processes

Results of experiments. Below we present the results of experimental evaluation of Fr-ONT-Qu in the meta-mining scenario. In the experiments, we used OWLIM SE (v5.3.5849) as an underlying reasoning engine and a semantic store with the owl2-rl-reduced-optimized ruleset. The choice of such a ruleset was motivated by the expressivity of our background knowledge base, e.g. existence of object property chains. During each cycle of cross-validation, Fr-ONT-Qu discovered around 2000 patterns, and redundant patterns were subsequently pruned. We discuss some of the discovered patterns below (for compactness denoting by Bd the body of the base pattern used in the experiments). The first example pattern:

Q1 = select distinct ?x where { Bd ∪
  ?opex2 dmop:executes ?front0 .
  ?opex2 dmop:executes rm:RM-Decision_Tree .
  ?opex2 dmop:hasParameterSetting ?front1 .
  ?front0 dmop:executes rm:DM-Operator .
  ?front0 dmop:implements ?front2 .
  ?front2 a dmop:DM-Algorithm .
  ?front2 a dmop:InductionAlgorithm .
  ?front2 a dmop:ModelingAlgorithm .
  ?front2 a dmop:ClassificationModelingAlgorithm .
  ?front2 a dmop:ClassificationTreeInductionAlgorithm .
}

was mined when Fr-ONT-Qu traversed down the algorithm class hierarchy, specializing the variable ?front2. In this way, it is possible to abstract from the level of operators (algorithm implementations) to the level of algorithms and their taxonomy. For instance, both the rm:RM-Decision_Tree and weka:Weka-J48 operators implement a classification tree induction algorithm, and one may generalize over it. The patterns containing class hierarchies provide expressivity similar to that of patterns mined in so-called generalized association rule mining.
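
The effect of generalizing from operators to algorithm classes can be sketched in Python as follows. The linear class chain is an assumption read off the order of the class atoms in Q1, the list of workflows is hypothetical, and `classes_of` is an illustrative helper, not part of Fr-ONT-Qu.

```python
from collections import Counter

# Assumed linear chain of algorithm classes (child -> parent),
# read off the order in which the classes appear in Q1.
PARENT = {
    "ClassificationTreeInductionAlgorithm": "ClassificationModelingAlgorithm",
    "ClassificationModelingAlgorithm": "ModelingAlgorithm",
    "ModelingAlgorithm": "InductionAlgorithm",
    "InductionAlgorithm": "DM-Algorithm",
}

# Operator -> implemented algorithm class, as stated in the text.
IMPLEMENTS = {
    "rm:RM-Decision_Tree": "ClassificationTreeInductionAlgorithm",
    "weka:Weka-J48": "ClassificationTreeInductionAlgorithm",
}

def classes_of(operator):
    """All algorithm classes an operator falls under, most specific first."""
    cls = IMPLEMENTS[operator]
    chain = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain

# Hypothetical workflows, each reduced to the modeling operator it uses.
WORKFLOWS = ["rm:RM-Decision_Tree", "weka:Weka-J48", "rm:RM-Decision_Tree"]

# Support counted at the operator level and at the class level.
operator_support = Counter(WORKFLOWS)
class_support = Counter(c for op in WORKFLOWS for c in classes_of(op))
```

At the operator level the three workflows split into two groups, while at the class level all three support ClassificationTreeInductionAlgorithm, which is why a pattern over the class can cover more workflows than a pattern over a single operator.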

The following pattern covers only those workflows that contain the ‘Decision Tree’ operator, for which the parameter minimal size for split has a value between 2 and 5.5:

Q2 = select distinct ?x where { Bd ∪
  ?opex2 dmop:executes ?front0 .
  ?opex2 dmop:executes rm:RM-Decision_Tree .
  ?opex2 dmop:hasParameterSetting ?front1 .
  ?front0 dmop:executes rm:DM-Operator .
  ?front1 dmop:setsValueOf ?front2 .
  ?front1 dmop:hasValue ?front3 .
  filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 16.000000) .
  ?front2 dmop:hasParameterKey 'minimal_size_for_split' .
  ?front1 dmop:hasValue ?front3 .
  filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 9.000000) .
  ?front1 dmop:hasValue ?front3 .
  filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 5.500000) .
}
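
Q2 contains three filters on the same variable with upper bounds 16, 9 and 5.5. Because the filters are combined conjunctively, only the tightest bound matters. The Python sketch below checks this on hypothetical parameter settings; `matches` and `effective_interval` are illustrative helpers.

```python
# The three successively narrower intervals for ?front3 that appear in Q2.
INTERVALS = [(2.0, 16.0), (2.0, 9.0), (2.0, 5.5)]

def matches(value, intervals=INTERVALS):
    """True when the value passes every filter (conjunctive semantics)."""
    return all(lo <= value <= hi for lo, hi in intervals)

def effective_interval(intervals=INTERVALS):
    """Intersection of the intervals: the tightest bounds that remain."""
    return max(lo for lo, _ in intervals), min(hi for _, hi in intervals)

# Hypothetical minimal_size_for_split settings of four workflows.
SETTINGS = {"wf1": 2, "wf2": 4, "wf3": 6, "wf4": 12}

# Workflows covered by the pattern with respect to this parameter.
covered = [wf for wf, v in SETTINGS.items() if matches(float(v))]
```

With these settings only `wf1` and `wf2` are covered, and `effective_interval()` returns the interval from 2 to 5.5 stated in the text.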

Dataset characteristics …

Features

Page 47: Semantic Meta-Mining of Knowledge Discovery Processes

Semantic meta-mining experimental setup

baseline DM experiment set: 1581 RapidMiner workflows solving a predictive modeling task on 11 UCI datasets

dataset characteristics meta-data stored in DMEX-DB, containing over 85 million RDF triples

workflow patterns represented as SPARQL queries using DMOP entities

Page 48: Semantic Meta-Mining of Knowledge Discovery Processes

The inside of X-Validation operator with the workflow for training and evaluating the pattern-based model

Page 49: Semantic Meta-Mining of Knowledge Discovery Processes

Semantic meta-mining results

McNemar’s test for pairs of classifiers, performed with the null hypothesis that a classifier built using dataset characteristics and a mined pattern set has the same error rate as the baseline that used dataset characteristics and only the names of the machine learning DM operators

The test confirmed that classifiers trained using workflow patterns performed significantly better (accuracy 0.927) than the baseline (accuracy 0.890)
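
For readers unfamiliar with McNemar’s test, a minimal Python sketch follows. It implements the exact binomial variant, which is an assumption: the slides do not say which variant was used in the experiments. The function name `mcnemar_exact` and the prediction vectors are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts examples that only classifier A
    gets right and c counts examples that only classifier B gets right.
    """
    b = sum(a == t and o != t for t, a, o in zip(y_true, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree on correctness
    # Under the null hypothesis, the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical paired predictions on 12 test examples.
y_true = [1] * 12
pred_patterns = [1] * 11 + [0]      # pattern-based classifier
pred_baseline = [0] * 9 + [1] * 3   # baseline classifier

b, c, p_value = mcnemar_exact(y_true, pred_patterns, pred_baseline)
```

Only the discordant pairs enter the test: examples that both classifiers get right or both get wrong carry no information about a difference in error rates.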

Page 50: Semantic Meta-Mining of Knowledge Discovery Processes

Summary and future work

RMonto RapidMiner plugin, all experimental data and (meta-mining) workflows are publicly available: http://www.myexperiment.org/packs/421.html, http://semantic.cs.put.poznan.pl/fr-ont/

LeoLOD project - Learning and Evolving Ontologies from Linked Open Data (2013-2015)

▸ project funded by Foundation for Polish Science under the POMOST program,

▸ Fr-ONT-Qu re-adapted for ontology learning,

▸ DMOP used to model provenance metadata (in industry: traceability) of ontology learning workflows

DMOP is being aligned to OPMW (Open Provenance Model for Workflows)

Page 51: Semantic Meta-Mining of Knowledge Discovery Processes

Acknowledgements

Foundation for Polish Science under the POMOST programme, co-financed by the European Union, Regional Development Fund (No POMOST/2013-7/8) (2013-2015)

EU FP7 ICT-2007.4.4 (No 231519) "e-LICO: An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Science" (2009-2012)

RMonto, Meta-mining experiments, LeoLOD plugin done jointly with Jedrzej Potoniec

Contributors to the development of DMOP and/or other e-LICO infrastructure used in the research described in this presentation: Melanie Hilario, C. Maria Keet, Claudia d’Amato, Huyen Do, Simon Fischer, Dragan Gamberger, Lina Al-Jadir, Simon Jupp, Alexandros Kalousis, Joerg-Uwe Kietz, Petra Kralj Novak, Babak Mougouie, Phong Nguyen, Raul Palma, Floarea Serban, Robert Stevens, Anze Vavpetic, Jun Wang, Derry Wijaya, Adam Woznica

Thanks to Veli Bicer for sharing the AIFB dataset
