A Semantic-Aided Designer for Knowledge Discovery

20
A Semantic-Aided Designer for Knowledge Discovery Claudia Diamantini, Domenico Potena, Emanuele Storti [email protected] CTS2011, Philadelphia, May 23-27 Università Politecnica delle Marche Department of Computer Science, Management and Automation Ancona, Italy

description

Full paper: http://boole.diiga.univpm.it/paper/cts11.pdf Knowledge Discovery in Databases (KDD), as any scientific experimentation in e-Science, is a complex and computationally intensive process aimed at gaining knowledge from a huge set of data. Often performed in distributed settings, a KDD project usually involves a deep interaction between tools and several users with specific expertise, which can be either a co-located group or a geographically distributed virtual team of experts. Given the complexity of the process, such users need some support to achieve their goal of knowledge extraction. This paper introduces KDDesigner, a webbased semantic-driven tool aimed at supporting users in the collaborative design of a KDD process. In this paper we address semantic issues related to the collaborative project managment: tool localization, tool integration, interfaces matchmaking, process execution, team building and process versioning.

Transcript of A Semantic-Aided Designer for Knowledge Discovery

Page 1: A Semantic-Aided Designer for Knowledge Discovery

A Semantic-Aided Designer for Knowledge Discovery

Claudia Diamantini, Domenico Potena, Emanuele Storti

[email protected]

CTS2011, Philadelphia, May 23-27

Università Politecnica delle MarcheDepartment of Computer Science, Management and AutomationAncona, Italy

Page 2: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Introduction

A Semantic-Aided Designer for KD

Organizations need methods and technologies to analyze huge amounts of data, to support decisional processes

Knowledge Discovery in Databases (KDD) is the process of identifying valid, novel, potentially useful patterns in data

many steps, iterations interaction user knowledge

Data explosion & KDD

Page 3: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Introduction

DB / DWHadministrator

domain expert

DMspecialistsKDD

specialists

1st generation of IDAs (Intelligent Data Analysis systems):● local frameworks● single-user● predefined set of tools (little extensibility)

How to support the design of a KDD project in an open, distributed and collaborative scenario?

2nd generation: distribution of tools & computational aspects

Evolution of organizations: distribution of user, collaboration

A Semantic-Aided Designer for KD

Page 4: A Semantic-Aided Designer for Knowledge Discovery

tools should be easily and dinamically added in the platform they should be accessible, searchable, executable via standard API suggestions about the best tool sequences support for tool setup and process execution

CTS2011, May 23-27

Issues

Heterogeneity & tool distribution

Many KDD and Data Mining tools available for any domain/task, many possible combinations

Heterogeneous interfaces programming languages, OSs, transfer protocols,..

Complex to use process design, data preparation, precondition satisfaction, I/O interpretation

A Semantic-Aided Designer for KD

Page 5: A Semantic-Aided Designer for Knowledge Discovery

collaborative design of KDD processes tool/process sharing and annotation easy join of new partners in Virtual Teams

CTS2011, May 23-27

Issues

User distribution

Distributed organizations:● multiple branch enterprises● E-Science project

Collaboration: ● source of complexity● distributed computation: several users can succeed where a single user is likely to fail

A Semantic-Aided Designer for KD

Page 6: A Semantic-Aided Designer for Knowledge Discovery

Basic Services

Services for any KDD task:every KDD tool is wrapped as a Web Service, deployed on the publisher's server, and published in a common repository

C4.5 tool C4.5 service

CTS2011, May 23-27

Methodology

Service Oriented Architecture

Support Services

Back-end services:● access control ● data transfer● service publishing● UDDI registry

High-level functionalities:● service discovery● interface matchmaking● process composition

A Semantic-Aided Designer for KD

Page 7: A Semantic-Aided Designer for Knowledge Discovery

Separation of information in 3 abstraction layers

Tools/services are annotated through XML descriptors: details about interfaces and QoS Algorithms are formally described in a KDD ontology, which contains an algorithm taxonomy and high level information about their tasks, methods and functionalities

CTS2011, May 23-27

Methodology

Semantic descriptors for Basic Services

A Semantic-Aided Designer for KD

Page 8: A Semantic-Aided Designer for Knowledge Discovery

KDD tools

KDD algorithms ID3

Benefits: loose-coupling, reusability Support services rely on such layers:

service discovery interface matchmaking process composition

CTS2011, May 23-27

Methodology

KDD services

A Semantic-Aided Designer for KD

Page 9: A Semantic-Aided Designer for Knowledge Discovery

Benefits: loose-coupling, reusability Support services rely on such layers:

service discovery interface matchmaking process composition

CTS2011, May 23-27

Methodology

Labeled Dataset

KDD services

KDD ontology

abc C4.5_v.2.0

C4.5algorithm

Remove missing values algorithm

A Semantic-Aided Designer for KD

Page 10: A Semantic-Aided Designer for Knowledge Discovery

A web-based tool aimed at supporting users in collaborative KDD process design

CTS2011, May 23-27

KDDesigner

A Semantic-Aided Designer for KD

Page 11: A Semantic-Aided Designer for Knowledge Discovery

Service discovery

Retrieval of KDD services satisfying user requirements

CTS2011, May 23-27 A Semantic-Aided Designer for KD

Page 12: A Semantic-Aided Designer for Knowledge Discovery

Service discovery

Retrieval of KDD services satisfying user requirements

1

2

3

4

CTS2011, May 23-27

KDDONTO

A Semantic-Aided Designer for KD

Page 13: A Semantic-Aided Designer for Knowledge Discovery

Service discovery

Retrieval of KDD services satisfying user requirements

2

3

1

4

CTS2011, May 23-27 A Semantic-Aided Designer for KD

Page 14: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Process design

A Semantic-Aided Designer for KD

Page 15: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Interface matchmaking

Verification of data compatibility in an I/O connection

A Semantic-Aided Designer for KD

Page 16: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Interface matchmaking

Matchmaker service checks the validity of the match ● syntactic compatibility comparison between service descriptors (I/O primitive datatype and syntax)

KDD services abc

A Semantic-Aided Designer for KD

● Output: cost of match

same format?same primitive datatype?

Page 17: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Interface matchmaking

Matchmaker service checks the validity of the match ● syntactic compatibility comparison between service descriptors (I/O primitive datatype and syntax)

same concept? subconcept?part-of concept?

KDD services

KDD ontology

abc

A Semantic-Aided Designer for KD

● semantic compatibility comparison between ontological annotations of the services (kind of match between I/O, preconditions/postconditions... and many more)

● Output: cost of match

x y

Page 18: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Semi-automatic composition

KDDComposer: advanced service for composition

Input● user dataset● a set of requirements (max num algorithms, computational complexity, max cost of match)● user goal (classification, regression, ...)

Output A ranked list of abstract processes(suggestions about processes useful to solve the user problem)

A Semantic-Aided Designer for KD

Page 19: A Semantic-Aided Designer for Knowledge Discovery

CTS2011, May 23-27

Collaboration

● collaborative process edit/annotation (wiki-style)● versioning system● team management and add of new users● manual parameter setting

A Semantic-Aided Designer for KD

Page 20: A Semantic-Aided Designer for Knowledge Discovery

Open environment and heterogeneous tools● different interfaces: need of a common representation

(service)● abstraction for an high-level description of tools (algorithm)● semantics for interoperability and high-level functionalities

CTS2011, May 23-27

Conclusion

Future work● extension with new support services● process export in more workflow languages● more collaborative features (real-time editor)

A Semantic-Aided Designer for KD

SOA for KDD● Basic Services and Support Services● KDD Designer: a semantic-aided designer for KDD