Toward Knowledge Discovery in Databases Attached to Grids

Page 1: Toward Knowledge Discovery in Databases Attached to Grids

Institut für Softwarewissenschaft - Universität Wien

P.Brezany1

Toward Knowledge Discovery in Databases Attached to Grids

Peter Brezany

Institute for Software Science

University of Vienna

E-mail : [email protected]

Page 2: Toward Knowledge Discovery in Databases Attached to Grids

Media That Radically Influenced Society

1500s Printing Press

1840s Penny Post

1850s Telegraph

1920s Telephone

1930s Radio

1950s TV

1990s Web

20xx Grid

Page 3: Toward Knowledge Discovery in Databases Attached to Grids

Talk Outline

• Data Mining on the Grid – Background Information

• Application Examples

• Architecture of a Traditional Data Mining System

• GridMiner – A framework for Data Mining on the Grid

• GridMiner Architecture

• Functional and Data Access Model

• Conclusions

Page 4: Toward Knowledge Discovery in Databases Attached to Grids

Data Mining on the Grid

• Data mining on the Grid (DMG): finding unknown data patterns in an environment with geographically distributed data and computation.

• Data may be highly heterogeneous and frequently updated.

• A good DMG algorithm analyzes data in a distributed fashion with modest data communication overhead.

• A typical DMG algorithm involves local data analysis followed by the generation of a global data model.
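
The pattern in the last bullet (local analysis, then a global model) can be sketched as follows. This is a minimal illustration and not GridMiner code: each site reduces its data to three numbers (count, sum, sum of squares), and only those summaries are sent to a coordinator, which combines them into a global mean and variance. The names `LocalSummary`, `analyze_locally`, and `build_global_model`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LocalSummary:
    """Sufficient statistics computed at one site (hypothetical structure)."""
    n: int
    total: float
    total_sq: float


def analyze_locally(values: Iterable[float]) -> LocalSummary:
    """Local step: scan the site's data once and keep only three numbers."""
    n, total, total_sq = 0, 0.0, 0.0
    for v in values:
        n += 1
        total += v
        total_sq += v * v
    return LocalSummary(n, total, total_sq)


def build_global_model(summaries: List[LocalSummary]) -> dict:
    """Global step: combine the per-site summaries into one model."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no data at any site")
    mean = sum(s.total for s in summaries) / n
    # Population variance: E[x^2] - (E[x])^2
    variance = sum(s.total_sq for s in summaries) / n - mean * mean
    return {"n": n, "mean": mean, "variance": variance}


# Assumed sample data held at three different sites.
site_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
summaries = [analyze_locally(d) for d in site_data]
model = build_global_model(summaries)
print(model)
```

The communication cost per site is constant regardless of how many records the site holds, which is one way to read "modest data communication overhead". The one-pass variance formula is used here for brevity; it can lose precision on large, tightly clustered values.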

Page 5: Toward Knowledge Discovery in Databases Attached to Grids

Application Examples

• Determining how the emergence of hepatitis C depends on weather patterns: this requires access to a large hepatitis C DB at one location and an environmental DB at another.

• Two major financial organizations want to cooperate. They need to share the data patterns relevant to the data mining task but not the data itself, which is sensitive, so combining the databases may not be feasible.

• Federating Brain Data Project: integrating several neuroscience DBs

• A major multinational corporation wants to analyze its customer transaction records to develop successful business strategies quickly.

- It has thousands of establishments throughout the world.

- Collecting all the data in a centralized data warehouse and then analyzing it with existing commercial data mining software takes too long.

Page 6: Toward Knowledge Discovery in Databases Attached to Grids

Telemedical Applications

AMG – Austrian Medical Grid

[Figure labels: Web; Raw Medical Data; Reconstructed Medical Data; Derived Medical Data; Database; Database]

Page 7: Toward Knowledge Discovery in Databases Attached to Grids

Telemedical Collaboration - Example

A patient living in a remote village has a heart problem.

An EEG is taken by the local doctor, and all the patient's details are stored in the doctor's PC-based telemedical system.

MRI and CT scans are taken within different departments of a general hospital and stored in the telemedical DB. A consultant compiles a report and saves it in the DB.

If necessary, a 3D ultrasound scan is taken in a specialized clinic and a further report is compiled.

Because complicated surgery is required, an external specialist using Virtual Reality techniques defines how the surgery should be planned. The resulting operation is recorded on video, e.g., for education.

Data mining support/assistance is needed.

Page 8: Toward Knowledge Discovery in Databases Attached to Grids

Architecture of a Data Mining System

Graphical user interface

Pattern evaluation

Data mining engine

Database or data warehouse server

Knowledge base

Database; Data warehouse

Filtering; data cleaning, data integration

Page 9: Toward Knowledge Discovery in Databases Attached to Grids

On-Line Analytical Mining (OLAM)

Page 10: Toward Knowledge Discovery in Databases Attached to Grids

GridMiner – A Framework for Data Mining on Grids

System requirements:

- Algorithm and data publishing and integration

- Compatibility with Grid infrastructure and Grid awareness

- Openness

- Scalability

- Security and data privacy

Functionality requirements:

- Mining different kinds of knowledge in databases

- Incremental data mining algorithms

- Interactive mining of knowledge at multiple levels of abstraction

Page 11: Toward Knowledge Discovery in Databases Attached to Grids

GridMiner (Layered) Architecture (based on K.F. Jeffery's idea)

Page 12: Toward Knowledge Discovery in Databases Attached to Grids

Functional and Data Access Model

MDS

Page 13: Toward Knowledge Discovery in Databases Attached to Grids

Example: Mining Patterns for Data Classification and Associations

    use database dat1, dat2
    mine classifications
    analyze credit_rating
    using g_parsimony
    display as tree

    use database DBs attributes
    mine associations
    using method attributes
    display as rules
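
To make the clause structure of these queries explicit, the sketch below splits the classification query into keyword and argument pairs. It is an illustration only and not the GridMiner parser: the keyword list is read off the two examples, the grammar is an assumption, and the function `parse_query` is hypothetical. It also assumes each clause is written on its own line.

```python
from typing import Dict

# Clause keywords read off the two example queries; the grammar is an assumption.
CLAUSE_KEYWORDS = ("use database", "mine", "analyze", "using", "display as")


def parse_query(text: str) -> Dict[str, str]:
    """Map each clause keyword to its argument, one clause per line."""
    clauses: Dict[str, str] = {}
    for line in text.strip().splitlines():
        line = line.strip()
        for keyword in CLAUSE_KEYWORDS:
            if line == keyword or line.startswith(keyword + " "):
                clauses[keyword] = line[len(keyword):].strip()
                break
        else:
            # No keyword matched this line.
            raise ValueError(f"unrecognized clause: {line!r}")
    return clauses


classification_query = """
use database dat1, dat2
mine classifications
analyze credit_rating
using g_parsimony
display as tree
"""
parsed = parse_query(classification_query)
print(parsed)
```

Read this way, a query names the data sources, the kind of knowledge to mine, the target attribute, the method, and the presentation form, in that order.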

Page 14: Toward Knowledge Discovery in Databases Attached to Grids

Knowledge Grid Architecture Layers

High level layer: Data Access Service; Tools and Algorithms Access Service; Execution Plan Management; Result Presentation Service

Core layer: Knowledge Directory Service; Resource Allocation and Execution Management

Generic Grid and Data Grid Services

Page 15: Toward Knowledge Discovery in Databases Attached to Grids

Conclusions

• Grid data mining is a relevant research topic

• The GridMiner approach may contribute to this research domain

• Collaborations are needed

• IPG (Information Power Grid) is the only Grid project that aims to address knowledge discovery issues

• Looking for pilot application(s)

• Open issues

- basic Grid technology: Globus, DataGrid, Jini, JXTA?

Page 16: Toward Knowledge Discovery in Databases Attached to Grids

Data Storage and the Components

Site A Site B Site C Site D

Preprocessing Preprocessing Preprocessing Preprocessing

Local DM Local DM Local DM Local DM

Construction of the Global Model

GUI Site E
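
The flow in this diagram (preprocessing and local data mining at Sites A to D, then construction of the global model at Site E) might be sketched as below. This is a minimal single-process illustration under stated assumptions, not the GridMiner implementation: the mining task (co-occurrence support of item pairs), the function names, and the sample transactions are all hypothetical, and the "sites" are plain Python lists rather than remote databases.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Transaction = Set[str]
Pair = Tuple[str, str]


def preprocess(raw: List[List[str]]) -> List[Transaction]:
    """Preprocessing at a site: normalize item names, drop empty records."""
    cleaned = [{item.strip().lower() for item in rec if item.strip()} for rec in raw]
    return [t for t in cleaned if t]


def local_dm(transactions: List[Transaction]) -> Tuple[int, Counter]:
    """Local DM at a site: count how often each item pair occurs together."""
    counts: Counter = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return len(transactions), counts


def construct_global_model(local_results, min_support: float) -> Dict[Pair, float]:
    """Global model (Site E): merge local counts, keep pairs above min_support."""
    total = sum(n for n, _ in local_results)
    merged: Counter = Counter()
    for _, counts in local_results:
        merged.update(counts)
    return {pair: c / total for pair, c in merged.items() if c / total >= min_support}


# Hypothetical raw data at Sites A to D.
sites = {
    "A": [["Bread", "Milk"], ["bread", "butter"]],
    "B": [["milk", "bread "], []],
    "C": [["bread", "milk", "butter"]],
    "D": [["butter", "jam"]],
}
local_results = [local_dm(preprocess(raw)) for raw in sites.values()]
global_model = construct_global_model(local_results, min_support=0.5)
print(global_model)
```

Only the per-site transaction count and pair counts leave each site; the raw records stay local. Shipping every pair count is the simplest choice for a sketch; a practical system would likely prune infrequent pairs locally to reduce traffic.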