1 Modeling and Language Support for the management of PBMS Manolis Terrovitis Panos Vassiliadis...
Date posted: 20-Dec-2015
1
Modeling and Language Support for the management of PBMS
Manolis Terrovitis, Panos Vassiliadis,
Spiros Skiadopoulos, Elisa Bertino,
Barbara Catania, Anna Maddalena
3
Motivation
Huge amounts of data are produced.
Interesting knowledge has to be detected and extracted.
Knowledge extraction techniques (i.e., data mining) are not sufficient:
  Huge amounts of results (clusters, association rules, decision trees, etc.)
  Arbitrary modeling of results
4
Motivation (cont'd)
We need to be able to manipulate the knowledge discovered!
The basic requirements:
  A generic and homogeneous model for patterns.
  Well-defined query operators.
  Efficient storage.
5
The Patterns and PBMS [Rizzi et al., ER 2003]
Patterns are compact, semantically rich representations of raw data, e.g., clusters, association rules, decision trees.
Pattern Base Management System (PBMS):
  Patterns are treated as first-class citizens
  Pattern-based queries
  Approximate mapping between patterns and raw data
6
Contributions
We formally define the logical foundations for pattern management
We present a pattern specification language
We introduce queries and query operators
8
PBMS architecture
Pattern Space:
  Pattern Types
  Pattern Classes
  Patterns
Intermediate Results
Data Space
[Figure: PBMS architecture. Labels: Pattern Space (Pattern Types, Pattern Classes, Patterns; relations "member of" and "instance of"), Intermediate Mappings, Data Space (DB1, DB2), Data Mining Algorithms, Pattern Recognition Algorithms.]
9
The patterns
Patterns hold information for:
  the data source
  the structure of the pattern
  the relation between the structure and the source, expressed as an approximate logical formula.
10
Pattern - Cluster Example
Pid: 337
Structure: [CENTER: [X: 21, Y: 1200], RAD: 12]
Data: EMP: {[Age, Salary]}
Formula: $(t.Age - 21)^2 + (t.Salary - 1200)^2 \le 12^2$, where $t \in EMP$
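The cluster example above can be sketched in Python as follows. This is only an illustration: the class name `ClusterPattern`, its field names, and the sample EMP tuples are assumptions, not part of the slides; only the pid, center, radius and formula come from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPattern:
    """Hypothetical in-memory form of the cluster pattern (names are assumed)."""
    pid: int
    center_x: float  # CENTER.X, matched against Age
    center_y: float  # CENTER.Y, matched against Salary
    rad: float       # RAD

    def formula(self, age: float, salary: float) -> bool:
        # (t.Age - X)^2 + (t.Salary - Y)^2 <= RAD^2
        return (age - self.center_x) ** 2 + (salary - self.center_y) ** 2 <= self.rad ** 2


p337 = ClusterPattern(pid=337, center_x=21, center_y=1200, rad=12)

# Made-up EMP tuples (Age, Salary), for illustration only.
emp = [(21, 1200), (25, 1205), (40, 3000)]
covered = [t for t in emp if p337.formula(*t)]
```

With these sample tuples, the first two fall inside the disk and the third does not.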
11
Pattern type - example
Name: Disk
Structure Schema: [CENTER: [X: real, Y: real], RAD: real]
Data Schema: REL: {[X: real, Y: real]}
Formula Schema: $(t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 \le RAD^2$, where $t \in REL$
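One possible way to read the relation between a pattern type and its patterns is that a pattern fills in the type's structure schema with concrete values. The sketch below assumes this reading; `PatternType`, `Pattern`, `instantiate`, `covers`, and the flattened keys such as `"CENTER.X"` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PatternType:
    """Assumed representation of a pattern type such as Disk."""
    name: str
    structure_fields: frozenset
    # formula_schema(t, structure) -> bool, with t a tuple of the data schema
    formula_schema: Callable[[Dict[str, float], Dict[str, float]], bool]

    def instantiate(self, pid: int, structure: Dict[str, float]) -> "Pattern":
        # A pattern must supply exactly the fields of the structure schema.
        if set(structure) != set(self.structure_fields):
            raise ValueError("structure does not match the structure schema")
        return Pattern(pid, self, dict(structure))


@dataclass(frozen=True)
class Pattern:
    pid: int
    ptype: PatternType
    structure: dict

    def covers(self, t: Dict[str, float]) -> bool:
        # Evaluate the type's formula schema with this pattern's values.
        return self.ptype.formula_schema(t, self.structure)


disk = PatternType(
    name="Disk",
    structure_fields=frozenset({"CENTER.X", "CENTER.Y", "RAD"}),
    formula_schema=lambda t, s: (t["X"] - s["CENTER.X"]) ** 2
    + (t["Y"] - s["CENTER.Y"]) ** 2 <= s["RAD"] ** 2,
)

p = disk.instantiate(337, {"CENTER.X": 21, "CENTER.Y": 1200, "RAD": 12})
inside = p.covers({"X": 21, "Y": 1210})
outside = p.covers({"X": 40, "Y": 1200})
```

Keeping the formula on the type rather than on each pattern means every Disk pattern shares one formula schema and differs only in its structure values.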
12
The formula
An intensional description of the pattern-data relation.
Pros: efficiency, more intuitive results
Cons: accuracy (the formula only approximates the underlying data)
14
The formula (cont'd)
The formula is a predicate $f_p(x, y)$, where $x \in Source$ and $y \in Structure$.
Expressiveness: functions and predicates.
Safety: range restriction.
Queries employing the formula are n-depth domain independent.
16
Query Operators
Query operator classes:
  Database operators
  Pattern Base operators
  Crossover database operators
  Crossover pattern base operators
17
Crossover Operators
[Figure: a pattern (PID, data, structure, formula) in the Pattern Space linked to the Data Space by an exact path and an approximate path.]
Exact evaluation, via the intermediate mappings
Approximate evaluation, via the formula
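The difference between the two evaluation paths can be sketched as below. The data, the contents of `intermediate_mappings`, and the function names are all assumptions made for illustration; the slides do not specify how the mappings are stored.

```python
# Hypothetical EMP tuples keyed by tuple id: (Age, Salary).
emp = {1: (21, 1200), 2: (25, 1205), 3: (30, 1200), 4: (40, 3000)}

# Assumed intermediate mappings: pattern id -> ids of the tuples that the
# mining algorithm actually assigned to that pattern.
intermediate_mappings = {337: {1, 2}}


def formula_337(age, salary):
    # Formula of the cluster example: a disk of radius 12 around (21, 1200).
    return (age - 21) ** 2 + (salary - 1200) ** 2 <= 12 ** 2


def exact_evaluation(pid):
    # Exact: look up the stored pattern-to-data mapping.
    return sorted(intermediate_mappings[pid])


def approximate_evaluation(formula, data):
    # Approximate: test every tuple against the pattern's formula.
    return sorted(tid for tid, t in data.items() if formula(*t))


exact = exact_evaluation(337)
approx = approximate_evaluation(formula_337, emp)
```

In this made-up data, tuple 3 satisfies the formula although it was not assigned to the pattern, so the approximate answer is a superset of the exact one; this is one way the accuracy cost of the formula can show up.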
18
Crossover Operators
Database:
  Drill-Through: Which data are represented by these patterns?
  Data-Covering: Which data from this dataset can be represented by this pattern?
Pattern Base:
  Pattern-Covering: Which of these patterns represent this dataset?
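A minimal sketch of the three crossover operators, evaluated approximately via the formula, might look as follows. The function signatures, the second pattern (338), the sample data, and the reading of "represent this dataset" as "the formula holds for every tuple" are assumptions, not definitions taken from the slides.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]
Formula = Callable[[Row], bool]


def disk(cx: float, cy: float, rad: float) -> Formula:
    # Disk-shaped formula, as in the cluster example.
    return lambda t: (t[0] - cx) ** 2 + (t[1] - cy) ** 2 <= rad ** 2


def drill_through(patterns: Dict[int, Formula], data: List[Row]) -> List[Row]:
    # Which data are represented by these patterns?
    return [t for t in data if any(f(t) for f in patterns.values())]


def data_covering(pattern: Formula, dataset: List[Row]) -> List[Row]:
    # Which data from this dataset can be represented by this pattern?
    return [t for t in dataset if pattern(t)]


def pattern_covering(patterns: Dict[int, Formula], dataset: List[Row]) -> List[int]:
    # Which of these patterns represent this dataset?
    # Assumed semantics: the formula must hold for every tuple.
    return [pid for pid, f in patterns.items() if all(f(t) for t in dataset)]


# Pattern 337 follows the cluster example; pattern 338 is made up.
patterns = {337: disk(21, 1200, 12), 338: disk(40, 3000, 50)}
emp = [(21, 1200), (25, 1205), (40, 3000), (60, 5000)]

represented = drill_through(patterns, emp)
covered_by_337 = data_covering(patterns[337], emp)
covering = pattern_covering(patterns, [(21, 1200), (25, 1205)])
```

The first two operators return data (database side), while the third returns pattern ids (pattern base side), mirroring the grouping above.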
21
Summary
Formal specification of basic PBMS concepts
Investigation of the representation of the pattern-data relation
Formal definition of query operators