
1

CoBase: Scalable and Extensible Cooperative Information System

Wesley W. Chu

Computer Science Department

University of California, Los Angeles

http://www.cobase.cs.ucla.edu

2

Conventional Query Answering

Need to know the detailed database schema
Cannot get approximate answers
Cannot answer conceptual queries

Cooperative Query Answering
Derive approximate answers
Answer conceptual queries

3

Find a seaport with railway facility in Los Angeles

CoBase Servers

Heterogeneous Information Sources

CoBase provides: relaxation, approximation, association, explanation

Find a nearby friendly airport that can land F-15

Domain Knowledge

Find hospitals with facility similar to St. John’s near LAX

Cooperative Queries

4

Generalization and Specialization

(Diagram: generalization turns a specific query into a more conceptual query; specialization turns a conceptual query into a specific query.)

5

Type Abstraction Hierarchy (TAH)

Chemical-Suit Size TAH (a non-numerical TAH)

All_Sizes
  Small_Size: Very_Small, Small_to_Medium
  Large_Size: Large_to_Extra_Large, Very_Large

Leaf values: XXXS, XXS, S, M, L, XL, XXL

Provide multi-level knowledge representations

6

Type Abstraction Hierarchy (TAH)

CA

N. CA, S. CA, C. CA

San Jose, Palo Alto, Sacramento, Davis, San Diego, Long Beach, LA, SF

(Location Example)
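To make the TAH idea concrete, here is a minimal Python sketch of a TAH as a tree, with generalization (moving a value up to an ancestor concept) and specialization (expanding a concept into the leaf values it covers). The `TAHNode` class and the `generalize` function are hypothetical names, and the assignment of cities to N. CA, C. CA, and S. CA is an illustrative assumption, not taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TAHNode:
    """A concept (internal node) or attribute value (leaf) in a TAH (hypothetical structure)."""
    label: str
    children: list = field(default_factory=list)
    # Back-pointer excluded from repr/eq to avoid infinite recursion.
    parent: Optional["TAHNode"] = field(default=None, repr=False, compare=False)

    def add(self, *labels):
        """Attach child nodes with the given labels and return them."""
        nodes = [TAHNode(label, parent=self) for label in labels]
        self.children.extend(nodes)
        return nodes

    def find(self, label):
        """Depth-first search for the node with this label."""
        if self.label == label:
            return self
        for child in self.children:
            hit = child.find(label)
            if hit is not None:
                return hit
        return None

    def leaves(self):
        """Specialization: all leaf values covered by this concept."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


def generalize(root, value, levels=1):
    """Move `value` up `levels` ancestors and return every value it now matches."""
    node = root.find(value)
    if node is None:
        return [value]          # unknown value: nothing to relax
    for _ in range(levels):
        if node.parent is None:
            break               # already at the root
        node = node.parent
    return node.leaves()


# Hypothetical grouping, for illustration only.
ca = TAHNode("CA")
n_ca, c_ca, s_ca = ca.add("N. CA", "C. CA", "S. CA")
n_ca.add("Sacramento", "Davis", "SF")
c_ca.add("San Jose", "Palo Alto")
s_ca.add("LA", "Long Beach", "San Diego")

print(generalize(ca, "LA"))      # ['LA', 'Long Beach', 'San Diego']
print(generalize(ca, "LA", 2))   # all eight cities under CA
```

Relaxing further simply means asking for a higher `levels` value, which mirrors moving one step up the hierarchy per relaxation round.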

7

Relaxation Agent

Uses a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchies) to relax the following for matching: query conditions and constraints.

8

Query Relaxation

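(Flowchart: the query is run against the database. If answers are found, they are displayed; if not, the Relax Attribute step uses the TAHs to modify the query, and the modified query is run again.)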
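The relaxation loop in this flowchart can be sketched in Python as below, assuming an in-memory table and a single-attribute TAH stored as a child-to-parent map plus a concept-to-leaves map. `relax_query`, `PARENT`, `COVERS`, and the sample rows are hypothetical; CoBase's actual query modification is not shown on the slide.

```python
# Hypothetical TAH for a "location" attribute.
PARENT = {"Bizerte": "NE Tunisia", "Tunis": "NE Tunisia", "El Borma": "SW Tunisia",
          "NE Tunisia": "Tunisia", "SW Tunisia": "Tunisia"}
COVERS = {"NE Tunisia": {"Bizerte", "Tunis"},
          "SW Tunisia": {"El Borma"},
          "Tunisia": {"Bizerte", "Tunis", "El Borma"}}


def matches(row, attr, concept):
    """A row matches if its value equals the concept or is a leaf the concept covers."""
    value = row[attr]
    return value == concept or value in COVERS.get(concept, set())


def relax_query(table, attr, value, max_steps=3):
    """Run the query; if it returns nothing, generalize the condition one TAH level and retry."""
    concept = value
    for _ in range(max_steps + 1):
        answers = [row for row in table if matches(row, attr, concept)]
        if answers:
            return concept, answers      # "Yes" branch: display the answers
        if concept not in PARENT:
            break                        # at the TAH root (or unknown value): stop
        concept = PARENT[concept]        # "No" branch: modify the query via the TAH
    return concept, []


airports = [{"name": "Airport A", "location": "Tunis"},
            {"name": "Airport B", "location": "El Borma"}]
print(relax_query(airports, "location", "Bizerte"))
```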

9

10

Visualization of Relaxation Process

Query: Find seaports in the given region.

given region

relaxed region

11

12

Relaxation Control Primitives

not-relaxable runway-length

relaxation-order (runway-length, location)

preference-list

unacceptable-list

answer-size

relaxation-level

13

Relaxation Primitives

~ (approximate): ~9 am

between

near-to (context-sensitive): Airport near-to LAX; Restaurant near-to UCLA

similar-to: Airport similar-to LAX based-on (traffic, runway)

within

14

Similar-to

Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.

select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to 'Bizerte'
      based-on ((runway_length 1.0) (runway_width 2.0))
  and country_state_name = 'Tunisia'
  and countries.glc_cd = runways.glc_cd

15

Similar-to Result

APORT_NM    LENGTH   WIDTH   RANK
Bizerte     8000     148     0.00
El Borma    7200     144     0.09
Monastir    9700     137     0.20
Jerba       10171    148     0.24
Djedeida    6000     122     0.27

The similar-to module ranks the returned answers according to mean-squared error.
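A weighted mean-squared-error ranking in the spirit of the similar-to module can be sketched as below, using the runway data and the based-on weights from the query above. The normalization (each attribute's difference divided by the target's value) is an assumption, since the slides do not give CoBase's exact formula; with it, the ranking order agrees with the result slide, although the scores themselves differ. `similar_to` is a hypothetical name.

```python
def similar_to(rows, target_name, weights):
    """Rank rows by weighted mean-squared relative error against the target row."""
    target = next(r for r in rows if r["name"] == target_name)
    total_weight = sum(weights.values())

    def score(row):
        # Relative difference per attribute, squared, weighted, then averaged over weights.
        err = sum(w * ((row[a] - target[a]) / target[a]) ** 2 for a, w in weights.items())
        return err / total_weight

    return sorted(((r["name"], round(score(r), 4)) for r in rows), key=lambda pair: pair[1])


runways = [
    {"name": "Bizerte", "length": 8000, "width": 148},
    {"name": "El Borma", "length": 7200, "width": 144},
    {"name": "Monastir", "length": 9700, "width": 137},
    {"name": "Jerba", "length": 10171, "width": 148},
    {"name": "Djedeida", "length": 6000, "width": 122},
]

# based-on ((runway_length 1.0) (runway_width 2.0))
for name, s in similar_to(runways, "Bizerte", {"length": 1.0, "width": 2.0}):
    print(f"{name:10s} {s:.4f}")
```

Doubling the width weight makes Djedeida, whose runway is much narrower, drop to the bottom even though its length difference is smaller than Jerba's.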

16

Unacceptable List Operator

Type Abstraction Hierarchy: Tunisia
  NE Tunisia, NW Tunisia, Central Tunisia, SW Tunisia
  leaf values: Bizerte, El Borma, ...

The constraint “Avoid Northern Tunisia!” is passed to the CoBase Relaxation Manager, which trims the TAH:

Trimmed TAH: Tunisia
  Central Tunisia, SW Tunisia
  leaf values: Gafsa, El Borma
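Trimming a TAH with an unacceptable list can be read as pruning the listed concepts, and everything below them, before relaxation, so relaxed answers never come from an excluded region. The Python sketch below assumes a nested-dict TAH; the placement of Gafsa under Central Tunisia and the placeholder NW airport are hypothetical, and `trim`/`leaves` are illustrative names.

```python
# Hypothetical location TAH: each concept maps to its sub-concepts; leaves map to {}.
TUNISIA = {
    "Tunisia": {
        "NE Tunisia": {"Bizerte": {}, "Tunis": {}},
        "NW Tunisia": {"NW airport (hypothetical)": {}},
        "Central Tunisia": {"Gafsa": {}},
        "SW Tunisia": {"El Borma": {}},
    }
}


def trim(tah, unacceptable):
    """Return a copy of the TAH with every node in `unacceptable` (and its subtree) removed."""
    return {label: trim(children, unacceptable)
            for label, children in tah.items()
            if label not in unacceptable}


def leaves(tah):
    """All leaf values reachable in the (possibly trimmed) TAH."""
    out = []
    for label, children in tah.items():
        out.extend(leaves(children) if children else [label])
    return out


# "Avoid Northern Tunisia!"
trimmed = trim(TUNISIA, {"NE Tunisia", "NW Tunisia"})
print(leaves(trimmed))   # ['Gafsa', 'El Borma']
```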

17

TAH Generation for Numerical Attribute Values

Relaxation error: the difference between the exact value and the returned approximate value. The expected error is weighted by the probability of occurrence of each value.

DISC (Distribution Sensitive Clustering) builds the TAH from the attribute values and the frequency distribution of the data.
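The relaxation-error idea can be sketched as follows, assuming one common formulation: for a cluster whose values $x_i$ occur with probabilities $P(x_i)$, the relaxation error is $\sum_i \sum_j P(x_i) P(x_j) \lvert x_i - x_j \rvert$, the expected error when a value is approximated by another value of the same cluster. The recursive binary split that minimizes the probability-weighted error is a simplified, hypothetical stand-in for DISC, not the published algorithm, and the runway lengths are made up.

```python
from collections import Counter


def relaxation_error(freq):
    """Expected |x - y| when x and y are drawn from the cluster's value distribution."""
    total = sum(freq.values())
    probs = {v: c / total for v, c in freq.items()}
    return sum(pi * pj * abs(vi - vj)
               for vi, pi in probs.items() for vj, pj in probs.items())


def best_split(freq):
    """Cut point minimizing the probability-weighted relaxation error of the two halves."""
    values = sorted(freq)
    total = sum(freq.values())
    best = None
    for k in range(1, len(values)):
        left = {v: freq[v] for v in values[:k]}
        right = {v: freq[v] for v in values[k:]}
        err = (sum(left.values()) / total * relaxation_error(left)
               + sum(right.values()) / total * relaxation_error(right))
        if best is None or err < best[0]:
            best = (err, values[k - 1], values[k])
    return best


def build_tah(freq, min_size=2):
    """Recursively split the value range into a binary TAH (min_size must be >= 1)."""
    values = sorted(freq)
    if len(values) <= min_size:
        return values
    _, lo, hi = best_split(freq)
    return [build_tah({v: c for v, c in freq.items() if v <= lo}, min_size),
            build_tah({v: c for v, c in freq.items() if v >= hi}, min_size)]


# Hypothetical runway lengths (feet), with repeats giving the frequency distribution.
lengths = Counter([6000, 6000, 7200, 8000, 8000, 8000, 9700, 10171, 10500])
print(relaxation_error(lengths))
print(build_tah(lengths))
```

Because frequencies enter through the probabilities, a heavily used value pulls the cut toward keeping its neighbours in a tight cluster, which is the distribution-sensitive behaviour the slide describes.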

18

TAH Generation for Non-numerical Attribute Values

Pattern Based Knowledge Induction (PBKI)

Rule-based approach
Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships)
Provides an attribute correlation value (measuring how well the rules apply to the database)

19

Type Abstraction Hierarchy (TAH)

Location Name TAH: Tunisia
  NE Tunisia: Bizerte, Tunis, Djedeida
  Central Tunisia
  SW Tunisia: El Borma
  ...

Runway Length TAH: All
  Short: 0 ... 700
  Medium: 700 ... 1K
  Long: 1K ... 5K

Provide multi-level knowledge representations

20

Associative Query Answering

Provide relevant information not explicitly asked for by the user.

User query: List all airports with runway length between 8500 and approximately 10000 feet.

Query answers:
Airport Name   Runway Length (feet)
Jerba          10171
Monastir       9700
Tunis          10500

Associated attributes and answers (user type = Pilot):
Weather   Runway Quality
Sunny     Good
Rain      Good
Foggy     Damaged

Associated attributes and answers (user type = Planner):
Military or Civilian Flag   Refrigerated Storage Capacity (Tons)
C
C                           0.00
C                           1000.00
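Associative query answering can be sketched as appending user-type-specific attributes to the answers of the original query. In this Python sketch the `ASSOCIATED` map, the 10% tolerance used to model “approximately 10000 feet”, and all values other than the airport names and runway lengths are hypothetical.

```python
# Hypothetical association knowledge: extra attributes relevant to each user type.
ASSOCIATED = {
    "Pilot": ("weather", "runway_quality"),
    "Planner": ("flag", "refrigerated_storage_tons"),
}


def associative_answer(rows, predicate, user_type, asked=("name", "runway_length_ft")):
    """Return the requested attributes plus those associated with the user type."""
    columns = asked + ASSOCIATED.get(user_type, ())
    return [{c: row.get(c) for c in columns} for row in rows if predicate(row)]


airports = [  # illustrative rows; weather/flag/storage values are made up
    {"name": "Jerba", "runway_length_ft": 10171, "weather": "Sunny",
     "runway_quality": "Good", "flag": "C", "refrigerated_storage_tons": None},
    {"name": "Monastir", "runway_length_ft": 9700, "weather": "Rain",
     "runway_quality": "Good", "flag": "C", "refrigerated_storage_tons": 0.0},
    {"name": "Tunis", "runway_length_ft": 10500, "weather": "Foggy",
     "runway_quality": "Damaged", "flag": "C", "refrigerated_storage_tons": 1000.0},
    {"name": "Djedeida", "runway_length_ft": 6000, "weather": "Sunny",
     "runway_quality": "Good", "flag": "C", "refrigerated_storage_tons": 0.0},
]

# "between 8500 and approximately 10000 feet", with a hypothetical 10% tolerance.
in_range = lambda r: 8500 <= r["runway_length_ft"] <= 10000 * 1.1

for row in associative_answer(airports, in_range, "Pilot"):
    print(row)
```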

21

CoBase and GLAD Integration

Wesley W. Chu

22

CoBase Functionality
Provide approximate matching: find HETs with a capacity of approximately 5 tons
Provide conceptual query answering: find “Earth Moving” equipment
Provide context-sensitive spatial queries: find storage sites near a selected location (integration with the MATT map server)
Provide relaxation control: relaxation order, not-relaxable, at-least (answer set, quantity on hand)

23

Cooperative Operations Added to GLAD
Implicit query relaxation
Explicit query relaxation: approximate operator, similar-to/based-on, spatial relaxation
Relaxation control: relaxation-order, not-relaxable, at-least (answer-set size, quantity on hand)

24

CoBase Features Added to GLAD
Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
Display the query relaxation process: modified query conditions (value, spatial) and type abstraction hierarchies
Rank returned answers with similarity measures; e.g., spatial relaxation ranks answers according to their distance from the selected location

25

CoBase and GLAD TIE

(Diagram components: Report Collection, Report Query Constructor, Filter, Editor, Object Cache, Display Generator, Query Collection, GLAD, CoBase Query Editor, CoBase Relaxation Manager, Knowledge Base, Data Cache, CoBase Data Source Manager, Databases; user inputs: NSNs and spatial area selection.)

26

GLAD Query

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons, and price < 700,000.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)

27

CoGLAD Query with Relaxation Control Operators

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons, and price < 700,000. The attribute passenger capacity is not relaxable. Relax price first and then capacity weight.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston
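The effect of `not-relaxable` and `relaxation-order` can be sketched in Python as follows. This is a minimal sketch under stated assumptions: conditions are simple numeric comparisons, relaxing an attribute means raising its upper bound by a hypothetical 20% step, and relaxation stops once a hypothetical `at_least` answer count is reached. The function names and the three aircraft rows are illustrative only.

```python
import operator

# Supported comparison operators for the simple conditions below.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def run(rows, conds):
    """Return rows satisfying every attribute -> (op, bound) condition."""
    return [r for r in rows if all(OPS[op](r[a], v) for a, (op, v) in conds.items())]


def relax_with_control(rows, conds, order, not_relaxable, at_least=1, step=0.2, max_rounds=5):
    """Widen bounds attribute by attribute in `order`, never touching not-relaxable ones."""
    conds = dict(conds)
    answers = run(rows, conds)
    for _ in range(max_rounds):
        if len(answers) >= at_least:
            break
        for attr in order:
            op, bound = conds[attr]
            if attr in not_relaxable or op not in ("<", "<="):
                continue                      # only upper bounds are widened in this sketch
            if len(answers) >= at_least:
                break
            conds[attr] = (op, bound * (1 + step))
            answers = run(rows, conds)
    return conds, answers


aircraft = [  # hypothetical rows
    {"nsn": "A", "price": 750000, "pax_capacity_qty": 12, "capacity_wt_ston": 2.0},
    {"nsn": "B", "price": 650000, "pax_capacity_qty": 8, "capacity_wt_ston": 1.0},
    {"nsn": "C", "price": 690000, "pax_capacity_qty": 14, "capacity_wt_ston": 2.3},
]
query = {"price": ("<", 700000), "pax_capacity_qty": (">", 10), "capacity_wt_ston": ("<=", 2)}
conds, answers = relax_with_control(aircraft, query, order=["price", "capacity_wt_ston"],
                                    not_relaxable={"pax_capacity_qty"}, at_least=2)
print(conds)
print([r["nsn"] for r in answers])   # ['A', 'C']
```

Aircraft B never qualifies, because its passenger capacity fails a condition the user marked as not relaxable.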

28

CoGLAD Query with Similar-to Operator

Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity, and air mileage. Passenger capacity has a weight of 8; price and air mileage each have a weight of 1.

select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961'
      based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4

* '0000IB0000961' is an answer from the previous query

29

CoGLAD Query with Approximate Operator

Find DLA stock report entries with NSN like '%8340%' (FSC for tents and tarpaulins) and on-hand quantity approximately 150.

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and on_hand_quantity = ~150
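One way to read the approximate operator is that `~150` is replaced by the TAH cluster containing 150, with a higher TAH level giving a looser range. The Python sketch below assumes a small, made-up numeric TAH for `on_hand_quantity` stored as interval lists per level; `approx_range` and `to_sql` are hypothetical helpers, and values are assumed to lie within the root range.

```python
import bisect

# Hypothetical numeric TAH for on_hand_quantity: one list of (low, high) intervals per level.
TAH_LEVELS = [
    [(0, 100), (100, 200), (200, 400), (400, 1000)],   # leaves
    [(0, 200), (200, 1000)],                            # one level up
    [(0, 1000)],                                        # root
]


def approx_range(value, level=0):
    """Map '~value' to the interval at the given TAH level that contains it."""
    intervals = TAH_LEVELS[level]
    lows = [lo for lo, _ in intervals]
    i = bisect.bisect_right(lows, value) - 1   # last interval starting at or below value
    return intervals[max(i, 0)]


def to_sql(attr, value, level=0):
    """Rewrite an approximate condition as a range condition."""
    lo, hi = approx_range(value, level)
    return f"{attr} between {lo} and {hi}"


print(to_sql("on_hand_quantity", 150))            # on_hand_quantity between 100 and 200
print(to_sql("on_hand_quantity", 150, level=1))   # on_hand_quantity between 0 and 200
```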

30

Adding Constraints to a Query

GLAD query:
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'

Query with added constraints:
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'
  and on_hand_quantity = ~150
  and size_in_square_feet = 350

31

Example of Spatial Relaxation

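(Flowchart: the user selects NSNs and an area on the map, with a constraint on quantity on hand. Query processing checks whether the answers satisfy the constraints. If yes, the answers are returned; if no, the CoBase Relaxation Manager relaxes the selected area based on the context-sensitive TAHs, and the query is processed again.)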

32

Spatial Relaxation with Relaxation Control

relaxation-order: size, (latitude, longitude)

not-relaxable: price

at-least:
  value: size of the tarpaulin
  quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
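Spatial relaxation with an at-least quantity constraint can be sketched as repeatedly enlarging the selected area until the sites inside hold enough stock, with answers ranked by distance. This Python sketch assumes a circular area, a hypothetical 1.5x growth factor per step, great-circle (haversine) distances, and made-up site coordinates and quantities; CoBase itself relaxes the area using context-sensitive TAHs rather than a fixed growth factor.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def spatial_relax(sites, center, radius_km, needed_qty, growth=1.5, max_rounds=10):
    """Enlarge the radius until sites inside hold at least `needed_qty` units."""
    def dist(s):
        return haversine_km(center[0], center[1], s["lat"], s["lon"])

    ranked = sorted(sites, key=dist)           # answers ranked by distance
    inside, last = [], radius_km
    for _ in range(max_rounds):
        inside = [s for s in ranked if dist(s) <= radius_km]
        if sum(s["qty"] for s in inside) >= needed_qty:
            return radius_km, inside, True
        last = radius_km
        radius_km *= growth
    return last, inside, False                 # gave up: constraint not satisfied


sites = [  # hypothetical storage sites
    {"id": "S1", "lat": 34.10, "lon": -118.25, "qty": 100},
    {"id": "S2", "lat": 34.30, "lon": -118.25, "qty": 200},
    {"id": "S3", "lat": 35.05, "lon": -118.25, "qty": 500},
]
radius, found, ok = spatial_relax(sites, (34.05, -118.25), 10.0, needed_qty=250)
print(radius, [s["id"] for s in found], ok)
```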

33

Scalable and Extensible CoBase Architecture

34

Mediator Inter-Communications via KQML

(Diagram: Mediator A, wrapping Module A, and Mediator B, wrapping Module B, communicate via KQML; each uses the CoBase ontology and the CoBase content language. Labels: module objects, APIs, content language (data, actions).)

35

36

Query Answers Without CoBase

Query: find chemical suits

37

38

39

40

41

42

43

Electronic Warfare

Identify and locate sources of radiated electromagnetic energy
Determine emitter type based on the operating parameters of observed signals: radio frequency (RF), pulse repetition frequency (PRF), pulse duration (PD), scan period (SP), and other operating parameters
Determine platform sites near the line of bearing of an emitter

This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ

44

Performance Improvement by Using CoBase in EW

                 Conventional DB        CoBase
                 Case 1    Case 2       Case 1    Case 2
Identified       90.00%    30.00%       100.00%   85.90%
ID/ranking       100.00%   36.00%       100.00%   98.80%
Relaxation       0.00%     0.00%        95.90%    99.80%

Conventional DB: parameter ranges from emitter specifications
CoBase:
  DB: peak parameters (RF, PRF) and parameter ranges (PD, SP)
  KB: TAHs based on RF and PRF peak parameters; TAHs based on PD and SP parameter ranges
Case 1: emitter signals without noise
Case 2: added noise: PD and SP (10%), PRF (5%), RF (2.5%)
Sample size: 1000 signals; emitter types: 75


45

Current CoBase Users and Applications

ARPI members (ISI, Unisys)
Enhance query capabilities in the transportation domain (ARPI TARGET): query relaxation, association, and explanation

UCLA KMeD Project (Medical School)
Improve search in medical images (X-rays, MRs): approximate matching of image features and contents; explanation of approximate matching quality

Hughes Research Lab
Integrate schemas in heterogeneous databases: approximate matching of attributes and views

Lockheed/Martin Marietta
Emitter and platform identification: approximate matching of observed emitter signals; relaxation of regions to identify emitter platforms

BBN
Enhance DoD Logistic Anchor Desk (GLAD): query relaxation and spatial relaxation

46

Conclusions

Provide user- and context-sensitive query relaxation (structured and unstructured data)
Provide additional information (associative query answering) based on past cases
CoSQL (Cooperative SQL): similar-to, near-to, approximate, and relaxation control operators
GUI: map server, high-level query formation

47

48

CoSent: An Active Database Technology

Natural language-like rules support conceptual and approximate terms
Natural language-like rules are decomposed into low-level rules via the knowledge base (TAHs)
Mimics the human cognitive process, easing rule specification
Eases rule maintenance

49

CoSent: An Active Database Technology

Triggers with high-level rules containing conceptual terms (e.g., bad, heavy) and approximate operators (e.g., similar-to, near-to, approximate)
Allows trigger conditions to be specified with fuzzy and conceptual terms
Mimics human cognitive expression

CoSent monitors temporal composite events and executes rules with conceptual and approximate terms.

50

Key Features of CoSent

User-defined rules are transformed into low-level range values via the knowledge base: Type Abstraction Hierarchies (TAHs)
TAHs are typically generated automatically from the data sources
Leverages the triggering systems of conventional DBMSs (e.g., Oracle, Sybase, Teradata)
Rules are either specified by a domain expert or derived by data mining technologies

51

Example of Rule Definitions with Data Mining Technology

Find attributes that frequently appear together for a given target attribute. For example:
If the road condition is bad and the weather is also bad, then traffic congestion results.
If a person wrote many bad checks and also has a past eviction, then this person is a poor credit risk.

Based on their frequency of occurrence, the derived rules can be ranked according to an information measure.
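Frequent co-occurrence mining for a target attribute can be sketched by counting condition pairs and how often they appear with the target. The slides do not name the information measure, so this Python sketch uses rule confidence, P(target | pair), as a stand-in and breaks ties by support; `mine_rules` and the six records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, target, min_support=2):
    """Rank rules 'pair of conditions => target' by confidence, then by support."""
    pair_count, pair_with_target = Counter(), Counter()
    for rec in records:
        conds = sorted(rec - {target})
        for pair in combinations(conds, 2):
            pair_count[pair] += 1
            if target in rec:
                pair_with_target[pair] += 1
    rules = [(pair, pair_with_target[pair] / pair_count[pair], pair_with_target[pair])
             for pair in pair_count if pair_with_target[pair] >= min_support]
    return sorted(rules, key=lambda r: (-r[1], -r[2]))


records = [  # hypothetical observations
    {"bad_road", "bad_weather", "congestion"},
    {"bad_road", "bad_weather", "congestion"},
    {"bad_road", "bad_weather", "holiday", "congestion"},
    {"bad_weather", "holiday"},
    {"bad_road", "holiday", "congestion"},
    {"bad_road", "bad_weather", "congestion"},
]
for pair, confidence, support in mine_rules(records, "congestion"):
    print(f"IF {' AND '.join(pair)} THEN congestion  (conf={confidence:.2f}, support={support})")
```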

52

Conventional vs. Natural Language-Like Rules

Natural Language-Like Rule
If the weather turns bad, then notify all affected units in that region and all those that are near that region.

Conventional Rule
If wind_speed > MAX_WIND_SPEED and wave_height > MAX_WAVE_HEIGHT, then notify affected units in the regions.

53

Natural Language-Like Rule Specifications

Example 1
If the number of departures of large cargo carriers (e.g., C-5, C-141) becomes significantly low in the past seven days, notify the Air Mobility Command.

Example 2
If the aircraft has a fuel contamination problem and the aircraft type is similar-to 'C-5' based on the fuel type and fueling method, then notify the authority.

54

Example

DoD Transportation Planning: Weather Report Table

Wind Speed (m/s)   Wave Height (m)   Wind Speed (m/s)   Wave Height (m)
14.9               3.3               7.4                1.9
13.5               3.1               7.7                1.7
12.2               3.1               7                  1.6
12                 2.6               6.5                1.5
11.8               2.8               6.6                1.6
10.6               2.3               6.5                1.4
10.5               2.7               6.6                1.4
10                 2.5               6.4                1.5
10                 2.5               5.9                1.5
8.3                2.3               5.7                1.4
7.9                2.2               6                  1.6
8.1                2                 4.5                1.4
7.7                2                 4                  1.3
7.1                1.8               3.7                1.2

Wind speed is reported hourly as an average over an eight-minute period for buoys and a two-minute period for land stations.

Wave height is sampled over a 20-minute period.

55

TAH Example

Wave Height [0.6, 7.2]
  VERY LOW [0.6, 1.25]
  LOW [1.25, 1.75]
  HIGH [1.75, 2.45]
  VERY HIGH [2.45, 7.2]

56

A Portion of the Wave Height TAH

57

Triggering Based on Temporal Composite Events

Notify the commander if, within the past seven days, the total number of C-5 departures is significantly low and the filter problem on C-5s is extremely high.

C-5 Departure TAH
  Low 9-134.5: Significantly Low 9-53, Very Low 53-134.5
  High 134.5-208: Very High 134.5-162, Significantly High 162-208

C-5 Filter Problem TAH
  Low 0-53: Extremely Low 0-36, Very Low 36-53
  High 53-79: Very High 53-60, Extremely High 60-79
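The composite-event rule above can be sketched by summing each event stream over the past seven days and classifying the totals with the two TAHs. The leaf ranges come from the slide; treating each range as lower-bound inclusive and upper-bound exclusive, the daily-count data layout, `should_notify`, and the sample dates are assumptions of this Python sketch.

```python
from datetime import date, timedelta

# Leaf ranges from the C-5 TAHs (lower bound inclusive, upper bound exclusive: an assumption).
DEPARTURE_TAH = {"significantly low": (9, 53), "very low": (53, 134.5),
                 "very high": (134.5, 162), "significantly high": (162, 208)}
FILTER_TAH = {"extremely low": (0, 36), "very low": (36, 53),
              "very high": (53, 60), "extremely high": (60, 79)}


def classify(value, tah):
    """Return the conceptual term whose range contains `value`, if any."""
    for term, (lo, hi) in tah.items():
        if lo <= value < hi:
            return term
    return None


def should_notify(departures, filter_problems, today, window_days=7):
    """Fire when, over the window, departures are 'significantly low' and filter problems 'extremely high'."""
    start = today - timedelta(days=window_days)
    dep = sum(n for d, n in departures.items() if start < d <= today)
    flt = sum(n for d, n in filter_problems.items() if start < d <= today)
    return (classify(dep, DEPARTURE_TAH) == "significantly low"
            and classify(flt, FILTER_TAH) == "extremely high")


today = date(2024, 1, 8)                                   # hypothetical date
departures = {today - timedelta(days=i): 5 for i in range(7)}
departures[today - timedelta(days=10)] = 500               # outside the window, ignored
filters = {today - timedelta(days=i): 10 for i in range(7)}
print(should_notify(departures, filters, today))           # True: 35 departures, 70 problems
```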

58

Natural Language-Like Rule Translations

(Diagram: natural language-like rules pass through the Rule Parser, the rule representation (Rule Rep), the Rule Decomposer, and the Rule Translator; using the rule definitions and TAHs, they are translated into low-level rules for a conventional triggering system, e.g., Oracle, Sybase, or Teradata.)

59

CoSent Architecture

(Diagram: the CoSent Server comprises the Rule Parser, Relaxation Engine, TAHs, Rule Base, Rule Manager, Event Manager, and Action Manager. Inputs are natural language-like rules and composite event specifications; the server performs rule translation/relaxation on top of commercial relational database systems (e.g., Oracle, Sybase, Teradata) and outputs trigger actions and event notifications.)

60

CoSent Demo

Natural language-like rule with conceptual terms: “very high wave height” and “very strong wind speed”
Natural language-like rule with the approximate term “nearby” and the conceptual term “bad weather”
Install a trigger by drag-and-drop on the desired location on the map

61

Natural Language-Like Rule

A natural language-like rule containing conceptual terms, such as wave_height = “very-high” and wind_speed = “very-strong”, can be translated into range values using domain knowledge, for instance, a type abstraction hierarchy. Natural language-like rules reduce the number of rules, thus easing rule maintenance.
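The translation step can be sketched with the wave-height TAH from the TAH example slide. `translate` and `holds` are hypothetical helpers; a rule on wind_speed = “very-strong” would need its own wind-speed TAH, which the slides do not give.

```python
# Wave-height TAH leaves from the TAH example slide (meters).
WAVE_HEIGHT_TAH = {
    "very-low": (0.6, 1.25),
    "low": (1.25, 1.75),
    "high": (1.75, 2.45),
    "very-high": (2.45, 7.2),
}


def translate(attribute, term, tah):
    """Translate attribute = "term" into a low-level range predicate."""
    lo, hi = tah[term]
    return f"{attribute} >= {lo} and {attribute} <= {hi}"


def holds(value, term, tah):
    """Check a single observation against a conceptual term."""
    lo, hi = tah[term]
    return lo <= value <= hi


print(translate("wave_height", "very-high", WAVE_HEIGHT_TAH))
# wave_height >= 2.45 and wave_height <= 7.2
```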

62

63

64

65

66

67

Rules With Approximate Terms

Rules can contain approximate terms, such as near-by and approximate, which eases rule specification.
A trigger can be installed at the desired location on a map by drag-and-drop.
The near-by region affected by the bad weather condition is specified by the trigger condition, shown as a red circle.

68

69

70

71

72

73

74

75

Map Server Architecture

76

Current Capabilities of Map Server

Visualization of query answers: icons, paths
Enter query constraints graphically
Visualization of the query relaxation process

77


78

Explanation Agent

Based on process traces and invocation rules, generate English-like explanations of:
the relaxation process
the quality of approximate matching
definitions and terms used in the explanation

79

Explanation of Relaxation Process

80

Relaxation Primitive: within

81

Extending the near-to Primitive from Points to Regions

82

Dynamic Nearness

Uses transaction history to identify nearness between tuples and values

If two tuples (or attribute values) appear together in a query answer, then that is a piece of evidence that they should be clustered together.

Gather evidence over time

Evolve the hierarchy
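The evidence-gathering idea can be sketched as a counter over pairs of tuples that co-occur in answers, with a threshold deciding which links count as nearness. The `DynamicNearness` class, the threshold value, and the book identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


class DynamicNearness:
    """Accumulate co-occurrence evidence from query answers (hypothetical sketch)."""

    def __init__(self):
        self.links = Counter()

    def observe(self, answer):
        # Every pair of tuples in the same answer gains one unit of evidence.
        for a, b in combinations(sorted(set(answer)), 2):
            self.links[(a, b)] += 1

    def neighbors(self, item, threshold=2):
        """Items linked to `item` at least `threshold` times, strongest first."""
        hits = [(b if a == item else a, n) for (a, b), n in self.links.items()
                if item in (a, b) and n >= threshold]
        return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


nearness = DynamicNearness()
nearness.observe(["B1", "B2", "B3"])   # answer to query 1
nearness.observe(["B1", "B2"])         # answer to query 2
nearness.observe(["B2", "B4"])         # answer to query 3
print(nearness.neighbors("B1"))        # [('B2', 2)]
```

Raising the threshold keeps only well-supported links, which is one way to trade off the number of links against answer-set size.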

83

The BOOKS Relation

84

Schematic of a Browsing System

85

Schematic of a Query Modification System

86

The Links Between Tuples in BOOKS

87

Dynamic Links After Two Queries

88

Links with Counts

89

Number of Links with Threshold Value

90

Number of Links is determined by Maximum Answer Set Size

91

Query Formation From High-Level Concepts for Relational Databases

Guogen Zhang, Wesley Chu, Frank Meng, Gladys Kong

92

Outline

Overview
Semantic Graph Model
High-Level Query Formation for SPJ Queries
Incremental Query Formation for Complex Queries
Conclusions

93

Overview: Query Formation

Based on a semantic graph model, including user-defined relationships
User specifies requests and constraints
Formulate a simple query by a graph search technique
Candidates ranked by information measure
English-like query description

A complex query can be formulated by a series of simple queries

94

Related Work
Query formulation as a Steiner tree problem (Wald and Sorenson, 1984): limited to partial 2-tree graphs
Formulating simple Select-Project-Join (SPJ) queries via the Universal Relation Model: no need to specify natural joins (Ullman, 1988; Vardi, 1988)
Object-oriented query path expression completion: partial-order relationship between different paths for ranking (Ioannidis and Lashkari, 1994)
Query-by-Icon (QBI) (Massari and Chrysanthis, 1995)
Natural language interfaces (text/voice): logical form to query

95

Semantic Graph Model

Weighted graph G = (V, E):
Nodes: entities (strong, weak, user-defined)
Links: relationships (ISA, HAS, simple, complex, user-defined)

For relational databases:
nodes: relations
links: natural and user-defined joins
Weight: information measure of a node or link

96

Query Feature

Query expression in a semantic graph:
Query topic, T: a set of joins represented by links
Query constraints, C: query conditions
Query aspect, A: attribute list

97

A query topic for “aircraft can land on airports at geographical locations of countries”

(Diagram: nodes airports, runways, airfield_chars, geoloc, and country, connected by the links can land, have, is a, and located.)

98

Semi-Automatic Generation of Semantic Model

Find natural joins through keys and foreign keys between nodes.
User-defined links can be added to the graph model.
Designers need to specify link types and assign names to all the elements in the graph.

99

Example of Semantic Model Generation

AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …; key: APORT_NM.

RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT, RUNWAY_WIDTH_FT, …; key: RUNWAY_NM.

GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …; key: GLC_CD.

COUNTRY: CY_CD, CY_NM, …; key: CY_CD.

Links:
AIRPORT--RUNWAY: APORT_NM
AIRPORT--GEOLOC: GLC_CD
RUNWAY--GEOLOC: GLC_CD
GEOLOC--COUNTRY: CY_CD
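The key/foreign-key rule for finding natural joins can be sketched in Python as follows. The sketch links two relations when an attribute they share is the key of one of them; it uses only the attributes listed above (the elided `…` attributes are ignored), and `natural_join_links` is a hypothetical name. On this input it reproduces the four links listed.

```python
from itertools import combinations

# Relations from the example: (listed attributes, key).
RELATIONS = {
    "AIRPORT": ({"APORT_NM", "GEOLOC_TYPE", "GLC_CD", "ELEV_FT"}, "APORT_NM"),
    "RUNWAY": ({"APORT_NM", "RUNWAY_NM", "GLC_CD", "RUNWAY_LENGTH_FT", "RUNWAY_WIDTH_FT"},
               "RUNWAY_NM"),
    "GEOLOC": ({"GLC_CD", "GLC_NM", "CY_CD", "LATITUDE", "LONGITUDE"}, "GLC_CD"),
    "COUNTRY": ({"CY_CD", "CY_NM"}, "CY_CD"),
}


def natural_join_links(relations):
    """Link two relations when a shared attribute is the key of either one."""
    links = []
    for (r, (r_attrs, r_key)), (s, (s_attrs, s_key)) in combinations(relations.items(), 2):
        for attr in sorted(r_attrs & s_attrs):
            if attr in (r_key, s_key):
                links.append((r, s, attr))
    return links


for r, s, attr in natural_join_links(RELATIONS):
    print(f"{r}--{s}: {attr}")
```

Note that AIRPORT and RUNWAY also share GLC_CD, but since it is a key of neither relation, no link is generated for it; a designer could still add such a link by hand as a user-defined link.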

100

Information Measure

Information measure of a node or link $a$: $I(a) = -\log P(a)$, where $P(a)$ is the probability of $a$ being used in queries.

Assuming nodes and links are independent, the information measure of a subgraph with a set of elements $A = \{a_i \mid i = 1, \ldots, n\}$ is additive:

$$I(A) = \sum_{i=1}^{n} I(a_i)$$

101

Information Measure (cont.)

Initial information measure: all nodes = 1, or different nodes have different values

Information measure is normalized and converted into counts

The probability of a node or link is $P(a_i) = c_i / c$, where $c_i$ is the count for $a_i$ and $c$ is the total count

Update the information measure

Ranking is based on the information measure and thus adapts to user feedback
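The count-based information measure and its feedback loop can be sketched as below: $P(a) = c_a / c$, $I(a) = -\log P(a)$, subgraph information is the sum over elements, and each accepted query increments the counts of the elements it used. Starting every element at a count of 1 and the `InformationMeasure` class are assumptions of this Python sketch; only known elements may be scored.

```python
import math
from collections import Counter


class InformationMeasure:
    """Usage counts -> P(a) = c_a / c -> I(a) = -log P(a) (hypothetical sketch)."""

    def __init__(self, elements):
        self.counts = Counter({e: 1 for e in elements})   # uniform initial counts

    def info(self, element):
        total = sum(self.counts.values())
        return -math.log(self.counts[element] / total)

    def subgraph_info(self, elements):
        # Additive under the independence assumption: I(A) = sum of I(a_i).
        return sum(self.info(e) for e in elements)

    def record_use(self, elements):
        # User feedback: elements used in an accepted query become more probable.
        for e in elements:
            self.counts[e] += 1

    def rank(self, candidates):
        """Order completion candidates by increasing information measure."""
        return sorted(candidates, key=self.subgraph_info)


im = InformationMeasure(["have", "located", "repair", "authorize"])
usual, unusual = ("have", "located"), ("repair", "authorize")
for _ in range(3):
    im.record_use(usual)
print(im.rank([unusual, usual]))   # the frequently used links now rank first
```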

102

Query Formulation

To formulate (simple) queries without knowledge of the query language or database schema

Example: Find airports in Tunisia that can land a C-5 cargo plane

User input:
Query aspect: AIRPORTS.APORT_NM
Constraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = 'C-5'
             COUNTRY_STATE.CY_NM = 'Tunisia'
Links: CAN LAND

103

Formulated Query

SELECT R3.APORT_NM
FROM AIRCRAFT_AIRFIELD_CHARS R0, AIRPORTS R3, COUNTRY_STATE R11, GEOLOC R12, RUNWAYS R16
WHERE R0.AC_TYPE_NM = 'C-5'
  AND R11.CY_NM = 'Tunisia'
  AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH_FT
  AND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT
  AND R12.GLC_CD = R3.GLC_CD
  AND R3.APORT_NM = R16.APORT_NM
  AND R12.CY_CD = R11.CY_CD

104

Query Completion as Graph Search Problem

Given: An incomplete input query topic Ti

Find a set of links to complete the topic (i.e., to make Ti connected)

Minimum Missing Information principle: the query completion candidate Tc (the missing links and nodes) for an incomplete input topic Ti contains the minimum information

105

Query Formulation Algorithm

Input: subgraph T of the semantic graph G
Find candidates with the minimum information measure

Two methods are used to limit the search scope:
L-step-bound paths: paths that connect two components with at most L links, limiting the search to the neighborhood of the input subgraph
k-minimum completion candidates: at most k candidates with the minimum information measure are kept (alpha-beta pruning)
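For the simplified case of connecting two components, the two scope limits can be sketched as a depth-bounded enumeration of simple paths followed by keeping the k cheapest by total information measure. This Python sketch keeps the k best with `heapq.nsmallest` after enumeration rather than pruning during search as the slide describes; the edge list and its weights are hypothetical, and the general multi-component case (a Steiner-tree-like problem) is not handled.

```python
import heapq
from collections import defaultdict

# Hypothetical semantic graph: (node, node, link label, information measure of the link).
EDGES = [
    ("airports", "runways", "have", 1.0),
    ("airfield_chars", "runways", "can land", 1.0),
    ("airports", "geoloc", "is a", 1.0),
    ("runways", "geoloc", "located", 2.0),
    ("geoloc", "country", "located", 1.0),
    ("airports", "weather", "at", 2.0),
]


def build_graph(edges):
    """Undirected adjacency lists."""
    graph = defaultdict(list)
    for a, b, label, info in edges:
        graph[a].append((b, label, info))
        graph[b].append((a, label, info))
    return graph


def bounded_paths(graph, component_a, component_b, max_links):
    """All simple paths of at most `max_links` links from component_a to component_b."""
    found = []

    def dfs(node, visited, path, cost):
        if node in component_b:
            found.append((cost, list(path)))
            return
        if len(path) == max_links:
            return                              # L-step bound reached
        for nbr, label, info in graph[node]:
            if nbr in visited or nbr in component_a:
                continue                        # keep paths simple and leaving component_a
            visited.add(nbr)
            path.append((node, label, nbr))
            dfs(nbr, visited, path, cost + info)
            path.pop()
            visited.remove(nbr)

    for start in sorted(component_a):
        dfs(start, {start}, [], 0.0)
    return found


def k_min_completions(graph, component_a, component_b, max_links=2, k=3):
    """Keep at most k candidates with the smallest total information measure."""
    return heapq.nsmallest(k, bounded_paths(graph, component_a, component_b, max_links),
                           key=lambda cand: cand[0])


graph = build_graph(EDGES)
for cost, path in k_min_completions(graph, {"airfield_chars", "runways"}, {"country"}):
    print(cost, path)
```

With L = 2 only the path through the runways-geoloc link survives; allowing L = 3 also admits the route through airports, which shows how the bound trades completeness against search cost.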

106

Initial Components and 2-Step-Bound Paths for the “CAN LAND” Query

(Figure: (a) initial components and (b) 2-step-bound paths (1) to (6) connecting them, over the nodes aircrafts, airfield_chars, runways, airports, geoloc, and country, with links such as can land, have, is a, located, repair, and authorize.)

107

The Semantic Graph for the Transportation Domain

(Figure: relation nodes airports, runways, airfield_chars, weather, geoloc, and country, connected by the links can land, have, at, is a, and located; links carry numeric weight labels.)

108

Incremental Query Formulation

To help the user reach a complex query goal with a series of simple queries
The subsequent queries may depend on the results of preceding queries (derived relations)

Issues:
Incorporate derived relations into the semantic graph
Suggest missing attributes to link isolated derived nodes to the graph

109

Incremental Query Examples

Find airports in Tunisia.
Which of these airports can land a C-5?
What is the weather at these airports?

110

Incorporating Derived Relations

Source relation: contributes attributes to the derived relations
Derived relation: inherits the properties of attributes from its source relations
Deriving link: links to the source relations through inherited keys
Inherited link: inherits links from the source relations

111

Extended semantic graph showing derived nodes, derived links, and inherited links

(Figure: the transportation semantic graph (airports, runways, airfield_chars, weather, geoloc, country) extended with the derived nodes airporttunisia, airporttunisiacanland, and airporttunisiacanlandweather.)

112

Suggesting Key Attributes for a Query

Find the source relations for the isolated derived relation.
Suggest the keys of the source relations as attributes to include.
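Key suggestion can be sketched as follows: trace each derived attribute back to its source relation, and for every source whose key the derived relation lacks, suggest that key so a deriving link can be formed. The `SOURCES` catalog, the provenance map, and `suggest_keys` are hypothetical; the slides do not say how provenance is recorded.

```python
# Hypothetical catalog: source relations with their attributes and keys.
SOURCES = {
    "AIRPORTS": {"attrs": {"APORT_NM", "GLC_CD", "ELEV_FT"}, "key": ("APORT_NM",)},
    "GEOLOC": {"attrs": {"GLC_CD", "GLC_NM", "CY_CD"}, "key": ("GLC_CD",)},
}


def suggest_keys(derived_attrs, provenance, sources):
    """Suggest source-relation keys missing from a derived relation.

    `provenance` maps each derived attribute to the source relation it came from.
    A derived relation holding a source's key can link to it (a deriving link);
    otherwise that key is suggested as an attribute to include.
    """
    suggestions = []
    for src in sorted({provenance[a] for a in derived_attrs}):
        key = sources[src]["key"]
        if not set(key) <= set(derived_attrs):
            suggestions.append((src, key))
    return suggestions


# An isolated derived relation (e.g., from "Find airports in Tunisia") that kept no keys.
derived = ["ELEV_FT", "GLC_NM"]
provenance = {"ELEV_FT": "AIRPORTS", "GLC_NM": "GEOLOC"}
print(suggest_keys(derived, provenance, SOURCES))
# [('AIRPORTS', ('APORT_NM',)), ('GEOLOC', ('GLC_CD',))]
```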

113

Concept and Attribute Specification Interface

114

Query Constraint Specification

115

Action Specification

116

English-Like Query Description and the Formulated Query

117

Conclusions

Semantic graph model provides a basis for query formulation search
Ranking of query candidates by information measure provides adaptive behavior in formulation
Incremental query formulation is effective for complex queries
GUI and voice interfaces can be built for query formulation from high-level concepts

118