
August 2000 Gio vdR 1

Increasing the Precision of

Semantic Interoperation

Gio Wiederhold

Stanford University

August 2000

Reind van de Riet celebration

Thanks to Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker.

August 2000 Gio vdR 2

Reind van de Riet, music and computing

August 2000 Gio vdR 3

Achievements

• Computer Science-based Formalization

• Effective Dissemination - students and writings

• Powerful Database Technology

• Language-sensitive Methods

• Ubiquitous Databases

• World-wide interconnectivity

• All the Information one might want, somewhere . . . .

August 2000 Gio vdR 4

Information overload Data starvation

• More databases
  – public & corporate

• Faster communication
  – digital
  – packeting: TCP-IP, ATM

• World-wide connectivity
  – internet
  – world-wide web

• Disintermediation
  – ubiquitous publishing

August 2000 Gio vdR 5

Data and Knowledge

Information is created at the confluence of
• data -- the state, and
• knowledge -- the ability to select and project the state into the future

[Figure: a Knowledge Loop and a Data Loop, with the labels Experience, Action, Storage, Education, Selection, Integration, Abstraction, Decision-making, State changes, Recording.]

August 2000 Gio vdR 6

Language issues

• Our languages are specialized for our needs
  – efficiency of expression: minimal symbols
  – Examples:
    • carpenter versus householder domain
    • surgeon versus pathologist

• World-wide communication changes ranges
  – geographical locality is less relevant
  – conceptual locality is important

• Business requires precision, including that words map consistently to instances

knowledge controls use of data

August 2000 Gio vdR 7

Transform Data to Information

[Figure: three layers]

Application Layer: decision-makers at workstations

Mediation Layer: value-added services

Foundation Layer: data and simulation resources

August 2000 Gio vdR 8

Heterogeneity among Domains

If interoperation involves distinct domains, mismatch ensues

• Autonomy conflicts with consistency
  – Local Needs have Priority
  – Outside uses are a Byproduct

Heterogeneity must be addressed

• Platform and Operating Systems
• Representation and Access Conventions
• Naming and Ontology

August 2000 Gio vdR 9

Semantic Mismatches

Information comes from many autonomous sources

• Differing viewpoints (by source)
  – differing terms for similar items: { lorry, truck }
  – same terms for dissimilar items: trunk (luggage, car)
  – differing coverage: vehicles (DMV, AIA)
  – differing granularity: trucks (shipper, manuf.)
  – different scope: student museum fee, Stanford

• Hinders use of information from disjoint sources
  – missed linkages: loss of information, opportunities
  – irrelevant linkages: overload on user or application program

• Poor precision when merged
  – ok for web browsing, poor for business
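The two failure modes above (missed linkages and irrelevant linkages) can be made concrete with a minimal sketch. It assumes two hypothetical sources described only by flat term lists and a hand-written set of expert-approved links. The term lists, function names, and the "link by identical spelling" strategy are illustrative assumptions, not part of the talk.

```python
# Hypothetical term lists for two autonomous sources (illustrative only).
shipper_terms = {"lorry", "trunk", "pallet"}   # here "trunk" means luggage
dealer_terms = {"truck", "trunk", "sedan"}     # here "trunk" means part of a car

# Links a domain expert would accept, as (shipper term, dealer term) pairs.
expert_links = {("lorry", "truck")}


def naive_links(source_a, source_b):
    """Link terms across two sources purely by identical spelling."""
    return {(term, term) for term in source_a & source_b}


def compare(found, expected):
    """Split the outcome into irrelevant linkages and missed linkages."""
    irrelevant = found - expected   # linked, but the meanings differ
    missed = expected - found       # same meaning, but never linked
    return irrelevant, missed


found = naive_links(shipper_terms, dealer_terms)
irrelevant, missed = compare(found, expert_links)
print("irrelevant linkages:", irrelevant)
print("missed linkages:", missed)
```

Spelling alone links the two unrelated uses of "trunk" and fails to link "lorry" with "truck", which are the slide's own examples of each kind of mismatch.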

August 2000 Gio vdR 10

Need for precision

adapted from Warren Powell, Princeton Un.

[Figure: data errors plotted against information quantity, with the labels human limit, acceptable limit, human with tools?, and Information Wall.]

More precision is needed as data volume increases: a small error rate still leads to too many errors

False positives have to be investigated (attractive-looking supplier that makes toys; apparent drug target with poor annotation)

False negatives cause lost opportunities, suboptimal to some degree

False positives (poor precision) typically cost more than false negatives (poor recall)
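The arithmetic behind this slide can be sketched as follows. Precision and recall use their standard definitions. The 1% error rate and the volumes are made-up numbers chosen only to show how a fixed rate turns into a growing number of items someone has to investigate.

```python
def precision(true_pos, false_pos):
    """Fraction of returned items that are actually relevant."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Fraction of relevant items that were actually returned."""
    return true_pos / (true_pos + false_neg)


def false_positives(volume, error_rate):
    """Absolute number of false positives at a fixed error rate."""
    return round(volume * error_rate)


# Hypothetical figures: the error rate stays at 1% while the volume grows.
ERROR_RATE = 0.01
for volume in (1_000, 100_000, 10_000_000):
    print(volume, "items ->", false_positives(volume, ERROR_RATE), "to investigate")

# Hypothetical result set: 90 relevant hits, 10 irrelevant hits, 30 relevant items not found.
print("precision:", precision(90, 10))
print("recall:", recall(90, 30))
```

The error rate never changes, yet the investigation workload grows in step with the volume, which is why the acceptable error rate has to fall as the volume rises.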

August 2000 Gio vdR 11

Means to achieve precision

• Reduce redundancy
  – omit similar results from alternate sources: reports, workshop papers, journals, books

• Reduce false positives
  – recognize contextual domains *
    • the same word refers to different object types: nail (carpentry, anatomy), miter (carpentry, religion)

• Abstract findings to higher levels
  – Linguistic processing based on customer model: medical case studies have similar formats

August 2000 Gio vdR 12

Proposed Language Solutions

Specify and define terminology usage: ontology

• Domain-specific ontologies (XML DTD assumption)
  – Small, focused, cooperating groups
  – high quality; some examples: genomics, arthritis, Shakespeare plays
  – allows sharable, formal tools
  – ongoing, local maintenance affecting users: annual updates
  – poor interoperation, users still face inter-domain mismatches

• Cannot achieve global consistency
  – wonderful for users and their programs
  – too many interacting sources
  – long time to achieve: 2 sources (UAL, BA), 3 (+ trucks), 4, … all ?
  – costly maintenance, since all sources evolve
  – no world-wide authority to dictate conformance

August 2000 Gio vdR 13

Domains and Consistency

• a domain will contain many objects
• the object configuration is consistent
• within a domain all terms are consistent &
• relationships among objects are consistent
• context is implicit

No committee is needed to forge compromises * within a domain

Compromises hide valuable details

Domain Ontology

August 2000 Gio vdR 14

Many ontologies Interoperation

Have devolved maintenance onto many domain-specific experts / authorities

• Many applications span multiple domains

• Need an algebra to compute composed ontologies that are limited to their articulation terms

• enable interpretation within the source contexts

SKC

August 2000 Gio vdR 15

Sample Operation: INTERSECTION

Source Domain 1: Owned and maintained by Store

Source Domain 2: Owned and maintained by Factory

Articulation: Result contains shared terms, useful for purchasing

August 2000 Gio vdR 16

Tools to create articulations

Combine ontology graphs with expert selection based on spelling, graph matching, and a nexus derived from a dictionary (O.E.D.)

[Figure: Vehicle registration ontology and Vehicle sales ontology, with suggestions for articulations.]

August 2000 Gio vdR 17

An Ontology Algebra

A knowledge-based algebra for ontologies

The Articulation Ontology (AO) consists of matching rules that link domain ontologies

Intersection: create a subset ontology, keep sharable entries

Union: create a joint ontology, merge entries

Difference: create a distinct ontology, remove shared entries
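The three operations can be illustrated with a deliberately simplified sketch. It assumes each ontology is just a flat set of terms and that the articulation ontology is a set of matching rules, each pairing a term from the first ontology with a term from the second. The slides describe ontology graphs and knowledge-based rules, so this shows only the set-like behaviour and is not the SKC implementation. The Store and Factory terms are invented for illustration.

```python
def intersection(o1, o2, rules):
    """Keep the sharable entries: term pairs linked by a matching rule."""
    return {(a, b) for a, b in rules if a in o1 and b in o2}


def difference(o1, o2, rules):
    """Keep the entries of o1 that no matching rule shares with o2."""
    shared = {a for a, _ in intersection(o1, o2, rules)}
    return o1 - shared


def union(o1, o2, rules):
    """Joint ontology: linked terms merge into one entry, the rest stay as they are."""
    linked = intersection(o1, o2, rules)
    merged = {frozenset(pair) for pair in linked}
    only_left = {frozenset([t]) for t in difference(o1, o2, rules)}
    shared_right = {b for _, b in linked}
    only_right = {frozenset([t]) for t in o2 - shared_right}
    return merged | only_left | only_right


# Hypothetical ontologies and articulation rules (illustrative only).
store = {"shoes", "colour", "shelf"}
factory = {"footwear", "color", "machine"}
rules = {("shoes", "footwear"), ("colour", "color")}

print(intersection(store, factory, rules))
print(difference(store, factory, rules))
print(union(store, factory, rules))
```

In this sketch the intersection depends entirely on the rules, so two ontologies with no terms spelled alike can still have a non-empty intersection.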

August 2000 Gio vdR 18

INTERSECTION support

Store Ontology

Articulation ontology

Matching rules that use terms from the 2 source domains

Factory Ontology

Terms usefulfor purchasing

August 2000 Gio vdR 19

Other Basic Operations

typically prior intersections

UNION: merging entire ontologies

DIFFERENCE: material fully under local control

Articulation ontology

August 2000 Gio vdR 20

Features of an algebra

Operations can be composed

Operations can be rearranged

Alternate arrangements can be evaluated

Optimization is enabled

The record of past operations can be kept and reused when sources change
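A small sketch can show what these features buy. It assumes operations are recorded as an expression tree and evaluated on demand, and it uses plain set operations as a stand-in for the knowledge-based ones. The class names, the source contents, and the use of the distributive law are illustrative assumptions. Whether every set identity carries over to the knowledge-based operators is not stated in the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Source:
    name: str


@dataclass(frozen=True)
class Op:
    kind: str      # "union", "intersection" or "difference"
    left: "Expr"
    right: "Expr"


Expr = Union[Source, Op]


def evaluate(expr, sources):
    """Evaluate a recorded expression against the current source contents."""
    if isinstance(expr, Source):
        return set(sources[expr.name])
    left = evaluate(expr.left, sources)
    right = evaluate(expr.right, sources)
    if expr.kind == "union":
        return left | right
    if expr.kind == "intersection":
        return left & right
    if expr.kind == "difference":
        return left - right
    raise ValueError(f"unknown operation: {expr.kind}")


A, B, C = Source("A"), Source("B"), Source("C")

# Two arrangements of the same composition (distributive law for plain sets).
arrangement_1 = Op("union", Op("intersection", A, B), Op("intersection", A, C))
arrangement_2 = Op("intersection", A, Op("union", B, C))

# Hypothetical source contents.
sources = {"A": {"truck", "car", "wheel"}, "B": {"truck", "engine"}, "C": {"car", "paint"}}
before = evaluate(arrangement_1, sources)

# A source changes; the recorded expression is simply evaluated again.
sources["B"] = {"truck", "engine", "wheel"}
after = evaluate(arrangement_1, sources)
print(before, after)
```

Because the record is an expression rather than a stored result, an optimizer is free to pick whichever arrangement is cheaper, and a change in a source costs only a re-evaluation.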

August 2000 Gio vdR 21

Knowledge Composition

[Figure: knowledge resources A, B, C, D and E, with articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ D) and (C ∩ E). Articulation knowledge for (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E) yields the composed knowledge for applications using A, B, C, E.]

Legend: ∪ : union, ∩ : intersection

August 2000 Gio vdR 22

Domain Specialization

• Knowledge Acquisition (20% effort) &
• Knowledge Maintenance (80% effort *)

to be performed by
• Domain specialists
• Professional organizations
• Field teams of modest size

Empowerment: autonomously maintainable

* based on experience with software

August 2000 Gio vdR 23

Summary

To sustain the growth of web usage

1. The value of the results has to keep increasing: precision, relevance, not volume

2. Value is provided by mediating experts, encoded as models of diverse resources, diverse customers

Problems being addressed: redundancy, mismatches, quality, maintenance

Need research to develop tools for these tasks

} Clear models

August 2000 Gio vdR 24

Integration Science ?

[Figure: Integration Science, together with three fields:
  Artificial Intelligence: knowledge mgmt, linguistic models, uncertainty
  Systems Engineering: analysis, documentation, costing
  Databases: access, storage, algebras, scalability]

August 2000 Gio vdR 25

Background Slides

In Handout distributed at the VU, but not actually used

August 2000 Gio vdR 26

Sample Processing in HPKB

• What is the most recent year an OPEC member nation was on the UN security council?
  – Related to DARPA HPKB Challenge Problem
  – SKC resolves 3 Sources
    • CIA Factbook ‘96 (nation)
    • OPEC (members, dates)
    • UN (SC members, years)
  – SKC obtains the Correct Answer
    • 1996 (Indonesia)
  – Other groups obtained more, but factually wrong answers

• Problems resolved by SKC
  * Factbook has out of date OPEC & UN SC lists
    – Indonesia not listed
    – Gabon (left OPEC 1994)
  * different country names
    – Gambia => The Gambia
  * historical country names
    – Yugoslavia
  • UN lists future security council members
    – Gabon 1999
  • intent of original question
    – Temporal variants
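The temporal and naming problems listed above can be illustrated with a toy version of the query. The sketch uses only facts stated on the slide (Indonesia on the Security Council in 1996, Gabon listed for 1999, Gabon leaving OPEC in 1994, and the Gambia naming variant). The data layout, the function names, the `as_of` cut-off year, and the assumption that each country joined OPEC before any year considered are all hypothetical. This is not how SKC represents its sources.

```python
# Name variants between sources (from the slide).
ALIASES = {"Gambia": "The Gambia"}

# OPEC membership as country -> year it left (None = still a member).
# Assumption: join dates are omitted and taken to precede every year considered.
OPEC_LEFT = {"Indonesia": None, "Gabon": 1994}

# UN Security Council listings as (country, year), including a future listing.
UN_SC = [("Indonesia", 1996), ("Gabon", 1999)]


def canonical(name):
    """Map a source-specific country name to one shared spelling."""
    return ALIASES.get(name, name)


def was_opec_member(country, year):
    """True if the country was an OPEC member in the given year."""
    if country not in OPEC_LEFT:
        return False
    left = OPEC_LEFT[country]
    return left is None or year < left


def most_recent(as_of, check_dates=True):
    """Most recent (year, country) with an OPEC member on the Security Council."""
    candidates = []
    for name, year in UN_SC:
        country = canonical(name)
        if check_dates and (year > as_of or not was_opec_member(country, year)):
            continue   # skip future listings and former members
        if country in OPEC_LEFT:
            candidates.append((year, country))
    return max(candidates) if candidates else None


print(most_recent(as_of=1998))                     # membership dates respected
print(most_recent(as_of=1998, check_dates=False))  # dates ignored
```

With the dates respected the sketch returns 1996 (Indonesia), the answer the slide reports as correct. With the dates ignored it returns the future Gabon listing, which shows why the temporal scope of the question matters.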

August 2000 Gio vdR 27

Tools to create articulations

Combine ontology graphs with expert selection based on spelling, graph matching, and a nexus derived from a dictionary (O.E.D.)

[Figure: Vehicle registration ontology and Vehicle sales ontology, with suggestions for articulations.]

August 2000 Gio vdR 28

Tools to create articulations

[Figure: a graph matcher for the articulation-creating expert takes the Vehicle ontology and the Transport ontology and produces suggestions for articulations.]

August 2000 Gio vdR 29

continue from initial point

Also suggest similar terms for further articulation:

• by spelling similarity
• by graph position
• by term match repository

Expert response:
1. Okay
2. False
3. Irrelevant to this articulation

All results are recorded

Okay’s are converted into articulation rules
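The suggest-and-review loop can be sketched as below, covering only the spelling-similarity source of suggestions. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The 0.7 threshold, the term lists, and the scripted expert are illustrative assumptions, and the graph-position and match-repository suggestions are not modelled.

```python
from difflib import SequenceMatcher


def suggest(terms_a, terms_b, threshold=0.7):
    """Propose term pairs whose spelling similarity reaches the threshold."""
    pairs = []
    for a in sorted(terms_a):
        for b in sorted(terms_b):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def review(suggestions, expert):
    """Record every expert verdict and turn each "okay" into an articulation rule."""
    log, rules = [], []
    for a, b, _score in suggestions:
        verdict = expert(a, b)   # "okay", "false" or "irrelevant"
        log.append((a, b, verdict))
        if verdict == "okay":
            rules.append((a, b))
    return log, rules


# Hypothetical ontologies and a scripted expert (illustrative only).
vehicle_terms = {"Car", "Truck", "Owner"}
transport_terms = {"Cars", "Trucks", "Trunk", "Driver"}
verdicts = {("Car", "Cars"): "okay", ("Truck", "Trucks"): "okay", ("Truck", "Trunk"): "false"}


def scripted_expert(a, b):
    return verdicts.get((a, b), "irrelevant")


log, rules = review(suggest(vehicle_terms, transport_terms), scripted_expert)
print(rules)
print(log)
```

Spelling similarity proposes "Truck" and "Trunk" as a match, and the expert's "false" verdict is what keeps that pair out of the rules. Keeping the rejected pairs in the log means the same suggestion need not be shown to the expert again.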

August 2000 Gio vdR 30

Candidate Match Repository

Term linkages automatically extracted from 1912 Webster’s dictionary *

* free; other sources have been processed.

Based on processing headwords definitions using algebra primitives

Notice presence of 2 domains: chemistry, transport

August 2000 Gio vdR 31

Using the match repository

August 2000 Gio vdR 32

Navigating the match repository

August 2000 Gio vdR 33

Primitive Operations

Unary
• Summarize - structure up
• Glossarize - list terms
• Filter - reduce instances
• Extract - circumscription

Binary
• Match - data corroboration
• Difference - distance measure
• Intersect - schema discovery
• Blend - schema extension

Constructors
• create object
• create set

Connectors
• match object
• match set

Editors
• insert value
• edit value
• move value
• delete value

Converters
• object - value
• object indirection
• reference indirection

Model and Instance

August 2000 Gio vdR 34

Future: exploiting the result

Processing & query evaluation is best performed within Source Domains & by their engines

Result has links to source

Avoid the n² problem of interpreter mapping, stated by Swartout as an issue in HPKB year 1