1

Context Knowledge Management for Armament Safety

Stuart Madnick, Lynn Wu

MIT Sloan School of Management
{smadnick, linwu}@mit.edu

2

Information Integration & Re-Use Projects
Stuart Madnick (smadnick@mit.edu)

Technologies
Applications
Strategy, Policy & Legal Issues
Security
COntext INterchange (COIN) (1)
Financial Services (account aggregation)
Security Analysis
Military Logistics
System Dynamics Modeling of State Stability (4)
Stakeholder Perceptions of Security (2)
Economic model of alternatives to EU Database Directive (3)
RFID IT Infrastructure
Data Quality
Total Data Quality (TDQM) Program (5)
MIT Information Quality (MIT-IQ) Program
Pros and cons of data standards
Others …

Context Knowledge Management Approach to “Armament Safety Management”

3

COntext INterchange (COIN) Project

Sources: Databases, Web Pages
Receivers: Applications, Browsers

INPUT PROCESSING
* Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution

CONTEXT MEDIATION
* Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution

OUTPUT PROCESSING
- ODBC Driver
- Web-Publishing

TRUSTED AGENTS

APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.

4

Key COIN Technologies

Web Wrapper
• Extract selected information from web (HTML+XML)
• Allows web to be treated as large relational SQL database
• Can handle dynamic web sites, cookies, “login”, etc.
• Performs SQL Joins & Unions involving DB’s + Web sources

Context Mediator
• Resolve semantic (meaning) differences
• Enable meaningful aggregation & comparison

5

Context: Multiple Perspectives . . . old lady or young lady ?

6

Role of Context

Data: Databases, Web data, E-mail

CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)

Currency: ? $ £ ¥

Dates: 05-06-07, 07-06-05, 06-05-07
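The date strings on this slide are ambiguous until a context is attached. The following is a minimal sketch, not part of the original deck, that reads one of them under three date conventions. The context labels and the choice of conventions are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical context labels, each mapped to a strptime pattern.
DATE_CONTEXTS = {
    "US (mm-dd-yy)": "%m-%d-%y",
    "UK (dd-mm-yy)": "%d-%m-%y",
    "Year first (yy-mm-dd)": "%y-%m-%d",
}

def interpret(raw, context):
    """Parse a raw date string under the convention of the given context."""
    return datetime.strptime(raw, DATE_CONTEXTS[context]).date()

# The same characters denote three different calendar days.
readings = {ctx: interpret("05-06-07", ctx) for ctx in DATE_CONTEXTS}
for ctx, day in readings.items():
    print(ctx, "->", day.isoformat())
# US (mm-dd-yy) -> 2007-05-06
# UK (dd-mm-yy) -> 2007-06-05
# Year first (yy-mm-dd) -> 2005-06-07
```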

7

Types of Context

                   Example                                      Temporal
Representational   Currency: $ vs €; Scale factor: 1 vs 1000    Francs before 2000, € thereafter
Ontological        Revenue: Includes vs excludes interest       Revenue: Excludes interest before 1994 but incl. thereafter
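A temporal context can be thought of as a modifier whose value depends on the date of the data. The sketch below is an illustration under assumed names (`TEMPORAL_CONTEXT`, `modifier_value`), not the COIN representation; it encodes the two temporal examples from the table as histories of (first valid year, value) pairs.

```python
import bisect

# Each modifier has a history of (first_valid_year, value) pairs, sorted by year.
TEMPORAL_CONTEXT = {
    "currency": [(float("-inf"), "francs"), (2000, "euros")],
    "revenue_includes_interest": [(float("-inf"), False), (1994, True)],
}

def modifier_value(modifier, year):
    """Return the value of a modifier that is in force for the given year."""
    history = TEMPORAL_CONTEXT[modifier]
    starts = [start for start, _ in history]
    index = bisect.bisect_right(starts, year) - 1
    return history[index][1]

print(modifier_value("currency", 1999))                   # francs
print(modifier_value("currency", 2000))                   # euros
print(modifier_value("revenue_includes_interest", 1993))  # False
```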

8

The 1999 Overture

Unit-of-measure mixup tied to loss of $125 million Mars Orbiter

“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .

. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”

Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.

9

Context Knowledge Management for Armament Safety: Motivation

• Context Knowledge Management is an important challenge
• Semantic inconsistency is present in databases, even in the military.
  – For example, what does accident rate really mean?
    • Army Ground Accident Rate: # accidents/period-of-time
      1. Per year
      2. Per month
      3. Per total actual personnel strength
      4. Per operational personnel strength
• How do we address such semantic inconsistencies?
  – How do we interpret different accident rates?
  – Need context knowledge management

10

Motivating Example

Disclaimer: The data shown are artificial and are used for demonstration only.

In the military, there are many ways to measure safety.

1. Accident and injury rates can be measured on a per-week, per-month, or per-year basis.

2. Nuclear testing data generally use the U.S. Customary measurement system, since most nuclear testing has been done in the US. To conform with international standards, the US government has slowly been trying to convert the units to the metric system. However, even within the metric system, there is confusion between SI units and non-SI units.

         Weapon   Accident Rate   Injury Rate   Nuclear Test Safety Exclusion Zone   Radioactivity
Unit A   A123     0.01            77            2500                                 1
Unit B   A123     0.52            170           762                                  3.7 x 10^10

Contexts (semantic heterogeneity):

         Accident Rate   Injury Rate                         Exclusion Zone   Radioactivity
Unit A   Per week        Per month per pro-rated strength    Feet             Curie
Unit B   Per year        Per month per personnel strength    Meters           Bq

Equivalences:
0.01/week ↔ 0.52/year
77/week/prs ↔ 170/ps
2500 feet ↔ 762 meters
1 curie ↔ 3.7 x 10^10 Bq
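The unit equivalences on this slide can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function names are made up here, and the 52 weeks per year is the approximation implied by the slide's figures (0.01 per week against 0.52 per year). The foot and curie factors are the standard definitions.

```python
import math

FEET_TO_METERS = 0.3048      # international foot, exact by definition
CURIE_TO_BECQUEREL = 3.7e10  # curie, exact by definition
WEEKS_PER_YEAR = 52          # approximation implied by the slide's numbers

def feet_to_meters(feet):
    return feet * FEET_TO_METERS

def curie_to_becquerel(curies):
    return curies * CURIE_TO_BECQUEREL

def per_week_to_per_year(rate_per_week):
    return rate_per_week * WEEKS_PER_YEAR

# Unit A's values restated in Unit B's context.
print(round(feet_to_meters(2500), 3))          # 762.0
print(curie_to_becquerel(1))                   # 37000000000.0
print(round(per_week_to_per_year(0.01), 2))    # 0.52
```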

11

Source Context Differences

         Accident Rate                             Injury Rate                                      Nuclear Test Safety Exclusion Zone (radius)   Radioactivity
Unit A   Army Ground Accident Rate (per week)      Active Army Military Injury Rate (per month)     Meter                                         Curie
Unit B   Army Ground Accident Rate (per year)      USAR & ARNG Military Injury Rate (per month)     Meter                                         Becquerel
Unit C   Army Ground Accident Rate (per week)      Active Army Military Injury Rate (per week)      Kilometer                                     TBq
Unit D   Army Ground Accident Rate (per month)     Army Civilian Employee Injury Rate (per month)   Feet                                          MBq

12

Scenario

• A general wants to see a composite report on all four units.
  – Direct queries on all four units would return incomparable data.
  – Without mediation, Unit B seems to be doing poorly.

         Accident Rate   Injury Rate   Exposure   Radioactivity
Unit A   0.01            0.037         762        1
Unit B   2               0.08          762        37 x 10^10
Unit C   0.05            0.08          0.762      0.037
Unit D   0.028           0.01          2500       37.04

13

Standardization: often not a solution

• Works in small systems.
• Legitimate reasons for diversity (e.g., different needs) lead to multiple standards
  – Unit 1 uses accident rate per year
  – Unit 2 uses accident rate per month
• Standards are costly to develop
  – DoD started data standardization in 1991; by 2000, they had standardized only ~1.2% of 1 million data elements*
• Standards do evolve over time
  – Nuclear tests used the US Customary measurement standard. Now the data are moving toward the SI standard.

* Rosenthal, A., Seligman, L. and Renner, S. (2004) "From Semantic Integration to Semantics Management: Case Studies and a Way Forward", ACM SIGMOD Record, 33(4), 44-50.

14

The Context Interchange Approach

Components: Context Mediator, Source Context, Receiver Context, Shared Ontologies, Conversion Libraries, Context Management Administrator

Concept: Accident Rate
Source context: Per Week
Receiver context: Per Year
Conversion f(): Per Week → Per Year

Receiver query: Select accidentRate From unitA
Context transformation (mediated query): Select accidentRate x 52 From unitA
accidentRate: 0.01 (source) → 0.52 (receiver)
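The rewrite from the receiver's query to the mediated query can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the COIN mediator: the context table, the `mediate` function, and the use of 52 weeks and 12 months per year are all hypothetical choices made here.

```python
# Declared contexts: the period in which each party states accident rates.
CONTEXTS = {
    "unitA": {"period": "week"},
    "receiver": {"period": "year"},
}

# How many of each period fit in a year (approximations).
PERIODS_PER_YEAR = {"week": 52, "month": 12, "year": 1}

def rate_factor(source_period, receiver_period):
    """Factor that restates a per-source-period rate per receiver period."""
    return PERIODS_PER_YEAR[source_period] / PERIODS_PER_YEAR[receiver_period]

def mediate(column, source, receiver="receiver"):
    """Rewrite a single-column query so the result is in the receiver context."""
    factor = rate_factor(CONTEXTS[source]["period"], CONTEXTS[receiver]["period"])
    # No conversion is inserted when the two contexts already agree.
    expression = column if factor == 1 else f"{column} * {factor:g}"
    return f"SELECT {expression} FROM {source}"

print(mediate("accidentRate", "unitA"))
# SELECT accidentRate * 52 FROM unitA
```

The receiver never writes the factor 52; it follows from the two context declarations, which is the point of keeping the conversion knowledge outside the query.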

15

Aggregated results in receiver context of Unit C

         Accident Rate (per week)    Injury Rate (per week)      Nuclear Test Safety Exclusion Zone (Kilometer)   Radioactivity (TBq)
         Mediation   No mediation    Mediation   No mediation    Mediation   No mediation                         Mediation   No mediation
Unit A   0.1232      0.01            0.009       0.037           0.762       762                                  0.037       1
Unit B   0.038       2               0.02        0.08            0.762       762                                  0.037       37 x 10^10
Unit C   0.05        0.05            0.08        0.08            0.762       0.762                                0.037       0.037
Unit D   0.07        0.028           0.1234      0.01            0.762       2500                                 0.037       37.04

16

Conclusion

Many different contexts are used to measure safety within the military.

An aggregator is needed to gather and integrate the various data.

Automatic context mediation plays a critical role.

Context Interchange enables meaningful aggregation.

For more information: http://context2.mit.edu/coin

17

Another Example: Regional Comparison Shoppers

US Sweden France UK

18

COIN Conceptual Model

(Ontology)

19

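Ontology and Conversion Function

context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy.mm.dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
context_d is_a context_b; scaleFactor: 1e3
context_e is_a context_d; format: yyyy-mm-dd
context_f is_a context_c; kind: base+tax

Ontology (figure): basic, monetaryValue, price, temporalEntity, organization, kind, currency, format, scaleFactor, taxRate
Legend: is_a relationship, attribute, modifier

Example source: src_turkey(Product, Vendor, QuoteDate, Price)

Conversion function for the currency modifier:

$$
\begin{aligned}
&\forall x : \mathit{monetaryValue} \vdash \\
&x[\mathit{cvt}(\mathit{currency}, C_2)@C_1, u \to v] \leftarrow \\
&\quad x[\mathit{currency}(C_1) \to C_f] \wedge x[\mathit{currency}(C_2) \to C_t] \wedge x[\mathit{tempAttr} \to T] \wedge \\
&\quad \mathit{olsen\_}(A, B, R, D) \wedge C_f \overset{C_2}{=} A \wedge C_t \overset{C_2}{=} B \wedge T \overset{C_2}{=} D \wedge R[\mathit{value}(C_2) \to r] \wedge v = u * r.
\end{aligned}
$$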
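The is_a declarations on this slide mean that a context such as context_e states only what differs from its parent. The following is a minimal sketch of how such inheritance could be resolved; the dictionary layout and the `resolve` function are assumptions for illustration, not the COIN knowledge representation. The modifier values are copied from the slide.

```python
CONTEXTS = {
    "context_a": {"currency": "KRW", "scaleFactor": 1000, "kind": "base", "format": "yyyy.mm.dd"},
    "context_b": {"currency": "TRL", "scaleFactor": 1e6, "kind": "base+tax", "format": "dd-mm-yyyy"},
    "context_c": {"currency": "USD", "scaleFactor": 1, "kind": "base+tax+SH", "format": "mm/dd/yyyy"},
    "context_d": {"is_a": "context_b", "scaleFactor": 1e3},
    "context_e": {"is_a": "context_d", "format": "yyyy-mm-dd"},
    "context_f": {"is_a": "context_c", "kind": "base+tax"},
}

def resolve(name):
    """Return the full modifier assignment of a context, following is_a links."""
    declared = CONTEXTS[name]
    parent = declared.get("is_a")
    # Start from the parent's resolved modifiers, then apply local overrides.
    resolved = resolve(parent) if parent else {}
    resolved.update({k: v for k, v in declared.items() if k != "is_a"})
    return resolved

print(resolve("context_e"))
# {'currency': 'TRL', 'scaleFactor': 1000.0, 'kind': 'base+tax', 'format': 'yyyy-mm-dd'}
```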

20

Demo – Same Context

No semantic differences

Meaningful data returned

21

(a) Select Vendor, Price From src_turkey Where Product="Samsung SyncMaster 173P";

Conversion for scale factor

(b) Select Vendor, QuoteDate, Price From src_turkey Where Product="Samsung SyncMaster 173P";

Conversion for date format
Conversion for scale factor

Compose only relevant conversions (b → e)
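Which conversions are relevant depends on the columns a query touches and on the modifiers in which the two contexts differ. The sketch below illustrates that selection for the source in context_b and the receiver in context_e. It is not the COIN mediator: the flattened context values follow the earlier ontology slide, and the mapping from columns to modifiers (Price carries currency, scaleFactor and kind; QuoteDate carries format) is an assumption made here.

```python
# Fully resolved modifier values for the two contexts involved.
CONTEXT_B = {"currency": "TRL", "scaleFactor": 1e6, "kind": "base+tax", "format": "dd-mm-yyyy"}
CONTEXT_E = {"currency": "TRL", "scaleFactor": 1e3, "kind": "base+tax", "format": "yyyy-mm-dd"}

# Assumed mapping from columns of src_turkey to the modifiers that affect them.
COLUMN_MODIFIERS = {
    "Product": (),
    "Vendor": (),
    "QuoteDate": ("format",),
    "Price": ("currency", "scaleFactor", "kind"),
}

def needed_conversions(columns, source_ctx, receiver_ctx):
    """List (column, modifier, from_value, to_value) for every real difference."""
    needed = []
    for column in columns:
        for modifier in COLUMN_MODIFIERS[column]:
            if source_ctx[modifier] != receiver_ctx[modifier]:
                needed.append((column, modifier, source_ctx[modifier], receiver_ctx[modifier]))
    return needed

query_a = needed_conversions(["Vendor", "Price"], CONTEXT_B, CONTEXT_E)
query_b = needed_conversions(["Vendor", "QuoteDate", "Price"], CONTEXT_B, CONTEXT_E)
print([m for _, m, _, _ in query_a])  # ['scaleFactor']
print([m for _, m, _, _ in query_b])  # ['format', 'scaleFactor']
```

Currency and kind agree between the two contexts, so no conversion is composed for them, and query (a) skips the date conversion because it never selects QuoteDate.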

22

Introduced because of context difference in auxiliary source

Auto-reconciliation for auxiliary source (b → f)

23

Detection and Explication (b → a)

24

Date format for receiver

Price definition – remove tax

Scale factor

Date format for auxiliary source olsen

Currency

Mediated Query (b → a)

25

Interoperate: hard-wired approaches

(a) BFS approach: Brute-force between pair-wise sources
(b) BFC approach: Brute-force between contexts
(c) Internal standard approach: Adopting a standard

[Figure: one diagram per approach over six nodes numbered 1 to 6; diagram (c) adds an "Internal standard" node]

context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy-mm-dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy

26

Flexibility and Scalability

Approach   General case                                                          In the example
BFS        N(N-1), N := number of sources and receivers                          159,600
BFC        n(n-1), n := number of unique contexts                                72,630
ETL/GS     2N, N := number of sources and receivers                              800
COIN       1) Worst case: $\sum_{i=1}^{m} n_i(n_i - 1)$,                         1) worst: 108
              n_i := number of unique values of ith modifier,                    2) actual number: 5 (3 general
              m := number of modifiers in ontology                                  conversions plus 2 for price)
           2) $\sum_{i=1}^{m} (n_i - 1)$, when equational relationships exist
           3) m, if all conversions can be parameterized

Flexibility:
• Hard-wired approaches: not flexible. Need to update/add many conversion programs.
• COIN: flexible. Update the declarative knowledge base.

• Why can other approaches not fully benefit from general-purpose conversion?
  – The decision whether to invoke the conversion is in the conversion program.
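The counts in the table follow directly from the formulas. In the sketch below, which is an illustration rather than anything from the original deck, N = 400 and n = 270 are back-solved from the table's own figures (2N = 800 and n(n-1) = 72,630). The modifier cardinalities [10, 3, 3, 3] are an assumption: they are one set of values consistent with the worst case of 108, not numbers stated on the slide.

```python
def bfs_count(num_sources_and_receivers):
    """Brute force between sources: one conversion per ordered pair."""
    n = num_sources_and_receivers
    return n * (n - 1)

def bfc_count(num_contexts):
    """Brute force between contexts: one conversion per ordered pair of contexts."""
    return num_contexts * (num_contexts - 1)

def etl_count(num_sources_and_receivers):
    """Internal standard: one conversion to and one from the standard per party."""
    return 2 * num_sources_and_receivers

def coin_worst_case(modifier_cardinalities):
    """Sum over modifiers of n_i * (n_i - 1)."""
    return sum(n * (n - 1) for n in modifier_cardinalities)

def coin_equational(modifier_cardinalities):
    """Sum over modifiers of (n_i - 1), when equational relationships exist."""
    return sum(n - 1 for n in modifier_cardinalities)

ASSUMED_CARDINALITIES = [10, 3, 3, 3]  # hypothetical values, chosen to match 108

print(bfs_count(400))                          # 159600
print(bfc_count(270))                          # 72630
print(etl_count(400))                          # 800
print(coin_worst_case(ASSUMED_CARDINALITIES))  # 108
```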

27

How COIN Scales

• Semantic differences cannot be standardized away
• Must be flexible and scalable
• Component conversions are defined for each modifier
• Overall conversions are automatically composed by abductive reasoning engine
• Composition via symbolic equation solver and a shortest path algorithm
• Inheritance enabled
• COIN is a good solution
  – Modularization, declarativeness
  – Automatic composition of necessary conversions

28

The 1805 Overture

In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.

The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which by 1805 lagged 12 days behind.

The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.

Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.

29

EXTRA SLIDES

30

Yet Another Context Example (Basis for Demo)

Sources (via Wrapper Services): WorldScope, Disclosure, Datastream, OANDA Web Server
Context Mediation Services
Users & Appl. Systems

Source       Company Name        Net Income     Sales
WorldScope   DAIMLER-BENZ AG     346,577        56,268,168
Disclosure   DAIMLER BENZ CORP   615,000,000    97,737,000,000
Datastream   DAIMLER-BENZ        614,995        97,736,992

O&A DEM-USD Exchange Rate: 1.00 German Mark = 0.58 US Dollar as of 12/31/93

31

Some Context Differences

Context Definitions   Disclosure                       Worldscope                       DataStream
Currency Used         Country of Incorporation         USD                              Country of Incorporation
Currency Conversion   Money Amount As_Of_Date          Money Amount As_Of_Date          Money Amount As_Of_Date
Currency Symbols      3 Letters                        3 Letters                        2 Letters
Scale Factor          1                                1000                             1000
Company Names         Disclosure Names                 Worldscope Names                 DataStream Names
Date Style            American with ‘/’ as separator   American with ‘/’ as separator   European with ‘-’ as separator

Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator

32

Domain Model

[Figure: domain model. Types: number, string, exchangeRate, currencyType, companyFinancials, date, countryName, companyName. Edge labels: fromCur, toCur, scaleFactor, curTypeSym, currency, fyEnding, company, countryIncorp, format, dateFmt, txnDate, officialCurrency. Legend: Inheritance, Attribute, Modifier]

Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country

33

COIN System Architecture

[Figure: system architecture in three groups: SERVER PROCESSES, MEDIATOR PROCESSES, CLIENT PROCESSES.
Components: HTTPD-Daemon (several instances), Web-site, Wrapper, WWW Gateway, COIN Repository, SQL Compiler, Context Mediator, Optimizer, Executioner, Data Store for Intermediate Results, ODBC-compliant Apps (e.g. Microsoft Excel), ODBC-Driver, Web Client (cgi-scripts).
Flow labels: SQL Query, Datalog Query, Mediated Query, Optimized Query Plan, Results]

34

System Demonstration

Q6. Scenario: Using Context Interchange, you can look at the Disclosure data using Datastream Context.

Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.

Capabilities Demonstrated:

Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.

Single Source Queries with Mediation

35

Demonstration @ context2.mit.edu

Context

Source

36

Context Metadata (Partial)

37

Conflict Detection and Mediation

Date convert
Scale factor convert
Name convert

Mediated Query in Datalog

38

Mediated SQL Query & Result

Adjust scale factor

Date format conversion

Name conversion

Final results – from Disclosure but in Datastream context

Mediated SQL Query

39

More Complex Example (4 sources: DB + Web)

select WorldcAF.TOTAL_ASSETS, DiscAF.NET_SALES, DiscAF.NET_INCOME, DStreamAF.TOTAL_EXTRAORD_ITEMS_PRE_TAX, quotes.Last
from WorldcAF, DiscAF, DStreamAF, quotes
where WorldcAF.COMPANY_NAME = "DAIMLER-BENZ AG"
and DStreamAF.AS_OF_DATE = "01/05/94"
and WorldcAF.COMPANY_NAME = DStreamAF.NAME
and WorldcAF.COMPANY_NAME = DiscAF.COMPANY_NAME
and WorldcAF.COMPANY_NAME = quotes.Cname;

Databases Web source

40

Conflict Table (1st part)

41

Conflict Table (2nd part)

42

Generated SQL (1st Part)

select worldcaf.total_assets, discaf.net_sales, ((discaf.net_income*0.001)*olsen.rate), (dstreamaf2.total_extraord_items_pre_tax*olsen2.rate), quotes.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform,
  (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws,
  (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws,
  (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup2,
  (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf,
  (select country, currency from currencytypes where currency <> 'USD') currencytypes,
  (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen,
  (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf,

43

Generated SQL (Continued - Partial)

  (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf,
  (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf2,
  (select char3_currency, char2_currency from currency_map where char3_currency <> 'USD') currency_map,
  (select country, currency from currencytypes where currency <> 'USD') currencytypes2,
  (select exchanged, 'USD', rate, '01/05/94' from olsen where expressed='USD' and date='01/05/94') olsen2,
  (select Cname, Last from quotes) quotes
where currencytypes.country = discaf.location_of_incorp
and currencytypes.currency = olsen.exchanged
and dstreamaf.currency = dstreamaf2.currency
and dstreamaf2.currency = currency_map.char2_currency
and olsen.date = discaf.latest_annual_data
and currency_map.char3_currency = currencytypes2.currency
and currencytypes2.currency = olsen2.exchanged
and name_map_dt_ws.dt_names = dstreamaf2.name
and name_map_ds_ws.ds_names = discaf.company_name
and ticker_lookup2.ticker = quotes.Cname
and datexform.date1 = dstreamaf2.as_of_date
and currencytypes.currency <> 'USD'
and currency_map.char3_currency <> 'USD'
union
select worldcaf2.total_assets, discaf2.net_sales, ((discaf2.net_income*0.001)*olsen3.rate), dstreamaf4.total_extraord_items_pre_tax, quotes2.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform2,
  (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws2,
  (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws2,
  (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup22,
  (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf2,
  (select country, currency from currencytypes where currency <> 'USD') currencytypes3,
  (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen3,
  (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf2,
  (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf3,
  (select 'USD', char2_currency from currency_map where char3_currency='USD') currency_map2,

etc

44

Final Result

45

Execution Trace (1st Part - Partials)

. . .

Parallel Execution

Retrieving data from Web source

46

Execution Trace (Continued - Partials)

. . .

. . .

Another Web source used (for currency conversion)

Stock price returned from Web source

47

Appendix: Sample Applications

• Airfare, Car Rental and Merged Travel
• Weather
• Global Price Comparison
• Airfare Aggregation
• Disaster Relief
• TASC Financial Example
• Web Services Demo
• Corporate Householding

48

Appendix: COIN Web-Wrapper Technology

User or Program (via SQL Query):

Select Edgar.Net_income
From Edgar
Where Edgar.Ticker=intc
and Edgar.Form=10-Q

Web Wrapper Generator, driven by a Web page spec file*, sits between the SQL side and the HTML side.

Data record returned:

Ticker   Net Income
INTC     1,983

* Spec file contains: Schema, Navigation rules, and Extraction rules.
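The idea of a spec file with a schema and extraction rules can be sketched as follows. This is a toy illustration, not the COIN wrapper or its spec format: the `SPEC` layout, the regular expressions, and the sample page are all hypothetical, and navigation rules are left out entirely.

```python
import re

# Hypothetical spec: a schema plus one extraction rule (regex) per column.
SPEC = {
    "schema": ["Ticker", "Net_income"],
    "extraction": {
        "Ticker": r"Ticker:\s*<b>(\w+)</b>",
        "Net_income": r"Net Income:\s*<b>([\d,]+)</b>",
    },
}

# Made-up page content standing in for a fetched filing page.
SAMPLE_PAGE = "<html><body>Ticker: <b>INTC</b><br>Net Income: <b>1,983</b></body></html>"

def wrap(page, spec):
    """Apply the extraction rules to a page and return one relational record."""
    record = {}
    for column in spec["schema"]:
        match = re.search(spec["extraction"][column], page)
        record[column] = match.group(1) if match else None
    return record

def select(record, columns, where):
    """Project the requested columns if the record satisfies the equality filter."""
    if all(record.get(key) == value for key, value in where.items()):
        return tuple(record[column] for column in columns)
    return None

record = wrap(SAMPLE_PAGE, SPEC)
print(select(record, ["Net_income"], {"Ticker": "INTC"}))  # ('1,983',)
```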