1 SIMTech – Invited Research Lecture Integrating Information from Global Systems: Knowledge...

1 Integrating Information from Global Systems: Knowledge Representation and Reasoning in the Context Interchange System January 24, 2007 Stuart Madnick ([email protected]) MASSACHUSETTS INSTITUTE OF TECHNOLOGY SLOAN SCHOOL OF MANAGEMENT & SCHOOL OF ENGINEERING INFORMATION TECHNOLOGIES GROUP © MIT, 2007

1

SIMTech – Invited Research Lecture

Integrating Information from Global Systems:

Knowledge Representation and Reasoning in the Context Interchange System

January 24, 2007

Stuart Madnick ([email protected])

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

SLOAN SCHOOL OF MANAGEMENT & SCHOOL OF ENGINEERING

INFORMATION TECHNOLOGIES GROUP

© MIT, 2007

2

Characteristics of Global Systems

• Large number of sources
– Manufacturers, suppliers, logistics, customers, etc.
– Online comparison shopping services

• Diverse user needs
– Different organizations have different needs

• Cannot establish a single data standard
– Sometimes works, but not always

• Must get semantics right
– Adaptability, extensibility, scalability

3

Example from RFID: Types of Information in EPCGlobal Registry

Five types of Information: 162 Attributes

4

A Case Study on Context Issues

• The company is a top China-based international trading firm supplying more than 50,000 types of goods to 350 buyers located in 40 countries.

• Its product lines are very wide, but mostly in the consumer packaged goods (CPG), apparel, and hard-line categories.

• Its buyers include many major US retailers such as Wal-mart, Home Depot, Staples, and Target; it once received the best supplier award from Wal-mart.

• It is a member of the local EAN, meaning that it publishes its offered product items in a product database, LocalRegistry.

5

Some Global Supply-Chain Problem areas caused by:

1. Measurement systems

2. Regulations: Safety & Substitutability

3. Cultural systems

4. Logistical systems

5. Trading terms

6

Scenario 1: Core Attribute Context Discrepancy: Measurement

• Context discrepancy arising from the different measurement systems used in China and the US.

– GlobalRegistry’s attributes “height”, “width” and “length” are assumed to be in inches, while LocalRegistry’s counterpart attributes are assumed to be in cm.

– GlobalRegistry’s attribute “FlashPointTemp”, indicating the flashpoint temperature for hazardous material, is assumed to be in degrees Fahrenheit, while the local convention is degrees Celsius.
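The two measurement discrepancies in this scenario can be sketched as a pair of conversions. This is a minimal illustration only: the record layout and the function names are assumptions made for the example, not part of the EPCGlobal registry or of COIN; only the attribute names and units come from the slide.

```python
CM_PER_INCH = 2.54

def cm_to_inch(cm):
    """Length in centimetres -> inches."""
    return cm / CM_PER_INCH

def celsius_to_fahrenheit(celsius):
    """Temperature in degrees Celsius -> degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def local_to_global(item):
    """Convert a hypothetical LocalRegistry record (cm, Celsius)
    to the GlobalRegistry conventions (inch, Fahrenheit)."""
    converted = dict(item)  # leave the input record untouched
    for attr in ("height", "width", "length"):
        converted[attr] = cm_to_inch(item[attr])
    converted["FlashPointTemp"] = celsius_to_fahrenheit(item["FlashPointTemp"])
    return converted

# Hypothetical sample record, values chosen for the illustration.
local_item = {"height": 25.4, "width": 50.8, "length": 127.0, "FlashPointTemp": 100}
global_item = local_to_global(local_item)
print(global_item)
```

Without such a step, a height of 25.4 filled in locally would be read as 25.4 inches rather than roughly 10 inches.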

7

Scenario 3: Manufacturing-specific Attribute Context Discrepancy: Cultural systems

• Context discrepancy arising from the different cultural systems used in China and the US.

– GlobalRegistry’s attribute “PackageType” is used in the apparel industry to indicate the size of the item: “S”, “M”, “L”, or “XL”.

– Contract manufacturers in China might interpret “M” as the medium size for Asians and manufacture accordingly, while US buyers mean the medium size for Americans; the two sizes are very different.

8

Scenario 4: Logistics-specific Attribute Context Discrepancy: Logistic systems

• Context discrepancy arising from the different logistic systems used in China and the US.

– GlobalRegistry’s attributes “ti” and “hi” refer respectively to the number of items that can fit on a single layer on a pallet and the number of layers on a pallet.

– The issue arises because the standard pallets used in Asia (mostly 100 * 100) differ from the standard pallets used in the domestic US (100 * 120).

– Consequently, the values for “ti” and “hi” filled in by Asian suppliers based on the former pallet capacity will be misleading and cause trouble for an LSP in the US that adopts the latter pallet standard (e.g., Wal-mart's cross-docking distribution strategy).

9

Example for analysis: Comparison Shopping: www.mysimon.com

10

Regional Comparison Shoppers

US Sweden France UK

11

Motivating Example

Semantic aspect     Number of distinctions
Currency            10 different currencies (e.g., USD, UKP, JPY, TRL, KRW)
Scale factor        3 different scale factors: 1, 1K, 1M
Price definition    3 different definitions: base, base+tax, base+tax+SH
Date format         3 different formats: mm/dd/yyyy, dd-mm-yyyy, yyyy-mm-dd

Global Online Comparison Shopping
– Different semantic assumptions in data
– Compare prices in the context of any source chosen by the user
– Many vendor sources in different countries
– Example: 270 potential different contexts

Need many conversions - 159,600 of them!

12

Desired Properties

• Adaptability
– Capability of accommodating changes in sources

• Extensibility
– Easy to add/remove sources

• Scalability
– Effort of enabling interoperation with respect to the number of sources and the size of the ontology
– Performance with respect to the number of sources and the size of each source (query optimization issue)

• Flexibility = Adaptability + Extensibility

13

Interoperate: hard-wired approaches

(a) BFS approach: Brute-force between pair-wise sources

(b) BFC approach: Brute-force between contexts

(c) Internal standard approach: Adopting a standard

[Figure: one diagram per approach, each showing six sources numbered 1 to 6; diagram (c) adds an internal standard.]

context_a  currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy-mm-dd

context_b  currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy

context_c  currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy

14

COntext INterchange (COIN) Project

[Figure: COIN architecture. Labels: Sources, Receivers, Databases, Web Pages, Applications, Browsers, Trusted Agents.]

INPUT PROCESSING
* Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution

CONTEXT MEDIATION
* Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution

OUTPUT PROCESSING
- ODBC Driver
- Web-Publishing

APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.

15

Key COIN Technologies

Web Wrapper
– Extracts selected information from the web (HTML + XML)
– Allows the web to be treated as a large relational SQL database
– Handles dynamic web sites, cookies, “login”, etc.
– Performs SQL joins and unions involving databases and web sources

Context Mediator
– Resolves semantic (meaning) differences
– Enables meaningful aggregation and comparison

16

Context: Multiple Perspectives . . . old lady or young lady ?

17

Role Of Context

CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)

Data: Databases, Web data, E-mail

[Figure: the same data viewed through three contexts. Labels: currency symbols ?, $, £, ¥; dates 08-07-09, 09-07-08, 07-08-09.]
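The date labels in this figure illustrate how one string can denote different days depending on context. The sketch below parses the string 08-07-09 under three candidate layouts; which layout belongs to which context is not stated on the slide, so the three format labels are assumptions chosen for illustration.

```python
from datetime import datetime

RAW = "08-07-09"

# Hypothetical context conventions; the slide does not say which context uses which.
FORMATS = {
    "mm-dd-yy": "%m-%d-%y",
    "dd-mm-yy": "%d-%m-%y",
    "yy-mm-dd": "%y-%m-%d",
}

# Parse the same raw string once per assumed convention.
interpretations = {
    label: datetime.strptime(RAW, fmt).date() for label, fmt in FORMATS.items()
}

for label, day in interpretations.items():
    print(f"{RAW} read as {label}: {day.isoformat()}")
```

The same string yields three different calendar dates, which is why the format has to be recorded as part of the source's context rather than guessed by the receiver.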

18

Types of Context

Representational, Ontological, Temporal

                   Example                                      Temporal
Representational   Currency: $ vs €; Scale factor: 1 vs 1000    Francs before 2000, € thereafter
Ontological        Revenue: Includes vs excludes interest       Revenue: Excludes interest before 1994 but includes it thereafter

19

Airbus' A380 double-decker jet is two years behind schedule, sending billions of dollars in potential profits down the drain. But the reason sounds too simple to be true: Airbus factories in Germany and France were using incompatible design software, so the wiring produced in Hamburg didn't fit properly into the plane on the assembly line in Toulouse.

Point: Not just a technology issue, but also involves business strategy and organization/culture.

20

The 1999 Overture

Unit-of-measure mixup tied to loss of $125 Million Mars Orbiter

“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .

. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”

Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.

21

COntext Interchange (COIN) Approach

[Figure: a Context Mediator between Source and Receiver, with Source Context, Receiver Context, Shared Ontologies, Conversion Libraries, and a Context Management Administrator. Concept: Length; source in meters, receiver in feet, conversion f() from meters to feet. Context Transformation values shown: 17 and 55.79 (part length).]

Receiver query:

Select partlength
From catalog
Where partno=“12AY”

Auto-composition of conversions (mediated query):

Select partlength/.3048
From catalog
Where partno=“12AY”
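The query rewriting on this slide can be imitated with a very small stand-in for the mediator. This is a hedged sketch, not the COIN implementation: the `CONVERSIONS` table, the `mediate` and `run_query` helpers, and the stored value 17.0 (assumed to be in meters) are all hypothetical; only the table name, column name, part number, and the division by .3048 come from the slide.

```python
import sqlite3

# Hypothetical conversion library: (source unit, receiver unit) -> SQL expression template.
CONVERSIONS = {
    ("meters", "feet"): "{col}/0.3048",
    ("feet", "meters"): "{col}*0.3048",
}

def mediate(column, source_unit, receiver_unit):
    """Return the SQL expression that yields `column` in the receiver's unit."""
    if source_unit == receiver_unit:
        return column  # same context: no conversion is inserted
    return CONVERSIONS[(source_unit, receiver_unit)].format(col=column)

def run_query(conn, partno, source_unit, receiver_unit):
    """Rewrite the receiver's query for the source context and execute it."""
    expr = mediate("partlength", source_unit, receiver_unit)
    sql = f"SELECT {expr} FROM catalog WHERE partno = ?"
    return sql, conn.execute(sql, (partno,)).fetchone()[0]

# In-memory stand-in for the source catalog (sample row is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (partno TEXT, partlength REAL)")
conn.execute("INSERT INTO catalog VALUES ('12AY', 17.0)")

mediated_sql, length_in_feet = run_query(conn, "12AY", "meters", "feet")
print(mediated_sql)
print(length_in_feet)
```

The receiver keeps writing the plain query; the conversion is chosen from the declared source and receiver contexts rather than coded into the application.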

22

COIN Conceptual Model

(Ontology)

23

Ontology and Conversion Function

context_a  currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy.mm.dd

context_b  currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy

context_c  currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy

context_d is_a context_b; scaleFactor: 1e3

context_e is_a context_d; format: yyyy-mm-dd

context_f is_a context_c; kind: base+tax

[Figure: ontology diagram. Labels: basic, temporalEntity, monetaryValue, price, organization, taxRate, currency, scaleFactor, kind, format. Legend: is_a relationship, attribute, modifier.]

Example source: src_turkey(Product, Vendor, QuoteDate, Price)

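Conversion function for the currency modifier:

$$
\begin{aligned}
\forall x : \mathit{monetaryValue} \vdash{} & x[\mathit{cvt}(\mathit{currency}, C2)@C1, u \rightarrow v] \leftarrow \\
& x[\mathit{currency}(C1) \rightarrow C_f],\ x[\mathit{currency}(C2) \rightarrow C_t],\ x[\mathit{tempAttr} \rightarrow T], \\
& \mathit{olsen\_}(A, B, R, D),\ C_f = A,\ C_t = B,\ T = D,\ R[\mathit{value}(C2) \rightarrow r],\ v = u * r.
\end{aligned}
$$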

24

Demo – Same Context

No semantic differences

Meaningful data returned

25

(a) Select Vendor, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;

– Conversion for scale factor

(b) Select Vendor, QuoteDate, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;

– Conversion for date format
– Conversion for scale factor

Compose only relevant conversions (b → e)

context_b  currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy

context_d is_a context_b; scaleFactor: 1e3

context_e is_a context_d; format: yyyy-mm-dd
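The selection of relevant conversions for queries (a) and (b) can be sketched as follows. The context definitions are copied from the slide; the `COLUMN_MODIFIERS` mapping and the helper names `resolve` and `needed_conversions` are assumptions made for this illustration and do not describe how the COIN abductive engine is actually implemented.

```python
# Context definitions from the slide: a context either lists its modifiers
# or names a parent via is_a and overrides some of them.
CONTEXTS = {
    "context_b": {"currency": "TRL", "scaleFactor": 1e6,
                  "kind": "base+tax", "format": "dd-mm-yyyy"},
    "context_d": {"is_a": "context_b", "scaleFactor": 1e3},
    "context_e": {"is_a": "context_d", "format": "yyyy-mm-dd"},
}

# Assumed mapping from columns of src_turkey to the modifiers that affect them.
COLUMN_MODIFIERS = {
    "Vendor": [],
    "QuoteDate": ["format"],
    "Price": ["currency", "scaleFactor", "kind"],
}

def resolve(name):
    """Return all modifier values of a context, following is_a links."""
    ctx = CONTEXTS[name]
    parent = ctx.get("is_a")
    resolved = resolve(parent) if parent else {}
    resolved.update({k: v for k, v in ctx.items() if k != "is_a"})
    return resolved

def needed_conversions(columns, source, receiver):
    """List (column, modifier) pairs whose values differ between the contexts."""
    src, rcv = resolve(source), resolve(receiver)
    return [(col, mod)
            for col in columns
            for mod in COLUMN_MODIFIERS[col]
            if src[mod] != rcv[mod]]

query_a = needed_conversions(["Vendor", "Price"], "context_b", "context_e")
query_b = needed_conversions(["Vendor", "QuoteDate", "Price"], "context_b", "context_e")
print(query_a)
print(query_b)
```

Query (a) does not select QuoteDate, so the date-format difference between context_b and context_e never enters the composed conversion; only the scale factor does.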

26

Auto-reconciliation for auxiliary source (b → f)

Introduced because of context difference in auxiliary source

context_b  currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy

context_c  currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy

context_f is_a context_c; kind: base+tax

27

Detection and Explication (b → a)

context_a  currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy.mm.dd

context_b  currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy

28

Mediated Query (b → a)

[Figure: mediated query with its parts annotated.]
– Date format for receiver
– Price definition: remove tax
– Scale factor
– Date format for auxiliary source olsen
– Currency
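The price-related annotations (remove tax, scale factor, currency) can be read as three component conversions applied in sequence. The sketch below composes them for a price moving from context_b (TRL, scale 1e6, base+tax) to context_a (KRW, scale 1000, base). The function names, the tax rate, and the exchange rate are hypothetical illustration values; the date-format conversions shown in the figure are omitted.

```python
def remove_tax(value, tax_rate):
    """kind: base+tax -> base."""
    return value / (1 + tax_rate)

def rescale(value, from_factor, to_factor):
    """scaleFactor: express the same amount in units of to_factor."""
    return value * from_factor / to_factor

def change_currency(value, rate):
    """currency: multiply by an exchange rate (source -> receiver)."""
    return value * rate

def price_b_to_a(value_b, tax_rate, trl_to_krw):
    """Compose the three component conversions for context_b -> context_a."""
    base = remove_tax(value_b, tax_rate)
    rescaled = rescale(base, from_factor=1e6, to_factor=1000)
    return change_currency(rescaled, trl_to_krw)

# Hypothetical inputs: 125 (millions of TRL, tax included), 25% tax, 0.5 KRW per TRL.
price_a = price_b_to_a(125, tax_rate=0.25, trl_to_krw=0.5)
print(price_a)
```

In COIN the tax rate and exchange rate would come from the sources (for example the auxiliary source olsen for currency rates), not from constants as in this sketch.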

29

Flexibility and Scalability

Approach   General case                                                In the example
BFS        N(N-1), N := number of sources and receivers                159,600
BFC        n(n-1), n := number of unique contexts                      72,630
ETL/GS     2N, N := number of sources and receivers                    800
COIN       1) Worst case: $\sum_{i=1}^{m} n_i(n_i-1)$                   1) worst: 108
           2) $\sum_{i=1}^{m} (n_i-1)$, when equational                 2) actual number: 5 (3 general
              relationships exist                                         conversions plus 2 for price)
           3) m, if all conversions can be parameterized

where n_i := number of unique values of the ith modifier, and m := number of modifiers in the ontology.

Not flexible: need to update/add many conversion programs.

Flexible: update the declarative knowledge base.

• Why can other approaches not fully benefit from general-purpose conversions?
– The decision whether to invoke the conversion is in the conversion program
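The figures in this comparison can be reproduced from the modifier counts given on the Motivating Example slide. The sketch below is only a check of the arithmetic: the value N = 400 is not stated anywhere in the deck and is inferred here from 2N = 800, and the dictionary keys are labels chosen for the example.

```python
from math import prod

# Distinct values per modifier, from the Motivating Example slide.
MODIFIER_VALUES = {"currency": 10, "scaleFactor": 3, "kind": 3, "format": 3}

# Assumption: number of sources and receivers, inferred from 2N = 800.
N = 400

# Every combination of modifier values is a potential context.
contexts = prod(MODIFIER_VALUES.values())

counts = {
    "BFS": N * (N - 1),
    "BFC": contexts * (contexts - 1),
    "ETL/GS": 2 * N,
    "COIN worst case": sum(n * (n - 1) for n in MODIFIER_VALUES.values()),
    "COIN with equational relationships": sum(n - 1 for n in MODIFIER_VALUES.values()),
    "COIN fully parameterized": len(MODIFIER_VALUES),
}

print("contexts:", contexts)
for approach, count in counts.items():
    print(f"{approach}: {count}")
```

The last two entries are just the general formulas evaluated on these counts; the "actual number: 5" in the table is the figure reported for the example and is not derived by this code.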

30

How COIN Scales

• Semantic differences cannot be standardized away

• Must be flexible and scalable

• COIN approach
– Component conversions are defined for each modifier
– Overall conversions are automatically composed by an abductive reasoning engine
– Composition via a symbolic equation solver and a shortest-path algorithm
– Inheritance enabled

• COIN is a good solution, offering:
– Modularization, declarativeness
– Automatic composition of necessary conversions

31

The 1805 Overture

In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.

The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind.

The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.

Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.

32

Summary

• Tremendous opportunity to gather and integrate information from many diverse sources

• But … need to overcome many context challenges

• Context-type “metadata” plays a critical role

• COIN technology can be an important aid for semantically meaningful information integration:
- Scalable
- Extensible
- Application Domain Merging
- Reuse and extension of ontologies and contexts

References: http://web.mit.edu/smadnick/www/wp/CISL-Sloan%20WP%20spreadsheet.htm

33

Appendix: Sample Applications

• Airfare, Car Rental and Merged Travel
• Weather
• Global Price Comparison
• Airfare Aggregation
• Disaster Relief
• TASC Financial Example
• Web Services Demo
• Corporate Householding
• Aggregation/Integration of Intelligence Data
• Infrastructure for Inter-organizational RFID Systems

34

Appendix: COIN Web-Wrapper Technology

[Figure: a user or program sends an SQL query to the Web Wrapper Generator, which uses a web page spec file* and returns a data record. Labels: SQL Side, HTML Side.]

User or Program (via SQL Query):

Select Edgar.Net_income
From Edgar
Where Edgar.Ticker=intc
and Edgar.Form=10-Q

Data record returned:

Ticker   Net Income
INTC     1,983

* Spec file contains: schema, navigation rules, and extraction rules.