Integrating Heterogeneous in situ Information using SPARCE

Integrating Heterogeneous in situ Information using SPARCE Sudarshan Murthy CSE 606 INI: Fall 2003 This work is supported by US NSF Grant IIS 0086002.


Transcript of Integrating Heterogeneous in situ Information using SPARCE

Page 1: Integrating Heterogeneous  in situ  Information using SPARCE

Integrating Heterogeneous in situ Information using SPARCE

Sudarshan Murthy
CSE 606 INI: Fall 2003

This work is supported by US NSF Grant IIS 0086002.

Page 2: Integrating Heterogeneous  in situ  Information using SPARCE

Apr 22, 2023 Integrating Heterogeneous in situ Information using SPARCE


Page 3: Integrating Heterogeneous  in situ  Information using SPARCE


Page 4: Integrating Heterogeneous  in situ  Information using SPARCE


Page 5: Integrating Heterogeneous  in situ  Information using SPARCE


Observations

• People often superimpose new interpretations onto existing information (from heterogeneous sources)
• They excerpt information and create annotations
• They integrate existing information and new interpretations
  – Prepare many arrangements of the same information
  – Organize using appropriate models and schemas (possibly different from any of the sources)

Page 6: Integrating Heterogeneous  in situ  Information using SPARCE


Page 7: Integrating Heterogeneous  in situ  Information using SPARCE


Goal

Facilitate integration of heterogeneous in situ information, of varying granularity, with minimal mediation, using superimposed information to enhance base information, given one superimposed information model and schema (possibly different from any base information model and schema).

Page 8: Integrating Heterogeneous  in situ  Information using SPARCE


Benefits

• Likely discover information not completely contained in base sources
  – Information cannot always be obtained by a query distributed over base sources
• Exploit human expertise
  – Annotations and relationships created by humans can be valuable
• Minimize volume of base data mediated
  – We only retrieve selected information

Page 9: Integrating Heterogeneous  in situ  Information using SPARCE


Outline

• Goal
• Background
  – Superimposed information management, SPARCE
• Information integration example
• Proposal
• Future work
• Conclusion

Page 10: Integrating Heterogeneous  in situ  Information using SPARCE


What is Superimposed Information?

Data placed over existing information sources to help organize, access, connect, and reuse information elements in those sources. [Maier 1999, Delcambre 2001]

[Figure: a Superimposed Layer drawn over a Base Layer containing Information Source 1, Information Source 2, …, Information Source n; the two layers are connected by marks.]

Page 11: Integrating Heterogeneous  in situ  Information using SPARCE


Marks

• A Mark is a reference to a base-layer element [Delcambre 2001]
  – Several mark implementations exist
  – Addressing scheme usually depends on the base type
  – PDF mark uses page no., and starting and ending word indexes; MS Word mark uses starting and ending character indexes
• Marks provide uniform interface across base-layer types and access protocols
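
As a rough illustration of the two addressing schemes on this slide, the sketch below puts a PDF mark and an MS Word mark behind one interface. All class, field, and file names here (`Mark`, `PdfMark`, `WordMark`, `describe_address`, `paper.pdf`, `notes.doc`) are hypothetical and chosen for illustration; they are not SPARCE's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark(ABC):
    """A reference to a base-layer element (hypothetical shape)."""
    mark_id: str
    document: str  # path or URL of the base document

    @abstractmethod
    def describe_address(self) -> str:
        """Return the base-type-specific address in readable form."""


@dataclass(frozen=True)
class PdfMark(Mark):
    # PDF addressing: page number plus starting and ending word indexes.
    page: int
    start_word: int
    end_word: int

    def describe_address(self) -> str:
        return f"page {self.page}, words {self.start_word}-{self.end_word}"


@dataclass(frozen=True)
class WordMark(Mark):
    # MS Word addressing: starting and ending character indexes.
    start_char: int
    end_char: int

    def describe_address(self) -> str:
        return f"characters {self.start_char}-{self.end_char}"


# Callers handle every mark the same way, whatever the base type.
marks = [
    PdfMark("m1", "paper.pdf", page=3, start_word=10, end_word=42),
    WordMark("m2", "notes.doc", start_char=120, end_char=310),
]
for mark in marks:
    print(mark.mark_id, mark.document, mark.describe_address())
```

The point of the abstract base class is only that code in the superimposed layer never needs to know which addressing scheme a given mark uses.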

Page 12: Integrating Heterogeneous  in situ  Information using SPARCE


Excerpts and Contexts

• Excerpt is the content of a marked region
  – Type of an excerpt varies: text, graphics, …
• Context is information related to a mark
• Context element is one piece of context
  – Section heading, containing paragraph text, and font name are examples
• Many kinds of context elements exist
  – Content, Presentation, Location, Topology, …
• Context definition varies across and within base types

Page 13: Integrating Heterogeneous  in situ  Information using SPARCE


Example Context

Name             Value
Excerpt          Garlic permits traditional and multi-media data to be stored in a variety of existing data repositories, including databases, files, text managers, …
Font name        Times New Roman
Italics          True
Section Heading  Abstract:
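
One way to hold the example context above in code is as a set of named context elements attached to a mark. This is a minimal sketch; the `ContextElement` and `Context` classes are assumptions made for illustration, and the values are copied from the slide (including the elided tail of the excerpt).

```python
from dataclasses import dataclass
from typing import Union

Value = Union[str, bool]


@dataclass(frozen=True)
class ContextElement:
    """One piece of context: a name and its value."""
    name: str
    value: Value


class Context:
    """All context elements retrieved for one mark (illustrative only)."""

    def __init__(self, elements):
        self._by_name = {e.name: e for e in elements}

    def get(self, name, default=None):
        element = self._by_name.get(name)
        return default if element is None else element.value

    def names(self):
        return list(self._by_name)


context = Context([
    ContextElement(
        "Excerpt",
        "Garlic permits traditional and multi-media data to be stored in a "
        "variety of existing data repositories, including databases, files, "
        "text managers, …",
    ),
    ContextElement("Font name", "Times New Roman"),
    ContextElement("Italics", True),
    ContextElement("Section Heading", "Abstract:"),
])

print(context.names())
print(context.get("Font name"))
```

Because the context definition varies across and within base types, a lookup for an element that a base type does not supply simply returns the default here.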

Page 14: Integrating Heterogeneous  in situ  Information using SPARCE


Superimposed Applications

• These are applications that manipulate superimposed information

• They associate marks and context elements with superimposed information elements

• They are free to choose display and data models based on their needs

• A user can activate a mark to navigate to base layer or examine context without expressly navigating to base layer

Page 15: Integrating Heterogeneous  in situ  Information using SPARCE


SPARCE

• SPARCE: Superimposed Pluggable Architecture for Contexts and Excerpts
  – Middleware for superimposed information management
• Address base information regardless of its type, location, and access protocol
• Retrieve excerpts and contexts
  – Use the same programmatic interface to work with any base type
• View excerpts and contexts side by side with superimposed information
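
The "same programmatic interface for any base type" idea can be sketched as a registry of pluggable agents keyed by base type. This is not SPARCE's real interface: `register_agent`, `get_context`, and the two fake agents are hypothetical names, and the agents return canned data instead of talking to Acrobat or Word.

```python
from typing import Callable, Dict

# A context agent maps a base-type-specific address to context elements.
ContextAgent = Callable[[str], Dict[str, str]]

_agents: Dict[str, ContextAgent] = {}


def register_agent(base_type: str, agent: ContextAgent) -> None:
    """Plug in the agent that knows how to read one base type."""
    _agents[base_type] = agent


def get_context(base_type: str, address: str) -> Dict[str, str]:
    """Single entry point used by callers, whatever the base type."""
    try:
        agent = _agents[base_type]
    except KeyError:
        raise LookupError(f"no context agent registered for {base_type!r}") from None
    return agent(address)


def fake_pdf_agent(address: str) -> Dict[str, str]:
    # Stand-in for an agent that would query a PDF viewer.
    return {"Excerpt": f"<text at {address}>", "Base type": "PDF"}


def fake_word_agent(address: str) -> Dict[str, str]:
    # Stand-in for an agent that would query a word processor.
    return {"Excerpt": f"<text at {address}>", "Base type": "MS Word"}


register_agent("pdf", fake_pdf_agent)
register_agent("word", fake_word_agent)

print(get_context("pdf", "page 3, words 10-42"))
print(get_context("word", "characters 120-310"))
```

Supporting a new base type then means registering one more agent; callers of `get_context` do not change.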

Page 16: Integrating Heterogeneous  in situ  Information using SPARCE


Overview

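[Figure: architecture overview. Labels in the Superimposed Layer: SA 1, SA 2, XML, Relations. In the middle: SPARCE and Marks. Labels in the Base Layer: Word, Acrobat.]

Mark representation shown in the figure:

<mark ID="…"> <type>…</type> <address>…</address> … </mark>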

Page 17: Integrating Heterogeneous  in situ  Information using SPARCE


Information Integration Example

Page 18: Integrating Heterogeneous  in situ  Information using SPARCE


Setup

• SPARCE extended for information integration
  – XML serialization introduced
  – Pluggable context transformers infrastructure added
  – A query interface developed
• RIDPad extended for information integration
  – Annotations, XML serialization (and DOM) added
• Information models supported
  – Object model (COM)
  – XML (DOM and serialized)
• Example uses XML

Page 19: Integrating Heterogeneous  in situ  Information using SPARCE


Input

• Five items (two groups)
• An item contains a label and a comment
• Five base documents (all PDF; heterogeneous?)
• Granularity of marks varies

Page 20: Integrating Heterogeneous  in situ  Information using SPARCE


Generating XML Data (1)

Page 21: Integrating Heterogeneous  in situ  Information using SPARCE


Generating XML Data (2)

Page 22: Integrating Heterogeneous  in situ  Information using SPARCE


XML Data Generated

[Figure: the generated XML, with callouts labeled "XML for the two groups", "Mark", and "Context".]

Page 23: Integrating Heterogeneous  in situ  Information using SPARCE


Querying*

For each item, get text content from the context (of its mark)

* Currently supports XSLT and XPath; XQuery coming soon
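
The query above ("for each item, get text content from the context of its mark") can be approximated with the XPath subset in Python's standard library. This is only a sketch: the `Group`, `Item`, `Mark`, and `Context` element names follow the slides, but the `Pad` root, the `Element name='Text'` child, the second item, and all sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialized data in the nested (hierarchical) shape.
SAMPLE = """
<Pad>
  <Group name="G1">
    <Item name="CLIO">
      <Mark id="m1">
        <Context>
          <Element name="Text">placeholder text A</Element>
          <Element name="Font name">Times New Roman</Element>
        </Context>
      </Mark>
    </Item>
    <Item name="Garlic">
      <Mark id="m2">
        <Context>
          <Element name="Text">placeholder text B</Element>
        </Context>
      </Mark>
    </Item>
  </Group>
</Pad>
"""

root = ET.fromstring(SAMPLE)

# For each item, follow Item -> Mark -> Context and pick the text element.
text_by_item = {
    item.get("name"): item.findtext("Mark/Context/Element[@name='Text']")
    for item in root.iter("Item")
}

for name, text in text_by_item.items():
    print(f"{name}: {text}")
```

`findtext` returns `None` for an item whose context has no matching element, so items without text content stay visible in the result rather than being dropped.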

Page 24: Integrating Heterogeneous  in situ  Information using SPARCE


This system isn’t very smart.

Page 25: Integrating Heterogeneous  in situ  Information using SPARCE


Preserve the Layers

<Item name='CLIO'>
  <Mark id='…'>
    <Context …>
    </Context>
  </Mark>
</Item>

<Group name='…'>
  <Item name='…'>
  </Item>
</Group>
-------------------------------------
<Mark id='…'>…</Mark>
<Mark id='…'>…</Mark>
-------------------------------------
<Context …>…</Context>
<Context …>…</Context>

Page 26: Integrating Heterogeneous  in situ  Information using SPARCE


Why Preserve the Layers

• The information sources are different
  – SI: Superimposed application
  – Marks: SPARCE
  – Contexts: Base applications (via context agents)
• A hierarchy is inefficient and unnecessary
  – Mark and context information is replicated
  – Context can be large (broad)
  – Joins can provide the same result
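
A small sketch of the "joins can provide the same result" point: the three layers are kept as separate collections and joined on the mark id only when a nested view is needed. The data, field names, and the `join_layers` helper are all hypothetical.

```python
# Superimposed layer: items refer to marks by id.
items = [
    {"item": "CLIO", "mark_id": "m1"},
    {"item": "Garlic", "mark_id": "m2"},
    {"item": "Garlic (second note)", "mark_id": "m2"},
]

# Mark layer: each mark is stored once, however many items use it.
marks = {
    "m1": {"document": "clio.pdf"},
    "m2": {"document": "garlic.pdf"},
}

# Context layer: context is keyed by mark id and also stored once.
contexts = {
    "m1": {"Section Heading": "Introduction"},
    "m2": {"Section Heading": "Abstract:"},
}


def join_layers(items, marks, contexts):
    """Inner-join items to marks, then attach each mark's context."""
    joined = []
    for item in items:
        mark_id = item["mark_id"]
        if mark_id not in marks:
            continue  # an item whose mark is missing is left out
        joined.append({
            "item": item["item"],
            "mark": {"id": mark_id, **marks[mark_id]},
            "context": contexts.get(mark_id, {}),
        })
    return joined


for row in join_layers(items, marks, contexts):
    print(row)
```

Two items share mark `m2`, yet its mark and context records exist once in their layers; a nested hierarchy would have to repeat them under each item.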

Page 27: Integrating Heterogeneous  in situ  Information using SPARCE


Start with the Query

• Figure out what information is in scope
  – Only some superimposed information elements might qualify
  – Only some marks might qualify
  – Only some context elements might be needed
• Minimize the amount of information retrieved
  – Push “selects” down and distribute “selects”
• Helped by preserving the layers
• Enables parallel and distributed query execution
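
The sketch below illustrates pushing selects down under stated assumptions: selection happens first in the superimposed layer, only the marks that survive are resolved, and only the context elements the query names are requested. `fetch_context` stands in for an expensive call to a base application; it, the `query` function, and the sample data are hypothetical.

```python
ITEMS = [
    {"group": "G1", "item": "A", "mark_id": "m1"},
    {"group": "G1", "item": "B", "mark_id": "m2"},
    {"group": "G2", "item": "C", "mark_id": "m3"},
]

# What the base applications could supply if asked for everything.
BASE_CONTEXT = {
    "m1": {"Section heading": "Abstract", "Font name": "Times New Roman"},
    "m2": {"Section heading": "Introduction", "Font name": "Arial"},
    "m3": {"Section heading": "Conclusion", "Font name": "Arial"},
}


def fetch_context(mark_id, wanted, log):
    """Stand-in for a costly base-layer call; records each request."""
    log.append((mark_id, tuple(wanted)))
    full = BASE_CONTEXT[mark_id]
    return {name: full[name] for name in wanted}


def query(group, wanted, log):
    # 1. Select in the superimposed layer: only items in the group qualify.
    selected = [i for i in ITEMS if i["group"] == group]
    # 2. Only the marks of those items are in scope.
    mark_ids = sorted({i["mark_id"] for i in selected})
    # 3. Ask the base layer only for the context elements that are needed.
    ctx = {m: fetch_context(m, wanted, log) for m in mark_ids}
    return {i["item"]: ctx[i["mark_id"]] for i in selected}


log = []
result = query("G1", ["Section heading"], log)
print(result)
print(log)
```

The request log shows that mark `m3` is never fetched and that font names are never requested, which is the saving the slide is after. Because each `fetch_context` call is independent, the calls in step 3 could also be issued in parallel.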

Page 28: Integrating Heterogeneous  in situ  Information using SPARCE


Exploit Relationships

• Relationships in superimposed layer can have many benefits
  – Improve recall (for user)
  – Alternative execution plans (for query processor)
• XML has no native support for relationships
  – Can be implemented using XPointer, XLink, etc.

Page 29: Integrating Heterogeneous  in situ  Information using SPARCE


Future Work

• Test proposal
• Bi-level query system
  – Develop example queries
  – Model the system, build it, test it
• Support other information models
  – RDF should be easy, relational might not be
  – Support for new models can be added without affecting existing implementations
  – Sun’s “No Recompile” guarantee for superimposed applications

Some restrictions may apply

Page 30: Integrating Heterogeneous  in situ  Information using SPARCE


Conclusion

• Enhancing base information with superimposed information makes possible new queries over base information

• Heterogeneous in situ base information can be integrated and queried using SPARCE

• The naïve XML implementation makes a good straw man

• If this stuff holds water, a bi-level query system may be in my future

Page 31: Integrating Heterogeneous  in situ  Information using SPARCE


Questions?

Ask me about a demo