A Comprehensive Model for Arbitrary Result Extraction

Neal Sample, Gio Wiederhold
Stanford University

Dorothea Beringer
Hewlett-Packard

SAC 2002

Shift in Programming Tasks

[Figure: timeline from 1970 through 1990 to 2010, showing programming tasks shifting from coding to integration/composition]

Sample Composition Tasks

Logistics
  Reservation and distribution systems, “find the best transportation route from A to B”
Genomics
  Framework for composing various processing tools and repositories
Modeling
  Weather prediction, complex chemical systems, basin modeling

Composition of processes (vs. components, data)

CLAM Composition Language

Purely compositional
  no primitives for arithmetic
  no primitives for I/O, etc.
Splitting up the CALL statement
  parallelism by asynchrony in a sequential program
  novel possibilities for optimization
  reduction in complexity of invocation statements
Higher-level language
  assembly to HLLs, HLLs to compositional paradigm

Intent: enable domain experts

CLAM Primitives

Pre-invocation:
  SETUP: set up the connection to a service
  SET-, GETPARAM: set or get parameters in a service
  ESTIMATE: for optimization
Invocation and result gathering:
  INVOKE: begin execution
  EXAMINE: test progress of an invoked method
  EXTRACT: extract results from an invoked method
Termination:
  TERMINATE: terminate a method invocation/connection to a service
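
The order in which the primitives are meant to be used can be sketched in Python. This is only an illustration: `EchoService`, its method names, and the `run` helper are hypothetical stand-ins invented here, not CLAM syntax or its runtime API, and the service simply upper-cases a string in-process instead of running remotely.

```python
class EchoService:
    """Hypothetical in-process stand-in for a remote service."""

    def __init__(self):
        self.params = {}
        self.connected = False
        self.result = None

    def setup(self):                    # SETUP: open the connection
        self.connected = True

    def set_param(self, name, value):   # SETPARAM: configure the service
        self.params[name] = value

    def get_param(self, name):          # GETPARAM: read a setting back
        return self.params[name]

    def estimate(self):                 # ESTIMATE: made-up cost figure
        return {"seconds": 0.0}

    def invoke(self):                   # INVOKE: begin execution
        self.result = self.params["text"].upper()

    def examine(self):                  # EXAMINE: report (status, progress)
        if self.result is None:
            return ("NOT_DONE", 0)
        return ("DONE", 100)

    def extract(self):                  # EXTRACT: fetch the result
        return self.result

    def terminate(self):                # TERMINATE: close the connection
        self.connected = False


def run(service, text):
    """Walk through pre-invocation, invocation/gathering, and termination."""
    service.setup()
    try:
        service.set_param("text", text)
        service.estimate()
        service.invoke()
        status, _progress = service.examine()
        if status != "DONE":
            raise RuntimeError("result not ready")
        return service.extract()
    finally:
        service.terminate()
```

Calling `run(EchoService(), "clam")` walks through all three phases and leaves the connection closed even if extraction fails.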

Data Dependencies & Scheduling

// begin program
A = service1();
B = service2();
C = service3(A,B);
D = service4(C);
E = service5(C);
// end of program

[Figure: dependency graph. START, then service1 and service2 in parallel, then service3, then service4 and service5 in parallel, then END]
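
One way to see how a schedule falls out of the data dependencies alone is to compute the groups of services that can run at the same time. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the `deps` dictionary and the `parallel_stages` helper are illustrative names, not part of CLAM.

```python
from graphlib import TopologicalSorter

# Each service maps to the services whose results it consumes,
# read directly off the five assignments in the program above.
deps = {
    "service1": set(),
    "service2": set(),
    "service3": {"service1", "service2"},
    "service4": {"service3"},
    "service5": {"service3"},
}


def parallel_stages(dependencies):
    """Group services into stages whose members have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)                 # pretend the whole stage finished
    return stages


stages = parallel_stages(deps)
```

For this program the stages are `service1` with `service2`, then `service3`, then `service4` with `service5`, matching the dependency graph on the slide.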

Runtime: data extraction is hard

Data extraction with native modules worked
No language-level specifications in CLAM
  e.g., polling, threading, exception handling…
Multiple middleware for transport
  difficult mapping: CORBA-RMI, RMI-COM, COM-CPAM, etc.
Crisis of legacy services
  To generalize or restrict?

Refine the strategy…

Strategy: hide it & depend on it

Have to respect service capabilities
  Or suffer the LCD… (more in a bit)
Simple and flexible programming
  Data extraction is a runtime issue; it is not central to the composition task
Simplified integration
  Legacy ambivalence
  Simple bridging for middleware
  Increase audience for services
Better scheduling
  Declarative language, data dependencies

Where are we?

Declarative language for composition
  Data is used for synchronization
  No primitives to support synchronization
Apparent “mismatch” in data extraction methods & capabilities among various actors
  What does the data look like?
  How can data be extracted?

Data View: Services

[Figure: a service's RESULTS shown as separate elements: Result A, Result B, Result C]

Extraction Techniques

Asynchrony
  Explicitly controlled: spin-locks, polling, interrupt handling, etc.
  Can use with any DAG schedule
Partial extraction
  web browsing: HTML text as a schema
  SQL cursors (thanks to the reviewer)
Progressive extraction (exceptional)
  Adaptive mesh refinements, JPEG interleaving

Current Focus

Pre-invocation:
  SETUP: set up the connection to a service
  SET-, GETPARAM: set or get parameters in a service
  ESTIMATE: for optimization
Invocation and result gathering:
  INVOKE: begin execution
  EXAMINE: test progress of an invoked method
  EXTRACT: extract results from an invoked method
Termination:
  TERMINATE: terminate a method invocation/connection to a service

EXAMINE Primitive in CLAM

Returns “status” and “progress”
Status – 2 bits of state
  status = {DONE, NOT_DONE, PARTIAL, ERROR}
Progress – open descriptor
  Indicates progress in an application-specific way
  Could be variance, mean, amplitude, etc.
  Default assumption: integer 0-100 = % done
Resolution of EXAMINE
  Can apply per service (black box)
  Can apply per result (white box)
Not complete for many legacy systems: only “status”, no “progress”
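
The two-part return value can be modelled as a small typed record. In this sketch the numeric codes assigned to the four states, the `Examination` type, and the `percent_done` helper are all assumptions made for illustration; the slide only says that status needs two bits and that progress defaults to an integer from 0 to 100.

```python
from enum import IntEnum
from typing import NamedTuple, Optional


class Status(IntEnum):
    # Four states fit in two bits; the specific codes are an assumption.
    NOT_DONE = 0
    PARTIAL = 1
    DONE = 2
    ERROR = 3


class Examination(NamedTuple):
    status: Status
    progress: Optional[int]  # None models a legacy service with status only


def percent_done(exam):
    """Read progress under the default 0-100 convention, if it is available."""
    if exam.progress is None:
        # Status-only service: completion is the only thing we can infer.
        return 100 if exam.status is Status.DONE else None
    return max(0, min(100, exam.progress))
```

Keeping `progress` optional lets the same record describe both a full EXAMINE and the status-only answer a legacy system can give.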

EXAMINE

Service (results A, B, C), at start:
  Service.EXAMINE()   returns {NOT_DONE, 0}
  Service.EXAMINE(A)  returns {NOT_DONE, 0}
  Service.EXAMINE(B)  returns {NOT_DONE, 0}
  Service.EXAMINE(C)  returns {NOT_DONE, 0}

Service (results A, B, C), in progress:
  Service.EXAMINE()   returns {PARTIAL, 40}
  Service.EXAMINE(A)  returns {DONE, 100}
  Service.EXAMINE(B)  returns {NOT_DONE, 0}
  Service.EXAMINE(C)  returns {PARTIAL, 20}

Service (results A, B, C), when finished:
  Service.EXAMINE()   returns {DONE, 100}
  Service.EXAMINE(A)  returns {DONE, 100}
  Service.EXAMINE(B)  returns {DONE, 100}
  Service.EXAMINE(C)  returns {DONE, 100}
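
The slide does not say how the service-level answer is derived from the per-result answers. As one possible reading, the sketch below takes the integer mean of the per-result progress values, which happens to reproduce the 40 shown for A = 100, B = 0, C = 20. The function names and the aggregation rule are assumptions, not CLAM's definition.

```python
def examine_result(progress):
    """Per-result (white box) view: map a 0-100 progress value to a status."""
    if progress >= 100:
        return ("DONE", 100)
    if progress <= 0:
        return ("NOT_DONE", 0)
    return ("PARTIAL", progress)


def examine_service(per_result_progress):
    """Per-service (black box) view, assumed here to be the mean progress."""
    results = [examine_result(p) for p in per_result_progress.values()]
    statuses = [status for status, _ in results]
    if all(status == "DONE" for status in statuses):
        return ("DONE", 100)
    if all(status == "NOT_DONE" for status in statuses):
        return ("NOT_DONE", 0)
    mean = sum(progress for _, progress in results) // len(results)
    return ("PARTIAL", mean)
```

With `{"A": 100, "B": 0, "C": 20}` this yields `("PARTIAL", 40)`, the middle snapshot above. Other progress descriptors (variance, amplitude) would need a different aggregation.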

EXTRACT Primitive

Extracts data from a service
  Per service (black box): (var) = Service.EXTRACT();
  Per result (white box): (varA = A, varC = C) = Service.EXTRACT();
Allows partial data extraction
  saves volume: abandon uninteresting elements
  saves time: termination of useless invocation
Allows progressive data extraction with 2-value EXAMINE (status+progress)
  Steering, time saving
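
Progressive extraction combined with early termination can be sketched as a polling loop: EXAMINE until the progress is good enough, EXTRACT the current value, then TERMINATE to avoid paying for the rest. `RefiningService` is a hypothetical mock that advances 25% on every poll, and `extract_when_good_enough` and its threshold are illustrative names, not CLAM constructs.

```python
class RefiningService:
    """Hypothetical service whose single result improves over time."""

    def __init__(self):
        self.step = 0
        self.terminated = False

    def invoke(self):
        self.step = 0

    def examine(self):
        # Each poll simulates elapsed time: progress grows by 25% per call.
        if not self.terminated:
            self.step = min(self.step + 1, 4)
        progress = self.step * 25
        status = "DONE" if progress == 100 else "PARTIAL"
        return status, progress

    def extract(self):
        # The current estimate; it reaches 10.0 once the work is complete.
        return 10.0 * self.step / 4

    def terminate(self):
        self.terminated = True


def extract_when_good_enough(service, threshold):
    """Poll EXAMINE, take the result once progress >= threshold, then stop."""
    service.invoke()
    while True:
        status, progress = service.examine()
        if status == "ERROR":
            raise RuntimeError("service reported an error")
        if status == "DONE" or progress >= threshold:
            value = service.extract()
            service.terminate()  # time saving: skip the remaining work
            return value, progress
```

With a threshold of 50 the caller gets the half-refined estimate after two polls; a status-only service could not support this, since the loop relies on the progress value.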

Examine-Extract Relationship

EXAMINE capabilities (grid rows):
  per service, status only
  per service, status+progress
  per result, status only
  per result, status+progress
EXTRACT capabilities (grid columns):
  per service
  per result
Extraction styles placed in the grid:
  asynchronous procedure call, Java RMI
  limited partial extraction, (binary) thumbnails
  partitioned progressive extract (full result set)
  ? semantic partial extraction (full result set)
  partial extraction: browsing, SQL cursor (no progressive)
  progressive extraction (full result set)
  progressive and partial extraction: CLAM

Conclusions

Data extraction hiding is bueno!
  User is not responsible for data management
  Synchronizing extractions is not in the language: simplicity
  Enables effective service scheduling
  Simplified integration
Blueprint for proactive design
  pattern for future services