© Daniel S. Weld, PLANET 2003 Tutorial on Data Integration
Planning for the Web II
Execution & Service Integration
Dan Weld
University of Washington
June, 2003
Acknowledgements
• Oren Etzioni
• Yolanda Gil
• Keith Golden
• Alon Halevy
• Zack Ives
• Tal Shaked
Caveat
Outline
• Execution for Data Integration
  Coping with incomplete statistics, latency
  Interleaved planning & execution
  Convergent query processing
• Service Integration
  Web service composition
  • Background
  • Representational issues
  • Planning algorithms
  Automated data analysis
Optimization and Execution
• Problem:
  Few and unreliable statistics about the data.
  Unexpected (possibly bursty) network transfer rates.
  Generally, unpredictable environment.
• General solution (research area):
  Adaptive query processing.
  Interleave optimization and execution. As you get to know more about your data, you can improve your plan.
Adaptivity & Incremental Processing Query Performance
[Figure labels: User's Query; Query Translation; Query over Sources; Tukwila Network-Based Query Processor; Query Results]
Evaluated within the Tukwila system [Ives PhD]
Query Optimization: Model Query Plans’ Execution & Choose the Best
[Figure: two candidate join plans over Restock (R), 100 tuples; Orders (O), 50 tuples; Shipping (S), 90 tuples.
 Plan 1: R⋈O ~30 tuples, then R⋈O⋈S ~270 tuples, 50 sec.
 Plan 2: O⋈S ~15 tuples, then R⋈O⋈S ~270 tuples, 30 sec.]
From source sizes, stats, estimate result sizes, costs
Estimates, assumptions introduce error:
  Exponential increase in estimation error with each join [Ioannidis & Christodoulakis 91] [Antoshenkov 93,96]
  Worse if no detailed statistics
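The compounding effect can be made concrete with a little arithmetic. The following is a minimal sketch, assuming every join's selectivity estimate is off by the same relative factor and that the errors multiply; the 20% figure and the function name are illustrative assumptions, not values from the tutorial.

```python
def compounded_error(per_join_error: float, num_joins: int) -> float:
    """Relative error factor of the final cardinality estimate.

    Assumption (illustrative): each join's selectivity is over-estimated
    by the same factor (1 + per_join_error), and the errors multiply.
    """
    return (1.0 + per_join_error) ** num_joins


if __name__ == "__main__":
    # Print how a 20% per-join error grows as the plan gets deeper.
    for joins in (1, 2, 4, 8):
        print(joins, "joins ->", round(compounded_error(0.2, joins), 2), "x")
```

Under these assumptions a modest per-join error grows geometrically with plan depth, which is why plans with many joins are the ones most likely to be mis-costed.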
Why Does Data Integration Make Optimization Harder?
Query optimization estimates costs using knowledge about environment and data:
  Data source sizes (“cardinalities”)
    Often unavailable or not meaningful in data integration
  Histograms
    Too expensive to maintain in data integration
  I/O costs
    Network I/O costs fluctuate
Need a way to gain this sort of knowledge!
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling [Franklin et al.]
5. Eddies [Hellerstein et al.]
Tukwila Data Integration System
[Figure: system architecture. Components: Reformulator, Catalog (source mappings), Optimizer / (Re-)Optimizer, Mem Alloc-Fragmenter, Execution Engine, Temp Store, Event Handler, Query Operators. Flow labels: query, logical plan, exec plan, exec results, data, answer.]
Novel components:
  Event handler
  Optimization-execution loop
  Adaptive operators
Double Pipelined Join
Hybrid Hash Join
  No output until build relation read
  Asymmetric (build vs. probe): optimization requires source behavior knowledge
Double Pipelined Hash Join
  Outputs data immediately
  Symmetric: requires less source knowledge to optimize
  Threads overlap I/O, computation
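The symmetric behavior can be sketched in a few lines. This is a single-threaded illustration only, assuming rows are dictionaries and that tuples from both sources arrive interleaved on one stream tagged "L" or "R"; the function and parameter names are hypothetical and this is not the Tukwila implementation, which uses threads and handles memory overflow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def double_pipelined_join(
    arrivals: Iterable[Tuple[str, Row]],
    left_key: str,
    right_key: str,
) -> Iterator[Tuple[Row, Row]]:
    """Symmetric hash join over an interleaved stream of ("L" | "R", row)."""
    left_table: Dict[object, List[Row]] = defaultdict(list)
    right_table: Dict[object, List[Row]] = defaultdict(list)
    for side, row in arrivals:
        if side == "L":
            key = row[left_key]
            left_table[key].append(row)              # build into own table
            for match in right_table.get(key, []):   # probe the other table
                yield (row, match)
        elif side == "R":
            key = row[right_key]
            right_table[key].append(row)
            for match in left_table.get(key, []):
                yield (match, row)
        else:
            raise ValueError(f"unknown side: {side!r}")


if __name__ == "__main__":
    stream = [
        ("L", {"oid": 1, "item": "a"}),
        ("R", {"order": 1, "carrier": "x"}),   # first result is emitted here
        ("L", {"oid": 1, "item": "c"}),
    ]
    for pair in double_pipelined_join(stream, "oid", "order"):
        print(pair)
```

Because each arriving tuple is inserted into its own hash table and immediately probed against the other, neither input plays the role of "build" relation, and results appear as soon as a matching pair has arrived.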
Performance on Networked Data
[Figure: time (sec) vs. tuples output (1000s). Join of 3 tables sent via JDBC over 10Mb Ethernet: TPC-H Lineitem, Supplier, Order]
Double Pipelined Join in Summary
Benefits:
  Easier to optimize (symmetric)
  Sub-operations scheduled flexibly
  Allows overlap of I/O and computation
Incurs some overhead:
  Threading, queues
  Required extensions to intelligently handle overflow:
  • Same hash function, number of buckets for each side
  • Approaches: flush buckets on left side or flush symmetrically
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
   • Interleaved planning and execution
3. Convergent query processing
4. Query scrambling
5. Eddies
Mid-query reoptimization
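[Figure: join plan over A, B, C, D. A and B are joined first; the joins with C and D follow.]
Materialization point: write AB to disk
If actual statistics differ from predicted statistics, replan [Kabra & DeWitt]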
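The replanning decision at a materialization point can be sketched as follows. This is a toy illustration under stated assumptions: the trigger is a simple ratio threshold and the "optimizer" just orders inputs smallest-first. Both are hypothetical stand-ins, not the criteria used by Kabra & DeWitt.

```python
from typing import Dict, List


def should_replan(predicted_rows: int, actual_rows: int, tolerance: float = 2.0) -> bool:
    """True when the observed cardinality is off by more than `tolerance` times."""
    if predicted_rows <= 0 or actual_rows <= 0:
        return predicted_rows != actual_rows
    ratio = max(actual_rows / predicted_rows, predicted_rows / actual_rows)
    return ratio > tolerance


def choose_join_order(cardinalities: Dict[str, int]) -> List[str]:
    """Toy optimizer (assumption): join the smallest inputs first."""
    return sorted(cardinalities, key=lambda name: (cardinalities[name], name))


def plan_remainder(predicted_ab: int, actual_ab: int,
                   other_estimates: Dict[str, int],
                   tolerance: float = 2.0) -> List[str]:
    """Order for the joins that remain after AB has been written to disk."""
    sizes = dict(other_estimates)
    sizes["AB"] = predicted_ab
    plan = choose_join_order(sizes)          # plan chosen before execution
    if should_replan(predicted_ab, actual_ab, tolerance):
        sizes["AB"] = actual_ab              # use the observed statistic
        plan = choose_join_order(sizes)      # re-optimize the remainder
    return plan


if __name__ == "__main__":
    print(plan_remainder(100, 5000, {"C": 500, "D": 1000}))
```

The point of materializing AB is that its true size is known before the rest of the plan runs, so only the remaining joins need to be reconsidered.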
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling
5. Eddies
Convergent Query Processing
• Instead of adapting remainder of plan after executing all data on plan prefix
• Adapt whole plan after executing whole plan on part of data
• Can better gather information this way…
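Convergent Query Processing in Action: Changing Join Plans in Mid-Stream
Join Restock, Orders, Shipping (R ⋈ O ⋈ S)
[Figure: a different join plan per phase. Phase 0: R0⋈S0, then R0⋈O0⋈S0. Phase 1: O1⋈S1, then R1⋈O1⋈S1. Phase 2: R2⋈O2, then R2⋈O2⋈S2. A “cleanup” query plan completes R⋈O⋈S.]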
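Breaking a Join into Phases: One Subset per Table, Each Phase
[Figure: Restock (R) and Orders (O) split into per-phase subsets. Phase 0: R0, O0. Phase 1: R1, O1. Cleanup Phase: combinations across phases.]
$$T_1 \bowtie \cdots \bowtie T_m \;=\; \bigcup_{1 \le c_1 \le n,\ \ldots,\ 1 \le c_m \le n} \left( T_1^{c_1} \bowtie \cdots \bowtie T_m^{c_m} \right)$$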
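The idea that a join splits into per-phase joins plus a cleanup phase can be checked on toy data. The sketch below assumes rows are (key, payload) tuples and an equality join on the key; the helper names and the data are illustrative, and it does not model how CQP reuses intermediate results.

```python
from itertools import product
from typing import List, Set, Tuple

Row = Tuple[int, str]


def join(r_rows: List[Row], o_rows: List[Row]) -> Set[Tuple[Row, Row]]:
    """Equality join on the first field of each row."""
    return {(r, o) for r in r_rows for o in o_rows if r[0] == o[0]}


def phased_join(r_parts: List[List[Row]], o_parts: List[List[Row]]) -> Set[Tuple[Row, Row]]:
    """Join phase by phase, then clean up the cross-phase combinations."""
    n = len(r_parts)
    assert len(o_parts) == n, "one subset per table in each phase"
    result: Set[Tuple[Row, Row]] = set()
    for c in range(n):                        # phase c: R_c joined with O_c
        result |= join(r_parts[c], o_parts[c])
    for i, j in product(range(n), repeat=2):  # cleanup: R_i with O_j, i != j
        if i != j:
            result |= join(r_parts[i], o_parts[j])
    return result


if __name__ == "__main__":
    R = [(1, "r1"), (2, "r2"), (3, "r3")]
    O = [(1, "o1"), (2, "o2"), (2, "o2b"), (3, "o3")]
    phased = phased_join([R[:2], R[2:]], [O[:1], O[1:]])
    print(phased == join(R, O))
```

The phases cover the combinations where every table contributes its subset from the same phase; the cleanup covers all remaining combinations, so together they produce exactly the full join.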
The Cleanup Plan Reuses Previous Work Where Possible
[Figure: Restock, Orders, Shipping. Per-phase subsets (R0, O0, S0; R1, O1, S1; R2, O2, S2) and intermediate results from earlier phases (R0⋈S0, O1⋈S1, R2⋈O2, and the per-phase three-way joins) feed the cleanup plan. Annotations: “Exclude R2O2”; “Exclude R0S0O0, R1S1O1, R2S2O2”.]
CQP on a 100Mbps LAN: Nearly “Optimal” Performance
866MHz P-III, 256MB buffer pool, re-optimization every 10sec
[Figure: performance chart; annotation: cost to parse XML]
Slow WAN, Faster CPU: CQP Reduces Work
1GHz P-III, 256MB, re-optimization every 10sec. 1Mbps network, RTT ~50msec
Outline
• Execution for Data Integration
  Coping with incomplete statistics, latency
  Interleaved planning & execution
  Convergent query processing
• Service Integration
  Web service composition
  • Background
  • Representational issues
  • Planning algorithms
  Automated data analysis
What is a Web Service
• A web service is a network accessible interface to application functionality, built using standard Internet protocols (TCP/IP, XML, SOAP, WSDL, …)
  Clients of a web service do NOT need to know how it is implemented.
• Why interesting?
  Increased automation
[Figure labels: Application client; Network; Web Service; Application code]
Case Study: Amazon
• Services Exported
  Product details (short, long, images, samples)
  Purchase functionality
  Ratings, reviews, collaborative filtering data, lists, …
• Examples
  Store builder tools
  Amazon Browser – visualization tool
  Windows desktop interfaces – drag-n-drop…
  MP3 Piranha
  Games
  Automatic review writer??
Case Study: Google
• Services Exported
  Search interface
  Limits on items returned, queries / day
• Examples
  Metacrawler functionality
  Geosearch ‘nearby thai restaurants’
  • TIGER, FIPs -> lat,long of pages
  Robust hyperlinks
  • Creates a signature for destination pages & tracks with query
Case Study: Fed Express
• Shipment tracking
• Proof of delivery
• Invoice reviewed, adjusted, settled
• Schedule pickup time, location
  Outgoing or returns
• Order supplies (airbills, envelopes, boxes)
• Review shipping history
• Rate requests
  Location, package size
• International trade
  Required documents, duties, taxes
Case Study: Hailstorm / MyServices
• Web Services
  MyDocuments
  MyAddressbook
  MyWallet
  MyNotifications
  ….
• Scenario
  Wallet keeps receipts, arranges product return
  Expedia uses notifications to warn of canceled flight
• Reality
  Ebay, AmEx, Groove, …
Case Study: OAA
• Common schema for travel industry
• Reservations
  Flights, trains, rental cars, hotels
• Time & distances
• Payment, deposits, vouchers
• Vacation Packages
Web Service Technology Stack
[Figure: stack layers: Discovery, Description, Packaging, Transport, Network. Interaction labels: the Web Service Client asks UDDI “shopping web service?” and gets WSDL URIs; WSDL; Proxy; SOAP pkg request and SOAP pkg response exchanged with the Web Service.]
SOAP (Simple Object Access Protocol)
• SOAP Messages
  XML Payload
• Using SOAP as RPC (Remote Procedure Call) Messages
[Figure: SOAP client sends a request message to the SOAP server; the SOAP server returns a response message]
If a WS were a Phone Call…
• XML represents the conversation,
• SOAP describes the rules for how to call someone
• UDDI is the phone book.
• WSDL describes what the phone call is about and how
you can participate.
WSDL for int foo(int arg);
<types>
  <schema targetNamespace="http://tempuri.org/xsd"
          xmlns="http://www.w3.org/2001/XMLSchema"
          xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
          xmlns:wsdl="http://schemas...l/"
          elementFormDefault="qualified" >
  </schema>
</types>
<message name="Simple.foo">
  <part name="arg" type="xsd:int"/>
</message>
<message name="Simple.fooResponse">
  <part name="result" type="xsd:int"/>
</message>
<portType name="SimplePortType">
  <operation name="foo" parameterOrder="arg" >
    <input message="wsdlns:Simple.foo"/>
    <output message="wsdlns:Simple.fooResponse"/>
  </operation>
</portType>
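The WSDL fragment describes an operation `foo` with an `xsd:int` part named `arg` and a response part named `result`. A minimal sketch of what a client might put on the wire is given below, using only Python's standard library. The service namespace is a hypothetical placeholder (the slide does not show the `wsdlns` URI), and the exact body layout of a real service depends on its binding, which the fragment omits.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; the slide does not give the real one.
SERVICE_NS = "urn:example:simple"


def build_foo_request(arg: int) -> bytes:
    """Serialize an RPC-style SOAP request for `int foo(int arg)`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}foo")
    part = ET.SubElement(call, "arg")     # part name taken from the WSDL
    part.text = str(arg)
    return ET.tostring(envelope, encoding="utf-8")


def parse_foo_response(xml_bytes: bytes) -> int:
    """Extract the integer `result` part from a SOAP response."""
    root = ET.fromstring(xml_bytes)
    result = root.find(".//result")       # assumes an unqualified part element
    if result is None or result.text is None:
        raise ValueError("response has no <result> part")
    return int(result.text)


if __name__ == "__main__":
    print(build_foo_request(7).decode("utf-8"))
```

In practice a generated proxy hides this serialization; the sketch only shows that the abstract message parts in the WSDL correspond to XML elements inside the SOAP body.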
DISCO
• If you know the URL for a service
• DISCO lets you query them
• And get back a WSDL description
• But what if you don’t know the right URL?
UDDI
• Hosted Registries
  Microsoft, IBM, HP, SAP, NTT, BEA
• Entries defined with
  Business information
  • Name, contacts, descriptions, identifier, yellow pages category
  Service information
  • Entities, each of which describes a family of related services which together implement a business process
  Binding information
  • How to invoke: URI, required parameters, options, & Tmodel
  Service specifications (Tmodel)
  • As a symbol – fingerprint to recognize a known service
  • Decomposable to find WSDL description
Acronyms (W3C, MSFT, IBM)
• UDDI
  Discover, describe, register services
  SOAP-based service for locating WSDL-formatted service descriptions
• DISCO
  Discover / retrieve SCL+SDL descriptions
• SDL / NASSL
  SOAP description lang – get params / types
• SCL
  SOAP contract lang – extends SDL – orchestration of msgs
• WSDL
  Describe abstract interface and protocol bindings of arbitrary network services (extends SCL)
• XLANG / WSFL / BPEL4WS
  Lang for biz processes used in BizTalk
  Biz process execution language for web services
  • MSFT, IBM, BEA proposal
[Figure: diagram relating NASSL, SCL, SDL, WSDL, WSFL, XLANG, BPEL4WS]
The Layer Cake [TBL,XML2000]
RDF (Resource Description Framework)
Way to describe resources via metadata
  Makes no assumptions about a particular application domain
  Based on XML
  Another one?
  Standard for semantic web
Restricts resource descriptions to triples (subject, predicate, object)
Provides a lightweight ontology system
  Subproperty, Subclass, Domain & Range
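The triple model and the subclass part of the lightweight ontology system can be illustrated with plain tuples. The sketch below is an assumption-laden toy: the `ex:` names are hypothetical, only `rdf:type` and `rdfs:subClassOf` are handled, and no RDF library or serialization is involved.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_types(triples: Iterable[Triple]) -> Set[Tuple[str, str]]:
    """Return (resource, class) pairs entailed by rdf:type and rdfs:subClassOf."""
    triples = list(triples)
    subclass = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for resource, cls in list(types):
            for sub, sup in subclass:
                if sub == cls and (resource, sup) not in types:
                    types.add((resource, sup))
                    changed = True
    return types


if __name__ == "__main__":
    data = [
        ("ex:Softbot", "rdfs:subClassOf", "ex:Agent"),
        ("ex:Agent", "rdfs:subClassOf", "ex:Program"),
        ("ex:rodney", "rdf:type", "ex:Softbot"),
    ]
    for pair in sorted(infer_types(data)):
        print(pair)
```

Every statement is a triple, and schema-level triples such as subclass links let a consumer derive class memberships that were never stated directly.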
DAML+OIL (www.daml.org)
• DAML extends RDF and RDFS with richer modeling primitives.
  disjointWith, intersectionOf, oneOf, cardinality
• Able to provide properties of properties
  uniqueness, transitivity, etc.
DAML-S
DAML+OIL ontology describing Web Services
Complements low level descriptions like WSDL (mapping to WSDL)
  Describes what a service does and why, not just how to communicate with it.
Goals: Discovery, Invocation, Composition, Verification, Execution Monitoring
Outline
• Execution for Data Integration
  Coping with incomplete statistics, latency
  Interleaved planning & execution
  Convergent query processing
• Service Integration
  Web service composition
  • Background
  • Representational issues
  • Planning algorithms
  Automated data analysis
Partial Survey of Planners
• UW Internet Softbot
  Planners: SENSp / XII / PUCCINI
  Repr. languages: UWL / SADL ; LCW
• PKS
  Planning at the knowledge level
• McDermott
  Forward-chaining search w/ GRG guidance
• McIlraith et al.
  ConGolog (procs, loops, conditionals, w/ nondet)
• Papazoglou, Traverso et al.
  Stratified service arch; XSRL language; MBP
• Finin; Srivastava; Knoblock; Ambite; Nau…
Planning for image processing tasks
[Figure: data-processing workflow with columns Inputs, Filters, Models, Visualization. Labels include MODIS FPAR, MODIS LAI, RUC2, GOES Radiation, GRIB, WGRIB bin, Soil, Topography, Min/Max Temp, Mean Precip., Mean wind; Mosaic, Re-project, Composite, 8-day, Daily, LAZEA, Statistics, Drill-down; Land Surface Models; False Color, Phenology, NPP, Stream flow, Snow cover, Soil Moisture.]
• Many fielded systems
  Lansky’s COLLAGE, Chien et al. MVP/ASIP, Golden ADLIM, Blythe GRID…
• Spatial representations important
Motivating Scenarios
Planning a trip
  Yahoo maps -> driving time -> travel prefs
  Automatic expense form filing
Purchasing a group of items
  Aggregation from multiple vendors
  Select for: payment types, stock level, deliv
  Local & 3rd party reputation services (BBB)
Monitoring marketplace
  Auction sites
  Events (check calendar / notification service)
UW Internet Softbot
• Software robot
• Effectors: mv, ftp, chmod, cd, lpr, rm, ...
• Sensors: ls, finger, INSPEC, netfind, wc, ...
• Say what we want, not how to do it
  Find phone numbers, fetch/print online papers, …
• Integrate multiple resources
Motivation/Contributions
• Represent actions like ls, finger
• Represent goals such as
  “Rename paper.tex to kr.tex”
  “Print all files in directory papers.”
  (even with incomplete information)
• No previous system could express
The Middle Ground
1. Action Representation
                     Tractability                     Expressiveness
   Complete Info     STRIPS           ADL             Situation Calculus
   Incomplete        UWL                              Moore et al
2. Knowledge Representation
                     Tractability                     Expressiveness
   Complete Info     Closed World Assumption (CWA)
   Incomplete        OWA                              Circumscription
Softbot Architecture
[Figure labels: Task Manager; PUCCINI Planner; SADL Actions; LCW Knowledge; Sensors; Effectors; UNIX shell & WWW]
SADL Family Tree
[Figure: STRIPS [Fikes & Nilsson, 71] leads to UWL [Etzioni et al, 92] (incomplete info, noise-free sensors) and ADL [Pednault, 89] (conditional effects); both lead to SADL [Golden & Weld, 96].]
Represents ls, “Rename”, finger...
SADL/UWL Annotations
Goal annotations:
  satisfy = achieve by any means
  hands-off = don’t change (maintenance)
Effect annotations:
  cause = change world
  observe = change agent’s knowledge
“Delete the file named junk”
  satisfy (name (ƒ, junk)) satisfy (deleted (ƒ))
Information Goals are Temporal
• Two time points
  When proposition sampled
  When reply given
• “Tell me now who was President in 1883”
• “Tell me tomorrow who is President now”
• “Identify (ASAP) the file now named `junk’”
Information Goals are Temporal
“Rename paper.tex to kr.tex”
  designator (name) changes
  UWL can’t express
SADL solution
  initially = time goal was posed
  initially (name (ƒ, paper.tex)) satisfy (name (ƒ, kr.tex))
  initially (name (ƒ, core)) satisfy (deleted (ƒ))
Compare to more general temporal representation
Tidiness Goals
“Print paper, but don’t leave it uncompressed.”
  initially (compressed (paper), tv) satisfy (printed (paper)) satisfy (compressed (paper), tv)
State of paper.ps may change temporarily but must be restored
Compare to more general goal lang, e.g. LTL
Unbounded Information Gain
action ls (d)
  precondition: satisfy(current.shell(csh))
                satisfy(readable(d))
  effect: ∀ f when in.dir(f, d)
            ∃ l, n, d
              observe(length(f, l))
              observe(name(f, n))
              observe(in.dir(f, d))
Compare PKS Representation
Initial State:
  Kf = {(= (pwd) root), (indir papers root), (indir planner root), (dir root), (dir papers), (dir planner), (file paper_tex)}
  Kx = {((indir paper_tex planner) | (indir paper_tex papers))}
Goal:
  K(indir paper_tex (pwd))
The Internet Softbot
[Figure labels: Task Manager; PUCCINI Planner; SADL Actions; LCW Knowledge; Sensors; Effectors; UNIX shell & WWW]
Knowledge Representation
• Closed World Assumption (CWA)
  Made by classical planners
  Anything not recorded as true is false
• Open World Assumption (OWA)
  Anything not recorded true or false is unknown
  Sensor abuse
  Can’t handle ∀ goals
Sensor Abuse
• OWA: Don’t know when to stop sensing
  Many ways to find same information
  Many plans containing same action
• After executing find / -name foo, should know
  ls bin won’t reveal more files named foo
  ls tex won’t reveal more files named foo
  Google may reveal more files named foo
How Classical Planners Handle ∀
∀x block(x) ⇒ OnTable(x)
replaced with: OnTable(A) ∧ OnTable(B) ∧ OnTable(C)
• Relies on CWA
  Must know all blocks
  OWA can never be sure
[Figure: blocks A, B, C]
Local Closed World Knowledge
• Complete info over restricted domain
  All blocks on table, all products at Amazon
• Local Closed World Knowledge (LCW)
  Restricted form of circumscription
  Provides fast closed world inference
  Allows fast updates
  Suited to planner action representations.
LCW Semantics
“I know all files in directory bin”
  LCW(in.dir(f, bin))
LCW(in.dir(f, bin)) ≡ ∀f [(M ⊨ in.dir(f, bin)) ∨ (M ⊨ ¬in.dir(f, bin))]
LCW Representation
• M: Ground literals in agent’s model
  in.dir(icaps03, papers)
  in.dir(junk, papers)
  executable(core)
• L: LCW formulas in agent’s model
  LCW(in.dir(f, papers))
• If P ∉ M, and L ⊢ LCW(P), then ¬P
  Conclude: ¬in.dir(foo, papers)
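The inference rule on this slide can be sketched as a three-valued lookup. This is a minimal illustration, assuming binary literals of the form pred(object, value) and single-literal LCW formulas of the form LCW(pred(f, value)); the tuple encoding and function name are hypothetical, and unary literals such as executable(core) are left out.

```python
from typing import Optional, Set, Tuple

Literal = Tuple[str, str, str]   # (predicate, object, value), e.g. in.dir(junk, papers)
LcwFormula = Tuple[str, str]     # (predicate, value) stands for LCW(predicate(f, value))


def truth_value(query: Literal, model: Set[Literal], lcw: Set[LcwFormula]) -> Optional[bool]:
    """True if recorded in M, False if absent but covered by LCW, else None (unknown)."""
    if query in model:
        return True
    predicate, _obj, value = query
    if (predicate, value) in lcw:
        return False          # closed world applies locally: not recorded means false
    return None               # open world: no conclusion without sensing


if __name__ == "__main__":
    M = {("in.dir", "icaps03", "papers"), ("in.dir", "junk", "papers")}
    L = {("in.dir", "papers")}
    print(truth_value(("in.dir", "foo", "papers"), M, L))
    print(truth_value(("in.dir", "foo", "bin"), M, L))
```

The agent only needs to sense when the answer is unknown, which is how LCW lets the planner avoid redundant sensing.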
LCW Reasoning
• Inference
  If I know all files in tex, and I know the size of every file, then do I know the size of every file in tex?
• Updates
  If I know the size of every file in tex, and I remove a file from tex, do I still know the size of every file in tex?
  What if I add a file to tex?
LCW Reasoning is Hard
Theorem:
If LCW formulas can contain and then answering an LCW query is NP-hard.
But we need fast inference!
• Solution: restrict representation
• Positive first-order conjunctions
• Fast polynomial time inference/updates
[Etzioni et al. AIJ] [Levy VLDB96] [Friedman & Weld IJCAI97]
LCW Updates
• L must be updated when M changes.
• All changes to M fall into one of four categories:
  Information loss: Δ(φ, T ∨ F → U)
  Information gain: Δ(φ, U → T ∨ F)
  Domain growth: Δ(φ, F → T)
  Domain contraction: Δ(φ, T → F)
Domain Growth
Adding core to bin invalidates LCW(in.dir(f, bin) ∧ size(f, c)) unless the size of core is known!
Theorem:
  If Δ(φ, F → T) then L’ ← L − MREL(φ)
MREL(φ) {ΦREL(φ)⊬LCW(ΦX)θ}REL(φ) {ΦL(XΦθα)Xθφα⊬(ΦX)θ}
LCW Updates
Update type          Change in φ    New LCW set            Example actions
Information loss     T ∨ F → U      L’ ← L − REL(φ)        compress
Information gain     U → T ∨ F      L’ ← L ∪ LCW(φ)        ls, wc
Domain growth        F → T          L’ ← L − MREL(φ)       cp
Domain contraction   T → F          L’ ← L                 rm
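The four update categories can be sketched over a deliberately simplified encoding. Everything below is an assumption made for illustration: an LCW formula is a set of conjunct patterns over a single variable f, a pattern (pred, value) with value None means "some value", and the relevance tests are crude stand-ins for the REL and MREL sets of the tutorial, whose real definitions involve substitutions that are not modeled here.

```python
from typing import FrozenSet, Optional, Set, Tuple

Literal = Tuple[str, str, Optional[str]]       # (predicate, object, value)
Conjunct = Tuple[str, Optional[str]]           # (predicate, value or None)
Formula = FrozenSet[Conjunct]                  # LCW over a conjunction


def matches(conjunct: Conjunct, literal: Literal) -> bool:
    pred, value = conjunct
    return pred == literal[0] and (value is None or value == literal[2])


def rel(phi: Literal, lcw: Set[Formula]) -> Set[Formula]:
    """Simplified REL: formulas with some conjunct matching phi."""
    return {f for f in lcw if any(matches(c, phi) for c in f)}


def mrel(phi: Literal, lcw: Set[Formula], model: Set[Literal]) -> Set[Formula]:
    """Simplified MREL: relevant formulas whose other conjuncts are
    not known in the model for phi's object."""
    obj = phi[1]
    dropped = set()
    for formula in rel(phi, lcw):
        others = [c for c in formula if not matches(c, phi)]
        known = all(
            any(matches(c, lit) and lit[1] == obj for lit in model)
            for c in others
        )
        if not known:
            dropped.add(formula)
    return dropped


def information_loss(phi, lcw):             # e.g. compress
    return lcw - rel(phi, lcw)


def information_gain(new_formula, lcw):     # e.g. ls, wc
    return lcw | {new_formula}


def domain_growth(phi, lcw, model):         # e.g. cp
    return lcw - mrel(phi, lcw, model)


def domain_contraction(phi, lcw):           # e.g. rm
    return set(lcw)
```

For example, with LCW(in.dir(f, bin)) and LCW(in.dir(f, bin) ∧ size(f, c)) both in L, adding core to bin keeps the first formula and drops the second unless a size for core is already in the model, matching the "unless the size of core is known" condition on the Domain Growth slide.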
Pruning Redundant Sensing
[Figure: time (CPU seconds) vs. experience (problems attempted)]
The Internet Softbot
[Figure labels: Task Manager; PUCCINI Planner; SADL Actions; LCW Knowledge; Sensors; Effectors; UNIX shell & WWW]
XII / Puccini Planner
• Based on UCPOP
  Generative, Partial-Order, Causal-Link
  I.e. much like Gerevini’s LPG
• Efficient sensing (LCW control)
• Lifted support of ∀ goals
[Golden et al. 94, Golden PhD]
Satisfying ∀ Goals
• Link Directly to ∀ Effect
  rm *: ∀f Satisfy(Deleted(f))
• Subgoal on LCW; Then Expand to Ground Form
  ls (LCW), then lpr foo, lpr bar: ∀f Satisfy(Printed(f))
• Partition
Threats to LCW
ls -l /tex supports the goal via LCW(in.dir(f, /tex) & size(f, l))
compress /tex/paper: cause(length(paper), U)
  Threat = “Information Loss”
  Promote, Demote, Confront, Shrink
mv junk /tex/: cause(in.dir(junk, /tex), T)
  Threat = “Domain Growth”
  Promote, Demote, Confront, Shrink, Enlarge
Softbot Status
• Fully Implemented (1997)
• Hundreds of Unix, Internet Actions
• Daunting Combinatorics
  Declarative Search Control
  Laborious, Brittle
• Hence...
  ? Improved Declarative Control
  ? Reactive Control
  ? Less Expressive Language
[Figure labels: Rodney, Simon, Info Manifold, MetaCrawler, BargainFinder, ShopBot, Occam, ILA, SIMS, Ahoy]
PG-based Heuristics / Sensing
[Figure: planning graph with levels 0, 1, 2.
 Initial propositions: (own PlayGo), (subject PlayGo go), (subject MySystem chess)
 Actions: (search amazon chess), (trade PlayGo *b amazon), (order MySystem amazon)
 Derived propositions: (atStore *b amazon), (subject *b chess), (LCW((atStore !b amazon)(subject !b chess))), (not (own PlayGo)), (own *b), (atStore MySystem amazon)]
[Shaked03]
Using the Graph
• LPG-like search (local search on POP)
• Propagating sensing action links
• Executing to reach ‘better’ states
• Sophisticated heuristics!
Conclusion
• Planning for the web is ripe for progress
• Data integration
  Modeling sources: GAV, LAV, …
  Answering queries using views
  Interleaved planning and execution, eddies, CQP
• Service integration
  Web service composition
  Representing unbounded information gain
  Latest heuristic search techniques => fast!
PKS
• Contingent, forward-chaining planner
  Constructs a complete, correct plan
  Separates plan-time and execution-time effects
• Less Expressive
  No universal quantification
• Still needs search control heuristics
[Petrick & Bacchus KR00, AIPS02]