PhD defense, Department of Information Technology, Uppsala University, Sweden
Querying Data Providing Web Services
Manivasakan Sabesan
Uppsala DataBase Laboratory
Dept. of Information Technology
Uppsala University
Sweden
Outline
WSMED Architecture
Semantic Enrichments
Adaptive Parallelization
Web Service Query Service
Related Work & Future Directions
Our problem area
It is difficult to retrieve data provided by web services:
• Web service applications must be developed using a regular programming language such as Java or C#
WSMED:
• Simplifies searching web service data by using database queries
• Automatically generates collections of parallel programs to do the search
• Automatically optimizes the generated programs
[Figure: searching information through web services. WSMED automatically generates parallel programs that call web service operations providing US states information, place details, and weather forecast information.]
Our approach
WSMED, a web service based mediator prototype:
[Figure: an SQL query is sent to the WSMED mediator, which returns the query result. The mediator uses wrappers, based on WSDL, to call the data providing web service operations DPWSO1, DPWSO2, ..., DPWSOn over SOAP.]
Relational WSMED view
View food is based on the web service operation SearchFoodByDescription (WSDL document ≡ view):

ndb    keyword  descry                    gpcode
19080  Sweet    Candies, Sweet chocolate  1900
...    ...      ...                       ...

SQL query:
select descry from food where gpcode = '1900' and keyword = 'Sweet';
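A minimal sketch of how such a view might sit on top of an operation call. The function `search_food_by_description` is a hypothetical stand-in for the SOAP call and returns canned data; treating `keyword` and `gpcode` as the inputs of the operation is an assumption read off the query, not something the slide states.

```python
from typing import Iterator, NamedTuple


class FoodRow(NamedTuple):
    ndb: str
    keyword: str
    descry: str
    gpcode: str


def search_food_by_description(keyword: str, gpcode: str) -> Iterator[FoodRow]:
    """Hypothetical stand-in for the web service operation (no network call)."""
    canned = [FoodRow("19080", "Sweet", "Candies, Sweet chocolate", "1900")]
    for row in canned:
        if row.keyword == keyword and row.gpcode == gpcode:
            yield row


def query_descry(keyword: str, gpcode: str) -> list:
    # Mirrors: select descry from food where gpcode = ? and keyword = ?
    # The equality predicates become the input arguments of the operation.
    return [row.descry for row in search_food_by_description(keyword, gpcode)]


print(query_descry("Sweet", "1900"))
```

The point of the sketch is only that the selection predicates of the SQL query supply the arguments of the wrapped operation, while the projected column is picked from its result.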
Research questions
1. How can standards, such as WSDL and SOAP, be automatically utilized by a mediator?
2. How can database views of web service operations be automatically generated?
3. How can modern query optimization be used to provide efficient and scalable search from different web services?
4. How can the query optimizer speed up queries calling web service operations without any cost estimate?
5. How can data sources that are not accessible via web services be simply transformed into data providing web service operations?
6. How can the Everything as a Service paradigm be used for querying web services?
Web Service MEDiator (WSMED) system architecture
WSDL importer: extracts metadata from a WSDL document using the web service schema and stores it in the web service meta-database
Web Service Manager: invokes the web service operation to retrieve the data
WSMED enrichments: contains the semantic enrichments
[Figure: WSMED architecture. Components: WSDL Importer, Web Service Manager, Query Processor, WSMED enrichments, web service schema, web service meta-database. Inputs: SQL query and web service WSDL document. Output: results.]
Semantic enrichments
Manually define SQL views over web service operations defined by imported WSDL
Manually add semantic enrichments to help WSMED improve the query performance
Multi-level views
create view food(ndb, keyword, descry, gpcode) as <wrapper definition>;
create view foodclasses(ndb, keyword, gpcode) as select ndb, keyword, gpcode from food;
create view fooddescriptions(ndb, descry) as select ndb, descry from food;
SQL query accesses the above views:
select fd.descry from foodclasses fc, fooddescriptions fd where fc.ndb = fd.ndb and fc.gpcode = '1900';
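The multi-level views can be tried out locally with SQLite. In this sketch a plain table stands in for the wrapper view food (the real wrapper definition is elided on the slide), and the second row is hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumption: a local table replaces the wrapper view over the web service.
    create table food(ndb text, keyword text, descry text, gpcode text);
    insert into food values ('19080', 'Sweet', 'Candies, Sweet chocolate', '1900');
    insert into food values ('00001', 'Plain', 'Hypothetical other item', '0100');

    create view foodclasses(ndb, keyword, gpcode) as
        select ndb, keyword, gpcode from food;
    create view fooddescriptions(ndb, descry) as
        select ndb, descry from food;
""")

rows = conn.execute("""
    select fd.descry
    from foodclasses fc, fooddescriptions fd
    where fc.ndb = fd.ndb and fc.gpcode = '1900'
""").fetchall()
print(rows)
```

With a real wrapper, each access to food through one of the two views could trigger a web service call, which is why the join strategy matters.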
Query execution strategies
No query optimization
Heuristic cost model: very simple manual heuristic cost model of web service operation cost and naïve join strategy
Hash join strategy: heuristic cost model + hash join
Semantic enrichment: key of the view is also specified
Comparison of query execution strategies
[Figure: execution time (sec, 0 to 6000) versus number of food items (0 to 1000) for four strategies: no optimization, heuristic cost model, hash join, semantic enrichment.]
Full semantic enrichment vs. hash join
[Figure: response time (sec, 0.00 to 7.00) versus number of food items (0 to 900) for hash join and semantic enrichment.]
Hash join requires memory to materialize results of the web service calls
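The trade-off between the strategies can be illustrated by counting calls. This is a sketch under assumptions: `call_food_operation` is a hypothetical stand-in that returns all rows in one call, and the reading that a declared key lets the self-join over food collapse into a single scan is one plausible interpretation of the semantic enrichment, not a description of the WSMED optimizer.

```python
# Hypothetical rows of the food view: (ndb, keyword, descry, gpcode).
FOOD = [
    ("19080", "Sweet", "Candies, Sweet chocolate", "1900"),
    ("19081", "Sweet", "Hypothetical second item", "1900"),
]
CALLS = {"n": 0}


def call_food_operation():
    """Stand-in for one web service call; counts invocations."""
    CALLS["n"] += 1
    return list(FOOD)


def naive_join(gpcode):
    out = []
    for ndb, _, _, gp in call_food_operation():           # foodclasses
        if gp != gpcode:
            continue
        for ndb2, _, descry, _ in call_food_operation():  # re-called per outer row
            if ndb2 == ndb:
                out.append(descry)
    return out


def hash_join(gpcode):
    by_ndb = {}                                           # materialized in memory
    for ndb, _, descry, _ in call_food_operation():
        by_ndb.setdefault(ndb, []).append(descry)
    out = []
    for ndb, _, _, gp in call_food_operation():
        if gp == gpcode:
            out.extend(by_ndb.get(ndb, []))
    return out


def key_join(gpcode):
    # Assumption: ndb is a key of food, so matching rows of both views
    # come from the same food row and one scan is enough.
    return [descry for _, _, descry, gp in call_food_operation() if gp == gpcode]


def run(strategy, gpcode="1900"):
    CALLS["n"] = 0
    return strategy(gpcode), CALLS["n"]


for strategy in (naive_join, hash_join, key_join):
    print(strategy.__name__, run(strategy))
```

All three strategies return the same descriptions; they differ in the number of calls and in whether an intermediate result has to be held in memory.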
Adaptive parallelization
SQL Views are fully automatically generated
No semantic enrichments
The costs of web service operations are not known:
=> Need for adaptive query processing which changes the query plans while running the query
Parallelization of queries calling dependent web service operations
Queries calling data providing web services often have dependent calls: WS1 → WS2 → WS3 → ... → WSn
Web service calls incur high latency and high message setup cost.
A naïve implementation of an application making these calls sequentially is time consuming.
WSMED:
• automatically generates parallel plans
• has been used to experiment with three operators for adaptive parallelization
Example query
select gl.City, gl.TypeId
from GetAllStates gs, GetPlacesWithin gp, GetPlaceList gl
where gs.state = gp.state and gp.distance = 15.0 and gp.placeTypeToFind = 'City' and gp.place = 'Atlanta' and gl.placeName = gp.ToPlace + ', ' + gp.ToState and gl.MaxItems = 100 and gl.imagePresence = 'true'
Finds information about places located within 15 km of each city named 'Atlanta' in all US states
Invokes 300 web service calls and returns a stream of 360 tuples
[Figure: dependent call chain GetAllStates → <state> → GetPlacesWithin(15, 'City', 'Atlanta') → <ToPlace, ToState> → GetPlaceList(100, 'true') → <City, TypeId>]
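Parallel plans in WSMED
[Figure: generation of a parallel query plan from an SQL query in two phases (Phase 1, Phase 2). Components shown: Calculus Generator, Central plan creator, Plan splitter, Plan function generator, Parallel pipeliner. Intermediate result: central plan. Output: parallel query plan.]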
Manually parallelized execution plan (FF_APPLYP)
Parallel pipeline of calls to plan functions PF1 and PF2
Manually specified fanout:
• fixed number of children in a level (e.g. fanout of level 2 is 3)
Query processes qi: processes executing plan functions
[Figure: parallel plan and process tree. Parallel plan: γGetAllStates() → <State> → FF_APPLYP(PF1, 2, State) → <ToPlace, ToState> → FF_APPLYP(PF2, 3, ToPlace, ToState) → <City, TypeID>. Process tree: coordinator q0 (GetAllStates); level 1: q1, q2 (GetPlacesWithin); level 2: q3 to q8 (GetPlaceList).]
Define process tree by manually specifying fanouts per level:
FF_APPLYP(Function PF, Integer fo, Stream pstream) → Stream result
PF – plan function
fo – fanout , values are manually set
pstream – stream of argument tuples for PF: ai
result – stream of results ri from PF
Asynchronous operator
[Figure: FF_APPLYP distributing argument tuples p1 to p6 over child query processes (q3, q4, q5) that execute PF, and collecting the results r1, r2, r3 asynchronously.]
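The signature above can be approximated in a few lines. This is a sketch under assumptions, not the WSMED operator: threads replace the query processes, the argument stream is consumed eagerly, and the three operation stubs return hypothetical data instead of calling web services.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ff_applyp(pf, fo, pstream):
    """Apply plan function pf to each argument tuple using fo parallel workers.

    Results are yielded as calls complete, so the output order may differ
    from the argument order (asynchronous behaviour).
    """
    with ThreadPoolExecutor(max_workers=fo) as pool:
        futures = [pool.submit(pf, *args) for args in pstream]
        for future in as_completed(futures):
            yield from future.result()


# Hypothetical stubs standing in for the three web service operations.
def get_all_states():
    return [("GA",), ("TX",)]


def get_places_within(state):
    return [("Atlanta", state), ("Near Atlanta", state)]


def get_place_list(to_place, to_state):
    return [(f"{to_place}, {to_state}", 1)]


def pf2(to_place, to_state):
    return get_place_list(to_place, to_state)


def pf1(state):
    # Level 1 process: fans out to three level 2 workers running PF2.
    return list(ff_applyp(pf2, 3, get_places_within(state)))


result = sorted(ff_applyp(pf1, 2, get_all_states()))
print(result)
```

Nesting `ff_applyp` inside `pf1` mirrors the two-level tree with fanouts 2 and 3; the result is sorted only to make the asynchronous output deterministic for printing.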
Observations
• The fastest execution time (56.4 sec) outperformed the central plan (244.8 sec) with a speed-up of 4.3
• Limitation: manual specification of fanout
[Figure: execution time of the non-parallel plan compared with the best execution time.]
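The quoted speed-up follows directly from the two measured times:

```latex
\text{speed-up} = \frac{t_{\text{central plan}}}{t_{\text{best FF\_APPLYP}}}
                = \frac{244.8\ \text{sec}}{56.4\ \text{sec}} \approx 4.3
```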
AFF_APPLYP
Automatically adapts process tree at run time:
AFF_APPLYP(Function PF, Stream pstream) → Stream result
1. AFF_APPLYP initially forms a binary process tree by always setting fanout to 2 (init stage).
[Figure: initial binary process tree. Coordinator q0; level 1: q1, q2; level 2: q3, q4, q5, q6.]
AFF_APPLYP (cont.)
2. Executes a monitoring cycle for each invocation of PF for argument tuple ai in a non-leaf node
2.1 After the first monitoring cycle, AFF_APPLYP adds p new child processes (add stage) to compare the performance change
3. When an added node has several levels of children, recursive init stages of AFF_APPLYP produce a binary sub-tree
[Figure: process tree after an add stage, with coordinator q0 and processes q1 to q11 on levels 1 and 2; q7 is added at level 1.]
AFF_APPLYP (cont.)
4. AFF_APPLYP records per monitoring cycle i the average time ti to produce an incoming tuple from the children
4.1 If ti decreases by more than a threshold, the add stage is rerun
4.2 If ti increases, we either add no more children or run a drop stage that drops one child and its children
[Figure: process tree after further adaptation, with coordinator q0 and processes including q1 to q6 and q10 to q12 on levels 1 and 2.]
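The adaptation rule in steps 1 to 4 can be sketched as a small decision function for a single non-leaf node. Several details are assumptions not given on the slides: the threshold is treated as a relative improvement with a hypothetical default of 0.25, adaptation stops when the improvement is below the threshold, and the function names are illustrative.

```python
def next_stage(t_prev, t_cur, threshold=0.25, use_drop_stage=False):
    """Decide what to do after a monitoring cycle.

    t_prev, t_cur: average time per incoming tuple in the previous and
    current cycle (assumed positive). Returns "add", "drop" or "stop".
    """
    improvement = (t_prev - t_cur) / t_prev
    if improvement > threshold:
        return "add"            # step 4.1: rerun the add stage
    if t_cur > t_prev and use_drop_stage:
        return "drop"           # step 4.2: drop one child and its children
    return "stop"               # step 4.2 without drop stage, or small gain (assumption)


def adapt_fanout(avg_times, p, threshold=0.25, use_drop_stage=False):
    """Replay measured cycle times and return the resulting fanout of one node."""
    fanout = 2                  # step 1: init stage builds a binary tree
    if not avg_times:
        return fanout
    fanout += p                 # step 2.1: add stage after the first cycle
    for t_prev, t_cur in zip(avg_times, avg_times[1:]):
        stage = next_stage(t_prev, t_cur, threshold, use_drop_stage)
        if stage == "add":
            fanout += p
        elif stage == "drop":
            fanout -= 1
            break
        else:
            break
    return fanout


print(adapt_fanout([10.0, 6.0, 5.5], p=1))
print(adapt_fanout([10.0, 6.0, 7.0], p=1, use_drop_stage=True))
```

In the first call the second cycle improves by 40 percent, so one more child is added before the gain falls below the threshold; in the second call the time increases in the last cycle and the drop stage removes one child.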
Adaptive results
p: number of children added after each monitoring cycle
[Figure: execution time (sec, 0 to 300) for methods with different p values, compared with the non-parallel plan and the best FF_APPLYP. Annotations mark the non-parallel plan, the best FF_APPLYP, and the best AFF_APPLYP.]
Fanouts fo1 and fo2 of the plotted methods:

p   no drop stage       drop stage
1   fo1=3, fo2=3        fo1=2, fo2=3
2   fo1=4, fo2=5        fo1=3, fo2=3
3   fo1=5, fo2=3.4      fo1=4, fo2=3.25
4   fo1=6, fo2=8.7      fo1=5, fo2=4.2
5   fo1=7, fo2=7.5      fo1=6, fo2=7.8
Parameterized adaptive parallelization
Queries calling data providing web services often have both dependent & independent calls:
[Figure: call graph over WS1 to WS5 in which some calls are independent of each other.]
AFF_APPLYP can also handle independent calls, but will treat them as a sequence (suboptimal):
[Figure: WS1 → WS2 → WS3 → WS4 → WS5]
The PAP operator adaptively parallelizes independent and dependent calls
PAP operator (Parameterized Adaptive Parallelization)
PAP(Vector of Function VPF, Stream pstream, Vector argorder, Vector resorder) → Stream result
VPF – set of plan functions
pstream – stream of argument values pi
argorder – argument order
resorder – result order
result – stream of results rj
Different plan functions use different argument values from an argument tuple in pstream
• argorder specifies for each plan function how to form its arguments
• Similarly, resorder specifies how the result of PAP is constructed from the results of its children
Asynchronous operator
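A rough sketch of the roles of argorder and resorder. The encodings are assumptions made for illustration: argorder holds, per plan function, the positions to pick from the argument tuple, and resorder holds (function index, result position) pairs. Each plan function here returns a single tuple rather than a stream, threads replace child processes, and the adaptive part of PAP is left out. The two stubs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pap(vpf, pstream, argorder, resorder):
    """Call all plan functions in vpf in parallel for each argument tuple."""
    with ThreadPoolExecutor(max_workers=len(vpf)) as pool:
        for p in pstream:
            # argorder[i] lists which positions of p form the arguments of vpf[i].
            futures = [
                pool.submit(pf, *(p[i] for i in order))
                for pf, order in zip(vpf, argorder)
            ]
            results = [future.result() for future in futures]
            # resorder says which field of which child result goes where.
            yield tuple(results[f_idx][pos] for f_idx, pos in resorder)


# Hypothetical independent operations.
def get_weather(zipcode):
    return (zipcode, "sunny")


def get_place(city, state):
    return (f"{city}, {state}", 42)


out = list(
    pap(
        vpf=[get_weather, get_place],
        pstream=[("30301", "Atlanta", "GA")],
        argorder=[(0,), (1, 2)],
        resorder=[(1, 0), (0, 1)],
    )
)
print(out)
```

Both operations receive their arguments from the same incoming tuple and run concurrently, which is the case AFF_APPLYP alone would execute as a sequence.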
Experimental study
Dependent (D): all web service operations are called using AFF_APPLYP
[Figure: WS1 → WS2 → WS3 → WS4 → WS5]
Cached dependent (CD): modifies D by caching the results of web service operation calls using AFF_APPLYP
[Figure: WS1 → WS2 → WS3 → WS4 → WS5 with a cache]
Independent (I): parallel independent calls using PAP
[Figure: call graph over WS1 to WS5 with independent calls]
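The idea behind the cached variant can be shown with ordinary memoization. This is a sketch only: where and how WSMED caches call results is not described on the slide, and `call_ws` is a hypothetical stand-in for a web service operation.

```python
from functools import lru_cache

CALLS = {"n": 0}


def call_ws(zipcode):
    """Hypothetical web service operation; counts real invocations."""
    CALLS["n"] += 1
    return (zipcode, "hypothetical result")


@lru_cache(maxsize=None)
def cached_call_ws(zipcode):
    # Repeated argument values are answered from the cache.
    return call_ws(zipcode)


results = [cached_call_ws(z) for z in ["75001", "75001", "30301"]]
print(results, CALLS["n"])
```

Caching pays off only when the same argument values recur in the stream of calls.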
Experimental results
[Figure: experiments with adaptive strategies. Execution time (sec, 0 to 700) versus number of zip codes (0 to 2000) for D, CD, and I.]
[Figure: relative scalability. Execution time difference (sec, 0 to 160) versus number of zip codes (0 to 2000) for D-I, D-CD, and CD-I.]
Web service query service
WSMED assumes data sources are web service operations
• How to handle a data providing system that is not available as a web service?
The conventional way:
• Develop software, define WSDL, deploy the interface code
Our approach: WSMED Web Service Generator
• Once a data source is defined as an Amos II mediator system, it automatically generates web service interfaces, generates WSDL, and dynamically deploys the web service
The WSMED query service is automatically generated by the WSMED Web Service Generator
Everything as a Service paradigm (XaaS)
• URL to use WSMED web service: http://udbl2.it.uu.se/WSMED/wsmed.html
Contributions of papers
Research questions, with the marks given for Paper I, Paper II, Paper III, and Paper IV (A: answered, PA: partially answered):
1. How can web service standards be automatically utilized? A, A, A
2. How can views of web service operations be automatically generated? PA, A, A
3. How can query optimization be used to provide efficient and scalable search from web services? PA, PA, A
4. How can the query optimizer speed up queries without any cost estimate? PA, PA, A
5. How can data sources that are not accessible via web services be transformed into web services? A
6. How can the Everything as a Service paradigm be used for querying web services? A
Papers:
Paper I: Semantic enrichments
Paper II: Adaptive parallelization with dependent calls: AFF_APPLYP
Paper III: Adaptive parallelization with dependent & independent calls: PAP
Paper IV: Web service query service
Related work
WSMS (U. Srivastava, J. Widom, K. Munagala, and R. Motwani, Query Optimization over Web Services, VLDB 2006)
• WSMED also invokes parallel web service calls.
• WSMS has a static cost model.
• WSMED supports adaptive parallelization without any static cost model.
Eddies (R. Avnur, et al., Eddies: Continuously adaptive query processing, SIGMOD, 2000)
• Adaptive operator
• Eddies dynamically adapt the algebra expression.
• PAP speeds up the calls to individual plan functions for a given algebra expression.
Two-phase query optimization strategies in distributed databases (W. Hasan, Optimization of SQL queries for Parallel Machines, 1997)
• Two-phase optimization
• Two-phase query optimization used a static cost model to statically distribute execution plans.
• WSMED supports adaptive parallelization without any static cost model.
Future directions
WSMED approach relies on calling side effect free data providing web service operations
• WSDL language does not provide meta-data describing side effects
• When such a standard is available WSMED can utilize it to guarantee query correctness by managing the updatable views.
All performance measurements were made with publicly available web service operations
• Development of a benchmark to simulate the parallel web service calls for controlled experiments.
Thank you for your attention
“The un-queried life is not worth living”