14-18 March 2004 EDBT'04 : Service-Based Distributed Query Processing for the Grid (M N Alpdemir)
context
1. High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid.
2. Standards for data access are emerging, and middleware products that are reference implementations of such standards are already available.
3. Distributed query processing technology is one approach to delivering (1.) given the availability of (2.).
OGSA-DQP goals
1. To benefit from homogeneous access to heterogeneous data sources [OGSA-DAI].
2. To benefit from Grid abstractions for on-demand, transparent allocation of resources required for a task [OGSA/OGSI/GT3].
3. To provide transparent, implicit parallelism and distribution [Polar*].
4. To orchestrate the composition of data retrieval and analysis services using query mechanisms.
5. To expose this orchestration capability as a Grid data service.
OGSA-DQP innovations
• OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator.
– All available nodes can be allocated for query evaluation (not just the nodes with data sources).
• A distributed query execution plan is resourced on the fly.
– This allows for runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.
• The query plan is the outcome of optimising a declarative service orchestration expressed as a query.
• OGSA-DQP uses a parallel physical algebra: most mediator-based query processors do not.
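The idea of a parallel physical algebra can be illustrated with a small sketch. This is not OGSA-DQP code: the function names (`exchange`, `hash_join`, `parallel_join`), the tuple layout and the evaluator count are hypothetical. It assumes that an exchange step redistributes tuples by hashing the join key, so that each evaluator can run an independent hash join on its own share of the data.

```python
from collections import defaultdict
from zlib import crc32


def exchange(tuples, key, n_evaluators):
    """Route each tuple to an evaluator chosen by hashing its join key."""
    buckets = [[] for _ in range(n_evaluators)]
    for t in tuples:
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        buckets[crc32(str(t[key]).encode()) % n_evaluators].append(t)
    return buckets


def hash_join(left, right, left_key, right_key):
    """Build/probe hash join as it might run on a single evaluator."""
    table = defaultdict(list)
    for l in left:  # build phase: index the left input by its join key
        table[l[left_key]].append(l)
    # probe phase: look up each right tuple and merge matching pairs
    return [{**l, **r} for r in right for l in table.get(r[right_key], [])]


def parallel_join(left, right, left_key, right_key, n_evaluators=2):
    """Partition both inputs with exchange, then join each partition on its own."""
    left_parts = exchange(left, left_key, n_evaluators)
    right_parts = exchange(right, right_key, n_evaluators)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        # In a distributed engine each iteration would run on a different node.
        result.extend(hash_join(lp, rp, left_key, right_key))
    return result
```

Because matching keys always hash to the same bucket, the partitioned join returns the same tuples as a single-node join, only in a different order.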
OGSA-DQP provides two grid services
Exposes to clients:
• Grid Distributed Query Services (GDQSs) that:
– interact with clients;
– find and retrieve service descriptions;
– parse, compile, partition and schedule the query execution over a union of distributed data sources;
– coordinate the GQESs into executing the plan.
• The query plan is an orchestration of GQESs.
Coordinates transparently:
• Grid Query Evaluation Services (GQESs) that:
– implement the physical query algebra;
– implement the query execution model and semantics;
– run a partition of a query execution plan generated by a GDQS;
– interact with other GQESs/GDSs/WSs but not with clients.
Brief tour: an illustration
[Figure: OGSA-DQP architecture. Labels recoverable from the diagram: Client; resource list; Web Services (WSDL); GDS 1 and GDS 2 (DB Schema); GDQS containing the Polar* Query Optimiser Engine (OQL Parser, Logical Optimiser, Physical Optimiser, Partitioner, Scheduler); OQL Query; GDS Query Request Doc.; query plan operators print, exchange, hash join and scan in partitions P1, P2 and P3; Distributed Query Execution Engine with GQES 1, GQES 2 and GQES 3; flows labelled sub-plan, sub-query, data blocks and operation call. The three XML documents that follow appear to be sample artefacts from the figure: a resource list, an imported database schema, and a partitioned query plan.]
<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs" >
<importedDataSource>
<GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
<GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
<GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
</importedDataSource>
<importedService>
<wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
</importedService>
</GDQDataSourceList>
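As a rough illustration of how a client-side tool might read such a resource list, the following sketch uses Python's standard `xml.etree.ElementTree`. The function name `read_resource_list` is hypothetical and is not part of OGSA-DQP; the namespace URI and element names are taken from the document above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the GDQDataSourceList root element above.
NS = {"gdqs": "http://dqp.ogsadai.org.uk/schema/gdqs"}

RESOURCE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs" >
<importedDataSource>
<GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
<GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
<GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>
</importedDataSource>
<importedService>
<wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>
</importedService>
</GDQDataSourceList>"""


def read_resource_list(xml_text):
    """Return (data source factory handles, web service WSDL URLs)."""
    # Parse from bytes so the encoding declaration is honoured by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    factories = [
        e.text.strip()
        for e in root.findall("gdqs:importedDataSource/gdqs:GDSFactoryHandle", NS)
    ]
    wsdls = [
        e.text.strip()
        for e in root.findall("gdqs:importedService/gdqs:wsdlURL", NS)
    ]
    return factories, wsdls


if __name__ == "__main__":
    factories, wsdls = read_resource_list(RESOURCE_LIST)
    print(len(factories), "data source factories;", len(wsdls), "web service")
```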
<?xml version="1.0" encoding="UTF-8"?>
<databaseSchema xmlns="">
<logicalSchema>
<table name="goterm">
<column fullName="goterm_id" length="32" name="id">
<sqlTypeName>varchar</sqlTypeName>
<sqlJavaTypeID>12</sqlJavaTypeID>
</column>
<column fullName="goterm_type" length="55" name="type">
<sqlTypeName>varchar</sqlTypeName>
<sqlJavaTypeID>12</sqlJavaTypeID>
</column>
<column fullName="goterm_name" length="255" name="name">
<sqlTypeName>varchar</sqlTypeName>
<sqlJavaTypeID>12</sqlJavaTypeID>
</column>
<primaryKey>
<columnFullName>id</columnFullName>
</primaryKey>
</table>
</logicalSchema>
<physicalSchema>
<hostMachine>130.88.192.230</hostMachine>
<database join_buffer_size="131072" max_join_size="4294967295">
<physTable avgRowLength="67" dataLength="766784" indexLength="126976" name="goterm" rowFormat="Dynamic" rows="11369"/>
</database>
</physicalSchema>
<GDSFHandle>http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory</GDSFHandle>
</databaseSchema>
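The imported schema carries both logical metadata (columns, types) and physical statistics (row counts, average row length). The sketch below shows one way such a document could be summarised. The function name `table_summary` and the size estimate are illustrative assumptions, and `SCHEMA` is a shortened copy of the document above; the slides do not say how the Polar* optimiser actually consumes these figures.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the schema document above (one column only).
SCHEMA = """<databaseSchema xmlns="">
<logicalSchema>
<table name="goterm">
<column fullName="goterm_id" length="32" name="id">
<sqlTypeName>varchar</sqlTypeName>
<sqlJavaTypeID>12</sqlJavaTypeID>
</column>
</table>
</logicalSchema>
<physicalSchema>
<database join_buffer_size="131072" max_join_size="4294967295">
<physTable avgRowLength="67" dataLength="766784" name="goterm" rows="11369"/>
</database>
</physicalSchema>
</databaseSchema>"""


def table_summary(schema_xml):
    """Collect column definitions and physical statistics per table name."""
    root = ET.fromstring(schema_xml)
    summary = {}
    for table in root.findall("logicalSchema/table"):
        columns = [
            (c.get("name"), c.findtext("sqlTypeName"), int(c.get("length")))
            for c in table.findall("column")
        ]
        summary[table.get("name")] = {"columns": columns}
    for phys in root.findall("physicalSchema/database/physTable"):
        entry = summary.setdefault(phys.get("name"), {})
        entry["rows"] = int(phys.get("rows"))
        entry["avg_row_length"] = int(phys.get("avgRowLength"))
    return summary


if __name__ == "__main__":
    goterm = table_summary(SCHEMA)["goterm"]
    # Rough size of a full scan in bytes: rows times average row length.
    print(goterm["rows"] * goterm["avg_row_length"])
```

For the statistics shown, the estimate is 11369 × 67 = 761723 bytes, close to the reported `dataLength` of 766784.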
<?xml version="1.0" encoding="UTF-8"?>
<Partitions>
<Partition>
<evaluatorURI>http://130.88.198.195:9090/ogsa/services/ogsadai/dqp/GridQueryEvaluationFactory/hash-11025450-1076603541049</evaluatorURI>
<Operator operatorID="0" operatorType="TABLE_SCAN">
<tupleType>
<type>goterm</type>
<name>goterm.OID</name>
<type>string</type>
<name>goterm.id</name>
<type>string</type>
<name>goterm.type</name>
<type>string</type>
<name>goterm.name</name>
</tupleType>
<TABLE_SCAN>
<dataResourceName> goterms </dataResourceName>
<GDSHandle> http://130.88.192.230:9090/ogsa/services/ogsadai/GridDataServiceFactory/hash-31056514-1076603576481</GDSHandle>
<tableName> goterms </tableName>
<predicateExpr>
<predicate>
<comparativeOperator>LIKE</comparativeOperator>
<leftOperand name=" goterm.id" type="13"/>
<rightOperand name=" GO:0000%" type="16"/>
</predicate>
</predicateExpr>
</TABLE_SCAN>
</Operator> . . .
</Partition> . . .
</Partitions>
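The TABLE_SCAN operator above carries a LIKE predicate on `goterm.id`. A minimal sketch of how an evaluator could apply such a predicate to in-memory tuples is given below. The names `like_to_regex` and `table_scan` and the sample rows are hypothetical; in practice a scan of this kind could equally be pushed down to the underlying database.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _) into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def table_scan(rows, column, pattern):
    """Return the rows whose column value matches the LIKE pattern in full."""
    regex = like_to_regex(pattern)
    return [row for row in rows if regex.fullmatch(row[column])]


if __name__ == "__main__":
    # Made-up sample tuples, shaped like the tupleType in the plan above.
    rows = [
        {"goterm.id": "GO:0000001", "goterm.name": "example term A"},
        {"goterm.id": "GO:0001234", "goterm.name": "example term B"},
    ]
    print(table_scan(rows, "goterm.id", "GO:0000%"))
```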
The Demonstration: Configuring the DQP
Select DQP Factory
Select Data Sources
Select Web Services
Import Metadata
The Demonstration: Example Query
• Given two DBMSs and one analysis tool (e.g., a WS):
– Goterm: the GO Gene Ontology, running as a remote MySQL DB,
– proteinSequence: yeast protein sequences,
– EntropyAnalyser (information content analyser);
• We can obtain the information content of protein sequences of a certain kind specified by certain gene ontology terms:
select p.ORF, go.id, calculateEntropy(p.sequence)
from p in protein_sequences, go in goterms, pg in protein_goterms
where go.id=pg.GOTermIdentifier and p.ORF=pg.ORF and p.ORF like "YBL06%" and go.id like "GO:0000%";
• Then, OGSA-DQP acts as an enactor of a declarative orchestration of services on the Grid:
[Figure: query plan for the example query, annotated with partition boundaries and with the part of the plan that is parallelized on nodes 1 & 2.]
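To make the meaning of the example query concrete, here is a single-process sketch of what it computes. It deliberately ignores distribution and optimisation. The function `calculate_entropy` is a stand-in that assumes Shannon entropy per symbol; the slides do not specify what the EntropyAnalyser service computes. The function `run_query` and the record layout are likewise hypothetical.

```python
import math
from collections import Counter


def calculate_entropy(sequence):
    """Shannon entropy in bits per symbol (assumed stand-in for the remote service)."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def run_query(protein_sequences, goterms, protein_goterms):
    """Naive evaluation of the select/from/where query over in-memory collections."""
    return [
        (p["ORF"], go["id"], calculate_entropy(p["sequence"]))
        for p in protein_sequences
        for go in goterms
        for pg in protein_goterms
        if go["id"] == pg["GOTermIdentifier"]
        and p["ORF"] == pg["ORF"]
        and p["ORF"].startswith("YBL06")      # p.ORF like "YBL06%"
        and go["id"].startswith("GO:0000")    # go.id like "GO:0000%"
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    proteins = [{"ORF": "YBL061C", "sequence": "AB"}]
    terms = [{"id": "GO:0000001"}]
    links = [{"ORF": "YBL061C", "GOTermIdentifier": "GO:0000001"}]
    print(run_query(proteins, terms, links))
```

A distributed processor would avoid this triple nested loop by pushing the two `like` filters into the scans and joining on the key equalities, which is where the partitioned plan comes in.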
Where to find out more: software
• OGSA-DQP
– Grid middleware to query distributed data sources
– www.ogsadai.org.uk/dqp
• OGSA-DAI
– Grid middleware to interface with data(bases)
– www.ogsadai.org.uk/
• Globus Toolkit
– Open-source implementation of OGSA/OGSI
– www.globustoolkit.org/