1 Reading Report 2 Yin Chen 17 Feb 2004 References : The State of the Art in Distributed Query...

1

Reading Report 2

Yin Chen

17 Feb 2004

References: The State of the Art in Distributed Query Processing, Donald Kossmann, ACM Computing Surveys, Sep 2000

http://portal.acm.org/citation.cfm?id=371598&dl=ACM&coll=portal

2

Contents

Distributed Query Processing: Basic Approach and Techniques

Client-Server Database System

Heterogeneous Database System

Dynamic Data Placement

Economic Models for Distributed Query Processing

3

Distributed Query Processing

Architecture of Query Processor

Query Optimization

Query Execution Techniques

Refer to Ch.2

4

Distributed Query Processing: Architecture of a Query Processor

The classic “textbook” architecture for query processing.

The query processor receives an SQL query, then translates and optimizes it into an executable query plan.

If the query is an interactive ad hoc query (dynamic SQL), the plan is directly executed by the query execution engine.

If the query is a canned query that is part of an application program (embedded SQL), the plan is stored in the database and executed every time the application program is executed.

5

Distributed Query Processing: Architecture of a Query Processor (Contd.)

The Parser translates the query into an internal representation.

The Query Rewrite transforms the query to carry out optimizations that are good regardless of the physical state of the system (the size of tables, presence of indices, locations of copies of tables, speed of machines).

The Query Optimizer carries out optimizations that depend on the physical state of the system.

A Plan specifies how the query is to be executed, and always has a tree structure.

The Plan Refinement/Code Generation transforms the plan into an executable plan, e.g. by generating assembler-like code.

The Query Execution Engine provides generic implementations for every operator. It is based on an iterator model: operators are implemented as iterators, all iterators have the same interface, and so any two iterators can be plugged together.
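A minimal sketch of the iterator model, assuming an open/next/close interface in which next() returns None at the end of the stream. The class names Scan and Select and the helper run are hypothetical and are not taken from the paper.

```python
class Scan:
    """Leaf iterator: produces the rows of an in-memory table one by one."""
    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Select:
    """Filter iterator: same interface, so it can be plugged on top of any iterator."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self):
        self.child.close()


def run(plan):
    """Drive the root iterator of a plan and collect its output."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


emp = [{"name": "a", "salary": 50}, {"name": "b", "salary": 150}]
plan = Select(Scan(emp), lambda r: r["salary"] > 100)
result = run(plan)
```

Because Select only relies on the shared interface, it can wrap a Scan, another Select, or any other operator written in the same style.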

The Catalog stores all the information needed to process queries, including the schema of the database, the partitioning schema, physical information, information about indices, and statistics that are used to estimate the cost of a plan.

In a distributed system, WHERE should the catalog be stored?
- At one central site
- Replicated at several sites to reduce communication cost
- Cached at sites

6

Distributed Query Processing: Query Optimization

Dynamic Programming Algorithm: Works in a bottom-up way by building more complex (sub-)plans from simpler (sub-)plans:

(1) Build an access plan for every table involved in the query. If Table A is replicated at sites S1 and S2, enumerate scan(A, S1) and scan(A, S2) as alternative access plans for Table A.

(2) Enumerate all two-way join plans using the access plans as building blocks. Also enumerate alternative join plans for all relevant sites.

(3) Build three-way join plans, using access plans and two-way join plans as building blocks.

(4) Continue in this way until all n-way join plans have been enumerated; if the query involves n tables, these are the complete plans for the query.

Inferior plans can be discarded early. The earlier inferior plans are pruned, the better, because more complex plans are then not constructed from them.

In a distributed system, the optimizer can NOT immediately prune scan(A, S1) or scan(A, S2) if it is to guarantee finding a good plan. Both plans do the same work but produce their results at different sites. Even if scan(A, S1) is cheaper than scan(A, S2), scan(A, S2) must be kept because it might be a building block of the overall best plan, e.g. if the query results are to be presented at S2. scan(A, S2) is pruned only if the cost of scan(A, S1) plus the cost of shipping A from S1 to S2 is lower than the cost of scan(A, S2).
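The enumeration and the site-aware pruning rule can be sketched as follows. This is a toy illustration only: the flat costs SHIP and JOIN, the function names, and the example scan costs are assumptions, and a real optimizer would estimate costs from catalog statistics.

```python
from itertools import combinations

SHIP = 5  # assumed flat cost of shipping a (sub-)result between two sites
JOIN = 1  # assumed flat cost of one join


def prune(plans):
    """plans: {site: (cost, plan)}. A plan is dropped only if some other
    site's plan plus shipping is strictly cheaper (the rule stated above)."""
    kept = {}
    for site, (cost, plan) in plans.items():
        dominated = any(c + SHIP < cost
                        for s, (c, _) in plans.items() if s != site)
        if not dominated:
            kept[site] = (cost, plan)
    return kept


def optimize(scan_costs):
    """scan_costs: {table: {site: cost}}.
    Returns {site: (cost, plan)} for the join of all tables."""
    tables = sorted(scan_costs)
    best = {}
    # Step 1: access plans, one per replica of every table.
    for t in tables:
        plans = {s: (c, f"scan({t}, {s})") for s, c in scan_costs[t].items()}
        best[frozenset([t])] = prune(plans)
    # Steps 2..n: build larger join plans from smaller ones.
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            plans = {}
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    for ls, (lc, lp) in best[left].items():
                        for rs, (rc, rp) in best[right].items():
                            # Execute the join at either input's site.
                            for site in sorted({ls, rs}):
                                moves = (ls != site) + (rs != site)
                                cost = lc + rc + JOIN + SHIP * moves
                                if site not in plans or cost < plans[site][0]:
                                    plans[site] = (
                                        cost, f"join({lp}, {rp}) @ {site}")
            best[subset] = prune(plans)
    return best[frozenset(tables)]


# A is replicated at S1 (cheap scan) and S2 (dearer scan); B lives only at S2.
result = optimize({"A": {"S1": 10, "S2": 12}, "B": {"S2": 8}})
```

In this example the cheaper scan(A, S1) is not part of the best plan: joining at S2 with scan(A, S2) costs 21, whereas any plan built on scan(A, S1) has to pay for shipping and costs 24.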

7

Distributed Query Processing: Query Optimization (Contd.)

Cost Model
- The classic way to estimate the cost of a plan is to estimate the cost of every individual operator and then sum up these costs. The cost of a plan is defined as the total resource consumption of the plan.

- In a centralized system, operator cost = CPU costs + disk I/O costs. (disk I/O costs = seek cost + latency cost + transfer costs).

- In a distributed system, operator cost = CPU costs + disk I/O costs + communication costs (communication costs = fixed costs per message + per-byte costs to transfer data + CPU costs to pack and unpack messages at the sending and receiving sites.)

- As a result, the optimizer may favor plans that carry out operators at fast and unloaded machines and avoid expensive communication links.
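The cost formulas above can be written down directly. The parameter names and the numbers below are made up for illustration; only the structure of the formula comes from the text.

```python
def communication_cost(messages, payload_bytes, per_message, per_byte, pack_cpu):
    """Fixed cost per message + per-byte transfer cost
    + CPU cost to pack and unpack every message."""
    return messages * per_message + payload_bytes * per_byte + messages * pack_cpu


def operator_cost(cpu, disk_io, communication=0):
    """Centralized system: cpu + disk_io. Distributed: add communication."""
    return cpu + disk_io + communication


# Hypothetical cost units for one operator, run locally and remotely.
comm = communication_cost(messages=4, payload_bytes=1000,
                          per_message=10, per_byte=1, pack_cpu=2)
local_cost = operator_cost(cpu=50, disk_io=200)
remote_cost = operator_cost(cpu=50, disk_io=200, communication=comm)
```

With these numbers the communication term dominates, which is why an optimizer using such a model would try to avoid the expensive link.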

Response Time Models

(1) Compute the total resource consumption of each individual operator, X.

(2) Compute the total usage of every shared resource used by a group of operators that run in parallel, Y.

(3) The response time of an entire group of operators that run in parallel is then computed as the maximum of X and Y.
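A small sketch of the three steps, assuming that each operator's resource usage is given as a dictionary of times per resource. The resource names and the numbers are hypothetical.

```python
def response_time(operator_usage):
    """operator_usage: {operator: {resource: time}} for operators that
    run in parallel. Returns the estimated response time of the group."""
    # (1) X: total resource consumption of every individual operator.
    per_operator = [sum(usage.values()) for usage in operator_usage.values()]
    # (2) Y: total usage of every resource across the whole group.
    #     A resource used by a single operator never exceeds that
    #     operator's own total, so including it is harmless.
    per_resource = {}
    for usage in operator_usage.values():
        for resource, t in usage.items():
            per_resource[resource] = per_resource.get(resource, 0) + t
    # (3) Response time: the maximum over all X and all Y.
    return max(max(per_operator), max(per_resource.values()))


# Two scans on separate disks that share one network link.
group = {
    "scan_A": {"disk1": 3, "net": 4},
    "scan_B": {"disk2": 3, "net": 4},
}
rt = response_time(group)
total = sum(sum(u.values()) for u in group.values())
```

Here the shared network link is the bottleneck: the response time estimate is 8 although the total resource consumption is 14.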

8

Distributed Query Processing: Query Execution Techniques

Row Blocking: Ship tuples in blocks rather than one tuple at a time. The block size should be larger than the message size used by the network, so that the first block of tuples can be processed even if the next block is delayed.
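A minimal sketch of row blocking, assuming that one block corresponds to one network message. The generator name blocks is hypothetical.

```python
def blocks(tuples, block_size):
    """Group a stream of tuples into blocks of at most block_size tuples."""
    block = []
    for t in tuples:
        block.append(t)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # last, possibly partial block
        yield block


shipped = list(blocks(range(7), 3))
messages_with_blocking = len(shipped)
messages_without_blocking = 7  # one message per tuple
```

Shipping seven tuples takes three messages with a block size of 3 instead of seven messages, so the fixed per-message cost is paid less often.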

Optimization of Multicasts: In a hierarchical network environment, choose a cheaper route to ship data.

Multithreaded Query Execution: Establish a separate thread for every query operator in order to parallelize all operations. However, this may raise synchronization problems that result in additional cost.

Joins with Horizontally Partitioned Data: If table A is horizontally partitioned as A = A1 ∪ A2, then A join B can be computed as (A1 ∪ A2) join B or as (A1 join B) ∪ (A2 join B). Furthermore, if A = A1 ∪ A2 ∪ A3, then ((A1 ∪ A2) join B) ∪ (A3 join B) might be a good plan if B is replicated, with one copy of B located near A1 and A2 and another copy located near A3.
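The rewrites above rely on the join distributing over union. A quick check on toy data, assuming an equi-join on the first field of each tuple (the helper join and the sample tuples are made up):

```python
def join(a, b):
    """Equi-join of two sets of tuples on their first field."""
    return {(x, y) for x in a for y in b if x[0] == y[0]}


A1 = {(1, "a1"), (2, "a2")}
A2 = {(3, "a3")}
A3 = {(4, "a4")}
B = {(1, "b1"), (3, "b3"), (4, "b4")}

whole = join(A1 | A2 | A3, B)                       # reassemble A, then join
per_partition = join(A1, B) | join(A2, B) | join(A3, B)
mixed = join(A1 | A2, B) | join(A3, B)              # the plan from the text
```

All three plans return the same tuples, so the optimizer is free to pick whichever one places the joins closest to the data.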

Semijoins: Assuming A is stored at Site 1 and B at Site 2, send only the column(s) of A that are needed to evaluate the join predicates from Site 1 to Site 2, find the tuples of B that qualify for the join at Site 2, send those tuples back to Site 1, and then match the A tuples with the B tuples at Site 1.
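A sketch of the semijoin steps with both sites simulated in one process. The function names and the sample tuples are assumptions; the first field of each tuple serves as the join column.

```python
A = [(1, "a1"), (2, "a2"), (3, "a3")]              # stored at Site 1
B = [(2, "b2"), (3, "b3"), (4, "b4"), (5, "b5")]   # stored at Site 2


def site2_reduce(a_keys, b_rows):
    """Runs at Site 2: keep only B tuples whose join key occurs in A."""
    return [b for b in b_rows if b[0] in a_keys]


def semijoin(a_rows, b_rows):
    a_keys = {a[0] for a in a_rows}              # shipped Site 1 -> Site 2
    b_matching = site2_reduce(a_keys, b_rows)    # shipped Site 2 -> Site 1
    joined = [(a, b) for a in a_rows for b in b_matching if a[0] == b[0]]
    return joined, len(b_matching)


joined, shipped_b = semijoin(A, B)
```

Only two of the four B tuples cross the network, at the price of first shipping the join column of A in the other direction.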

Double-Pipelined Hash Joins: Construct two main-memory hash tables, one for A and one for B. To process a tuple of A, the B hash table is probed in order to find B tuples that match this A tuple; the A tuple and the matching B tuples are output immediately. After that, the A tuple is inserted into the A hash table so that it can be matched with B tuples that have not yet been processed. B tuples are processed analogously. The algorithm terminates when all A and B tuples have been processed and is guaranteed to find all the results of the join. This makes it possible to deliver the first results of a query as early as possible, but main memory may be exhausted quickly as the hash tables grow.
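A compact sketch of the algorithm, assuming the two inputs arrive as one interleaved stream of ("A", tuple) and ("B", tuple) pairs and that the join key is the first field. The function name and the input encoding are illustrative choices.

```python
from collections import defaultdict


def double_pipelined_hash_join(stream, key=lambda t: t[0]):
    """Yield (a_tuple, b_tuple) pairs as soon as both sides have arrived."""
    tables = {"A": defaultdict(list), "B": defaultdict(list)}
    for side, tup in stream:
        other = "B" if side == "A" else "A"
        # Probe the other side's hash table and output matches immediately.
        for match in tables[other].get(key(tup), []):
            yield (tup, match) if side == "A" else (match, tup)
        # Insert into the own table for tuples that have not arrived yet.
        tables[side][key(tup)].append(tup)


arrivals = [
    ("A", (1, "a1")),
    ("B", (1, "b1")),
    ("B", (2, "b2")),
    ("A", (2, "a2")),
    ("A", (1, "a3")),
]
results = list(double_pipelined_hash_join(arrivals))
```

The first result is produced after only two tuples have arrived. Both hash tables keep growing for the whole run, which is the memory problem noted above.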

9

Distributed Query Processing: Query Execution Techniques (Contd.)

Pointer-Based Joins: Foreign keys are implemented by explicit references that contain the address of an object. Some ways to execute A join B:

Naïve approach: scan A, following the references to find the tuples of B.

Value-based join: join A and B based on values.

Object assembly: group the tuples of A so that the objects referenced by each group are stored at the same site; for each group, visit that site, fetch all the referenced objects, and return them. If the referenced objects contain references themselves, group those references by site in the same way and repeat.

Top N and Bottom N Queries: To avoid wasted work, isolate the top N (or bottom N) tuples as quickly as possible and then perform other operations (sorts, joins, etc.) only on those tuples.

- In standard relational databases, stop operators can be used to isolate the top N and bottom N tuples.

- In multimedia database systems, a score is assigned to every record, and the top N tuples are determined by an overall score function.

- The method can be extended, e.g. to meta-searching in the WWW: combine the scores for Web pages returned by search engines such as AltaVista, Infoseek, or Lycos in order to find Web pages with a high total score according to all search engines.
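A naive sketch of the meta-search case, assuming the overall score is the sum of the per-engine scores and that a missing score counts as 0. The engine names, pages, and scores are made up. This version reads every score; the algorithms surveyed in the paper aim to stop earlier.

```python
import heapq


def top_n(scores_by_engine, n, combine=sum):
    """scores_by_engine: {engine: {page: score}}.
    Returns the n pages with the highest combined score."""
    pages = set().union(*scores_by_engine.values())
    total = {
        page: combine(scores.get(page, 0)
                      for scores in scores_by_engine.values())
        for page in pages
    }
    return heapq.nlargest(n, total.items(), key=lambda kv: kv[1])


engines = {
    "engine1": {"p1": 9, "p2": 5},
    "engine2": {"p2": 6, "p3": 4},
}
best_two = top_n(engines, 2)
```

Page p2 wins although neither engine ranks it first, because its combined score of 11 beats the 9 of p1.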

10

Heterogeneous Database System

Wrapper Architecture

Query Optimization

Query Execution Techniques

Refer to Ch.4

11

Heterogeneous Database System: Wrapper Architecture

Characteristic: the individual component databases can have different capabilities to store data, carry out database operations (e.g., joins and group-bys), and/or communicate with other component databases.

Challenges
- To find the best possible query plans and to avoid carrying out invalid operations
- To deal with semantic heterogeneity

Wrapper Architecture
- The mediator parses a query, carries out query rewrite and query optimization, executes some of the operations, and maintains a catalog. It is designed to integrate any kind of component database, so it cannot interact directly with a component database.
- A wrapper is associated with every component database. It translates every request of the mediator into the API of the component database, and translates the results returned by the database for the mediator.

12

Heterogeneous Database System: Query Optimization

How to estimate the cost or response time of the plans?

Calibration Approach: Define a generic cost model for all wrappers and adjust certain parameters for every individual wrapper, e.g. cost = c * n, where
- n is the estimated number of tuples returned by the wrapper plan;
- c is a wrapper/component database specific parameter, which would be small for very fast component databases and large for slow component databases or component databases that are only reachable by a slow communication link.
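A sketch of how the parameter c of the c * n model could be calibrated from a few probe queries. Fitting c by least squares through the origin is an assumption made for this illustration; the text does not say how the parameters are adjusted. The sample numbers are hypothetical.

```python
def calibrate(samples):
    """samples: [(tuples_returned, observed_cost)] for one wrapper.
    Returns c for the model cost = c * n (least squares through the origin)."""
    return sum(n * t for n, t in samples) / sum(n * n for n, _ in samples)


def estimate(c, n):
    """Estimated cost of a wrapper plan that returns n tuples."""
    return c * n


fast_c = calibrate([(100, 200), (50, 100)])     # a fast component database
slow_c = calibrate([(100, 5000), (50, 2500)])   # one behind a slow link
cost_fast = estimate(fast_c, 10)
cost_slow = estimate(slow_c, 10)
```

The same generic formula yields very different estimates for the two wrappers once c has been adjusted for each of them.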

Individual Wrapper Cost Models
- Every wrapper provides its own cost formulas.
- More accurate
- But difficult to develop
- Improvement: use the calibration approach by default, and leave wrapper developers free to override it and define their own cost functions.

Learning Curve Approach
- Monitors the system and keeps statistics about the cost, using query feedback
- Can be inaccurate
- But automatically and dynamically adapts to changes in the system

13

Heterogeneous Database System: Query Execution Techniques

Bindings: Consider a heterogeneous system with two relational component databases, D1 and D2, that store Tables A and B, respectively. To execute A join B, first scan A in D1, returning its tuples one by one. Each tuple of A is then bound as a parameter in a query that selects the matching tuples from B in D2.
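A sketch of a join with bindings, using two in-memory SQLite databases from the Python standard library as stand-ins for D1 and D2. The table layout and the sample rows are assumptions.

```python
import sqlite3

# Two independent component databases (stand-ins for D1 and D2).
d1 = sqlite3.connect(":memory:")
d2 = sqlite3.connect(":memory:")
d1.execute("CREATE TABLE A (id INTEGER, x TEXT)")
d1.executemany("INSERT INTO A VALUES (?, ?)",
               [(1, "a1"), (2, "a2"), (3, "a3")])
d2.execute("CREATE TABLE B (id INTEGER, y TEXT)")
d2.executemany("INSERT INTO B VALUES (?, ?)",
               [(1, "b1"), (3, "b3"), (3, "b3x")])


def bind_join():
    """Scan A in D1; for every A tuple, bind its key into a query on D2."""
    result = []
    for a_id, x in d1.execute("SELECT id, x FROM A ORDER BY id"):
        rows = d2.execute("SELECT y FROM B WHERE id = ? ORDER BY y", (a_id,))
        for (y,) in rows:
            result.append((a_id, x, y))
    return result


joined = bind_join()
```

One query is sent to D2 per tuple of A, so this technique pays off mainly when A is small or when D2 cannot export B as a whole.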

Cursor Caching: Optimize a query only once, in order to reduce the overhead of submitting the same query to the same component database repeatedly.

14

Dynamic Data Placement

Dynamic Replication Algorithms

Cache Investment

Refer to Ch. 5

15

Dynamic Data Placement

Dynamic data placement approaches keep statistics about the query workload and automatically move data and establish copies of data at different sites to balance the current workload.

They do NOT aim to be perfect, BUT try to improve the data placement with every move.

Two mechanisms: replication and caching, both establish copies of data at different sites in order to reduce communication costs and balance the load of a system.

                    Replication      Caching
Target              Server           Client or middle tier
Granularity         Coarse           Fine
Storage device      Disk             Typically main memory
Impact on catalog   Yes              No
Update protocol     Propagation      Invalidation
Remove copy         Explicit         Implicit
Mechanism           Separate fetch   Fault in and keep copy after use

16

Dynamic Data Placement: Dynamic Replication Algorithms

Traditional algorithms can be classified into two groups:
- Algorithms that try to reduce communication costs in a WAN by moving copies of data to servers that are near the clients
- Algorithms that try to replicate hot data in order to balance the load on servers in a LAN or an environment in which communication is cheap.

ADR algorithm: Servers 5 and 7, which hold the replicas of an object, would keep read and write statistics for the object and periodically decide whether the replication scheme should be expanded to Servers 2, 3, 4, 8, 9, or 10, be contracted by removing the replica at Server 5 or 7, or remain unchanged. They perform the following tests:
- Expansion Test. Add a neighbor to the replication scheme if more read requests originate from clients of that neighbor, or from clients connected to servers of the subtree rooted in that neighbor, than updates originate at other clients.
- Contraction Test. Drop the copy if more updates are propagated to that copy than the copy is read.
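The two ADR tests reduce to simple comparisons of counters. A sketch with hypothetical function names and made-up statistics for one decision period:

```python
def expansion_test(reads_via_neighbor, updates_from_elsewhere):
    """Add the neighbor to the replication scheme if more reads arrive
    through it (its clients and its subtree) than updates originate
    at other clients."""
    return reads_via_neighbor > updates_from_elsewhere


def contraction_test(updates_propagated_to_copy, reads_of_copy):
    """Drop the copy if it receives more propagated updates than reads."""
    return updates_propagated_to_copy > reads_of_copy


# Hypothetical counters collected during one period.
expand_to_neighbor = expansion_test(reads_via_neighbor=120,
                                    updates_from_elsewhere=30)
drop_own_copy = contraction_test(updates_propagated_to_copy=30,
                                 reads_of_copy=200)
```

With these counters the replication scheme would be expanded to the read-heavy neighbor, while the existing copy is kept because it is read far more often than it is updated.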

17

Dynamic Data Placement: Cache Investment

Cache investment applies what-if analysis to decide whether it is worth caching.

The investment is the difference in cost between the suboptimal, client-side plan that brings the pages to the client’s cache and the optimal, server-side plan. It depends on the selectivity of the predicates of the WHERE clause and the number of columns of the query result.

The benefit is the difference in cost between the best plan for the query given that NONE of the relevant pages are cached, and the cost of the best plan assuming that ALL relevant pages are cached. It depends on the selectivity of the predicate and the target columns of the query.

When the benefits of caching outweigh the investment, cache investment advises the optimizer to load the cache. For example:

SELECT e.name, e.manager FROM Emp e WHERE e.salary > 100000;

Assume that evaluating this query at the client involves shipping 10 pages of the Emp.salary index and another 20 pages to retrieve the name and manager fields. The overall communication cost is thus 30 pages if the index scan is carried out at the client. If the index scan is carried out at the server, the communication cost of shipping the name and manager fields is 10 pages. Here, the investment is 20 pages (30 - 10) and the benefit is 10 pages per query. If the client repeatedly asks such queries, the accumulated benefit (30 pages) outweighs the investment after three queries, and cache investment will advise the optimizer to load the cache with the relevant Emp data.
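The arithmetic of this example can be checked with a few lines. The variable and function names are illustrative, and the cost of the plan with all pages cached is assumed to be 0 pages of communication.

```python
def queries_until_caching_pays_off(investment, benefit):
    """Smallest number of queries after which the accumulated benefit
    strictly exceeds the one-time investment."""
    if benefit <= 0:
        raise ValueError("caching never pays off without a benefit")
    queries = 1
    while queries * benefit <= investment:
        queries += 1
    return queries


client_plan = 10 + 20   # pages shipped: index pages + data pages
server_plan = 10        # pages shipped: query result only
all_cached = 0          # assumed: no communication once pages are cached

investment = client_plan - server_plan   # extra cost of loading the cache
benefit = server_plan - all_cached       # saving per later query
pays_off_after = queries_until_caching_pays_off(investment, benefit)
```

After two queries the accumulated benefit only equals the investment (20 pages), so the third query is the first one for which caching is a net win.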

18

Economic Models for Distributed Query Processing

Refer to Ch. 6

19

Economic Models for Distributed Query Processing

The motivation to use an economic model is that distributed systems are too complex to be controlled by a single centralized component with a universal cost model.

“Magic of capitalism”: every server that offers a service (data, CPU cycles, etc.) tries to maximize its own profit by selling its services to clients.

Mariposa is the first distributed database system based on an economic paradigm. It processes queries by carrying out auctions:
- Clients send out queries, attaching a budget to every query. The budget depends on the importance of the query and the required response time. E.g., a client in Las Vegas could pay $5.00 if it gets the latest World Cup football results in 1 second, but only 10 cents if the delivery of the results takes 1 minute.
- The broker starts an auction; every server that stores copies of the data or is willing to execute one or several of the operators is asked to give bids of the form (Operator o, Price p, Running Time r, Expiration Date x), which indicates that the server can execute Operator o for p dollars in r seconds, and that this offer is valid until the expiration date x.
- The broker collects all bids and makes contracts with servers to execute the queries. It tries to maximize its own profit by finding a cheap way to execute the query. If it finds no way to execute the query within the budget, it rejects the query; the client must then raise the budget, revise the response time goals, or accept that the query is not answered.

Advantages of Mariposa:
- Different servers can flexibly establish different bidding strategies to achieve high revenue.
- Can use dynamic data placement: servers can make a profit by buying and selling copies of data, allowing other servers to replicate the data.
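A toy broker for the auction described above. This is not Mariposa's actual algorithm: picking the cheapest valid bid per operator, executing the operators one after another, and a fixed budget are all simplifying assumptions, and every name and number is hypothetical. Prices are in cents.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    server: str
    operator: str       # Operator o
    price: int          # Price p, in cents
    running_time: int   # Running Time r, in seconds
    expires: int        # Expiration Date x, as a timestamp


def broker(operators, bids, budget, deadline, now):
    """Return one contracted bid per operator, or None to reject the query."""
    chosen = []
    for op in operators:
        valid = [b for b in bids if b.operator == op and b.expires >= now]
        if not valid:
            return None
        chosen.append(min(valid, key=lambda b: b.price))
    price = sum(b.price for b in chosen)
    time = sum(b.running_time for b in chosen)  # assumed sequential execution
    if price > budget or time > deadline:
        return None  # client must raise the budget or relax the deadline
    return chosen


bids = [
    Bid("s1", "scan", 30, 2, 100),
    Bid("s2", "scan", 20, 5, 100),
    Bid("s2", "join", 50, 3, 100),
    Bid("s3", "join", 40, 3, 5),   # offer already expired at now=10
]
contract = broker(["scan", "join"], bids, budget=100, deadline=10, now=10)
rejected = broker(["scan", "join"], bids, budget=60, deadline=10, now=10)
```

With a budget of 100 cents the broker contracts server s2 for both operators at a total price of 70 cents; with a budget of 60 cents the query is rejected.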

20

End …
