Stage2Raj.ppt

1

An open source DBMS for handheld devices

Stage 2

by Rajkumar Sen

IIT Bombay

Under the guidance of

Prof. Krithi Ramamritham

04/13/23 An open source DBMS for handheld devices

2

Outline

• Introduction

• Storage Management

• Query Processing

• Future Work


3

Introduction

Stage 1: Survey of
– Storage models: Flat Storage, Domain Storage, and Ring Storage
– Query processing issues
– Data synchronization
– Concurrency control and recovery

Goals for Stage 2
– New storage models to further reduce storage cost
– Memory-cognizant query processing
– Data synchronization issues
– System implementation issues


4

Storage Management

Aim at compactness in the representation of data

Existing storage models
– Flat Storage
– Pointer-based Domain Storage

In Domain Storage, a pointer of size p (typically 4 bytes) points to the domain value. Can we further reduce the storage cost?


5

Storage Management

ID Storage:
– An identifier for each of the domain values
– Identifier is the ordinal value in the domain table
– Store the identifier instead of the pointer
– Use the identifier as an offset into the domain table
– Extendable IDs: the length of the identifier grows and shrinks depending on the number of domain values
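The bullets above can be sketched as follows. This is only an illustration under stated assumptions: the class name `DomainTable`, its methods, and the sample city values are hypothetical, and the reverse dictionary used while encoding is a convenience of the sketch rather than something the slides describe.

```python
class DomainTable:
    """Domain values are stored once; tuples hold the value's ordinal ID."""

    def __init__(self):
        self.values = []   # domain table: the ID is the offset into this list
        self.ids = {}      # assumed helper for encoding: value -> ID

    def encode(self, value):
        """Return the ID of value, appending it to the domain table if it is new."""
        if value not in self.ids:
            self.ids[value] = len(self.values)
            self.values.append(value)
        return self.ids[value]

    def decode(self, ident):
        """Use the ID as an offset into the domain table."""
        return self.values[ident]


city = DomainTable()
# The relation's column keeps only IDs; position i belongs to tuple i.
city_ids = [city.encode(v) for v in ["Mumbai", "Pune", "Mumbai", "Delhi"]]
decoded = [city.decode(i) for i in city_ids]
```

Each repeated value costs one small ID in the relation instead of a full pointer, while the value itself appears only once in `city.values`.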


6

Storage Management

D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.

Starting with 1-byte identifiers, the length grows and shrinks with the number of domain values. ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing.

Why not bit identifiers?
– Storage is byte addressable.
– Packing bit identifiers in bytes increases the storage management complexity.
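A minimal sketch of byte-wide identifiers with positional indexing, assuming little-endian fixed-width packing. The function names `id_width_bytes`, `pack_ids`, and `unpack_id` are hypothetical and only illustrate the idea; the slides do not specify a byte layout.

```python
def id_width_bytes(num_domain_values):
    """Smallest whole number of bytes that can distinguish the given number of values."""
    if num_domain_values < 1:
        raise ValueError("need at least one domain value")
    # ceil(log2 D) bits, with a floor of one bit so the width starts at 1 byte
    bits = max(1, (num_domain_values - 1).bit_length())
    return (bits + 7) // 8   # round up to whole bytes


def pack_ids(ids, width):
    """Store every ID in exactly `width` bytes, preserving tuple positions."""
    return b"".join(i.to_bytes(width, "little") for i in ids)


def unpack_id(buf, position, width):
    """Positional indexing: the ID of tuple `position` sits at a fixed byte offset."""
    start = position * width
    return int.from_bytes(buf[start:start + width], "little")


ids = [0, 1, 2, 1, 255]
one_byte = pack_ids(ids, id_width_bytes(256))          # 256 values fit in 1 byte
# A 257th domain value crosses the boundary, so the column is repacked at 2 bytes.
two_byte = pack_ids(ids + [256], id_width_bytes(257))
```

Because every ID has the same width, finding the ID of a given tuple is a multiplication and a slice, which is the simplicity that bit-level packing would give up.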


7


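Figure: ID Storage. The ID values of Relation R (0, 1, 2, 1, …, n) are stored in a separate column tied to the relation by Positional Indexing; each ID is an offset into the Domain Values table (v0, v1, …, vn).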


8

Storage Management

Ping Pong Effect
– At the boundaries, ID values are reorganized whenever the identifier length changes
– Frequent insertions and deletions at the boundaries might result in a lot of reorganization
– This phenomenon should be avoided

No deletion of domain values
– Domain structure means a future insertion might reference the deleted value
– Do not delete a domain value even if it is not referenced

Setting a threshold for deletion
– Delete only if the number of deletions exceeds a threshold
– Increase the threshold when boundaries are being crossed


9

Storage Management

Primary Key-Foreign Key relationship
– Primary key: a domain in itself
– IDs for primary key values
– Values present in the child table are the corresponding primary key IDs
– Projected foreign key column forms a Join Index

Figure: Primary Key-Foreign Key Join Index. The S.BID values column of the child table (Relation S) holds IDs (0, 1, 2, 1, …, n) that are offsets to the rows v0, v1, …, vn of the parent table (Relation R).
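The join index idea can be sketched as below, assuming that a parent row's position is its primary key ID. The table contents and the function name `join_child_with_parent` are hypothetical sample data and naming, not taken from the slides.

```python
# Parent table R: the position of a row is its primary key ID.
parent_rows = [("B0", "Databases"), ("B1", "Networks"), ("B2", "Compilers")]

# Child table S: the foreign key column is projected out and holds parent IDs.
child_rows = ["s0", "s1", "s2", "s3"]   # remaining attributes of S
child_fk_ids = [0, 2, 2, 1]             # S.BID values, aligned by position


def join_child_with_parent(child_rows, child_fk_ids, parent_rows):
    """Join S with R by direct offset lookup; no search or hashing is needed."""
    for child, parent_id in zip(child_rows, child_fk_ids):
        yield child, parent_rows[parent_id]


joined = list(join_child_with_parent(child_rows, child_fk_ids, parent_rows))
```

The projected `child_fk_ids` column already says which parent row each child tuple matches, which is why it can serve as a join index.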


10

Storage Management

ID-based Storage wins over Domain Storage when $p > \lceil \log_2 D / 8 \rceil$

Relations in a small device do not have a very high cardinality, so the above condition holds for most of the data.

Advantages
(i) Considerable saving in storage cost.
(ii) Efficient join between parent table and child table
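As a worked instance of the condition, assume a hypothetical relation with N = 10,000 tuples over a domain of D = 200 values and p = 4 bytes (these numbers are illustrative, not measurements from the system). Only the per-tuple column is compared, since the domain table itself is needed in both models:

```latex
\begin{align*}
\text{ID width} &= \lceil \log_2 200 / 8 \rceil = \lceil 7.64 / 8 \rceil = 1 \text{ byte} \\
\text{Domain Storage column} &= N \cdot p = 10000 \times 4 = 40000 \text{ bytes} \\
\text{ID Storage column} &= N \cdot 1 = 10000 \text{ bytes}
\end{align*}
```

With p = 4, the ID width stays below the pointer size for every D up to $2^{24} = 16{,}777{,}216$, which is consistent with the observation that the condition holds for most data on a small device.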


11

Storage Management

Bitmap Storage
– Suited to attributes whose number of domain values is very small compared to the number of tuples, e.g., True, False
– Supports selection on multiple attributes

A Data + Index Model
– A bitmap index is created for every bitmap attribute
– Attribute values are not stored in the base relation
– The index can be used to retrieve the domain value of each tuple

Cost of projection becomes high, as is the case with Ring Storage

Join index of parent table-child table is possible by storing bitmaps for every primary key value


12


• Bitmap Storage is not an alternative to Ring Storage
• Indexing capabilities of the two models are different
• Depending on attribute characteristics, choose the appropriate model

Memory requirement for selection
– Number of bit vectors is equal to the number of attributes that form part of the selection
– Bit vectors are kept in memory
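A minimal sketch of Bitmap Storage, assuming Python integers as bit vectors. The function names and the sample attributes (`in_stock`, `category`) are hypothetical. It shows why selection is cheap (one bit vector per attribute, combined with AND) while projection has to probe the bit vectors to recover a tuple's value.

```python
def build_bitmap_index(column):
    """One bit vector per domain value; bit i is set when tuple i has that value."""
    index = {}
    for position, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << position)
    return index


def value_of_tuple(index, position):
    """Projection: probe the bit vectors until one has the bit for `position` set."""
    for value, bits in index.items():
        if (bits >> position) & 1:
            return value
    raise IndexError(position)


def select_positions(bit_vectors):
    """Conjunctive selection: AND one bit vector per attribute in the predicate."""
    result = bit_vectors[0]
    for bits in bit_vectors[1:]:
        result &= bits
    return [p for p in range(result.bit_length()) if (result >> p) & 1]


in_stock = build_bitmap_index([True, False, True, True])
category = build_bitmap_index(["A", "B", "A", "B"])
# Tuples with in_stock = True AND category = "A": two bit vectors in memory.
matches = select_positions([in_stock[True], category["A"]])
```

The attribute values never appear in the base relation here; `value_of_tuple` is the only way to get them back, which reflects the higher projection cost noted above.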


13

Query Processing

Considerations
– Minimize writes to secondary storage
– Efficient usage of limited main memory
– Read buffer not required
– Main memory as write buffer
– If the read:write ratio is very high, flash memory as write buffer

Query Plan
– An optimal query plan is needed
– Reduce materialization; if it is absolutely necessary, use main memory
– Bushy trees and right-deep trees are ruled out
– Left-deep tree is most suited for pipelined evaluation
– Right operand in a left-deep tree is always a stored relation
– Only one input is pipelined
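Pipelined evaluation of a left-deep plan can be sketched with generators. This is an assumption-laden illustration: the relations R, S, T, the join predicates, and the function names are hypothetical, and a nested loop join stands in for whatever operator the optimizer would pick.

```python
def scan(relation):
    """Leaf of the plan: read a stored relation one tuple at a time."""
    yield from relation


def nested_loop_join(left_pipeline, stored_right, predicate):
    """The left input arrives tuple by tuple; the right input is a stored relation."""
    for left in left_pipeline:
        for right in stored_right:      # rescanned from storage, never buffered
            if predicate(left, right):
                yield left + right


# Hypothetical stored relations.
R = [(1, "r1"), (2, "r2")]
S = [(1, "s1"), (2, "s2"), (2, "s3")]
T = [("s2", "t1"), ("s3", "t2")]

# Left-deep plan ((R join S) join T): the intermediate result is never materialized.
rs = nested_loop_join(scan(R), S, lambda l, r: l[0] == r[0])
plan = nested_loop_join(rs, T, lambda l, r: l[3] == r[0])
result = list(plan)
```

Each join pulls one tuple from its left input, matches it against a stored relation, and passes the output upward, so only the single pipelined input is in flight at any time.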


14

Query Processing

Memory Allocation to Operators
– Main memory is limited; we cannot assume that the entire memory is available to every operator in the left-deep tree plan
– Can the plan be executed with the available memory?

If nested loop algorithms are used for every operator, the minimum amount of memory is needed to execute the plan
– Nested loop algorithms are inefficient
– Should memory usage be reduced to a minimum at the cost of performance?
– Memory is increasing with every new device
– Different devices come with different memory sizes
– Query plans should make extensive use of memory
– Memory must be optimally allocated among all operators


15

Query Processing

Operator evaluation schemes
– Different schemes for an operator
– All have different memory usage and cost
– Schemes conform to the left-deep tree query plan
– Cost of a scheme is the computation time

Schemes for Join
– Nested Loop Join
– Indexed Nested Loop Join
– Hash Join

Similar schemes for other operators
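To make the memory versus cost trade-off concrete, here is a hedged sketch of two of the join schemes in a form that fits a left-deep plan (pipelined left input, stored right relation). The function names, key extractors, and sample relations are hypothetical; the index is assumed to exist already as a mapping from key to row positions.

```python
from collections import defaultdict


def hash_join(left_pipeline, stored_right, left_key, right_key):
    """Build a hash table on the stored relation, then probe it with the pipeline."""
    table = defaultdict(list)           # memory grows with the size of stored_right
    for right in stored_right:
        table[right_key(right)].append(right)
    for left in left_pipeline:          # each left tuple is probed once and discarded
        for right in table.get(left_key(left), []):
            yield left + right


def index_nested_loop_join(left_pipeline, stored_right, right_index, left_key):
    """Use an existing index on the stored relation instead of scanning it."""
    for left in left_pipeline:
        for position in right_index.get(left_key(left), []):
            yield left + stored_right[position]


R = [(1, "r1"), (2, "r2")]
S = [(1, "s1"), (2, "s2"), (2, "s3")]
s_index = {1: [0], 2: [1, 2]}           # assumed pre-built index on S's first column

first = lambda t: t[0]
by_hash = list(hash_join(iter(R), S, first, first))
by_index = list(index_nested_loop_join(iter(R), S, s_index, first))
```

Both produce the same result; the hash join spends working memory on `table` to avoid repeated scans, whereas the indexed variant relies on a structure that is already stored.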


16

Query Processing

Benefit/Size of a scheme

Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit of memory allocation. The minimum scheme for an operator is the scheme that has maximum cost and minimum memory.

Assume n schemes s1, s2, …, sn to implement an operator o.

$\min(o) = s_{min}$, where $\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{min})$

$s_{min}$ is the minimum scheme for operator o. Then,

$\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{min}) - \mathrm{Cost}(s_i)$

$\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{min})$
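The definitions above translate directly into a small computation. The cost and memory figures below are invented for illustration, and the names `Scheme`, `minimum_scheme`, and `benefit_and_size` are hypothetical.

```python
from typing import NamedTuple


class Scheme(NamedTuple):
    name: str
    cost: float     # estimated computation time
    memory: int     # memory needed by the scheme


def minimum_scheme(schemes):
    """The least-memory scheme; by the stated assumption it also has the highest cost."""
    return min(schemes, key=lambda s: s.memory)


def benefit_and_size(scheme, smin):
    """Benefit = cost saved, Size = extra memory, both relative to the minimum scheme."""
    return smin.cost - scheme.cost, scheme.memory - smin.memory


# Hypothetical schemes for one join operator.
join_schemes = [
    Scheme("nested loop", 100.0, 2),
    Scheme("indexed nested loop", 40.0, 6),
    Scheme("hash", 20.0, 12),
]
smin = minimum_scheme(join_schemes)
points = {s.name: benefit_and_size(s, smin) for s in join_schemes}
ratios = {name: b / s for name, (b, s) in points.items() if s > 0}
```

In this made-up example the indexed nested loop scheme has the better benefit/size ratio even though the hash scheme has the larger absolute benefit.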


17

Query Processing

An operator is defined by the benefit and size of its schemes. Every operator is a collection of (size, benefit) points, n points for n schemes.

Figure: (Size, Benefit) points for an operator. Axes: Size and Benefit. Points shown: (0,0), (s1,b1), (s2,b2).


18

Query Processing

Optimal Memory Allocation

Determine the amount of memory allocated to each operator to get maximum benefit

2-Phase Approach
Phase 1: Query is first optimized to get a query plan
Phase 2: Division of memory among the operators

The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory allocation in phase 2 is done on the basis of the cost functions of the schemes

Memory is assumed to be available for all the schemes; this may not be true for a resource-constrained device


19


Depending on the available memory, we need to determine the best scheme for every operator out of all possible ones

Schemes in phase 1 and after phase 2 need not be the same

Optimal division of memory involves the decision of selecting the best scheme for every operator


20


Our Solution
– We use a heuristic to determine which operator gains the most per unit of memory allocation and allocate memory to that operator
– The gain of every operator is determined by its best possible scheme
– Repeat the process until memory allocation is done

Heuristic:

Select the scheme that has the maximum benefit/size and allocate its memory


21

Query Processing

MemAllocate(MTotal) {
1. Mmin = Σ_{i=1}^{m} Memory(min(i))
2. for i=1 to m do
3.   Scheme(i)=min(i)
4. end for
5. Mavail = MTotal - Mmin
6. sbest,obest=GetBestScheme(Mavail)
7. if no best scheme then return
8. else {
9.   Mavail = Mavail - Memory(sbest) + Memory(Scheme(obest))
10.  Scheme(obest)=sbest
11.  RemoveSchemes(sbest,obest)
12.  RecomputeBenefits(sbest,obest)
13. }
14. goto step 6
}

Complexity = $O(nm^2)$, m = no. of operators, n = no. of schemes
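One way the greedy loop might look in runnable form is sketched below. It is an interpretation, not the authors' implementation: the function name `mem_allocate`, the `(memory, cost)` tuple layout, and the sample numbers are assumptions. Instead of separate RemoveSchemes and RecomputeBenefits steps, benefit and size are computed relative to each operator's currently chosen scheme, which gives the same differences.

```python
def mem_allocate(operators, m_total):
    """operators: {name: [(memory, cost), ...]}; returns {name: (memory, cost)} or None."""
    # Steps 1-5: start every operator on its minimum-memory scheme.
    chosen = {op: min(schemes) for op, schemes in operators.items()}
    m_avail = m_total - sum(mem for mem, _ in chosen.values())
    if m_avail < 0:
        return None                         # plan cannot run even with minimum schemes

    while True:
        best = None                         # (ratio, operator, scheme)
        for op, schemes in operators.items():
            cur_mem, cur_cost = chosen[op]
            for mem, cost in schemes:
                size = mem - cur_mem        # extra memory over the current scheme
                benefit = cur_cost - cost   # cost saved over the current scheme
                if size > 0 and benefit > 0 and size <= m_avail:
                    ratio = benefit / size
                    if best is None or ratio > best[0]:
                        best = (ratio, op, (mem, cost))
        if best is None:                    # step 7: nothing affordable improves the plan
            return chosen
        _, op, scheme = best
        m_avail -= scheme[0] - chosen[op][0]    # step 9
        chosen[op] = scheme                     # step 10


# Hypothetical plan with two join operators.
ops = {"join1": [(2, 100), (6, 40), (12, 20)], "join2": [(1, 50), (5, 30)]}
tight = mem_allocate(ops, 10)
roomy = mem_allocate(ops, 14)
too_small = mem_allocate(ops, 2)
```

With 10 units only `join1` can be upgraded; with 14 units both operators move up one scheme, and the remaining 3 units are not enough for the hash-like scheme of `join1`.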


22

Query Processing

Recomputation of Benefits

Once the operator obest gets memory Memory(sbest), the benefit and size of all the schemes of obest that have higher memory than sbest change.

New benefit and size values will be the difference between their old values and those of sbest.

Figure: Benefit and Size Recomputation. Axes: Size and Benefit. Points shown: (0,0), (s1,b1), (s2,b2). Scheme 1 has the highest benefit/size ratio, so after it is chosen Benefit(Scheme 2) = (b2-b1) and Size(Scheme 2) = (s2-s1).


23

Query Processing

1-Phase Approach

The 2-phase solution optimally allocates memory to all the operators in the query plan. However, the plan itself might be suboptimal for the given available memory.

The 1-phase approach takes into account memory division among operators while choosing between plans.

Ideally, 1-phase optimization should be done, but the optimizer becomes complex.


24

Future Work

Implementation Status
1. Flat Storage, Domain Storage, Ring Storage, and ID Storage
2. Join algorithms

Future Work
– Bitmap Storage implementation
– Algorithms for aggregation
– Query optimizer and the iterator
– Test using sample relations and data from handheld apps
– Examine the feasibility of a 1-phase optimizer
– Database Module Toolkit
– An operator that returns first-k results of a query
– Application specific DBMS

25

Thank You



