Stage2Raj.ppt

An open source DBMS for handheld devices, Stage 2, by Rajkumar Sen, IIT Bombay, under the guidance of Prof. Krithi Ramamritham

Transcript of Stage2Raj.ppt

Page 1: Stage2Raj.ppt

An open source DBMS for handheld devices

Stage 2

by Rajkumar Sen

IIT Bombay

Under the guidance of

Prof. Krithi Ramamritham

Page 2: Stage2Raj.ppt

04/13/23 An open source DBMS for handheld devices

Outline

• Introduction

• Storage Management

• Query Processing

• Future Work

Page 3: Stage2Raj.ppt

Introduction

Stage 1: Survey of
– Storage Models: Flat Storage, Domain Storage, and Ring Storage
– Query Processing issues
– Data Synchronization
– Concurrency Control and Recovery

Goals for Stage 2
– New storage models to further reduce storage cost
– Memory cognizant query processing
– Data Synchronization issues
– System Implementation issues

Page 4: Stage2Raj.ppt

Storage Management

Aim at compactness in representation of data

Existing storage models
– Flat Storage
– Pointer-based Domain Storage

Domain Storage uses a pointer of size p (typically 4 bytes) to point to the domain value.

Can we further reduce the storage cost?

Page 5: Stage2Raj.ppt

Storage Management

ID Storage:
– An identifier for each of the domain values
– Identifier is the ordinal value in the domain table
– Store the identifier instead of the pointer
– Use the identifier as an offset into the domain table
– Extendable IDs: the length of the identifier grows and shrinks depending on the number of domain values
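The bullets above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `DomainTable`, its methods, and the sample city values are hypothetical and do not come from the slides.

```python
class DomainTable:
    """Hypothetical domain table: each distinct value gets the next ordinal ID."""

    def __init__(self):
        self.values = []   # ID -> domain value (the ID is the offset)
        self.id_of = {}    # domain value -> ID, used only when inserting

    def intern(self, value):
        """Return the ID of value, appending it to the table if it is new."""
        if value not in self.id_of:
            self.id_of[value] = len(self.values)
            self.values.append(value)
        return self.id_of[value]

    def lookup(self, ident):
        """Use the identifier as an offset into the domain table."""
        return self.values[ident]


domain = DomainTable()
cities = ["Mumbai", "Pune", "Mumbai", "Delhi", "Pune"]
id_column = [domain.intern(c) for c in cities]   # stored instead of pointers
decoded = [domain.lookup(i) for i in id_column]

print(id_column)   # [0, 1, 0, 2, 1]
print(decoded)     # the original values, recovered by offset lookup
```

The relation stores only the small integers in `id_column`; the full values live once in the domain table.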

Page 6: Stage2Raj.ppt

Storage Management

D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.

Starting with 1-byte identifiers, the length grows and shrinks.

ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing.

Why not bit identifiers?
– Storage is byte addressable.
– Packing bit identifiers in bytes increases the storage management complexity.
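As a small illustration of the identifier length rule, the following sketch computes the number of whole bytes needed for D domain values. The function name `id_length_bytes` is hypothetical, and the floor of one byte reflects the slide's statement that identifiers start at 1 byte.

```python
def id_length_bytes(num_domain_values):
    """Smallest whole number of bytes that can distinguish D domain values."""
    # (D - 1).bit_length() equals ceil(log2 D) for D >= 2; use at least 1 bit.
    bits = max(1, (num_domain_values - 1).bit_length())
    return (bits + 7) // 8   # round up to whole bytes


widths = {d: id_length_bytes(d) for d in (2, 256, 257, 65536, 65537)}
print(widths)   # {2: 1, 256: 1, 257: 2, 65536: 2, 65537: 3}
```

The jumps at 257 and 65537 values are the boundaries at which the stored ID column would have to be rewritten with wider identifiers.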

Page 7: Stage2Raj.ppt

Storage Management

Figure: ID Storage (Relation R, its column of ID Values linked by Positional Indexing, and the table of Domain Values v0, v1, …, vn indexed 0 to n)

Page 8: Stage2Raj.ppt

Storage Management

Ping-Pong Effect
– At the boundaries, ID values are reorganized when the identifier length changes
– Frequent insertions and deletions at the boundaries might result in a lot of reorganization
– This phenomenon should be avoided

No deletion of domain values
– Domain structure means a future insertion might reference the deleted value
– Do not delete a domain value even if it is not referenced

Setting a threshold for deletion
– Delete only if the number of deletions exceeds a threshold
– Increase the threshold when boundaries are being crossed
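One possible reading of the threshold idea is sketched below. The slides do not give the exact policy, so everything here is an assumption made for illustration: the class `DeletionPolicy`, the doubling of the threshold, and the boundary of 256 values (where a 1-byte identifier stops being enough).

```python
class DeletionPolicy:
    """Hypothetical hysteresis policy for purging unreferenced domain values."""

    def __init__(self, threshold, boundary):
        self.threshold = threshold   # pending deletions tolerated before a purge
        self.boundary = boundary     # domain size at which the ID length changes
        self.pending = 0             # unreferenced values not yet purged

    def on_unreferenced(self, domain_size):
        """Record one more unreferenced value; return True if a purge should run."""
        self.pending += 1
        if self.pending <= self.threshold:
            return False             # defer: too few deletions so far
        shrunk_size = domain_size - self.pending
        if shrunk_size <= self.boundary < domain_size:
            # The purge crosses a length boundary, so make the next one harder
            # to trigger (assumed rule: double the threshold).
            self.threshold *= 2
        self.pending = 0
        return True


policy = DeletionPolicy(threshold=2, boundary=256)
decisions = [policy.on_unreferenced(domain_size=258) for _ in range(3)]
print(decisions, policy.threshold)   # [False, False, True] 4
```

Deferring the purge means a value that is deleted and soon re-inserted near a boundary does not force the ID column to be reorganized twice.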

Page 9: Stage2Raj.ppt

Storage Management

Primary Key-Foreign Key relationship
– Primary key: a domain in itself
– IDs for primary key values
– Values present in the child table are the corresponding primary key IDs
– Projected foreign key column forms a Join Index

Figure: Primary Key-Foreign Key Join Index (Child Table: Relation S with its projected S.BID Values column; Parent Table: Relation R with rows v0, v1, …, vn indexed 0 to n)
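The join index idea can be illustrated with a short sketch. The table contents and the helper `join` are made up for this example; the only point taken from the slides is that the projected foreign key column holds parent IDs that act as offsets into the parent table.

```python
# Parent table R: the position of a row is its primary-key ID.
parent_r = [("B0", "Databases"), ("B1", "Networks"), ("B2", "Compilers")]

# Child table S without its foreign key column. s_bid holds, for each child row,
# the ID (position) of the referenced parent row; row i of both lists belongs together.
child_s = [("alice",), ("bob",), ("carol",), ("dave",)]
s_bid = [0, 2, 2, 1]


def join(child_rows, fk_ids, parent_rows):
    """Join child and parent by direct offset lookup, with no search or hashing."""
    for child_row, parent_id in zip(child_rows, fk_ids):
        yield child_row + parent_rows[parent_id]


result = list(join(child_s, s_bid, parent_r))
print(result[1])   # ('bob', 'B2', 'Compilers')
```

Each child row finds its parent in one array access, which is why the projected foreign key column can serve as a join index.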

Page 10: Stage2Raj.ppt

Storage Management

ID-based Storage wins over Domain Storage when $p > \lceil \log_2 D / 8 \rceil$

Relations on a small device do not have a very high cardinality, so the above condition holds for most of the data.

Advantages
(i) Considerable saving in storage cost.
(ii) Efficient join between parent table and child table
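A worked example may make the condition concrete. The numbers are assumptions chosen for illustration: a column of N = 10,000 tuples over D = 200 distinct values, with pointers of p = 4 bytes. The domain table itself is stored in both models, so only the per-tuple column differs.

```latex
\begin{align*}
\text{pointer column} &= N \cdot p = 10\,000 \times 4 = 40\,000 \text{ bytes} \\
\text{ID column} &= N \cdot \lceil \log_2 D / 8 \rceil
                  = 10\,000 \times \lceil 7.64 / 8 \rceil = 10\,000 \text{ bytes}
\end{align*}
```

Under these assumptions the ID column takes a quarter of the space of the pointer column, and it stays at 1 byte per tuple until the domain grows past 256 values.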

Page 11: Stage2Raj.ppt

Storage Management

Bitmap Storage
– Suitable when the number of domain values is very small compared to the number of tuples, e.g., True, False
– Selection on multiple attributes

A Data + Index Model
– A bitmap index is created for every bitmap attribute
– Attribute values are not stored in the base relation
– The index can be used to retrieve the domain value of each tuple

Cost of Projection becomes high, as is the case with Ring Storage

A join index between parent table and child table is possible by storing bitmaps for every primary key value
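The Data + Index model can be sketched with Python integers standing in for bit vectors. The helper names (`build_bitmaps`, `positions`, `project`) and the sample attributes are hypothetical; the sketch only illustrates selection by ANDing bitmaps and why projection is comparatively expensive.

```python
def build_bitmaps(column):
    """One bit vector per distinct value; bit i is set if tuple i has that value."""
    bitmaps = {}
    for pos, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << pos)
    return bitmaps


def positions(bits):
    """Tuple positions whose bit is set, in ascending order."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def project(bitmaps, pos):
    """Recover the value of tuple pos by probing the bitmaps one by one."""
    for value, bits in bitmaps.items():
        if (bits >> pos) & 1:
            return value
    raise IndexError(pos)


in_stock = build_bitmaps([True, False, True, True, False])
colour = build_bitmaps(["red", "red", "blue", "red", "blue"])

# Selection on two attributes: AND the two bit vectors.
hits = positions(in_stock[True] & colour["red"])
print(hits)                 # [0, 3]
print(project(colour, 2))   # blue
```

Selection touches only the bit vectors of the attributes involved, while `project` may have to probe every bitmap of an attribute to find the value of one tuple.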

Page 12: Stage2Raj.ppt

Storage Management

• Bitmap Storage is not an alternative to Ring Storage
• Indexing capabilities of both models are different
• Depending on attribute characteristics, choose the appropriate model

Memory requirement for selection
– Number of bit vectors is equal to the number of attributes that form part of the selection
– Bit vectors are held in memory

Page 13: Stage2Raj.ppt

Query Processing

Considerations
– Minimize writes to secondary storage
– Efficient usage of limited main memory
– Read buffer not required
– Main memory as write buffer
– If the read:write ratio is very high, flash memory as write buffer

Query Plan
– An optimal query plan is needed
– Reduce materialization; if absolutely necessary, use main memory
– Bushy trees and right-deep trees are ruled out
– Left-deep tree is most suited for pipelined evaluation
– Right operand in a left-deep tree is always a stored relation
– Only one input is pipelined
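A left-deep pipelined plan can be sketched with Python generators. The relations, tuple layouts, and function names below are made up for illustration; the sketch shows only the shape described above, where the left input is a tuple stream and the right input is always a stored relation.

```python
def scan(relation):
    """Leaf operator: stream the tuples of a stored relation."""
    yield from relation


def nested_loop_join(left, right_relation, predicate):
    """Pipelined join: left is a tuple stream, right is a stored relation."""
    for l in left:                   # left input is consumed one tuple at a time
        for r in right_relation:     # right input is rescanned from storage
            if predicate(l, r):
                yield l + r


emp = [(1, "asha", 10), (2, "ravi", 20)]          # (id, name, dept_id)
dept = [(10, "sales", 100), (20, "labs", 200)]    # (dept_id, name, site_id)
site = [(100, "Mumbai"), (200, "Pune")]           # (site_id, city)

# Left-deep plan ((emp JOIN dept) JOIN site): no intermediate result is materialized.
plan = nested_loop_join(
    nested_loop_join(scan(emp), dept, lambda l, r: l[2] == r[0]),
    site,
    lambda l, r: l[5] == r[0],
)
rows = list(plan)
print(rows[0])   # (1, 'asha', 10, 10, 'sales', 100, 100, 'Mumbai')
```

Because each join pulls tuples from the join below it on demand, nothing is written to secondary storage between operators.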

Page 14: Stage2Raj.ppt

Query Processing

Memory Allocation to Operators
– Main memory is limited; we cannot assume that the entire memory is available for every operator in the left-deep tree plan
– Can the plan be executed with the available memory?

If nested loop algorithms are used for every operator, the minimum amount of memory is needed to execute the plan
– Nested loop algorithms are inefficient
– Should memory usage be reduced to a minimum at the cost of performance?
– Memory is increasing with every new device
– Different devices come with different memory sizes
– Query plans should make extensive use of memory
– Memory must be optimally allocated among all operators

Page 15: Stage2Raj.ppt

Query Processing

Operator evaluation schemes
– Different schemes for an operator
– All have different memory usage and cost
– Schemes conform to left-deep tree query plan
– Cost of a scheme is the computation time

Schemes for Join
– Nested Loop Join
– Indexed Nested Loop Join
– Hash Join

Similar schemes for other operators
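To illustrate how two schemes for the same operator trade memory for cost, here is a minimal sketch of a nested loop join and a hash join. The counters are only a rough proxy for computation time, and the relations and function signatures are assumptions made for this example.

```python
def nested_loop_join(left, right, lkey, rkey):
    """Minimal memory: one tuple per input at a time, but every pair is compared."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[lkey] == r[rkey]:
                out.append(l + r)
    return out, comparisons


def hash_join(left, right, lkey, rkey):
    """More memory: a hash table over the stored right input, one probe per left tuple."""
    table = {}
    for r in right:                   # build phase; the table grows with the right input
        table.setdefault(r[rkey], []).append(r)
    probes = 0
    out = []
    for l in left:                    # probe phase; the left input can still be pipelined
        probes += 1
        out.extend(l + r for r in table.get(l[lkey], []))
    return out, probes


orders = [(i, i % 50) for i in range(200)]         # (order_id, customer_id)
customers = [(c, f"cust{c}") for c in range(50)]   # (customer_id, name)

nl_rows, nl_cost = nested_loop_join(orders, customers, 1, 0)
hj_rows, hj_cost = hash_join(orders, customers, 1, 0)
print(nl_rows == hj_rows, nl_cost, hj_cost)   # True 10000 200
```

Both schemes fit a left-deep plan because the right operand is a stored relation in each case; they differ in how much memory they need and how much work they do.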

Page 16: Stage2Raj.ppt

Query Processing

Benefit/Size of a scheme

Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit of memory allocated.

The minimum scheme for an operator is the scheme that has maximum cost and minimum memory.

Assume n schemes $s_1, s_2, \ldots, s_n$ to implement an operator $o$.

$\min(o) = s_{\min}$ such that $\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{\min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{\min})$

$s_{\min}$ is the minimum scheme for operator $o$. Then,

$\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{\min}) - \mathrm{Cost}(s_i)$

$\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{\min})$
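The definitions above translate directly into a short computation. The cost and memory figures below are invented for illustration and are not measurements from the system; `Scheme`, `minimum_scheme`, and `benefit_and_size` are hypothetical names.

```python
from collections import namedtuple

Scheme = namedtuple("Scheme", "name cost memory")


def minimum_scheme(schemes):
    """The scheme with minimum memory (assumed, as on the slide, to have maximum cost)."""
    return min(schemes, key=lambda s: s.memory)


def benefit_and_size(scheme, s_min):
    """Benefit and size of a scheme, both measured against the minimum scheme."""
    benefit = s_min.cost - scheme.cost
    size = scheme.memory - s_min.memory
    return benefit, size


join_schemes = [
    Scheme("nested loop", cost=1000, memory=2),
    Scheme("indexed nested loop", cost=400, memory=10),
    Scheme("hash", cost=150, memory=52),
]
s_min = minimum_scheme(join_schemes)
for s in join_schemes[1:]:
    benefit, size = benefit_and_size(s, s_min)
    print(s.name, benefit, size, benefit / size)
# indexed nested loop 600 8 75.0
# hash 850 50 17.0
```

With these made-up numbers the indexed nested loop join saves less cost in total than the hash join but saves far more per unit of extra memory.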

Page 17: Stage2Raj.ppt

Query Processing

An operator is defined by the benefit and size of its schemes.

Every operator is a collection of (size, benefit) points: n points for n schemes.

Figure: (Size, Benefit) points for an operator (axes Size and Benefit; points (0,0), (s1,b1), (s2,b2))

Page 18: Stage2Raj.ppt

Query Processing

Optimal Memory Allocation

Determine the amount of memory allocated to each operator to get maximum benefit

2-Phase Approach
Phase 1: Query is first optimized to get a query plan
Phase 2: Division of memory among the operators

The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory is allocated in phase 2 on the basis of the cost functions of the schemes

Memory is assumed to be available for all the schemes; this may not be true for a resource-constrained device

Page 19: Stage2Raj.ppt

Query Processing

Depending on the available memory, we need to determine the best scheme for every operator out of all possible ones

Schemes in phase 1 and after phase 2 need not be the same

Optimal division of memory involves the decision of selecting the best scheme for every operator

Page 20: Stage2Raj.ppt

Query Processing

Our Solution
– We use a heuristic to determine which operator gains the most per unit memory allocation and allocate memory to that operator
– Gain of every operator is determined by its best possible scheme
– Repeat the process till memory allocation is done

Heuristic:

Select the scheme that has the maximum benefit/size and allocate its memory

Page 21: Stage2Raj.ppt

Query Processing

MemAllocate(MTotal) {
 1. Mmin = Σ_{i=1}^{m} Memory(min(i))
 2. for i = 1 to m do
 3.   Scheme(i) = min(i)
 4. end for
 5. Mavail = MTotal – Mmin
 6. sbest, obest = GetBestScheme(Mavail)
 7. if no best scheme then return
 8. else {
 9.   Mavail = Mavail – Memory(sbest) + Memory(Scheme(obest))
10.   Scheme(obest) = sbest
11.   RemoveSchemes(sbest, obest)
12.   RecomputeBenefits(sbest, obest)
13. }
14. goto step 6
}

Complexity = O(nm²), m = no. of operators, n = no. of schemes
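A runnable sketch of the greedy allocation is given below. It is one possible reading of the pseudocode, not the authors' implementation: schemes are plain (cost, memory) pairs, the sample numbers are invented, and the steps RemoveSchemes and RecomputeBenefits are folded into the loop by always measuring benefit and size against the operator's currently chosen scheme.

```python
def mem_allocate(operators, m_total):
    """Greedy allocation; operators maps a name to a list of (cost, memory) schemes."""
    # Steps 1-5: start every operator on its minimum-memory scheme.
    chosen = {op: min(schemes, key=lambda s: s[1]) for op, schemes in operators.items()}
    m_avail = m_total - sum(mem for _, mem in chosen.values())
    if m_avail < 0:
        raise ValueError("plan does not fit even with minimum schemes")

    while True:
        best = None   # (ratio, operator, scheme)
        for op, schemes in operators.items():
            cur_cost, cur_mem = chosen[op]
            for cost, mem in schemes:
                # Benefit and size relative to the current scheme of this operator.
                benefit, size = cur_cost - cost, mem - cur_mem
                if size > 0 and benefit > 0 and size <= m_avail:
                    ratio = benefit / size
                    if best is None or ratio > best[0]:
                        best = (ratio, op, (cost, mem))
        if best is None:   # step 7: no affordable scheme improves the plan
            return chosen, m_avail
        _, op, scheme = best
        m_avail -= scheme[1] - chosen[op][1]   # step 9
        chosen[op] = scheme                    # step 10


operators = {
    "join1": [(1000, 2), (400, 10), (150, 52)],
    "join2": [(800, 2), (600, 6)],
}
chosen, left_over = mem_allocate(operators, m_total=30)
print(chosen, left_over)   # {'join1': (400, 10), 'join2': (600, 6)} 14
```

In this run join1 is upgraded first (ratio 75), then join2 (ratio 50); the hash-like scheme of join1 needs 42 more units than remain, so it is never chosen.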

Page 22: Stage2Raj.ppt

Query Processing

Recomputation of Benefits

Once the operator obest gets memory Memory(sbest), the benefit and size of all the schemes of obest that have higher memory than sbest change.

The new benefit and size values are the differences between their old values and those of sbest.

Figure: Benefit and Size Recomputation (axes Size and Benefit; points (0,0), (s1,b1), (s2,b2)). Scheme 1 has the highest benefit/size ratio, so after it is chosen, Benefit(Scheme 2) = (b2 - b1) and Size(Scheme 2) = (s2 - s1).
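The recomputation amounts to re-basing the remaining points on the chosen one. A minimal sketch, with a hypothetical helper `recompute` and made-up (size, benefit) values:

```python
def recompute(points, chosen):
    """Re-base the (size, benefit) points that need more memory than the chosen one."""
    s_best, b_best = chosen
    return [(s - s_best, b - b_best) for s, b in points if s > s_best]


points = [(8, 600), (50, 850)]   # (size, benefit) relative to the minimum scheme
rebased = recompute(points, chosen=(8, 600))
print(rebased)   # [(42, 250)]
```

After scheme 1 is chosen, scheme 2 is judged only on what it adds beyond scheme 1: 250 more benefit for 42 more units of memory.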

Page 23: Stage2Raj.ppt

Query Processing

1-Phase Approach

The 2-phase solution optimally allocates memory to all the operators in the query plan. However, the plan itself might be suboptimal for the given available memory.

The 1-phase approach takes into account memory division among operators while choosing between plans.

Ideally, 1-phase optimization should be done, but the optimizer becomes complex.

Page 24: Stage2Raj.ppt

Future Work

Implementation Status
1. Flat Storage, Domain Storage, Ring Storage, and ID Storage
2. Join algorithms

Future Work
– Bitmap Storage implementation
– Algorithms for aggregation
– Query optimizer and the iterator
– Test using sample relations and data from handheld apps
– Examine the feasibility of a 1-phase optimizer
– Database Module Toolkit
– An operator that returns first-k results of a query
– Application specific DBMS

Page 25: Stage2Raj.ppt

Thank You

