Staged-DB IC-65 Advances in Data Management Systems 1
Scheduling in Staged-DB Systems
Nicolas Bonvin, Rammohan Narendula, and Surender Reddy Yerva
Organization
• What is Staged-DB?
• Scheduling in Staged-DB
• Our Contribution
– Scheduling in Execution Phase
– System Modeling
• System Design Details
• Performance Study
• Future Work
Motivation
Response time: time needed to produce the first page as output
Big advantage for the overlapping case ('1')
Query Lifetime in DBMS
[Figure: Query → PARSER → query tree → OPTIMIZER → query plan → EXECUTION → Answer. Other labels: catalogs and statistics, operators, Data]
• EXECUTION (Disk-IO): 90% of time
DB Paradigm So Far..
• Query → Query Execution Plan (Tree of Operators)
• Multiple Queries
– Each query handled by a DIFFERENT THREAD
• No cross communication/sharing across threads
• Sharing Opportunity is missed
[Figure: DBMS with a thread pool and no coordination; labels: D, CD, C. Caption: One Query Multiple Operators]
Staged-DB Paradigm
• DB is remodeled as various stages
• Stage
– “Common execution logic” grouped into a stage
– Each operator in QEP can be seen as a stage
• Query passed through all the needed stages to get an output
• Common Data needs Detected by the Stage
[Figure: DBMS thread pool versus StagedDB; labels: D, CD, C. Caption: One Operator Multiple queries]
Staged Database Systems
• DB → Stages; Execution Stage → microEngines
• Each Stage has a queue; each microEngine also has a request queue.
[Figure: Conventional DBMS versus StagedDB; labels: queries, Stage 1, Stage 2, Stage 3, high concurrency, locality across requests]
Scheduling In Staged-DB
• Scheduling at different levels
– Stages (Parser, Optimizer, Execution)
– Across microEngines (the Execution Engine has SCAN, JOIN, etc. microEngines)
– Within a microEngine
• We consider only scheduling “across microEngines”
• Scheduling Policies:
– Round-Robin
– Heavy Load First
– Light Load First
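The three policies listed above can be sketched as follows. This is only an illustration under stated assumptions: the slides do not define "load", so the sketch takes it to be the number of packets waiting in a microEngine's queue, and the names `MicroEngine`, `RoundRobin`, `heavy_load_first`, and `light_load_first` are hypothetical.

```python
from collections import deque


class MicroEngine:
    """Hypothetical microEngine: a name plus a FIFO queue of pending packets."""

    def __init__(self, name, packets=()):
        self.name = name
        self.queue = deque(packets)

    def load(self):
        # Assumption: load = number of packets waiting in the queue.
        return len(self.queue)


class RoundRobin:
    """Visit microEngines in a fixed cyclic order, skipping idle ones."""

    def __init__(self):
        self._next = 0  # index where the next search starts

    def pick(self, engines):
        n = len(engines)
        for offset in range(n):
            idx = (self._next + offset) % n
            if engines[idx].load() > 0:
                self._next = (idx + 1) % n
                return engines[idx]
        return None  # every engine is idle


def heavy_load_first(engines):
    """Pick the busy microEngine with the longest queue."""
    busy = [e for e in engines if e.load() > 0]
    return max(busy, key=MicroEngine.load) if busy else None


def light_load_first(engines):
    """Pick the busy microEngine with the shortest queue."""
    busy = [e for e in engines if e.load() > 0]
    return min(busy, key=MicroEngine.load) if busy else None
```

Each policy only selects which microEngine runs next; removing a packet from the chosen engine's queue is left to the engine itself.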
Detailed System Design
• Based on the Discrete Event Simulation technique
• All the computation, data needs, and dependencies are modeled using events
• System components
– Global System Queue
– Dispatcher
– Operator (or) Engine
– Global Scheduler
– Main Memory
– Overlap Detector
Global System Queue
[Figure labels: QueryArrival, Dispatcher, Scheduler, Disk-Fetch, EngineInsert, EngineExec-Begin, EngineExec-End, Memory]
• event
– eventId
– componentId
– functionId
– firingTime
– packet
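A minimal sketch of how a global system queue could drive a discrete event simulation, assuming events are processed in order of firingTime with eventId as a tie-breaker. The field names follow the event layout above; the heap-based ordering, the `schedule`/`run` methods, and the handler lookup by (componentId, functionId) are assumptions made for illustration, not the authors' design.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class Event:
    # Only firing_time and event_id take part in ordering.
    firing_time: float
    event_id: int
    component_id: str = field(compare=False)
    function_id: str = field(compare=False)
    packet: Any = field(compare=False, default=None)


class GlobalSystemQueue:
    """Hypothetical event queue: always pops the earliest pending event."""

    def __init__(self):
        self._heap = []
        self._ids = itertools.count()  # increasing eventId values

    def schedule(self, firing_time, component_id, function_id, packet=None):
        event = Event(firing_time, next(self._ids), component_id, function_id, packet)
        heapq.heappush(self._heap, event)
        return event

    def run(self, handlers):
        """Process events until the queue is empty; return the firing order."""
        trace = []
        while self._heap:
            event = heapq.heappop(self._heap)
            trace.append((event.firing_time, event.component_id, event.function_id))
            handler = handlers.get((event.component_id, event.function_id))
            if handler is not None:
                handler(self, event)  # a handler may schedule further events
        return trace
```

A handler for a query arrival could, for example, call `queue.schedule(event.firing_time + 1.0, "Engine", "EngineInsert", event.packet)` to model the time until the packet reaches an engine.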
Engine
• Events: EngineInsert, EngineExecution Begin, EngineExecution End
• Input Packet Queue
• Packet format
– queryId list
– queryPlans
– pageId
– contextInfo
• Processing steps
– Request packet from parent node/dispatcher
– Call Overlap detector
– Insert packet
– Pick packet from queue
– Send packet to child OR execute and produce output
– Insert event into event queue for the scheduler
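The packet format and the insert path above can be sketched as below. The overlap rule used here (two packets overlap when they request the same pageId, in which case their queryId lists are merged) is an assumption for illustration; the slides do not spell out the detection rule. Representing queryPlans as a dictionary keyed by query id, and the names `Packet`, `Engine`, `insert`, and `next_packet`, are likewise hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    query_ids: list    # queries served by this packet
    query_plans: dict  # assumed shape: query id -> plan description
    page_id: int
    context_info: dict = field(default_factory=dict)


class Engine:
    """Hypothetical engine with an input packet queue."""

    def __init__(self, name):
        self.name = name
        self.input_queue = deque()

    def insert(self, packet):
        """Insert a packet; return True if it overlapped a queued one.

        Assumed overlap rule: same page_id. On overlap, the new packet's
        queries are attached to the queued packet instead of being enqueued.
        """
        for queued in self.input_queue:
            if queued.page_id == packet.page_id:
                for qid in packet.query_ids:
                    if qid not in queued.query_ids:
                        queued.query_ids.append(qid)
                queued.query_plans.update(packet.query_plans)
                return True
        self.input_queue.append(packet)
        return False

    def next_packet(self):
        """Pick the first packet in the queue, or None if the queue is empty."""
        return self.input_queue.popleft() if self.input_queue else None
```

With this merging, one pass over a page can serve every query attached to the packet, which is the sharing opportunity the staged design aims to exploit.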
Engines
• Join
• Sort
• Aggregation
• Scan
• Wait and Scan
• Index Scan
Overlap detection
• With memory
• With input queue
• Two types
– Linear
– Spike
Memory Manager
• Pinning and unpinning
• Put()
• pageExists()
• consumePage()
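One way the memory manager operations could fit together is sketched below. The semantics are assumptions, since the slides only name the operations: `put` pins a page once per query that still needs it, `consume_page` unpins once and drops the page when no consumer remains, and `peak_pages` tracks the largest number of resident pages. The methods are snake_case stand-ins for Put(), pageExists(), and consumePage().

```python
class MemoryManager:
    """Hypothetical pin-counting memory manager."""

    def __init__(self):
        self._pins = {}      # page_id -> number of outstanding consumers
        self.peak_pages = 0  # largest number of pages held at once

    def put(self, page_id, consumers=1):
        """Load a page (or add consumers to a resident page) and pin it."""
        if consumers < 1:
            raise ValueError("consumers must be at least 1")
        self._pins[page_id] = self._pins.get(page_id, 0) + consumers
        self.peak_pages = max(self.peak_pages, len(self._pins))

    def page_exists(self, page_id):
        """Return True if the page is currently resident in memory."""
        return page_id in self._pins

    def consume_page(self, page_id):
        """Unpin the page once; evict it when the last consumer is done."""
        if page_id not in self._pins:
            raise KeyError(f"page {page_id} is not in memory")
        self._pins[page_id] -= 1
        if self._pins[page_id] == 0:
            del self._pins[page_id]
```

Under these assumptions, a page shared by several overlapping queries stays resident until the last of them has consumed it.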
Performance study
• 5 queries
• 5 runs
• Uniform arrival rate
Effect of Overlapping
Response time: time needed to produce the first page as output
Big advantage for the overlapping case ('1')
Effect of Overlapping
Memory consumption: max # of pages consumed in memory during the lifetime of the query
Higher memory consumption with Overlapping!
Effect of Overlapping
Throughput: # of queries completed in a unit of time
Clear advantage with Overlap detection!
Comparing scheduling policies
Mean response time
Round Robin seems to perform a little better
Comparing scheduling policies
Memory consumption
No differences!
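The three metrics used in the performance study can be computed from per-query records, as in the sketch below. The `QueryRecord` fields are hypothetical, and measuring response time from the query's arrival is an assumption; the slides define it only as the time needed to produce the first page as output.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryRecord:
    arrival: float         # simulated time at which the query arrived
    first_page_out: float  # time at which the first output page was produced
    completed: float       # time at which the query finished
    peak_pages: int        # max # of pages held in memory during its lifetime


def response_time(record):
    """Time needed to produce the first page (assumed: since arrival)."""
    return record.first_page_out - record.arrival


def mean_response_time(records):
    return mean(response_time(r) for r in records)


def throughput(records, horizon):
    """Number of queries completed per unit of time up to `horizon`."""
    done = sum(1 for r in records if r.completed <= horizon)
    return done / horizon


def memory_consumption(records):
    """Largest per-query peak page count across the run."""
    return max(r.peak_pages for r in records)
```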
Future Work
• A few more interesting global scheduling policies are possible.
• The system does not yet apply a local scheduling policy to choose which packet in the input packet queue to process next; at the moment it picks the first packet in the queue.
• Regarding implementation, experiments should be run with more Engines and benchmark-style input queries.