DBMSs On A Modern Processor: Where Does Time Go?

by A. Ailamaki, D.J. DeWitt, M.D. Hill, and D. Wood

University of Wisconsin-Madison Computer Science Dept., Madison, WI

Presented by

Derwin Halim

Agenda

Database and DBMS

Motivation for DBMS performance study

Proposed DBMS performance study

Processor model

Query execution time breakdown

Database workload

Experimental setup and results

Conclusion

Database and DBMS

A database is a collection of data, typically describing the activities of one or more related organizations in terms of entities and relationships

A DBMS (Database Management System) is software designed to assist in maintaining and utilizing large collections of data

Motivation for DBMS Performance Study

DBMSs are becoming compute and memory bound

Modern processors do not improve database system performance to the same extent as scientific workloads

Contrasting commercial DBMSs and identifying common characteristics are difficult

Urgent need to evaluate and understand the processor and memory behavior of commercial DBMSs on existing hardware platforms

Proposed DBMS Performance Study

Analyze the execution time breakdown of several commercial DBMSs on the same hardware platform

Use a workload consisting of simple queries on a memory-resident database

Isolate basic operations and identify common trends across the DBMSs

Identify and analyze bottlenecks and propose solutions

Processor Model: Basic Pipeline Operation

Processor Model: Handling Pipeline Stalls

Non-blocking cache

Out-of-order execution

Speculative execution with branch prediction

Query Execution Time Breakdown

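T_Q = T_C + T_M + T_B + T_R - T_OVL

T_Q: query execution time; T_C: computation time; T_M: memory stalls; T_B: branch misprediction overhead; T_R: resource stalls; T_OVL: stall time overlapped with other work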
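
The breakdown can be sketched in a few lines of Python. This is only an illustration: the class name `TimeBreakdown`, its field names, and the cycle counts in the example are hypothetical and are not measurements from the study. It shows why the overlap term has to be subtracted: the four components are measured separately, so time in which a stall is hidden behind other work would otherwise be counted twice.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    """Components of query execution time, all in the same unit (e.g. cycles)."""
    t_c: float          # computation time
    t_m: float          # memory stall time
    t_b: float          # branch misprediction overhead
    t_r: float          # resource stall time
    t_ovl: float = 0.0  # stall time overlapped with other work

    def total(self) -> float:
        # T_Q = T_C + T_M + T_B + T_R - T_OVL
        return self.t_c + self.t_m + self.t_b + self.t_r - self.t_ovl

    def shares(self) -> dict:
        # Each component as a fraction of T_Q. The fractions add up to
        # more than 1 whenever t_ovl > 0, because overlapped time is
        # counted in more than one component.
        t_q = self.total()
        return {
            "computation": self.t_c / t_q,
            "memory": self.t_m / t_q,
            "branch": self.t_b / t_q,
            "resource": self.t_r / t_q,
        }


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example = TimeBreakdown(t_c=30, t_m=40, t_b=10, t_r=25, t_ovl=5)
print(example.total())
print(example.shares())
```

With these made-up inputs the components add up to 105 units while the total is 100, which is the double counting that T_OVL corrects.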

Database Workload

Single-table range selections and two-table equijoins over a memory-resident database, running a single command stream

Eliminates dynamic and random parameters

Isolates basic operations: sequential access and index selection

Allows examination of the processor and memory behavior without I/O interference

Database Workload

Table:

    create table R (a1 integer not null,
                    a2 integer not null,
                    a3 integer not null,
                    <rest of fields>)

Sequential range selection:

    select avg(a3)
    from R
    where a2 < Hi and a2 > Lo
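
As a rough way to try the range selection, the sketch below runs it against an in-memory SQLite database from Python. SQLite is only a stand-in and is not one of the commercial DBMSs in the study; the helper names `build_r` and `range_select_avg`, the 1,000-row table, and the way `a2` is populated are assumptions made for illustration. `Lo` and `Hi` are passed as query parameters and determine how many rows qualify.

```python
import sqlite3


def build_r(conn, n_rows):
    """Create R and fill it with synthetic rows (a1 = a3 = i, a2 = i mod 100)."""
    conn.execute(
        "create table R (a1 integer not null,"
        " a2 integer not null,"
        " a3 integer not null)"
    )
    rows = [(i, i % 100, i) for i in range(n_rows)]
    conn.executemany("insert into R values (?, ?, ?)", rows)


def range_select_avg(conn, lo, hi):
    """Run the sequential range selection with Lo and Hi bound as parameters."""
    cur = conn.execute(
        "select avg(a3) from R where a2 < ? and a2 > ?", (hi, lo)
    )
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")  # memory-resident, so no disk I/O
build_r(conn, 1000)

# a2 takes the values 0..99 uniformly, so 9 < a2 < 20 selects 10% of the rows.
result = range_select_avg(conn, lo=9, hi=20)
print(result)
```

Because no index exists on `a2` here, SQLite has to scan the whole table, which corresponds to the sequential access case.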

Database Workload

Indexed range selection: construct a non-clustered index on R.a2, then resubmit the range selection

Sequential join:

    select avg(R.a3)
    from R, S
    where R.a2 = S.a1

40,000 100-byte records in S, each of which joins with 30 records in R
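
The indexed selection and the join can be tried the same way on a scaled-down data set. The sketch below is a toy under stated assumptions: it uses in-memory SQLite rather than any of the DBMSs studied, gives S the same three integer columns as R (the slides do not show S's schema), shrinks S to 4 records instead of 40,000, and uses hypothetical names (`N_S`, `FANOUT`, `range_avg`, `R_a2_idx`). It keeps the one property stated above: each record in S joins with 30 records in R.

```python
import sqlite3

N_S = 4       # records in S (40,000 in the slides; shrunk for illustration)
FANOUT = 30   # records in R that join with each record in S

conn = sqlite3.connect(":memory:")
for table in ("R", "S"):
    # Assumption: S has the same columns as R.
    conn.execute(
        f"create table {table} (a1 integer not null,"
        " a2 integer not null, a3 integer not null)"
    )

conn.executemany("insert into S values (?, ?, ?)",
                 [(k, k, k) for k in range(N_S)])
# R.a2 cycles through the S.a1 values, so every S record matches FANOUT rows.
conn.executemany("insert into R values (?, ?, ?)",
                 [(i, i % N_S, i) for i in range(N_S * FANOUT)])


def range_avg(lo, hi):
    """The range selection on R, with Lo and Hi bound as parameters."""
    return conn.execute(
        "select avg(a3) from R where a2 < ? and a2 > ?", (hi, lo)
    ).fetchone()[0]


# Indexed range selection: same query, resubmitted after building a
# secondary index on R.a2. The answer must not change, only the access path.
before = range_avg(0, 3)
conn.execute("create index R_a2_idx on R (a2)")
after = range_avg(0, 3)

# Sequential join, as written in the slides.
join_avg = conn.execute(
    "select avg(R.a3) from R, S where R.a2 = S.a1"
).fetchone()[0]

# Check the fan-out: number of R matches for each S record.
matches_per_s = [count for (count,) in conn.execute(
    "select count(*) from R, S where R.a2 = S.a1"
    " group by S.a1 order by S.a1"
)]
print(before, after, join_avg, matches_per_s)
```

Whether SQLite actually uses the index for the resubmitted query is up to its planner; the sketch only checks that the result is unchanged.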

Experimental Setup: Hardware and Software Platform

400 MHz PII Xeon/MT Workstation

512 MB main memory with 100 MHz system bus

Out-of-order engine and speculative instruction execution

Non-blocking cache

Separate data and instruction first level caches

Unified second level cache

4 commercial DBMSs on Windows NT 4.0 Service Pack 4

Event measurement counters and emon

Experimental Setup: PII Xeon Cache Characteristics

Experimental Setup: Measuring Stall Time Components

Results: Execution Time Breakdown

Processor spends most of the time stalled

The problem will be exacerbated by the ever-increasing processor-memory gap

Bottleneck shifts

Results: Memory Stalls Breakdown

L1 D-cache, L2 I-cache, and ITLB stall times are insignificant

Focus on the L1 I-cache and L2 D-cache stall time components

Results: L2 D-cache Stall Time

Depends on the position of the accessed data within the records and on the record size

An L2 D-cache miss is much more expensive than an L1 D-cache miss

Only gets worse as the processor-memory performance gap increases

Larger cache => longer latency

Results: L1 I-cache Stall Time

L1 I-cache misses are difficult to overlap and cause a serial bottleneck in the pipeline

L1 cache size vs. latency

L1 I-cache misses increase as data record size increases:
- Inclusion: L2 cache replacement forces L1 cache replacement
- OS interrupts: periodic context switching
- Page boundary crossing

Results: Branch Mis-prediction

Serial bottleneck and instruction cache misses

40% BTB misses on average => more static prediction

L1 I-cache misses follow branch mis-prediction behavior

Results: Resource Stall Time

Dominated by dependency and/or functional unit (FU) stalls

Dependency stalls, caused by low ILP, are the most important resource stalls, except for System A

FU stalls are caused by contention in the execution unit

Results: Simple Query vs TPC Benchmarks

Simple Query vs TPC-D (DSS):
- Similar CPI breakdown
- Still dominated by L1 I-cache and L2 D-cache misses

Simple Query vs TPC-C (OLTP):
- CPI of TPC-C is much higher
- Resource stalls are higher
- Dominated by L2 D- and I-cache misses

Conclusion

Memory stalls are a serious performance bottleneck

Focus on L1 I-cache and L2 D-cache misses

Improvements should address all of the stall components, because fixing one can shift the bottleneck to another

Simple queries offer a methodological advantage

TPC-D has a similar execution time breakdown, while TPC-C incurs more second level cache and resource stalls