Erhan Erdinç Pehlivan

Computer Architecture Support for Database Applications

Outline
Introduction
Methodology of the Experiment
Analysis of OLTP workloads
Analysis of DSS workloads
Conclusion

Introduction
Today, database workloads alone motivate the sale of vast quantities of symmetric multiprocessor (SMP) machines.

Introduction
Unfortunately, because of several challenges, commercial applications are often ignored in favor of technical benchmarks such as SPEC (Standard Performance Evaluation Corporation).

Reasons
Complex standardized benchmarks
Large hardware requirements for full scale
Numerous configuration parameters
Lack of useful proprietary information

What is SMP?
A method of work management that treats all processors equally
Threads can run concurrently on any available processor
Improves the total throughput of the system
Requires applications that can take advantage of multi-threaded parallelism

SMP ARCHITECTURE

SMP (Continued)
Advantages of SMP: high performance, simplicity of programming, easier load balancing
Disadvantages of SMP: low availability, low scalability

Database Workloads
OLTP (Online Transaction Processing), e.g. airline reservation systems
DSS (Decision Support Systems), e.g. data warehouse systems

Characteristics of OLTP and DSS
OLTP
Uses short, moderately complex queries that read and/or modify a relatively small portion of the overall database.
Has a high degree of multiprogramming.
DSS
Uses typically long-running, moderately to very complex queries that scan large portions of the database in a read-mostly fashion.
The multiprogramming level in DSS systems is typically much lower than that of OLTP systems.

Motivation
Since SPEC evaluations do not carry over to DBMS workloads, the architectural behavior of two standard database workloads is investigated in terms of:
cycles per instruction (CPI) decomposition
cache miss rates
branch behavior
superscalar issue and retire
out-of-order execution

Methodology: Experimental Platform
A commodity four-processor Intel-based SMP server running Windows NT was chosen.

I/O System Configurations (OLTP)

I/O System Configurations (DSS)

Software Architecture (OLTP)
Transaction Processing Performance Council’s TPC-C benchmark

Software Architecture (OLTP)

Software Architecture (DSS)
Transaction Processing Performance Council’s TPC-D benchmark
Models the activity of a wholesale supplier performing complex business analysis.
Analysis areas: pricing and promotions, market share study, shipping management, supply and demand management, profit and revenue management, and customer satisfaction study.
17 read-only queries and 2 update queries

Software Architecture (DSS)

Pentium Pro Processor Architecture

Potential sources of stalls
Misses to the L1 instruction cache
Branch mispredictions
The instruction mix of the workload
The out-of-order execution engine

Measurement Methodology
NT performance monitor
Pentium Pro hardware counters
Intel tool called emon
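As a rough illustration of how raw counter readings become the metrics used in the rest of the talk, the sketch below computes CPI and a simple stall breakdown. The counter names, the sample values, and the fixed per-event penalties are hypothetical placeholders chosen for easy arithmetic; they are not emon output, Pentium Pro event names, or measured results.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical counter readings for one measurement interval."""
    cycles: int
    instructions: int
    l1i_misses: int
    l2_misses: int
    branch_mispredictions: int


def cpi(sample: CounterSample) -> float:
    """Cycles per instruction over the interval."""
    return sample.cycles / sample.instructions


def cpi_breakdown(sample: CounterSample, penalties: dict[str, int]) -> dict[str, float]:
    """Attribute cycles to stall sources using assumed fixed penalties.

    Whatever is not attributed to a listed source is reported as "other"
    (computation plus stalls this simple model does not cover).
    """
    events = {
        "l1i_miss": sample.l1i_misses,
        "l2_miss": sample.l2_misses,
        "branch_mispredict": sample.branch_mispredictions,
    }
    parts = {
        name: count * penalties[name] / sample.instructions
        for name, count in events.items()
    }
    parts["other"] = cpi(sample) - sum(parts.values())
    return parts


# Made-up numbers, chosen only to make the arithmetic easy to follow.
sample = CounterSample(
    cycles=3_000_000,
    instructions=1_000_000,
    l1i_misses=50_000,
    l2_misses=10_000,
    branch_mispredictions=20_000,
)
# Assumed penalties in cycles per event (illustrative, not Pentium Pro figures).
penalties = {"l1i_miss": 4, "l2_miss": 60, "branch_mispredict": 15}

breakdown = cpi_breakdown(sample, penalties)
print(f"CPI = {cpi(sample):.2f}")
for name, value in breakdown.items():
    print(f"  {name}: {value:.2f}")
```

On an out-of-order core, stalls can overlap with each other and with useful work, so a fixed-penalty attribution like this is best read as an upper bound on each component rather than an exact accounting.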

Analysis of OLTP Workloads
OLTP performs short, moderately complex transactions.
Small, random I/O operations.
Large number of concurrent users and a high degree of multiprogramming.
The database implements locking and logging.
The combination of these tasks leads to:
a large instruction working set
a larger data footprint

Experimental Results: CPI

Experimental Results: Memory System Behavior

How do OLTP cache miss rates vary with L2 cache size?
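To make the question concrete, here is a toy experiment of the same shape: replay one address trace against caches of different sizes and compare miss rates. It is only a sketch under strong assumptions (a fully associative LRU cache and a synthetic trace with uniform random reuse); it does not model the Pentium Pro's L2 or the TPC-C trace, and its numbers say nothing about the measured results.

```python
import random
from collections import OrderedDict


def lru_miss_rate(trace, capacity_lines):
    """Miss rate of a fully associative LRU cache over a trace of line addresses."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(trace)


rng = random.Random(0)  # fixed seed so runs are repeatable
# Synthetic trace: uniform random reuse over a fixed working set of lines.
working_set_lines = 4096
trace = [rng.randrange(working_set_lines) for _ in range(200_000)]

for capacity in (256, 1024, 4096):
    print(f"{capacity:5d} lines: miss rate {lru_miss_rate(trace, capacity):.3f}")
```

With this kind of trace the miss rate falls as the cache approaches the working-set size. A real set-associative cache and a real workload would produce a different curve.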

Experimental Results: Memory System
What effects do larger caches have on OLTP throughput and stall cycles?

Experimental Results: Processor Issues
How useful is superscalar issue and retire for OLTP?

Experimental Results: Processor Issues
How effective is branch prediction for OLTP?
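For readers unfamiliar with how a misprediction rate comes about, the sketch below simulates a generic textbook predictor: a table of 2-bit saturating counters indexed by the low bits of the branch address. This is an assumption made for illustration and is simpler than the Pentium Pro's actual predictor. The two branch traces are synthetic.

```python
def mispredict_rate(outcomes, table_bits=10):
    """Misprediction rate of a table of 2-bit saturating counters.

    `outcomes` is an iterable of (branch_address, taken) pairs.
    Counter values 0-1 predict not taken, 2-3 predict taken.
    """
    table = [1] * (1 << table_bits)   # every counter starts weakly not-taken
    mask = (1 << table_bits) - 1
    wrong = total = 0
    for address, taken in outcomes:
        index = address & mask
        predicted_taken = table[index] >= 2
        if predicted_taken != taken:
            wrong += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            table[index] = min(3, table[index] + 1)
        else:
            table[index] = max(0, table[index] - 1)
        total += 1
    return wrong / total if total else 0.0


# A loop branch: taken 9 times, then falls through, repeated 100 times.
loop_branch = [(0x40, i % 10 != 9) for i in range(1000)]
# A branch that strictly alternates, which this scheme handles poorly.
alternating = [(0x80, i % 2 == 0) for i in range(1000)]

print(f"loop branch: {mispredict_rate(loop_branch):.3f}")
print(f"alternating: {mispredict_rate(alternating):.3f}")
```

The loop branch is mispredicted once at start-up and once per loop exit, while the alternating branch defeats this scheme on every execution. How well a predictor copes depends on branch behavior of this kind, which is what the measurement is probing.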


Is out-of-order execution successful at hiding stalls for OLTP?

Experimental Results: Multiprocessor Scaling Issues
How well does OLTP performance scale as the number of processors increases?
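Scaling questions of this kind are usually summarized as speedup and parallel efficiency relative to the one-processor run. The sketch below shows that arithmetic. The throughput values are placeholders, not the measured results from this study, and `scaling_report` is a hypothetical helper name.

```python
def scaling_report(throughput_by_cpus):
    """Return {cpus: (speedup, efficiency)} relative to the one-processor run."""
    base = throughput_by_cpus[1]
    report = {}
    for cpus, throughput in sorted(throughput_by_cpus.items()):
        speedup = throughput / base
        report[cpus] = (speedup, speedup / cpus)
    return report


# Placeholder throughput values (transactions per minute), not measured data.
measured = {1: 1000.0, 2: 1800.0, 4: 3000.0}

for cpus, (speedup, efficiency) in scaling_report(measured).items():
    print(f"{cpus} CPU(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency below 100% means each added processor contributes less than a full processor's worth of throughput.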

Experimental Results: Multiprocessor Scaling Issues
How do OLTP CPI components change as the number of processors is scaled?

Experimental Results: Multiprocessor Scaling Issues

How prevalent are cache misses to dirty data in other processors’ caches for OLTP?

Experimental Results: Multiprocessor Scaling Issues
Is the four-state (MESI) invalidation-based cache coherence protocol worthwhile for OLTP?
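The difference between the four-state and three-state protocols is the Exclusive state, which lets a processor write a line it alone has read without a further bus transaction. The sketch below counts bus transactions for a single cache line under each protocol. It is a deliberately simplified model (one line, no evictions, atomic bus), and the access sequences are invented for illustration.

```python
def bus_transactions(accesses, protocol):
    """Count bus transactions for one cache line under "MSI" or "MESI".

    `accesses` is a list of (cpu, op) pairs, with op "r" or "w".
    Simplified model: one line, no evictions, atomic bus.
    """
    state = {}            # cpu -> "M", "E", or "S"; absent means Invalid
    transactions = 0
    for cpu, op in accesses:
        current = state.get(cpu)
        others = [c for c in state if c != cpu]
        if op == "r":
            if current is None:                  # read miss goes to the bus
                transactions += 1
                for c in others:                 # M/E holders downgrade to S
                    state[c] = "S"
                alone = not others
                state[cpu] = "E" if (protocol == "MESI" and alone) else "S"
        else:
            if current not in ("M", "E"):        # Invalid or Shared
                transactions += 1                # invalidate / read-for-ownership
                for c in others:
                    del state[c]
            # From M or E the write is silent: no bus transaction needed.
            state[cpu] = "M"
    return transactions


private = [(0, "r"), (0, "w")]             # one CPU reads, then writes
shared = [(0, "r"), (1, "r"), (0, "w")]    # a second CPU reads in between

for name, seq in (("private", private), ("shared", shared)):
    print(name, "MSI:", bus_transactions(seq, "MSI"),
          "MESI:", bus_transactions(seq, "MESI"))
```

In this model MESI saves a transaction only for the private read-then-write pattern. Whether that pattern is common enough to justify the extra state depends on the workload, which is the question this slide asks.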

Experimental Results: Multiprocessor Scaling Issues
How does OLTP memory system performance scale with increasing cache sizes and increasing processor count?

Analysis of Decision Support Workloads
DSS queries are typically long-running, moderately to very complex queries.
They scan large portions of the database in a read-mostly fashion.
Large sequential disk I/O read operations.
The multiprogramming level in DSS systems is typically lower than that of OLTP systems.

DSS Workload

Experimental Results: Memory System Behavior
How do DSS cache miss rates vary with L2 cache size?

Experimental Results: Memory System Behavior
What impact do larger L2 caches have on DSS database performance and stall cycles?

Experimental Results: Memory System Behavior
How prevalent are cache misses to dirty data in other processors’ caches in DSS?

Experimental Results: Memory System Behavior
Is the four-state (MESI) invalidation-based cache coherence protocol worthwhile for DSS?

Experimental Results: Memory System Behavior
How does DSS memory system performance scale with increasing cache sizes?


How useful is superscalar issue and retire for DSS?
Behaves like OLTP.

Experimental Results: Processor Issues
How effective is branch prediction for DSS?

Experimental Results: Processor Issues
Is out-of-order execution successful at hiding stalls for DSS?

Conclusions for OLTP
Out-of-order execution is only somewhat effective for this database workload.
Increased superscalar width for the out-of-order engine may be helpful.
Innovation is needed in branch prediction algorithms and hardware structures to better support database workloads.
Caches are effective at reducing the processor traffic to memory.
A three-state (MSI) cache coherence protocol would be better.
The amount of time when the memory system is unavailable decreases with larger caches and increases with the number of processors.

Conclusions for DSS
Out-of-order execution provides potentially more benefit for DSS than for OLTP.
DSS performance is less sensitive to L2 cache size than OLTP performance.
Existing branch prediction schemes are more effective for this workload.
Increasing the micro-operation retire width in the Pentium Pro’s out-of-order RISC core may provide performance improvements.
Dirty misses are less prevalent for DSS than for OLTP.
