
(C) 2003 Multifacet Project University of Wisconsin-Madison

Evaluating a $2M Commercial Server on a $2K PC

and Related Challenges

Mark D. Hill

Multifacet Project (www.cs.wisc.edu/multifacet)

Computer Sciences Department

University of Wisconsin—Madison

February 2003

Wisconsin Multifacet Project2 Methods

Context & Summary

• Commercial Servers
  – Processors, memory, disks: $2M
  – Run large multithreaded transaction-oriented workloads
  – Use commercial applications on commercial OS

• To Simulate on $2K PC
  – Scale & tune workloads (keep L2 miss rates, etc.)
  – Manage simulation complexity (separate timing & function)
  – Cope with workload variability (use randomness & statistics)

• NSF Challenges in Computer Architecture Evaluation
  – Advice researchers, program committees, & funders basically “know,” but often forget to heed

Multifacet: Commercial Server Design

• Wisconsin Multifacet Project
  – Directed by Mark D. Hill & David A. Wood
  – Sponsors: NSF, WI, IBM, Intel, & Sun
  – Current Contributors: Alaa Alameldeen, Brad Beckmann, Milo Martin, Mike Marty, Kevin Moore, & Min Xu

• Commercial Server Availability
  – SafetyNet tolerates some transient faults [ISCA 2002]

• Commercial Server Software Complexity
  – Flight Data Recorder aids debugging of multithreaded programs [ISCA 2003]

• Commercial Server Design Complexity
  – Token Coherence eases coherence protocol design [IEEE Micro Top Picks, Nov-Dec 2003]

Outline

• Workload & Simulation Methods
  – Select, scale, & tune workloads
  – Transition workload to simulator
  – Specify & test the proposed design
  – Evaluate design with simple/detailed processor models

• Separate Timing & Functional Simulation

• Cope with Workload Variability

• NSF Challenges in Computer Architecture Evaluation

Multifacet Simulation Overview

• Virtutech Simics (www.virtutech.com)

• Rest is Multifacet software

[Figure: infrastructure overview. Components: Commercial Server (Sun Fire V880), Full Workloads, Scaled Workloads, Full System Functional Simulator (Simics), Memory Timing Simulator (Ruby), Processor Timing Simulator (Opal), Memory Protocol Generator (SLICC), Pseudo-Random Protocol Checker. Region labels: Workload Development, Timing Simulator, Protocol Development.]

Select Important Workloads

• Online Transaction Processing: DB2 w/ TPC-C-like
• Java Server Workload: SPECjbb
• Static web content serving: Apache
• Dynamic web content serving: Slashcode
• Java-based Middleware

[Figure: Full Workloads]

Setup & Tune Workloads (on real hardware)

• Tune workload, OS parameters
• Measure transaction rate, speed-up, miss rates, I/O
• Compare to published results

[Figure: Commercial Server (Sun Fire V880); Full Workloads]

Scale & Re-tune Workloads

• Scale-down for PC memory limits
• Retaining similar behavior (e.g., L2 cache miss rate)
• Re-tune to achieve higher transaction rates (OLTP: raw disk, multiple disks, more users, etc.)

[Figure: Commercial Server (Sun Fire V880); Scaled Workloads]

Transition Workloads to Simulation

• Create disk dumps of tuned workloads
• In simulator: Boot OS, start, & warm application
• Create Simics checkpoint (snapshot)

[Figure: Full System Functional Simulator (Simics); Scaled Workloads]

Specify Proposed Computer Design

• Coherence Protocol (control tables: states × events)
• Cache Hierarchy (parameters & queues)
• Interconnect (switches & queues)
• Processor (later)

[Figure: Memory Timing Simulator (Ruby); Memory Protocol Generator (SLICC)]
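The "control tables: states × events" idea can be illustrated with a small table-driven controller. The sketch below is plain Python, not SLICC syntax. The three-state MSI protocol, the event names, and the `step` and `run` helpers are illustrative assumptions, not a Multifacet protocol, and transient states are omitted for brevity.

```python
# Hypothetical MSI cache-controller table: (state, event) -> (next_state, action).
# Illustrative only: neither SLICC syntax nor an actual Multifacet protocol.
TABLE = {
    ("I", "Load"):      ("S", "issue GetS"),
    ("I", "Store"):     ("M", "issue GetM"),
    ("I", "OtherGetS"): ("I", "ignore"),
    ("I", "OtherGetM"): ("I", "ignore"),
    ("S", "Load"):      ("S", "hit"),
    ("S", "Store"):     ("M", "issue GetM"),
    ("S", "OtherGetS"): ("S", "ignore"),
    ("S", "OtherGetM"): ("I", "invalidate"),
    ("M", "Load"):      ("M", "hit"),
    ("M", "Store"):     ("M", "hit"),
    ("M", "OtherGetS"): ("S", "send data, downgrade"),
    ("M", "OtherGetM"): ("I", "send data, invalidate"),
}


def step(state, event):
    """Look up one table cell; a missing cell is a hole in the specification."""
    if (state, event) not in TABLE:
        raise KeyError(f"unspecified transition: state={state}, event={event}")
    return TABLE[(state, event)]


def run(events, state="I"):
    """Apply a sequence of events and return (event, new_state, action) tuples."""
    trace = []
    for event in events:
        state, action = step(state, event)
        trace.append((event, state, action))
    return trace
```

Writing the protocol as a table makes every state/event pair explicit, so an unspecified pair surfaces as an error instead of silent behavior.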

Test Proposed Computer Design

• Randomly select write action & later read check
• Massive false-sharing for interaction
• Perverse network stresses design
• Transient error & deadlock detection
• Sound but not complete

[Figure: Memory Timing Simulator (Ruby); Pseudo-Random Protocol Checker]
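A minimal sketch of the "random write, later read check" idea follows. It runs against a toy write-invalidate memory model, not Ruby. The `ToyCoherentMemory` class, the `buggy` switch, and `random_test` are hypothetical names introduced only for illustration. Using very few addresses forces the simulated CPUs to contend, loosely mirroring the heavy sharing the slide calls for.

```python
import random


class ToyCoherentMemory:
    """Hypothetical write-invalidate model: per-CPU caches over one memory."""

    def __init__(self, num_cpus, buggy=False):
        self.mem = {}
        self.caches = [dict() for _ in range(num_cpus)]
        self.buggy = buggy  # if True, skip invalidations to inject a bug

    def write(self, cpu, addr, value):
        self.mem[addr] = value  # write through to memory
        self.caches[cpu][addr] = value
        if not self.buggy:
            for other, cache in enumerate(self.caches):
                if other != cpu:
                    cache.pop(addr, None)  # invalidate remote copies

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:
            cache[addr] = self.mem.get(addr, 0)  # miss: fill from memory
        return cache[addr]


def random_test(system, num_cpus, num_addrs=2, steps=10_000, seed=0):
    """Return the step index of the first failed read check, or None."""
    rng = random.Random(seed)
    expected = {}  # last value written to each address
    for i in range(steps):
        cpu, addr = rng.randrange(num_cpus), rng.randrange(num_addrs)
        if rng.random() < 0.5:
            expected[addr] = i  # unique value per write
            system.write(cpu, addr, i)
        elif system.read(cpu, addr) != expected.get(addr, 0):
            return i
    return None
```

In this sketch a reported failure is a real violation in the model, while a clean run does not prove the model correct, which is the sense of "sound but not complete" above.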

Simulate with Simple Blocking Processor

• Warm-up caches or sometimes sufficient (SafetyNet)
• Run for fixed number of transactions
  – Some transactions partially done at start
  – Other transactions partially done at end
• Cope with workload variability (later)

[Figure: Full System Functional Simulator (Simics); Memory Timing Simulator (Ruby); Scaled Workloads]
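One way to read "run for fixed number of transactions" is to measure the cycles needed for N transactions to complete after the measurement starts. The sketch below assumes a list of transaction completion times is available. The function name and its arguments are hypothetical, not part of the Multifacet tools.

```python
def cycles_per_transaction(completion_cycles, start_cycle, num_transactions):
    """Cycles from start_cycle until num_transactions more transactions
    complete, divided by num_transactions.

    Transactions already in flight at start_cycle count when they finish;
    transactions still in flight at the end are not counted.
    """
    done = sorted(c for c in completion_cycles if c > start_cycle)
    if len(done) < num_transactions:
        raise ValueError("run too short: not enough transactions completed")
    return (done[num_transactions - 1] - start_cycle) / num_transactions
```

For example, with completions at cycles 50, 120, 260, 300, 410, and 590, a start at cycle 100, and N = 4, the fourth counted completion is at cycle 410, giving (410 - 100) / 4 = 77.5 cycles per transaction.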

Simulate with Detailed Processor

• Accurate (future) timing & (current) function
• Simulation complexity decoupled (discussed soon)
• Same transaction methodology & workload variability issues

[Figure: Full System Functional Simulator (Simics); Memory Timing Simulator (Ruby); Processor Timing Simulator (Opal); Scaled Workloads]

Simulation Infrastructure & Workload Process

• Select important workloads: run, tune, scale, & re-tune
• Specify system & pseudo-randomly test
• Create warm workload checkpoint
• Simulate with simple or detailed processor
• Fixed #transactions, manage simulation complexity (next), cope with workload variability (next next)

[Figure: infrastructure overview. Commercial Server (Sun Fire V880); Full Workloads; Scaled Workloads; Full System Functional Simulator (Simics); Memory Timing Simulator (Ruby); Processor Timing Simulator (Opal); Pseudo-Random Protocol Checker; Memory Protocol Generator (SLICC)]

Outline

• Workload & Simulation Methods

• Separate Timing & Functional Simulation
  – Simulation Challenges & Complexity
  – Timing-First Simulation

• Cope with Workload Variability

• NSF Challenges in Computer Architecture Evaluation

Simulating Function Getting Harder!

[Figure: (Simulated) Target System. Software labels: Target Application, SPEC Benchmarks, Kernels, Database, Web Server, Operating System. Hardware labels: Processor, MMU, RAM, PCI Bus, Ethernet Controller, Fiber Channel Controller, Graphics Card, SCSI Controller, SCSI Disk, SCSI Disk…, CD-ROM, DMA Controller, Terminal, I/O MMU Controller, IRQ Controller, Status Registers, Serial Port, Real Time Clock.]

Simulating Timing Getting Harder!

• Micro-architecture complexity
  – Multiple “in-flight” instructions
  – Speculative execution
  – Out-of-order execution

• Thread-level parallelism
  – Hardware Multi-threading
  – Traditional Multi-processing

Managing Simulator Complexity

• Integrated (SimOS): one combined timing and functional simulator
  – Con: complex

• Functional-First (trace-driven): functional simulator feeds timing simulator
  – Con: no timing feedback

• Timing-Directed: timing simulator (complete timing, no? function) paired with functional simulator (no timing, complete function)
  – Pro: timing feedback
  – Con: tight coupling
  – Con: performance?

• Timing-First (Multifacet): timing simulator (complete timing, partial function) paired with functional simulator (no timing, complete function)

Timing-First Operation

[Figure: Timing Simulator and Functional Simulator blocks (labels: CPU, System, RAM, Network, Cache; example instructions add, load) connected by arrows labeled Execute, Commit, Verify, and Reload.]

• Timing Simulator runs speculatively ahead
• On commit, calls Functional Simulator to verify
• Reload Timing Simulator state if necessary, e.g., interrupt, unimplemented instruction
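The commit, verify, and reload loop can be sketched on a toy machine. This is a minimal illustration under stated assumptions, not the Opal/Simics interface. The three-instruction ISA, the latencies, and every function name are hypothetical. For brevity the timing model is in-order and does not speculate. The unimplemented `mul` stands in for the "unimplemented instruction" case above.

```python
import copy


def functional_step(state, inst):
    """Complete function, no timing: the trusted reference model."""
    op, rd, a, b = inst
    regs = state["regs"]
    if op == "add":
        regs[rd] = regs[a] + regs[b]
    elif op == "mul":
        regs[rd] = regs[a] * regs[b]
    elif op == "load":
        regs[rd] = state["mem"].get(regs[a] + b, 0)
    state["pc"] += 1


def timing_step(state, inst):
    """Complete timing, partial function: 'mul' is deliberately unimplemented."""
    op, rd, a, b = inst
    regs = state["regs"]
    latency = {"add": 1, "mul": 3, "load": 2}[op]  # assumed latencies
    if op == "add":
        regs[rd] = regs[a] + regs[b]
    elif op == "load":
        regs[rd] = state["mem"].get(regs[a] + b, 0)
    # 'mul' falls through, leaving a wrong result until verification reloads it
    state["pc"] += 1
    return latency


def run_timing_first(program, init):
    """Return (final registers, total cycles, number of reloads)."""
    func, timing = copy.deepcopy(init), copy.deepcopy(init)
    cycles = reloads = 0
    for inst in program:
        cycles += timing_step(timing, inst)  # timing simulator executes first
        functional_step(func, inst)          # at commit, step the reference
        if timing != func:                   # verify architected state
            timing = copy.deepcopy(func)     # reload from the reference
            reloads += 1
    return func["regs"], cycles, reloads
```

Because the functional model always supplies the final state, a gap in the timing model costs a reload and some timing accuracy but does not corrupt the run.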

Timing-First Discussion

• Supports speculative multi-processor timing models
• Leverages existing simulators
• Rapid development time (e.g., immediate checks)
• Has low simulation overhead (18% uniprocessor)
• Introduces relatively little performance error (< 3%)
• BUT duplicates some code & function

[Figure: Timing-First Simulation. Timing Simulator (complete timing, partial function); Functional Simulator (no timing, complete function)]

Outline

• Workload & Simulation Methods

• Separate Timing & Functional Simulation

• Cope with Workload Variability
  – Variability in Multithreaded Workloads
  – Coping in Simulation

• NSF Challenges in Computer Architecture Evaluation

What is Happening Here?

[Figure: OLTP plot]

• How can slower memory lead to a faster workload?

• Answer: Multithreaded workload takes different path
  – Different lock race outcomes
  – Different scheduling decisions

• (1) Does this happen for real hardware?

• (2) If so, what should we do about it?

One Second Intervals (on real hardware)

[Figure: OLTP plot]

60 Second Intervals (on real hardware)

[Figure: OLTP plot, annotated “16-day simulation”]

Coping with Workload Variability

• Running (simulating) long enough not appealing
• Need to separate coincidental & real effects
• Standard statistics on real hardware
  – Variation within base system runs vs. variation between base & enhanced system runs
  – But deterministic simulation has no “within” variation
• Solution with deterministic simulation
  – Add pseudo-random delay on L2 misses
  – Simulate base (enhanced) system many times
  – Use simple or complex statistics
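The perturb-and-repeat recipe can be sketched as follows. The `simulate` function is a hypothetical stand-in for a full simulation run: it only sums miss latencies, whereas in a real multithreaded simulation the injected delays would also change lock-race and scheduling outcomes. The latencies, jitter range, run counts, and the normal-approximation 95% interval are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(l2_miss_latency, seed, num_misses=1000, max_jitter=4):
    """Hypothetical stand-in for one simulation run; returns total cycles.

    Each L2 miss gets a small pseudo-random extra delay, so runs with
    different seeds differ even though each run is deterministic.
    """
    rng = random.Random(seed)
    cycles = 0
    for _ in range(num_misses):
        cycles += l2_miss_latency + rng.randint(0, max_jitter)
    return cycles


def summarize(samples, z=1.96):
    """Mean and approximate 95% confidence half-width (normal approximation)."""
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / len(samples) ** 0.5
    return mean, half_width


# Simulate the base and the enhanced system many times with different seeds.
base = [simulate(80, seed) for seed in range(20)]
enhanced = [simulate(78, seed + 1000) for seed in range(20)]
base_mean, base_half = summarize(base)
enh_mean, enh_half = summarize(enhanced)

# A difference is treated as real only if the two intervals do not overlap.
intervals_separate = enh_mean + enh_half < base_mean - base_half
```

Comparing intervals instead of single runs is what separates a real effect from a coincidental one.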

Confidence Interval Example

• Estimate #runs to get non-overlapping confidence intervals

[Figure: confidence-interval plot; label: ROB]
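A rough way to estimate the number of runs is to ask when the two half-widths together become smaller than the gap between the means. The sketch below assumes pilot estimates of each system's mean and standard deviation, equal run counts for both systems, and a normal approximation. The function name and the example numbers are hypothetical.

```python
import math


def runs_needed(mean_a, sd_a, mean_b, sd_b, z=1.96):
    """Smallest n with z*sd_a/sqrt(n) + z*sd_b/sqrt(n) < |mean_a - mean_b|."""
    gap = abs(mean_a - mean_b)
    if gap == 0:
        raise ValueError("means are equal; intervals cannot be separated")
    bound = (z * (sd_a + sd_b) / gap) ** 2
    return max(2, math.floor(bound) + 1)
```

For example, means of 100 and 97 with a standard deviation of 5 each give a bound of (1.96 × 10 / 3)² ≈ 42.7, so about 43 runs per system. Small gaps relative to the variability drive the run count up quadratically.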


Outline

• Workload & Simulation Methods

• Separate Timing & Functional Simulation

• Cope with Workload Variability

• NSF Challenges in Computer Architecture Evaluation
  – Advice researchers, program committees, & funders basically “know,” but often forget to heed

NSF Challenges in Computer Architecture Evaluation

• Dec 2001 NSF Computer Systems Architecture Workshop
  – Report in IEEE Computer, Aug 2003
  – By Kevin Skadron, Margaret Martonosi, David August, Mark Hill, David Lilja, & Vijay Pai

• Simulation Frameworks
  – P (Problem): Need more modularity, portability, & reuse
  – R (Recommendation): More simulation frameworks, e.g., ASIM & Liberty

• Benchmarking
  – P: Benchmarks for too few domains
  – R: Reward benchmark development & characterization; consider micro- and synthetic benchmarks

NSF Challenges in Computer Architecture Evaluation

• Abstractions & Methodology
  – P: Simulation believed too much; other methods used insufficiently
    • 1985 ISCA: 30% simulation & 30% modeling
    • 2001 ISCA: 90% simulation & 0% modeling
  – R: Push analytic models for insight, cross validation, & far-reaching research

• Metrics, Accuracy, & Validation
  – P: Too dependent on relative & aggregate metrics
  – R: More metrics & statistical methods, especially when balancing multiple dimensions (e.g., performance & power)

Talk Summary

• Simulations of $2M Commercial Servers must
  – Complete in reasonable time (on $2K PCs)
  – Handle OS, devices, & multithreaded hardware
  – Cope with variability of multithreaded software

• Multifacet
  – Scale & tune transactional workloads
  – Separate timing & functional simulation
  – Cope w/ workload variability via randomness & statistics

• References (www.cs.wisc.edu/multifacet/papers)
  – Simulating a $2M Commercial Server on a $2K PC [Computer 2/03]
  – Full-System Timing-First Simulation [Sigmetrics 02]
  – Variability in Architectural Simulations … [HPCA 03]

• NSF Panel
  – Challenges in Computer Architecture Evaluation [Computer 8/03]


Backup Slides

Other Multifacet Methods Work

• Specifying & Verifying Coherence Protocols
  – [SPAA98], [HPCA99], [SPAA99], & [TPDS02]

• Workload Analysis & Improvement
  – Database systems [VLDB99] & [VLDB01]
  – Pointer-based [PLDI99] & [Computer00]
  – Middleware [HPCA03]

• Modeling & Simulation
  – Commercial workloads [Computer02] & [HPCA03]
  – Decoupling timing/functional simulation [Sigmetrics02]
  – Simulation generation [PLDI01]
  – Analytic modeling [Sigmetrics00] & [TPDS TBA]
  – Micro-architectural slack [ISCA02]
  – Interaction costs [Micro02]