CS 194 Project Results

21
An Investigation of Xen and PTLsim for Exploring Latency Constraints of Co- Processing Units Grant Jenks UCLA

description

CS 194 Project Results. An Investigation of Xen and PTLsim for Exploring Latency Constraints of Co-Processing Units Grant Jenks UCLA. Outline. The Problem Theme The Solution Vtune PTLsim/Xen The Results Research Platform Vtune Data PTLsim Data. The Problem. - PowerPoint PPT Presentation

Transcript of CS 194 Project Results

Page 1: CS 194 Project Results

An Investigation of Xen and PTLsim for Exploring Latency Constraints of Co-Processing Units

Grant JenksUCLA

Page 2: CS 194 Project Results

The Problem◦ Theme

The Solution◦ Vtune◦ PTLsim/Xen

The Results◦ Research Platform◦ Vtune Data◦ PTLsim Data

Page 3: CS 194 Project Results

Original Problem Statement◦ What impact do co-processing units have on

system performance? Motivation

◦ Extending the useful life of current systems.◦ Enhancing the performance of future systems.

How well do PTLsim and Xen answer this question?

Theme◦ Tradeoffs

Page 4: CS 194 Project Results

Step 1◦ Intel’s Vtune Software

Application specific performance statistics on Microsoft Windows computer

Step 2◦ PTLsim/Xen

Allows for custom machine models Cycle accurate statistics Most accurate model for actual physical hardware

Page 5: CS 194 Project Results

Evaluate need for more processing power. Use Intel’s Vtune software to collect runtime

profiles of a system under various loads. Varying Loads

◦ Setup 1 Hyperthreading on, demanding game

◦ Setup 2 Hyperthreading off, demanding game

◦ Setup 3 movie playing, networked file transfer, anti-virus scan

◦ Setup 4 movie playing, file compression

◦ Setup 5 demanding game, music playing, anti-virus scan

Page 6: CS 194 Project Results

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Processor activity is high

Processor utilization is low

Too many threads ready to run

Processor activity is high

Processor utilization is high

Too many threads ready to run

Processor activity is high

Processor utilization is low

Too many threads ready to run

Processor utilization is low

Too many threads ready to run

Processor activity is high

Processor utilization is low

Too many threads ready to run

Setu

p 1:

Setu

p 2:

Setu

p 3:

Setu

p 4:

Setu

p 5:

Part 1: Vtune Test Results

Page 7: CS 194 Project Results

Methodology◦ Reboot the computer◦ Start the programs to be evaluated◦ Start the Vtune monitor◦ Activate the programs◦ Run the Vtune monitor for one minute

Limitations◦ Runs in same space as measured programs

Inevitably affects the environment◦ Repeated runs are not identical

5 trials were run for each setup

Page 8: CS 194 Project Results

PTLsim/Xen Capabilities◦ Simulate any processor model with cycle accurate

measurements PTLsim/Xen layers

◦ Special machine hardware◦ Patched Xen virtualization layer◦ Modified host operating system kernel◦ Modified guest operating system

Page 9: CS 194 Project Results

Requirements◦ 64-bit◦ Virtualization technology support

Complications◦ Processor too new◦ Raid controller◦ Two Ethernet ports

Page 10: CS 194 Project Results

Xen is a virtualization platform for guest operating sytems

Requirements◦ Special C libraries◦ Operating system with Xen 3.0◦ gcc 4.0 to build modifications

Complications◦ Hard to debug

Kernel panic error◦ May silently fail◦ Version patched by PTLsim is not the current

release version

Page 11: CS 194 Project Results

6 operating systems tried◦ OpenSuse 10.2/10.1, Fedora Core 6, Ubuntu 6,

CentOS 5, Xen 3 Finally used OpenSuse 10.2 once raid issues

were overcome Built a custom kernel for the system

◦ 2.6.18 vs. 2.6.20◦ PTLsim patches

Page 12: CS 194 Project Results

Initially tried importing DomU from other sources

Final method imported DomU from a bootable system

Complications◦ Kernel panic error

Use kernel from PTLsim site with no ram disk◦ No networking◦ No graphics

Page 13: CS 194 Project Results

PTLsim in Dom0◦ C files for simulated machine models◦ Can monitor multiple DomU’s simultaneously◦ Very fast◦ No CMP support, only SMT

PTLsim in DomU◦ Completely oblivious◦ No impact as opposed to Vtune

Page 14: CS 194 Project Results

Methodology◦ Start guest domain in paused state and connect

console◦ Use PTLsim to boot guest domain◦ Start benchmark in DomU◦ Start logging in Dom0

Three benchmarks used◦ File compression◦ Random number generation◦ Combination

Page 15: CS 194 Project Results

0.00%

2.00%

4.00%

6.00%

8.00%

10.00%

12.00%

14.00%

16.00%

Branch Misprediction Rate

Data Cache Load Miss Rate

Instruction Cache Miss Rate

tar

smt-1

smt-2

smt-4

smt-6

smt-8

Page 16: CS 194 Project Results

0.00%

2.00%

4.00%

6.00%

8.00%

10.00%

12.00%

Branch Misprediction Rate

Data Cache Load Miss Rate

Instruction Cache Miss Rate

perl

smt-1

smt-2

smt-4

smt-6

smt-8

Page 17: CS 194 Project Results

0.00%

2.00%

4.00%

6.00%

8.00%

10.00%

12.00%

Branch Misprediction Rate

Data Cache Load Miss Rate

Instruction Cache Miss Rate

tar + perl

smt-1

smt-2

smt-4

smt-6

smt-8

Page 18: CS 194 Project Results

Non-determinism drastically affects results Lack of application specific data Setup is challenging Machine modeling is robust but complicated Without network, disk, and graphics

monitoring the simulation environment is incomplete◦ PTLsim/Xen with Vtune?

Page 19: CS 194 Project Results

Future development will eliminate Xen◦ New system uses kernel virtualization which

removes an entire layer Support for CMP systems is new and not

documented

Page 20: CS 194 Project Results

Vtune Data◦ More parallel processing improves system

performance PTLsim/Xen Research Platform

◦ Platform is robust and fully featured◦ Setup is challenging with virtualization

PTLsim Data◦ Too much parallel processing may hinder system

performance◦ More application specific data needed

Page 21: CS 194 Project Results

Consider combining PTLsim/Xen with Vtune to get best of both monitors

Wait for PTLsim to develop more parallel processing support

Questions?