PerfMon redux: analyzing a CUDA application with the Windows Performance Monitor

S6287

Richard Wilton

Department of Physics and Astronomy

Johns Hopkins University

S6287: Analyzing a CUDA application with PerfMon

What to monitor and why

• What is there to monitor?
  • Speed (duration)
  • Resource utilization
  • Interactions between resources
• Why bother?
  • Prove that things are operating as expected
  • Make things run faster
  • Find performance bottlenecks
  • Identify resource contention

Setup for performance monitoring

• Tools you need
  • Microsoft Windows
  • NVIDIA GPU and CUDA toolkit (NVML)
  • Microsoft Visual Studio (PerfLib v2)
• Monitoring setup
  • Target machine with target hardware
  • Application “release” build
  • Choose your performance counters
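
One way to act on the "choose your performance counters" step is to log a short, explicit counter list from the command line instead of selecting counters in the PerfMon GUI. The sketch below builds a `typeperf` invocation. The two host counter paths are standard Windows counters. The GPU counter path is hypothetical: the slides do not show the counter-set or instance names that the PerfLib v2 provider registers, so `counter_set="GPU"` and the numeric instance name are assumptions to be replaced with the real names.

```python
# Standard Windows host counters.
HOST_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Committed Bytes",
]


def gpu_counter_path(device, counter, counter_set="GPU"):
    """Build a PerfMon counter path for one GPU instance.

    ASSUMPTION: the counter-set name "GPU" and numeric instance names are
    hypothetical; check the names your provider actually registers.
    """
    return "\\{}({})\\{}".format(counter_set, device, counter)


def typeperf_command(counters, interval_s=1, samples=60, out_file="perf.csv"):
    """Return a typeperf argument list: sample every `interval_s` seconds,
    collect `samples` samples, and write them to a CSV file."""
    return [
        "typeperf", *counters,
        "-si", str(interval_s),   # sample interval in seconds
        "-sc", str(samples),      # number of samples to collect
        "-f", "CSV",              # output file format
        "-o", out_file,           # output file path
        "-y",                     # answer yes to prompts (e.g. overwrite)
    ]


counters = HOST_COUNTERS + [gpu_counter_path(0, "GPU compute activity (%)")]
command = typeperf_command(counters, samples=120)

# On the Windows target machine:
#   import subprocess
#   subprocess.run(command, check=True)
```

Keeping the counter list in one place makes it easy to follow the later advice about not monitoring everything at once: add or remove one path at a time and rerun.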

Choosing performance counters

Counters in the GPU group:

• Clock speed (MHz): memory
• Clock speed (MHz): SM
• Fan speed (% maximum)
• Global memory allocated (bytes)
• Global memory allocated (percent)
• Global memory free (bytes)
• Global memory read/write activity (%)
• GPU compute activity (%)
• GPU temperature (°C)
• GPU total power draw (watts)
• PCIe receive throughput (KB/s)
• PCIe transmit throughput (KB/s)
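
The setup slide names NVML as the data source, so each counter in the list above plausibly corresponds to one NVML query. The sketch below shows one such mapping using the function and constant names of the `pynvml` Python bindings. The pairing of counters with queries is an assumption, not taken from the slides, and the function accepts the NVML module as a parameter so it can be exercised without a GPU.

```python
def read_gpu_counters(nvml, handle):
    """Read one snapshot of the GPU-group counters from an NVML-like module.

    `nvml` is expected to expose the pynvml function and constant names.
    ASSUMPTION: this counter-to-query mapping is a guess at what a provider
    would do; the slides do not show it.
    """
    mem = nvml.nvmlDeviceGetMemoryInfo(handle)         # bytes: total, free, used
    util = nvml.nvmlDeviceGetUtilizationRates(handle)  # percent: gpu, memory
    return {
        "Clock speed (MHz): memory":
            nvml.nvmlDeviceGetClockInfo(handle, nvml.NVML_CLOCK_MEM),
        "Clock speed (MHz): SM":
            nvml.nvmlDeviceGetClockInfo(handle, nvml.NVML_CLOCK_SM),
        "Fan speed (% maximum)": nvml.nvmlDeviceGetFanSpeed(handle),
        "Global memory allocated (bytes)": mem.used,
        "Global memory allocated (percent)": 100.0 * mem.used / mem.total,
        "Global memory free (bytes)": mem.free,
        "Global memory read/write activity (%)": util.memory,
        "GPU compute activity (%)": util.gpu,
        "GPU temperature (°C)":
            nvml.nvmlDeviceGetTemperature(handle, nvml.NVML_TEMPERATURE_GPU),
        # NVML reports power in milliwatts; convert to watts.
        "GPU total power draw (watts)":
            nvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
        "PCIe receive throughput (KB/s)":
            nvml.nvmlDeviceGetPcieThroughput(handle, nvml.NVML_PCIE_UTIL_RX_BYTES),
        "PCIe transmit throughput (KB/s)":
            nvml.nvmlDeviceGetPcieThroughput(handle, nvml.NVML_PCIE_UTIL_TX_BYTES),
    }


# Usage on a machine with an NVIDIA driver and the pynvml package installed:
#   import pynvml
#   pynvml.nvmlInit()
#   handle = pynvml.nvmlDeviceGetHandleByIndex(0)
#   print(read_gpu_counters(pynvml, handle))
#   pynvml.nvmlShutdown()
```

Some queries (fan speed, power draw, PCIe throughput) are not supported on every GPU model, so a real provider would need to handle NVML errors per counter.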

Choosing performance counters

Monitoring everything at once is probably not a good idea.

Application pipeline (circa 2013)

• CPU compute activity
• GPU (CUDA) compute activity

GPU activity

Device-related counters – device 0, 1, 2
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation

GPU activity

Device-related counters – device 0
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation

Sampling → Jaggedness

Device-related counters – device 0
• GPU compute activity %
• Global memory read/write activity %

Sampled at 1-second intervals

Samples are “snapshots” (not averaged)
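
To see why snapshot sampling looks jagged, it helps to compare it with averaged sampling on the same signal. The sketch below uses a made-up workload (GPU busy for 300 ms out of every 1100 ms) and samples it once per second in both ways. The workload shape and the function names are illustrative assumptions, not measurements from the talk.

```python
def utilization(t_ms):
    """Hypothetical workload: busy for the first 300 ms of every 1100 ms."""
    return 100.0 if (t_ms % 1100) < 300 else 0.0


def snapshot_samples(duration_ms, interval_ms=1000):
    """Read the instantaneous value once per interval (a "snapshot")."""
    return [utilization(t) for t in range(0, duration_ms, interval_ms)]


def averaged_samples(duration_ms, interval_ms=1000):
    """Average the value over each interval at 1 ms resolution."""
    averages = []
    for start in range(0, duration_ms, interval_ms):
        window = [utilization(t) for t in range(start, start + interval_ms)]
        averages.append(sum(window) / len(window))
    return averages


snapshots = snapshot_samples(11_000)
averages = averaged_samples(11_000)

print("snapshot:", snapshots)
print("averaged:", [round(a, 1) for a in averages])
```

Each snapshot is either 0 or 100 depending on where the sample instant falls in the busy/idle cycle, while the averaged values stay in a narrow band. A jagged trace therefore does not by itself mean the workload is erratic.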

Concurrency among multiple GPUs

Device-related counters – device 0, 1, 2
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation

Concurrency among multiple GPUs

Device-related counters – device 0
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation

Concurrency among multiple GPUs

Device-related counters – device 1
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation

Concurrency among multiple GPUs

Device-related counters – device 2
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation
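
Per-device traces like the ones on the preceding slides can also be summarized numerically. The sketch below takes logged "GPU compute activity %" samples for each device and reports how often zero, one, two, or all three devices were busy at the same sample time. The sample values and the 5% busy threshold are made up for illustration.

```python
from collections import Counter


def concurrency_profile(activity_by_device, threshold=5.0):
    """Map "number of devices busy at once" to the fraction of sample times.

    A device counts as busy when its activity exceeds `threshold` percent.
    ASSUMPTION: the threshold is arbitrary; the slides do not define one.
    """
    traces = list(activity_by_device.values())
    if len({len(trace) for trace in traces}) != 1:
        raise ValueError("all devices need the same number of samples")
    # zip(*traces) yields one tuple per sample time, one value per device.
    busy_counts = [sum(v > threshold for v in row) for row in zip(*traces)]
    total = len(busy_counts)
    return {k: n / total for k, n in sorted(Counter(busy_counts).items())}


# Made-up 1-second samples of "GPU compute activity %" for three devices.
samples = {
    0: [90, 95, 0, 0, 85, 90, 0, 92],
    1: [0, 88, 91, 0, 80, 0, 0, 95],
    2: [0, 90, 0, 93, 87, 0, 0, 90],
}
profile = concurrency_profile(samples)
print(profile)
```

A profile dominated by the "one device busy" bucket suggests the GPUs are taking turns rather than running concurrently. Because the samples are snapshots, the fractions are only as reliable as the sampling interval allows.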

Starving for CPU cycles

Device-related counters – device 0, 1, 2
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation

Starving for CPU cycles

Device-related counters – device 0
• GPU compute activity %
• Global memory read/write activity %

Host-related counters
• CPU activity %
• Host memory allocation
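
One reading of "starving for CPU cycles" is that the host CPU is saturated while the GPU sits mostly idle waiting for work. Under that assumption, the sketch below scans paired CPU and GPU activity samples and returns the spans where both conditions hold. The thresholds and the sample data are illustrative; the slides do not define starvation numerically.

```python
def starved_intervals(cpu, gpu, cpu_high=95.0, gpu_low=50.0):
    """Return (start, end) sample-index pairs, end exclusive, where the CPU
    is at or above `cpu_high` percent while the GPU is at or below `gpu_low`.

    ASSUMPTION: both thresholds are arbitrary starting points.
    """
    if len(cpu) != len(gpu):
        raise ValueError("cpu and gpu need the same number of samples")
    intervals = []
    start = None
    for i, (c, g) in enumerate(zip(cpu, gpu)):
        flagged = c >= cpu_high and g <= gpu_low
        if flagged and start is None:
            start = i                      # a starved span begins
        elif not flagged and start is not None:
            intervals.append((start, i))   # the span ended at the previous sample
            start = None
    if start is not None:                  # span runs to the end of the log
        intervals.append((start, len(cpu)))
    return intervals


# Made-up 1-second samples.
cpu_pct = [40, 60, 98, 99, 100, 97, 55, 45, 99, 100]
gpu_pct = [90, 85, 30, 10, 5, 20, 88, 92, 15, 0]
print(starved_intervals(cpu_pct, gpu_pct))
```

The returned index pairs can be mapped back to timestamps in the PerfMon log to find which pipeline stage was running when the GPU went idle.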

Consuming a resource

Device-related counters – device 2
• GPU compute activity %
• Global memory allocated (bytes)

Host-related counters
• CPU activity %

(image TBD)

GPU mystery

Device-related counters – device 0, 1
• GPU compute activity %
• Global memory read/write activity %
• GPU temperature (°C)
• GPU total power draw (watts)

Host-related counters
• CPU activity %
• Host memory allocation

PerfMon and CUDA

• What is there to monitor?
  • Speed (duration)
  • Resource utilization
  • Interactions between resources
• Why bother?
  • Prove that things are operating as expected
  • Make things run faster
  • Find performance bottlenecks
  • Identify resource contention

Questions / Comments