Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU)...
65
Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown)
-
Upload
dwain-thornton -
Category
Documents
-
view
223 -
download
1
Transcript of Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU)...
- Slide 1
- Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown)
- Slide 2
- Historical Trends in GFLOPS: CPUs vs. GPUs Reproduced from NVIDIA C Programming Guide (Version 5.0)
- Slide 3
- Performance Pitfalls Control flow can negatively affect performance.
- Slide 4
- Performance Pitfalls Pipeline Stall execution delay in an instruction pipeline to resolve a dependency
- Slide 5
- Hardware: CPU versus GPU Reproduced from NVIDIA C Programming Guide (Version 5.0)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Performance Pitfalls Pipeline Stall execution delay in an instruction pipeline to resolve a dependency
- Slide 19
- Performance Pitfalls Pipeline Stall execution delay in an instruction pipeline to resolve a dependency Warp Divergence threads within a warp take different paths and the different execution paths are serialized
- Slide 20
- Warp Divergence Example
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Warp-Aware Trace Scheduling Schedule instructions across basic block boundaries to expose additional ILP
- Slide 29
- Warp-Aware Trace Scheduling Schedule instructions across basic block boundaries to expose additional ILP while managing and optimizing warp divergence.
- Slide 30
- Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription
- Slide 31
- Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace Selection
- Slide 32
- Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace Selection 2. Trace Formation
- Slide 33
- Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace Selection 2. Trace Formation 3. Local Scheduling
- Slide 34
- Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace SelectionPartition basic blocks into regions 2. Trace Formation Facilitate local scheduling, potentially adding nodes and edges 3. Local SchedulingSchedule instructions within each region
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Backup Slides
- Slide 52
- Slide 53
- GPU Programming Model
- Slide 54
- Slide 55
- Slide 56
- Characterizing the Grid
- Slide 57
- Characterizing the Grid, Block
- Slide 58
- Slide 59
- Characterizing the Grid, Block, and Thread
- Slide 60
- Slide 61
- Warp Divergence Examples
- Slide 62
- Slide 63
- Slide 64
- Slide 65