Many-Core Programming with GRAMPS Jeremy Sugerman Stanford University September 12, 2008

Background, Outline
Stanford Graphics / Architecture Research
– Collaborators: Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan
To appear in ACM Transactions on Graphics
CPU, GPU trends… and collision?
Two research areas:
– HW/SW Interface, Programming Model
– Future Graphics API

Problem Statement
Drive efficient development and execution in many-/multi-core systems.
Support homogeneous, heterogeneous cores.
Inform future hardware.
Status Quo:
– GPU Pipeline (Good for GL, otherwise hard)
– CPU (No guidance, fast is hard)

GRAMPS
Software defined graphs
Producer-consumer, data-parallelism
Initial focus on rendering
[Figure: two example graphs. Rasterization Pipeline: stages Rasterize, Shade, FB Blend; queues Input Fragment Queue, Output Fragment Queue. Ray Tracing Graph: stages Camera, Intersect, Shade, FB Blend; queues Ray Queue, Ray Hit Queue, Fragment Queue. Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output.]

As a Graphics Evolution
Not (too) radical for ‘graphics’
Like fixed → programmable shading
– Pipeline undergoing massive shake up
– Diversity of new parameters and use cases
Bigger picture than ‘graphics’
– Rendering is more than GL/D3D
– Compute is more than rendering
– Some ‘GPUs’ are losing their innate pipeline

As a Compute Evolution (1)
Sounds like streaming: Execution graphs, kernels, data-parallelism
Streaming: “squeeze out every FLOP”
– Goals: bulk transfer, arithmetic intensity
– Intensive static analysis, custom chips (mostly)
– Bounded space, data access, execution time

As a Compute Evolution (2)
GRAMPS: “interesting apps are irregular”
– Goals: Dynamic, data-dependent code
– Aggregate work at run-time
– Heterogeneous commodity platforms
Naturally allows streaming when applicable

GRAMPS’ Role
A ‘graphics pipeline’ is now an app!
GRAMPS models parallel state machines.
Compared to status quo:
– More flexible than a GPU pipeline
– More guidance than bare metal
– Portability in between
– Not domain specific

GRAMPS Interfaces
Host/Setup: Create execution graph
Thread: Stateful, singleton
Shader: Data-parallel, auto-instanced
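The three interfaces above can be illustrated with a small toy. This is only a sketch under stated assumptions, not the GRAMPS API: `Graph`, `Stage`, `add_queue`, `add_stage`, and `run` are hypothetical names, and stages run serially in declaration order where a real system would use a scheduler. It shows host code building a graph, a stateful singleton thread stage, and a shader stage that is instanced once per input element.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class Stage:
    name: str
    kind: str                        # "thread" or "shader" (hypothetical tags)
    fn: Callable
    in_queue: Optional[str] = None
    out_queue: Optional[str] = None


@dataclass
class Graph:
    queues: Dict[str, Deque] = field(default_factory=dict)
    stages: List[Stage] = field(default_factory=list)

    def add_queue(self, name):
        self.queues[name] = deque()

    def add_stage(self, stage):
        self.stages.append(stage)

    def run(self):
        # Serial stand-in for a scheduler: run stages in declaration order.
        for s in self.stages:
            src = self.queues[s.in_queue] if s.in_queue else None
            dst = self.queues[s.out_queue] if s.out_queue else None
            if s.kind == "thread":
                # Singleton: one invocation that owns its queues and state.
                s.fn(src, dst)
            else:
                # Shader: one stateless invocation per input element.
                while src:
                    result = s.fn(src.popleft())
                    if dst is not None:
                        dst.append(result)


def camera(_src, dst):
    # Thread stage: generates four toy "rays".
    for i in range(4):
        dst.append(i)


def shade(ray):
    # Shader stage: pure function of a single element.
    return ray * ray


framebuffer = []


def blend(src, _dst):
    # Thread stage: keeps a running total across elements (stateful).
    total = 0
    while src:
        total += src.popleft()
        framebuffer.append(total)


# Host/setup code: create the execution graph, then run it.
g = Graph()
g.add_queue("rays")
g.add_queue("fragments")
g.add_stage(Stage("camera", "thread", camera, out_queue="rays"))
g.add_stage(Stage("shade", "shader", shade, "rays", "fragments"))
g.add_stage(Stage("blend", "thread", blend, in_queue="fragments"))
g.run()
```

The shader function never sees the queue, only one element at a time, which is what would let a runtime instance it freely; the thread stages manage their own loops and state.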

GRAMPS Entities (1)
Accessed via windows
Queues: Connect stages, Dynamically sized
– Ordered or unordered
– Fixed max capacity or spill to memory
Buffers: Random access, Pre-allocated
– RO, RW Private, RW Shared (Not Supported)
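One way to picture window-based access to a fixed-capacity queue is sketched below. The class and method names (`WindowQueue`, `reserve_output`, `commit_output`, `reserve_input`) are assumptions made for illustration and are not taken from GRAMPS. The sketch covers only the "fixed max capacity" case and simply refuses a reservation that does not fit, where a real runtime might block the producer or spill to memory.

```python
from collections import deque


class WindowQueue:
    """Hypothetical fixed-capacity queue accessed through windows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.reserved = 0          # slots handed out but not yet committed

    def reserve_output(self, count):
        # Refuse the window if committed plus reserved slots would overflow.
        if len(self.items) + self.reserved + count > self.capacity:
            return None
        self.reserved += count
        return [None] * count      # the producer fills this window in place

    def commit_output(self, window):
        # Publish the filled window to consumers and release its reservation.
        self.reserved -= len(window)
        self.items.extend(window)

    def reserve_input(self, count):
        # Hand the consumer up to `count` packets from the front, in order.
        n = min(count, len(self.items))
        return [self.items.popleft() for _ in range(n)]


q = WindowQueue(capacity=4)
window = q.reserve_output(3)
window[:] = ["a", "b", "c"]
q.commit_output(window)

blocked = q.reserve_output(2)      # does not fit: 3 committed + 2 requested > 4
first = q.reserve_input(2)         # consumer drains two packets
retry = q.reserve_output(2)        # now fits: 1 committed + 2 requested <= 4
```

Separating reserve from commit is the point of the window idea here: the producer writes into space it already owns, and the queue only needs bookkeeping at the two boundaries.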

GRAMPS Entities (2)
Queue Sets: Independent sub-queues
– Instanced parallelism plus mutual exclusion
– Hard to fake with just multiple queues
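The combination of instanced parallelism and mutual exclusion can be sketched as follows. This is an illustrative model with hypothetical names (`QueueSet`, `push`, `drain`), not the GRAMPS implementation: different sub-queues are drained in parallel, while a per-sub-queue lock guarantees that at most one consumer works on a given sub-queue at a time. The example treats each sub-queue as a framebuffer tile, which is an assumption chosen for the demo.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class QueueSet:
    """Hypothetical queue set: independent sub-queues, one consumer each."""

    def __init__(self, num_subqueues):
        self.subqueues = [deque() for _ in range(num_subqueues)]
        self.locks = [threading.Lock() for _ in range(num_subqueues)]

    def push(self, key, item):
        # Producers pick the sub-queue; deque.append is thread-safe.
        self.subqueues[key].append(item)

    def drain(self, key, consume):
        # Mutual exclusion: only one consumer instance per sub-queue.
        with self.locks[key]:
            q = self.subqueues[key]
            while q:
                consume(key, q.popleft())


NUM_TILES = 4
tiles = [0] * NUM_TILES            # one accumulator per tile


def blend(key, value):
    # No lock on tiles[key] is needed: the sub-queue's consumer is exclusive.
    tiles[key] += value


qs = QueueSet(NUM_TILES)
for i in range(100):
    qs.push(i % NUM_TILES, i)

# Parallelism across sub-queues: one drain task per tile.
with ThreadPoolExecutor(max_workers=NUM_TILES) as pool:
    list(pool.map(lambda k: qs.drain(k, blend), range(NUM_TILES)))
```

With plain multiple queues, the graph would need one consumer stage per queue declared up front; in this sketch a single consumer function is instanced across however many sub-queues exist.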
What We’ve Built (System)

GRAMPS Scheduler
Tiered Scheduler
‘Fat’ cores: per-thread, per-core
‘Micro’ cores: shared hw scheduler
Top level: tier N

What We’ve Built (Apps)
[Figure: two application graphs. Direct3D Pipeline (with Ray-tracing Extension): stages IA 1 … IA N, VS 1 … VS N, RO, Rast, PS, OM, plus Trace and PS2 in the Ray-tracing Extension; queues Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Sample Queue Set, Fragment Queue, Ray Queue, Ray Hit Queue. Ray-tracing Graph: stages Camera, Sampler, Tiler, Intersect, Shade, FB Blend; queues Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue. Legend: Thread Stage, Shader Stage, Fixed-func, Queue, Stage Output, Push Output.]

Initial Results
Queues are small, utilization is good
GRAMPS Visualization
GRAMPS Visualization

GRAMPS Portability
Portability really means performance.
Less portable than GL/D3D
– GRAMPS graph is (more) hardware sensitive
More portable than bare metal
– Enforces modularity
– Best case, just works
– Worst case, saves boilerplate

High-level Challenges
Is GRAMPS a suitable GPU evolution?
– Enable pipeline competitive with bare metal?
– Enable innovation: advanced / alternative methods?
Is GRAMPS a good parallel compute model?
– Map well to hardware, hardware trends?
– Support important apps?
– Concepts influence developers?

What’s Next: Implementation
Better scheduling
– Less bursty, better slot filling
– Dynamic priorities
– Handle graphs with loops better
More detailed costs
– Bill for scheduling decisions
– Bill for (internal) synchronization
More statistics

What’s Next: Programming Model
Yes: Graph modification (state change)
Probably: Data sharing / ref-counting
Maybe: Blocking inter-stage calls (join)
Maybe: Intra/inter-stage synchronization primitives

What’s Next: Possible Workloads
REYES, hybrid graphics pipelines
Image / video processing
Game Physics
– Collision detection or particles
Physics and scientific simulation
AI, finance, sort, search or database query, …
Heavy dynamic data manipulation
– k-D tree / octree / BVH build
– lazy/adaptive/procedural tree or geometry