AMD’s Unified CPU & GPU Processor Concept
Advanced Seminar Computer Engineering
Sven Nobis
Institute of Computer Engineering (ZITI), University of Heidelberg
February 5, 2014
Overview
1 Introduction
2 Background
    CPU vs. GPU
    Current Platforms: OpenCL & CUDA
3 Related Work
4 The way to HSA
    Heterogeneous Unified Memory Access
5 Heterogeneous System Architecture
    Concepts
    System Components
    Development Tools
6 Conclusion / Outlook
Previous: Single-Core Era
[Figure: Inflections in processor design, Single-Core Era panel. Axes: single-thread performance over time, with a "we are here" marker. Enabled by: Moore’s Law, voltage scaling. Constrained by: power, complexity. Languages: Assembly, C/C++, Java, ... © Copyright 2012 HSA Foundation.]
[8, P. 5]
Today: Multi-Core Era
[Figure: Inflections in processor design, Multi-Core Era panel. Axes: throughput performance over time (# of processors), with a "we are here" marker. Enabled by: Moore’s Law, SMP architecture. Constrained by: power, parallel software, scalability. Programming models: pthreads, OpenMP / TBB, ... © Copyright 2012 HSA Foundation.]
[8, P. 5]
Today and in the future: Heterogeneous Systems Era
[Figure: Inflections in processor design, Heterogeneous Systems Era panel. Axes: modern application performance over time (data-parallel exploitation), with a "we are here" marker. Enabled by: abundant data parallelism, power-efficient GPUs. Temporarily constrained by: programming models, communication overhead. Programming models: Shader, CUDA, OpenCL, C++ and Java. © Copyright 2012 HSA Foundation.]
[8, P. 5]
Introduction
Today’s problems in CPU/GPU programming
    Programmability barrier
    Communication costs
Solution: AMD’s Unified CPU & GPU Processor Concept?
→ Heterogeneous System Architecture (HSA)
[3, P. 4]
CPU vs. GPU
CPU: LCU (Latency Compute Unit)
GPU: TCU (Throughput Compute Unit)
OpenCL & CUDA
Both are well-established platforms for GPU programming
Compute Unified Device Architecture (CUDA)
    Proprietary
    Only for NVIDIA GPUs
Open Computing Language (OpenCL)
    Open standard
    ATI, NVIDIA, Intel, ...
    Not only GPUs
OpenCL Platform Model
[10]
OpenCL Execution Model
[5, P. 11]
Related Work
In CUDA [4]
    Unified Virtual Addressing (UVA) in CUDA 4
    Unified Memory in CUDA 6
    → Developer’s view of the memory
    Implicit copy & pinning
In OpenCL
    Shared Virtual Memory
    Copy is still necessary (for fast access)
CPU and GPU cores on a single die
APU (Accelerated Processing Unit): CPU and GPU cores on one die
Example: Llano
[3, P. 2] [7, P. 7]
hUMA: Heterogeneous Unified Memory Access
Today: Non-Uniform Memory Access
    Different/partitioned physical memory per compute unit
    Multiple virtual memory address spaces
hUMA: Heterogeneous Unified Memory Access
    Same physical memory
    Same virtual memory for all compute units
[Figure: Shared virtual memory (today). Multiple virtual memory address spaces: CPU0 uses VIRTUAL MEMORY1 and the GPU uses VIRTUAL MEMORY2, with separate mappings VA1->PA1 and VA2->PA1 into physical memory. © Copyright 2012 HSA Foundation.]
[Figure: Shared virtual memory (HSA). Common virtual memory for all HSA agents: CPU0 and the GPU use the same mapping VA->PA. © Copyright 2012 HSA Foundation.]
[2, P. 7], [2, P. 8]
hUMA: Heterogeneous Unified Memory Access (2)
Required: hUMA Memory Controller
Features
    Shared page table support
        Same large address space as the CPU
        Page faulting
    Coherent memory regions
        Fully coherent shared memory model
        Like on today’s SMP CPU systems
Concepts
Unified Address Space
Already mentioned with hUMA
Unified Programming Model
Queuing
HSA Intermediate Language
Concepts: Unified Programming Model
Current programming models
    → Treat the GPU as a remote processor
Extending existing concepts to use HSA
    Programming languages like C++
    Task-parallel and data-parallel APIs like C++ AMP
Stay in the developer’s environment

#include <iostream>
#include <amp.h>
using namespace concurrency;

int main() // "Hello World" in C++ AMP
{
    // Each value is one less than the character code to be printed.
    int v[11] = {'G', 'd', 'k', 'k', 'n', 31, 'v', 'n', 'q', 'k', 'c'};

    array_view<int> av(11, v);
    // Increment every element on the accelerator.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)
    {
        av[idx] += 1;
    });

    for (unsigned int i = 0; i < av.extent.size(); i++)
        std::cout << static_cast<char>(av(i));
}

[6]
Concepts: Queuing - Current
[5, P.9]
Concepts: Queuing - New!
[5, P.9]
Concepts: HSA Intermediate Language
HSAIL: HSA Intermediate Language
    Bytecode
    Designed for data-parallel programming
    GPU independent
Generated by the compilation stack (later)
Bytecode is compiled at runtime to the hardware instruction set of the current device
Execution model is similar to OpenCL
System Components
APU
Software stack
    Compilation stack
    Runtime stack
    System (kernel) software
System Components: Compilation Stack
[5, P. 15]
System Components: Runtime Stack
[5, P. 16]
Development Tools
OpenCL
C++ AMP: C++ Accelerated Massive Parallelism
BOLT Library
Aparapi
Development Tools: OpenCL
“HSA is an optimized platform architecture for OpenCL - Not an alternative to OpenCL” [8, P. 13]
OpenCL on HSA will benefit from its features
Development Tools: BOLT Library
Simple example:

#include <bolt/sort.h>
#include <vector>
#include <algorithm>
#include <cstdlib>

int main()
{
    // generate random data (on host)
    std::vector<int> a(1000000);
    std::generate(a.begin(), a.end(), std::rand);

    // sort, run on best device
    bolt::sort(a.begin(), a.end());
}

Interface similar to the familiar C++ Standard Template Library
No explicit mention of C++ AMP or OpenCL™ (or GPU!)
    More advanced use cases allow the programmer to supply a kernel in C++ AMP or OpenCL™
Direct use of host data structures (i.e. std::vector)
bolt::sort implicitly runs on the platform
    Runtime automatically selects CPU or GPU (or both)
[9, P.5]
Development Tools: BOLT and C++ AMP
Simple example: Bolt for C++ AMP with a user-specified functor

#include <bolt/transform.h>
#include <vector>

struct SaxpyFunctor
{
    float _a;
    SaxpyFunctor(float a) : _a(a) {}

    // restrict(cpu,amp): callable on both the CPU and the accelerator
    float operator() (const float &xx, const float &yy) restrict(cpu,amp)
    {
        return _a * xx + yy;
    }
};

int main()
{
    SaxpyFunctor s(100);
    std::vector<float> x(1000000); // initialization not shown
    std::vector<float> y(1000000); // initialization not shown
    std::vector<float> z(1000000);

    bolt::transform(x.begin(), x.end(), y.begin(), z.begin(), s);
}

[9, P.6]
Conclusion
Interesting concept
    Simplifies development
    Opens up new possibilities
Open platform
In heavy development
    Missing hardware with hUMA → Outlook
    Software components not ready
→ A lot of potential
Outlook
Middle of January 2014: Kaveri APU is available [1]
    Desktop APU
    Support for hUMA and queuing
    Can connect both DDR3 and GDDR5 [11]
Server APUs follow:
    Berlin
    ARM-based: Seattle
[11]
References
[1] Benz, Benjamin: AMD fordert mit Kaveri Intels Core i5 heraus. Heise Online. http://heise.de/-2085447. Version: January 2014
[2] Bratt, Ian: HSA Queueing. HOT CHIPS 2013. http://www.slideshare.net/hsafoundation/hsa-queuing-hot-chips-2013. Version: August 2013
[3] Froning, Holger: Lecture 02 – CUDA Programming. Lecture: GPU Computing, 2013
[4] Harris, Mark: Unified Memory in CUDA 6. http://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/. Version: November 2013
[5] Kyriazis, George: A Heterogeneous System Architecture: Technical Review / HSA Foundation. AMD, August 2012. Technical report, Rev. 1.0
[6] Moth, Daniel: “Hello world” in C++ AMP. http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/04/quot-hello-world-quot-in-c-amp.aspx. Version: March 2012
[7] Rogers, Phil: THE PROGRAMMER’S GUIDE TO THE APU GALAXY. AMD Fusion Developer Summit. http://www.slideshare.net/hsafoundation/afds-keynote-the-programmers-guide-to-the-apu-galaxy. Version: June 2011
[8] Rogers, Phil: Heterogeneous System Architecture Overview. HOT CHIPS 2013. http://de.slideshare.net/hsafoundation/hsa-intro-hot-chips2013-final. Version: August 2013
[9] Sander, Ben: BOLT: A C++ Template Library for HSA. AMD Fusion Developer Summit. http://www.slideshare.net/hsafoundation/bolt-for-hsa-by-ben-sanders. Version: June 2012
[10] Staff, AMD: OpenCL™ and the AMD APP SDK v2.4. http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-and-the-amd-app-sdk-v2-4/. Version: April 2011
[11] Windeck, Christof: AMD Kaveri: Feinheiten aus den Datenblättern. Heise Online. http://heise.de/-2088349. Version: January 2014