Few words about OpenCL
Dmytro Konobrytskyi
Content
• Introduction
• Supported devices
• OpenCL “Hello world”
• OpenCL vs. CUDA performance
• Conclusion / Questions
History
• Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), and other processors.
• OpenCL 1.0 - 08 December 2008
– 5 August 2009, AMD: SDK
– 28 September 2009, Nvidia: drivers and SDK
• OpenCL 1.1 - 14 June 2010
• OpenCL 1.2 - 15 November 2011
Supported devices
• NVidia GPU
– On CUDA architecture
• AMD CPU, GPU, APU
• Intel CPU
– SSE/AVX support
– IGP support starting with Ivy Bridge
• Multi-core ARM CPUs
• ZiiLABS (Creative) ZMS processor
• Mobile Phone GPUs?
OpenCL Platform Model
http://www.nvidia.com/content/GTC/documents/1409_GTC09.pdf
OpenCL memory model
Go to Visual Studio
OpenCL “Hello world”
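The live demo itself is not part of this transcript. As a rough stand-in, the sketch below emulates in plain Python what a minimal vector-add "Hello world" kernel does: the kernel body runs once per work-item, and each work-item uses its global ID to pick the element it handles. This is not the OpenCL API; `run_ndrange` and `vec_add` are hypothetical names chosen for illustration, and the work-items run sequentially here rather than in parallel. The equivalent OpenCL C kernel is shown in the comment for reference only.

```python
# Equivalent OpenCL C kernel (for reference only; not compiled or run here):
#
#   __kernel void vec_add(__global const float* a,
#                         __global const float* b,
#                         __global float* c)
#   {
#       size_t i = get_global_id(0);
#       c[i] = a[i] + b[i];
#   }


def run_ndrange(kernel, global_size, local_size, *args):
    """Hypothetical helper: call `kernel` once per work-item of a 1-D index space.

    Work-items are grouped into work-groups of `local_size` items; a real
    device would run them in parallel, this emulation runs them one by one.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # Same relation as in OpenCL (with zero global offset).
            global_id = group_id * local_size + local_id
            kernel(global_id, *args)


def vec_add(global_id, a, b, c):
    """Kernel body: each work-item adds exactly one pair of elements."""
    c[global_id] = a[global_id] + b[global_id]


n = 8
a = [float(i) for i in range(n)]
b = [10.0] * n
c = [0.0] * n

run_ndrange(vec_add, n, 4, a, b, c)  # 8 work-items in 2 work-groups of 4
print(c)  # prints [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
```

In a real OpenCL program the host would additionally select a platform and device, create a context and command queue, build the kernel from source, copy buffers, and enqueue the kernel; those steps are omitted here.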
OpenCL vs. CUDA performance
• How should we compare performance?
– The same code vs. optimized algorithms
– The latest drivers vs. the most stable drivers
– The latest hardware vs. the most popular hardware
– Raw math and simple algorithms vs. a complicated real-world algorithm vs. all possible algorithms
• We need to remember that NV GPUs actually use the same set of instructions for both CUDA and OpenCL
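Whichever of the choices above is made, the measurement itself should be repeatable. The following is a minimal timing sketch under stated assumptions: `benchmark` is a hypothetical helper, and the two `saxpy_*` functions are hypothetical CPU stand-ins for two implementations of the same workload (for example a CUDA build and an OpenCL build of one kernel). No GPU is involved; the point is only the pattern of warm-up runs followed by several timed repeats.

```python
import statistics
import time


def benchmark(fn, *args, warmup=2, repeats=5):
    """Hypothetical helper: time fn(*args) and return min/median/max seconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def saxpy_loop(alpha, x, y):
    """Stand-in implementation 1: list comprehension."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def saxpy_map(alpha, x, y):
    """Stand-in implementation 2: map with a lambda."""
    return list(map(lambda xi, yi: alpha * xi + yi, x, y))


x = [float(i) for i in range(10000)]
y = [1.0] * len(x)

# Same inputs, same repeat count, same machine for both implementations.
results = {f.__name__: benchmark(f, 2.0, x, y) for f in (saxpy_loop, saxpy_map)}
for name, timing in results.items():
    print(name, timing)
```

Reporting the median alongside the minimum and maximum makes it easier to spot noisy runs; recording the driver and hardware versions next to each result would be a natural extension.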
• There is no single right answer to these questions, and testing takes a lot of time.
• Test results may also stop being valid once new drivers are released.
• So we will look at existing test results published on the web by different people, covering different algorithms and hardware, and try to see the trends in the data.
www.sisoftware.net
• Typical Arithmetic Results
• Environment: Windows Vista x64 SP2; Catalyst 9.11 video / STREAM 1.4.427 / OpenCL 1.0 Beta 4; ForceWare 190.89 video / CUDA 2.3 / OpenCL 1.0 live release.
http://www.sisoftware.net/?d=qa&f=gpgpu_gpu_perf&l=en&a=oca
Sep 2009
www.sisoftware.net
• Typical Memory Bandwidth Results
• Environment: Windows Vista x64 SP2; Catalyst 9.11 video / STREAM 1.4.427 / OpenCL 1.0 Beta 4; ForceWare 190.89 video / CUDA 2.3 / OpenCL 1.0 live release.
http://www.sisoftware.net/?d=qa&f=gpgpu_gpu_perf&l=en&a=oca
Sep 2009
Accelereyes blog
May 2010
http://blog.accelereyes.com/blog/2010/05/10/nvidia-fermi-cuda-and-opencl/
C2050
Paper: A Performance Comparison of CUDA and OpenCL by Kamran Karimi
• Adiabatic QUantum Algorithms (AQUA), a Monte Carlo simulation
http://arxiv.org/abs/1005.2581
May 2010
Kernel execution and GPU data transfer times in seconds.
Accelereyes webinar
http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/
Feb 2012
Accelereyes: Our OpenCL support is new and not nearly as mature as our support of CUDA. But our initial OpenCL support is better than our initial CUDA support was when we first launched our CUDA products. And we expect OpenCL to continue to mature rapidly in the near future.
Kyle Spafford of Oak Ridge National Laboratory:
Feb 2012
http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2012-02-20/13-shoc.pdf
OpenCL vs. CUDA performance
• Conclusions:
– Performance of simple math operations was the same initially and is the same now;
– OpenCL does not have access to a few hardware instructions, so algorithms that use them are slower (texture, cache size selection);
– OpenCL uses more accurate special functions by default (but can use native functions);
– OpenCL was slower initially, but the modern implementation is as fast as CUDA.
• A conclusion is supposed to be here, but let’s just discuss it together.