What is SIMD? - Arun Raghavan
![Page 1](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/1.jpg)
Photo: necosky on Flickr
![Page 2](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/2.jpg)
Or,
Vector Optimisation For SIMD Newbies
… by a SIMD newbie
![Page 3](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/3.jpg)
What is SIMD?
![Page 4](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/4.jpg)
Image: Wikipedia
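A minimal sketch of the idea in the diagram, assuming a GCC or Clang compiler (the `vector_size` attribute is a compiler extension, not standard C). The names `v4i32`, `add_scalar`, and `add_simd` are made up for illustration. The scalar loop issues one addition per element; the vector version expresses four additions as a single operation, which the compiler can map to one SIMD instruction where the target has one.

```c
#include <stdint.h>

/* GCC/Clang extension (assumed here): four 32-bit ints packed in 16 bytes. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Scalar version: one addition per loop iteration. */
static void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD-style version: a single vector addition covers all four lanes. */
static v4i32 add_simd(v4i32 a, v4i32 b)
{
    return a + b;
}
```

Both functions compute the same sums; the difference is that `add_simd` states the work as one operation over multiple data elements.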
![Page 5](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/5.jpg)
x86: MMX/3DNow!/SSE*
ARM: NEON
GPGPU: CUDA/OpenCL
![Page 6](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/6.jpg)
Works well for a very specific subset of apps
![Page 7](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/7.jpg)
3D Graphics
![Page 8](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/8.jpg)
Signal processing
![Page 9](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/9.jpg)
Other (specific) forms of data crunching
![Page 10](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/10.jpg)
But writing SIMD assembly is hard
![Page 11](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/11.jpg)
Need to write once per architecture/processor
![Page 12](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/12.jpg)
Enter: Orc
![Page 13](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/13.jpg)
Project started by David Schleef
![Page 14](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/14.jpg)
Write “programs” in simple ASM-like language
![Page 15](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/15.jpg)
Runtime compiler: “programs” → native assembly
![Page 16](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/16.jpg)
Library: extend Orc for your purposes
![Page 17](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/17.jpg)
Supports:
MMX, SSE (1 to 4.2)
ARM, NEON
AltiVec (PPC)
C64x (TI DSP)
![Page 18](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/18.jpg)
![Page 19](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/19.jpg)
Getting started
![Page 20](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/20.jpg)
A simple example: PulseAudio echo canceller
![Page 21](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/21.jpg)
That alone provided a >20% speedup
![Page 22](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/22.jpg)
Another one: PulseAudio volume scaling
![Page 23](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/23.jpg)
Sample s: 16-bit signed int (usually)
Volume v: 32-bit unsigned int
Operation: (s * v) >> 16
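The operation above could be sketched in plain C roughly as follows. This is not the PulseAudio implementation: the function names, the 64-bit intermediate, and the clamp to the 16-bit range are assumptions made for illustration. With the `>> 16`, a volume of `0x10000` leaves the sample unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate result back into the signed 16-bit sample range. */
static int16_t clamp_s16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scale one sample: (s * v) >> 16, computed in 64 bits so the product
 * cannot overflow. Right-shifting a negative value is implementation-defined
 * in C; this sketch assumes the usual arithmetic shift. */
static int16_t scale_sample(int16_t s, uint32_t v)
{
    int64_t product = (int64_t)s * (int64_t)v;
    return clamp_s16(product >> 16);
}

/* Apply the same volume to every sample in a buffer, in place.
 * This per-element loop is the part a SIMD version would process in batches. */
static void scale_buffer(int16_t *samples, size_t n, uint32_t v)
{
    assert(samples != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        samples[i] = scale_sample(samples[i], v);
}
```

For example, `scale_sample(1000, 0x8000)` halves the sample to 500, and results that do not fit in 16 bits are clamped rather than wrapped.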
![Page 24](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/24.jpg)
The C code
![Page 25](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/25.jpg)
The SSE code
![Page 26](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/26.jpg)
The Orc code
![Page 27](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/27.jpg)
TBD: More AEC optimisation (dot product)
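For reference, a dot product kernel might look like the sketch below. The echo canceller's real data types and code are not shown in these slides, so the 16-bit sample type, the name `dot_s16`, and the four-way accumulator split are all assumptions. Keeping four independent partial sums mirrors how a SIMD version would hold one running total per lane and combine them at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two blocks of 16-bit samples (sample type assumed).
 * Accumulates in 64 bits so long blocks cannot overflow. */
static int64_t dot_s16(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t lanes[4] = {0, 0, 0, 0};
    size_t i = 0;

    /* Main loop: four independent partial sums, one per hypothetical lane. */
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lanes[k] += (int32_t)a[i + k] * b[i + k];

    /* Horizontal add of the lanes, then the leftover tail elements. */
    int64_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)
        sum += (int32_t)a[i] * b[i];

    return sum;
}
```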
![Page 28](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/28.jpg)
Limitations
![Page 29](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/29.jpg)
Future work
![Page 30](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/30.jpg)
Questions?
![Page 31](https://reader034.fdocuments.in/reader034/viewer/2022050408/5f858d938b670130a900f683/html5/thumbnails/31.jpg)
IRC: #orc on FreeNode