1. GPU – History First true 3D graphics started with early display controllers (video shifters) ...
SEMINAR
IGOR KAMZIC
COSC3P93
GPU – History
The path to true 3D graphics started with early display controllers (video shifters)
They acted as a pass-through between the CPU and the display
RCA’s “Pixie” video chip (CDP1861), released in 1976, was capable of outputting a video signal at 64x128 resolution
In 1977 this chip was followed by the Television Interface Adapter (TIA) 1A
The TIA was integrated into the Atari 2600, where it generated the screen display and sound effects and read the input from the controllers
Basically, vertices are transformed into pixels
Doing this computation by “hand” was very slow
From the early 80’s to the late 90’s, early GPUs worked with a fixed-function pipeline
Later, general programmability was extended to the shader stages
Data independence was also explored
In 2006 the NVIDIA GeForce 8800 mapped the separate graphics stages onto a unified array of processors (for vertex shading, geometry and pixel processing)
In 2007 NVIDIA released CUDA
GPU
Similar to a computer's CPU, but designed for computing very complex mathematical and geometric calculations
In the past all this work was done by the CPU, which put strain on the CPU and degraded performance
The GPU improves performance because its parallel processing architecture allows it to perform multiple calculations at the same time
Some of the fastest GPUs have many more transistors than an average CPU
Because of its intensive calculation and speed, a GPU produces a lot of heat, so it is usually located under a heat sink or fan
A GPU typically interfaces with the motherboard over a PCI Express bus or an Accelerated Graphics Port (AGP) and can be replaced or upgraded easily
Multiple GPUs can draw images to the screen simultaneously, increasing the processing power (Google Tango Project)
GPU vs CPU
Architecturally, a CPU consists of a few cores that can handle multiple threads at a time
A GPU consists of hundreds of cores that can handle thousands of threads simultaneously
The discrepancy in floating-point capability between CPU and GPU arises because the GPU is specialized for compute-intensive, highly parallel computation, which is exactly what graphics rendering is about, and so 80% of its transistors are devoted to data processing
The same function is executed on each data element, with high arithmetic intensity
Benefits of using GPU vs CPU
The GPU has many benefits, such as more computing power, larger memory bandwidth and lower power consumption, but using its high computing power comes with some constraints
Developing code for the GPU takes more time and requires highly skilled work
GPU code runs in parallel, so data partitioning and synchronization are needed
It is hard to say in general when to use the GPU, since this is application dependent
Simply put, the GPU is very good at following a straight line of processing but not so good when elements follow different processing paths
Code should be executed on the GPU when it must be executed many times in parallel
Example: we can blend the pixels of A and B and put the results in C
Executed on the CPU, this task would be:
for (int i = 0; i < pixelCount; i++)
    C[i] = A[i] + B[i];
This code can be slow when there are many pixels
On the GPU we take just the statement C[i] = A[i] + B[i]; and populate the cores with this code, assigning a different value of i to each
This is where the GPU is at its best, because all cores execute the program at the same time
An example where the GPU is not very fast is conditional branching, which implies:
Make a copy of the program that follows branch A and populate all cores with this code
Execute until the first logical operation
Evaluate all elements, continue processing the elements that follow branch A, and enqueue all processes that chose path B
The problem is that no program for B is loaded, so all cores that chose B must idle
What is the worst case of the previous point? Only one core executes branch A and all the others idle
Once the cores executing A are done, we can activate the branch B version of the program (copying it from a memory buffer to core memory)
Execute the B branches and, if needed, merge the results
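The cost of the branching steps above can be sketched on the CPU. What follows is a minimal C emulation, not real GPU code: a hypothetical group of 8 lanes runs in lockstep, and the group needs one serialized pass for every branch that at least one lane takes. The names `LANES` and `run_divergent`, and the even/odd condition, are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 8  /* hypothetical number of cores that run in lockstep */

/* Emulates one lockstep group evaluating:
 *     if (x is even) A: x * 2   else B: x + 1
 * Lanes that did not choose the branch being executed sit idle.
 * Returns how many serialized passes the group needed (1 or 2). */
int run_divergent(const int *in, int *out)
{
    bool takes_a[LANES];
    bool any_a = false, any_b = false;

    /* Every lane evaluates the condition. */
    for (size_t lane = 0; lane < LANES; lane++) {
        takes_a[lane] = (in[lane] % 2 == 0);
        if (takes_a[lane]) any_a = true; else any_b = true;
    }

    /* Pass for branch A: lanes that chose B are idle. */
    if (any_a)
        for (size_t lane = 0; lane < LANES; lane++)
            if (takes_a[lane]) out[lane] = in[lane] * 2;

    /* Pass for branch B: lanes that chose A are idle. */
    if (any_b)
        for (size_t lane = 0; lane < LANES; lane++)
            if (!takes_a[lane]) out[lane] = in[lane] + 1;

    return (any_a ? 1 : 0) + (any_b ? 1 : 0);
}
```

When every input takes the same branch the function reports one pass; as soon as a single lane disagrees it reports two, which is the worst case described above.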
The GPU is designed for multithreaded calculations
GPU makers can easily add more cores whenever they want to add computational power, but some problems cannot be divided into smaller problems
Lecture point: not every problem lends itself to parallelism
Ex: the nth number in the Fibonacci series (CPU much faster here)
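Why the Fibonacci example resists parallelism can be seen in a minimal C sketch (the function name `fib` is chosen here for illustration). Each iteration needs the two values produced by the iterations before it, so the steps of this straightforward loop cannot be handed to different cores the way independent per-element additions can.

```c
#include <stdint.h>

/* Iterative Fibonacci with fib(0) = 0 and fib(1) = 1.
 * Step i depends on the results of steps i-1 and i-2 (a loop-carried
 * dependency), so the iterations must run one after another. */
uint64_t fib(unsigned n)
{
    uint64_t prev = 0, curr = 1;
    if (n == 0)
        return 0;
    for (unsigned i = 2; i <= n; i++) {
        uint64_t next = prev + curr;  /* needs both earlier results */
        prev = curr;
        curr = next;
    }
    return curr;
}
```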
A GPU can be more efficient for other reasons besides parallel computing
More restrictive memory access
Does not support as many data types
GPUs have limited instruction sets to perform specialized calculations
GPUs are highly optimized for floating point calculations
Integer calculation is not necessarily faster on a GPU
AMD FirePro™ D-Series GPU
The newest star among GPUs, found in the new Mac Pro
3 models: D300, D500, D700
The main differences between them are the number of stream processors, VRAM, width of the memory bus, memory bandwidth and teraflop performance
More processing power for video editing, 3D modeling, animation and photography
GPU computing using OpenCL (more on it later)
The architecture of this particular GPU supports OpenCL 2.0 and lower
The D300 model has a 256-bit memory bus that delivers 160 GB per second of memory bandwidth, meaning large amounts of data can be read quickly
With support for OpenCL 2.0 it is now possible for an application to run on both the GPU and the CPU simultaneously; AMD refers to this as Accelerated Parallel Processing
GPU Accelerated Computing
It is basically the use of a GPU together with a CPU to accelerate scientific, analytics, consumer and enterprise applications
Started in 2007 by NVIDIA
GPUs currently accelerate applications on platforms ranging from cars and mobile phones to risk management
GPU accelerated computing offers better performance by offloading the compute-intensive portions of an application to the GPU while the remainder of the code still runs on the CPU
We can see from the image on the left that some part of the code runs on the GPU and some part runs on the CPU, and from the user's perspective the application simply runs faster
OpenCL
Open Computing Language
From the makers of OpenGL
Wide industry support: AMD, Apple, NVIDIA, Samsung etc.
OpenCL model:
OpenCL Architecture
Host controls multiple compute devices
Each of these compute devices consists of multiple compute units
Compute units (execution units and arithmetic processing units) contain processing elements
Processing elements execute OpenCL kernels (these are just functions written by the programmer in the OpenCL language: C with restrictions and special keywords and data types)
Kernels are the basic unit of executable code
A program is a collection of kernels and other functions
We should also be aware that an OpenCL program is divided into two parts
One part executes on the device (GPU)
The second part executes on the host (CPU)
The device part is where we need to write special functions called kernels
OpenCL Architecture – Device
The device is the GPU
A kernel is a function we write that is executed on the GPU (there can be more than one)
Kernels are the entry points into the device program (the only functions that can be called from the host)
We need to program the kernels ourselves
How to program a kernel - SIMT
SIMT: single instruction, multiple threads, which reflects how instructions are executed on the device
The same code is executed in parallel by different threads, and each thread executes the code with different data
How to program a kernel – Work Item
Work items are equivalent to threads and are the smallest execution entity
Every time a kernel is launched, many work items (a number specified by the programmer) are launched, each executing the same code
Each work item has an ID that is accessible from the kernel and is used to select the data processed by that work item
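The role of the work-item ID can be illustrated without a GPU. The sketch below is a sequential C emulation under stated assumptions: `square_item` stands in for a kernel body and `launch` for the runtime that starts one work item per ID. Both names are hypothetical, and on a real device the work items would run in parallel rather than in a loop.

```c
#include <stddef.h>

/* Stand-in for a kernel body: every work item runs this same code,
 * and its ID selects which element it processes. */
static void square_item(size_t id, const float *in, float *out)
{
    out[id] = in[id] * in[id];
}

/* Stand-in for a kernel launch with `work_items` work items.
 * A real device would execute these concurrently. */
void launch(size_t work_items, const float *in, float *out)
{
    for (size_t id = 0; id < work_items; id++)
        square_item(id, in, out);
}
```

Calling `launch(4, in, out)` on a four-element array emulates four work items, each squaring the element that matches its ID.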
How to program a kernel – Work Group
Work groups exist to allow communication and cooperation between work items
They also reflect how work items are organized
N-dimensional grid of work items (N = 1, 2 or 3)
Work groups also have an ID, which can be queried from the kernel
How to program a kernel – ND Range
The ND range is the next organizational level, specifying how work groups are organized
N-dimensional grid of work groups, where N = 1, 2 or 3
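How the work-item, work-group and ND-range levels relate can be sketched in one dimension. In OpenCL C the values come from the built-ins `get_global_id`, `get_group_id`, `get_local_id` and `get_local_size`; the C code below only emulates the arithmetic between them, assuming a global offset of zero. The type `item_pos` and both function names are hypothetical.

```c
#include <stddef.h>

/* Position of a work item in one dimension of the ND range. */
typedef struct {
    size_t group_id;  /* which work group the item belongs to */
    size_t local_id;  /* position of the item inside its work group */
} item_pos;

/* Splits a global ID into group ID and local ID
 * (a global offset of 0 is assumed). */
item_pos split_global_id(size_t global_id, size_t local_size)
{
    item_pos p = { global_id / local_size, global_id % local_size };
    return p;
}

/* Recombines them: global = group * local_size + local. */
size_t join_global_id(item_pos p, size_t local_size)
{
    return p.group_id * local_size + p.local_id;
}
```

With work groups of 64 items, for example, global ID 70 is item 6 of work group 1.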
Kernel Example - CPU
void vector_add_cpu(const float* src_a, const float* src_b, float* res, const int num)
{
    for (int i = 0; i < num; i++)
        res[i] = src_a[i] + src_b[i];
}
This function adds two vectors. Here a single thread iterates through all the elements.
Kernel Example - GPU
__kernel void vector_add_gpu(__global const float* src_a,
                             __global const float* src_b,
                             __global float* res,
                             const int num)
{
    /* get_global_id(0) returns the ID of the thread in execution.
       As many threads are launched at the same time, executing the same
       kernel, each one will receive a different ID, and consequently
       perform a different computation. */
    const int idx = get_global_id(0);

    /* Now each work-item asks itself: "is my ID inside the vector's range?"
       If the answer is YES, the work-item performs the corresponding
       computation. */
    if (idx < num)
        res[idx] = src_a[idx] + src_b[idx];
}
Each thread computes one element
“__kernel” is a reserved word which specifies that the function is a kernel
Kernel functions always return void
In a similar way we can program the host as well
Parallel Processing and OpenCL
The OpenCL data-parallel programming model is hierarchical, and the hierarchy can be specified in two ways
Explicitly: the programmer defines the total number of work items to execute in parallel as well as how to group them
Implicitly: the programmer defines the total number of work items to execute in parallel and OpenCL manages grouping them
OpenCL and Synchronization
The two domains of synchronization in OpenCL are work items in a single work group and command queues in a single context
Work group barriers enable synchronization of work items in a work group: barrier()
Barriers and memory fences synchronize threads in a work group
All threads are required to reach the barrier before any of them can continue
Built-in functions mem_fence() and barrier()
mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE) waits until all reads/writes to local and/or global memory made by the calling work item prior to mem_fence() are visible to all threads in the work group
barrier(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE) waits until all work items in the work group have reached this point and calls mem_fence(CLK_LOCAL_MEM_FENCE and/or CLK_GLOBAL_MEM_FENCE)
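Why a barrier is needed can be sketched with a work-group sum. The code below is a sequential C emulation, not OpenCL: the array `local_mem` plays the role of local memory, each inner loop stands for "every work item executes this step", and finishing that loop before the next one starts stands in for `barrier(CLK_LOCAL_MEM_FENCE)`. `GROUP_SIZE` and `group_sum` are hypothetical names, and the group size is assumed to be a power of two.

```c
#include <stddef.h>

#define GROUP_SIZE 8  /* hypothetical work-group size, a power of two */

/* Sequential emulation of a work-group sum reduction. */
float group_sum(const float *in)
{
    float local_mem[GROUP_SIZE];

    /* Step 0: each work item copies one element into local memory. */
    for (size_t lid = 0; lid < GROUP_SIZE; lid++)
        local_mem[lid] = in[lid];
    /* barrier: all copies must be visible before any item reads them */

    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        /* Work items with lid < stride add in their partner's value. */
        for (size_t lid = 0; lid < stride; lid++)
            local_mem[lid] += local_mem[lid + stride];
        /* barrier: the partial sums of this stride must be complete
         * before the next, smaller stride reads them */
    }
    return local_mem[0];
}
```

On a real device the work items of one step run concurrently, so without the barrier an item could read a partial sum that its neighbour has not written yet.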
There are two types of synchronization between commands in a command queue
Command queue barrier: enforces ordering within a single queue, and any resulting changes to memory are available to the next command in the queue
Events: enforce ordering between or within queues
Enqueued commands in OpenCL return an event identifying the command as well as the memory object updated by it
OpenCL – Memory Model
OpenCL has 4 address spaces
private: specific to a work item and not visible to other work items
local: specific to a work group and accessible only to work items belonging to that work group
global: accessible to all work items executing in a context, as well as to the host
constant: read-only region for host-allocated objects that are not changed during kernel execution
There is also a host-accessible region for the application's data structures and program data
PCIe memory: the part of host (CPU) memory that is accessible from, and modifiable by, both the host program and the GPU device
Modifying this memory requires synchronization between the GPU compute device and the CPU
OpenCL – Communication
Communication and data transfer between the host and the GPU occur over the PCIe channel
Actual transfer performance is CPU dependent
Transfers from the host to the GPU are done by the command processor
The GPU device can read and write system memory directly through kernel instructions over the PCIe bus
OpenCL – Processing API Calls
The host application does not interact with the GPU device directly (data structures for the host)
A driver layer translates and issues commands to the hardware
Most commands to the GPU are buffered in a command queue on the host side
The queue of commands is sent to and processed by the GPU
There is no guarantee as to when commands from the command queue are executed, only that they are executed in order
OpenCL – Scheduling
GPU devices are very efficient at parallelizing large numbers of work items in a manner transparent to the application
Each GPU device uses a large number of wavefronts to hide memory access latencies: the scheduler switches the active wavefront in a given compute unit whenever the current wavefront is waiting for a memory access to complete
Data Parallelism in OpenCL
Define N dimensional computation domain (N = 1, 2 or 3)
Each independent element of execution in ND domain is called a work item
The ND domain defines the total number of work items that execute in parallel
E.g., process a 1024 x 1024 image:
Global problem dimensions: 1024 x 1024
1 kernel execution per pixel: 1,048,576 total executions
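The image example can be written out as a small C sketch. It assumes a 2D ND range with one work item per pixel and a row-major pixel buffer (the row-major layout is an assumption, not something the slides state). The function names are hypothetical; in a kernel, `gx` and `gy` would come from `get_global_id(0)` and `get_global_id(1)`.

```c
#include <stddef.h>

/* Total work items in a 2D ND range: one per pixel. */
size_t total_work_items(size_t width, size_t height)
{
    return width * height;
}

/* Row-major buffer index for the work item at (gx, gy). */
size_t pixel_index(size_t gx, size_t gy, size_t width)
{
    return gy * width + gx;
}
```

For a 1024 x 1024 image this gives 1,048,576 work items, with the last one, at (1023, 1023), mapping to buffer index 1,048,575.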
Kernels are executed across a global domain of work items
Global dimensions define the range of computation: one work item per computation, executed in parallel
Work items are grouped into local workgroups
Local dimensions define the size of the workgroups
Workgroups are executed together on one device and share local memory and synchronization
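One practical consequence of global and local dimensions: in OpenCL 1.x, when a workgroup size is given explicitly, the global size in each dimension has to be a multiple of it. A common approach, sketched below in C with hypothetical function names, is to round the global size up and let surplus work items skip the computation, as the `if (idx < num)` check in the GPU kernel example does.

```c
#include <stddef.h>

/* Smallest multiple of local_size that is >= num_elements. */
size_t padded_global_size(size_t num_elements, size_t local_size)
{
    size_t groups = (num_elements + local_size - 1) / local_size;
    return groups * local_size;
}

/* Number of workgroups that result from that global size. */
size_t work_group_count(size_t num_elements, size_t local_size)
{
    return padded_global_size(num_elements, local_size) / local_size;
}
```

For 1000 elements and workgroups of 64, the global size becomes 1024 (16 workgroups), and the last 24 work items do nothing.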
OpenCL C (quick glance)
Derived from ISO C99 (with some restrictions)
Language features added: work items and work groups, vector types and synchronization
Includes a large set of built-in functions for image manipulation, work item manipulation and math
OpenCL C language restriction
Pointers to functions are not allowed
Pointers to pointers are allowed within a kernel, but not as an argument
Variable length arrays and structures are not supported
Recursion is not supported
3D image writes are not supported
OpenCL C optional extension
Extensions are optional features exposed through OpenCL
The OpenCL working group has already approved many extensions to the OpenCL specification, such as double precision floating point types, built-in functions to support doubles, and byte addressable stores (writes through pointers to types < 32 bits)
Work Items and Work Groups
OpenCL Data Types
Scalar data types (bool, char, cl_char, unsigned char, uchar, cl_uchar, short, cl_short, unsigned short, etc.)
Image types (image2d_t, image3d_t, image2d_array_t, image1d_t, etc.)
Vector data types (charn, ucharn, shortn, ushortn, intn, uintn etc.)
Supported values of n are 2, 3, 4, 8, and 16 for all vector data types
Q & A