Hetero Lecture Slides 002 Lecture 1 Lecture 1 4 Cuda Intro

  • 8/13/2019 Hetero Lecture Slides 002 Lecture 1 Lecture 1 4 Cuda Intro

    1/12

    Introduction to CUDA

    - Data Parallelism and Threads

    Lesson 1.4

  • 2/12

    Objective

    To learn about data parallelism and the basic features of CUDA, a heterogeneous parallel programming interface that enables exploitation of data parallelism

    - Hierarchical thread organization

    - Main interfaces for launching parallel execution

    - Thread index to data index mapping

  • 3/12

    Data Parallelism - Vector Addition

    vector A    A[0]   A[1]   A[2]
                  +      +      +
    vector B    B[0]   B[1]   B[2]

    vector C    C[0]   C[1]   C[2]
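The vector addition on this slide can be written as a serial loop for reference. The sketch below is plain host C, not CUDA, and the function name `vec_add_serial` is an assumption made for illustration. It is meant to show that each output element depends only on the input elements at the same index, which is the property that data parallelism exploits.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference for element-wise vector addition (hypothetical helper,
 * not part of the slides). Iteration i reads only A[i] and B[i] and writes
 * only C[i], so no iteration depends on any other. */
void vec_add_serial(const float *A, const float *B, float *C, size_t n)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; ++i) {
        C[i] = A[i] + B[i];
    }
}
```

Because the iterations are independent, they could in principle run in any order or all at once, which is what a grid of threads does.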

  • 4/12

    CUDA/OpenCL Execution Model

    Heterogeneous host+device application

    - Serial parts in host C code

    - Parallel parts in device SPMD kernel code

    Serial Code (host)

    Parallel Kernel (device)
    KernelA<<<nBlk, nTid>>>(args);

    Serial Code (host)

    Parallel Kernel (device)
    KernelB<<<nBlk, nTid>>>(args);

  • 5/12

    From Natural Language to Electrons

    Natural Language (e.g., English)

    Algorithm

    High-Level Language (C/C++)

      [Compiler]

    Instruction Set Architecture

    Microarchitecture

    Circuits

    Electrons

    Yale Patt and Sanjay Patel, From bits an

  • 6/12

    An Instruction Set Architecture (ISA) is a contract between the hardware and the software.

    As the name suggests, it is a set of instructions that the architecture (hardware) can execute.

  • 7/12

    A program at the ISA level

    A program is a set of instructions stored in memory that can be read, interpreted, and executed by the hardware.

    Program instructions operate on data stored in memory or provided by an Input/Output (I/O) device.

  • 8/12

    A Von-Neumann Processor

    [Figure: Memory; Processing Unit (ALU, Reg File); Control Unit (PC, IR)]

    A thread is a virtualized or abstracted Von-Neumann Processor

  • 9/12

    Arrays of Parallel Threads

    A CUDA kernel is executed by a grid (array) of threads

    - All threads in a grid run the same kernel code (SPMD)

    - Each thread has indexes that it uses to compute memory addresses and make control decisions

    i = blockIdx.x * blockDim.x + threadIdx.x;
    C[i] = A[i] + B[i];

    Threads: 0 1 2 ... 254 255
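To make the thread-index-to-data-index mapping concrete, here is a minimal sketch in plain host C. Plain C has no built-in `blockIdx`, `blockDim`, or `threadIdx`, so the struct `thread_ctx` and the functions `global_index` and `vec_add_thread` are hypothetical stand-ins that pass those values in explicitly. Only the index formula itself comes from the slide.

```c
#include <assert.h>

/* Hypothetical stand-in for the index values one CUDA thread would see
 * as blockIdx.x, blockDim.x and threadIdx.x. */
typedef struct {
    unsigned block_idx_x;   /* which block this thread belongs to   */
    unsigned block_dim_x;   /* number of threads in each block      */
    unsigned thread_idx_x;  /* position of the thread in its block  */
} thread_ctx;

/* The slide's formula: i = blockIdx.x * blockDim.x + threadIdx.x */
unsigned global_index(thread_ctx t)
{
    assert(t.thread_idx_x < t.block_dim_x);
    return t.block_idx_x * t.block_dim_x + t.thread_idx_x;
}

/* The work done by a single thread: one element of the vector sum. */
void vec_add_thread(thread_ctx t, const float *A, const float *B, float *C)
{
    unsigned i = global_index(t);
    C[i] = A[i] + B[i];
}
```

With 256 threads per block, thread 2 of block 1 gets `i = 1 * 256 + 2 = 258`, so it is responsible for `C[258]`.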

  • 10/12

    Thread Blocks: Scalable Cooperation

    Divide thread array into multiple blocks

    - Threads within a block cooperate via shared memory, atomic operations and barrier synchronization

    - Threads in different blocks do not interact

    Thread Block 0: threads 0 1 2 ... 254 255

    Thread Block 1: threads 0 1 2 ... 254 255

    ...

    Every thread in every block runs:

    i = blockIdx.x * blockDim.x + threadIdx.x;
    C[i] = A[i] + B[i];
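The division of the thread array into blocks can be emulated serially to see which elements each block covers. This is a sketch in plain host C under stated assumptions: `blocks_for` and `vec_add_grid_emulated` are hypothetical names, and neither the ceiling division used to pick the number of blocks nor the `i < n` guard appears on the slide. On a GPU the two loops would not exist; the blocks and threads would run in parallel with no guaranteed order.

```c
#include <assert.h>

/* Smallest number of blocks such that blocks * block_dim >= n
 * (ceiling division; an assumption, not shown on the slide). */
unsigned blocks_for(unsigned n, unsigned block_dim)
{
    assert(block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

/* Serial emulation of a 1D grid of 1D blocks. The outer loop plays the
 * role of blockIdx.x and the inner loop the role of threadIdx.x. */
void vec_add_grid_emulated(const float *A, const float *B, float *C,
                           unsigned n, unsigned block_dim)
{
    unsigned grid_dim = blocks_for(n, block_dim);
    for (unsigned block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (unsigned thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            unsigned i = block_idx * block_dim + thread_idx;
            /* Guard (assumption): the last block may extend past n. */
            if (i < n) {
                C[i] = A[i] + B[i];
            }
        }
    }
}
```

Each block covers a contiguous, non-overlapping range of `block_dim` indices, which is why blocks need no interaction for this computation.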

  • 11/12

    blockIdx and threadIdx

    Each thread uses indices to decide what data to work on

    - blockIdx: 1D, 2D, or 3D (CUDA 4.0)

    - threadIdx: 1D, 2D, or 3D

    Simplifies memory addressing when processing multidimensional data

    - Image processing

    - Solving PDEs on volumes

    [Figure: the device runs a Grid of blocks, Block (0,0), Block (1,0), ...; one block is expanded into its threads: Thread (0,0,0), Thread (0,0,1), Thread (0,0,2), Thread (0,1,0), Thread (0,1,1), Thread (0,1,2), Thread (1,0,0), Thread (1,0,1), Thread (1,0,2)]
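As an illustration of how multidimensional indices simplify addressing, the sketch below maps a 2D block index and a 2D thread index to a pixel in a row-major image. It is plain host C under stated assumptions: `idx3` is a hypothetical mirror of CUDA's three-component index variables, `pixel_offset` is an illustrative name, and the row-major layout is an assumption that the slide does not state.

```c
#include <assert.h>

/* Hypothetical mirror of a three-component CUDA index such as
 * blockIdx, blockDim or threadIdx. */
typedef struct {
    unsigned x;
    unsigned y;
    unsigned z;
} idx3;

/* Offset of the pixel handled by one thread, assuming a row-major image
 * that is `width` pixels wide and a 2D grid of 2D blocks. */
unsigned pixel_offset(idx3 block_idx, idx3 block_dim, idx3 thread_idx,
                      unsigned width)
{
    unsigned col = block_idx.x * block_dim.x + thread_idx.x;
    unsigned row = block_idx.y * block_dim.y + thread_idx.y;
    assert(col < width);
    return row * width + col;
}
```

For 16x16 blocks on a 64-pixel-wide image, the thread at x = 2, y = 3 in the block at x = 1, y = 0 handles column 18 of row 3, that is offset `3 * 64 + 18 = 210`.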

  • 12/12

    To learn more,

    Chapt