CMSC5743 Lab 03 CUDA Tutorial Materials

23
CMSC5743 Lab 03 CUDA Tutorial Materials Yang BAI Department of Computer Science & Engineering Chinese University of Hong Kong [email protected] October 29, 2021

Transcript of CMSC5743 Lab 03 CUDA Tutorial Materials

Page 1: CMSC5743 Lab 03 CUDA Tutorial Materials

CMSC5743 Lab 03CUDA Tutorial Materials

Yang BAIDepartment of Computer Science & EngineeringChinese University of Hong [email protected]

October 29, 2021

Page 2: CMSC5743 Lab 03 CUDA Tutorial Materials

1 Vector Addition

2 Tensor Core WMMA

3 General Matrix Multiplication

Outline

2/23

Page 3: CMSC5743 Lab 03 CUDA Tutorial Materials

• Heterogeneous Computing

• Host and Device

• CUDA C/C++

CUDA Programming Language

3/23

Page 4: CMSC5743 Lab 03 CUDA Tutorial Materials

Vector Addition

Page 5: CMSC5743 Lab 03 CUDA Tutorial Materials

• Host Code Initialization

• Device Code Initialization

• Kernel Code

• Check Your Results

• Free Source on Host

• Free Source on Device

Coding Style and Organization

5/23

Page 6: CMSC5743 Lab 03 CUDA Tutorial Materials

Host Code Initialization

6/23

Page 7: CMSC5743 Lab 03 CUDA Tutorial Materials

Device Code Initialization

7/23

Page 8: CMSC5743 Lab 03 CUDA Tutorial Materials

Kernel Code

8/23

Page 9: CMSC5743 Lab 03 CUDA Tutorial Materials

Check Your Results

9/23

Page 10: CMSC5743 Lab 03 CUDA Tutorial Materials

Free Source on Host and Device

10/23

Page 11: CMSC5743 Lab 03 CUDA Tutorial Materials

Tensor Core WMMA

Page 12: CMSC5743 Lab 03 CUDA Tutorial Materials

• Access to Tensor Core by cuBLAS/cuDNN

• Access to Tensor Core by CUTLASS

• Access to Tensor Core by TVM

Tensor Core Overview

12/23

Page 13: CMSC5743 Lab 03 CUDA Tutorial Materials

• Volta Tensor Core (1st)

• FP16 supported• 8 x 8 x 4 (M x N x K)

• Turing Tensor Core (2nd)

• FP16 supported• 8 x 8 x 4, 16 x 8 x 8 (recommended)• INT8, INT4, INT1 supported• 8 x 8 x 16, 8 x 8 x 32, 8 x 8 x 128

• Ampere Tensor Core (3rd)

• new bfloat16 supported, FP16• 16 x 8 x 8, 16 x 8 x 16• new TF32 supported, 16 x 8 x 4, 16 x 8 x 8• new Double supported, 8 x 8 x 4

Tensor Core History

13/23

Page 14: CMSC5743 Lab 03 CUDA Tutorial Materials

Data types

14/23

Page 15: CMSC5743 Lab 03 CUDA Tutorial Materials

Data types

15/23

Page 16: CMSC5743 Lab 03 CUDA Tutorial Materials

Tesla T4 GPU Architecture

16/23

Page 17: CMSC5743 Lab 03 CUDA Tutorial Materials

• Architecture: Turing

• SMs: 68

• CUDA Cores/SM: 64

• CUDA Cores/GPU: 4352

• Tensor Cores/SM: 8

• Tensor Cores/GPU: 544

Turing 2080Ti Information

17/23

Page 18: CMSC5743 Lab 03 CUDA Tutorial Materials

Hierarchical Structure

18/23

Page 19: CMSC5743 Lab 03 CUDA Tutorial Materials

Hierarchical Structure• The basic triple loop nest computing matrix multiply may be blocked and tiled to

match concurrency in hardware, memory locality, and parallel programming models.

• In CUTLASS, GEMM is mapped to NVIDIA GPUs with the structure illustrated bythe following loop nest.

This tiled loop nest targets concurrency among• threadblocks-level

• warps

• CUDA and Tensor Cores

and takes advantage of memory locality within• shared memory

• registers

19/23

Page 20: CMSC5743 Lab 03 CUDA Tutorial Materials

The flow of data in CUTLASS

20/23

Page 21: CMSC5743 Lab 03 CUDA Tutorial Materials

General Matrix Multiplication

Page 22: CMSC5743 Lab 03 CUDA Tutorial Materials

Q1 Learn the code in ./Lab03-CUDA/code and it contains three folders(vector_add, gemm, wmma)

• Learn the code style and components of vector_add.cu file• Complete all of the code in gemm folder• Try to make your gemm kernel more efficient

• shared memory• tiling size• block and thread size

Q2 Learn the wmma.cu from the /Lab03-CUDA/code/wmma to run itsuccessfully by compile.sh script

• Learn the different data type in CUDA programming language such asFloat16, Int8

• Learn the basic knowledge of Tensor Core and WMMA in CUDAprogramming language

• Learn the difference between FLOPs and FLOPS• Change the tiling size in wmma.cu to get the different TFLOPS

22/23

Page 23: CMSC5743 Lab 03 CUDA Tutorial Materials

THANK YOU!