Emergence of GPU systems for general purpose high performance computing

ITCS 4145/5145 April 4, 2013 © Barry Wilkinson CUDAIntro.ppt

Titan Supercomputer, Oak Ridge National Laboratory in Oak Ridge, Tenn.

World's fastest computer as of Nov 2012: No. 1 rank on the TOP500 list

18,688 NVIDIA Tesla K20X GPUs

20 petaflops

Upgraded from the Jaguar supercomputer.

10 times faster and 5 times more energy efficient than the 2.3-petaflop Jaguar system, while occupying the same floor space.

http://nvidianews.nvidia.com/Releases/NVIDIA-Powers-Titan-World-s-Fastest-Supercomputer-For-Open-Scientific-Research-8a0.aspx#source=pr

Tesla K20 GPU Computing modules: Kepler architecture, introduced November 2012

K20: 2496 thread processors (cores)

K20X: 2688 thread processors (cores)

K20: 2496 FP32 cores, 832 FP64 cores

Wattage: 225 watts

Compute capability 3.5

GFLOPS: Single Precision: 3519 / 4106, Double Precision: 1173

UNC-C CUDA Teaching Center

2010: NVIDIA Corp. selected the UNC-Charlotte Department of Computer Science to be a CUDA Teaching Center, kindly providing GPU equipment and TA support.

2011: NVIDIA kindly provided 50 GTX 480 GPU cards, valued at $15,000, as continuing support for the CUDA Teaching Center.

2012: NVIDIA donates a K20!

Our course materials are posted on NVIDIA's corporate site next to those from Stanford and other top schools.

http://developer.nvidia.com/cuda-training

coit-grid01.uncc.edu – coit-grid07.uncc.edu cluster

coit-grid09.uncc.edu: dual 16-core CPU RAID 1 server, on order

Figure: cluster diagram showing coit-grid01 to coit-grid05 connected through a switch, plus GPU servers coit-grid06, coit-grid07, and coit-grid08

All users' home directories are on coit-grid05 (NFS).

Login from on-campus or off-campus: use coit-grid01.uncc.edu

coit-grid06: NVIDIA C2050 GPU (448 cores)

Login directly from within UNC-C campus only

coit-grid07: GPU server, X5560 2.8GHz quad-core Xeon processor with NVIDIA C2050 GPU (448 cores), 12GB main memory (can hold four C2050 GPUs, 1792 cores!)

coit-grid08: GPU server, E5-1650 3.2GHz 6-core Xeon processor with NVIDIA K20 GPU (2496 cores), 32GB main memory

CPU-GPU architecture evolution: 1970s-1980s

Co-processors are a very old idea that appeared in the 1970s and 1980s: floating point co-processors were attached to microprocessors that did not then have floating point capability.

Coprocessors simply executed floating point instructions that were fetched from memory.

Around the same time, there was interest in providing hardware support for displays, especially with the increasing use of graphics and PC games.

This led to graphics processing units (GPUs) attached to the CPU to create the video display.

Figure: early design, showing the graphics card, memory, and display

Dedicated pipeline (late 1990s-early 2000s): pipelined programmable GPU

By the late 1990s, graphics chips needed to support 3-D graphics, especially for games and graphics APIs such as DirectX and OpenGL.

Graphics chips generally had a pipeline structure, with individual stages performing specialized operations and finally loading the frame buffer for display.

Individual stages may have access to graphics memory for storing intermediate computed data.

Figure: graphics pipeline: Input stage → Vertex shader stage → Geometry shader stage → Rasterizer stage → Pixel shading stage → Frame buffer, with the stages connected to graphics memory

General-Purpose GPU designs

High performance pipelines call for high-speed (IEEE) floating point operations.

People tried to use GPU cards to speed up scientific computations.

This became known as GPGPU (general-purpose computing on graphics processing units). It was difficult to do with specialized graphics pipelines, but possible.

By the mid 2000s, it was recognized that the individual stages of the graphics pipeline could be implemented by a more general-purpose processor core (although with a data-parallel paradigm).

Graphics Processing Units (GPUs): Brief History

Timeline figure (1970-2010): Atari 8-bit computer text/graphics chip; IBM PC Professional Graphics Controller; S3 graphics cards, single-chip 2D accelerator; OpenGL graphics API; hardware-accelerated 3D graphics; DirectX graphics API; PlayStation; GPUs with programmable shading (NVIDIA GeForce 3 (2001) with programmable shading); general-purpose computing on graphics processing units (GPGPUs); GPU Computing

Source of information: http://en.wikipedia.org/wiki/Graphics_Processing_Unit

NVIDIA products

NVIDIA Corp. is the leader in GPUs for high performance computing:

Timeline figure (1993-2010): established by Jen-Hsun Huang, Chris Malachowsky, Curtis Priem; NV1; GeForce 1; GeForce 2 series; GeForce FX series; GeForce 8 series (GeForce 8800, NVIDIA's first GPU with general purpose processors); GeForce 200 series (GTX260/275/280/285/295); GeForce 400 series (GTX460/465/470/475/480/485); Quadro; Tesla C870, S870, C1060, S1070, C2050, … (the Tesla C2050 GPU has 448 thread processors); Kepler (2011); Maxwell (2013)

Source: http://en.wikipedia.org/wiki/GeForce

NVIDIA G80 chip/GeForce 8800 card (2006)

First GPU for high performance computing as well as graphics

Unified processors that could perform vertex, geometry, pixel, and general computing operations

Could now write programs in C rather than graphics APIs.

Single-instruction multiple-thread (SIMT) programming model

Evolving GPU design: NVIDIA Fermi architecture (announced Sept 2009)

• Data parallel, single instruction multiple data operation ("stream" processing)

• Up to 512 cores ("stream processing engines", SPEs), organized as 16 streaming multiprocessors (SMs), each having 32 SPEs

• 3 GB or 6 GB GDDR5 memory

• Many innovations including L1/L2 caches, unified device memory addressing, ECC memory, …

• First implementation: Tesla 20 series (single chip C2050/2070, 4 chip S2050/2070)

3 billion transistor chip

Number of cores is limited by power considerations: the C2050 has 448 cores.

* Whitepaper: NVIDIA's Next Generation CUDA Compute Architecture: Fermi, NVIDIA, 2008

GPU performance gains over CPUs

Figure: performance of NVIDIA GPUs (labelled NV30, NV40) vs Intel CPUs (labelled 3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere), dates 9/22/2002 to 7/27/2009

NVIDIA Kepler architecture and GPUs (2012)

Many major new features over the earlier Fermi architecture; we will look at them later in the course.

GeForce 600 series cards introduced early 2012.

GTX 680 has 1536 cores, 195 watts. Introduced March 2012.

GTX 690 has two dies, 3072 cores (2 x 1536 cores), 300 watts. Introduced April 2012.

CUDA compute capability 3.0 (see next slide)

http://en.wikipedia.org/wiki/GeForce_600_Series http://www.tomshardware.com/news/Nvidia-Kepler-GK104-GeForce-GTX-670-680,14691.html

GK104 chip with 1536 cores

CUDA (Compute Unified Device Architecture)

• Architecture and programming model introduced by NVIDIA in 2007

• Enables GPUs to execute programs written in C.

• Within C programs, call SIMT "kernel" routines that are executed on the GPU.

• CUDA syntax extensions to C identify a routine as a kernel.

• Very easy to learn, although getting the highest possible execution performance requires understanding the hardware architecture.

• Version 3, introduced in 2009, is the one we have been using.

• Current version 4, introduced in 2011, has significant additions including "unified virtual addressing": a single address space across the GPU and host (see later).

• We will go into CUDA in detail later and gain programming experience.

Questions
