Transcript of MULTICORE, PARALLELISM, AND MULTITHREADING by Eric Boren, Charles Noneman, and Kristen Janick

Page 1

MULTICORE, PARALLELISM, AND MULTITHREADING
By: Eric Boren, Charles Noneman, and Kristen Janick

Page 2

MULTICORE PROCESSING

Why we care

Page 3

What is it?

A processor with more than one core on a single chip

Core: An independent system capable of processing instructions and modifying registers and memory

Page 4

Motivation

Advances in component technology and optimization now contribute only limited gains to single-processor speed

Many CPU applications attempt to do multiple things at once:
Video editing
Multi-agent simulation

So, use multiple cores to get it done faster

Page 5

Hurdles

Instruction assignment (who does what?)
Mostly delegated to the operating system
Can be done to a small degree through dependency analysis on the chip

Cores must still communicate at times – how?
Shared memory
Message passing

Page 6

Advantages

Multiple programs:
Can be separated between cores
Other programs don't suffer when one hogs the CPU

Multi-threaded applications:
Independent threads don't have to wait as long for each other – results in faster overall execution

VS multiple processors:
Cores are closer together than separate chips – faster communication allows a higher maximum clock rate
Less expensive due to smaller overall chip area and shared components (caches, etc.)

Page 7

Disadvantages

OS and programs must be optimized for multiple cores, or no gain will be seen

A single-threaded application sees little to no improvement

Overhead in assigning tasks to cores

Real bottleneck is typically memory and disk access time – independent of number of cores

Page 8

Amdahl’s Law

The potential performance increase on a parallel computing platform is given by Amdahl's law. Large problems are made up of parallelizable parts and non-parallelizable parts.

With N processors, the speed-up is S = 1 / ((1 - P) + P/N); as N grows, this approaches the maximum

S = 1 / (1 - P)

S = speed-up of the program
P = fraction of the program that is parallelizable

For example, if 90% of a program is parallelizable (P = 0.9), the maximum speed-up is 1 / (1 - 0.9) = 10x, no matter how many cores are added.
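A tiny C sketch (the helper name is ours, not from the slides) that evaluates the limit formula for a few values of P:

#include <stdio.h>

// Maximum speed-up predicted by Amdahl's law for a program
// whose parallelizable fraction is P (hypothetical helper).
double amdahl_max_speedup(double P) {
    return 1.0 / (1.0 - P);
}

int main(void) {
    printf("P = 0.50 -> %.0fx\n", amdahl_max_speedup(0.50));  // 2x
    printf("P = 0.90 -> %.0fx\n", amdahl_max_speedup(0.90));  // 10x
    printf("P = 0.99 -> %.0fx\n", amdahl_max_speedup(0.99));  // 100x
    return 0;
}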

Page 9

Current State of the Art

Commercial processors:
Most have at least 2 cores
Quad-core processors are highly popular for desktop applications
6-core processors have recently appeared on the market (Intel's i7 980X)
8-core processors exist but are less common

Academic and research:
MIT RAW – 16 cores
Intel Polaris – 80 cores
UC Davis AsAP – 36 and 167 cores, individually clocked

Page 10

PARALLELISM

Page 11

What is Parallel Computing?

Form of computation in which many calculations are carried out simultaneously.

Operating on the principle that large problems can often be divided into smaller ones, which are solved concurrently.

Page 12

Types of Parallelism

Bit-level parallelism
Increase processor word size

Instruction-level parallelism
Instructions combined into groups

Data parallelism
Distribute data over different computing environments

Task parallelism
Distribute threads across different computing environments (see the sketch below)
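As a sketch of task parallelism, OpenMP sections (OpenMP itself is introduced later in these slides; the task functions are hypothetical) run two independent tasks on separate threads:

#include <stdio.h>

// Hypothetical stand-ins for two independent tasks
void render_video(void) { puts("rendering video"); }
void encode_audio(void) { puts("encoding audio"); }

int main(void) {
    #pragma omp parallel sections
    {
        #pragma omp section
        render_video();      // task 1, on one thread
        #pragma omp section
        encode_audio();      // task 2, on another thread
    }
    return 0;
}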

Page 13

Flynn’s Taxonomy
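The taxonomy's four categories, detailed on the following slides:

                         Single Data    Multiple Data
Single Instruction       SISD           SIMD
Multiple Instruction     MISD           MIMD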

Page 14

Single Instruction, Single Data (SISD)

Provides no parallelism in hardware

1 data stream processed by the CPU in 1 clock cycle

Instructions executed in serial fashion

Page 15

Multiple Instruction, Single Data (MISD)

Process single data stream using multiple instruction streams simultaneously

A more theoretical model than a practical one

Page 16

Single Instruction, Multiple Data (SIMD)

A single instruction stream has the ability to process multiple data streams in 1 clock cycle

Takes the operation specified in one instruction and applies it to more than one set of data elements at a time

Suitable for graphics and image processing
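A minimal sketch using x86 SSE intrinsics (our example, not from the slides): a single _mm_add_ps instruction adds four pairs of floats at once.

#include <stdio.h>
#include <xmmintrin.h>   // SSE intrinsics (x86)

int main(void) {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    __m128 va = _mm_loadu_ps(a);      // load 4 floats into one register
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);   // one instruction, 4 additions
    _mm_storeu_ps(c, vc);

    printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);  // 11 22 33 44
    return 0;
}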

Page 17

Multiple Instruction, Multiple Data (MIMD)

Different processors can execute different instructions on different pieces of data

Each processor can run an independent task

Page 18

Automatic parallelization

The goal is to relieve programmers from the tedious and error-prone manual parallelization process.

A parallelizing compiler tries to split up a loop so that its iterations can be executed on separate processors concurrently

It identifies dependences between references – independent actions can operate in parallel (see the sketch below)
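A sketch of what that dependence analysis looks at (the functions and arrays are ours, for illustration):

// Independent iterations: a parallelizing compiler can split
// this loop across cores.
void scale(int n, int *a, int *b) {
    for (int i = 0; i < n; i++)
        b[i] = 2 * a[i];
}

// Loop-carried dependence: iteration i reads the value written by
// iteration i-1, so the iterations cannot simply run in parallel.
void scan(int n, int *a) {
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + 1;
}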

Page 19

Parallel Programming languages

Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers.

Parallel languages make it easier to write parallel algorithms
The resulting code can run more efficiently because the compiler has more information to work with

Easier to identify data dependencies so that the runtime system can implicitly schedule independent work

Page 20

MULTITHREADING TECHNIQUES

Page 21

fork()

Make a (nearly) exact duplicate of the process

Good when there is little or no need to communicate between processes

Often used for servers

Page 22

fork()

[Diagram: after fork(), each child process receives its own copy of the parent's globals, heap, and stack.]

Page 23

fork()

pid_t pID = fork();

if (pID == 0) {
    // child
} else {
    // parent
}
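A fuller, runnable version of the same pattern (the error branch and the wait are our additions):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pID = fork();

    if (pID < 0) {            // fork failed
        perror("fork");
        exit(1);
    } else if (pID == 0) {    // child: gets its own copies of globals, heap, stack
        printf("child, pid %d\n", (int) getpid());
        _exit(0);
    } else {                  // parent: pID holds the child's process ID
        waitpid(pID, NULL, 0);    // wait for the child to finish
        printf("parent: child %d done\n", (int) pID);
    }
    return 0;
}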

Page 24

POSIX Threads

C library for threading

Available in Linux, OS X

Shared Memory

Threads are created and destroyed manually

Has locking mechanisms (mutexes) for protecting shared memory

Page 25

POSIX Threads

[Diagram: one process whose globals and heap are shared by all threads; each thread has its own stack.]

Page 26

POSIX Threads

pthread_t thread;
pthread_create(&thread, NULL, function_to_call, (void *) data);

// Do stuff

pthread_join(thread, NULL);
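A minimal runnable version of that pattern (the worker function and its argument are ours, for illustration; compile with gcc -pthread):

#include <pthread.h>
#include <stdio.h>

// Thread entry points take and return void*
void *function_to_call(void *data) {
    int value = *(int *) data;
    printf("worker received %d\n", value);
    return NULL;
}

int main(void) {
    pthread_t thread;
    int data = 42;

    pthread_create(&thread, NULL, function_to_call, (void *) &data);

    // Do stuff in the main thread

    pthread_join(thread, NULL);   // wait for the worker to finish
    return 0;
}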

Page 27

POSIX Threads

int total = 0;

void do_work() {
    // Do stuff to create "result"
    total = total + result;   // unsynchronized read-modify-write
}

A possible interleaving (assuming each thread adds 1):
Thread 1 reads total (0)
Thread 2 reads total (0)
Thread 1 does its add and saves total (1)
Thread 2 does its add and saves total (1) – one update is lost

Page 28

POSIX Threads

int total = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void do_work() {
    // Do stuff to create "result"
    pthread_mutex_lock(&mutex);
    total = total + result;
    pthread_mutex_unlock(&mutex);
}
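A runnable demonstration of the same scheme (the thread count and the stand-in result value are ours):

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

int total = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void *do_work(void *arg) {
    int result = 1;              // stand-in for real work
    pthread_mutex_lock(&mutex);
    total = total + result;      // only one thread at a time gets here
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, do_work, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    printf("total = %d\n", total);   // always prints 4
    return 0;
}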

Page 29

OpenMP

Library and compiler directives for multi-threading

Support in Visual C++, gcc

Code compiles even if compiler doesn't support OpenMP

Popular in high performance communities

Easy to add parallelism to existing code

Page 30

OpenMP: Initialize an Array

const int array_size = 100000;
int i, a[array_size];

#pragma omp parallel for
for (i = 0; i < array_size; i++) {
    a[i] = 2 * i;
}

Page 31

OpenMP: Reduction

#pragma omp parallel for reduction(+:total)
for (i = 0; i < array_size; i++) {
    total = total + a[i];
}
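Both snippets combined into a complete program (our assembly; compile with gcc -fopenmp, and without that flag the pragmas are ignored and it runs serially, as noted on page 29; total is a long long here because the sum overflows a 32-bit int):

#include <stdio.h>

#define ARRAY_SIZE 100000

int main(void) {
    static int a[ARRAY_SIZE];   // static: the array is too large for many stacks
    long long total = 0;
    int i;

    #pragma omp parallel for
    for (i = 0; i < ARRAY_SIZE; i++) {
        a[i] = 2 * i;
    }

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < ARRAY_SIZE; i++) {
        total = total + a[i];
    }

    printf("total = %lld\n", total);   // 9999900000
    return 0;
}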

Page 32

Grand Central Dispatch

Apple technology for multi-threading
Programmer puts work into queues
A central system process determines the number of threads to give to each queue
Add code to queues using a closure (block)
Right now Mac-only, but open source
Easy to add parallelism to existing code

Page 33

Grand Central Dispatch: Initialize an Array

dispatch_apply(array_size, dispatch_get_global_queue(0, 0), ^(size_t i) {
    a[i] = 2 * i;
});
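Note that dispatch_apply is synchronous: it returns only after every iteration has completed, so no explicit join step is needed.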

Page 34

Grand Central Dispatch: GUI Example

void analyzeDocument(doc) {
    do_analysis(doc);    // may take a very long time
    update_display();
}

Page 35

Grand Central Dispatch: GUI Example

void analyzeDocument(doc) {
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        do_analysis(doc);
        // UI updates belong on the main thread
        dispatch_async(dispatch_get_main_queue(), ^{
            update_display();
        });
    });
}

Page 36

Other Technologies

Threading in Java, Python, etc.

MPI – for clusters
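For flavor, a minimal MPI sketch (ours, not from the slides): each process in the cluster learns its rank and the total process count. Build with mpicc and launch with mpirun.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                  // start the MPI runtime
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // this process's ID
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total number of processes

    printf("process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}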

Page 37

QUESTIONS?