Tera-scale software research: Programming 10s-100s of...

23
© 2007 Intel Corporation Jerry Bautista, PhD Co-director, Tera-scale Computing Research Intel Corporation Tera Tera- scale software research: scale software research: Programming 10s Programming 10s- 100s of cores 100s of cores

Transcript of Tera-scale software research: Programming 10s-100s of...

Page 1: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

© 2007 Intel Corporation

Jerry Bautista, PhDCo-director, Tera-scale Computing Research

Intel Corporation

TeraTera--scale software research:scale software research:Programming 10sProgramming 10s--100s of cores100s of cores

Page 2: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

2

What is Tera-scale?

Teraflops of performanceoperating on

Terabytes of data

TerabytesTerabytes

TIPSTIPS

GigabytesGigabytes

MIPSMIPS

MegabytesMegabytes

GIPSGIPS

Perf

orm

ance

Perf

orm

ance

Dataset SizeDataset SizeKilobytesKilobytes

KIPSKIPS

Mult-Media

3D &Video

Text

EmergingApps

TeraTera--scalescale(many cores)(many cores)

MultiMulti--corecore

SingleSingle--corecore

Page 3: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

3

Example emerging Application

Many emerging apps are parallel and “model-based”Ex: Modeled body, modeled motion, modeled lighting

See Justin’s keynote on “Virtual World” Thursday

Placeholder for real video (14mb)

Page 4: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

4

Tera-scale Computing:A New Drive for Parallel Programming

CoreCoreCountsCounts

TimeTime

We areWe areherehere

11

1010

100100

UtilizationUtilizationof cores by SWof cores by SW

0%0%

50%50%

100%100%

GAPGAP

Growing need for new parallelprogramming tools and capabilities

Page 5: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

5

SW

Joint Hardware & Software R&D

NewArchitecture

Ideas

HWRefine

SoftwareDesigns

PrototypeThreaded

Application

NewApplication

Ideas

RefineHardwareDesigns

PrototypeMulti-core

Architecture

TESTHW+SW

Examples:Cache hierarchyTask scheduling

Examples:Media search

Physical Modeling

SiliconPrototypesGHz speeds

Months to modify

FPGAEmulatorsMHz speeds

Days to modify

Cycle-accurateSimulatorskHz speeds

~hours to modify

RESEARCHTOOLS

See 80-coredemo in tech

showcase

See 80-coredemo in tech

showcase

See Emulatordemo in tech

showcase

See Emulatordemo in tech

showcase

Page 6: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

6

VirtualEnvironments

FinancialModeling

Media Search& Manipulation

Web MiningBots’

EducationalSimulation

ModelModel--Based ApplicationsBased Applications

The Tera-scale Computing Vision

High BandwidthI/O & CommunicationsHigh BandwidthI/O & Communications

Stacked,Shared Memory

Stacked,Shared Memory

Scalable Multi-coreArchitectures

Scalable Multi-coreArchitectures

ThreadThread--SavvySavvyExecution EnvironmentExecution Environment

Parallel ProgrammingParallel ProgrammingTools & TechniquesTools & Techniques

Page 7: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

7

PARALLELAPPLICATIONS

Taking Parallel Programming Mainstream

Used for decades in HPC, parallel programming requiresSpecial expertise, not easily automatedRequires parallel languages/language extensionsMust co-exist with legacy code

Parallel languages or parallel language extensions need to:

Extract parallelism hiding within applications

Express parallelism via programming constructs

Exploit parallelism on multi-core platforms

EXPERTSTools for authoring,debugging tuning

APP DEVELOPERSUse tools to create high

performance parallel code

Page 8: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

8

Types of Parallelism

Task: multiple independentactivities, which may or may notshare data

Data LevelParallelism

Task LevelParallelism

1

2

3

COREDATA

DATA

OPER-ATION

1

2

3

CORE

Data: one or few tasks operatingon a large amount of data

- Will review Ct later in presentation- Subject of demo and class (TCRS0003)

Page 9: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

9

Common Parallel Programming Models

Streaming

Data Parallel

Shared-memoryor Thread

Message Passing

Programmingmodel

ExampleApplication Today

ExampleLanguage

HPC type apps on a cluster

Mathematical simulations

Graphics shaders

General Par Tasks, Database

MPI

CG

OpenMP

MatLab

Page 10: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

10

Diverse Programming Environment

Source: Tim O'Reilly Book Index

Historically, new languages often emerge and compete.Adoption can be slow and success hard to predict.

Lesson: No one language will be a silver bulletthat solves the parallel programming challenges

Page 11: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

11

Research to Help Enable Parallelism

STM (Software Transactional Memory)- Shared memory model is dominant but problematic

due to synchronization of locks in highly parallelenvironments

Ct (C for throughput computing)- Relieves need to worry about threads in data parallel

execution models providing parallel extensions to Cand C++

Exo (Accelerator Exoskeleton)- Bringing IA “look and feel “ of common development

environments to parallel, heterogeneousenvironments.

Page 12: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

12

Unlocking Parallelism in a SharedMemory Environment

- Other threads must wait, reducingmulti-core benefit

- Locking code scales poorly,must re-do for more threads

- Can cause critical softwaredeadlocks and errors

- We need to fix this…

Must carefully control howmultiple threads accessshared memory

Today we “lock” memoryfor one thread at a time.

A $100

B $200

C $200

A $50

Account lockedAccount lockedduring accessduring access

2003 Northeast2003 Northeastblackoutblackout

Mars rover problemMars rover problem

Courtesy NASA/JPLCourtesy NASA/JPL--CaltechCaltech

Presented by Justin Rattner atIDF Day0, Spring 2006

Page 13: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

13

Ensures correct parallel memory access without locks

STM: From Research to Reality

A $100

B $200

C $200

A $50

Greater performance

Easer to program

Scales with hardware

Programmers can try it!

*Other names and brands may be claimed as the property of others.

AVAILABLE STARTING TODAYWhatif.intel.com

Intel® C++ STM Compiler Prototype Edition

Many simultaneoustransactions

SOFTWARE TRANSACTIONAL MEMORY

Page 14: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

14

Ct: Nested Data Parallel Programming

Ct adds new parallel data structures & operators to C/C++;But uses existing & unmodified C/C++ compilers

C/C++

Compiler

C++Ct-based Parallel Data Types

C/C++

libs

Ct

Runtime

Physics, Image,Video, SignalProcessing, …

Tera-scale

Scalable, AdaptivePerformance

12

0 0

0 5

0 6

0 3

0 0

0 0

4 7

1 2 5

63

4

7

Page 15: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

15

Ct Motivation and Vision

Make parallel programming easier now:

- Extend deterministic parallel programming models

I.e. Data races not possible

- Express complex behaviors through simple operators

- Present a simple and predictable performance model

Provide a forward-scaling programming model that

maximizes ISV ROI for new code creation

• “Future-proof” apps from increasing core count and inevitable ISA evolution

See today’s post by Anwar Ghuloum atblogs.intel.com/research for more info

Page 16: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

16

Programming for Heterogeneous Cores

• Multi-core brings the opportunity to integrate fixed functionaccelerators with IA cores

• Implementation difficult and very HW specific

AcceleratorsIA cores

OperatingSystem

DeviceDriver

Application

High-level AcceleratorAbstraction Layer

Accelerators today relyon custom drivers bound

tightly to the OS.

The software layersbetween the appaccelerator cause

performance-limitingoverhead

Page 17: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

17

Host!

Host!

Programming Accelerators Today

Learn acceleratortools, language

Compileaccelerator

program

Compilehost IAprogram

Obtain custom kitsfrom hw vendor

Accelerate!

Host!

MAIN MEMORYAccelerate!

Host loadsaccelerator program

Accelerate!

Accelerate!

OS DriverAbstractionLayer

Send commands viadriver interface

Host! Accelerate!

OS DriverAbstractionLayer

DEVICE MEMORYRESULTS!

Results transferredfrom device memory

SW DEVELOPMENT SW EXECUTION

Page 18: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

18

Exo - Accelerator Exoskeleton Model

OperatingSystem

Application

Multi-threadingRuntimeLibrary

ISA with AcceleratorExoskeleton Extensions

The exoskeleton makes accelerators appear like a part of the processor

Compile singleprogram

Use standardtools to design

code

Programaccelerator(s)

like IAextensions

SHARED VIR-TUAL MEMORYMAIN MEMORY

DEVICE MEMORYProgram

SW DEVELOPMENT SW EXECUTION

Page 19: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

19

Demonstrations

IntelGMA

X3000

1. Video enhancement application (de-interlacing)

CentrinoProcessor

Code for graphicsin standard IDE

Use Intel compilerto build a single “.exe”

De-interlaceApplication

Accel-Exo.Runtime

Exoskeleton environmentdivides the work

2. Similar demo with a financial analytics application

Page 20: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

20

MODEL-BASED Computing

Convergence of Many Parallel Apps

COMMON NUMERICAL METHODS& DATA STRUCTURES

Web searchWeb search

RR

FacialFacialAnimationAnimation

RayRayTracingTracingCancerCancer

DetectionDetection

BodyBodyTrackingTracking

MM

SS

DataDataWarehousingWarehousing

MediaMediaIndexingIndexing

FinancialFinancialPredictionsPredictions

InteractiveInteractiveVirtualVirtualWorldsWorlds

SecuritySecurityBiometricsBiometrics

CollisionDetection

Classifiers

GeometricStructures

OptimizationMethods

VectorTraining

MonteCarlo MATHEMATICAL

MODELS & METHODS

Key architecturalKey architecturalenhancements based onenhancements based onapplication andapplication and user needsuser needs

RMS TaxonomyRMS TaxonomyRRecognitionecognition

MMininginingSSynthesisynthesis

Page 21: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

21

Independent IDC Market Analysis

Spending on computing continues to grow

New usage models are emerging

IDC working with Intel and others to understand this space- "IDC believes that new use cases for computing are emerging which

will drive significant growth…Our research has shown that highlyparallel applications and multi-core processors are key driversenabling this emerging trend.”

-Matt Eastwood, Group Vice President, IDC Enterprise Platform Research

Full report to be published in October

Page 22: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

22

SummaryThe key challenges to parallel programming- Programmability and scalability- Interoperability and integration of heterogeneous execution

blocks

Parallel programming is the key to unlocking the fullpotential of the Tera-scale platforms- Mainstream programmers enabled through broad set of

development and optimization tools.- Technologies developed to extend well-known programming

environments (Ct, Exo and STM)- Tools and technologies are already being deployed now:

TBB – threadingbuildingblocks.orgSTM on Whatif.intel.com

Model-based computing applications are compellingwith market projections supporting potential forbroad adoption

Page 23: Tera-scale software research: Programming 10s-100s of coresdownload.intel.com/pressroom/kits/...Programming10s... · yThe key challenges to parallel programming-Programmability and

23

More information on Tera-scale…

Tera-scale Computing Research Chalk TalkChair: Jerry Bautista, Co-Director, Tera-scale Computing ResearchTCRC001: Wed, Sept. 19, 4:40 - 5:30, Chalk Talk Room

Other Tera-scale Sessions (see IDF guide for full info)TCRS001 Energy Management Innovations for Future Multi-Core ProcessorsTCRS002 Intelligent On-chip Interconnects: The 80-core Prototype and BeyondTCRS003 Data Parallel Programming for Tera-scale with CtTCRS004 Modeling Reality: Ray Traced Graphics and other Apps of the FutureTCRS005 Silicon Photonics: Enabling Terabit Data PipesQATS003 Accelerator Exoskeleton: IA Look-n-Feel for Heterogeneous Cores

Tech & Research Pavilion Demos:80-core Processor, Multi-core emulator, Ct, Log-based Architectures, Ray Tracing

Learn more at www.intel.com/go/terascale and blogs.intel.com/research