CS 594 Spring 2002
Lecture 1: Overview of High-Performance Computing

Jack Dongarra
University of Tennessee

Most Important Slide

◆ Netlib: software repository
  ¾ Go to http://www.netlib.org/
◆ Register for the na-digest
  ¾ Go to http://www.netlib.org/na-net/
  ¾ Register to receive the na-digest
  ¾ http://www.netlib.org/na-net/join_mail_forw.html

Computational Science

◆ HPC offers a new way to do science:
  ¾ Experiment, Theory, Computation
◆ Computation is used to approximate physical systems. Advantages include:
  ¾ Playing with simulation parameters to study emergent trends
  ¾ Possible replay of a particular simulation event
  ¾ Studying systems where no exact theories exist

Why Turn to Simulation?

◆ When the problem is too ...
  ¾ Complex
  ¾ Large / small
  ¾ Expensive
  ¾ Dangerous
◆ ... to do any other way.

Pretty Pictures

Automotive Industry
◆ Huge users of HPC technology
  ¾ Ford is among the largest users of HPC in the world
◆ Main uses of simulation:
  ¾ Aerodynamics (similar to aerospace)
  ¾ Crash simulation
  ¾ Metal sheet formation
  ¾ Noise / vibration optimization
  ¾ Traffic simulation
◆ Main benefits:
  ¾ Reduced time to market for new cars
  ¾ Increased quality
  ¾ Reduced need to build prototypes
  ¾ More efficient, integrated manufacturing processes

Why Turn to Simulation?
◆ Climate / weather modeling
◆ Data-intensive problems (data mining, oil reservoir simulation)
◆ Problems with large length and time scales (cosmology)

Units of High Performance Computing

1 Mflop/s   1 Megaflop/s   10^6  Flop/sec
1 Gflop/s   1 Gigaflop/s   10^9  Flop/sec
1 Tflop/s   1 Teraflop/s   10^12 Flop/sec
1 Pflop/s   1 Petaflop/s   10^15 Flop/sec
1 MB        1 Megabyte     10^6  Bytes
1 GB        1 Gigabyte     10^9  Bytes
1 TB        1 Terabyte     10^12 Bytes
1 PB        1 Petabyte     10^15 Bytes

High-Performance Computing Today

◆ In the past decade, the world has experienced one of the most exciting periods in computer development.
◆ Microprocessors have become smaller, denser, and more powerful.
◆ The result is that microprocessor-based supercomputing is rapidly becoming the technology of preference for attacking some of the most important problems of science and engineering.

Technology Trends: Microprocessor Capacity

2X transistors/Chip Every 1.5 years

Called “Moore’s Law”

Moore’s Law

Microprocessors have become smaller, denser, and more powerful. And not just processors: bandwidth, storage, etc. show the same trend.

Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

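That doubling compounds quickly. A small illustrative sketch (my own, not from the slides) of the growth factor implied by an 18-month doubling period:

```python
def moore_growth(years, doubling_period_years=1.5):
    """Growth factor if capacity doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

# Over 15 years, density grows by 2^(15/1.5) = 2^10 = 1024x.
print(round(moore_growth(15)))  # -> 1024
```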

Internet: the 4th Revolution in Telecommunications

◆ Telephone, Radio, Television
◆ Growth in Internet outstrips the others
◆ Exponential growth since its beginnings
◆ Traffic doubles every 100 days

The Web Phenomenon

◆ Early 1990s: Web invented
◆ U of Illinois Mosaic released March 1993, carrying a tiny share of traffic
◆ September 1993: traffic share growing, with hundreds of sites
◆ June 1994: a sizable share of traffic, with thousands of sites
◆ Today: the bulk of traffic, with millions of sites
◆ Every organization, company, school

Peer to Peer Computing
◆ Peer-to-peer is a style of networking in which a group of computers communicate directly with each other
◆ Wireless communication
◆ Home computer in the utility room, next to the water heater and furnace
◆ Web tablets
◆ Embedded computers in things, all tied together
  ¾ Books, furniture, milk cartons, etc.
◆ Smart Appliances
  ¾ Refrigerator, scale, etc.

Internet On Everything


SETI@home: Global Distributed Computing

◆ Running on hundreds of thousands of PCs, delivering on the order of 1,000 CPU years per day
  ¾ Hundreds of thousands of CPU years so far
◆ Sophisticated Data & Signal Processing Analysis
◆ Distributes datasets from the Arecibo Radio Telescope

SETI@home
◆ Uses thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence
◆ Uses data collected with the Arecibo Radio Telescope, in Puerto Rico
◆ When their computer is idle or being wasted, this software will download a chunk of data (a few hundred kilobytes) for analysis
◆ The results of this analysis are sent back to the SETI team, combined with the crunched data from the many thousands of other SETI@home participants

Next Generation Web

◆ To treat CPU cycles and software like commodities
◆ Enable the coordinated use of geographically distributed resources, in the absence of central control and existing trust relationships
◆ Computing power is produced much like utilities such as power and water are produced for consumers
◆ Users will have access to "power" on demand
◆ This is one of our efforts at UT

Performance vs. Time

[Chart: performance (in VAX 780 units, log scale 0.1 to 100) vs. year, 1980 to 1990, for processors such as MIPS at 8 and 65 MHz, 68K, uVAX (CMOS), and the VAX 8600 (TTL). Trend lines: RISC improving ~60%/yr (x4 every 3 years), CMOS CISC ~38%/yr, ECL ~15%/yr. Will RISC continue on this 60%/yr "Moore's speed law"?]

Trends in Computer Performance

[Chart: computer performance trends over time, culminating in ASCI Red.]

Other Examples: Sony PlayStation2

◆ Emotion Engine: about 6 Gflop/s, 75 million polygons per second (Microprocessor Report)
  ¾ Superscalar MIPS core + vector coprocessor + graphics/DRAM
  ¾ Claim: "Toy Story" realism brought to games
  ¾ About $200

Sony PlayStation2 Export Limits?


Where Has This Performance Improvement Come From?

◆ Technology?
◆ Organization?
◆ Instruction Set Architecture?
◆ Software?
◆ Some combination of all of the above?

1st Principles

◆ What happens when the feature size shrinks by a factor of x?
◆ Clock rate goes up by x
  ¾ actually less than x, because of power consumption
◆ Transistors per unit area go up by x^2
◆ Die size also tends to increase
  ¾ typically another factor of ~x
◆ Raw computing power of the chip goes up by ~x^4!
  ¾ of which x^3 is devoted either to parallelism or locality

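The slide's scaling argument can be spelled out numerically. A minimal sketch (pure-Python illustration, my own, not from the slides) for a feature-size shrink by a factor x:

```python
def shrink_effects(x):
    """First-order effects of shrinking feature size by a factor x:
    clock rate up by ~x, transistors per unit area up by x^2,
    die size up by ~x, so raw chip compute scales as ~x^4."""
    return {
        "clock": x,           # actually a bit less, due to power
        "density": x ** 2,    # transistors per unit area
        "die": x,             # die size tends to grow too
        "raw_power": x ** 4,  # product of the three factors above
    }

effects = shrink_effects(2)
print(effects["raw_power"])  # -> 16
```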

How fast can a serial computer be?

◆ Consider the 1 Tflop/s sequential machine:
  ¾ data must travel some distance, r, to get from memory to the CPU
  ¾ to get 1 data element per cycle, this means 10^12 times per second at the speed of light, c = 3x10^8 m/s
  ¾ so r < c / 10^12 = 0.3 mm
◆ Now put 1 TB of storage in a 0.3 mm x 0.3 mm area:
  ¾ each word occupies about 3 Angstroms^2, the size of a small atom

[Figure: r = 0.3 mm for a 1 Tflop/s, 1 TB sequential machine.]

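The 0.3 mm bound follows directly from the speed of light; a quick illustrative check (my own sketch, not from the slides):

```python
c = 3.0e8      # speed of light, m/s
rate = 1.0e12  # 1 Tflop/s: one data element fetched per cycle

# Data must reach the CPU within one cycle, so r < c / rate.
r_m = c / rate
r_mm = r_m * 1000
print(round(r_mm, 6))  # -> 0.3 (mm)
```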

Processor-Memory Problem

◆ Processors issue instructions roughly every nanosecond.
◆ DRAM can be accessed roughly every 100 nanoseconds (!).
◆ DRAM cannot keep processors busy! And the gap is growing:
  ¾ processors are getting faster by 60% per year
  ¾ DRAM is getting faster by 7% per year (SDRAM and EDO RAM might help, but not enough)

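Those two growth rates compound into the roughly 50%/year gap shown on the latency chart below; an illustrative calculation (plain Python, using the slide's rates):

```python
proc_rate = 1.60  # processor speed: +60% per year
dram_rate = 1.07  # DRAM speed: +7% per year

gap_per_year = proc_rate / dram_rate           # ~1.50: gap grows ~50%/yr
gap_after_10y = (proc_rate / dram_rate) ** 10  # gap after a decade

print(round(gap_per_year, 2))  # -> 1.5
print(round(gap_after_10y))    # -> 56
```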

RAM
◆ SDRAM incorporates new features that allow it to keep pace with bus speeds as high as 100 MHz. It does this primarily by allowing two sets of memory addresses to be opened simultaneously.
  ¾ Data can then be retrieved alternately from each set, eliminating the delays that normally occur when one bank of addresses must be shut down and another prepared for reading during each request.
◆ EDO (extended data output) RAM is a type of random access memory (RAM) chip that improves the time to read from memory on faster microprocessors such as the Intel Pentium.
  ¾ This form of dynamic RAM speeds access to memory locations by working on a simple assumption: the next time memory is accessed, it will be at a contiguous address in a contiguous chunk of hardware. This assumption speeds up memory access times by up to 10 percent over standard DRAM.

Processor-DRAM Gap (latency)

[Chart: processor vs. DRAM performance (log scale, 1 to 1000) by year, 1980 to 2000. CPU performance ("Moore's Law") improves ~60%/yr; DRAM improves ~7%/yr; the processor-memory performance gap grows ~50%/yr.]

Why Parallel Computing
◆ Desire to solve bigger, more realistic application problems
◆ Fundamental limits are being approached
◆ More cost-effective solution

Principles of Parallel Computing

◆ Parallelism and Amdahl's Law
◆ Granularity
◆ Locality
◆ Load balance
◆ Coordination and synchronization
◆ Performance modeling

All of these things make parallel programming even harder than sequential programming.

“Automatic” Parallelism in Modern Machines

◆ Bit-level parallelism
  ¾ within floating point operations, etc.
◆ Instruction-level parallelism (ILP)
  ¾ multiple instructions execute per clock cycle
◆ Memory system parallelism
  ¾ overlap of memory operations with computation
◆ OS parallelism
  ¾ multiple jobs run in parallel on commodity SMPs

There are limits to all of these: for very high performance, the user must identify, schedule and coordinate the parallel tasks.

Finding Enough Parallelism

◆ Suppose only part of an application seems parallel
◆ Amdahl's law:
  ¾ let fs be the fraction of work done sequentially, so (1 - fs) is the fraction parallelizable
  ¾ N = number of processors
◆ Even if the parallel part speeds up perfectly, performance may be limited by the sequential part

Amdahl's Law

Amdahl's Law places a strict limit on the speedup that can be realized by using multiple processors. Two equivalent expressions for Amdahl's Law are given below:

    tN = (fp/N + fs) t1      (effect of multiple processors on run time)
    S  = 1 / (fs + fp/N)     (effect of multiple processors on speedup)

where:
    fs = serial fraction of code
    fp = parallel fraction of code = 1 - fs
    N  = number of processors

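The speedup expression drops straight into code; an illustrative Python helper using the slide's notation (fs, fp, N):

```python
def amdahl_speedup(fs, N):
    """Amdahl's Law: S = 1 / (fs + fp/N), with fp = 1 - fs."""
    fp = 1.0 - fs
    return 1.0 / (fs + fp / N)

# With 1% serial code, 100 processors give only ~50x, not 100x:
print(round(amdahl_speedup(0.01, 100), 2))  # -> 50.25
# As N grows without bound, speedup is capped at 1/fs:
print(round(amdahl_speedup(0.10, 10**9)))   # -> 10
```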

Illustration of Amdahl's Law

[Chart: speedup vs. number of processors (up to ~250) for several values of the serial fraction fs; even a small serial fraction flattens the curves far below linear speedup.]

It takes only a small fraction of serial content in a code to degrade the parallel performance. It is essential to determine the scaling behavior of your code before doing production runs using large numbers of processors.

Overhead of Parallelism

◆ Given enough parallel work, this is the biggest barrier to getting the desired speedup
◆ Parallelism overheads include:
  ¾ cost of starting a thread or process
  ¾ cost of communicating shared data
  ¾ cost of synchronizing
  ¾ extra (redundant) computation
◆ Each of these can be in the range of milliseconds (= millions of flops) on some systems
◆ Tradeoff: the algorithm needs sufficiently large units of work to run fast in parallel (i.e. large granularity), but not so large that there is not enough parallel work
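The granularity tradeoff can be made concrete with a toy cost model (my own illustrative assumption, not from the slides): total time is the divided work plus a fixed startup overhead per task.

```python
import math

def run_time(work_flops, n_tasks, overhead_flops):
    """Toy model: parallel time = work/N + overhead*N (in 'flop' units)."""
    return work_flops / n_tasks + overhead_flops * n_tasks

# With 10^12 flops of work and a 10^6-flop startup cost per task,
# time is minimized near N = sqrt(work/overhead) = 1000 tasks:
# too few tasks wastes parallelism, too many drowns in overhead.
best_n = int(math.sqrt(1e12 / 1e6))
print(best_n)  # -> 1000
print(run_time(1e12, 1000, 1e6) < run_time(1e12, 10, 1e6))     # -> True
print(run_time(1e12, 1000, 1e6) < run_time(1e12, 10**5, 1e6))  # -> True
```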

Locality and Parallelism

◆ Large memories are slow; fast memories are small
◆ Storage hierarchies are large and fast on average
◆ Parallel processors, collectively, have large, fast memories
  ¾ the slow accesses to "remote" data we call "communication"
◆ The algorithm should do most work on local data

[Diagram: conventional storage hierarchy: each node has Proc + Cache, L2 Cache, L3 Cache, and Memory, with potential interconnects joining the nodes.]

Load Imbalance

◆ Load imbalance is the time that some processors in the system are idle due to
  ¾ insufficient parallelism (during that phase)
  ¾ unequal size tasks
◆ Examples of the latter:
  ¾ adapting to "interesting parts of a domain"
  ¾ tree-structured computations
  ¾ fundamentally unstructured problems
◆ The algorithm needs to balance the load

Performance Trends Revisited (Architectural Innovation)

[Chart: log performance (0.1 to 1000) vs. year, 1965 to 2000, for supercomputers, mainframes, minicomputers, and microprocessors; microprocessors (CISC/RISC) improve fastest.]

Performance Trends Revisited (Microprocessor Organization)

[Chart: transistors per chip (10^3 to 10^8, log scale) vs. year, 1970 to 2000: i4004, i8080, i8086, i80286, i80386, r3010, r4000, r4400.]

• Bit Level Parallelism
• Pipelining
• Caches
• Instruction Level Parallelism
• Out-of-order execution
• Speculation
• . . .

What is Ahead?
◆ Greater instruction-level parallelism?
◆ Bigger caches?
◆ Multiple processors per chip?
◆ Complete systems on a chip? (Portable Systems)
◆ High performance LAN, Interface, and Interconnect

Directions

◆ Move toward shared memory
  ¾ SMPs and Distributed Shared Memory
  ¾ Shared address space with deep memory hierarchy
◆ Clustering of shared-memory machines for scalability
◆ Efficiency of message passing and data parallel programming
  ¾ Helped by standards efforts such as MPI and HPF

High Performance Computers
◆ ~20 years ago:
  ¾ 1x10^6 Floating Point Ops/sec (Mflop/s)
    - Scalar based
◆ ~10 years ago:
  ¾ 1x10^9 Floating Point Ops/sec (Gflop/s)
    - Vector & shared-memory computing, bandwidth aware
    - Block partitioned, latency tolerant
◆ ~Today:
  ¾ 1x10^12 Floating Point Ops/sec (Tflop/s)
    - Highly parallel, distributed processing, message passing, network based
    - Data decomposition, communication/computation
◆ ~10 years away:
  ¾ 1x10^15 Floating Point Ops/sec (Pflop/s)
    - Many more levels of memory hierarchy, combination of grids & HPC
    - More adaptive, latency and bandwidth aware, fault tolerant, extended precision, attention to SMP nodes

TOP500
- Listing of the 500 most powerful computers in the world
- Yardstick: Rmax from LINPACK MPP: Ax = b, dense problem
- Updated twice a year:
  SC'xy in the States in November
  Meeting in Mannheim, Germany in June
- All data available from www.top500.org
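The TOP500 yardstick solves a dense system Ax = b by Gaussian elimination. A minimal, unoptimized pure-Python sketch of the mathematical core (real LINPACK/HPL uses blocked, pivoted, highly tuned routines; this is only an illustration):

```python
def solve_dense(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on copies in augmented form [A | b].
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        # Partial pivoting: bring in the row with the largest pivot.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        # Eliminate the entries below the pivot.
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    # Back substitution.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

x = solve_dense([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print([round(v, 6) for v in x])  # -> [0.8, 1.4]
```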

[Figure: LINPACK performance (Rate) vs. problem Size; Rmax is the TPP performance.]

A computation that then took one full year to complete can now be done in a matter of hours.

Fastest Computer Over Time

[Chart: fastest computer, in GFlop/s, vs. year, 1990 to 2000 (scale 0 to 70): Cray Y-MP (8), Fujitsu VP-2600, TMC CM-2 (2048).]

Fastest Computer Over Time

[Chart: fastest computer, in GFlop/s, vs. year, 1990 to 2000 (scale 0 to 700): Cray Y-MP (8), Fujitsu VP-2600, TMC CM-2 (2048), NEC SX-3 (4), TMC CM-5 (1024), Fujitsu VPP-500 (140), Intel Paragon (6788), Hitachi CP-PACS (2040).]

A computation that then took one full year to complete can now be done in a matter of minutes.

Fastest Computer Over Time

[Chart: fastest computer, in GFlop/s, vs. year, 1990 to 2000 (scale 0 to 7000), adding Intel ASCI Red (9152), ASCI Blue Pacific SST (5808), SGI ASCI Blue Mountain (5040), Intel ASCI Red Xeon (9632), and ASCI White Pacific (7424).]

A computation that then took one full year to complete can today be done in a matter of seconds.

Top 10 Machines (November 2001)

Rank  Manufacturer  Computer                         Rmax [TF/s]  Installation Site                        Country  Year  Area      # Proc
1     IBM           ASCI White, SP Power3            7.23         Lawrence Livermore National Laboratory   USA      2000  Research  8192
2     Compaq        AlphaServer SC ES45 1 GHz        4.06         Pittsburgh Supercomputing Center         USA      2001  Academic  3024
3     IBM           SP Power3 375 MHz                3.05         NERSC/LBNL                               USA      2001  Research  3328
4     Intel         ASCI Red                         2.38         Sandia National Laboratory               USA      1999  Research  9632
5     IBM           ASCI Blue Pacific SST, SP 604e   2.14         Lawrence Livermore National Laboratory   USA      1999  Research  5808
6     Compaq        AlphaServer SC ES45 1 GHz        2.10         Los Alamos National Laboratory           USA      2001  Research  1536
7     Hitachi       SR8000/MPP                       1.71         University of Tokyo                      Japan    2001  Academic  1152
8     SGI           ASCI Blue Mountain               1.61         Los Alamos National Laboratory           USA      1998  Research  6144
9     IBM           SP Power3 375 MHz                1.42         Naval Oceanographic Office (NAVOCEANO)   USA      2000  Research  1336
10    IBM           SP Power3 375 MHz                1.29         Deutscher Wetterdienst                   Germany  2001  Research  1280

Performance Development

[Chart: TOP500 performance development, Jun-93 to Nov-01, log scale from 100 Mflop/s to 1 Pflop/s. SUM grew from 1.167 TF/s to 134 TF/s; N=1 from 59.7 GF/s to 7.23 TF/s (Intel XP/S140 Sandia, Fujitsu 'NWT' NAL, Hitachi/Tsukuba CP-PACS/2048, Intel ASCI Red Sandia, IBM ASCI White LLNL); N=500 from 0.4 GF/s to 94 GF/s (SNI VP200EX Uni Dresden; commercial entries such as an IBM SP with 232 procs at Chase Manhattan NY, a Sun HPC 10000 at Merrill Lynch, and an IBM 604e with 69 procs at A&P). "My Laptop" is shown for comparison; the entry level has been growing faster than Moore's law.]

Performance Development

[Chart: TOP500 performance extrapolated, Jun-93 to Jun-09, log scale 0.1 to 1,000,000 GFlop/s, curves for N=1, N=500, and Sum; 1 TFlop/s and 1 PFlop/s levels marked, with ASCI and the Earth Simulator indicated; the extrapolation shows when the entry level reaches 1 TFlop/s and the top reaches 1 PFlop/s. "My Laptop" is shown for comparison.]

Architectures

[Chart: TOP500 architecture share, Jun-93 to Nov-01 (0 to 500 systems): Single Processor, SMP, MPP, SIMD, Constellation, Cluster/NOW; example systems include Y-MP C90, SX3, VP500, CM2, CM5, Paragon, T3D, T3E, SP2, ASCI Red, Sun HPC, and clusters of Sun HPC. Constellation: # of processors per node exceeds # of nodes.]

Chip Technology

[Chart: TOP500 processor-family share, Jun-93 to Nov-01 (0 to 500 systems): Alpha, Power, HP, Intel, MIPS, Sparc, other COTS, proprietary.]

Manufacturer

[Chart: TOP500 systems by manufacturer, Jun-93 to Nov-01 (0 to 500 systems): Cray, SGI, IBM, Sun, HP, TMC, Intel, Fujitsu, NEC, Hitachi, others.]

Top500 Conclusions

◆ Microprocessor-based supercomputers have brought about a major change in accessibility and affordability.
◆ MPPs continue to account for more than half of all installed high-performance computers worldwide.

Performance Numbers on RISC Processors

◆ Using the Linpack Benchmark

Machine         MHz   Linpack n=100 Mflop/s   Ax=b n=1000 Mflop/s   Peak Mflop/s
Compaq Alpha    667   561 (42%)               1031 (77%)            1334
IBM Power 3     375   424 (28%)               1208 (80%)            1500
HP PA           550   468 (21%)               1583 (71%)            2200
AMD Athlon      600   260 (21%)               557 (46%)             1200
SUN Ultra 80    450   208 (23%)               607 (67%)             900
SGI Origin 2K   300   173 (29%)               553 (92%)             600
Intel Xeon      450   98 (22%)                295 (66%)             450
Cray T90        454   705 (39%)               1603 (89%)            1800
Cray C90        238   387 (41%)               902 (95%)             952
Cray Y-MP       166   161 (48%)               324 (97%)             333
Cray X-MP       118   121 (51%)               218 (93%)             235
Cray J-90       100   106 (53%)               190 (95%)             200
Cray 1          80    27 (17%)                110 (69%)             160
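The percentages in the table are simply achieved/peak efficiency; a quick illustrative check against the first row:

```python
def efficiency(achieved_mflops, peak_mflops):
    """Fraction of peak performance actually sustained, as a percentage."""
    return 100.0 * achieved_mflops / peak_mflops

# Compaq Alpha 667 MHz: 561 Mflop/s of a 1334 Mflop/s peak at n=100,
# 1031 Mflop/s at n=1000.
print(round(efficiency(561, 1334)))   # -> 42
print(round(efficiency(1031, 1334)))  # -> 77
```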


High-Performance Computing Directions: Beowulf-class PC Clusters

Definition:
◆ COTS PC Nodes
  ¾ Pentium, Alpha, PowerPC, SMP
◆ COTS LAN/SAN Interconnect
  ¾ Ethernet, Myrinet, Giganet, ATM
◆ Open Source Unix
  ¾ Linux, BSD
◆ Message Passing Computing
  ¾ MPI, PVM
  ¾ HPF

Advantages:
◆ Best price/performance
◆ Low entry-level cost
◆ Just-in-place configuration
◆ Vendor invulnerable
◆ Scalable
◆ Rapid technology tracking

Enabled by PC hardware, networks and operating systems achieving the capabilities of scientific workstations at a fraction of the cost, and by the availability of industry-standard message passing libraries. However, much more of a contact sport.

◆ Peak performance
◆ Interconnection
◆ http://clusters.top500.org/
◆ Benchmark results to follow in the coming months

Distributed and Parallel Systems

[Diagram: a spectrum from distributed, heterogeneous systems to massively parallel, homogeneous systems: SETI@home, Entropia, Grid Computing, Berkley NOW, Beowulf, SNL Cplant, parallel distributed-memory machines, ASCI Tflops.]

Distributed systems (heterogeneous):
◆ Gather (unused) resources
◆ Steal cycles
◆ System SW manages resources
◆ System SW adds value
◆ 10% - 20% overhead is OK
◆ Resources drive applications
◆ Time to completion is not critical
◆ Time shared

Massively parallel systems (homogeneous):
◆ Bounded set of resources
◆ Apps grow to consume all cycles
◆ Application manages resources
◆ System SW gets in the way
◆ 5% overhead is maximum
◆ Apps drive purchase of equipment
◆ Real-time constraints
◆ Space shared

Virtual Environments

[Figure: virtual-environment imagery; the extracted numeric image data is omitted.]

Do they make any sense?

Performance Improvements for Scientific Computing Problems

Derived from Computational Methods

[Chart: speed-up factor (1 to 10,000, log scale) vs. year, 1970 to 1995, for successive methods: Gauss-Seidel, Sparse GE, SOR, Conjugate Gradient, Multi-grid.]

Different Architectures

◆ Parallel computing: single systems with many processors working on the same problem
◆ Distributed computing: many systems loosely coupled by a scheduler to work on related problems
◆ Grid Computing: many systems tightly coupled by software, perhaps geographically distributed, to work together on single problems or on related problems

Types of Parallel Computers

◆ The simplest and most useful way to classify modern parallel computers is by their memory model:
  ¾ shared memory
  ¾ distributed memory

Shared vs. Distributed Memory

Shared memory: single address space. All processors have access to a pool of shared memory. (Ex: SGI Origin, Sun E10000)

    P   P   P   P   P   P
    ------- BUS --------
          Memory

Distributed memory: each processor has its own local memory. Must do message passing to exchange data between processors. (Ex: CRAY T3E, IBM SP, clusters)

    M   M   M   M   M   M
    P   P   P   P   P   P
    ------ Network ------

Shared Memory: UMA vs. NUMA

Uniform memory access (UMA): each processor has uniform access to memory. Also known as symmetric multiprocessors (SMPs). (Ex: Sun E10000)

    P   P   P   P   P   P
    ------- BUS --------
          Memory

Non-uniform memory access (NUMA): the time for a memory access depends on the location of the data. Local access is faster than non-local access. Easier to scale than SMPs. (Ex: SGI Origin)

    P  P  P  P        P  P  P  P
    --- BUS ---       --- BUS ---
      Memory            Memory
        ------ Network ------

Distributed Memory: MPPs vs. Clusters

◆ Processor/memory nodes are connected by some type of interconnect network
  ¾ Massively Parallel Processor (MPP): tightly integrated, single system image
  ¾ Cluster: individual computers connected by s/w

[Diagram: CPU+MEM nodes joined by an interconnect network.]

Processors, Memory, & Networks

◆ Both shared and distributed memory systems have:
  1. processors: now generally commodity RISC processors
  2. memory: now generally commodity DRAM
  3. network/interconnect between the processors and memory (bus, crossbar, fat tree, torus, hypercube, etc.)
◆ We will now begin to describe these pieces in detail, starting with definitions of terms.

Processor-Related Terms

Clock period (cp): the minimum time interval between successive actions in the processor. Fixed; depends on the design of the processor. Measured in nanoseconds (on the order of a nanosecond for the fastest processors). Inverse of frequency (MHz).

Instruction: an action executed by a processor, such as a mathematical operation or a memory operation.

Register: a small, extremely fast location for storing data or instructions in the processor.

Processor-Related Terms

Functional Unit: a hardware element that performs an operation on an operand or pair of operands. Common FUs are ADD, MULT, INV, SQRT, etc.

Pipeline: a technique enabling multiple instructions to be overlapped in execution.

Superscalar: multiple instructions are possible per clock period.

Flops: floating point operations per second.

Processor-Related Terms

Cache: fast memory (SRAM) near the processor. Helps keep instructions and data close to the functional units so the processor can execute more instructions more rapidly.

TLB: Translation Lookaside Buffer; keeps addresses of pages (blocks of memory) in main memory that have recently been accessed (a cache for memory addresses).

Memory-Related Terms

SRAM: Static Random Access Memory. Very fast (~10 nanoseconds); made using the same kind of circuitry as the processors, so speed is comparable.

DRAM: Dynamic RAM. Longer access times (~100 nanoseconds), but holds more bits and is much less expensive (roughly 10x cheaper).

Memory hierarchy: the hierarchy of memory in a parallel system, from registers to cache to local memory to remote memory. More later.

Interconnect-Related Terms

◆ Latency: how long does it take to start sending a "message"? Measured in microseconds.
  (Also in processors: how long does it take to output the results of some operation, such as a pipelined floating point add or divide?)
◆ Bandwidth: what data rate can be sustained once the message is started? Measured in Mbytes/sec.

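Latency and bandwidth combine in the usual first-order cost model for a message (a standard textbook model, assumed here rather than stated on the slide): time = latency + size/bandwidth.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of sending one message."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

latency = 10e-6    # 10 microseconds startup cost
bandwidth = 100e6  # 100 Mbytes/sec sustained

# Small messages are latency-bound, large ones bandwidth-bound:
print(round(message_time(100, latency, bandwidth) * 1e6, 1))    # -> 11.0 (us)
print(round(message_time(10**6, latency, bandwidth) * 1e3, 2))  # -> 10.01 (ms)
```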

Interconnect-Related Terms

Topology: the manner in which the nodes are connected.
  ¾ The best choice would be a fully connected network (every processor to every other). Infeasible for cost and scaling reasons.
  ¾ Instead, processors are arranged in some variation of a grid, torus, or hypercube.

[Diagram: 3-d hypercube, 2-d mesh, 2-d torus.]
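In a d-dimensional hypercube each node is a d-bit label and neighbors differ in exactly one bit, so every node has d links; a small illustrative sketch (my own, not from the slides):

```python
def hypercube_neighbors(node, d):
    """Neighbors of `node` in a d-dimensional hypercube: flip each bit."""
    return sorted(node ^ (1 << bit) for bit in range(d))

# 3-d hypercube: node 0 (000) links to 1 (001), 2 (010), and 4 (100).
print(hypercube_neighbors(0, 3))  # -> [1, 2, 4]
print(hypercube_neighbors(5, 3))  # -> [1, 4, 7]
```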


Highly Parallel Supercomputing: Where Are We?

◆ Performance:
  ¾ Sustained performance has dramatically increased during the last year.
  ¾ On most applications, sustained performance per dollar now exceeds that of conventional supercomputers. But...
  ¾ Conventional systems are still faster on some applications.
◆ Languages and compilers:
  ¾ Standardized, portable, high-level languages such as HPF, PVM and MPI are available. But...
  ¾ Initial HPF releases are not very efficient.
  ¾ Message passing programming is tedious and hard to debug.
  ¾ Programming difficulty remains a major obstacle to usage by mainstream scientists.

Highly Parallel Supercomputing: Where Are We?

◆ Operating systems:
  ¾ Robustness and reliability are improving.
  ¾ New system management tools improve system utilization. But...
  ¾ Reliability is still not as good as in conventional systems.
◆ I/O subsystems:
  ¾ New RAID disks, HiPPI interfaces, etc. provide substantially improved I/O performance. But...
  ¾ I/O remains a bottleneck on some systems.

The Importance of Standards - Software
◆ Writing programs for MPPs is hard ...
◆ But ... they are no longer one-off efforts if written in a standard language
◆ Past lack of parallel programming standards ...
  ¾ ... has restricted uptake of the technology (to "enthusiasts")
  ¾ ... and reduced portability (over a range of current architectures and between future generations)
◆ Now standards exist (PVM, MPI & HPF), which ...
  ¾ ... allow users & manufacturers to protect their software investment
  ¾ ... encourage growth of a "third party" parallel software industry & parallel versions of widely used codes

The Importance of Standards -Hardware

◆ Processors
  ¾ commodity RISC processors
◆ Interconnects
  ¾ high bandwidth, low latency communications protocols
  ¾ no de facto standard yet (ATM, Fibre Channel, HPPI, FDDI)
◆ Growing demand for a total solution:
  ¾ robust hardware + usable software
◆ HPC systems containing all the programming tools / environments / languages / libraries / applications packages found on desktops

The Future of HPC
◆ The expense of being different is being replaced by the economics of being the same
◆ HPC needs to lose its "special purpose" tag
◆ It still has to deliver on the promise of scalable general-purpose computing ...
◆ ... but it is dangerous to ignore this technology
◆ Final success will come when MPP technology is embedded in desktop computing
◆ Yesterday's HPC is today's mainframe is tomorrow's workstation

Achieving TeraFlops

◆ In 1991: 1 Gflop/s
◆ A 1000-fold increase since then, from:
  ¾ Architecture
    - exploiting parallelism
  ¾ Processor, communication, memory
    - Moore's Law
  ¾ Algorithm improvements
    - block-partitioned algorithms

Future: Petaflops (10^15 fl pt ops/s)

◆ A Pflop/s for 1 second is roughly a typical workstation computing for 1 year
◆ From an algorithmic standpoint:
  ¾ concurrency
  ¾ data locality
  ¾ latency & sync
  ¾ floating point accuracy
  ¾ dynamic redistribution of workload
  ¾ new languages and constructs
  ¾ role of numerical libraries
  ¾ algorithm adaptation to hardware failure

(Today, a full year of computing on a typical workstation is on the order of 10^15 flops.)


A Petaflops Computer System

◆ 1 Pflop/s sustained computing
◆ Between 10,000 and 1,000,000 processors
◆ Between 10 TB and 1 PB main memory
◆ Commensurate I/O bandwidth, mass store, etc.
◆ If built today, it would cost $40B and consume 1 TWatt.
◆ May be feasible and "affordable" by the year 2010