Overview of recent supercomputers - High Performance Computing
CS 594 Spring 2002
Lecture 1: Overview of High-Performance Computing
Jack Dongarra
University of Tennessee
Most Important Slide
◆ Netlib: software repository
¾ Go to http://www.netlib.org/
◆ Register for the na-digest
¾ Go to http://www.netlib.org/na-net/
¾ Register to receive the na-digest: http://www.netlib.org/na-net/join_mail_forw.html
Computational Science
◆ HPC offers a new way to do science:
¾ Experiment + Theory + Computation
◆ Computation is used to approximate physical systems. Advantages include:
¾ Playing with simulation parameters to study emergent trends
¾ Possible replay of a particular simulation event
¾ Studying systems where no exact theories exist
Why Turn to Simulation?
◆ When the problem is too ...
¾ Complex
¾ Large / small
¾ Expensive
¾ Dangerous
◆ ... to do any other way.
Pretty Pictures
Automotive Industry
◆ Huge users of HPC technology
¾ Ford is among the largest users of HPC in the world
◆ Main uses of simulation:
¾ Aerodynamics (similar to aerospace)
¾ Crash simulation
¾ Metal sheet formation
¾ Noise / vibration optimization
¾ Traffic simulation
◆ Main benefits:
¾ Reduced time to market for new cars
¾ Increased quality
¾ Reduced need to build prototypes
¾ More efficient, integrated manufacturing processes
Why Turn to Simulation?
◆ Climate / weather modeling
◆ Data intensive problems (data mining, oil reservoir simulation)
◆ Problems with large length and time scales (cosmology)
Units of High Performance Computing
1 Mflop/s   1 Megaflop/s   10^6 Flop/sec
1 Gflop/s   1 Gigaflop/s   10^9 Flop/sec
1 Tflop/s   1 Teraflop/s   10^12 Flop/sec
1 Pflop/s   1 Petaflop/s   10^15 Flop/sec
1 MB        1 Megabyte     10^6 Bytes
1 GB        1 Gigabyte     10^9 Bytes
1 TB        1 Terabyte     10^12 Bytes
1 PB        1 Petabyte     10^15 Bytes
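To make the prefixes concrete, here is a small sketch converting a machine's rate into time-to-solution. The function name and the LU-factorization example are ours, chosen only for illustration:

```python
# Unit prefixes from the table above, as powers of ten.
PREFIX = {"M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15}

def flops_to_seconds(total_flop, rate, prefix):
    """Seconds needed to execute `total_flop` operations at `rate` <prefix>flop/s."""
    return total_flop / (rate * PREFIX[prefix])

# LU factorization of an n x n dense matrix costs about (2/3) n^3 flops.
n = 10_000
work = (2 / 3) * n ** 3                  # ~6.7e11 floating point operations

print(flops_to_seconds(work, 1, "G"))    # ~667 s on a 1 Gflop/s machine
print(flops_to_seconds(work, 1, "T"))    # ~0.67 s on a 1 Tflop/s machine
```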
�
High-Performance Computing Today
◆ In the past decade, the world has experienced one of the most exciting periods in computer development.
◆ Microprocessors have become smaller, denser, and more powerful.
◆ The result is that microprocessor-based supercomputing is rapidly becoming the technology of preference in attacking some of the most important problems of science and engineering.
Technology Trends: Microprocessor Capacity
2X transistors/Chip Every 1.5 years
Called “Moore’s Law”
Moore’s Law
Microprocessors have become smaller, denser, and more powerful. Not just processors: bandwidth, storage, etc. have improved as well.
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.
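The doubling rule compounds quickly. A two-line check (illustrative arithmetic only) shows what doubling every 18 months implies over a decade:

```python
# Moore's Law as stated above: density doubles roughly every 18 months.
months = 10 * 12                 # one decade
doublings = months / 18          # ~6.7 doublings
factor = 2 ** doublings
print(round(factor))             # roughly a 100x increase per decade
```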
Internet - 4th Revolution in Telecommunications
◆ Telephone, radio, television
◆ Growth in Internet outstrips the others
◆ Exponential growth since the mid-1980s
◆ Traffic doubles roughly every 100 days
The Web Phenomenon
◆ 1989-90: Web invented
◆ U of Illinois Mosaic released March 1993: ~0.1% of traffic
◆ September 1993: ~1% of traffic, with a few hundred sites
◆ June 1994: ~10% of traffic, with thousands of sites
◆ Today: the majority of traffic, with hundreds of thousands of sites
◆ Every organization, company, school
Peer to Peer Computing
◆ Peer-to-peer is a style of networking in which a group of computers communicate directly with each other.
◆ Wireless communication
◆ Home computer in the utility room, next to the water heater and furnace
◆ Web tablets
◆ Embedded computers in things, all tied together
¾ Books, furniture, milk cartons, etc.
◆ Smart appliances
¾ Refrigerator, scale, etc.

Internet On Everything
SETI@home: Global Distributed Computing
◆ Running on hundreds of thousands of PCs, delivering on the order of 1000 CPU-years per day (hundreds of thousands of CPU-years so far)
◆ Sophisticated data / signal processing analysis
◆ Distributes datasets from the Arecibo Radio Telescope
SETI@home
◆ Uses thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence.
◆ Uses data collected with the Arecibo Radio Telescope in Puerto Rico.
◆ When a participant's computer is idle, the software downloads a small chunk of data (a few hundred kilobytes) for analysis.
◆ The results of this analysis are sent back to the SETI team, combined with the crunched data from the many thousands of other SETI@home participants.
Next Generation Web
◆ Treat CPU cycles and software like commodities.
◆ Enable the coordinated use of geographically distributed resources, in the absence of central control and existing trust relationships.
◆ Computing power is produced much like utilities such as power and water are produced for consumers.
◆ Users will have access to "power" on demand.
◆ This is one of our efforts at UT.
Performance vs. Time
[Chart: processor performance (in VAX-11/780 units, 0.1 to 100, log scale) vs. time, 1980-1990. CISC designs (TTL and ECL at ~15%/yr, CMOS CISC at ~38%/yr; VAX 8600, uVAX 6K, 68K, MIPS at 8 and 65 MHz, i9000) are overtaken by RISC improving ~60%/yr. Will RISC continue at 60% (x4 every 3 years)? Moore's speed law?]
Trends in Computer Performance
[Chart: peak computer performance over time, through ASCI Red.]
Other Examples: Sony PlayStation2
◆ Emotion Engine: 6.2 Gflop/s, 66 million polygons per second (Microprocessor Report, 1999)
¾ Superscalar MIPS core + vector coprocessor + graphics / DRAM
¾ Claim: "Toy Story" realism brought to games
¾ About $300
Sony PlayStation2 Export Limits?
Where Has This Performance Improvement Come From?
◆ Technology?
◆ Organization?
◆ Instruction set architecture?
◆ Software?
◆ Some combination of all of the above?
1st Principles
◆ What happens when the feature size shrinks by a factor of x?
◆ Clock rate goes up by x
¾ actually less than x, because of power consumption
◆ Transistors per unit area go up by x^2
◆ Die size also tends to increase
¾ typically another factor of ~x
◆ Raw computing power of the chip goes up by ~x^4!
¾ of which x^3 is devoted either to parallelism or locality
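The scaling argument above can be sketched in a few lines; the function name is ours, and x is the feature-size shrink factor:

```python
# First-principles scaling: where the ~x^4 raw-power estimate comes from.
def scaling(x):
    clock = x            # clock rate up by ~x (less in practice, due to power)
    density = x ** 2     # transistors per unit area up by x^2
    die = x              # die area tends to grow by another factor of ~x
    transistors = density * die
    raw_power = clock * transistors   # ~x^4 in total
    return raw_power

x = 2
print(scaling(x))   # 16 = x**4; x**3 of it goes to parallelism or locality
```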
How fast can a serial computer be?
◆ Consider the 1 Tflop/s sequential machine:
¾ data must travel some distance, r, to get from memory to CPU
¾ to get 1 data element per cycle, this means 10^12 times per second at the speed of light, c = 3x10^8 m/s
¾ so r < c / 10^12 = 0.3 mm
◆ Now put 1 TB of storage in a 0.3 mm x 0.3 mm area:
¾ each word occupies about 3 Angstroms squared, the size of a small atom
[Diagram: r = 0.3 mm; a 1 Tflop/s, 1 TB sequential machine]
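The two numbers above follow directly from the stated constants, as this small check shows:

```python
# Speed-of-light bound: a 1 Tflop/s sequential machine fetching one operand
# per cycle must keep its memory within r = c / 10^12 of the CPU.
c = 3.0e8                        # speed of light, m/s
rate = 1.0e12                    # 10^12 memory round trips per second

r = c / rate                     # maximum memory distance, meters
print(r * 1e3)                   # 0.3 (mm)

# And 1 TB (10^12 words) packed into an r x r square leaves each word a cell
# about 3 Angstroms on a side: the size of a small atom.
cell = (r * r / 1e12) ** 0.5     # meters per word side
print(cell * 1e10)               # 3.0 (Angstroms)
```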
Processor-Memory Problem
◆ Processors issue instructions roughly every nanosecond.
◆ DRAM can be accessed roughly every 100 nanoseconds (!).
◆ DRAM cannot keep processors busy! And the gap is growing:
¾ processors getting faster by 60% per year
¾ DRAM getting faster by 7% per year (SDRAM and EDO RAM might help, but not enough)
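Those two growth rates compound against each other, which is why the gap keeps widening; a quick sketch of the arithmetic:

```python
# Processors improve ~60%/yr vs. DRAM ~7%/yr, so the processor-memory gap
# itself widens by 1.60 / 1.07 - 1, i.e. roughly 50% per year.
cpu_growth, dram_growth = 1.60, 1.07

gap = 1.0
for year in range(10):
    gap *= cpu_growth / dram_growth

print(round(gap))    # after a decade the gap is ~56x wider
```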
RAM
◆ SDRAM incorporates new features that allow it to keep pace with bus speeds as high as 100 MHz. It does this primarily by allowing two sets of memory addresses to be opened simultaneously.
¾ Data can then be retrieved alternately from each set, eliminating the delays that normally occur when one bank of addresses must be shut down and another prepared for reading during each request.
◆ EDO (extended data output) RAM is a type of random access memory (RAM) chip that improves the time to read from memory on faster microprocessors such as the Intel Pentium.
¾ This form of dynamic RAM speeds access to memory locations by working on a simple assumption: the next time memory is accessed, it will be at a contiguous address in a contiguous chunk of hardware. This assumption speeds up memory access times by up to 10 percent over standard DRAM.
Processor-DRAM Gap (latency)
[Chart ("Moore's Law"): performance vs. time on a log scale, 1980-2000. CPU performance grows ~60%/yr while DRAM improves ~7%/yr; the resulting processor-memory performance gap grows ~50% per year.]
Why Parallel Computing?
◆ Desire to solve bigger, more realistic application problems
◆ Fundamental limits are being approached
◆ More cost-effective solutions
Principles of Parallel Computing
◆ Parallelism and Amdahl's Law
◆ Granularity
◆ Locality
◆ Load balance
◆ Coordination and synchronization
◆ Performance modeling
All of these things make parallel programming even harder than sequential programming.
“Automatic” Parallelism in Modern Machines
◆ Bit-level parallelism
¾ within floating point operations, etc.
◆ Instruction-level parallelism (ILP)
¾ multiple instructions execute per clock cycle
◆ Memory system parallelism
¾ overlap of memory operations with computation
◆ OS parallelism
¾ multiple jobs run in parallel on commodity SMPs
There are limits to all of these: for very high performance, the user must identify, schedule and coordinate parallel tasks.
Finding Enough Parallelism
◆ Suppose only part of an application seems parallel.
◆ Amdahl's law:
¾ let fs be the fraction of work done sequentially; (1 - fs) is the fraction parallelizable
¾ N = number of processors
◆ Even if the parallel part speeds up perfectly, performance may be limited by the sequential part.
Amdahl's Law
Amdahl's Law places a strict limit on the speedup that can be realized by using multiple processors. Two equivalent expressions for Amdahl's Law are given below:
tN = (fp/N + fs) t1    (effect of multiple processors on run time)
S = 1 / (fs + fp/N)    (effect of multiple processors on speedup)
where: fs = serial fraction of code; fp = parallel fraction of code = 1 - fs; N = number of processors
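The two expressions above translate directly into code; evaluating them for a few serial fractions makes the limit visible:

```python
# Amdahl's Law, both forms from the slide.
def run_time(t1, fs, N):
    """tN = (fp/N + fs) * t1"""
    fp = 1.0 - fs
    return (fp / N + fs) * t1

def speedup(fs, N):
    """S = 1 / (fs + fp/N)"""
    fp = 1.0 - fs
    return 1.0 / (fs + fp / N)

for fs in (0.0, 0.01, 0.1):
    print(fs, round(speedup(fs, 100), 1))
# fs = 0.00 -> 100.0 (perfect scaling)
# fs = 0.01 ->  50.3 (1% serial work already halves the ideal speedup)
# fs = 0.10 ->   9.2 (bounded by 1/fs = 10, no matter how many processors)
```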
[Chart: speedup vs. number of processors (0-250) for several values of the serial fraction fs; even a small serial fraction flattens the curve far below the ideal diagonal.]
Illustration of Amdahl's Law
It takes only a small fraction of serial content in a code to degrade the parallel performance. It is essential to determine the scaling behavior of your code before doing production runs using large numbers of processors.
Overhead of Parallelism
◆ Given enough parallel work, this is the biggest barrier to getting desired speedup.
◆ Parallelism overheads include:
¾ cost of starting a thread or process
¾ cost of communicating shared data
¾ cost of synchronizing
¾ extra (redundant) computation
◆ Each of these can be in the range of milliseconds (= millions of flops) on some systems.
◆ Tradeoff: the algorithm needs sufficiently large units of work to run fast in parallel (i.e. large granularity), but not so large that there is not enough parallel work.
Locality and Parallelism
◆ Large memories are slow; fast memories are small.
◆ Storage hierarchies are large and fast on average.
◆ Parallel processors, collectively, have large, fast memories
¾ the slow accesses to "remote" data we call "communication"
◆ The algorithm should do most work on local data.
[Diagram: conventional storage hierarchy (Proc/Cache -> L2 Cache -> L3 Cache -> Memory), replicated per processor, with potential interconnects between the levels.]
Load Imbalance
◆ Load imbalance is the time that some processors in the system are idle due to
¾ insufficient parallelism (during that phase)
¾ unequal size tasks
◆ Examples of the latter:
¾ adapting to "interesting parts of a domain"
¾ tree-structured computations
¾ fundamentally unstructured problems
◆ The algorithm needs to balance the load.
Performance Trends Revisited (Architectural Innovation)
[Chart: performance (0.1 to 1000, log scale) vs. year, 1965-2000, for supercomputers, mainframes, minicomputers, and microprocessors; microprocessors (CISC/RISC) improve fastest.]
Performance Trends Revisited (Microprocessor Organization)
[Chart: transistors per microprocessor, 1970-2000 (10^3 to 10^8, log scale), from the i4004 through the i8080, i8086, i80286, i80386, r3010, r4000, and r4400. Organizational advances along the way: bit-level parallelism, pipelining, caches, instruction-level parallelism, out-of-order execution, speculation, ...]
What is Ahead?
◆ Greater instruction-level parallelism?
◆ Bigger caches?
◆ Multiple processors per chip?
◆ Complete systems on a chip? (Portable systems)
◆ High performance LAN, interface, and interconnect
Directions
◆ Move toward shared memory
¾ SMPs and distributed shared memory
¾ Shared address space with deep memory hierarchy
◆ Clustering of shared memory machines for scalability
◆ Efficiency of message passing and data parallel programming
¾ Helped by standards efforts such as MPI and HPF
High Performance Computers
◆ ~20 years ago:
¾ 1x10^6 floating point ops/sec (Mflop/s)
- scalar based
◆ ~10 years ago:
¾ 1x10^9 floating point ops/sec (Gflop/s)
- vector & shared memory computing, bandwidth aware
- block partitioned, latency tolerant
◆ ~Today:
¾ 1x10^12 floating point ops/sec (Tflop/s)
- highly parallel, distributed processing, message passing, network based
- data decomposition, communication/computation
◆ ~10 years away:
¾ 1x10^15 floating point ops/sec (Pflop/s)
- many more levels of memory hierarchy, combination of grids and HPC
- more adaptive, latency and bandwidth aware, fault tolerant, extended precision, attention to SMP nodes
TOP500
- Listing of the 500 most powerful computers in the world
- Yardstick: Rmax from the LINPACK MPP benchmark: Ax = b, dense problem
- Updated twice a year: at SC'xy in the States in November, and at the meeting in Mannheim, Germany in June
- All data available from www.top500.org
[Chart: LINPACK performance (rate vs. problem size); Rmax is the achieved TPP performance.]
Fastest Computer Over Time
[Chart: fastest computer, 1990-2000, in GFlop/s (0-70): Cray Y-MP (8), Fujitsu VP-2600, TMC CM-2 (2048).]
A computation that then took 1 full year to complete can now be done in a matter of hours!
Fastest Computer Over Time
[Chart: fastest computer, 1990-2000, in GFlop/s (0-700): adds NEC SX-3 (4), TMC CM-5 (1024), Fujitsu VPP-500 (140), Intel Paragon (6788), Hitachi CP-PACS (2040).]
A computation that took 1 full year to complete can now be done in a matter of minutes!
Fastest Computer Over Time
[Chart: fastest computer, 1990-2000, in GFlop/s (0-7000): adds Intel ASCI Red (9152), ASCI Blue Pacific SST (5808), SGI ASCI Blue Mountain (5040), Intel ASCI Red Xeon (9632), ASCI White Pacific (7424).]
A computation that took 1 full year to complete can today be done in a matter of seconds!
Top 10 Machines (November 2001)

Rank  Manufacturer  Computer                        Rmax [TF/s]  Installation Site                       Country  Year  Area      # Proc
1     IBM           ASCI White, SP Power3           7.23         Lawrence Livermore National Laboratory  USA      2000  Research  8192
2     Compaq        AlphaServer SC ES45 1 GHz       4.06         Pittsburgh Supercomputing Center        USA      2001  Academic  3024
3     IBM           SP Power3 375 MHz               3.05         NERSC/LBNL                              USA      2001  Research  3328
4     Intel         ASCI Red                        2.38         Sandia National Laboratory              USA      1999  Research  9632
5     IBM           ASCI Blue Pacific SST, SP 604E  2.14         Lawrence Livermore National Laboratory  USA      1999  Research  5808
6     Compaq        AlphaServer SC ES45 1 GHz       2.10         Los Alamos National Laboratory          USA      2001  Research  1536
7     Hitachi       SR8000/MPP                      1.71         University of Tokyo                     Japan    2001  Academic  1152
8     SGI           ASCI Blue Mountain              1.61         Los Alamos National Laboratory          USA      1998  Research  6144
9     IBM           SP Power3 375 MHz               1.42         Naval Oceanographic Office (NAVOCEANO)  USA      2000  Research  1336
10    IBM           SP Power3 375 MHz               1.29         Deutscher Wetterdienst                  Germany  2001  Research  1280
Performance Development
[Chart: TOP500 performance, Jun-93 to Nov-01, log scale from 100 Mflop/s to 1 Pflop/s. Sum grows from 1.167 TF/s to 134 TF/s; N=1 from 59.7 GF/s (Intel XP/S140, Sandia) to 7.23 TF/s (IBM ASCI White, LLNL); N=500 from 0.4 GF/s to 94 GF/s. Other marked systems: Fujitsu 'NWT' (NAL), SNI VP200EX (Uni Dresden), Hitachi/Tsukuba CP-PACS/2048, Intel ASCI Red (Sandia), IBM SP 232 procs (Chase Manhattan NY), Sun HPC 10000 (Merrill Lynch), IBM 604e 69 proc (A&P), and 'My Laptop'. Annotation: TOP500 performance is growing faster than Moore's law.]
Performance Development
[Chart: TOP500 performance (GFlop/s, log scale from 0.1 to 10^6), extrapolated from Jun-93 through Jun-09 for N=1, N=500, and Sum. The trend lines pass 1 TFlop/s and head toward 1 PFlop/s (ASCI), with the Earth Simulator and 'My Laptop' marked, and annotations for the projected entry dates of 1 TFlop/s and 1 PFlop/s systems.]
Architectures
[Chart: TOP500 systems by architecture, Jun-93 to Nov-01 (counts 0-500): single processor, SMP, MPP, SIMD, constellation, and cluster (NOW). Marked systems include Y-MP C90, Sun HPC, Paragon, CM5, T3D, T3E, SP2, cluster of Sun HPC, ASCI Red, CM2, VP500, SX3. Constellation: the number of processors per node exceeds the number of nodes.]
Chip Technology
[Chart: TOP500 systems by processor family, Jun-93 to Nov-01 (counts 0-500): Alpha, Power, HP, Intel, MIPS, Sparc, other COTS, proprietary.]
Manufacturer
[Chart: TOP500 systems by manufacturer, Jun-93 to Nov-01 (counts 0-500): Cray, SGI, IBM, Sun, HP, TMC, Intel, Fujitsu, NEC, Hitachi, and others; as of November 2001 the shares are led by IBM, followed by HP, SGI, Cray, Sun, Fujitsu, NEC, and Hitachi.]
Top500 Conclusions
◆ Microprocessor-based supercomputers have brought about a major change in accessibility and affordability.
◆ MPPs continue to account for more than half of all installed high-performance computers worldwide.
Performance Numbers on RISC Processors
◆ Using the Linpack Benchmark:

Machine         MHz   Linpack n=100 Mflop/s   Ax=b n=1000 Mflop/s   Peak Mflop/s
Compaq Alpha    667   561 (42%)               1031 (77%)            1334
IBM Power 3     375   424 (28%)               1208 (80%)            1500
HP PA           550   468 (21%)               1583 (71%)            2200
AMD Athlon      600   260 (21%)               557 (46%)             1200
SUN Ultra 80    450   208 (23%)               607 (67%)             900
SGI Origin 2K   300   173 (29%)               553 (92%)             600
Intel Xeon      450   98 (22%)                295 (66%)             450
Cray T90        454   705 (39%)               1603 (89%)            1800
Cray C90        238   387 (41%)               902 (95%)             952
Cray Y-MP       166   161 (48%)               324 (97%)             333
Cray X-MP       118   121 (51%)               218 (93%)             235
Cray J-90       100   106 (53%)               190 (95%)             200
Cray 1          80    27 (17%)                110 (69%)             160
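The parenthesized percentages are efficiency: achieved Mflop/s divided by peak. A few rows recomputed as a check:

```python
# Efficiency = achieved rate / peak rate, for the Ax=b (n=1000) column.
machines = {
    "Compaq Alpha":  (1031, 1334),
    "SGI Origin 2K": (553, 600),
    "Cray Y-MP":     (324, 333),
}

for name, (rate, peak) in machines.items():
    print(f"{name}: {100 * rate / peak:.0f}% of peak")
# Compaq Alpha: 77% of peak
# SGI Origin 2K: 92% of peak
# Cray Y-MP: 97% of peak
```

Note how the vector Crays run much closer to peak than the cache-based RISC machines.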
High-Performance Computing Directions: Beowulf-class PC Clusters

Definition:
◆ COTS PC nodes
¾ Pentium, Alpha, PowerPC, SMP
◆ COTS LAN/SAN interconnect
¾ Ethernet, Myrinet, Giganet, ATM
◆ Open source Unix
¾ Linux, BSD
◆ Message passing computing
¾ MPI, PVM
¾ HPF

Advantages:
◆ Best price-performance
◆ Low entry-level cost
◆ Just-in-place configuration
◆ Vendor invulnerable
◆ Scalable
◆ Rapid technology tracking

Enabled by PC hardware, networks and operating systems achieving the capabilities of scientific workstations at a fraction of the cost, and by the availability of industry-standard message passing libraries. However, it is much more of a contact sport.
◆ Peak performance
◆ Interconnection
◆ http://clusters.top500.org/
◆ Benchmark results to follow in the coming months
Distributed and Parallel Systems
[Spectrum: SETI@home and Entropia -> Grid Computing -> Beowulf, Berkeley NOW, SNL Cplant -> ASCI Tflops (parallel distributed memory)]

Distributed systems (heterogeneous):
◆ Gather (unused) resources
◆ Steal cycles
◆ System SW manages resources
◆ System SW adds value
◆ 10%-20% overhead is OK
◆ Resources drive applications
◆ Time to completion is not critical
◆ Time shared

Massively parallel systems (homogeneous):
◆ Bounded set of resources
◆ Apps grow to consume all cycles
◆ Application manages resources
◆ System SW gets in the way
◆ 5% overhead is maximum
◆ Apps drive purchase of equipment
◆ Real-time constraints
◆ Space shared
Virtual Environments
[Slide: a large array of raw floating-point values.]
Do they make any sense?
Performance Improvements for Scientific Computing Problems
(Derived from Computational Methods)
[Chart: speed-up factor (1 to 10,000, log scale), 1970-1995, from successive algorithmic advances: Gauss-Seidel -> SOR -> Conjugate Gradient -> Multi-grid, with sparse Gaussian elimination marked.]
Different Architectures
◆ Parallel computing: single systems with many processors working on the same problem
◆ Distributed computing: many systems loosely coupled by a scheduler to work on related problems
◆ Grid computing: many systems tightly coupled by software, perhaps geographically distributed, to work together on single problems or on related problems
Types of Parallel Computers
◆ The simplest and most useful way to classify modern parallel computers is by their memory model:
¾ shared memory
¾ distributed memory
Shared vs. Distributed Memory
[Diagrams: shared memory (several processors P on one BUS to a single Memory) vs. distributed memory (M+P node pairs connected by a Network).]
Shared memory - single address space. All processors have access to a pool of shared memory. (Ex: SGI Origin, Sun E10000)
Distributed memory - each processor has its own local memory. Must do message passing to exchange data between processors. (Ex: CRAY T3E, IBM SP, clusters)
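To see what "must do message passing" means in practice, here is a toy sketch of the model using Python's standard multiprocessing module in place of a real message passing library such as MPI (this illustrates the style, not the MPI API):

```python
# Distributed-memory style: no shared address space, so data moves only via
# explicit send/receive messages between processes.
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()        # receive a message from the "other node"
    conn.send(sum(data))      # do local work, send the result back
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4]) # explicit send: the worker cannot see our memory
    print(parent.recv())      # prints 10
    p.join()
```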
Shared Memory: UMA vs. NUMA
[Diagram: processors on a single BUS to one Memory.]
Uniform memory access (UMA): each processor has uniform access to memory. Also known as symmetric multiprocessors. (Ex: Sun E10000)
[Diagram: two bus-based processor/memory groups joined by a Network.]
Non-uniform memory access (NUMA): the time for a memory access depends on the location of the data. Local access is faster than non-local access. Easier to scale than SMPs. (Ex: SGI Origin)
Distributed Memory: MPPs vs. Clusters
◆ Processor-memory nodes are connected by some type of interconnect network
¾ Massively Parallel Processor (MPP): tightly integrated, single system image
¾ Cluster: individual computers connected by s/w
[Diagram: CPU+MEM nodes connected by an interconnect network.]
Processors, Memory, & Networks
◆ Both shared and distributed memory systems have:
1. processors: now generally commodity RISC processors
2. memory: now generally commodity DRAM
3. network/interconnect between the processors and memory (bus, crossbar, fat tree, torus, hypercube, etc.)
◆ We will now begin to describe these pieces in detail, starting with definitions of terms.
Processor-Related Terms
Clock period (cp): the minimum time interval between successive actions in the processor. Fixed; depends on the design of the processor. Measured in nanoseconds (~1 ns for the fastest processors). Inverse of frequency (MHz).
Instruction: an action executed by a processor, such as a mathematical operation or a memory operation.
Register: a small, extremely fast location for storing data or instructions in the processor.
Processor-Related Terms
Functional Unit: a hardware element that performs an operation on an operand or pair of operands. Common FUs are ADD, MULT, INV, SQRT, etc.
Pipeline: a technique enabling multiple instructions to be overlapped in execution.
Superscalar: multiple instructions are possible per clock period.
Flops: floating point operations per second.

Processor-Related Terms
Cache: fast memory (SRAM) near the processor. Helps keep instructions and data close to functional units so the processor can execute more instructions more rapidly.
TLB: the Translation Lookaside Buffer keeps addresses of pages (blocks of memory) in main memory that have recently been accessed (a cache for memory addresses).
Memory-Related Terms
SRAM: Static Random Access Memory. Very fast (~10 nanoseconds); made using the same kind of circuitry as the processors, so speed is comparable.
DRAM: Dynamic RAM. Longer access times (~100 nanoseconds), but holds more bits and is much less expensive (roughly 10x cheaper).
Memory hierarchy: the hierarchy of memory in a parallel system, from registers to cache to local memory to remote memory. More later.
Interconnect-Related Terms
◆ Latency: how long does it take to start sending a "message"? Measured in microseconds.
(Also used for processors: how long does it take to output the results of some operation, such as a floating point add or divide, which are pipelined?)
◆ Bandwidth: what data rate can be sustained once the message is started? Measured in Mbytes/sec.
Interconnect-Related Terms
Topology: the manner in which the nodes are connected.
¾ The best choice would be a fully connected network (every processor to every other). This is unfeasible for cost and scaling reasons.
¾ Instead, processors are arranged in some variation of a grid, torus, or hypercube.
[Diagrams: 3-d hypercube, 2-d mesh, 2-d torus]
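The cost/scaling tradeoff above can be quantified by counting links and worst-case hops; a small sketch (formulas are the standard ones for these topologies, function names are ours):

```python
# Links and diameter (worst-case hops) for p processors under three topologies.
import math

def fully_connected(p):
    return p * (p - 1) // 2, 1          # (links, diameter)

def mesh_2d(p):
    s = math.isqrt(p)                   # assumes p is a perfect square
    return 2 * s * (s - 1), 2 * (s - 1)

def hypercube(p):
    d = int(math.log2(p))               # assumes p is a power of two
    return p * d // 2, d

p = 64
print(fully_connected(p))   # (2016, 1)  -- unscalable link count
print(mesh_2d(p))           # (112, 14)
print(hypercube(p))         # (192, 6)   -- few links AND small diameter
```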
Highly Parallel Supercomputing: Where Are We?
◆ Performance:
¾ Sustained performance has dramatically increased during the last year.
¾ On most applications, sustained performance per dollar now exceeds that of conventional supercomputers. But...
¾ Conventional systems are still faster on some applications.
◆ Languages and compilers:
¾ Standardized, portable, high-level languages such as HPF, PVM and MPI are available. But...
¾ Initial HPF releases are not very efficient.
¾ Message passing programming is tedious and hard to debug.
¾ Programming difficulty remains a major obstacle to usage by mainstream scientists.
Highly Parallel Supercomputing: Where Are We?
◆ Operating systems:
¾ Robustness and reliability are improving.
¾ New system management tools improve system utilization. But...
¾ Reliability is still not as good as on conventional systems.
◆ I/O subsystems:
¾ New RAID disks, HiPPI interfaces, etc. provide substantially improved I/O performance. But...
¾ I/O remains a bottleneck on some systems.
The Importance of Standards - Software
◆ Writing programs for MPPs is hard ...
◆ But ... they need not be one-off efforts if written in a standard language
◆ Past lack of parallel programming standards ...
¾ ... has restricted uptake of the technology (to "enthusiasts")
¾ ... reduced portability (over a range of current architectures and between future generations)
◆ Now standards exist (PVM, MPI, HPF), which ...
¾ ... allow users and manufacturers to protect their software investment
¾ ... encourage the growth of a "third party" parallel software industry and parallel versions of widely used codes
The Importance of Standards - Hardware
◆ Processors
¾ commodity RISC processors
◆ Interconnects
¾ high bandwidth, low latency communication protocols
¾ no de facto standard yet (ATM, Fibre Channel, HPPI, FDDI)
◆ Growing demand for total solutions:
¾ robust hardware + usable software
◆ HPC systems containing all the programming tools, environments, languages, libraries, and application packages found on desktops
The Future of HPC
◆ The expense of being different is being replaced by the economics of being the same.
◆ HPC needs to lose its "special purpose" tag.
◆ It still has to deliver on the promise of scalable general purpose computing ...
◆ ... but it is dangerous to ignore this technology.
◆ Final success: when MPP technology is embedded in desktop computing.
◆ Yesterday's HPC is today's mainframe is tomorrow's workstation.
Achieving TeraFlops
◆ In 1991: 1 Gflop/s
◆ A 1000-fold increase since then, from:
¾ Architecture
- exploiting parallelism
¾ Processor, communication, memory
- Moore's Law
¾ Algorithm improvements
- block-partitioned algorithms
Future: Petaflops (10^15 fl pt ops/s)
◆ A Pflop/s for 1 second ≈ a typical workstation computing for 1 year.
◆ From an algorithmic standpoint:
¾ concurrency
¾ data locality
¾ latency & synchronization
¾ floating point accuracy
¾ dynamic redistribution of workload
¾ new languages and constructs
¾ role of numerical libraries
¾ algorithm adaptation to hardware failure
Today, a year of computing on a workstation ≈ 10^15 flops.
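The Pflop-second ≈ workstation-year equivalence is easy to check. The ~30 Mflop/s sustained workstation rate below is an assumption for illustration (a plausible circa-2002 figure), not from the slide:

```python
# One Pflop/s for one second = 10^15 flops; compare to a workstation-year.
seconds_per_year = 365 * 24 * 3600        # ~3.15e7 s
workstation_rate = 30e6                   # assumed ~30 Mflop/s sustained

work_per_year = workstation_rate * seconds_per_year
print(f"{work_per_year:.1e} flops")       # ~9.5e14, i.e. about 10^15
```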