Transcript of Earth Simulator

Page 1: Earth Simulator

The Earth Simulator

Presented by Jin Soon Lim for CS 566

Page 2: Earth Simulator

Outline

• Brief history of supercomputing
• The architecture of the Earth Simulator
  – Processor node
  – Arithmetic processor
  – Interconnection network
  – Inter-node communication mechanism
  – Inter-node synchronization
• Parallel programming environment
  – MPI
• Performance
• Projects

Page 3: Earth Simulator

Historical Background of Supercomputing

• Modern scalar processors
  – The first modern scalar supercomputer: CDC 6600 (1964), designed by Seymour Cray
• Vector processors
  – Cray-1 (1975)
• 1980s
  – Supercomputing became one of the most important research tools
  – In 1989, the USA had 167 supercomputers, Japan 90, and Europe 92
• 1990s
  – Parallel supercomputers with tightly coupled CPUs
  – Clusters and grid systems

Page 4: Earth Simulator

Earth Simulator

• The “Earth Simulator project” was started in 1997 by the Japanese government

• Purpose: simulating global environmental change problems

• System design proposed by NEC Corporation

• Construction was completed in February 2002, and operation started in March 2002

• Cost: about $350 million

• The fastest supercomputer from 2002 to 2004

• No. 7 on the TOP500 list of November 2005

Page 5: Earth Simulator

Earth Simulator Facilities

• Located in Yokohama, Japan

Page 6: Earth Simulator

Earth Simulator Facilities

Page 7: Earth Simulator

Earth Simulator Facilities

Page 8: Earth Simulator

Earth Simulator Facilities

• 640 × 130 = 83,200 cables
• Total cable length: 2,400 km

Page 9: Earth Simulator

System Overview

• Highly parallel vector supercomputer
• 640 processor nodes
• Crossbar interconnection network
• 8 arithmetic processors per node (640 × 8 = 5,120 in total)
• Distributed shared memory
• System disk: 415 TB; user disk: 225 TB

Page 10: Earth Simulator

System Overview

• Three architectural features for high performance and high efficiency
  – Vector processors
  – Shared memory
  – High-bandwidth, non-blocking crossbar interconnection network
• Three levels of parallelizing paradigms (a minimal hybrid sketch follows below)
  – Vector processing on a processor
  – Parallel processing with shared memory within a node
  – Parallel processing among distributed nodes via the interconnection network
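To illustrate how these three levels combine in practice, here is a minimal hybrid sketch in C (not from the original slides): MPI distributes work across nodes, OpenMP threads share memory within a node, and the stride-1 inner loop is written so a vectorizing compiler can map it onto the vector pipelines. Problem size and names are illustrative only.

```c
/* Hybrid sketch: MPI across nodes, OpenMP within a node, vectorizable inner loop.
   Compile (for example): mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000            /* elements per MPI process (illustrative size) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double local_sum = 0.0, global_sum = 0.0;

    /* Level 2: OpenMP threads share the node's memory.
       Level 1: the simple stride-1 loop body is a candidate for vectorization. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < N; i++) {
        a[i] = (double)(i + rank);
        b[i] = 2.0 * a[i];
        local_sum += a[i] * b[i];
    }

    /* Level 3: message passing combines partial results across nodes. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global dot product = %e (from %d processes)\n", global_sum, nprocs);

    free(a); free(b);
    MPI_Finalize();
    return 0;
}
```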

Page 11: Earth Simulator

Processor Node

• Shared-memory parallel vector supercomputer
• 8 arithmetic processors (8 Gflops per AP)
• Peak performance: 64 Gflops
• Data transfer rate between an AP and main memory: 32 GB/s
• Aggregate memory bandwidth: 256 GB/s

Page 12: Earth Simulator

Arithmetic Processor (AP)

• Single-chip LSI
• 8 Gflops
• 500 MHz (1 GHz)

• Vector unit
  – 6 types of vector pipelines
  – 72 vector registers (72 × 256 elements × 64 bits = 144 KB)

• Scalar unit
  – 4-way superscalar
  – 128 scalar registers

Page 13: Earth Simulator

Interconnection Network

• 640 × 640 single-stage non-blocking crossbar switch
• Global addressing and synchronization
• 2 control units (XCT)
• 128 crossbar switches (XSW)
• Data transfer rate between two nodes: 12.3 GB/s × 2 ways

Page 14: Earth Simulator

Inter-node communication mechanism

Page 15: Earth Simulator

Inter-node synchronization

• Global Barrier Counter (GBC)
• Global Barrier Flag (GBF)

Page 16: Earth Simulator

Parallel programming Environment

• Operating system
  – UNIX-based system (SUPER-UX for the NEC SX series)
• Hybrid parallel programming environment

Page 17: Earth Simulator

MPI for Earth Simulator

• MPI/ES
  – Supports the full MPI-2 standard
  – Optimized to achieve the highest communication performance on the ES architecture
• Communication modes (sketched below)
  – Point-to-point
  – One-sided
  – Collective
• Parallel I/O
• Dynamic process management
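To make the three communication modes concrete, below is a minimal sketch using standard MPI-2 calls in C. This is not ES-specific code; MPI/ES implements these same interfaces. Buffer contents and sizes are illustrative.

```c
/* Sketch of the three MPI communication modes (standard MPI-2 calls).
   Requires at least 2 processes, e.g.: mpirun -np 2 ./modes */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 1. Point-to-point: rank 0 sends one value to rank 1. */
    int msg = 42;
    if (rank == 0)
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* 2. One-sided: every rank exposes a window; rank 0 puts into rank 1. */
    int win_buf = 0, val = 7;
    MPI_Win win;
    MPI_Win_create(&win_buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    if (rank == 0)
        MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);
    MPI_Win_free(&win);

    /* 3. Collective: sum one value contributed by every rank. */
    int total = 0;
    MPI_Allreduce(&rank, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("point-to-point, one-sided, and collective calls completed\n");

    MPI_Finalize();
    return 0;
}
```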

Page 18: Earth Simulator

Performance of MPI libraries

• Memory space of a process on the ES
  – Local memory (LMEM)
  – Global memory (GMEM)
• Both can be assigned to buffers of MPI functions
• GMEM is addressed globally over the nodes and can be accessed by MPI processes allocated to different nodes
• The behavior of MPI communication differs according to the memory area in which the buffers reside (see the sketch below)
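In MPI-2, memory with special properties can be requested via MPI_Alloc_mem. The sketch below assumes, as a hedged reading of the MPI/ES descriptions, that buffers obtained this way are placed in GMEM while ordinary malloc/stack buffers stay in LMEM; the actual placement policy should be checked against the MPI/ES documentation.

```c
/* Sketch: requesting "special" memory with MPI_Alloc_mem (standard MPI-2).
   Assumption: such buffers land in GMEM, ordinary buffers in LMEM.
   Requires at least 2 MPI processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;               /* 1 Mi doubles = 8 MB */
    double *gmem_buf;                        /* candidate GMEM buffer */
    MPI_Alloc_mem(count * sizeof(double), MPI_INFO_NULL, &gmem_buf);

    for (int i = 0; i < count; i++)
        gmem_buf[i] = (double)rank;

    /* The same MPI calls are used regardless of where the buffer lives;
       only the achievable bandwidth differs (Cases 1-4 on the next slide). */
    if (rank == 0)
        MPI_Send(gmem_buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(gmem_buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    MPI_Free_mem(gmem_buf);
    MPI_Finalize();
    return 0;
}
```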

Page 19: Earth Simulator

Performance of MPI libraries

• Case 1: data stored in the LMEM of a process A are transferred to the LMEM of another process B on the same node.

• Case 2: data stored in the LMEM of process A are transferred to the LMEM of a process C running on a different node.

• Case 3: data stored in the GMEM of process A are transferred to the GMEM of process B on the same node.

• Case 4: data stored in the GMEM of process A are transferred to the GMEM of process C running on a different node.

Page 20: Earth Simulator

Performance of MPI libraries (MPI_Send)

• Case 3: maximum 14.8 GB/s
• Case 4: maximum 11.8 GB/s
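Bandwidth figures of this kind are typically obtained with a ping-pong microbenchmark. Below is a generic, hedged sketch using plain MPI_Send/MPI_Recv; it is not the actual benchmark run on the ES, and the message size and repetition count are illustrative.

```c
/* Generic ping-pong bandwidth sketch between ranks 0 and 1.
   Bandwidth = bytes moved in both directions / elapsed time. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nbytes = 8 << 20;             /* 8 MB message (illustrative) */
    const int reps = 100;
    char *buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double gbytes = 2.0 * reps * nbytes / 1.0e9;   /* send + receive */
        printf("ping-pong bandwidth: %.2f GB/s\n", gbytes / (t1 - t0));
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```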

Page 21: Earth Simulator

Performance of MPI libraries

• MPI_Get: max 11.62 GB/s
• MPI_Put: max 11.62 GB/s
• MPI_Accumulate: max 3.16 GB/s

Page 22: Earth Simulator

Performance of MPI libraries (MPI_Barrier)

• Scalability of MPI_Barrier

Page 23: Earth Simulator

Performance of MPI libraries (MPI_Barrier)

• Scalability of MPI_Barrier
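The scalability plots can be reproduced in spirit with a simple timing loop around MPI_Barrier for increasing process counts. The generic sketch below is not the original measurement code; on the ES, the barrier is expected to benefit from the hardware Global Barrier Counter described earlier.

```c
/* Sketch: average MPI_Barrier latency measured over many repetitions.
   Run with different process counts to observe scalability. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int reps = 10000;

    MPI_Barrier(MPI_COMM_WORLD);            /* warm-up and alignment */
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++)
        MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d processes: %.2f microseconds per barrier\n",
               nprocs, 1.0e6 * (t1 - t0) / reps);

    MPI_Finalize();
    return 0;
}
```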

Page 24: Earth Simulator

Performance of ES

• Peak performance per AP: 8 Gflops
• Peak performance per PN: 64 Gflops
• Total peak performance: 40 Tflops (64 Gflops × 640)
• Memory bandwidth per AP: 32 GB/s
• Memory bandwidth per PN: 256 GB/s
• Main memory per PN: 16 GB
• Total main memory: 10 TB (16 GB × 640)
• Total memory throughput: 160 TB/s
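For clarity, the aggregate figures follow directly from the per-AP numbers; the slide quotes rounded totals:

```latex
\begin{align*}
\text{Peak per PN} &= 8~\text{Gflops/AP} \times 8~\text{AP} = 64~\text{Gflops}\\
\text{Total peak} &= 64~\text{Gflops} \times 640~\text{PN} = 40.96~\text{Tflops} \approx 40~\text{Tflops}\\
\text{Total main memory} &= 16~\text{GB} \times 640~\text{PN} \approx 10~\text{TB}\\
\text{Total memory throughput} &= 256~\text{GB/s} \times 640~\text{PN} = 163.84~\text{TB/s} \approx 160~\text{TB/s}
\end{align*}
```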

Page 25: Earth Simulator

LINPACK performance

• Achieved 35.86 Tflops
• Ratio to peak performance: > 85%

Page 26: Earth Simulator

Projects using ES in 2005

• Ocean & Atmosphere (12)
  – Future Climate Change Projection using a High-Resolution Coupled Ocean-Atmosphere Climate Model
• Solid Earth (9)
  – Numerical simulation of mantle convection
• Computer Science (1)
  – Development of a Micro-Macro Interaction Simulation Algorithm
• Epoch-making Simulation (22)
  – Nano-simulation of electrode reactions in fuel cells

Page 27: Earth Simulator

The use of ES

• Conditions for an application to run on the ES
  – The number of PNs for a job must be less than or equal to 10
  – This can be extended to up to 512 PNs if further conditions are satisfied
  – The number of PNs can be expanded if the vectorization ratio is > 95% and the parallel efficiency is > 0.5 (definitions sketched below)
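The slides do not define these two metrics; the commonly used definitions, assumed here, are:

```latex
\text{vectorization ratio} = \frac{\text{operations executed in vector instructions}}{\text{total operations}},
\qquad
E(N) = \frac{T(1)}{N \cdot T(N)},
```

where $T(N)$ is the execution time on $N$ processors. For example, a code that runs 320 times faster on 640 APs has a parallel efficiency of 320/640 = 0.5, just meeting the threshold.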

Page 28: Earth Simulator

Summary

• Highly parallel vector supercomputer system
• Distributed shared memory
• High-bandwidth, non-blocking crossbar interconnection network
• Three levels of parallel programming
  – Inter-node: message passing (MPI)
  – Intra-node: shared memory (OpenMP)
  – AP: vectorization
• Promotes research on environmental problems

Page 29: Earth Simulator

References

• The Earth Simulator Center <http://www.es.jamstec.go.jp/esc/eng/ES/index.html>

• S. Habata, M. Yokokawa, and S. Kitawaki, "The Earth Simulator System", NEC Res. & Develop., Vol. 44, No.1, pp. 21-26, January 2003. <http://www.nec.co.jp/techrep/en/r_and_d/r03/r03-no1/rd06.pdf>

• T. Sato, S. Kitawaki, and M. Yokokawa, "Earth Simulator Running", ISC, June 2002. <http://www.ultrasim.info/sato.pdf>

• Jack Dongarra, “The Earth Simulator”, WTEC Panel Report, December 2004. <http://www.wtec.org/hec/report/02-Earth.pdf>

• Christopher Lazou, “Historical Perspective of Supercomputing”, NEC HPCE, June 2002. <http://www.hpce.nec.com/typo3conf/ext/nf_downloads/pi1/passdownload.php?downloaddata=26>