Performance Analysis of HPC with Lmbench
Didem Unat
Supervisor: Nahil Sobh
July 22nd 2005
netfiles.uiuc.edu/dunat2/www

Lmbench: Micro-Benchmark Suite
• Simple, portable benchmarks
• Compares the performance of different Unix systems
• Measures latency and bandwidth
• Only analyzes the performance of the processor, memory, network, file system and disk
• Free software

Compiler & optimization issues
• The GNU C compiler was used on all resources except copper
• The IBM xlc compiler was used on copper
• All benchmarks were compiled with optimization -O, except the benchmarks that calculate clock speed and context switch times

Metrics in the Benchmark
Bandwidth
• Pipe/TCP
• Cached file read
• Memory copy
• Memory read/write
Latency
• System call
• Signal handling
• Process creation
• Basic CPU operations
• Context switching
• Inter process communication
• File and VM system
• Memory read latencies

Inter Process Communication Bandwidth
• Transfers 64 MB of data in 64 KB chunks through
  • Unix pipe
  • Unix sockets
  • TCP/IP sockets
[Chart: bandwidth in MB/sec for Pipe, AF Unix and TCP on W, Co, Cu and Hg]
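
The pipe case of the transfer described above can be sketched in Python as follows. This is only an illustration of the idea, not Lmbench's own C code: the function name `pipe_bandwidth` and its parameters are made up here, and the default sizes simply mirror the 64 MB / 64 KB figures on the slide.

```python
import os
import time


def pipe_bandwidth(total_bytes=64 * 1024 * 1024, chunk_bytes=64 * 1024):
    """Return (bytes_received, MB/sec) for a one-way transfer through a Unix pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the writer. Push total_bytes through the pipe in fixed-size chunks.
        os.close(read_fd)
        chunk = b"\0" * chunk_bytes
        sent = 0
        while sent < total_bytes:
            sent += os.write(write_fd, chunk[: total_bytes - sent])
        os.close(write_fd)
        os._exit(0)

    # Parent: the reader. Time how long it takes to drain the pipe.
    os.close(write_fd)
    received = 0
    start = time.perf_counter()
    while True:
        data = os.read(read_fd, chunk_bytes)
        if not data:  # an empty read means the writer closed its end
            break
        received += len(data)
    elapsed = max(time.perf_counter() - start, 1e-9)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return received, received / elapsed / (1024 * 1024)
```

Calling `pipe_bandwidth()` with the defaults moves 64 MB; the figure it returns includes Python interpreter overhead, so it will typically be lower than what a C benchmark reports on the same machine.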

Cached file read
• A reread benchmark, intended to be used on a file that is in memory
• File reread: copies data from the kernel's file system page cache into the process's buffer
• Mmap reread: maps the entire file (8 MB) into the process's address space
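
The two reread paths can be compared with a short Python sketch. It is a rough illustration under stated assumptions: the helper name `reread_bandwidth`, the use of a temporary file, and the 64 KB read size are choices made here, not details taken from the slides.

```python
import mmap
import os
import tempfile
import time


def reread_bandwidth(size_bytes=8 * 1024 * 1024, chunk_bytes=64 * 1024):
    """Reread a freshly written (hence likely cached) file via read() and via mmap."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\x01" * size_bytes)
        f.flush()
        fd = f.fileno()

        # File reread: each read() copies data from the page cache into our buffer.
        os.lseek(fd, 0, os.SEEK_SET)
        start = time.perf_counter()
        read_bytes = 0
        while True:
            buf = os.read(fd, chunk_bytes)
            if not buf:
                break
            read_bytes += len(buf)
        read_secs = max(time.perf_counter() - start, 1e-9)

        # Mmap reread: map the whole file, then touch it by slicing the mapping.
        start = time.perf_counter()
        mmap_bytes = 0
        with mmap.mmap(fd, size_bytes, access=mmap.ACCESS_READ) as mapping:
            for offset in range(0, size_bytes, chunk_bytes):
                mmap_bytes += len(mapping[offset:offset + chunk_bytes])
        mmap_secs = max(time.perf_counter() - start, 1e-9)

    mb = 1024 * 1024
    return {
        "read_bytes": read_bytes,
        "mmap_bytes": mmap_bytes,
        "read_mb_per_sec": read_bytes / read_secs / mb,
        "mmap_mb_per_sec": mmap_bytes / mmap_secs / mb,
    }
```

Slicing the mapping still copies bytes into a Python object, so the mmap figure here measures "map and touch" rather than a zero-copy access.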

Memory copy
• Measures how fast the system can bcopy data
• bcopy copies n bytes from a source string to a destination string
• An 8 MB to 8 MB copy, which does not fit in the cache
• Kernel bcopy and C library bcopy
• C library bcopy results are shown on the next slide
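
A bulk copy of the same shape can be timed from Python as a rough analogue. This sketch assumes that slice assignment between two equally sized `bytearray` objects is carried out as a single bulk copy by the interpreter; the function name `copy_bandwidth` and the best-of-N timing are choices made for this example.

```python
import time


def copy_bandwidth(size_bytes=8 * 1024 * 1024, repeats=5):
    """Copy size_bytes from one buffer to another; return (copy_ok, best MB/sec)."""
    src = bytearray(b"\xa5" * size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment, done as one bulk copy
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    return dst == src, size_bytes / best / (1024 * 1024)
```

Taking the best of several runs reduces noise from other activity on the machine; with 8 MB buffers the working set is larger than most caches of the period, as the slide intends.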

Memory read/write
Read
• Measures the time to read data into the processor
• An unrolled loop that sums up a series of integers
Write
• Measures the time to write data to memory
• An unrolled loop that stores a value into an integer
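
The shape of the two loops can be shown in Python, with the caveat that interpreter overhead dominates, so the timings say little about real memory bandwidth. The names `read_sum`, `write_fill` and `timed`, and the unroll factor of four, are illustrative assumptions.

```python
import time
from array import array


def read_sum(values):
    """Sum all integers, four per loop iteration (manual unrolling)."""
    total = 0
    limit = len(values) - len(values) % 4
    for i in range(0, limit, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(limit, len(values)):  # leftover elements
        total += values[i]
    return total


def write_fill(values, x):
    """Store x into every element, four stores per loop iteration."""
    limit = len(values) - len(values) % 4
    for i in range(0, limit, 4):
        values[i] = x
        values[i + 1] = x
        values[i + 2] = x
        values[i + 3] = x
    for i in range(limit, len(values)):  # leftover elements
        values[i] = x


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

For example, `timed(read_sum, array("i", range(1_000_000)))` returns the sum together with the time the read loop took.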

Operating System Entry / Signal Handling / Process Creation Costs
• Process-related latencies
• System call: null call, null I/O, stat, open/close
• Signal handling: signal installation, signal handling
• Process creation: fork + exit, fork + execve, fork + /bin/sh -c
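
Two of these latencies, the null call and fork + exit, can be approximated with the sketch below. It assumes `getppid` as the trivial system call and reports microseconds per operation; the function names and iteration counts are hypothetical, and Python's call overhead is included in the null-call figure.

```python
import os
import time


def null_call_latency(iterations=100_000):
    """Average microseconds per trivial system call (getppid assumed here)."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()
    return (time.perf_counter() - start) / iterations * 1e6


def fork_exit_latency(iterations=50):
    """Average microseconds to fork a child that exits at once, then reap it."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately without running cleanup
        os.waitpid(pid, 0)  # parent: wait so children do not pile up
    return (time.perf_counter() - start) / iterations * 1e6
```

The fork + execve and fork + /bin/sh -c cases follow the same pattern, with the child replacing itself by another program instead of exiting.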

Context Switching
• The time to save the state of one process and restore the state of another process
• The processes are connected in a ring of Unix pipes
• A token is passed from process to process
• Each process allocates an array and sums the array
• Context-switch time doesn't include the overhead of doing the work
• Two parameters: number and size of processes
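
The ring of pipes can be sketched as below. This simplified version only passes the token: it does not sum an array in each process, and it does not subtract the pipe overhead, so the number it returns is an upper bound on the context-switch cost rather than the quantity Lmbench reports. The function name and defaults are assumptions.

```python
import os
import time


def ring_token_time(num_procs=4, laps=200):
    """Average microseconds per hop for a token circling a ring of processes."""
    if num_procs < 2:
        raise ValueError("need at least two processes to force a switch")
    # pipes[i] carries the token from process i to process (i + 1) % num_procs.
    pipes = [os.pipe() for _ in range(num_procs)]
    children = []
    for i in range(1, num_procs):
        pid = os.fork()
        if pid == 0:
            read_fd, write_fd = pipes[i - 1][0], pipes[i][1]
            for r, w in pipes:  # close every descriptor this child does not use
                if r != read_fd:
                    os.close(r)
                if w != write_fd:
                    os.close(w)
            for _ in range(laps):
                os.write(write_fd, os.read(read_fd, 1))
            os._exit(0)
        children.append(pid)

    # The parent acts as process 0: it injects the token and waits for its return.
    read_fd, write_fd = pipes[num_procs - 1][0], pipes[0][1]
    start = time.perf_counter()
    for _ in range(laps):
        os.write(write_fd, b"x")
        os.read(read_fd, 1)
    elapsed = time.perf_counter() - start

    for pid in children:
        os.waitpid(pid, 0)
    for r, w in pipes:
        os.close(r)
        os.close(w)
    return elapsed / (laps * num_procs) * 1e6
```

Varying `num_procs` corresponds to the first parameter on the slide; the second (process size) would require each process to touch an array of the chosen size before forwarding the token.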

Interprocess Communication Latencies
• Passing a small message back and forth between two processes
• The time reported is one round trip
• Message size: a byte or a word
• Metrics: pipe, Unix socket, UDP and TCP, RPC over UDP and TCP, TCP connection latency
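
For the Unix socket case, a one-byte round trip can be measured with a socket pair shared between a parent and a forked child. This is a sketch under assumptions (the name `socketpair_round_trip`, the iteration count, and the use of `socket.socketpair`), not the benchmark's actual implementation.

```python
import os
import socket
import time


def socketpair_round_trip(iterations=1000):
    """Return (last echoed byte, average microseconds per one-byte round trip)."""
    parent_sock, child_sock = socket.socketpair()  # AF_UNIX pair on Unix systems
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back.
        parent_sock.close()
        for _ in range(iterations):
            child_sock.sendall(child_sock.recv(1))
        child_sock.close()
        os._exit(0)

    child_sock.close()
    echoed = b""
    start = time.perf_counter()
    for _ in range(iterations):
        parent_sock.sendall(b"x")
        echoed = parent_sock.recv(1)
    elapsed = time.perf_counter() - start
    parent_sock.close()
    os.waitpid(pid, 0)
    return echoed, elapsed / iterations * 1e6
```

The same ping-pong structure applies to pipes, UDP and TCP; only the way the two endpoints are created changes.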

File & VM System
• File create/delete: creates a number of small files in the current working directory and then removes the files
• Mmap latency: the cost of mmapping and unmapping files of varying sizes
• Prot fault: the time to catch a protection fault
• Page fault: the cost of page faulting pages from a file
• 100 fd select: the time to do a select on n file descriptors (here n = 100)
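
The file create/delete measurement can be sketched as follows. Unlike the description above, this example works in a temporary directory rather than the current working directory, to avoid leaving files behind; the function name and the file count are assumptions.

```python
import os
import tempfile
import time


def create_delete_latency(count=1000):
    """Return (create us/file, delete us/file, files left over) for empty files."""
    with tempfile.TemporaryDirectory() as directory:
        names = [os.path.join(directory, f"f{i}") for i in range(count)]

        start = time.perf_counter()
        for name in names:
            os.close(os.open(name, os.O_CREAT | os.O_WRONLY, 0o600))
        create_secs = time.perf_counter() - start

        start = time.perf_counter()
        for name in names:
            os.unlink(name)
        delete_secs = time.perf_counter() - start

        leftover = len(os.listdir(directory))
    return create_secs / count * 1e6, delete_secs / count * 1e6, leftover
```

Results depend heavily on the file system holding the directory, so the directory should sit on the file system of interest when comparing machines.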

Memory Latencies
• Measures memory read latency for varying memory sizes and strides
• The size of the array starts from 512 bytes
• The stride varies from 16 to 1024 bytes
• Does not include the instruction execution time
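
Read latency of this kind is usually measured by chasing a chain of dependent loads through the array, so that each load must finish before the next can start. The sketch below shows that access pattern in Python; the helper names and the 8-byte word size are assumptions, and interpreter overhead means the timings are not real memory latencies.

```python
import time
from array import array


def build_chain(size_bytes, stride_bytes, word_bytes=8):
    """Build an array in which each visited slot holds the index of the next slot."""
    n = size_bytes // word_bytes
    step = max(1, stride_bytes // word_bytes)
    chain = array("q", [0]) * n
    for i in range(0, n, step):
        # Point to the slot one stride ahead, wrapping back to the start at the end.
        chain[i] = i + step if i + step < n else 0
    return chain


def chase(chain, loads):
    """Follow the chain for `loads` dependent reads; return (final index, ns/load)."""
    p = 0
    start = time.perf_counter()
    for _ in range(loads):
        p = chain[p]  # the next index depends on the value just read
    elapsed = time.perf_counter() - start
    return p, elapsed / loads * 1e9
```

Sweeping `size_bytes` upward from 512 and `stride_bytes` from 16 to 1024 reproduces the parameter grid described above; in a compiled implementation the latency steps up as the array outgrows each cache level.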

Conclusion
Metric                  The best      Has problems
IPC bandwidth           Co            W, Cu
Cached I/O bandwidth    W             Co, Hg
Memory R/W bandwidth    W             Co, Hg
Process creation        Cu            Co
CPU ops                 W, Co, Hg     Cu
Network latency         W             Co, Cu
Memory latency          W, Co         Cu

THANK YOU!
Have a nice weekend!