
Evaluating current processors performance and machines stability

R. Esposito (2), P. Mastroserio (2), F. Taurino (2,1), G. Tortone (2)

(1) INFM, Sez. di Napoli, Italy
(2) INFN, Sez. di Napoli, Italy

Benchmarks and Stress Tests

Accurately estimating the performance of currently available processors is becoming a key activity, particularly in HENP environments, where high computing power is crucial. This document describes the methods and programs, open source or freeware, used to benchmark processors, memory and disk subsystems, and network connection architectures. These tools are also useful for stress testing new machines before their acquisition or before their introduction into a production environment, where high uptime is required.

The "benchmarking suite" shown in this poster consists of free applications used to evaluate and stress test each machine subsystem: CPU, memory hierarchy, disks and network.

CPU and Memory

GLIBENCH
This tool executes Dhrystones (MIPS), Whetstones (MFLOPS), matrix operations, number crunching, floating-point and memory throughput tests.

NBENCH
Based on beta release 2 of BYTE Magazine's BYTEmark benchmark program (previously known as BYTE's Native Mode Benchmarks), it runs 10 tests to compare the running machine with an AMD K6 at 233 MHz. It returns three indexes: Memory, Integer and Floating-point.

BYTEBENCH
Used to test a *nix machine in different ways. It runs arithmetic tests and system tests such as process spawning and context switching.

LMBENCH
A series of micro-benchmarks intended to measure basic operating system and hardware metrics. The benchmarks fall into three general classes: bandwidth, latency, and "other".

UBENCH
Ubench runs rather senseless integer and floating-point calculations for 3 minutes, concurrently in several processes; the result is the Ubench CPU benchmark. It then runs rather senseless memory allocation and memory-to-memory copy operations for another 3 minutes, again concurrently in several processes; the result is the Ubench MEM benchmark.

CHEP03 - March 24-28, 2003 - La Jolla, California

MEMPERF
It measures memory bandwidth along two dimensions. First, it varies the block size, which shows the throughput at the different levels of the memory hierarchy (the different cache levels). Second, it varies the access pattern, from contiguous blocks to accesses with different strides.
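The two-dimensional sweep described for MEMPERF can be illustrated with a minimal sketch. This is not MEMPERF itself: the function names `read_throughput` and `sweep` are hypothetical, the block sizes and strides are only loosely modelled on the axis labels of the poster's plot, and a Python `bytearray` slice stands in for the strided reads of the real tool. Absolute numbers from an interpreted language mostly reflect interpreter overhead, so only the relative trend across block sizes and strides is of interest.

```python
import time


def read_throughput(block_size, stride, repeats=20):
    """Return MB/s for reading a block of `block_size` bytes with a stride.

    Illustrative only: the strided slice copies every `stride`-th byte,
    which is a rough stand-in for a strided memory read.
    """
    buf = bytearray(block_size)
    touched = len(range(0, block_size, stride))  # bytes read per pass
    start = time.perf_counter()
    for _ in range(repeats):
        _ = buf[::stride]  # stride 1 is a contiguous read
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return touched * repeats / elapsed / 1e6


def sweep(block_sizes=(1 << 10, 1 << 14, 1 << 18, 1 << 22),
          strides=(1, 7, 13, 19)):
    """Vary block size and stride; return {(block_size, stride): MB/s}."""
    return {(b, s): read_throughput(b, s)
            for b in block_sizes
            for s in strides}


if __name__ == "__main__":
    for (block, stride), mbs in sorted(sweep().items()):
        print(f"block={block:>8} B  stride={stride:>2}  {mbs:10.1f} MB/s")
```

Plotting the returned dictionary as a surface over block size and stride gives the same kind of picture as the MEMPERF plot, with throughput expected to drop once the block no longer fits in a cache level.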

STREAM
The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels.
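The four vector kernels reported by STREAM (Copy, Scale, Add and Triad) can be sketched as follows. This is an illustration of the kernels and of how a bandwidth figure is derived from bytes moved and elapsed time, not a replacement for the real benchmark: the function name `stream_kernels` is hypothetical, the 8-byte element size is an assumption carried over from the usual double-precision build, and Python lists do not store values contiguously, so the MB/s values are not comparable with real STREAM results.

```python
import time

BYTES_PER_ELEMENT = 8  # assumption: double-precision elements


def stream_kernels(n=200_000, q=3.0):
    """Run the four STREAM-style kernels once.

    Returns (rates, arrays): rates maps kernel name to nominal MB/s,
    arrays is the final (a, b, c) triple, useful for checking results.
    """
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n
    rates = {}

    def timed(name, arrays_touched, kernel):
        start = time.perf_counter()
        out = kernel()
        elapsed = max(time.perf_counter() - start, 1e-9)
        moved = arrays_touched * BYTES_PER_ELEMENT * n  # nominal bytes moved
        rates[name] = moved / elapsed / 1e6
        return out

    c = timed("Copy", 2, lambda: a[:])                                # c = a
    b = timed("Scale", 2, lambda: [q * x for x in c])                 # b = q*c
    c = timed("Add", 3, lambda: [x + y for x, y in zip(a, b)])        # c = a+b
    a = timed("Triad", 3, lambda: [y + q * z for y, z in zip(b, c)])  # a = b+q*c
    return rates, (a, b, c)


if __name__ == "__main__":
    rates, _ = stream_kernels()
    for name, mbs in rates.items():
        print(f"{name:<6} {mbs:10.1f} MB/s (nominal)")
```

Copy and Scale touch two arrays per element while Add and Triad touch three, which is why the bytes-moved count differs between the two pairs.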

LLCBENCH
It groups three benchmarks: BlasBench, to test BLAS routines; CacheBench, to test cache memory; MPBench, to test MPI implementations.

Disks and Network

BONNIE++
A modified version of Bonnie, which creates, reads, writes and deletes very big files.
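The create, write, read and delete cycle that Bonnie++ exercises can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the Bonnie++ algorithm: the function name `file_cycle` and its parameters are hypothetical, the default file is far smaller than what a real run would use, and the read phase will largely be served from the operating system's page cache unless the file is much larger than RAM.

```python
import os
import tempfile
import time


def file_cycle(size_mb=16, chunk_kb=64, directory=None):
    """Time a sequential write, a sequential read and a delete of one file.

    Returns a dict with elapsed seconds for "write", "read" and "delete",
    plus the number of bytes read back under "bytes".
    """
    chunk = b"\0" * (chunk_kb * 1024)
    n_chunks = size_mb * 1024 // chunk_kb
    fd, path = tempfile.mkstemp(dir=directory)
    timings = {}
    try:
        # Sequential write, forced to disk before the timer stops.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        timings["write"] = time.perf_counter() - start

        # Sequential read of the same file, chunk by chunk.
        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(len(chunk))
                if not data:
                    break
                total += len(data)
        timings["read"] = time.perf_counter() - start
        timings["bytes"] = total
    finally:
        # Delete the file even if an earlier phase failed.
        start = time.perf_counter()
        os.remove(path)
        timings["delete"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    t = file_cycle()
    mb = t["bytes"] / 1e6
    print(f"write {mb / t['write']:.1f} MB/s, read {mb / t['read']:.1f} MB/s, "
          f"delete {t['delete'] * 1e3:.2f} ms")
```

For a stress test, the same cycle would be repeated for a long time with a file larger than physical memory, so that the disks rather than the cache are exercised.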

IOZONE
This benchmark generates and measures a variety of file operations: read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read/write, pread/pwrite variants, aio_read, aio_write and mmap.

NETPERF
It provides tests for both unidirectional throughput and end-to-end latency with TCP, UDP and sockets.

Conclusions

This set of benchmarks allows us to accurately characterize the raw performance of available machines. Though many commercial and free benchmark tools are currently available, we have chosen the ones shown in this poster because, in our experience, they give a satisfactory performance analysis of every single hardware component. Furthermore, this suite of benchmarks has proven to be a valid tool for stress testing machines before starting production activities.

NETPIPE
This tool can benchmark network communications on non-standard hardware, such as the high-speed interconnects used in cluster environments.
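A throughput curve against block size, like the one NETPIPE reports, can be obtained with a ping-pong exchange: one side sends a block, the other echoes it back, and the block size is increased step by step. The sketch below shows that idea over a local socket pair. It is an illustration only: the names `recv_exact` and `ping_pong` are hypothetical, the block sizes are arbitrary, and a local socket pair measures the host's socket stack, not a network interface.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Receive exactly n bytes from sock, or raise if the peer closes."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def ping_pong(block_sizes=(1024, 4096, 16384, 65536), rounds=50):
    """Return {block_size: Mbit/s} for a ping-pong over a local socket pair."""
    left, right = socket.socketpair()

    def echo():
        # Echo side: send every received block straight back.
        for size in block_sizes:
            for _ in range(rounds):
                right.sendall(recv_exact(right, size))

    worker = threading.Thread(target=echo, daemon=True)
    worker.start()
    results = {}
    try:
        for size in block_sizes:
            payload = b"x" * size
            start = time.perf_counter()
            for _ in range(rounds):
                left.sendall(payload)
                recv_exact(left, size)
            elapsed = max(time.perf_counter() - start, 1e-9)
            # Each round moves the block twice: there and back.
            results[size] = 2 * size * rounds * 8 / elapsed / 1e6
        worker.join()
    finally:
        left.close()
        right.close()
    return results


if __name__ == "__main__":
    for size, mbit in ping_pong().items():
        print(f"{size:>7} B  {mbit:10.1f} Mbit/s")
```

To measure a real link, the echo side would run on a second machine and the socket pair would be replaced by a TCP connection between the two hosts.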

PALLAS
A benchmark suite used to evaluate MPI performance. It provides a concise set of benchmarks targeted at measuring the most important MPI functions.

[Figure: Example of GLIBENCH results, Athlon 2100+ vs Xeon 2.2 GHz. Panels: Dhrystones (MIPS), Whetstones (MFLOPS), Matrix ops (kops).]

[Figure: Example of NETPIPE and PALLAS results on an Intel epro100 Fast Ethernet NIC. Panels: PALLAS - MPI communication on Fast Ethernet (bandwidth [MBytes/s] vs block size [KBytes]); NETPIPE - Fast Ethernet throughput (bandwidth [Mbit/s] vs block size [KBytes]). Block sizes range from 1 to 4096 KBytes.]

[Figure: Example of results obtained by NBENCH, STREAM and MEMPERF. Panels: MEMPERF - Memory read throughput on a Pentium III 1 GHz (MByte/s vs block size, 1 K to 4 M, and stride, 1 to 19); Nbench - Athlon 2100+ (Mem index, Int index, Fp index); Stream - Athlon 2100+ (MByte/s for Copy, Scale, Add, Triad).]

APFLOAT
A high-performance arbitrary-precision package that can be used to perform calculations involving millions of digits, such as π.

POVRAY
This well-known program creates three-dimensional graphics, using standard, Athlon-optimized or Pentium-optimized binaries.

[Figure: Example of results obtained by POVRAY and APFLOAT on a dual Athlon 2100+ and a dual Xeon 2.2 GHz. Panels: Apfloat - Calculation of pi with 10M digits (seconds, 1 proc and 2 proc); PovRay - Rendering time for "benchmark.pov" (seconds, Normal).]