Investigating Network Performance – A Case Study


Page 1: Investigating Network Performance – A Case Study

Investigating Network Performance – A Case Study

Ralph Spencer, Richard Hughes-Jones, Matt Strong and Simon Casey

The University of Manchester
G2 Technical Workshop, Cambridge, Jan 2006

Page 2: Investigating Network Performance – A Case Study

Very Long Baseline Interferometry

eVLBI – using the Internet for data transfer

Page 3: Investigating Network Performance – A Case Study

GRS 1915+105: a 15 solar-mass black hole in an X-ray binary: MERLIN observations

[Image: MERLIN radio images of the jet; the receding component is marked. Scale: 600 mas = 6000 A.U. at 10 kpc.]

Page 4: Investigating Network Performance – A Case Study

Sensitivity in Radio Astronomy

• Noise level $\Delta S \propto 1/\sqrt{B\tau}$, where B = bandwidth, τ = integration time
• High sensitivity requires large bandwidths as well as large collecting area, e.g. Lovell, GBT, Effelsberg, Camb. 32-m
• Aperture synthesis needs signals from individual antennas to be correlated together at a central site
• Need for interconnection data rates of many Gbit/s
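As an illustration of this scaling (a worked example added here, not on the slide, and assuming the observing bandwidth scales with the recorded bit rate): moving from the 128 Mb/s of early eVLBI tests to the 1 Gb/s target lowers the noise by almost a factor of three,

$$\Delta S \propto \frac{1}{\sqrt{B\,\tau}}, \qquad \frac{\Delta S_{128}}{\Delta S_{1024}} = \sqrt{\frac{1024}{128}} \approx 2.8$$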

Page 5: Investigating Network Performance – A Case Study

New Instruments are making the best use of bandwidth:

• eMERLIN 30 Gbps
• Atacama Large mm Array (ALMA) 120 Gbps
• EVLA 120 Gbps
• Upgrade to European VLBI: eVLBI 1 Gbps
• Square Km Array (SKA) many Tbps

Page 6: Investigating Network Performance – A Case Study

The European VLBI Network (EVN)

• Detailed radio imaging uses antenna networks over 100s–1000s of km
• Currently uses disk recording at 512 Mb/s (Mk5)
• A real-time connection allows greater:
  – response
  – reliability
  – sensitivity
• Need the Internet: eVLBI

Page 7: Investigating Network Performance – A Case Study

[Map: EVN-NREN connectivity. Westerbork (Netherlands), Onsala (Sweden; Chalmers University of Technology, Gothenburg), Jodrell Bank and Cambridge (UK, MERLIN), Medicina (Italy) and Torun (Poland), linked by dedicated Gbit links to Dwingeloo (DWDM link).]

Page 8: Investigating Network Performance – A Case Study

Testing the Network for eVLBI

Aim is to obtain the maximum bandwidth compatible with VLBI observing systems in Europe and the USA.

First sustained data-flow tests in Europe: iGrid 2002, 24-26 September 2002, Amsterdam Science and Technology Centre (WTCW), The Netherlands.

"We hereby challenge the international research community to demonstrate applications that benefit from huge amounts of bandwidth!"

Page 9: Investigating Network Performance – A Case Study

iGrid 2002 Radio Astronomy VLBI Demo
• Web-based demonstration sending VLBI data: a controlled stream of UDP packets at 256-500 Mbit/s
• Production network: Manchester – SuperJANET4 – GÉANT – Amsterdam
• Dedicated lambda: Amsterdam – Dwingeloo

Page 10: Investigating Network Performance – A Case Study

The Works:

[Diagram: the test system. A web interface exercises TCP control over both ends; data is read from a RAID0 disc into a ring buffer, sent as a controlled UDP stream (n bytes per packet, a fixed wait time between packets), then received into a ring buffer and written to a RAID0 disc.]
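To make the "n bytes / wait time" pacing concrete, here is a minimal Python sketch of a UDPmon-style paced sender. This illustrates the technique only, it is not the project's actual code; the host, port and default values are hypothetical:

```python
import socket
import time

def paced_udp_send(host, port, n_bytes=1472, wait_us=15.0, n_packets=100_000):
    """Send a controlled UDP stream: n_bytes per packet, one packet
    every wait_us microseconds (busy-wait, since sleep() is too coarse)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytes(n_bytes)
    spacing = wait_us / 1e6
    next_send = time.perf_counter()
    for _ in range(n_packets):
        sock.sendto(payload, (host, port))
        next_send += spacing
        while time.perf_counter() < next_send:
            pass  # spin until the next send time
    sock.close()

# e.g. 1472-byte packets every 15 µs is ~785 Mbit/s of UDP payload:
# paced_udp_send("192.0.2.10", 5001)
```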

Page 11: Investigating Network Performance – A Case Study

UDP Throughput on the Production WAN
• Manchester – UvA (SARA): 750 Mbit/s over SuperJANET4 + GÉANT + SURFnet, 75% of the Manchester access link
• Manchester – UvA (SARA): 825 Mbit/s

[Plots: UDP Man-UvA Gig, 28 Apr 02 and 19 May 02. Received wire rate (Mbit/s, 0-1000) vs transmit time per frame (µs, 0-40), for packet sizes from 50 to 1472 bytes.]
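The shape of these curves is easy to reproduce: below line rate, the received wire rate is just the frame size (payload plus per-frame overhead) divided by the spacing. A quick sketch, using the standard Ethernet/IP/UDP overhead figures (everything else is illustrative):

```python
# Per-frame wire overhead for UDP over Gigabit Ethernet:
# 8 (UDP) + 20 (IP) + 18 (Ethernet header + FCS) + 20 (preamble + inter-frame gap)
OVERHEAD_BYTES = 8 + 20 + 18 + 20
LINE_RATE_MBPS = 1000.0

def wire_rate_mbps(payload_bytes: int, spacing_us: float) -> float:
    """Ideal received wire rate if one frame is sent every spacing_us µs.
    Bits per microsecond is numerically equal to Mbit/s."""
    rate = (payload_bytes + OVERHEAD_BYTES) * 8 / spacing_us
    return min(rate, LINE_RATE_MBPS)  # can never exceed the line rate

for size in (50, 400, 1472):
    print(size, [round(wire_rate_mbps(size, s)) for s in (5, 10, 20, 40)])
```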

Page 12: Investigating Network Performance – A Case Study
Page 13: Investigating Network Performance – A Case Study

How do we test the network?
• Simple connectivity test from telescope site to correlator (at JIVE, Dwingeloo, The Netherlands, or MIT Haystack Observatory, Massachusetts): traceroute, bwctl
• Performance of link and end hosts: UDPmon, iperf (see the sketch below)
• Sustained data tests: vlbiUDP (under development)
• True eVLBI data from a Mk5 recorder: pre-recorded (Disk2Net) or real time (Out2Net)

Mk5s are 1.2 GHz P3s with StreamStor cards and 8-pack exchangeable disks, 1.3 Tbytes storage. Capable of 1 Gbps continuous recording and playback. Made by Conduant, Haystack design.

Page 14: Investigating Network Performance – A Case Study

Telescope connections

[Map: telescope connections to the JIVE correlator. Jodrell Bank (UK, 2×1G, via MERLIN with Cambridge), Onsala (Sweden), Westerbork (Netherlands), Medicina (Italy), Torun (Poland), Effelsberg (Germany); links of 1 Gb/s and 155 Mb/s, one 1 Gb/s light path lit now, e-MERLIN ??end 06???]

Page 15: Investigating Network Performance – A Case Study

eVLBI Milestones
• January 2004: disk-buffered eVLBI session
  – Three telescopes at 128 Mb/s for the first eVLBI image
  – On–Wb fringes at 256 Mb/s
• April 2004: three-telescope, real-time eVLBI session
  – Fringes at 64 Mb/s
  – First real-time EVN image at 32 Mb/s
• September 2004: four-telescope real-time eVLBI
  – Fringes to Torun and Arecibo
  – First EVN eVLBI science session
• January 2005: first "dedicated light-path" eVLBI
  – ??Gbyte of data from the Huygens descent transferred from Australia to JIVE
  – Data rate ~450 Mb/s

Page 16: Investigating Network Performance – A Case Study

• 20 December 2004
  – Connection of JBO to Manchester by 2 × 1 GE
  – eVLBI tests between Poland, Sweden, the UK and the Netherlands at 256 Mb/s
• February 2005
  – TCP and UDP memory-to-memory tests at rates up to 450 Mb/s (TCP) and 650 Mb/s (UDP)
  – Tests showed inconsistencies between Red Hat kernels; rates of only 128 Mb/s obtained on 10 Feb
  – Haystack (US) – Onsala (Sweden) runs at 256 Mb/s
• 11 March 2005: science demo
  – JBO telescope wind-stowed; a short run on a calibrator source was done

Page 17: Investigating Network Performance – A Case Study
Page 18: Investigating Network Performance – A Case Study

Summary of EVN eVLBI tests

• Regular tests with eVLBI Mk5 data every ~6 weeks
  – 128 Mbps OK; 256 Mbps often; 512 Mbps Onsala – JIVE occasionally
  – but not JBO at 512 Mbps – WHY NOT? (NB using jumbo packets, 4470 or 9000 bytes)
• Note the correlator can cope with large error rates, up to ~1%
  – but high throughput is needed for sensitivity
  – implications for protocols, since throughput on TCP is very sensitive to packet loss (see the relation below)
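That sensitivity can be quantified with the well-known Mathis et al. approximation for steady-state TCP throughput (added here for reference; it is not on the slide):

$$\text{rate} \lesssim \frac{MSS}{RTT}\cdot\frac{C}{\sqrt{p}}, \qquad C \approx 1.22$$

For example, with MSS = 1460 bytes, RTT = 15 ms (the JBO – JIVE round trip quoted later) and a loss rate p of only 0.01%, this caps a single TCP stream near 95 Mbit/s, an order of magnitude below the Gigabit line rate.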

Page 19: Investigating Network Performance – A Case Study

UDP Throughput Oct-Nov 2003, Manchester – Dwingeloo, production network
• Throughput vs packet spacing, measured with UDPmon. Manchester: 2.0 GHz Xeon; Dwingeloo: 1.2 GHz PIII
• Near wire rate: 950 Mbps
• Also measured: packet loss, CPU kernel load on the sender, CPU kernel load on the receiver
• 4th-year project: Adam Mathews, Steve O'Toole

[Plots: Gnt5-DwMk5 11 Nov 03 / DwMk5-Gnt5 13 Nov 03, 1472-byte packets. Received wire rate (Mbit/s) and % packet loss vs spacing between frames (µs); % kernel CPU at sender and receiver vs spacing.]

Page 20: Investigating Network Performance – A Case Study

ESLEA

• Packet loss will cause low throughput in TCP/IP
• Congestion will result in routers dropping packets: use switched light paths!
• Tests with the MB-NG network, Jan-Jun 05
• JBO connected to JIVE via UKLight in June (thanks to John Graham, UKERNA)
• Comparison tests between the UKLight connection JBO – JIVE and the production network (SJ4 – GÉANT)

Page 21: Investigating Network Performance – A Case Study

Project Partners

Project Collaborators

The Council for the Central Laboratory of the Research Councils

Funded by

EPSRC GR/T04465/01

www.eslea.uklight.ac.uk

£1.1 M, 11.5 FTE

Page 22: Investigating Network Performance – A Case Study

UKLight Switched light path

Page 23: Investigating Network Performance – A Case Study

Tests on the UKLight switched light path, Manchester – Dwingeloo

• Throughput as a function of inter-packet spacing (2.4 GHz dual Xeon machines)
• Packet loss seen only for small packet sizes
• Maximum-size packets can reach full line rate with no loss, and there was no re-ordering (plot not shown)

[Plots: gig03-jiveg1_UKL_25Jun05. Received wire rate (Mbit/s, 0-1000) and % packet loss (log scale, 0.0001-100) vs spacing between frames (µs, 0-40), for packet sizes from 50 to 1472 bytes.]

Page 24: Investigating Network Performance – A Case Study

Tests on the production network, Manchester – Dwingeloo

• Throughput
• Small (0.2%) packet loss was seen
• Re-ordering of packets was significant

[Plot: gig6-jivegig1_31May05. % packet loss (log scale, 0.0001-100) vs spacing between frames (µs, 0-40), for packet sizes from 50 to 1472 bytes.]

Page 25: Investigating Network Performance – A Case Study

UKLight using Mk5 recording terminals

Page 26: Investigating Network Performance – A Case Study

e-VLBI at the GÉANT2 Launch, Jun 2005

[Map: Jodrell Bank (UK), Medicina (Italy) and Torun (Poland) connected to Dwingeloo (DWDM link).]

Page 27: Investigating Network Performance – A Case Study

UDP Performance: 3 Flows on GÉANT
• Throughput: 5-hour run, 1500-byte MTU
• Jodrell → JIVE: 2.0 GHz dual Xeon → 2.4 GHz dual Xeon, 670-840 Mbit/s
• Medicina (Bologna) → JIVE: 800 MHz PIII → Mk5 (623) 1.2 GHz PIII, 330 Mbit/s, limited by the sending PC
• Torun → JIVE: 2.4 GHz dual Xeon → Mk5 (575) 1.2 GHz PIII, 245-325 Mbit/s, limited by security policing (>600 Mbit/s → 20 Mbit/s)?
• Throughput over a 50-min stretch shows periodic variation; the period is ~17 min (see the sketch after the plots)

[Plots: BW 14Jun05. Received wire rate (Mbit/s, 0-1000) vs time (10 s steps) for the Jodrell, Medicina and Torun flows; full run and a zoom on steps 200-500.]
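One way to extract that ~17 min period from the 10 s-sampled wire-rate series is a quick spectral estimate. A sketch (the data array, e.g. `torun_rates`, is hypothetical):

```python
import numpy as np

def dominant_period_s(rates_mbps: np.ndarray, dt_s: float = 10.0) -> float:
    """Period (s) of the strongest non-DC component in a regularly
    sampled throughput series, e.g. UDPmon wire rates every 10 s."""
    x = rates_mbps - rates_mbps.mean()        # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=dt_s)
    k = 1 + int(np.argmax(spectrum[1:]))      # skip the zero-frequency bin
    return 1.0 / freqs[k]

# For the flows above one would expect ~1000 s, i.e. ~17 min:
# print(dominant_period_s(torun_rates) / 60, "min")
```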

Page 28: Investigating Network Performance – A Case Study

18-Hour Flows on UKLight, Jodrell – JIVE, 26 June 2005

• Throughput, Jodrell → JIVE: 2.4 GHz dual Xeon → 2.4 GHz dual Xeon, 960-980 Mbit/s
• Traffic through SURFnet
• Packet loss: only 3 groups with 10-150 lost packets each; no packets lost the rest of the time
• Packet re-ordering: none

[Plots: man03-jivegig1_26Jun05. Received wire rate (Mbit/s) vs time (10 s steps) over the full run; zoom showing 960-980 Mbit/s; packet loss per 10 s step (log scale) vs time.]

Page 29: Investigating Network Performance – A Case Study

Recent Results 1:

• iGrid 2005 and SC 2005: global eVLBI demonstration
  – Achieved 1.5 Gbps across the Atlantic using UKLight
  – 3 × VC-3-13c (~700 Mbps) SDH links carrying data across the Atlantic from the Onsala, JBO and Westerbork telescopes
  – 512 Mbps K4 – Mk5 data from Japan to the USA
  – 512 Mbps Mk5 real-time interferometry between the Onsala, Westford and Maryland Point antennas, correlated at Haystack Observatory
  – Used VLSR technology from the DRAGON project in the US to set up the light paths

Page 30: Investigating Network Performance – A Case Study

[Photos: JBO Mk2 and the Westerbork array; the Onsala 20-m; the Kashima 34-m.]

Page 31: Investigating Network Performance – A Case Study

Recent results 2:
• Why can Onsala achieve 512 Mbps from Mk5 to Mk5, even transatlantic?
  – Identical Mk5 to JBO's – longer link
• iperf TCP, JBO Mk5 to Manchester: rtt ~1 ms, 4420-byte packets, 960 Mbps
• iperf TCP, JBO Mk5 to JIVE: rtt ~15 ms, 4420-byte packets, 777 Mbps

Not much wrong with the networks!
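A back-of-envelope check on those iperf figures: the bandwidth-delay product gives the TCP window each path needs to sustain those rates (a sketch; the window sizes are inferred, not measured):

```python
# TCP can only fill a path if its window >= bandwidth * RTT.
def window_needed_bytes(rate_mbps: float, rtt_ms: float) -> float:
    return rate_mbps * 1e6 / 8 * rtt_ms / 1e3

print(window_needed_bytes(960, 1))    # JBO -> Manchester: ~120 kB
print(window_needed_bytes(777, 15))   # JBO -> JIVE: ~1.46 MB of buffering
```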

• CPU load during the transfers (plots below):
  – one case shows 94.7% kernel usage and 1.5% idle
  – the other shows 96.3% kernel usage and 0.06% idle – no CPU left!
• The likelihood is that the Onsala Mk5 has a marginally faster CPU – right at the critical point for 512 Mbps transmission
• Solution: better motherboards for the Mk5s – about 40 machines to upgrade!

[Plots: mk5-606-jive_9Dec05, % CPU in each mode (kernel, user, nice, idle) per trial; mk5-606-g7_10Dec05, throughput (Mbit/s) vs nice value (large value = low priority), compared with the no-CPU-load case.]

Page 32: Investigating Network Performance – A Case Study

The Future:
• Regular eVLBI tests in the EVN continue
• Testing the Mk5 StreamStor interface ↔ network interaction
• Test upgraded Mk5 recording devices
• Investigate alternatives to TCP/UDP: DCCP, vlbiUDP, tsunami, etc.
• ESLEA comparing UKLight with the production network
• The EU's EXPReS eVLBI project starts March 2006
  – Connection of the 100-m Effelsberg telescope in 2006
  – Protocols for distributed processing
  – Onsala – JBO correlator test link at 4 Gbps in 2007
• eVLBI will become routine in 2006!

Page 33: Investigating Network Performance – A Case Study

VLBI Correlation: a GRID Computation Task

[Diagram: a controller/data concentrator feeds an array of processing nodes.]

Page 34: Investigating Network Performance – A Case Study

Questions?