SCARIe FABRIC
A pilot study of distributed correlation

Huib Jan van Langevelde
Ruud Oerlemans
Nico Kruithof
Sergei Pogrebenko
and many others…

GiGaPort meeting SURF Utrecht 2 Nov 2006

What correlators do…

• Synthesis imaging simulates a very large telescope
  • by measuring Fourier components of sky brightness
  • on each baseline pair
• Sensitivity is proportional to √bandwidth
  • optimal use of available recording bandwidth
  • by sampling 2 bits (4 level) at Nyquist rate
• Correlator calculates ½N(N-1) baseline outputs
  • after compensating for the geometry of array
• Integrates output signal to something relatively slow
  • and samples with delay/frequency resolution
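The baseline count and the delay compensation described on this slide can be illustrated with a toy example. This is a minimal sketch, not the correlator's actual algorithm: it assumes whole-sample delays, Gaussian noise as the sky signal, and illustrative 4-level values; all function names are hypothetical.

```python
import itertools
import random

def baselines(n_stations):
    """Return all station pairs (i, j) with i < j; there are N(N-1)/2 of them."""
    return list(itertools.combinations(range(n_stations), 2))

def quantize_2bit(x, threshold=1.0):
    """Map a sample to one of 4 levels. The level values are an illustrative choice."""
    if x < -threshold:
        return -3
    if x < 0.0:
        return -1
    if x < threshold:
        return 1
    return 3

def correlate(a, b, delay):
    """Average the product a[t] * b[t + delay] over the overlapping samples."""
    n = min(len(a), len(b) - delay)
    return sum(a[t] * b[t + delay] for t in range(n)) / n

random.seed(1)
n_samples, true_delay = 4000, 5
sky = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
# Each station sees the same sky signal plus its own receiver noise;
# station 1 receives the wavefront true_delay samples later.
station0 = [quantize_2bit(sky[t] + random.gauss(0.0, 1.0)) for t in range(n_samples)]
station1 = [quantize_2bit((sky[t - true_delay] if t >= true_delay else 0.0)
                          + random.gauss(0.0, 1.0)) for t in range(n_samples)]

# Integrate the product at a range of trial delays and pick the strongest one.
lags = {d: correlate(station0, station1, d) for d in range(10)}
best_lag = max(lags, key=lags.get)
print(len(baselines(16)), "baselines for 16 stations")
print("correlation peaks at lag", best_lag)
```

The averaged product is only significant at the lag that matches the geometric delay, which is why the geometry has to be compensated before integrating.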


EVN MkIV data processor at JIVE

• Implements this in custom silicon
• 16 stations input from tapes
  • now hard-disks and fibres
• Input data is 1 Gb/s max
  • 1 or 2 bit sampled
  • up to 16 sub-bands
  • format includes time codes
• “Super computer” 1024 chips
  • 256 complex correlations each
  • at 32 MHz clock
• Around 100 T-operations/sec
  • 2 bit only!
  • Depends a bit how you do it

Should next correlator also use special hardware?
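The order of magnitude of the MkIV figure can be checked from the chip count, the correlations per chip and the clock rate quoted on this slide. The sketch below is only a rough accounting: it assumes every correlation is updated once per clock cycle, and the conversion of one complex multiply-accumulate into 8 real operations is an assumed convention, which is one reason the total "depends a bit how you do it".

```python
def mac_rate(chips=1024, correlations_per_chip=256, clock_hz=32e6):
    """Complex multiply-accumulates per second, assuming one update per clock cycle."""
    return chips * correlations_per_chip * clock_hz

# Assumption: a complex multiply-accumulate costs 4 real multiplies + 4 real additions.
REAL_OPS_PER_COMPLEX_MAC = 8

macs = mac_rate()
real_ops = macs * REAL_OPS_PER_COMPLEX_MAC
print(f"{macs / 1e12:.2f} T complex multiply-accumulates per second")
print(f"{real_ops / 1e12:.1f} T real operations per second")
```

Under these assumptions the result is several tens of T-operations per second, the same order as the figure on the slide.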


Next generation…

• Can be implemented on standard computing?
  • Time critical, keep up with input
  • example: LOFAR on BlueGene
• Higher precision and new applications
  • Better sensitivity, interference mitigation, spacecraft navigation
• Can CPU cycles be found on the Grid?
  • From 16 antenna @ 1Gb/s (eVLBI)
  • And growing…
  • To 1000s at 100 Gb/s (SKA)
  • Pilot projects FABRIC & SCARIe
  • Connectivity, workflow
  • Real-time resource allocation

[Figure: LOFAR central processor. BG/L racks connected through GbE and 10 GbE switches to clusters of servers (4 GB RAM per node; 10 TB RAID per node; general purpose nodes), each with Infiniband interconnect.]

FABRIC eVLBI

SKA inner core (5km)


Tflops, Pflops…

• 2 bit operations ⇒ floating point
• Results in enormous computing tasks
• Very few operations / bit
• Some could be associated with telescope

Typical VLBI problems

description              N telescopes  N subbands  data-rate [Mb/s]  N spect/prod  Tflops
1 Gb/s full array        16            16          1024              16            83.89
typical eVLBI continuum  8             8           128               16            2.62
typical spectral line    10            2           16                512           16.38
FABRIC demo              4             2           16                32            0.16
future VLBI              32            32          4096              256           21474.84

Rough estimate based on XF correlation

SKA not even in here…
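The Tflops column can be reproduced from the other columns. The scaling below is inferred from the listed numbers and is not stated in the source: it assumes the cost grows as N² times the data rate times the number of spectral points per product, with a fitted constant of 2e-5 Tflops per unit. The function name is hypothetical, and the sub-band count does not enter the fit.

```python
def tflops_estimate(n_telescopes, data_rate_mbps, n_spect):
    """Fitted scaling: 2e-5 * N^2 * data rate [Mb/s] * spectral points per product."""
    return 2e-5 * n_telescopes ** 2 * data_rate_mbps * n_spect

# (description, N telescopes, N subbands, data rate [Mb/s], N spect/prod, Tflops on slide)
TABLE = [
    ("1 Gb/s full array", 16, 16, 1024, 16, 83.89),
    ("typical eVLBI continuum", 8, 8, 128, 16, 2.62),
    ("typical spectral line", 10, 2, 16, 512, 16.38),
    ("FABRIC demo", 4, 2, 16, 32, 0.16),
    ("future VLBI", 32, 32, 4096, 256, 21474.84),
]

for name, n_tel, _n_sub, rate, n_spect, listed in TABLE:
    estimate = tflops_estimate(n_tel, rate, n_spect)
    print(f"{name:24s} fitted {estimate:10.2f}   slide {listed:10.2f}")
```

The N² dependence is what makes the "future VLBI" row so much more expensive than the current full array: doubling the telescopes alone quadruples the work.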


SCARIe FABRIC

• EC funded project EXPReS (03/2006)
  • To turn eVLBI into an operational system
  • Plus: Joint Research Activity: FABRIC
    • Future Arrays of Broadband Radio-telescopes on Internet Computing
    • One work-package on 4Gb/s data acquisition and transport (Jodrell Bank, Metsahovi, Onsala, Bonn, ASTRON)
    • One work-package on distributed correlation (JIVE, PSNC Poznan)
• Dutch NWO funded project SCARIe (10/2006)
  • Software Correlator Architecture Research and Implementation for eVLBI
  • Collaboration with SARA and UvA
  • Use Dutch Grid with configurable high connectivity: StarPlane
  • Software correlation with data originating from JIVE
• Complementary projects with matching funding
  • International and national expertise from other partners
  • Total of 9 man year at JIVE, plus some matching from staff
  • plus similar amount at partners


Aim of the project

• Research the possibility of distributed correlation
  • Using the Grid for getting the CPU cycles
  • Can it be employed for the next generation VLBI correlation?
• Exercise the advantages of software correlation
  • Using floating point accuracy and special filtering
• Explore (push) the boundaries of the Grid paradigm
  • “Real time” applications, data transfer limitations
• To lead to a modest size demo
  • With some possible real applications:
    • Monitoring EVN network performance
    • Continuously available eVLBI network with few telescopes
      • Monitoring transient sources
      • Astrometry, possibly of spectral line sources
    • Special correlator modes: spacecraft navigation, pulsar gating
    • Test bed for broadband eVLBI research

Something to try on the roadmap for the next generation correlator, even if you do not believe it is the solution…


Previous experience on Software correlation

• Builds on previous experience at JIVE
  • Regular and automated network performance tests
    • Using Japanese software correlator from NICT
  • Huygens extreme narrow band correlation
    • Home grown superFX with sub-Hz resolution


Work packages

• Grid resource allocation
  • Grid workflow management
    • Tool to allocate correlator resources and schedule correlation
    • Data flow from telescopes to appropriate correlator resources
  • Expertise from the Poznan group in Virtual Laboratories
  • Will this application fit on Grid?
    • As it is very data intensive
    • And time-critical if not real-time
• Software correlation
  • Correlator algorithm design
    • High precision correlation on standard computing
    • Scalable to cluster computers
    • Portable for grid computers and interfaced to standard middleware
  • Interactive visualization and output definition
    • Collect & merge data in EVN archive
    • Standard format and proprietary rights


Basic idea

• Use the Grid for correlation
  • CPU cycles on compute nodes
  • The Net could be crossbar switch?
• Correlation will be asynchronous
  • Based on floating point arithmetic
  • Portable code, standard environment


Workflow Management

• Must interact with normal VLBI schedules
• Divide data, route to compute nodes, setup correlation
• Dynamic resource allocation, keep up with incoming data!

Effort from Poznan, based on their Virtual Lab.


Topology

• Slice in time
  • Every node gets an interval
    • A “new correlator” for every time slice
    • Employ cluster computers at nodes
  • Minimizes total data transport
    • Bottleneck at compute node
    • Probably good connectivity at Grid nodes anyway
  • Scales perfectly
    • Easily estimated how many nodes are needed
    • Works with heterogeneous nodes
  • But leaves sorting to compute nodes
    • Memory access may limit effectiveness

• Slice in baseline
  • Assign a (or a range of) products to a certain node
    • E.g. two data streams meet in some place
  • Transport bottleneck at sources (telescopes)
    • Maybe curable with multicast transport mechanism which forks at network nodes
    • Some advantage when local nodes at telescopes
  • Does not scale very simply
    • Simple schemes for ½N² nodes
    • Need to re-sort output
  • But reduces the compute problem
    • Using the network as the cross-bar switch
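The two slicing schemes can be compared with a small bookkeeping sketch. It is only an illustration under simple assumptions: round-robin assignment, no multicast, and a node count for real-time operation estimated from a hypothetical measured cost (CPU seconds needed per second of observed data). None of the function names come from the project.

```python
import itertools
import math

def slice_in_time(n_slices, n_nodes):
    """Round-robin: time slice k goes to node k mod n_nodes; that node gets all stations."""
    return {k: k % n_nodes for k in range(n_slices)}

def nodes_needed(cpu_seconds_per_data_second):
    """Nodes required to keep up with the input when slicing in time."""
    return math.ceil(cpu_seconds_per_data_second)

def slice_in_baseline(n_stations, n_nodes):
    """Assign each baseline (station pair) to a node, round-robin."""
    pairs = itertools.combinations(range(n_stations), 2)
    return {pair: k % n_nodes for k, pair in enumerate(pairs)}

def copies_sent_per_station(assignment, n_stations):
    """Number of distinct nodes each station must send its full stream to (no multicast)."""
    targets = {s: set() for s in range(n_stations)}
    for (i, j), node in assignment.items():
        targets[i].add(node)
        targets[j].add(node)
    return {s: len(nodes) for s, nodes in targets.items()}

# Slicing in time: every sample leaves the telescope once, towards one node.
print(slice_in_time(n_slices=5, n_nodes=2))
print(nodes_needed(3.2), "nodes to keep up if one node needs 3.2 CPU s per data s")

# Slicing in baseline: 4 stations, one node per baseline (6 nodes).
assignment = slice_in_baseline(n_stations=4, n_nodes=6)
print(copies_sent_per_station(assignment, n_stations=4))
```

With one node per baseline, each station has to deliver its stream to N-1 nodes, which is the transport bottleneck at the telescopes noted above; slicing in time avoids it at the price of each node handling all stations.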



Broadband software correlation

[Figure: data flow of the software correlator, per station (Station 1, Station 2, … Station N)]

Raw data BW=16 MHz, Mk4 format on Mk5 disk
→ From Mk5 to linux disk
→ Raw data 16 MHz, Mk4 format on linux disk
→ Channel extraction
→ Extracted data
→ Delay corrections (pre-calculated delay tables)
→ Delay corrected data
→ Correlation, SFXC
→ Data product

EVN Mk4 equivalents: DIM, TRM, CRM; DCM, DMM, FR; SU; Correlator Chip
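The last two stages of this data flow, delay correction followed by correlation, can be sketched in a few lines. This is not the SFXC code: it assumes a single whole-sample delay per station taken from a pre-calculated table, skips channel extraction, fractional delays and fringe rotation, and uses a naive DFT so that it runs with the standard library only. All names are hypothetical.

```python
import cmath
import random

def dft(segment):
    """Naive discrete Fourier transform of one segment."""
    n = len(segment)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(segment))
            for k in range(n)]

def delay_correct(samples, delay):
    """Remove a whole-sample delay by dropping the first `delay` samples."""
    return samples[delay:]

def fx_correlate(a, b, n_fft):
    """F step: transform each segment. X step: accumulate A(f) * conj(B(f))."""
    n_segments = min(len(a), len(b)) // n_fft
    acc = [0j] * n_fft
    for s in range(n_segments):
        fa = dft(a[s * n_fft:(s + 1) * n_fft])
        fb = dft(b[s * n_fft:(s + 1) * n_fft])
        for k in range(n_fft):
            acc[k] += fa[k] * fb[k].conjugate()
    return [v / n_segments for v in acc]

random.seed(2)
sky = [random.gauss(0.0, 1.0) for _ in range(256)]
delay_table = {"station 1": 0, "station 2": 3}      # pre-calculated delays, in samples
station1 = sky
station2 = [0.0] * delay_table["station 2"] + sky[:-delay_table["station 2"]]

corrected1 = delay_correct(station1, delay_table["station 1"])
corrected2 = delay_correct(station2, delay_table["station 2"])
cross_spectrum = fx_correlate(corrected1, corrected2, n_fft=16)
print(max(abs(v.imag) for v in cross_spectrum))
```

After the delay correction the two streams line up, so the cross spectrum has no phase slope left; without it, the residual delay would show up as a phase that changes linearly with frequency.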


Better SNR than Mk4 hardware


Software correlation

• Working on benchmarking
  • Single core processors so far
  • Different CPUs available
• Already quite efficient
  • More work on memory performance
• Must deploy on cluster computers
• And then on Grid
• Organize the output to be used for astronomy

[Figure: SFX correlator, measuring CPU on single core, auto and cross correlations. CPU time (s), 0 to 4000, versus number of stations, 0 to 44. Curves: jop32, pcint, cedar.]

[Figure: SFX correlator, CPU contributions. CPU time (s), 0 to 4000, versus number of stations, 0 to 44. Curves: cedar, FFT only, I/O only, FFT Auto.]

NRI eSciences 2 Nov 2006

Side step: Data intensive processing

• Radio-astronomy can be extreme
• User data sets can be large
  • Few – 100 GB now
  • Larger: LOFAR, eVLBI, APERTIF, SKA
• All data enter imaging
  • Iterative calibration schemes
  • Few operations per Byte
• Parallel computing: not obviously suited for messaging systems
  • Task (data oriented) parallelization
  • Processing traditionally done interactively on user platform
  • More and more pipeline approaches
• Addressed in RadioNet
  • Project ALBUS
    • resulted in Python for AIPS
  • Looking for extension in FP7
  • Interoperability with ALMA, LOFAR
  • But for user domain


Goal of the project

• Develop: methods for high data rate e-VLBI using distributed correlation
• High data rate eVLBI data acquisition and transport
  • Develop a scalable prototype for broadband data acquisition
    • Prototype acquisition system
  • Establish a transportation protocol for broadband e-VLBI
    • Build into prototype, establish interface normal system
  • Interface e-VLBI public networks with LOFAR and e-MERLIN dedicated networks
    • Correlate wide band Onsala data on eMERLIN
    • Demonstrate LOFAR connectivity
• Distributed correlation
  • Setup data distribution over Grid
    • Workflow management tool
  • Develop a software correlator
    • Run a modest distributed eVLBI experiment


Current eVLBI practice

[Figure: block diagram of current eVLBI practice. Boxes:]
• observing schedule in VEX format
• user correlator parameters
• earth orientation parameters
• correlator control including model calculation
• field system controls antenna and acquisition
• BBC & samplers
• Mk4 formatter
• Mk5 recorder
• Mk5 playback
• Mk4 data in Mk5 prop form over TCP/IP
• output data


FABRIC components

[Figure: block diagram of the FABRIC components, labelled "FABRIC = The GRID". Boxes:]
• observing schedule in VEX format
• user correlator parameters
• earth orientation parameters
• GRID resources data
• resource allocation and routing
• correlator control including model calculation
• field system controls antenna and acquisition
• DBBC, VSI
• PC-EVN #2
• VSIe?? on??
• output data