Name | Designation | Affiliation | Date | Signature

Additional Authors

Submitted by:
J.D. Bunton | | CSIRO | 2011-03-26 |

Approved by:
W. Turner | Signal Processing Domain Specialist | SPDO | 2011-03-26 |

ASKAP STYLE SKA2 CORRELATOR CONCEPT DESCRIPTION

Document number: WP2-040.060.010-TD-001
Revision: 1
Author: J. D. Bunton
Date: 2010-03-29
Status: Approved for release

WP2‐040.060.010‐TD‐001 Revision : 1

2010‐03‐29 Page 2 of 23

DOCUMENT HISTORY

Revision | Date Of Issue | Engineering Change Number | Comments
1 | 29th March 2011 | - | First issue

DOCUMENT SOFTWARE

Package | Version | Filename
Wordprocessor | MsWord Word 2003 | 03c-wp2-040.060.010-td-001-1-ASKAPconcept-description-2003
Block diagrams | |
Other | |

ORGANISATION DETAILS

Name: SKA Program Development Office
Physical/Postal Address: Jodrell Bank Centre for Astrophysics, Alan Turing Building, The University of Manchester, Oxford Road, Manchester, UK, M13 9PL
Fax: +44 (0)161 275 4049
Website: www.skatelescope.org


TABLE OF CONTENTS

1 INTRODUCTION
1.1 Purpose of the document
2 REFERENCES
3 INTRODUCTION
3.1 Summary
3.2 Scope
4 SKA SPECIFICATION
5 TECHNICAL ASSUMPTIONS
6 WBSPF CORRELATOR
7 DISH WITH PAF
8 SIMULTANEOUS WBSPF AND PAF OPERATION
9 COMBINED PAF AND WBSPF CORRELATOR
10 NON-IMAGING PROCESSING
11 TIED ARRAY BEAMS
12 TRANSIENT AND PULSAR PROCESSING
13 INCOHERENT BEAM FORMING
14 TRANSIENT BUFFER AND TRANSIENT TRIGGER
15 APERTURE ARRAY CORRELATORS
16 CONCLUSION
17 CONCLUSION
18 ACKNOWLEDGEMENT
19 APPENDIX
19.1 Multiplication per Input Sample for a Polyphase Filter Bank
19.2 Correlator compute load


LIST OF FIGURES

Figure 1 Proposed filterbank operation for SKA
Figure 2 WBSPF Correlator
Figure 3 WBSPF and PAF correlator
Figure 4 Possible configuration of AA correlators
Figure 5 AA correlator with 150MHz combined AA lo and AA hi correlator or separated AA lo and AA hi correlators. When the combined correlator is operating then the AA hi bandwidth is reduced to 300MHz

LIST OF TABLES

Table 1 Summary of SKA modes and correlator requirements
Table 2 Correlator compute load for Dishes


LIST OF ABBREVIATIONS

AA: Aperture Array
CMAC: Complex Multiply Accumulate
FPGA: Field Programmable Gate Array
GPU: Graphics Processing Unit
GS: GigaSamples
MWA: Murchison Wide-field Array
PAF: Phased Array Feed
MAC: Multiply accumulate
SKA: Square Kilometre Array
SKAMP: SKA Molonglo Prototype
WBSPF: Wide Band Single Pixel Feed
Shelf: Also card cage, chassis or crate of boards
SPDO: SKA Program Development Office

Copyright and Disclaimer

© 2010 CSIRO. To the extent permitted by law, all rights are reserved and no part of this publication covered by copyright may be reproduced or copied in any form or by any means except with the written permission of CSIRO.

Important Disclaimer

CSIRO advises that the information contained in this publication comprises general statements based on scientific research. The reader is advised and needs to be aware that such information may be incomplete or unable to be used in any specific situation. No reliance or actions must therefore be made on that information without seeking prior expert professional, scientific and technical advice. To the extent permitted by law, CSIRO (including its employees and consultants) excludes all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this publication (in part or in whole) and any information or material contained in it.


1 Introduction

This document is a repackaging of SKA Memo 126. The aim is to bring the document within the framework and review process of the Signal Processing domain, while giving it a common look and feel with the other concept descriptions. The concept is based on the ASKAP architecture, applied to a correlator and central beamformer for the full SKA. It takes into account the candidate receptor technologies proposed for the full SKA, including WBSPF, PAF and Aperture Arrays.

1.1 Purpose of the document

The purpose of this document is to provide a concept description as part of a larger document set in support of the SKA Signal Processing CoDR. It provides a 'bottom up' perspective of correlation for the different receptor types proposed for the SKA. This document has been produced in accordance with the Systems Engineering Management Plan and the Signal Processing PrepSKA Work Breakdown document, and includes:

First draft block diagrams of the relevant subsystem

First draft estimates of cost

First draft estimates of power.

At present, details on reliability have not been included.

SKA Memo 125 and the DRM have been used as the baseline, and the best available information, for system parameters while the System Requirements Specification (SRS) is being created.


2 References

[1] Schilizzi, R.T. et al., "Preliminary Specifications for the Square Kilometre Array", 2007, http://www.skatelescope.org/PDF/Preliminary_SKA_Specifications.pdf

[2] Iguchi, S., Okumura, S.K., Okiura, M., Momose, M., Chikada, Y., "4-Gsps 2-bit FX Correlator with 262144-point FFT", URSI General Assembly 2002, Maastricht, 17-24 August 2002, paper 970, http://alma.mtk.nao.ac.jp/~iguchi/alma.files/p0970.pdf

[3] DeBoer, D.R. et al., "Australian SKA Pathfinder: A High-Dynamic Range Wide-Field of View Survey Telescope Array", Proceedings of the IEEE, Sept 2009

[4] Kooistra, E., "RadioNet FP7: UniBoard", CASPER Workshop 2009, Cape Town, 29 Sept 2009

[5] Bunton, J.D., "Multi-resolution FX Correlator", ALMA Memo 447, Feb 2003, http://science.nrao.edu/alma/aboutALMA/Technology/ALMA_Memo_Series/main_alma_memo_series.shtml

[6] Altera, "Stratix V Device Handbook", http://www.altera.com/literature/hb/stratix-v/stratix5_handbook.pdf

[7] Xilinx, "7 Series Product Brief", http://www.xilinx.com/publications/prod_mktg/7-Series-Product-Brief.pdf

[8] De Souza, L., Bunton, J.D., Campbell-Wilson, D., Cappallo, R., Kincaid, B., "A Radioastronomy Correlator Optimised for the Virtex-4 SX FPGA", IEEE 17th International Conference on Field Programmable Logic and Applications, Amsterdam, Netherlands, Aug 27-29, 2007

[9] Faulkner, A. et al., "The Aperture Arrays for the SKA: the SKADS White Paper", SKA Memo 122, April 2010, http://www.skatelescope.org/PDF/memos/122_Memo_Faulkner.pdf

[10] SKA Science Case
[11] The Square Kilometre Array Design Reference Mission: SKA-mid and SKA-Lo v 0.4
[12] Science Operations Plan
[13] System Interfaces
[14] Environmental requirements (natural and induced)
[15] SKA strategies and philosophies
[16] Risk Register
[17] Requirements Traceability
[18] Logistic Engineering Management Plan (LEMP)
[19] Risk Management Plan (RMP)
[20] Document Handling Procedure
[21] Project Dictionary
[22] Strategy to proceed to the next phase
[23] WP3 SKA array configuration report
[24] WP3 SKA site RFI environment report
[25] WP3 Troposphere measurement campaign report
[26] SKA Science-Technology Trade-off Process (WP2-005.010.030-MP-004)
[27] de Lera-Acedo, E. et al., "System Noise Analysis of an Ultra Wide Band Aperture Array", SKADS Memo T28
[28] SKA Monitoring and Control Strategy, WP2-005.065.000-R-001, Issue Draft E
[29] Dewdney, P.E., Hall, P.J., Schilizzi, R.T., Lazio, T.J.L.W., "The Square Kilometre Array", Proceedings of the IEEE, Vol. 97, No. 8, August 2009
[30] Thompson, A.R., Moran, J.M., Swenson, G.W., "Interferometry and Aperture Synthesis in Radio Astronomy" (second edition), Wiley, 1986
[31] System Engineering Management Plan (SEMP), WP2-005.010.030-MP-001
[32] SKA System Requirement Specification (SRS)
[33] SKA IP Policy Document
[34] International Technology Roadmap for Semiconductors (ITRS), available at www.itrs.net


3 Introduction

3.1 Summary

The routing of data places a major constraint on the building of correlators. This document considers a possible implementation for routing the data for an SKA consisting of 250 AA stations and 3600 WBSPF dish antennas, 2000 of which have a PAF that can be switched in. It also looks at some of the non-imaging processing that could occur in the correlator system.

Assumptions are made regarding the data transport technology and the number of inputs possible in a single shelf of processing boards. In addition some rather arbitrary assumptions are made regarding the bandwidth. The designs proposed are indicative only, but provide a starting point for the SKA. In all cases, FPGAs are used to determine the viability of the design but in the future, it might be ASICs, GPUs or CPUs that are the actual processing unit.

3.2 Scope

This design only covers processing that might occur at the site of the correlator, and then only at the very highest level. It does not include any processing that occurs at the antennas or antenna stations, such as beamforming. Nor does it cover processing that occurs after the correlator, such as imaging and calibration, or any details of non-imaging processing.

4 SKA SPECIFICATION

SKA specifications are still being refined. The ultimate specification will be a balance between science requirements, capital cost and operating cost. Here the specifications are allowed to diverge from those currently proposed in [1] and from specifications put forward by various subsystem proponents. For wide band single pixel feeds (WBSPF) on parabolic dish reflectors, the bandwidth is chosen to be 9GHz, as this fits into three 100Gb/s links per antenna. To make comparisons easier, the bandwidths of phased array feeds (PAFs) on parabolic dishes and of dense aperture arrays are both set to 0.6GHz.

There are 250 dense aperture array stations (AA hi). Each station operates in the frequency range 0.3 to 1.4GHz. At the same location are sparse aperture arrays (AA lo) that operate in the frequency range 0.05 to 0.45GHz. The AA lo stations process the full bandwidth of 0.4GHz. Each aperture array generates 1200 simultaneous beams. In the frequency range 0.3 to 0.45GHz the two types of aperture array operate simultaneously, and total sensitivity in this band is doubled if correlations between the two arrays are formed. The compute requirements for these correlators are calculated in Appendix [A.2] and shown in Table 1.


Table 1 Summary of SKA modes and correlator requirements

There are 3600 dishes, 900 of which are beyond 180km from the core. These 900 outer dishes are formed into stations, and only station beams are transported to the correlator. The actual details of the beamforming do not greatly impact the correlator. Here it is assumed that these stations produce at most 300 beams. Thus the total number of inputs into the correlator for dishes with WBSPF is 3000 dual polarisation signals (2700 individual dishes plus 300 station beams).

Of the remaining dishes, the 2000 within 20km of the core can carry phased array feeds (PAFs). Each PAF generates 36 beams, each with a bandwidth of 0.6GHz. When the core antennas are operating with PAFs, the 700 antennas in the range 20km to 180km are separately correlated. In addition, all WBSPF antennas beyond 20km are grouped into stations; these stations are beamformed and the corresponding beams correlated. A possible configuration is 133 stations with 4 beams per station. The compute loads for these correlators are also given in Table 1.
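Appendix A.2 is not reproduced in this section. As a rough cross-check, the compute loads quoted in the text can be reproduced with a simple count, which is assumed here rather than taken from A.2: N(N+1)/2 antenna pairs (cross-correlations plus autocorrelations), four polarisation products per pair, one CMAC per product per complex sample, one complex sample per second per hertz of bandwidth, multiplied by the number of independently correlated beams. The helper `correlator_load_cmac` is hypothetical.

```python
def correlator_load_cmac(n_inputs, bandwidth_hz, beams=1, pol_products=4):
    """Approximate correlator load in CMAC/s under a simple assumed model.

    n_inputs:     dual-polarisation antennas or station beams correlated together
    bandwidth_hz: processed bandwidth; complex samples per second equal this
    beams:        number of independent beams, each correlated separately
    """
    pairs = n_inputs * (n_inputs + 1) // 2  # cross-correlations plus autocorrelations
    return pairs * pol_products * bandwidth_hz * beams


# WBSPF correlator inputs: 2700 individual dishes plus at most 300 station beams
wbspf_inputs = (3600 - 900) + 300

loads = {
    "WBSPF": correlator_load_cmac(wbspf_inputs, 9e9),
    "PAF": correlator_load_cmac(2000, 0.6e9, beams=36),
    "180km mode": correlator_load_cmac(700, 9e9),
    "outer stations": correlator_load_cmac(133, 9e9, beams=4),
}
for mode, load in loads.items():
    print(f"{mode:15s} {load:.2e} CMAC/s")
```

Under this model the four modes come out at about 1.62x10^17, 1.73x10^17, 8.8x10^15 and 1.3x10^15 CMAC/s, within about 1-2% of the values quoted in the text for the WBSPF, PAF, 180km and station modes.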

5 TECHNICAL ASSUMPTIONS

To generate a design some constraints are needed. These take the form of the following assumptions. Some are based on the author's experience with the ASKAP and SKAMP/MWA correlators, and some are based on extrapolation to 2020. It is not known whether these assumptions will be valid in 2020; however, they form a starting point for developing a straw-man design.

Assumption 1

ADC resolution for the WBSPF systems is ~8bits. Correlator data precision is 4bits. The lower bandwidth systems on PAFs or on an AA may have more bits, however as it is assumed that beamforming will occur at the antenna, this does not impact the correlator or signal transport to it.

Assumption 2

The maximum length of an FFT or polyphase filterbank is ~1000 within an FPGA or ASIC. As the size N of an FFT increases, the computation per sample increases as log(N) but the memory is proportional to N, so eventually the memory dominates. When this starts to occur it is better to use memory external to the processing FPGA/CPU/GPU and process long FFTs in two stages. An example of this is the ~256 thousand point (262144-point) FFT built for the ALMA compact array correlator [2]. There, an ASIC implements a 512 point FFT, and two such ASICs with a corner turn memory in between implement the long FFT. Simulations on FPGAs show that a ~2000 point polyphase filterbank uses similar percentages of the block RAM and multiplier resources. Beyond this size the design is memory limited, leading to underutilisation of multiplier and logic resources.
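For an FFT, the two-stage approach of assumption 2 corresponds to the Cooley-Tukey "four-step" decomposition: for N = n1 x n2, compute n2 FFTs of length n1, multiply by twiddle factors, then compute n1 FFTs of length n2, with corner turns (transposes) between the stages. The NumPy sketch below shows only the arithmetic; the reshapes and axis changes stand in for the external corner-turn memory, and it is not a model of the ALMA ASIC design. The function name `two_stage_fft` is hypothetical.

```python
import numpy as np


def two_stage_fft(x, n1, n2):
    """N = n1*n2 point DFT built from two passes of short FFTs (four-step FFT)."""
    x = np.asarray(x, dtype=np.complex128)
    n = n1 * n2
    if x.shape != (n,):
        raise ValueError("input length must equal n1 * n2")
    # View the input as an n1 x n2 array: a[i1, i2] = x[n2*i1 + i2]
    a = x.reshape(n1, n2)
    # Stage 1: n2 independent FFTs of length n1 (down the columns)
    b = np.fft.fft(a, axis=0)
    # Twiddle factors exp(-2*pi*j*k1*i2/n) couple the two stages
    k1 = np.arange(n1)[:, None]
    i2 = np.arange(n2)[None, :]
    b *= np.exp(-2j * np.pi * k1 * i2 / n)
    # Stage 2: n1 independent FFTs of length n2 (along the rows)
    c = np.fft.fft(b, axis=1)
    # Output corner turn: X[k1 + n1*k2] = c[k1, k2]
    return c.T.reshape(-1)


rng = np.random.default_rng(0)
x = rng.standard_normal(512 * 512) + 1j * rng.standard_normal(512 * 512)
X = two_stage_fft(x, 512, 512)  # 262144-point FFT from 512-point FFTs
print(np.allclose(X, np.fft.fft(x)))
```

With n1 = n2 = 512 this gives the same 262144-point length as the ALMA example, and the result matches a direct FFT to floating-point precision.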

Assumption 3

A single shelf (crate, chassis or card cage) is limited to ~256 optical connections (16 boards with 16 inputs each). Both the ASKAP beamformer [3] and a full UniBoard system [4] are close to this limit. Digital optical modules are usually sold with both a transmitter and a receiver, so this gives ~256 optical inputs and 256 outputs. It is assumed that the input and output of an optical module can be used for independent data paths, for example an input from the antenna and an output to the correlator. This can halve the number of expensive optical modules. Data transport turns out to be a major correlator cost, so this optimisation is needed to reduce costs. However, using optical modules in this way may preclude the use of commercial switches.

Assumption 4

Data transport to the correlator will be on fibres. Standards already exist for 10x10G links to transport 100GE (CAUI, IEEE 802.3ba), and existing FPGAs can implement up to 6 such interfaces. Hence three current generation FPGAs (18 interfaces) are sufficient to interface to the 16 optical inputs proposed in assumption 3. In the future even more advanced interfaces, possibly quad 25Gb/s systems, will be in place.

By the time of SKA the cost of fibre transceivers will be lower and the sweet spot for fibre will be ~100Gb/s. Use of 100Gb/s fibre reduces the physical data routing problem by a factor of 10 compared to 10Gb/s. Use of 100Gb/s fibre also reduces the total number of equipment racks (see assumption (3) – 10 times the fibres means the minimum number of shelves is up to 10 times higher).

Within the correlator, cabling lengths are likely to exceed 10m, and even now fibre is competitive at these lengths. It is expected that all data routing between equipment cabinets will be on fibre.

6 WBSPF CORRELATOR

At a quantisation resolution of 8bits, a 100Gb/s fibre can transport 6GHz of bandwidth for a single polarisation beam (8bits x 12GS/s). Three fibres are needed to transport all data from a single antenna (2x9GHz). If there is no filtering at the antenna, the samples for one polarisation are split across two fibres and must be recombined at the filterbank. The data rate from the antenna is 36GS/s. This must first be processed by a filterbank at a cost of ~20 real MAC per sample [A.1], giving a total filterbank compute load of ~0.7TMAC/s per antenna. This can be achieved in a single present-day FPGA or GPU.
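The link budget and filterbank load above can be checked with a few lines of Python. The constant names are illustrative; the values are those in the text, assuming Nyquist sampling (two real samples per hertz) and taking the ~20 real MAC per sample from [A.1].

```python
import math

BANDWIDTH_HZ = 9e9      # WBSPF bandwidth per polarisation
POLS = 2
ADC_BITS = 8            # assumption 1
FIBRE_BPS = 100e9       # assumption 4
MAC_PER_SAMPLE = 20     # filterbank cost per real sample [A.1]

real_rate = 2 * BANDWIDTH_HZ * POLS          # Nyquist: 36 GS/s per antenna
link_bps = real_rate * ADC_BITS              # 288 Gb/s
fibres = math.ceil(link_bps / FIBRE_BPS)     # 3 fibres per antenna
fb_load = real_rate * MAC_PER_SAMPLE         # ~0.72 TMAC/s per antenna

print(f"{real_rate/1e9:.0f} GS/s, {link_bps/1e9:.0f} Gb/s, {fibres} fibres, "
      f"{fb_load/1e12:.2f} TMAC/s")
```

This prints `36 GS/s, 288 Gb/s, 3 fibres, 0.72 TMAC/s`, matching the figures in the paragraph.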

The SKA specifications call for 10^5 frequency channels. By assumption 2, this is achieved as a cascade of two filterbanks [5], with external memory to store the intermediate results; it is assumed that DRAM is used as the external memory. A possible filterbank system is shown in Figure 1. With it, frequency resolutions from 1MHz to 4.5kHz are possible in a 9GHz band. Internal to the processing device, a single 9MHz stream is processed at any one time by the fine filterbank. For higher frequency resolutions, a third filterbank can be cascaded on some of the fine filterbank channels.

Figure 1 Proposed filterbank operation for SKA

After data reordering, a fine filterbank processes a single coarse channel at a time. Also shown is the cross connection needed to aggregate data before transport to the correlator.
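The filterbanks in Figure 1 would typically be polyphase filterbanks. To illustrate what one stage does, the NumPy sketch below implements a critically sampled PFB: each output spectrum weights taps x m input samples with a prototype low-pass filter, folds them into m points and takes an m-point FFT. The function name, the windowed-sinc prototype, the 4 taps and the Hamming window are assumptions for illustration, not the design in [5] or Appendix A.1. A cascade as in Figure 1 would reorder the coarse outputs and feed each coarse channel's time series to a second filterbank of this kind.

```python
import numpy as np


def pfb_channelize(x, m, taps=4):
    """Critically sampled polyphase filterbank: m channels, taps*m-point prototype."""
    length = m * taps
    # Prototype low-pass filter: windowed sinc with a passband one channel wide
    n = np.arange(length)
    h = np.sinc((n - (length - 1) / 2) / m) * np.hamming(length)
    n_frames = (len(x) - length) // m + 1
    out = np.empty((n_frames, m), dtype=np.complex128)
    for f in range(n_frames):
        seg = x[f * m : f * m + length] * h          # weight taps*m input samples
        folded = seg.reshape(taps, m).sum(axis=0)    # polyphase fold: sum the taps
        out[f] = np.fft.fft(folded)                  # m-point FFT gives m channels
    return out


m = 16
t = np.arange(40 * m)
tone = np.exp(2j * np.pi * 5 * t / m)   # complex tone centred on channel 5
spectra = pfb_channelize(tone, m)
print(spectra.shape, np.argmax(np.abs(spectra), axis=1)[:3])
```

A tone centred on channel 5 appears in channel 5 of every output spectrum; each output frame advances by m input samples, so the filterbank is critically sampled.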

The data rate out of the coarse filterbanks for both polarisations is approximately the same as that from the antenna,

9GHz x 2pol x 8+8bits/sample = 288Gb/s.

FPGAs with DRAM interface data rates of ~2Gb/s per pin [7] have already been announced. At this rate ~150 pins (288Gb/s / 2Gb/s = 144) are needed to write the data to DRAM and a further 150 pins are needed to read it back; future devices with higher per-pin rates will need fewer. Five 72-pin DRAM modules are sufficient for this. Assuming 32Gbyte modules, the DRAMs provide a 4 second transient buffer. By the time of implementation, 32Gbyte modules are expected to be standard, and higher capacity modules should be available, allowing transient buffering of over 10 seconds. A possible design for the filterbank board has 4 FPGAs and 20 DRAM modules. Each FPGA processes dual polarisation data from a single antenna. The FPGA has 30 10G links to support the I/O (using the same bi-directional port for input from the antenna and output to the correlator). This requires 30 10G SERDES per FPGA; Altera [6] already have devices with 66 10G SERDES. The board processes the data from 4 antennas (12 fibres). A single 16-board shelf can process data for 64 antennas, so 47 shelves are needed for a full SKA. The 47 shelves contain 752 boards, so by assumption 3 it is not possible to transport the data from individual filterbank boards to a single correlator shelf. With 3000 dual polarisation signals and at most 256 fibres into a correlator shelf, each fibre must transport data for at least 12 antennas. Thus a cross connect is needed between filterbank boards.
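The DRAM and shelf sizing in this paragraph can be reproduced as follows. This sketch assumes decimal gigabytes (32x10^9 bytes per module) and ignores DRAM refresh and protocol overheads; all constant names are illustrative.

```python
import math

STREAM_BPS = 288e9            # coarse filterbank output per antenna, both polarisations
PIN_BPS = 2e9                 # DRAM interface rate per pin [7]
MODULE_BYTES = 32e9           # 32 Gbyte module (decimal gigabytes assumed)
MODULES_PER_ANTENNA = 5
PINS_PER_MODULE = 72

write_pins = math.ceil(STREAM_BPS / PIN_BPS)            # 144 pins to write
pins_needed = 2 * write_pins                            # the same again to read
pins_available = MODULES_PER_ANTENNA * PINS_PER_MODULE  # 360
buffer_s = MODULES_PER_ANTENNA * MODULE_BYTES * 8 / STREAM_BPS  # ~4.4 s

ANTENNAS = 3000               # dual-polarisation WBSPF inputs
ANTENNAS_PER_BOARD = 4
BOARDS_PER_SHELF = 16
MAX_FIBRES_PER_SHELF = 256    # assumption 3

fb_shelves = math.ceil(ANTENNAS / (ANTENNAS_PER_BOARD * BOARDS_PER_SHELF))  # 47
fb_boards = fb_shelves * BOARDS_PER_SHELF                                  # 752
antennas_per_fibre = math.ceil(ANTENNAS / MAX_FIBRES_PER_SHELF)            # 12

print(write_pins, pins_needed <= pins_available, round(buffer_s, 1),
      fb_shelves, fb_boards, antennas_per_fibre)
```

This prints `144 True 4.4 47 752 12`.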

There are 12 fibres, carrying 8 bit data, coming into a filterbank board. The output is 4 bit data, for which 6 fibres are sufficient, so a shelf with 16 filterbank boards has at least 96 output fibres. The cross connect between the boards allows each output fibre to carry part of the data for all filterbanks. For example, if there are 96,000 frequency channels and 96 fibres then, after the cross connect (Figure 1), each fibre carries data for 1000 frequency channels. At least one fibre is needed to connect each filterbank shelf to each correlator shelf, and each correlator shelf processes part of the total bandwidth. Assumption 3 limits the outputs from a filterbank shelf to fewer than ~256, so there are at most 256 correlator shelves. The minimum correlator shelf count occurs if there are multiple fibres from each filterbank shelf to a correlator shelf. For 47 filterbank shelves there can be up to 5 fibre connections from each to a correlator shelf, giving 235 inputs per correlator shelf. In this case there are 20 correlator shelves.

The choice of the number of shelves is arbitrary at this stage, but to make comparisons easier the compute capacity of a shelf is chosen to match the requirements of an aperture array correlator shelf, C_S = 7.5x10^14 CMAC/s [A.2]. For WBSPFs the compute load C_WBSPF is 1.63x10^17 CMAC/s [A.2]. Hence 216 correlator shelves are needed, as shown in Figure 2. In this design the data rate on each fibre to the correlator is less than 50Gb/s. A system with fewer correlator shelves would make better use of the 100Gb/s filterbank-correlator links.

Figure 2 WBSPF Correlator

Each fibre to the correlator carries data for 464 (16x29) frequency channels for all 64 antennas processed by a filterbank shelf, assuming ~10^5 frequency channels. To aggregate the data this requires a cross connect within the filterbank shelf. Another cross connect is needed in the correlator shelf to distribute the 464 frequency channels across the 16 correlator boards. Each correlator board processes data for 29 frequency channels.
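The per-fibre bandwidth and data rate implied by this channel split can be estimated as below, assuming critically sampled channels (one complex sample per second per hertz), 4+4 bit complex samples, both polarisations, and no framing overhead. The constant names are illustrative.

```python
CORR_SHELVES = 216
BOARDS_PER_SHELF = 16
CHANNELS_PER_BOARD = 29
BANDWIDTH_HZ = 9e9
ANTENNAS_PER_FB_SHELF = 64
BITS_PER_COMPLEX_SAMPLE = 4 + 4
POLS = 2

channels_per_fibre = BOARDS_PER_SHELF * CHANNELS_PER_BOARD      # 464
total_channels = CORR_SHELVES * channels_per_fibre              # 100,224, i.e. ~1e5
channel_hz = BANDWIDTH_HZ / total_channels                      # ~90 kHz
fibre_bw_hz = channels_per_fibre * channel_hz                   # ~41.7 MHz
fibre_bps = fibre_bw_hz * ANTENNAS_PER_FB_SHELF * POLS * BITS_PER_COMPLEX_SAMPLE

print(total_channels, f"{fibre_bw_hz/1e6:.1f} MHz", f"{fibre_bps/1e9:.1f} Gb/s")
```

This gives ~41.7MHz and ~42.7Gb/s per fibre, consistent with the "less than 50Gb/s" figure and with the 42MHz per fibre used later for tied array beams.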

The SKAMP/MWA 4bit correlation cell [8] implements a 4bit CMAC with a single 18‐bit multiplier and associated memory and logic. Currently proposed FPGAs [7] have ~4000 18‐bit multipliers that can implement up to,

4000 CMAC x 0.4GHz = 1.6x10^12 CMAC/s.

Assuming 8 FPGAs per board and 16 boards per shelf, the shelf has 128 FPGAs and can implement 2.048x10^14 CMAC/s. This is a factor of ~4 less than the requirement. An SKA correlator based on FPGAs should be possible with two more generations of FPGA, which are expected by 2015. If an ASIC is designed, a correlator is possible with current 40nm technology, and each ASIC would dissipate about the same as a high-end FPGA: ~50W. There are 216 shelves with 128 ASICs each, giving a total of ~28,000 ASICs, which would dissipate ~1.4MW. Data transport and distribution, and the filterbank, would take the total dissipation to close to 2MW. This shows the feasibility of an SKA correlator using 100G fibre links and today's processing technology. With future lower-power technology the correlator power dissipation will be much less than 1MW.
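The capacity, device count and power figures can be reproduced as follows. The number of FPGA generations assumes that capacity roughly doubles per generation, which the text implies but does not state; constant names are illustrative.

```python
import math

MULTIPLIERS_PER_FPGA = 4000   # 18-bit multipliers [7]
CLOCK_HZ = 0.4e9
FPGAS_PER_SHELF = 8 * 16      # 8 per board, 16 boards per shelf
C_SHELF = 7.5e14              # required CMAC/s per correlator shelf [A.2]
CORR_SHELVES = 216
WATTS_PER_DEVICE = 50         # ASIC assumed to dissipate like a high-end FPGA

fpga_cmac = MULTIPLIERS_PER_FPGA * CLOCK_HZ      # one 4-bit CMAC per multiplier per clock [8]
shelf_cmac = fpga_cmac * FPGAS_PER_SHELF         # 2.048e14 CMAC/s
shortfall = C_SHELF / shelf_cmac                 # ~3.7x
generations = math.ceil(math.log2(shortfall))    # assumes capacity doubles per generation
asics = CORR_SHELVES * FPGAS_PER_SHELF           # 27,648
asic_power_w = asics * WATTS_PER_DEVICE          # ~1.4 MW

print(f"{shelf_cmac:.3e} CMAC/s, shortfall {shortfall:.1f}x, "
      f"{generations} generations, {asics} ASICs, {asic_power_w/1e6:.2f} MW")
```

The shortfall is ~3.7x, which two doublings cover, and 27,648 ASICs at 50W give ~1.38MW.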

As a rule of thumb the cost of a fully populated shelf is $100-200k, so the cost of the 263 correlator and filterbank shelves (216 + 47) is estimated to be $26 to $52 million. This excludes any NRE costs for an ASIC and any system development costs. Added to this are the 9000 input 100G fibre links and the 47x216 = 10,152 short haul filterbank-to-correlator links. These may cost $200 to $1000 per link pair at the time of the SKA, which adds $4 to $20M to the cost. Moving the filterbanks to the antenna reduces the long haul fibre costs, as the correlator data has fewer bits per sample. However, the cross connect operation implemented in what were the filterbank shelves is still needed, and there is now an extra filterbank system to be built at the antennas. An investigation is needed to determine which approach is cheaper.
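The cost ranges follow directly from the shelf and link counts; a short check, with illustrative names and the unit costs taken from the text:

```python
FILTERBANK_SHELVES = 47
CORR_SHELVES = 216
SHELF_COST = (100e3, 200e3)      # fully populated shelf, low/high (USD)
LINK_COST = (200, 1000)          # per link pair, low/high (USD)

shelves = FILTERBANK_SHELVES + CORR_SHELVES              # 263
long_haul_links = 9000                                   # input 100G links (3000 inputs x 3)
short_links = FILTERBANK_SHELVES * CORR_SHELVES          # 10,152
shelf_range = [shelves * c for c in SHELF_COST]
link_range = [(long_haul_links + short_links) * c for c in LINK_COST]

print(f"shelves ${shelf_range[0]/1e6:.1f}-{shelf_range[1]/1e6:.1f}M, "
      f"links ${link_range[0]/1e6:.1f}-{link_range[1]/1e6:.1f}M")
```

This prints `shelves $26.3-52.6M, links $3.8-19.2M`, matching the rounded $26-52M and $4-20M above.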

7 DISH WITH PAF

For dishes with PAFs a beamformer is needed. Here it is assumed that the beamformer is at the antenna, as this reduces the data rate from the antenna. In the beamformer the data is first decomposed into coarse frequency channels of ~1MHz. This data is then beamformed using a weighted sum of the data in the ~1MHz channels. Up to 36 beams are formed, each with a bandwidth of 600MHz. The beamformer will also allow a trade-off between beams and bandwidth, for example 25 beams at a bandwidth of 864MHz, but this does not affect the correlator requirements.

To minimise data transport from the antennas, the beam data should be decimated to its final frequency resolution. This allows the number of bits to be reduced to the 4+4-bit data resolution of the correlator. Each fibre carries dual polarisation "fine" filterbank data for part of the bandwidth and some of the beams. This could be 9 dual polarisation beams at 600MHz per fibre, but the actual split is unimportant. There are 4 fibres per antenna to carry the data for 36 beams. As the WBSPF and PAF are not used simultaneously, the 3 WBSPF fibres can be reused to transport PAF data, and an extra fibre per antenna is needed to carry the rest. The compute load for the PAF correlations is 1.73x10^17 CMAC/s [A.2], which is 6% more than that for the WBSPF correlator.
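The fibre count per PAF antenna can be checked as follows, assuming critically sampled 4+4 bit complex samples for both polarisations and ignoring framing overhead; names are illustrative.

```python
import math

PAF_BEAMS = 36
BEAM_BW_HZ = 0.6e9
POLS = 2
BITS_PER_COMPLEX_SAMPLE = 4 + 4
FIBRE_BPS = 100e9
WBSPF_FIBRES = 3            # reused when the PAF is in use

beam_bps = BEAM_BW_HZ * POLS * BITS_PER_COMPLEX_SAMPLE   # 9.6 Gb/s per dual-pol beam
nine_beam_fibre_bps = 9 * beam_bps                       # 86.4 Gb/s, fits one fibre
fibres = math.ceil(PAF_BEAMS * beam_bps / FIBRE_BPS)     # 4 fibres per antenna
extra_fibres = fibres - WBSPF_FIBRES                     # 1 new fibre per antenna

print(f"{beam_bps/1e9:.1f} Gb/s per beam, {fibres} fibres, {extra_fibres} extra")
```

Nine beams per fibre use 86.4Gb/s, so four fibres carry all 36 beams and only one extra fibre per antenna is needed.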

8 Simultaneous WBSPF and PAF operation

This provides a permanent VLBI mode. The total point source sensitivity for this VLBI mode, using the 1600 antennas beyond 20km, is 4/9 of that for the full 3600 antennas. The observing speed is the square of this, which suggests that for high resolution astronomy the outer antennas operate at 0.2 of the observing speed of the full 3600 antennas. However, for high resolution astronomy the baselines shorter than 20km add little to the sensitivity; in effect, they provide a single almost noise-free datum. It is the correlations between the core and the outer antennas that provide the vast majority of the data. There are 1600^2/2 correlations between outer antennas and 1600x2000 correlations between core and outer antennas. Observing speed is proportional to the total number of correlations, so the reduction in observing speed is (16^2/2)/(16^2/2 + 16x20) = 0.28. Note that station beamforming does not change the sensitivity calculation, only the field of view.
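The two observing-speed estimates can be computed exactly with rational arithmetic. As in the text, core-to-core correlations are ignored in the second estimate because short baselines add little at high resolution.

```python
from fractions import Fraction

N_TOTAL, N_CORE = 3600, 2000
n_outer = N_TOTAL - N_CORE                       # 1600 antennas beyond 20km

sensitivity = Fraction(n_outer, N_TOTAL)         # 4/9 of point-source sensitivity
speed_naive = sensitivity ** 2                   # 16/81, about 0.2

outer_outer = Fraction(n_outer ** 2, 2)          # correlations among outer antennas
core_outer = n_outer * N_CORE                    # core-to-outer correlations
speed_long_baselines = outer_outer / (outer_outer + core_outer)   # 2/7, about 0.28

print(sensitivity, speed_naive, speed_long_baselines, float(speed_long_baselines))
```

The second ratio is exactly 2/7 (0.2857), the 0.28 quoted above.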

It is expected that all the outer antennas will be beamformed into stations in this mode of operation. The larger the number of antennas beamformed per station, the lower the compute load. For an upper limit, consider stations of 12 antennas, each station generating 4 beams. The compute load is 1.3x10^15 CMAC/s [A.2].

Another possible mode (180km mode) is correlation of the WBSPF data from the 700 antennas in the range 20 to 180km from the core. This has a sensitivity that is ~25% of the full SKA for the same resolution, but it still provides a useful adjunct while the PAFs are being used. The compute load for this mode is 8.8x10^15 CMAC/s [A.2].


Table 2 Correlator compute load for Dishes

A summary of the compute load for the correlator processing dish data is shown in Table 2. The added correlator load for always operating the dishes beyond 20km is small. Adding 12% to the capacity of the WBSPF correlator is sufficient for all the proposed dish PAF and WBSPF modes. This added capacity increases the number of correlator shelves to 242.
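A quick check of the 12% margin and the resulting shelf count, using the loads quoted in the text. Treating the 180km mode and the outer-station mode as alternatives, so that the worst case is the PAF core plus the larger of the two, is an assumption based on how section 8 presents them.

```python
import math

C_WBSPF = 1.63e17      # CMAC/s [A.2]
C_PAF = 1.73e17
C_180KM = 8.8e15
C_STATIONS = 1.3e15
BASE_SHELVES = 216

paf_excess = C_PAF / C_WBSPF - 1                            # ~6%
# Assumption: the 180km mode and the station mode are alternatives,
# so the worst case is the PAF core plus the larger of the two.
worst_case_margin = (C_PAF + max(C_180KM, C_STATIONS)) / C_WBSPF - 1   # ~11.5%
shelves = math.ceil(BASE_SHELVES * 1.12)                    # 12% extra capacity

print(f"{paf_excess:.1%}, {worst_case_margin:.1%}, {shelves} shelves")
```

This prints `6.1%, 11.5%, 242 shelves`.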

9 Combined PAF and WBSPF correlator

The existing WBSPF fibres now carry either WBSPF data or 3/4 of the data for the PAF. An extra 2000 fibres are needed for the rest of the PAF data. These fibres could carry data for 6 of the 36 beams at the final frequency and quantisation resolution needed by the correlator. Eleven filterbank shelves are needed to cross connect the data from the 2000 fibres, even though the compute capacity of these shelves is not needed. However, for these 11 shelves the data on each output fibre is close to 100Gb/s, as there is no data reduction in these filterbank shelves. There are now 59 filterbank shelves, and each shelf routes a single 100G fibre to each of the 242 correlator shelves. If this number of shelves is excessive then higher capacity correlator shelves are required. The number of possible modes complicates the operation of the correlator, but FPGA-, GPU- or CPU-based correlators are easy to reconfigure. Alternatively, a combination of these for control and data routing, with ASICs to form the correlations, is a possibility. The layout of the resulting correlator is shown in Figure 3. Also shown are the data paths and some of the non-imaging processing. These are discussed in the next section.


Figure 3 WBSPF and PAF correlator

10 NON-IMAGING PROCESSING

Non-imaging processing covers all processing that generates astronomy data without forming an image. It includes:

the generation of tied array beams,

searching for pulsars and other transient events in tied array beams,

incoherent summing of beam data as a simple way to detect strong transients in the field of view of an antenna beam,

storage of beam data in a transient buffer and the triggering of that buffer, and

retrieval and processing of the data in the transient buffer after a trigger

11 Tied Array Beams

The data from all stations is first aggregated in the correlator. This makes the correlator a natural location to form tied array beams. In earlier parts of the system only part of the antenna data is present, so only partial tied array beams can be generated, and these partial beams would then need to be summed separately. For example, in the WBSPF filterbanks data for up to 96 antennas is available, and beam data from the 47 filterbank shelves needs to be aggregated and summed before a complete tied array beam is available. This increases the tied array data transported by a factor of 47 compared with generating the beams in the correlator. The possible number of beams is also limited, because there are fewer filterbank shelves than correlator shelves and most of their output data capacity is used to connect to the correlator.
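A tied array beam is a phased sum over antennas, so partial beams formed on subsets of antennas can be added later to give the same result; the catch is that each partial beam has the full beam data rate, which is why forming them in the 47 filterbank shelves multiplies the transported beam data. The NumPy sketch below illustrates this with a noise-free source, random per-antenna phases and three hypothetical groups of 64 antennas; `tied_array_beam` is a hypothetical helper, not part of the proposed design.

```python
import numpy as np


def tied_array_beam(voltages, weights):
    """Phase and sum antenna voltages: one complex multiply per input sample per beam.

    voltages: (n_antennas, n_samples) complex channel data
    weights:  (n_antennas,) complex phasing weights for one beam
    """
    return weights @ voltages


rng = np.random.default_rng(1)
n_ant, n_samp = 192, 256
phases = rng.uniform(0, 2 * np.pi, n_ant)          # per-antenna phase toward the source
signal = rng.standard_normal(n_samp) + 1j * rng.standard_normal(n_samp)
voltages = np.exp(1j * phases)[:, None] * signal   # same source seen by every antenna
weights = np.exp(-1j * phases)                     # phase each antenna to the source

full = tied_array_beam(voltages, weights)

# Partial beams on three hypothetical groups of 64 antennas, summed afterwards
groups = np.split(np.arange(n_ant), 3)
partial = [tied_array_beam(voltages[g], weights[g]) for g in groups]
print(np.allclose(full, n_ant * signal), np.allclose(sum(partial), full))
```

Both checks print `True`: the phased sum adds coherently to n_ant times the source signal, and the sum of the three partial beams equals the full beam, at the cost of transporting three beam-rate streams instead of one.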


The compute cost of a tied array beam is one complex multiply per input sample per beam. In FPGAs this could possibly be a 4+4 bit complex multiply using a single 18-bit multiplier [8], which gives station phasing with a maximum error of 4 degrees. For the correlator, the compute cost is 2000 complex multiplies per input sample for PAFs and 3000 for WBSPFs (the number of correlations divided by the number of electrical inputs; for PAFs, 2000^2/2 baselines x 4 Stokes divided by 2000 antennas x 2 polarisations gives 2000). If the tied array beamforming compute load is limited to a maximum of 10% of the correlation compute load, then 200 tied array beams can be generated per PAF beam. This gives up to 36 x 200 = 7200 tied array beams for the dish with PAF system, a total of ~4,300GHz (7200 x 0.6GHz) of dual polarisation tied array beams. A 100Gb/s fibre can transport 3GHz of dual polarisation data, so ~1,440 fibres are needed. This corresponds to 6 fibres for each of the 242 correlator shelves. This number of output optical connections is within the capabilities of the proposed design; the limit is reached at over 15,000 tied array beams.

For the WBSPF there are ~3000 complex multiplies per input sample, so a 10% added load gives 300 dual polarisation beams. At 9GHz each, this is 2,700GHz of beam data, which requires 900 fibres for signal transport, or four fibres per correlator shelf (900/242 ≈ 3.7, rounded up).
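The beam and fibre budget in the two paragraphs above can be reproduced with a short calculation. This is a sketch under the text's assumptions (10% extra load, 3GHz of dual polarisation data per 100Gb/s fibre, 242 correlator shelves); the function name is hypothetical and integer MHz are used to avoid rounding artefacts. Note that the exact PAF products are 4,320GHz and 1,440 fibres, which still round up to 6 fibres per correlator shelf.

```python
def tied_array_budget(mults_per_sample, load_percent, antenna_beams,
                      beam_bw_mhz, fibre_bw_mhz, correlator_shelves):
    """Tied array beams and output fibres for a given beamforming load."""
    # One complex multiply per input sample per tied array beam, so the beam
    # count per antenna beam equals the allowed multiplies per input sample.
    beams_per_antenna_beam = mults_per_sample * load_percent // 100
    total_beams = beams_per_antenna_beam * antenna_beams
    total_bw_mhz = total_beams * beam_bw_mhz
    fibres = -(-total_bw_mhz // fibre_bw_mhz)            # ceiling division
    fibres_per_shelf = -(-fibres // correlator_shelves)  # ceiling division
    return total_beams, total_bw_mhz, fibres, fibres_per_shelf


# PAF: 2000 multiplies/sample, 36 beams of 600MHz; WBSPF: 3000, one 9GHz beam.
paf = tied_array_budget(2000, 10, 36, 600, 3000, 242)
wbspf = tied_array_budget(3000, 10, 1, 9000, 3000, 242)
print("PAF  :", paf)    # (7200, 4320000, 1440, 6)
print("WBSPF:", wbspf)  # (300, 2700000, 900, 4)
```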

Each fibre carries data for part of the correlator shelf's bandwidth and for a subset of the tied array beams. In the case of the WBSPF the bandwidth on each fibre is 42MHz. To generate a complete tied array beam, this data must be aggregated across correlator shelves. This is done in separate aggregation shelves, each of which takes as input a single fibre from each correlator shelf. In the PAF case, with six output fibres per correlator shelf, this requires six shelves with 242 fibres into each shelf. Inside these shelves the data is distributed across the processing boards so that each board receives data for the full bandwidth of the beams it is processing.
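The aggregation step is a corner turn: data arrives grouped by frequency slice (one correlator shelf per slice) and must be regrouped by beam. The sketch below assumes that correlator shelf s holds frequency slice s for every beam and that its output fibre k carries beam subset k; the names, the 12 beams per fibre and the 4 boards per shelf are hypothetical values chosen to keep the example small.

```python
from collections import defaultdict


def corner_turn(n_corr_shelves, fibres_per_shelf, beams_per_fibre, boards_per_agg_shelf):
    """Return the inputs of each aggregation shelf and the data held by each board.

    Assumed model: correlator shelf s holds frequency slice s for every tied
    array beam, and its output fibre k carries beam subset k for that slice.
    """
    assert beams_per_fibre % boards_per_agg_shelf == 0
    beams_per_board = beams_per_fibre // boards_per_agg_shelf
    agg_inputs = defaultdict(list)  # aggregation shelf k -> correlator shelves feeding it
    boards = defaultdict(lambda: defaultdict(set))  # (agg shelf, board) -> beam -> slices
    for shelf in range(n_corr_shelves):
        for k in range(fibres_per_shelf):
            agg_inputs[k].append(shelf)  # one fibre from every correlator shelf
            for b in range(beams_per_fibre):
                beam = k * beams_per_fibre + b
                boards[(k, b // beams_per_board)][beam].add(shelf)
    return agg_inputs, boards


agg_inputs, boards = corner_turn(242, 6, 12, 4)
print(len(agg_inputs), {len(v) for v in agg_inputs.values()})  # 6 {242}
full_band = all(len(slices) == 242 for beams in boards.values() for slices in beams.values())
print(full_band)  # True: every board sees the full bandwidth of its beams
```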

12 Transient and Pulsar processing

With ~130Tb/s of tied array beam data, it will be necessary to process the data in real time. The first operation that may be needed is resampling, because the correlator normally operates at the frequency resolution needed for imaging; in the case of the PAF this may be ~20kHz. For the detection of transients it may be necessary to bring the data back to the original sampling rate.
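For a ~20kHz channel each complex sample spans ~50µs, which may be too coarse for short transients. The sketch below shows the principle of returning to the original sampling rate, assuming a simple critically sampled DFT filterbank; a real oversampled polyphase filterbank would also need its prototype filter and oversampling taken into account. An inverse DFT across adjacent channels restores the original time samples. The function names are hypothetical.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def channelise(x, n_chan):
    """Critically sampled DFT filterbank: one spectrum per block of n_chan samples."""
    return [dft(x[i:i + n_chan]) for i in range(0, len(x) - n_chan + 1, n_chan)]


def resample_to_time(spectra):
    """Merge the channels back into a single high time resolution stream."""
    out = []
    for spectrum in spectra:
        out.extend(idft(spectrum))
    return out


n_chan = 8
signal = [0j] * 32
signal[13] = 1 + 0j  # a narrow "transient"
spectra = channelise(signal, n_chan)
restored = resample_to_time(spectra)
peak = max(range(len(restored)), key=lambda t: abs(restored[t]))
print(len(spectra), peak)  # 4 samples per channel; the peak is back at sample 13
```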

A major use of these beams is the detection of transients. Dispersion smears short-duration transients in time, so it is proposed that the data is first de-dispersed over a fairly coarse grid of dispersion measures (DMs) before being sent to transient detection. Pulsar timing is a simpler task, needing de-dispersion to a single DM. Pulsar searching needs Fourier transforms of very high time resolution data. It is proposed that this operation occur in a shelf separate from the one that aggregates and resamples the data.
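The coarse de-dispersion step can be illustrated with the standard incoherent (shift-and-sum) method. The text does not specify an algorithm, so this brute-force approach, the channel layout (16 channels of 25MHz below 1400MHz), the 1ms sampling and the DM grid are assumptions for illustration only; `K_DM` is the usual cold-plasma dispersion constant.

```python
K_DM = 4148.808  # dispersion constant in s MHz^2 pc^-1 cm^3


def channel_delays(dm, freqs_mhz, tsamp_s):
    """Delay of each channel relative to the highest frequency, in whole samples."""
    f_ref = max(freqs_mhz)
    return [round(K_DM * dm * (f ** -2 - f_ref ** -2) / tsamp_s) for f in freqs_mhz]


def dedisperse(power, freqs_mhz, tsamp_s, dm):
    """Incoherent shift-and-sum de-dispersion of a [channel][time] power array."""
    n_time = len(power[0])
    out = [0.0] * n_time
    for chan, d in zip(power, channel_delays(dm, freqs_mhz, tsamp_s)):
        for t in range(n_time - d):
            out[t] += chan[t + d]
    return out


def search_dm(power, freqs_mhz, tsamp_s, dm_trials):
    """Return (best DM, peak time sample, peak value) over a coarse DM grid."""
    best = None
    for dm in dm_trials:
        series = dedisperse(power, freqs_mhz, tsamp_s, dm)
        t_peak = max(range(len(series)), key=series.__getitem__)
        if best is None or series[t_peak] > best[2]:
            best = (dm, t_peak, series[t_peak])
    return best


freqs = [1400 - 25 * i for i in range(16)]
tsamp, n_time, t0, true_dm = 1e-3, 1000, 100, 50
power = [[0.0] * n_time for _ in freqs]
for chan, d in zip(power, channel_delays(true_dm, freqs, tsamp)):
    chan[t0 + d] = 1.0  # synthetic dispersed pulse
print(search_dm(power, freqs, tsamp, range(0, 101, 10)))  # (50, 100, 16.0)
```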

13 Incoherent beam forming

Summing beam power (rather than voltages) across antennas gives a field of view equal to that of an antenna beam, but at lower sensitivity than a tied array beam. It provides yet another way to detect transients, and its computing cost is quite low. Assuming a 1ms dump of 1MHz bandwidth data, each output power value replaces ~1000 voltage samples, so the beam power data is ~1000 times smaller than the voltage data. A single 100Gb/s fibre from each filterbank shelf can transport this data and still leave room to increase the dump rate or the frequency resolution. The data is collected in a single shelf and summed. For dishes with PAFs, strong transients can be detected over ~30 square degrees.
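A minimal sketch of incoherent beam forming for one channel, assuming complex (Nyquist) sampling so that a 1MHz channel produces 10^6 samples per second: detected power is summed over antennas and averaged over each dump, so the data reduction factor is simply bandwidth x dump time. The function name and the random test voltages are hypothetical.

```python
import random


def incoherent_beam(voltages, samples_per_dump):
    """Sum detected power over antennas, then average over each dump period.

    voltages: one list of complex samples per antenna, for a single channel.
    """
    n_samples = len(voltages[0])
    power = [sum(abs(ant[t]) ** 2 for ant in voltages) for t in range(n_samples)]
    return [sum(power[i:i + samples_per_dump]) / samples_per_dump
            for i in range(0, n_samples - samples_per_dump + 1, samples_per_dump)]


BANDWIDTH_HZ = 1_000_000  # 1MHz channel, assumed complex (Nyquist) sampled
DUMP_MS = 1               # 1ms dump
samples_per_dump = BANDWIDTH_HZ * DUMP_MS // 1000  # voltage samples per power value

random.seed(0)
n_antennas = 4
voltages = [[complex(random.gauss(0, 1), random.gauss(0, 1))
             for _ in range(3 * samples_per_dump)] for _ in range(n_antennas)]
dumps = incoherent_beam(voltages, samples_per_dump)
print(samples_per_dump, len(dumps))  # 1000 3
```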


14 Transient buffer and transient trigger

The two-stage filterbank method described here requires DRAM to store data between the coarse and fine filterbanks. At the time of the SKA, the cheapest DRAM module may hold 32Gbytes with a data rate of 16Gbytes/s. Half of this rate is used for input (8Gbytes/s), so each DRAM holds ~4 seconds of data (32/8). Higher capacity DRAMs would increase this to tens of seconds. Alternatively, reduced data precision and a separate transient buffer write could increase the buffer storage time by a factor of 2 to 8. The data paths for the transient buffer data are shown in Figure 3.
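The buffer duration follows directly from capacity divided by write rate. In this sketch the function name is hypothetical, and the effect of reduced precision is modelled simply as a proportional reduction in the write rate, which is an assumption rather than a description of the proposed hardware.

```python
def buffer_seconds(capacity_gbytes, dram_rate_gbytes_s, write_fraction=0.5,
                   precision_reduction=1):
    """Seconds of data held when a fraction of the DRAM bandwidth is used for writes.

    precision_reduction: assumed factor by which the stored data volume shrinks
    (e.g. fewer bits per sample in a separate transient buffer write).
    """
    write_rate = dram_rate_gbytes_s * write_fraction / precision_reduction
    return capacity_gbytes / write_rate


print(buffer_seconds(32, 16))  # 4.0 seconds
for factor in (2, 4, 8):
    print(factor, buffer_seconds(32, 16, precision_reduction=factor))  # 8, 16, 32 seconds
```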

The transient buffer is written continuously. Systems such as the tied array beams and incoherent beam processing, or external systems such as X-ray satellites, provide a trigger to freeze most of the buffer; part of the buffer is still needed for the operation of the filterbanks. In a multibeam system, the freeze would apply to one or a limited number of beams. The buffer is useful for transients with time variation faster than a second, and a couple of seconds of data covering the time of the transient is sufficient for most applications. Very highly dispersed events need a longer buffer (science input is needed to determine appropriate buffer sizes). In all cases the trigger must be applied before the data is lost from the buffer: for local detection of transients a 4 second buffer is sufficient, but transients detected by other means need a longer buffer. The possible paths by which the buffer might be triggered need further exploration by the SKA scientists. After a trigger, the buffer, or the appropriate parts of it, is transferred to a CPU cluster and processed in non-real time; this processing could, for example, be imaging of the data at high time resolution.
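The write-and-freeze behaviour can be sketched as a per-beam ring buffer. The class `TransientBuffer` and its methods are hypothetical, and the sketch ignores the part of the DRAM that must stay live for the filterbanks; it only shows that a frozen beam keeps its history while the other beams continue to be overwritten, and that a trigger arriving later than the buffer depth finds the data already gone.

```python
from collections import deque


class TransientBuffer:
    """Per-beam ring buffer that can be frozen by a trigger (hypothetical sketch)."""

    def __init__(self, n_beams, depth):
        self.rings = [deque(maxlen=depth) for _ in range(n_beams)]
        self.frozen = set()

    def write(self, samples):
        """Write one sample per beam; frozen beams keep their contents."""
        for beam, sample in enumerate(samples):
            if beam not in self.frozen:
                self.rings[beam].append(sample)

    def trigger(self, beams):
        """Freeze the selected beams and return copies for off-line processing."""
        self.frozen.update(beams)
        return {beam: list(self.rings[beam]) for beam in beams}

    def release(self, beams):
        """Resume continuous writing once the frozen data has been transferred."""
        self.frozen.difference_update(beams)


buf = TransientBuffer(n_beams=3, depth=4)
for t in range(10):
    buf.write([t, t, t])
snapshot = buf.trigger([1])  # e.g. a detection in tied array beam 1
for t in range(10, 13):
    buf.write([t, t, t])
print(snapshot[1], list(buf.rings[1]), list(buf.rings[0]))
```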

15 APERTURE ARRAY CORRELATORS

This description closely follows the design already proposed for the AA [9] but uses 100G fibre links. It also addresses the proposal to correlate dense and sparse aperture arrays in the frequency range where they overlap.

Dense and sparse aperture arrays have the same number of stations, so the correlator for both can be built from a common correlator shelf. Beamforming takes place at the station, so there is a choice between sending coarse filterbank data at ~8-bit resolution or fine filterbank data, ready for the correlator, at 4-bit resolution. With 4-bit correlator data, each 100Gb/s fibre from a station can carry 12GHz of data (6GHz of dual polarisation bandwidth). Assuming 1200 dual polarisation beams per station, each fibre can carry 5MHz of data for all beams. For AA hi the bandwidth is 600MHz, requiring 120 fibres from each station; for AA lo the bandwidth is 400MHz and 80 fibres are needed. Sending ~8-bit data would double these numbers, so the large number of optical connections indicates that it is better to transport 4-bit correlator data.
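The per-station fibre counts follow from the 96Gb/s payload used in the appendix (6GHz x 2 pol x (4+4) bits). The sketch below reproduces them and, as an assumption, treats the ~8-bit coarse filterbank option as 8+8 bit complex data; the function name is hypothetical.

```python
import math

FIBRE_GBPS = 96           # usable payload per 100G fibre, as in the appendix
BEAMS_PER_STATION = 1200  # dual polarisation beams


def fibre_plan(band_mhz, bits_per_complex_sample):
    """Return (MHz per fibre for all beams, fibres needed per station)."""
    dual_pol_mhz_per_fibre = FIBRE_GBPS * 1000 / (2 * bits_per_complex_sample)
    mhz_per_fibre = dual_pol_mhz_per_fibre / BEAMS_PER_STATION
    return mhz_per_fibre, math.ceil(band_mhz / mhz_per_fibre)


for name, band in (("AA hi", 600), ("AA lo", 400)):
    for bits in (8, 16):  # 4+4 bit correlator data vs assumed 8+8 bit coarse data
        print(name, bits, fibre_plan(band, bits))
```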

If the arrays are operated separately, a single fibre from each station, carrying 5MHz of bandwidth for all beams, can form the input to a correlator shelf. The compute load $C_S$ of each such correlator shelf is $7.5\times10^{14}$ CMAC/s [A.2.1].

The aperture arrays overlap in the frequency range 300 to 450MHz, and forming correlations between the two arrays in this band improves sensitivity. In this range each correlator shelf would need 500 inputs to receive data directly from all aperture arrays, which is not possible by assumption 3. A cross connect is therefore needed at the correlator for data in the overlap region. If wavelength division multiplexing is used on the 100G fibres, a passive optical cross connect is possible. In the cross connect, all data in a 5MHz band is divided into 1.25MHz bands, and the data for each 1.25MHz band, covering all 500 inputs, is carried on a group of 125 output fibres. The correlation load for 500 inputs in a 1.25MHz band is identical to that for 250 inputs in a 5MHz band ($\frac{500^2}{2}\times1.25\,\mathrm{MHz} = \frac{250^2}{2}\times5\,\mathrm{MHz}$). Hence, the same correlator shelf can be used to provide the required correlation capacity.
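The text does not say how the passive optical cross connect is wired. One arrangement consistent with the numbers above, assumed here purely for illustration, puts each 1.25MHz sub-band of a station's 5MHz fibre on its own wavelength and combines the same sub-band from four stations on each output fibre, so each output fibre still carries 4 x 1.25MHz x 1200 beams = 6GHz. The sketch checks that this mapping gives 125 fibres per sub-band covering all 500 inputs, and that the correlation load is unchanged. All names are hypothetical.

```python
from fractions import Fraction

N_INPUTS = 500          # 250 AA hi + 250 AA lo stations in the overlap band
SUB_BANDS = 4           # 5MHz -> 4 x 1.25MHz, assumed one sub-band per wavelength
FIBRES_PER_GROUP = 125  # output fibres per 1.25MHz sub-band (from the text)


def wdm_routing():
    """Map (input station, wavelength) to (sub-band group, output fibre)."""
    inputs_per_output = N_INPUTS // FIBRES_PER_GROUP  # 4 wavelengths share each output fibre
    return {
        (station, wl): (wl, station // inputs_per_output)
        for station in range(N_INPUTS)
        for wl in range(SUB_BANDS)
    }


def relative_load(n_inputs, band_mhz):
    """Correlation load up to a constant: baselines (n^2 / 2) x bandwidth."""
    return Fraction(n_inputs ** 2, 2) * band_mhz


routing = wdm_routing()
outputs = {}
for (station, wl), out in routing.items():
    outputs.setdefault(out, []).append(station)

print(len(outputs), {len(v) for v in outputs.values()})             # 500 {4}
print(relative_load(500, Fraction(5, 4)) == relative_load(250, 5))  # True
```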

A possible configuration for this correlator is shown in Figure 4.

Figure 4 Possible configuration of AA correlators

In the non-overlap region, each correlator shelf receives a single fibre from each station, giving a one-to-one correspondence between the fibres from an antenna station and the correlator shelves. Hence there is one shelf for each 5MHz of bandwidth: 90 shelves for the 450MHz of AA hi bandwidth and 50 shelves for the 250MHz of AA lo bandwidth outside the overlap. In the 150MHz overlap region, the cross connect redistributes the data into 120 groups of 125 fibres. Each group carries 1.25MHz of data for all aperture arrays and connects to a single shelf. For transient processing, the transient buffer data and incoherent beam data come from the stations.

This gives 260 correlator shelves (90 + 50 + 120), compared with 242 shelves in the dish with WBSPF/PAF correlator (Figure 3). The correlator shelves cost ~$26-$52M, and the 50,000 100G fibre links from the antenna arrays (120 + 80 fibres per station pair for 250 stations of each type) add $10-$50M. This does not include the cost of the optical cross connect, the filterbanks or the NRE for an ASIC. In addition, the antenna station beamformers will contain the equivalent of the dish filterbank input fibres. These will probably carry data at a resolution of 8+8 bits complex, twice that of the correlator data, and will require the equivalent of 100,000 100G fibres, ~$20-$100M.
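The shelf, fibre and cost figures in the two paragraphs above can be cross-checked with simple arithmetic. The unit costs are not given in the text; the per-shelf ($100k-$200k) and per-link ($200-$1000) values used here are back-calculated from the quoted totals and are assumptions. Function names are hypothetical, and integer kHz avoid floating point rounding.

```python
STATIONS = 250  # per array type
AA_HI_MHZ, AA_LO_MHZ, OVERLAP_MHZ = 600, 400, 150
SINGLE_KHZ_PER_SHELF, COMBINED_KHZ_PER_SHELF = 5000, 1250


def shelf_count():
    hi_only = (AA_HI_MHZ - OVERLAP_MHZ) * 1000 // SINGLE_KHZ_PER_SHELF
    lo_only = (AA_LO_MHZ - OVERLAP_MHZ) * 1000 // SINGLE_KHZ_PER_SHELF
    combined = OVERLAP_MHZ * 1000 // COMBINED_KHZ_PER_SHELF
    return hi_only, lo_only, combined


def station_fibres():
    """250 AA hi stations x 120 fibres plus 250 AA lo stations x 80 fibres."""
    hi = STATIONS * (AA_HI_MHZ * 1000 // SINGLE_KHZ_PER_SHELF)
    lo = STATIONS * (AA_LO_MHZ * 1000 // SINGLE_KHZ_PER_SHELF)
    return hi + lo


def cost_range_musd(count, low_usd, high_usd):
    return count * low_usd // 1_000_000, count * high_usd // 1_000_000


shelves = shelf_count()
print(shelves, sum(shelves))                              # (90, 50, 120) 260
print(station_fibres())                                   # 50000
print(cost_range_musd(sum(shelves), 100_000, 200_000))    # (26, 52)
print(cost_range_musd(station_fibres(), 200, 1_000))      # (10, 50)
print(cost_range_musd(2 * station_fibres(), 200, 1_000))  # (20, 100)
```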

A significant fraction of the correlator compute capacity is devoted to the overlap correlation. However, AA hi will not be producing data in the overlap region all the time, and when it is not, the overlap correlator is only 50% utilised. To let the system fully utilise the correlator capacity, the AA hi bandwidth is reduced to 300MHz when the overlap correlation is in operation. When there is no overlap between AA hi and AA lo, the overlap correlator shelves are split: 30 shelves process AA lo data and 90 shelves process AA hi data. In this configuration, shown in Figure 5, the total number of correlator shelves is reduced to 200. Further modes are possible, with a different balance between the total AA hi bandwidth and the combined bandwidth. However, the cross connect is more complex and it may not be possible to implement it optically. This may add 120 shelves of cross connection or switching hardware.
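The shelf allocation of the dual-mode correlator can be checked with the sketch below. The split of the 200 shelves into fixed AA lo shelves, fixed AA hi shelves and 120 switchable shelves is inferred from the numbers quoted above; the function name and parameters are hypothetical.

```python
def shelf_plan(combined, mhz_per_shelf=5, overlap_mhz=150, khz_per_combined_shelf=1250):
    """Shelves per role in the dual-mode AA correlator (hypothetical sketch)."""
    lo_fixed = (400 - overlap_mhz) // mhz_per_shelf            # AA lo below the overlap: 50
    hi_fixed = (300 - overlap_mhz) // mhz_per_shelf            # AA hi reduced to 300MHz: 30
    switchable = overlap_mhz * 1000 // khz_per_combined_shelf  # overlap shelves: 120
    if combined:
        return {"AA lo": lo_fixed, "AA hi": hi_fixed, "combined": switchable}
    lo_extra = overlap_mhz // mhz_per_shelf  # 30 switchable shelves restore AA lo to 400MHz
    return {"AA lo": lo_fixed + lo_extra,
            "AA hi": hi_fixed + switchable - lo_extra,  # 30 + 90 shelves give 600MHz
            "combined": 0}


for mode in (True, False):
    plan = shelf_plan(mode)
    print("combined" if mode else "separate", plan, sum(plan.values()))
```

In both modes all 200 shelves are in use, which is the point of reducing the AA hi bandwidth when the overlap correlation runs.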


Figure 5 AA correlator with either a 150MHz combined AA lo and AA hi correlator or separate AA lo and AA hi correlators. When the combined correlator is operating, the AA hi bandwidth is reduced to 300MHz.


17 CONCLUSION

The correlation requirements for a WBSPF and a PAF are similar: the PAF has more total bandwidth, but there are more WBSPF antennas. Assuming the two are not operated simultaneously, the correlator can be shared. When the ~2000 antennas with PAFs are in operation, the longer baseline antennas can be operated independently with little extra cost in the correlator. In this case the observing speed for VLBI is ~0.28 of that of the full SKA.

For aperture arrays it is possible to connect the stations directly to a correlator shelf, although this is expected to fully utilise the input capacity of the correlator shelf. In the overlap region between AA lo and AA hi, data from both arrays can be correlated. Science is maximised with a dedicated correlator for this overlap region, but that correlator is not fully utilised when AA hi operates above 450MHz. Better utilisation of correlator resources is achieved with a dual-mode correlator, at the cost of reducing the AA hi bandwidth to 300MHz when AA hi and AA lo are combined.

The design also addresses non‐imaging data processing and demonstrates how transient buffers, tied array beams and incoherent processing can be implemented.

The designs given here are for a fairly arbitrary specification, but all of them can be easily scaled: the design scales linearly with the number of beams and with bandwidth. Changing the number of antennas or AA stations would require a more detailed analysis, since the correlation load grows with the square of the number of inputs.

18 ACKNOWLEDGEMENT The author would like to thank Neale Morison, Tim Bateman and Wallace Turner for their corrections. The author also thanks David Hawkins for his suggestions, checking the memo and in particular for uncovering a number of hidden assumptions that needed clarifying.

19 APPENDIX

19.1 Multiplications per Input Sample for a Polyphase Filter Bank

A polyphase filterbank consists of short FIR filters at the input to an FFT. The FIR filters are of length ~10, and the cost of the FFT is ~$3(\log_4 L - 1)$ real multiplications per input sample, where $L$ is the length of the FFT. By using the real and imaginary inputs for different signals, the cost of the FFT is reduced to ~$1.5(\log_4 L - 1)$. For each input sample, the cost in multiplications for the filterbanks, $C_{fb}$, is

$$C_{fb} = k\left[10 + 1.5\left(\log_4 L - 1\right)\right]S \approx 20S \quad \text{for a 1000-channel filterbank}$$

where $k$ is the oversampling ratio of the filterbank, ~1.2. There are also ~2 additions per multiplication in the FFT and one per multiplication in the FIR filter. For simplicity, the number of multiply-accumulates (MACs) is taken to be equal to the number of multiplications, as it is the multipliers that are normally the limiting resource in ASICs and FPGAs.
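The ~20S figure can be checked by evaluating the formula above for a 1000-channel filterbank. The function name is hypothetical; the defaults are the ~10 FIR taps and ~1.2 oversampling ratio quoted in the text, and the result is per input sample (multiply by $S$ for a rate).

```python
import math


def filterbank_mults_per_sample(fft_length, fir_taps=10, oversampling=1.2):
    """Real multiplications per input sample for a polyphase filterbank."""
    # Packing two real signals into one complex FFT halves the FFT cost.
    fft_cost = 1.5 * (math.log(fft_length, 4) - 1)
    return oversampling * (fir_taps + fft_cost)


print(round(filterbank_mults_per_sample(1000), 1))  # 19.2, i.e. ~20 per input sample
```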

19.2 Correlator compute load

1. Aperture Array


For dense aperture arrays there are 1200 dual polarisation beams with a bandwidth of 0.6GHz. For the 250 stations proposed, the total correlator compute load $C_{DAA}$ is:

$$C_{DAA} = \frac{250^2}{2}\ \text{baselines} \times 1200\ \text{beams} \times 0.6\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 9\times10^{16}\ \text{CMAC/s}$$

Each 100G fibre from the antenna station can carry 6GHz of dual polarisation beam data to the correlator, which is assumed to process 4-bit complex data (6GHz x 2 pol x (4+4) bits/Hz = 96Gb/s). The compute load $C_S$ for a correlator shelf with a single fibre from each station is:

$$C_S = \frac{250^2}{2}\ \text{baselines} \times 6\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 7.5\times10^{14}\ \text{CMAC/s}$$

The sparse aperture arrays process 0.4GHz of bandwidth, and the correlator compute load $C_{SAA}$ is:

$$C_{SAA} = \frac{250^2}{2}\ \text{baselines} \times 1200\ \text{beams} \times 0.4\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 6\times10^{16}\ \text{CMAC/s}$$

To include the correlations in the 150MHz overlap region, the 250 dense array stations must be correlated with the 250 sparse array stations. This adds a correlator compute load $C_O$ of

$$C_O = 250\times250\ \text{baselines} \times 1200\ \text{beams} \times 0.15\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 4.5\times10^{16}\ \text{CMAC/s}$$

The total correlator compute load is the sum of that for the dense AA, the sparse AA and the overlap: $C_{DAA} + C_{SAA} + C_O = 1.95\times10^{17}$ CMAC/s.
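The aperture array loads above can be recomputed from the same model (baselines x beams x bandwidth x 4 Stokes products, with $N^2/2$ baselines within one array and $250\times250$ cross baselines for the overlap). The helper names are hypothetical; the sketch confirms that the three terms sum to ~1.95x10^17 CMAC/s.

```python
def cmac_per_s(baselines, bandwidth_hz, beams=1, stokes=4):
    """Correlator load: baselines x beams x bandwidth x Stokes products."""
    return baselines * beams * bandwidth_hz * stokes


def intra_array_baselines(n):
    """Baselines within one array, approximated as N^2 / 2."""
    return n * n / 2


c_daa = cmac_per_s(intra_array_baselines(250), 0.6e9, beams=1200)
c_s = cmac_per_s(intra_array_baselines(250), 6e9)  # one 6GHz fibre per station
c_saa = cmac_per_s(intra_array_baselines(250), 0.4e9, beams=1200)
c_o = cmac_per_s(250 * 250, 0.15e9, beams=1200)    # dense x sparse cross baselines
total = c_daa + c_saa + c_o

for name, value in (("C_DAA", c_daa), ("C_S", c_s), ("C_SAA", c_saa),
                    ("C_O", c_o), ("total", total)):
    print(f"{name}: {value:.2e} CMAC/s")
```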

2. Phased Array Feeds

For dishes with phased array feeds there are 36 dual polarisation beams with a bandwidth of 0.6GHz. For the 2000 antennas proposed, the total correlator compute load $C_{PAF}$ is:

$$C_{PAF} = \frac{2000^2}{2}\ \text{baselines} \times 36\ \text{beams} \times 0.6\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 1.73\times10^{17}\ \text{CMAC/s}$$

3. WBSPFs

For dishes with WBSPFs there are 2700 antennas within 180km of the core, and for the antennas beyond this there are a total of 300 beams. The number of baselines is approximately that for 3000 separate antennas. With a bandwidth of 9GHz, the correlator compute load $C_{WBSPF}$ is:

$$C_{WBSPF} = \frac{3000^2}{2}\ \text{baselines} \times 9\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 1.62\times10^{17}\ \text{CMAC/s}$$

When the inner 2000 antennas are being operated with PAFs, it is expected that all outer antenna stations will be beamformed and correlated separately. The larger the number of antennas beamformed, the lower the compute load. For an upper limit, consider stations of 12 antennas, each station generating 4 beams. There are 133 stations, and the total compute load $C_{VLBI}$ for the long baseline WBSPFs is:

$$C_{VLBI} = \frac{133^2}{2}\ \text{baselines} \times 4\ \text{beams} \times 9\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 1.3\times10^{15}\ \text{CMAC/s}$$

Another possible mode (180km mode) is correlation of the WBSPFs on the 700 antennas between 20 and 180km from the core. This has ~25% of the sensitivity of the full SKA at the same resolution, but it still provides a useful adjunct while the PAFs are in use. The correlation load $C_{180}$ for this mode is:


$$C_{180} = \frac{700^2}{2}\ \text{baselines} \times 9\times10^{9}\ \text{Hz} \times 4\ \text{Stokes} = 8.8\times10^{15}\ \text{CMAC/s}$$
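The dish correlator loads can be recomputed with the same baselines x beams x bandwidth x Stokes model. The sketch also prints the long-baseline modes as a fraction of the PAF load, which puts a number on the "little extra cost in the correlator" claimed in the conclusion; the helper names are hypothetical.

```python
def cmac_per_s(baselines, bandwidth_hz, beams=1, stokes=4):
    """Correlator load: baselines x beams x bandwidth x Stokes products."""
    return baselines * beams * bandwidth_hz * stokes


def intra_array_baselines(n):
    """Baselines within one array, approximated as N^2 / 2."""
    return n * n / 2


loads = {
    "C_PAF": cmac_per_s(intra_array_baselines(2000), 0.6e9, beams=36),
    "C_WBSPF": cmac_per_s(intra_array_baselines(3000), 9e9),
    "C_VLBI": cmac_per_s(intra_array_baselines(133), 9e9, beams=4),
    "C_180": cmac_per_s(intra_array_baselines(700), 9e9),
}
for name, value in loads.items():
    print(f"{name}: {value:.2e} CMAC/s")

# Extra load of the long-baseline modes relative to the PAF correlator
print(f"VLBI / PAF: {loads['C_VLBI'] / loads['C_PAF']:.1%}")
print(f"180km / PAF: {loads['C_180'] / loads['C_PAF']:.1%}")
```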
