
Adaptive Optics for Extremely Large Telescopes III

ADAPTIVE OPTICS REAL-TIME CONTROL SYSTEMS FOR THE E-ELT

N.A. Dipper¹,ᵃ, A. Basden¹, U. Bitenc¹, R.M. Myers¹, A. Richards¹ and E.J. Younger¹

¹ Durham University, Centre for Advanced Instrumentation, Department of Physics, South Road, Durham, DH1 3LE, UK

Abstract. The next generation of large telescopes will depend critically on Adaptive Optics (AO). The instrumentation now proposed for the E-ELT makes substantial demands on computing power for real-time control. These demands will be met by a combination of novel algorithms and new developments in high-performance computing. This poster summarises the progress made in meeting this challenge at the CfAI in Durham and our research and development plan over the next few years. We demonstrate what can be done at ELT scale with existing hardware (FPGA, GPU, CPU) and to which aspects of the real-time control system these technologies are best applied. In addition, we report on initial attempts at Durham to abstract the hardware from the software using the high-level language OpenCL. This will be critical to making software for the E-ELT ‘future proof’, allowing for the introduction of new computing technology that will emerge over the long development period of E-ELT instrumentation.

1. Introduction

A major part of the research program of the Centre for Advanced Instrumentation (CfAI) is in the field of astronomical Adaptive Optics (AO). As part of this program, we have produced low-latency real-time control systems for various AO systems, culminating in the Durham AO Real-time Controller (DARC) [1,2]. The DARC software has been developed to run on multi-CPU workstations, with options for the acceleration of critical elements using Field Programmable Gate Arrays (FPGA) and Graphical Processing Units (GPU). DARC provides the real-time control system (RTC) for the CANARY on-sky AO system [3,4], designed to demonstrate the techniques required for AO on the European Extremely Large Telescope (E-ELT). It will also provide the RTC for the Durham laboratory-based AO bench, DRAGON, described briefly in Section 2.

The program of RTC development at Durham is being performed in collaboration with several other European institutions and with ESO. This collaboration is investigating acceleration hardware and software development tools, with the goal of down-selecting among them to establish a standard for the development of an RTC for E-ELT instrumentation. Some of the guiding principles of this investigation are:

• Use of commercial ‘off-the-shelf’ hardware wherever possible

• Likely to be a heterogeneous computing environment

• Use of standard libraries wherever possible – maintained and updated elsewhere

• Abstraction of the software from the hardware to allow for future upgrades – e.g. via OpenCL

ᵃ e-mail: [email protected]

Third AO4ELT Conference – Adaptive Optics for Extremely Large Telescopes, Florence, Italy, May 2013. ISBN: 978-88-908876-0-4. DOI: 10.12839/AO4ELT3.13267


2. The DRAGON Laboratory AO Test Bench

The DRAGON test bench is currently under construction at the CfAI in Durham [5,6]. It is designed to provide an environment in which the AO techniques required for proposed E-ELT instrumentation can be tested in the laboratory. In brief, DRAGON provides the following facilities:

• 3 off Natural Guide Stars (NGS). These can be arbitrarily positioned within a 3’ field of view

• 4 off Laser Guide Stars (LGS) emulated by a fluorescent cell and providing analogues to spot elongation and uplink turbulence.

• Deformable Mirror (DM) correction path options:

o Xinetics DM: 97 actuators; 67 mm aperture; closed loop

o Boston Kilo DM: 1024 actuators; 10 mm aperture; open or closed loop

• Wave-front Sensors (WFS):

o 3 off NGS Shack-Hartmann with 31x31 sub-apertures; Bobcat camera; 480x640 pixels; max 250 Hz

o 4 off LGS Shack-Hartmann with 31x31 sub-apertures; camera to be defined; about 1kx1k pixels; initially 250 Hz with a goal of 1000 Hz

• Truth sensor: Shack-Hartmann with 31x31 sub-apertures; Bobcat camera; 480x640 pixels; max 250 Hz

• Science camera: SBIG 8.3 Mpixel to view the full 3 arc-minute field or a smaller sub-field

• Atmospheric turbulence emulation via rotating phase screens at 3 different (and variable) altitudes

3. The DRAGON RTC

The DRAGON RTC is being designed specifically to investigate techniques that will be required for E-ELT AO RTC systems. This involves upgrades to DARC to provide further acceleration using emerging hardware architectures, and the integration of software to simulate E-ELT hardware sub-systems that are as yet unavailable. The proposed overall architecture of the system is shown in Figure 1.

3.1. Hardware

The DARC software currently runs on a CPU-based server with optional hardware acceleration using NVIDIA GPUs. Much of the software has now been ported to run on GPUs. Several servers are available, with the following specifications:

Mark 1 – Tesla Fermi GPU technology

• Supplier: Workstation Specialists

• 2 off Intel Xeon 5600 series 6-core CPUs with hyperthread technology

• 8 off 16 lane PCIe Gen 2.0 slots

• 3 off NVIDIA Tesla Fermi C2070 cards with 6GB memory each

• Total of 1344 GPU cores (3x448)


Figure 1 – The DRAGON Bench RTC Architecture

Mark 2 – Tesla Kepler GPU technology and Xeon Phi

• Supplier: Supermicro – 7047GR-TPRF workstation

• 2 off Intel Xeon E5-2650 8-core processors with hyperthread technology

• 4 off 16 lane PCIe Gen 3.0 slots


• 2 off NVIDIA Tesla K20X (Kepler) cards with 6GB memory each

• 250 GB/s GPU memory bandwidth

• Total of 5376 GPU cores (2x2688)

• 1 off Xeon Phi 5110 with 61 cores, 8 GB Memory and 320 GB/s bandwidth

The RTC also has options for handling pixel data in an FPGA. This is used for input from test cameras over the sFPDP protocol to a Curtiss-Wright 03F card hosting a Virtex-II Pro FPGA. As part of the CfAI development program, we are investigating several more recent FPGA cards for pixel data handling, with the goal of producing a ‘smart camera’ capable of being configured to handle some, or indeed all, of the RTC data pipeline. Current development systems are:

• Xilinx SP605 – Spartan-6 series development system

• Xilinx KC705 – Kintex-7 series development system

• National Instruments FlexRIO for LabVIEW FPGA

The use of FPGAs can provide customized systems with very low latency for very high-order systems. The role of FPGAs in the E-ELT will depend on the evolution of development tools that speed up the production of FPGA firmware. Current systems at the CfAI have been programmed using the Xilinx ISE tool-chain. We are now evaluating the new Xilinx ‘Vivado’ tool-chain with High Level Synthesis (HLS). We have also made use of LabVIEW to produce IP cores very rapidly for pixel handling on National Instruments hardware.

3.2. Software Development

Porting RTC software, or parts of it, to accelerator hardware often involves considerable re-writing and development effort. This is particularly true for the production of FPGA firmware. The ideal situation would be for a developer to write code in the C language and simply re-compile it to run on the accelerator hardware. This is not (yet?) possible. The most popular accelerators in recent years have been based on NVIDIA GP-GPU cards. In order to move code designed for a CPU-based system to these devices, one must port the (usually C-based) code into the CUDA language. This involves a learning curve and thus considerable development time. It also ties the code to hardware from a single manufacturer. There are several extensions to the C language that provide it with more portability. Perhaps the most popular of these is OpenCL, a standard maintained by the Khronos Group. Many accelerator manufacturers now provide OpenCL support for their cards, allowing programs written in OpenCL to run on different hardware with minimal changes.
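As an illustration of the hardware abstraction OpenCL offers, the sketch below shows minimal, standard OpenCL host code (not an extract from DARC): the only vendor-specific element is the choice of platform and device at start-up, and everything above that level – kernels and pipeline logic – is unchanged between cards.

#include <CL/cl.h>
#include <stdio.h>
#include <string.h>

/* Pick the first GPU whose OpenCL platform name contains the given
 * substring (e.g. "NVIDIA" or "AMD"). This is the only place where the
 * hardware in use is identified; the rest of the pipeline is unchanged. */
static cl_device_id pick_gpu(const char *platform_substring)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);
    for (cl_uint p = 0; p < nplat; p++) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        if (strstr(name, platform_substring) == NULL)
            continue;
        cl_device_id dev;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           1, &dev, NULL) == CL_SUCCESS)
            return dev;
    }
    return NULL;
}

int main(void)
{
    /* Swap "NVIDIA" for "AMD" (or read it from a configuration file)
     * to move the same pipeline between cards from the two vendors. */
    cl_device_id dev = pick_gpu("NVIDIA");
    if (dev == NULL) {
        fprintf(stderr, "No matching OpenCL GPU found\n");
        return 1;
    }
    cl_int err;
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue queue =
        clCreateCommandQueue(ctx, dev, CL_QUEUE_PROFILING_ENABLE, &err);
    /* ... build kernels (calibration, centroiding, MVM) and run ... */
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}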

In order to investigate the applicability of OpenCL to RTC data pipelines, we have selected GPU hardware from two different manufacturers and developed an OpenCL implementation of a full AO data pipeline consisting of pixel calibration, Shack-Hartmann centroiding and Matrix-Vector Multiplication (MVM), as used for wave-front reconstruction in most AO RTCs. This code ran on both cards with no changes apart from identifying the hardware in use. The two cards tested were:

• NVidia Tesla C2070 – 448 cores, 144 GB/s bandwidth

• AMD Radeon HD 7970 – 2048 cores, 264 GB/s bandwidth
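To make the reconstruction step of this pipeline concrete, the sketch below shows a minimal OpenCL C kernel for the MVM, computing the actuator command vector y = R s from the control matrix R and the measured slope vector s. This is an illustration only (one work-item per actuator, with no local-memory blocking or vectorisation), not the optimised kernel that was actually benchmarked.

/* Illustrative MVM kernel: y = R * s, with R stored row-major as an
 * (nacts x nslopes) matrix. One work-item computes one actuator command. */
__kernel void mvm(__global const float *R,   /* control matrix          */
                  __global const float *s,   /* slope (centroid) vector */
                  __global float       *y,   /* actuator command vector */
                  const int             nslopes)
{
    const int i = get_global_id(0);          /* actuator index */
    float acc = 0.0f;
    for (int j = 0; j < nslopes; j++)
        acc += R[i * nslopes + j] * s[j];
    y[i] = acc;
}

The kernel is enqueued with a global work size equal to the number of actuators; a production kernel would additionally tile R through local memory to make better use of the available memory bandwidth.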

As well as demonstrating that hardware-independent code can be produced using OpenCL, we were primarily interested in the efficiency of OpenCL code when compared to code optimized for the specific hardware (using CUDA). The NVIDIA card can run either code, so comparisons were made on that card. We simulated Shack-Hartmann data from detectors with from 2x2 to 64x64 sub-apertures and from 2x2 to 16x16 pixels per sub-aperture, and measured the execution time of the full pipeline. These results are shown in Figure 2 for code written using CUDA and in Figure 3 for code written using OpenCL.
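For reference, kernel execution times of this kind can be obtained directly from the OpenCL runtime using profiling events. The function below is a minimal sketch under that assumption (the command queue must have been created with CL_QUEUE_PROFILING_ENABLE); it is not the exact instrumentation used to produce the figures.

#include <CL/cl.h>

/* Return the device-side execution time of one kernel launch in
 * microseconds, using OpenCL profiling events. The names q, k and
 * global_size are placeholders, not DARC identifiers. */
static double time_kernel_us(cl_command_queue q, cl_kernel k,
                             size_t global_size)
{
    cl_event evt;
    cl_ulong t0 = 0, t1 = 0;

    clEnqueueNDRangeKernel(q, k, 1, NULL, &global_size, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(t0), &t0, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(t1), &t1, NULL);
    clReleaseEvent(evt);
    return (double)(t1 - t0) / 1.0e3;   /* nanoseconds -> microseconds */
}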


Figure 2 – Execution times for the full system on the NVIDIA C2070 using CUDA. (a) shows systems with smaller numbers of sub-apertures, (b) shows systems with larger numbers of sub-apertures.

It can be seen that the OpenCL code performs at a very similar level to the CUDA code, demonstrating that, for this application, very little (if any) efficiency is sacrificed by using hardware-independent code written in OpenCL. It should be noted that the large steps in all of these data occur when the system size grows to require further blocks of resources within the GPU.

These results were based on a full AO pipeline including pixel calibration, sub-aperture centroiding and MVM wave-front reconstruction. It is of interest to know which of these processes (if any) dominates the execution time. This varies with the size of the system. We have plotted the relative contributions of the three processes for a small system of 14x14 sub-apertures in Figure 4 and for a large system of 64x64 sub-apertures in Figure 5. The data are shown for the OpenCL code and for the CUDA code where appropriate. We have also included data for a low-budget NVIDIA GTX 580 GPU card for comparison. (These ‘GeForce’ cards are designed for computer gaming rather than high-performance computing but have similar specifications.) It can clearly be seen that, for a system based on GPUs, no process dominates for small systems, whereas for larger, E-ELT-scale systems the MVM wave-front reconstruction completely dominates. This is important when deciding to which process the bulk of the acceleration capability of an RTC should be applied.
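A rough operation count (our own order-of-magnitude estimate, not a measured result) makes this behaviour plausible. For n x n sub-apertures with p x p pixels per sub-aperture, calibration and centroiding cost a few operations per pixel, while the MVM multiplies a slope vector of length about 2n^2 by a control matrix with roughly n^2 columns:

    calibration + centroiding ~ n^2 x p^2 ≈ 5 x 10^4 (14x14, 16x16 pixels) or ≈ 1 x 10^6 (64x64)
    MVM ~ 2n^2 x n^2 = 2n^4 ≈ 8 x 10^4 multiply-accumulates (14x14) or ≈ 3 x 10^7 (64x64)

The two workloads are therefore comparable at 14x14 sub-apertures, while the MVM is over an order of magnitude larger at 64x64, consistent with Figures 4 and 5.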


Figure 3 – Execution times for the full system on the NVIDIA C2070 using OpenCL. (c) shows systems with smaller numbers of sub-apertures, (d) shows systems with larger numbers of sub-apertures.

These execution times represent a measure of the latency of this system. Note that the data were simulated, so these figures do not include any camera readout time. As well as the latency, it is important to measure the jitter on the latency in these GPU-based systems. This is expected to be very small, and both smaller and more predictable than that of a CPU-based system. We measured the spread in execution times for one part of the AO pipeline, the image calibration: 10000 execution times were recorded for identical OpenCL code running on both the NVIDIA C2070 and the AMD Radeon HD 7970 cards. These times include the duration of the data transfer to the GPU and are for images of size 1664x1664 pixels. The results for the NVIDIA card, shown in Figure 6, are much as expected, with a typical jitter spread of <<1% and no ‘outliers’ – the occasional larger latency values, due to operating system delays, that are often seen in CPU-based systems.

The results for the AMD card were unexpected. The jitter was generally higher than for the NVIDIA card, with a bi-modal distribution and substantial outliers, as shown in Figure 7. The reason for this structure in the jitter on the AMD card is unknown and still being investigated. It may be an artifact of the AMD card drivers and could be eliminated in later driver versions.
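A minimal sketch of how such a jitter measurement can be made on the host side is given below. This is our illustration, assuming hypothetical handles q, calib_kernel, d_img and h_img that have already been set up; it is not the exact procedure used here. Each iteration times one host-to-device image transfer plus one calibration-kernel launch, and the recorded durations can then be histogrammed as in Figures 6 and 7.

#include <CL/cl.h>
#include <time.h>

#define NREPS 10000
#define NPIX  (1664 * 1664)

/* Record NREPS wall-clock durations (in microseconds) of an image
 * transfer followed by a calibration-kernel launch, so that the spread
 * (jitter) can be examined. All handles are assumed to exist already. */
static void measure_jitter(cl_command_queue q, cl_kernel calib_kernel,
                           cl_mem d_img, const unsigned short *h_img,
                           double *elapsed_us /* array of length NREPS */)
{
    size_t global = NPIX;
    for (int r = 0; r < NREPS; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        clEnqueueWriteBuffer(q, d_img, CL_FALSE, 0,
                             NPIX * sizeof(unsigned short), h_img,
                             0, NULL, NULL);
        clEnqueueNDRangeKernel(q, calib_kernel, 1, NULL, &global, NULL,
                               0, NULL, NULL);
        clFinish(q);   /* wait for both transfer and kernel to complete */

        clock_gettime(CLOCK_MONOTONIC, &t1);
        elapsed_us[r] = (t1.tv_sec - t0.tv_sec) * 1.0e6
                      + (t1.tv_nsec - t0.tv_nsec) / 1.0e3;
    }
}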


Figure 4 – Relative execution times for the three data pipeline processes on 3 different GPU cards (using OpenCL, and CUDA where appropriate) for a ‘small’ system with 14x14 sub-apertures. No process dominates.

Figure 5 – Relative execution times for the three data pipeline processes on 3 different GPU cards (using OpenCL, and CUDA where appropriate) for a ‘large’ system with 64x64 sub-apertures. The MVM process dominates.

4. Conclusions

It is clear that, to meet the requirements of AO instrumentation on the E-ELT, a heterogeneous computer hardware architecture will be required. A program of investigation of suitable hardware is under way as a collaboration between many institutions in the UK and elsewhere in Europe, coordinated by ESO. This paper describes the current state of work in this field at the CfAI at Durham University.

The abstraction of the AO data pipeline software from the hardware is critical to the porting of software to the new hardware that will become available before first light on the E-ELT. As a first step towards this abstraction, we have tested code written in OpenCL that runs without modification on two currently available GPU cards from two different manufacturers.


The main result of this testing has been the demonstration that code written in OpenCL can achieve, on an NVIDIA GPU card, an efficiency very similar to that of code written in the manufacturer-specific CUDA language.

Figure 6 – Histogram of execution times for the calibration of 10000 1664x1664 images for OpenCL code running on the NVIDIA C2070 card.

Figure 7 – Histogram of execution times for the calibration of 10000 1664x1664 images for OpenCL code running on the AMD Radeon HD 7970 card. The inset shows the ‘outliers’ at a larger scale.

5. References

1. A. Basden, D. Geng, R. Myers, E. Younger, Applied Optics, 49, p. 6354 (2010)
2. A. Basden and R. Myers, MNRAS, 424, p. 1483 (2012)
3. R. Myers, Z. Hubert, T. Morris et al., Proc. SPIE 7015, p. 70150E (2008)
4. E. Gendron, F. Vidal, M. Brangier et al., A&A, 529, p. L2 (2011)
5. A. Reeves, R. Myers, T. Morris, A. Basden, N. Bharmal, this conference (2013)
6. A. Reeves, R. Myers, T. Morris, A. Basden, N. Bharmal, S. Rolt, D. Bramall, N. Dipper, Proc. SPIE 8447, Adaptive Optics Systems III (2012)
