
  • microlab, NTUA, GR

    keywords

    ▪ processing platforms

    ▪ FPGA acceleration

    ▪ vision-based navigation

    ▪ …and more

    Prof. Dimitrios Soudris, National Technical Univ. of Athens

    [email protected]

  • NTUA, 2019

    1) acceleration in space applications

    ▪ survey & benchmarking of processing platforms

    ▪ comments based on past experience

    2) application: autonomous rover navigation (SPARTAN)

    ▪ project overview & results, future steps

    3) application: spacecraft proximity operations (HIPNOS)

    ▪ project overview & results, future steps

  • NTUA, 2019

    Quick overview of our projects

  • NTUA, 2019

    ESA activities

    SPARTAN/SEXTANT/COMPASS (2011-2016)

    ▪ Vision-Based Navigation for Mars rovers, with FPGA co-processor

    ▪ accelerated multiple odometry and 3D mapping algorithms

    HIPNOS (2016-2017)

    ▪ avionics solutions for high-performance tasks based on SoC-FPGA

    ▪ demo with pose estimation algorithm for Active Debris Removal

    QUEENS (2017-2018)

    ▪ tool-chain testing, benchmarking, hands-on demos of new EU FPGA

    Radiation Testing of COTS FPGAs (2018)

    ▪ SEE test at CERN (SPS!) in Dec’17; TID tests at Napoli (INFN) and ESTEC

    Porting of algorithms on new embedded devices (2019)

    ▪ on Intel/Movidius Myriad2 DSP and on NanoXplore NG-LARGE FPGA

  • NTUA, 2019

    group platforms & compare (form clouds of results)

    ▪ CPU: space+embed similar, 1-2 orders slower than desktop

    ▪ GPU: 1-2 orders better performance/Watt than any CPU (vs desktop GPU, mobiles just trade 1 order of perf. for Watt)

    ▪ multi-DSP: better performance/Watt than mobile-GPU

    ▪ FPGA: highest perf/Watt (by orders), almost highest perform.

  • NTUA, 2019

    • FPGAs = best choice for HW acceleration, next to rad-hard CPU

      ▪ outperform mobile GPUs and DSP multi-cores

      ▪ allow for effective hardening even with COTS versions

    • significant acceleration with limited power, e.g., P ≤ 10 Watts

      ▪ 10x at system level (HW/SW) vs latest rad-hard CPUs

      ▪ 100-1000x at function level vs conventional rad-hard CPUs

    • necessary for future on-board processing

      ▪ for high rate & high resolution images (e.g., unprepared docking)

      ▪ for increased autonomy and faster exploration (e.g., rovers)

      ▪ offload OBC, do data fusion, other real-time payload processing

    • European technology catching up (rad-hard BRAVE FPGAs)

    • started examining Myriad2 for space → also very interesting

  • NTUA, 2019

    SPARTAN/SEXTANT/COMPASS (app = rover navigation)

  • NTUA, 2019

    SPARTAN/SEXTANT/COMPASS projects (ESA)

    ❑ NTUA (GR), GMV (ES), FORTH (GR), DUTH (GR)

    ▪ HW/SW co-design of rover navigation algorithms

    ▪ emulate Martian scenarios and space-grade devices

    ▪ scale measured execution time to a 150 MIPS CPU, use limited FPGA resources

    ▪ synthetic datasets of Mars, real images of Mars-like terrains

    ✓ “localization” in 1sec, “mapping” in 20sec

    April 2011 to July 2013: SPARTAN (SPAring Robotics Technologies for Autonomous Navigation)

    April 2012 to May 2014: SEXTANT (Spartan EXTension Activity – Not Tendered)

    June 2014 onward: COMPASS (Code Optimisation, Modification, Partitioning)

  • NTUA, 2019

    [Figure: Martian rover with stereo camera and on-board CPU, producing a 3D map and the rover position at every step; examples shown: MER rovers (2003), Curiosity (2012), ESA ExoMars (2020)]

  • NTUA, 2019

    highly complex Computer Vision algorithms + low processing power CPUs (space-grade)

    ➢ huge execution time, not very practical to use

    ➢ MER rover: speed only 10 m/h with VO (124 m/h without!)

    ➢ VO used only on sandy terrains and slopes (high slippage)

    future: faster + more accurate (more complex!)

    ▪ considering our own/proposed CV algorithms:

    ➢ 1 hour for 3D map on 150 MIPS CPU (budget = 20 sec)

    ➢ 1 minute for 1 step on 150 MIPS CPU (budget = 1 sec)

    ➢ looking for speed-up factors 10x to 1000x
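
The gap between the measured times and the budgets on this slide can be worked out directly. The sketch below only divides the quoted all-software time by the quoted budget; the function name `required_speedup` is illustrative and not from the slides.

```python
def required_speedup(baseline_seconds, budget_seconds):
    """Speed-up factor needed to bring a baseline time within a time budget."""
    return baseline_seconds / budget_seconds

# Figures quoted on the slide for a 150 MIPS CPU.
mapping_speedup = required_speedup(60 * 60, 20)   # 1 hour vs 20 sec budget
localization_speedup = required_speedup(60, 1)    # 1 minute vs 1 sec budget

print(mapping_speedup, localization_speedup)
```

Both factors fall inside the 10x to 1000x range the slide mentions.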

  • NTUA, 2019

    mapping mode (3D reconstruction of scene)

    ▪ use “navigation” camera (high-definition stereo)

    ▪ 1m above ground, 20cm baseline (parallel), tilted 39°

    ▪ generate 3D map of 4m radius (120°) in front of rover

    ▪ error < 2cm, execution time per map < 20 sec

  • NTUA, 2019

    localization mode (6D pose of the rover)

    ▪ use “localization/hazard-avoidance” camera (stereo)

    ▪ 30cm above ground, 12cm baseline, tilted 31.55°, FoV 66°

    ▪ rover stops every ~6cm to acquire new image (1Hz rate)

    ▪ estimate x-y-z position and pitch-roll-yaw of rover

    ▪ error < 2m after 100m path (2%, attitude

  • NTUA, 2019

    generic rover geometry, several CV algorithms, multiple platforms

    ▪ FPGA: Xilinx Virtex-6 XC6VLX240T

    ▪ CPU: Intel Core 2 Duo, executing C algorithms (time scaled to 150 MIPS) and calling FPGA accelerators

  • NTUA, 2019

    synthetic videos (s1, s2, s3)

    ▪ 3DROV simulator

    ▪ mix of sand, rocks, diffuse lighting

    ▪ localization: 512x384 px, mapping: 1120x1120 px

    real videos (r1, r2, r3)

    ▪ Atacama, Chile

    ▪ Devon, Canada

    ▪ Thrace, Greece

  • NTUA, 2019

    3D reconstruction (mapping)

    ▪ Disparity, Spacesweep

    Visual Odometry (localization)

    ▪ Feature detection: SURF, Harris, FAST

    ▪ Feature description: SIFT, SURF, BRIEF

    ▪ Feature matching: distances L1, L2, χ², Hamming

    ▪ Filtering and egomotion: absolute orientation, LHM

    [Figure: histograms of gradients computed on image 1 and image 2, followed by matching of histograms]
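
The four matching distances listed on this slide can be illustrated with a short sketch. This is a plain-Python illustration under assumptions, not the project's implementation: descriptors are taken to be lists of non-negative numbers (or integers holding binary strings for Hamming), the χ² form shown is one common variant, and the helper name `nearest` is hypothetical.

```python
import math

def l1(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chi2(a, b):
    """Chi-square distance between histograms (bins with zero mass are skipped)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def nearest(query, candidates, dist):
    """Index of the candidate descriptor closest to the query (brute force)."""
    return min(range(len(candidates)), key=lambda i: dist(query, candidates[i]))

print(nearest([1, 1], [[5, 5], [1, 2], [9, 0]], l1))
```

Hamming distance applies to binary descriptors such as BRIEF, while L1, L2 and χ² apply to histogram-style descriptors such as SIFT and SURF.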

  • NTUA, 2019

    Mapping Mode Functional Dependency

    [Figure: functional decomposition of the mapping mode in four layers, from coarse-grain to fine-grain analysis]

    ▪ Functional Phase: Demo mapping

    ▪ Imaging, 3D Reconstruction

    ▪ Debayer, Contrast, Rectify, Disparity, Map merge, Map generation

    ▪ Superimposition, Edge detection, Normalize, Absolute differences, Normalize ADs, Gaussian weight, Aggregation, Minimum disparity search, Subpixel interpolation

    Component              Mult-Div        Add-sub-comp       Typical param.
    debayer                6×W×H           4×W×H              W×H = 1120×1120
    contrast               2×W×H           2×W×H              -
    rectify                8×W×H           6×W×H              -
    edge detection         2×k²×W×H        2×(k²+20)×W×H      k = 13
    superimposition        0               2×W×H              -
    normalize              2×W×H           2×W×H              -
    absolute differences   0               6×D×W×H            D = 200
    normalize ADs          2×D×W×H         2×D×W×H            -
    aggregation            2×l²×D×W×H      2×l²×D×W×H         l = 19
    min disparity search   0               2×(D+1)×W×H        -
    interpolation          W×H             13×W×H             -
    map generation         3×W×H           5×W×H              -
    map merge              9×W×H           6×W×H              -
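
Plugging the typical parameters into the operation counts shows where the work concentrates. The sketch below is an illustration: the per-pixel coefficients are read from the table, while the conversion to seconds assumes one arithmetic operation per instruction on a 150 MIPS CPU. That assumption is a simplification made here, not a figure from the slides.

```python
W, H = 1120, 1120      # image size
k, D, l = 13, 200, 19  # edge-detection kernel, disparity range, aggregation window

# (mult-div, add-sub-comp) operations per pixel, taken from the table.
PER_PIXEL_OPS = {
    "debayer": (6, 4),
    "contrast": (2, 2),
    "rectify": (8, 6),
    "edge detection": (2 * k**2, 2 * (k**2 + 20)),
    "superimposition": (0, 2),
    "normalize": (2, 2),
    "absolute differences": (0, 6 * D),
    "normalize ADs": (2 * D, 2 * D),
    "aggregation": (2 * l**2 * D, 2 * l**2 * D),
    "min disparity search": (0, 2 * (D + 1)),
    "interpolation": (1, 13),
    "map generation": (3, 5),
    "map merge": (9, 6),
}

mult_per_pixel = sum(m for m, _ in PER_PIXEL_OPS.values())
add_per_pixel = sum(a for _, a in PER_PIXEL_OPS.values())
total_ops = (mult_per_pixel + add_per_pixel) * W * H

# Share of all operations spent in the aggregation step.
aggregation_share = sum(PER_PIXEL_OPS["aggregation"]) / (mult_per_pixel + add_per_pixel)

# Rough time estimate, assuming one operation per instruction at 150 MIPS.
seconds_at_150_mips = total_ops / 150e6

print(mult_per_pixel, add_per_pixel, round(aggregation_share, 3), round(seconds_at_150_mips))
```

Under these assumptions aggregation accounts for nearly all of the arithmetic, which makes it a natural candidate for hardware acceleration.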

  • NTUA, 2019

    to FPGA: repetitive & computationally intensive functions

    to CPU: high program complexity & lightweight functions

  • NTUA, 2019

    Target low-cost implementations

    ▪ especially w.r.t. memory: bottleneck for CV on FPGA

    ▪ resource reuse: decompose input data, process successively

    Target sufficient speed-up (for ESA specs)

    ▪ pipelining on pixel-basis

    ▪ burst read of image, transform on-the-fly (1 datum/cycle)

    ▪ parallel memories & parallel processing elements

    ▪ parallel calculation of arithmetic formulas

    Target configurability (tuning, adaptation)

    ▪ parametric VHDL: data size, accuracy, parallelization,
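
The memory-saving idea of streaming an image and transforming it on the fly can be mimicked in software. The sketch below is a Python analogy, not the VHDL design: it consumes pixels in raster order and keeps only k rows in line buffers instead of the full frame. The k×k box sum and the name `stream_box_sum` are illustrative choices made here.

```python
from collections import deque

def stream_box_sum(pixels, width, k):
    """Consume pixels in raster order and yield rows of k x k window sums.

    Only the k most recent image rows are buffered, never the whole image.
    """
    rows = deque(maxlen=k)  # line buffers; the oldest row is dropped automatically
    current = []
    for p in pixels:
        current.append(p)
        if len(current) == width:          # a full image row has arrived
            rows.append(current)
            current = []
            if len(rows) == k:
                # Column sums over the buffered rows, then a sliding sum across columns.
                col = [sum(r[x] for r in rows) for x in range(width)]
                window = sum(col[:k])
                out = [window]
                for x in range(k, width):
                    window += col[x] - col[x - k]
                    out.append(window)
                yield out

# 4x4 image with values 0..15, 3x3 window.
print(list(stream_box_sum(range(16), 4, 3)))
```

Buffering k rows rather than the frame is what keeps on-chip memory small; a hardware pipeline would additionally emit one result per clock cycle, which this row-at-a-time sketch does not model.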

  • NTUA, 2019

    [Figure: inputs X1, X2, X3, X4 feeding a function f(X)]

  • NTUA, 2019

    multiple accelerators on FPGA

    ▪ Disparity, Spacesweep

    ▪ SURF detector, SURF descriptor, SIFT descriptor, Harris, matching

    ➢ significant speedups 62x – 1111x

    multiple CPU-FPGA pipelines

    ▪ 2 for mapping, 5 for localization

    ▪ speedup 16x – 444x, meet specs

    3D map: accuracy 2cm at 4m depth, 120° FoV, 97% coverage

    localization: accuracy 1.3m in 100m paths, and 5° in attitude

  • NTUA, 2019

    Localization at 1 sec

    ▪ system speedup = 20x

    Mapping at 8.4 sec

    ▪ system speedup = 444x

    accelerators’ speedup (xc6vlx240t @ 172 MHz vs. 150 MIPS CPU)

    ▪ SpaceSweep: 637x

    ▪ Disparity: 120x

    ▪ Harris detector: 75x

    ▪ SURF detector: 56x

    ▪ SIFT descriptor: 100x

    ▪ SURF descriptor: 84x

    ▪ SIFT matching: 180x

    ▪ BRIEF matching: 100x

    ▪ Communication: 81 Mbps
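
As a sanity check, multiplying each accelerated time by its system speedup gives the implied all-software time on the 150 MIPS CPU. The sketch below only does that arithmetic; the function name `implied_baseline` is hypothetical.

```python
def implied_baseline(accelerated_seconds, system_speedup):
    """All-software time implied by an accelerated time and a system speedup."""
    return accelerated_seconds * system_speedup

mapping_baseline_s = implied_baseline(8.4, 444)       # mapping pipeline
localization_baseline_s = implied_baseline(1.0, 20)   # localization pipeline

print(round(mapping_baseline_s / 60, 1), "minutes;", localization_baseline_s, "seconds")
```

The mapping figure comes out at roughly an hour, in line with the all-software mapping time quoted earlier in the deck.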

  • NTUA, 2019

    FPGAs enable more complex/accurate/robust CV algorithms

    ▪ respecting given time constraints during rover traverse

    FPGA acceleration would also enable bigger range for rovers

    ▪ acceleration = 10x at system level (HW/SW) vs rad-hard CPUs

    ▪ 100-1000x at function level vs conventional rad-hard CPUs

    but, need considerable effort to optimize the design, especially when targeting resource-constrained space-grade devices

    ▪ efficiency-driven & demanding projects → manual coding

    ▪ understand the algorithm (in-depth profiling/analysis)

    ▪ design parallel architectures, think parallel
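
One way to see why 100-1000x at function level can translate to roughly 10x at system level is Amdahl's law: the part left on the CPU bounds the overall gain. The slides do not state this derivation; the sketch below is an illustration, and the 95% accelerated fraction is a hypothetical value chosen for the example.

```python
def system_speedup(accelerated_fraction, function_speedup):
    """Amdahl's law: overall speedup when only a fraction of the run time is accelerated."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / function_speedup)

# Hypothetical case: 95% of the original run time is moved to FPGA accelerators.
for s in (100, 1000):
    print(s, round(system_speedup(0.95, s), 1))
```

With 5% of the work left in software, the system-level gain stays below 20x however fast the accelerators are, which is why HW/SW partitioning matters as much as the accelerators themselves.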

  • NTUA, 2019

    HIPNOS (app = spacecraft proximity operations)

  • NTUA, 2019

    High Performance Avionics Solution for Advanced and Complex GNC Systems

    ESA, program GSP, low TRL

    ▪ scenario: VBN for e.Deorbit mission

    ▪ focus: avionics, COTS accelerators

    ▪ task: study & select best platform

    ▪ goal: design new avionics architecture + CV algorithm (estimate pose)

    ➢ accelerate 10x vs conventional

    HIPNOS: July 2016 to October 2017

  • NTUA, 2019

    Active Debris Removal missions

    ▪ chaser autonomously tracks/syncs with uncooperative target → VBN

    ▪ e.Deorbit (frozen), rendezvous with ENVISAT: 2.5 x 2.5 x 10 m³, 8 tons, 2°/s spin, LEO

    very high computational needs!

    ▪ real-time processing of HD images

    ▪ 1 Mpix at 5-10 fps at critical stages

    ▪ 10x more than rovers, + hard limits

    ▪ increased accuracy, e.g., 1% error

    ▪ no markers → complex algorithms

    ▪ but, short LEO mission → COTS?
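
The image rate and target spin quoted on this slide translate into concrete per-frame budgets. The sketch below only does the arithmetic for 1 Mpix frames at 5 and 10 fps and a 2°/s spin; the function names are illustrative.

```python
def pixel_rate(megapixels, fps):
    """Pixels per second that the processing chain must sustain."""
    return megapixels * 1e6 * fps

def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps

def rotation_per_frame_deg(spin_deg_per_s, fps):
    """Target rotation between two consecutive frames, in degrees."""
    return spin_deg_per_s / fps

for fps in (5, 10):
    print(fps, pixel_rate(1, fps), frame_budget_ms(fps), rotation_per_frame_deg(2, fps))
```

At 10 fps the whole pose-estimation pipeline has 100 ms per frame, during which the target turns by about 0.2°.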

  • NTUA, 2019

    RESOURCES

    • tested on biggest Zynq7000 FPGA (xc7z100-2 of MMP)

    • 36% LUTs, 48% DSPs, 77% RAMBs, Fmax > 200 MHz

      ▪ most demanding is Renderer (94% logic of design)

    • power≈4.5W (peak 9W) (CPU@667MHz, PS@200MHz)

    • rough estimations for other FPGA devices

    • xc7z045/xc7z030 (smaller): maybe feasible, requires much optimization, tolerable penalty in time/accuracy

    • zu19eg (big upcoming RT): easy fit, utilization

  • NTUA, 2019

    FROM TRADE-OFF STUDY

    • latest space-grade CPUs 10x faster than predecessors, still slow for high-performance VBN (e.g., 0.1x)

    • by offering best perf/Watt vs all platforms, FPGAs can bridge the gap with reasonable power (

  • NTUA, 2019

    now using Myriad2 as SoC acceleration platform (new ESA project)

    ▪ board = EOT (from H2020)

    ▪ very small, lower power, very few I/Fs (+GPIOs!)

    ▪ now also being tested for radiation resilience (ESA)

  • NTUA, 2019

    contact

    ▪ Professor Dimitrios Soudris [email protected]

    ▪ Senior researcher George Lentaris [email protected]

    ▪ PhD student Konstantinos Maragos [email protected]

    ▪ PhD student Ioannis Stratakos [email protected]

    ▪ PhD student Ioannis Stamoulias [email protected]

    ▪ PhD student Vasileios Leon [email protected]

    links

    ▪ http://www.microlab.ntua.gr/

    ▪ https://microlab.ntua.gr/academics/dimitrios-soudris/

    ▪ http://users.uoa.gr/~glentaris
