microlab, NTUA, GR
keywords
▪ processing platforms
▪ FPGA acceleration
▪ vision-based navigation
▪ …and more
Prof. Dimitrios Soudris
National Technical Univ. of Athens
NTUA, 2019
1) acceleration in space applications
▪ survey & benchmarking of processing platforms
▪ comments based on past experience
2) application: autonomous rover navigation (SPARTAN)
▪ project overview & results, future steps
3) application: spacecraft proximity operations (HIPNOS)
▪ project overview & results, future steps
Quick overview of our projects
ESA activities
SPARTAN/SEXTANT/COMPASS (2011-2016)
▪ Vision Based Navigation for Mars Rovers, with FPGA co-processor
▪ accelerated multiple odometry and 3D mapping algorithms
HIPNOS (2016-2017)
▪ avionics solutions for high-performance tasks based on SoC-FPGA
▪ demo with pose estimation algorithm for Active Debris Removal
QUEENS (2017-2018)
▪ tool-chain testing, benchmarking, hands-on demos of new EU FPGA
Radiation Testing of COTS FPGAs (2018)
▪ test at CERN (SPS!) in Dec’17 for SEE, Napoli (INFN) and ESTEC for TID
Porting of algorithms on new embedded devices (2019)
▪ on Intel/Movidius Myriad2 DSP and on Nanoxplore NG-LARGE FPGA
group platforms & compare (form clouds of results)
▪ CPU: space-grade and embedded are similar, 1-2 orders of magnitude slower than desktop
▪ GPU: 1-2 orders better performance/Watt than any CPU (vs desktop GPUs, mobile GPUs just trade 1 order of performance for power)
▪ multi-DSP: better performance/Watt than mobile GPU
▪ FPGA: highest performance/Watt (by orders of magnitude), almost highest performance
• FPGAs = best choice for HW acceleration, next to rad-hard CPU
  • outperform mobile GPUs and DSP multi-cores
  • allow for effective hardening even with COTS versions
• significant acceleration with limited power, e.g., P ≤ 10 Watts
  • 10x at system level (HW/SW) vs latest rad-hard CPUs
  • 100-1000x at function level vs conventional rad-hard CPUs
• necessary for future on-board processing
  • for high rate & high resolution images (e.g., unprepared docking)
  • for increased autonomy and faster exploration (e.g., rovers)
  • offload OBC, do data fusion, other real-time payload processing
• European technology catching up (rad-hard BRAVE FPGAs)
• started examining Myriad2 for space → also very interesting
SPARTAN/SEXTANT/COMPASS (app = rover navigation)
SPARTAN/SEXTANT/COMPASS projects (ESA)
❑ NTUA (GR), GMV (ES), FORTH (GR), DUTH (GR)
▪ HW/SW co-design of rover navigation algorithms
▪ emulate Martian scenarios and space-grade devices
▪ scale execution time to a 150 MIPS CPU, use limited FPGA resources
▪ synthetic datasets of Mars, real images of Mars-like terrains
✓ “localization” in 1sec, “mapping” in 20sec
April 2011 to July 2013: SPARTAN (SPAring Robotics Technologies for Autonomous Navigation)
April 2012 to May 2014: SEXTANT (Spartan EXTension Activity – Not Tendered)
June 2014: COMPASS (Code Optimisation Modification Partitioning)
[Figure: Martian rover with a stereo camera feeding the on-board CPU, which outputs a 3D map and the rover position (at every step). Rovers shown: MER Rovers (2003), Curiosity (2012), ESA ExoMars (2020).]
highly complex Computer Vision algorithms on low processing power CPUs (space-grade)
➢ huge execution time, not very practical to use
➢ MER rover: speed only 10 m/h with VO (124 m/h without!)
➢ used only on sand terrains and slopes (high slippage)
future: faster + more accurate (more complex!)
▪ considering our own/proposed CV algorithms:
➢ 1 hour for 3D map on 150 MIPS CPU (budget = 20 sec)
➢ 1 minute for 1 step on 150 MIPS CPU (budget = 1 sec)
➢ looking for speed-up factors 10x to 1000x
mapping mode (3D reconstruction of scene)
▪ use “navigation” camera (high-definition stereo)
▪ 1m above ground, 20cm baseline (parallel), tilted 39°
▪ generate 3D map of 4m radius (120°) in front of rover
▪ error < 2cm, execution time per map < 20 sec
localization mode (6D pose of the rover)
▪ use “localization/hazard-avoidance” camera (stereo)
▪ 30cm above ground, 12cm baseline, tilted 31.55°, FoV 66°
▪ rover stops every ~6cm to acquire new image (1Hz rate)
▪ estimate x-y-z position and pitch-roll-yaw of rover
▪ error < 2m after 100m path (2%, attitude
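The localization figures above imply a traverse speed and a relative error budget. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the speed is an upper bound because it assumes the rover spends no extra time stopping and settling between steps.

```python
# Figures taken from the localization-mode slide.
STEP_M = 0.06      # rover advances ~6 cm between image acquisitions
RATE_HZ = 1.0      # one stereo pair is processed per second
PATH_M = 100.0     # reference path length
MAX_ERR_M = 2.0    # allowed position error at the end of the path


def traverse_figures(step_m, rate_hz, path_m, max_err_m):
    """Return (speed in m/h, number of steps, relative position error)."""
    speed_m_per_h = step_m * rate_hz * 3600.0   # idealised: no dead time per stop
    steps = round(path_m / step_m)              # VO updates along the path
    rel_err = max_err_m / path_m                # e.g. 0.02 means 2%
    return speed_m_per_h, steps, rel_err


if __name__ == "__main__":
    speed, steps, rel = traverse_figures(STEP_M, RATE_HZ, PATH_M, MAX_ERR_M)
    print(f"speed <= {speed:.0f} m/h, {steps} steps per 100 m, error {rel:.0%}")
```

Under these assumptions the 1 Hz budget corresponds to roughly 216 m/h, which is why each localization step must complete within 1 sec.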
generic rover geometry, several CV algorithms, multiple platforms
Xilinx Virtex6 XC6VLX240T
- Intel Core 2 Duo
- executing C algorithms (time scaled to 150 MIPS)
- calling FPGA accelerators
synthetic videos (s1, s2, s3)
▪ 3DROV simulator
▪ mix of sand, rocks, diffuse lighting
▪ loc.: 512x384px, map.: 1120x1120px
real videos (r1, r2, r3)
▪ Atacama, Chile
▪ Devon, Canada
▪ Thrace, Greece
3D reconstruction (mapping)
▪ Disparity, Spacesweep
Visual Odometry (localization)
▪ Feature detection
  ▪ SURF, Harris, FAST
▪ Feature description
  ▪ SIFT, SURF, BRIEF
▪ Feature matching
  ▪ distances L1, L2, χ², Hamming
▪ Filtering and egomotion
  ▪ absolute orientation, LHM
[Figure: histogram of gradients; matching of histograms between image 1 and image 2]
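The four matching distances listed above can be sketched as follows. This is an illustrative pure-Python version, not the project's implementation. The χ² form used here, sum of (a-b)²/(a+b), is one common variant and is an assumption, as are the function names and the brute-force nearest-neighbour search.

```python
import math


def l1(a, b):
    """Sum of absolute differences between two descriptor vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi2(a, b):
    """Chi-square histogram distance (assumed variant; empty bins are skipped)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)


def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest(query, candidates, dist):
    """Index of the candidate descriptor closest to the query (brute force)."""
    return min(range(len(candidates)), key=lambda i: dist(query, candidates[i]))
```

Hamming distance applies to binary descriptors such as BRIEF and reduces to an XOR plus a bit count, while L1, L2 and χ² apply to histogram-style descriptors such as SIFT and SURF.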
Mapping Mode Functional Dependency
[Figure: functional decomposition of the mapping demo over Layers 1 to 4, spanning coarse-grain analysis to fine-grain analysis.
Functional phases: Imaging, 3D Reconstruction.
Components: debayer, contrast, rectify, disparity, mapgen, mapmerge.
Functions: superimposition, edge detection, normalize, absolute differences, normalize ADs, Gaussian weight, aggregation, minimum disparity search, subpixel interpolation.]
Component               Mult-Div          Add-sub-comp        Typical param.
debayer                 6×W×H             4×W×H               W×H = 1120×1120
contrast                2×W×H             2×W×H               -
rectify                 8×W×H             6×W×H               -
edge detection          2×k²×W×H          2×(k²+20)×W×H       k = 13
superimposition         0                 2×W×H               -
normalize               2×W×H             2×W×H               -
absolute differences    0                 6×D×W×H             D = 200
normalize ADs           2×D×W×H           2×D×W×H             -
aggregation             2×l²×D×W×H        2×l²×D×W×H          l = 19
min disparity search    0                 2×(D+1)×W×H         -
interpolation           W×H               13×W×H              -
map generation          3×W×H             5×W×H               -
map merge               9×W×H             6×W×H               -
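To see where the work concentrates, the table can be evaluated with its typical parameters. The sketch below transcribes the per-pixel factors from the table. The helper names are hypothetical, and the run-time estimate assumes one arithmetic operation per CPU instruction, which ignores memory access and control overhead.

```python
# Typical parameters from the table.
W, H = 1120, 1120   # mapping image size in pixels
D = 200             # number of disparity levels
K = 13              # edge-detection kernel size (k)
L = 19              # aggregation window size (l)

# (mult_div, add_sub_comp) per pixel, i.e. the factor that multiplies W×H.
PER_PIXEL_OPS = {
    "debayer":              (6,             4),
    "contrast":             (2,             2),
    "rectify":              (8,             6),
    "edge detection":       (2 * K * K,     2 * (K * K + 20)),
    "superimposition":      (0,             2),
    "normalize":            (2,             2),
    "absolute differences": (0,             6 * D),
    "normalize ADs":        (2 * D,         2 * D),
    "aggregation":          (2 * L * L * D, 2 * L * L * D),
    "min disparity search": (0,             2 * (D + 1)),
    "interpolation":        (1,             13),
    "map generation":       (3,             5),
    "map merge":            (9,             6),
}


def totals(width=W, height=H):
    """Total (mult_div, add_sub_comp) operations for one full image."""
    pixels = width * height
    mult_div = sum(m for m, _ in PER_PIXEL_OPS.values()) * pixels
    add_sub = sum(a for _, a in PER_PIXEL_OPS.values()) * pixels
    return mult_div, add_sub


def share(component):
    """Fraction of all operations spent in one component."""
    own = sum(PER_PIXEL_OPS[component])
    everything = sum(m + a for m, a in PER_PIXEL_OPS.values())
    return own / everything


def seconds_at_mips(ops, mips=150):
    """Idealised run time, assuming one operation per instruction."""
    return ops / (mips * 1e6)


if __name__ == "__main__":
    mult_div, add_sub = totals()
    print(f"mult-div: {mult_div:.3e}, add-sub-comp: {add_sub:.3e}")
    print(f"aggregation share: {share('aggregation'):.1%}")
    print(f"idealised time at 150 MIPS: {seconds_at_mips(mult_div + add_sub) / 60:.0f} min")
```

With these parameters aggregation accounts for almost all operations, and the idealised estimate comes out on the order of 40 minutes, the same order of magnitude as the 1 hour quoted for a 3D map on a 150 MIPS CPU.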
to FPGA: repetitive & computationally intensive functions
to CPU: high program complexity & lightweight functions
Target low-cost implementations
▪ especially w.r.t. memory: bottleneck for CV on FPGA
▪ resource reuse: decompose input data, process successively
Target sufficient speed-up (for ESA specs)
▪ pipelining on pixel-basis
▪ burst read of image, transform on-the-fly (1 datum/cycle)
▪ parallel memories & parallel processing elements
▪ parallel calculation of arithmetic formulas
Target configurability (tuning, adaptation)
▪ parametric VHDL: data size, accuracy, parallelization,
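The "decompose input data, process successively" idea can be illustrated in software. The sketch below is a Python analogy, not the VHDL design: a k×k window sum stands in for a generic window-based CV function, and the image is processed in horizontal bands so that only `band_rows + k - 1` rows need to be held at once. The function names and the band size are hypothetical.

```python
def box_sum(img, k):
    """Sum of every k×k window of a 2D list (valid region only)."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(w - k + 1)]
        for r in range(h - k + 1)
    ]


def box_sum_banded(img, k, band_rows):
    """Same result as box_sum, computed one horizontal band at a time.

    Each band produces `band_rows` output rows and needs k - 1 extra
    input rows of overlap, so the working set stays small regardless
    of the image height.
    """
    out = []
    out_h = len(img) - k + 1
    for r0 in range(0, out_h, band_rows):
        r1 = min(r0 + band_rows, out_h)
        band = img[r0:r1 + k - 1]        # the only rows held "on chip"
        out.extend(box_sum(band, k))     # the same processing unit is reused
    return out


if __name__ == "__main__":
    image = [[(7 * r + 3 * c) % 11 for c in range(8)] for r in range(9)]
    assert box_sum_banded(image, 3, 2) == box_sum(image, 3)
```

The trade-off is the one named on the slide: smaller bands lower the memory footprint but re-read the overlapping rows, so total processing time grows as the band shrinks.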
[Figure: block diagram with inputs X1, X2, X3, X4 and output f(X)]
multiple accelerators on FPGA
▪ Disparity, Spacesweep
▪ SURF detector, SURF descriptor, SIFT descriptor, Harris, matching
➢ significant speedups 62x – 1111x
multiple CPU-FPGA pipelines
▪ 2 for mapping, 5 for localization
▪ speedup 16x – 444x, meet specs
3D map: accuracy 2cm at 4m depth, 120° FoV, 97% coverage
localization: accuracy 1.3m in 100m paths, and 5° in attitude
Localization at 1 sec
▪ system speedup = 20x
Mapping at 8.4 sec
▪ system speedup = 444x
accelerators’ speedup (xc6vlx240t @ 172 MHz vs. 150 MIPS CPU)
▪ SpaceSweep: 637x
▪ Disparity: 120x
▪ Harris detector: 75x
▪ SURF detector: 56x
▪ SIFT descriptor: 100x
▪ SURF descriptor: 84x
▪ SIFT matching: 180x
▪ BRIEF matching: 100x
▪ Communication: 81 Mbps
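The system-level figures above can be cross-checked against the time budgets (1 sec per localization step, 20 sec per map). The sketch below simply multiplies the accelerated time by the reported system speedup to recover the implied CPU-only time. The function and dictionary names are hypothetical.

```python
# Reported system-level results: (accelerated time [s], speedup, budget [s]).
RESULTS = {
    "localization": (1.0, 20, 1.0),
    "mapping":      (8.4, 444, 20.0),
}


def cpu_only_time(accelerated_s, speedup):
    """Implied software-only time on the 150 MIPS CPU, in seconds."""
    return accelerated_s * speedup


def meets_budget(accelerated_s, budget_s):
    """True if the accelerated pipeline fits the time budget."""
    return accelerated_s <= budget_s


if __name__ == "__main__":
    for mode, (t_acc, speedup, budget) in RESULTS.items():
        t_cpu = cpu_only_time(t_acc, speedup)
        ok = meets_budget(t_acc, budget)
        print(f"{mode}: CPU-only ~{t_cpu:.0f} s, accelerated {t_acc} s, "
              f"budget {budget} s, within budget: {ok}")
```

For mapping, the implied CPU-only time is a little over an hour, in line with the "1 hour for 3D map on 150 MIPS CPU" estimate given earlier.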
FPGAs enable more complex/accurate/robust CV algorithms
▪ respecting given time constraints during rover traverse
FPGA acceleration would also enable bigger range for rovers
▪ acceleration = 10x at system level (HW/SW) vs rad-hard CPUs
▪ 100-1000x at function level vs conventional rad-hard CPUs
but, need considerable effort to optimize the design, especially when targeting resource-constrained space-grade devices
▪ efficiency-driven & demanding projects → manual coding
▪ understand the algorithm (in-depth profiling/analysis)
▪ design parallel architectures, think parallel
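The gap between function-level speedup (100-1000x) and system-level speedup (10x) is what Amdahl's law predicts when part of the workload stays on the CPU. The sketch below applies that standard model. The 90% offloaded fraction is a hypothetical value chosen for illustration, not a measurement from these projects.

```python
def system_speedup(fraction, function_speedup):
    """Amdahl's law: overall speedup when `fraction` of the original
    run time is accelerated by `function_speedup` and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / function_speedup)


def offloaded_fraction(system, function_speedup):
    """Inverse of Amdahl's law: fraction of run time that must be
    accelerated to reach a given system-level speedup."""
    return (1.0 - 1.0 / system) / (1.0 - 1.0 / function_speedup)


if __name__ == "__main__":
    for s in (100, 1000):
        # Hypothetical case: 90% of the CPU time is moved to the FPGA.
        print(f"function {s}x, 90% offloaded -> system {system_speedup(0.9, s):.1f}x")
    print(f"10x system from 100x functions needs "
          f"{offloaded_fraction(10, 100):.1%} of the run time offloaded")
```

Under this model, the remaining CPU-side code quickly dominates, which is why in-depth profiling and careful HW/SW partitioning matter more than raising the accelerator speedup further.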
HIPNOS (app = spacecraft proximity operations)
High Performance Avionics Solution for Advanced and Complex GNC Systems
ESA, program GSP, low TRL
▪ scenario: VBN for e.Deorbit mission
▪ focus: avionics, COTS accelerators
▪ task: study & select best platform
▪ goal: design new avionics architecture + CV algorithm (estimate pose)
➢ accelerate 10x vs conventional
HIPNOS: July 2016 to October 2017
Active Debris Removal missions
▪ chaser autonomously tracks/syncs with uncooperative target → VBN
▪ e.Deorbit (frozen), rendezvous with ENVISAT: 2.5 × 2.5 × 10 m³, 8 tons, 2°/s spin, LEO
very high computational needs!
▪ real-time processing of HD images
▪ 1 Mpix at 5-10 fps at critical stages
▪ 10x more than rovers, + hard limits
▪ increased accuracy, e.g., 1% error
▪ no markers → complex algorithms
▪ but, short LEO mission → COTS?
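The frame-rate requirement can be turned into a per-frame processing budget. The sketch below assumes a streaming accelerator that consumes one pixel per clock cycle at 200 MHz. Both values are assumptions for illustration (the deck mentions 1 datum/cycle pipelining and Fmax>200MHz in other contexts), and the function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def full_frame_passes(pixels, fps, clock_hz, pixels_per_cycle=1.0):
    """How many complete passes over one frame fit in the frame period,
    for a pipeline that consumes `pixels_per_cycle` pixels each clock cycle."""
    return clock_hz * pixels_per_cycle / (pixels * fps)


if __name__ == "__main__":
    PIXELS = 1_000_000      # 1 Mpix image, from the slide
    CLOCK_HZ = 200e6        # assumed accelerator clock
    for fps in (5, 10):
        print(f"{fps} fps: {frame_budget_ms(fps):.0f} ms per frame, "
              f"{full_frame_passes(PIXELS, fps, CLOCK_HZ):.0f} full-image passes")
```

Under these assumptions a single pipeline has room for only a few tens of passes over each image, so algorithms that revisit pixels many times need parallel processing elements to stay within the frame period.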
RESOURCES
• tested on biggest Zynq7000 FPGA (xc7z100-2 of MMP)
• 36% LUTs, 48% DSPs, 77% RAMBs, Fmax>200MHz
  ▪ most demanding is Renderer (94% logic of design)
• power≈4.5W (peak 9W) (CPU@667MHz, PS@200MHz)
• rough estimations for other FPGA devices
• xc7z045/xc7z030 (smaller): maybe feasible, requires much optimization, tolerable penalty in time/accuracy
• zu19eg (big upcoming RT): easy fit, utilization
FROM TRADE-OFF STUDY
• latest space-grade CPUs 10x faster than predecessors, still slow for high-performance VBN (e.g., 0.1x)
• by offering best perf/Watt vs all platforms, FPGAs can bridge the gap with reasonable power (
now using Myriad2 as SoC acceleration platform (new ESA project)
board = EOT (from H2020)
▪ very small, lower power, very few I/Fs (+GPIOs!)
▪ now also being tested for radiation resilience (ESA)
contact
▪ Professor Dimitrios Soudris
▪ Senior researcher George Lentaris
▪ PhD student Kostantinos Maragos
▪ PhD student Ioannis Stratakos
▪ PhD student Ioannis Stamoulias
▪ PhD student Vasileios Leon
links
▪ http://www.microlab.ntua.gr/
▪ https://microlab.ntua.gr/academics/dimitrios-soudris/
▪ http://users.uoa.gr/~glentaris