-
microlab, NTUA, GR
keywords
▪ processing platforms
▪ FPGA acceleration
▪ vision-based navigation
▪ …and more
Prof. Dimitrios Soudris
National Technical Univ. of Athens
-
NTUA, 2019
1) acceleration in space applications
▪ survey & benchmarking of processing platforms
▪ comments based on past experience
2) application: autonomous rover navigation (SPARTAN)
▪ project overview & results, future steps
3) application: spacecraft proximity operations (HIPNOS)
▪ project overview & results, future steps
-
Quick overview of our projects
-
ESA activities
SPARTAN/SEXTANT/COMPASS (2011-2016)
▪ Vision Based Navigation for Mars Rovers, with FPGA co-processor
▪ accelerated multiple odometry and 3D mapping algorithms
HIPNOS (2016-2017)
▪ avionics solutions for high-performance tasks based on SoC-FPGA
▪ demo with pose estimation algorithm for Active Debris Removal
QUEENS (2017-2018)
▪ tool-chain testing, benchmarking, hands-on demos of new EU FPGA
Radiation Testing of COTS FPGAs (2018)
▪ Dec’17 test at CERN (SPS!) for SEE, Napoli (INFN) and ESTEC for TID
Porting of algorithms on new embedded devices (2019)
▪ on Intel/Movidius Myriad2 DSP and on NanoXplore NG-LARGE FPGA
-
group platforms & compare (form clouds of results)
▪ CPU: space-grade and embedded are similar, 1-2 orders of magnitude slower than desktop
▪ GPU: 1-2 orders better performance/Watt than any CPU (vs desktop GPUs, mobile GPUs just trade 1 order of performance for lower power)
▪ multi-DSP: better performance/Watt than mobile GPU
▪ FPGA: highest performance/Watt (by orders of magnitude), almost highest performance
-
• FPGAs = best choice for HW acceleration, next to rad-hard CPU
  • outperform mobile GPUs and DSP multi-cores
  • allow for effective hardening even with COTS versions
• significant acceleration with limited power, e.g., P ≤ 10 Watts
  • 10x at system level (HW/SW) vs latest rad-hard CPUs
  • 100-1000x at function level vs conventional rad-hard CPUs
• necessary for future on-board processing
  • for high rate & high resolution images (e.g., unprepared docking)
  • for increased autonomy and faster exploration (e.g., rovers)
  • offload OBC, do data fusion, other real-time payload processing
• European technology catching up (rad-hard BRAVE FPGAs)
• started examining Myriad2 for space → also very interesting
-
SPARTAN/SEXTANT/COMPASS (app = rover navigation)
-
SPARTAN/SEXTANT/COMPASS projects (ESA)
❑ NTUA (GR), GMV (ES), FORTH (GR), DUTH (GR)
▪ HW/SW co-design of rover navigation algorithms
▪ emulate Martian scenarios and space-grade devices
▪ project (scale) execution time to a 150 MIPS CPU, use limited FPGA resources
▪ synthetic datasets of Mars, real images of Mars-like terrains
✓ “localization” in 1sec, “mapping” in 20sec
April 2011 SPARTAN
(SPAring Robotics Technologies for Autonomous Navigation)
July 2013
April 2012 SEXTANT
(Spartan EXTension Activity – Not Tendered)
May 2014
June 2014 COMPASS
(Code Optimisation Modification Partitioning)
-
[Figure: Martian rover with stereo camera and on-board CPU; outputs are the 3D map and the rover position (at every step). Rovers shown: MER Rovers (2003), Curiosity (2012), ESA ExoMars (2020)]
-
highly complex Computer Vision algorithms + low processing power CPUs (space-grade)
➢ huge execution time, not very practical to use
➢ MER rover: speed only 10 m/h with visual odometry (VO), 124 m/h without!
➢ used only on sand terrains and slopes (high slippage)
future: faster + more accurate (more complex!)
▪ considering our own/proposed CV algorithms:
➢ 1 hour for 3D map on 150 MIPS CPU (budget = 20sec)
➢ 1 minute for 1 step on 150 MIPS CPU (budget = 1sec)
➢ looking for speed-up factors 10x to 1000x
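The gap between measured time and budget can be turned into a required speed-up factor. A minimal sketch using only the figures quoted on this slide; the function and variable names are illustrative, not from the project:

```python
# Required speed-up = time on the 150 MIPS CPU / time budget.
# Figures are the ones quoted on the slide; names are illustrative.

def required_speedup(measured_s: float, budget_s: float) -> float:
    """Factor by which execution must be accelerated to meet the budget."""
    return measured_s / budget_s

CASES = {
    "3D map (mapping)": (3600.0, 20.0),      # 1 hour vs 20 sec budget
    "1 step (localization)": (60.0, 1.0),    # 1 minute vs 1 sec budget
}

for name, (measured_s, budget_s) in CASES.items():
    print(f"{name}: {required_speedup(measured_s, budget_s):.0f}x")
```

Both factors fall inside the 10x to 1000x range mentioned above.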
-
mapping mode (3D reconstruction of scene)
▪ use “navigation” camera (high-definition stereo)
▪ 1m above ground, 20cm baseline (parallel), tilted 39°
▪ generate 3D map of 4m radius (120°) in front of rover
▪ error < 2cm, execution time per map < 20 sec
-
localization mode (6D pose of the rover)
▪ use “localization/hazard-avoidance” camera (stereo)
▪ 30cm above ground, 12cm baseline, tilted 31.55°, FoV 66°
▪ rover stops every ~6cm to acquire new image (1Hz rate)
▪ estimate x-y-z position and pitch-roll-yaw of rover
▪ error < 2m after 100m path (2%, attitude
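The stop-and-go figures above imply a traverse speed and a drift budget. A small sketch, assuming the ~6cm step and the 1Hz rate hold over the whole path (no pauses for mapping); all names are illustrative:

```python
import math

STEP_CM = 6          # rover advances ~6 cm between images
RATE_HZ = 1          # one localization step per second
PATH_CM = 100 * 100  # 100 m path, in cm
MAX_ERROR_M = 2.0    # allowed position error after the path

# Traverse speed implied by one 6 cm step per second.
speed_m_per_h = STEP_CM * RATE_HZ * 3600 / 100

# Number of visual-odometry steps needed to cover the path.
steps = math.ceil(PATH_CM / STEP_CM)

# Position error as a percentage of distance travelled.
drift_percent = MAX_ERROR_M / (PATH_CM / 100) * 100

print(f"speed: {speed_m_per_h:.0f} m/h, steps: {steps}, drift: {drift_percent:.0f}%")
```

Under these assumptions the rover covers 216 m/h, to be compared with the 10 m/h quoted earlier for MER with VO.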
-
generic rover geometry, several CV algorithms, multiple platforms
Xilinx Virtex6 XC6VLX240T
Intel Core 2 Duo
- Executing C algorithms (time scaled to 150 MIPS)
- Calling FPGA accelerators
-
synthetic videos
▪ 3DROV simulator
▪ mix of sand, rocks, diffuse lighting
▪ loc.: 512x384px
▪ map.: 1120x1120px
real videos
▪ Atacama, Chile
▪ Devon, Canada
▪ Thrace, Greece
[Figure: sample frames s1, s2, s3 (synthetic, 3DROV) and r1, r2, r3 (real)]
-
3D reconstruction (mapping)
▪ Disparity, Spacesweep
Visual Odometry (localization)
▪ Feature detection
▪ SURF, Harris, FAST
▪ Feature description
▪ SIFT, SURF, BRIEF
▪ Feature matching
▪ distances L1, L2, χ², Hamming
▪ Filtering and egomotion
▪ absolute orientation, LHM
[Figure: histogram of gradients for image 1 and image 2, matching of histograms]
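The four matching distances listed above can be sketched in a few lines. This is an illustrative software version, not the project's accelerator: the χ² form used here (without a 1/2 factor) is one common convention and is an assumption, as is the brute-force nearest-neighbour search.

```python
import math

def l1(a, b):
    """Sum of absolute differences between two descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chi2(a, b):
    """Chi-square distance for non-negative histograms (empty bins skipped)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors (e.g. BRIEF)."""
    return bin(a ^ b).count("1")

def nearest(query, candidates, dist):
    """Index of the candidate closest to the query (brute-force search)."""
    return min(range(len(candidates)), key=lambda i: dist(query, candidates[i]))

print(nearest([1, 1], [[5, 5], [1, 2], [0, 0]], l1))
```

Hamming distance reduces to an XOR and a bit count, which is one reason binary descriptors map well to hardware.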
-
Mapping Mode Functional Dependency
[Figure: functional dependency tree in four layers (Layer 1 to Layer 4), analysed from coarse grain to fine grain]
▪ Demo mapping
▪ Functional phases: Imaging, 3D Reconstruction
▪ Components: Debayer, Contrast, Rectify, disparity, mapmerge, mapgen
▪ Finer-grain functions: Edge detection, Superimposition, Normalize, Absolute differences, Normalize ADs, Gaussian weight, Aggregation, Minimum disparity search, subpixel interpolation
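The fine-grain chain named in this diagram (absolute differences, aggregation, minimum disparity search, subpixel interpolation) can be illustrated on a single rectified scanline. This is a simplified sketch, not the project's algorithm: it works in 1-D, uses uniform weights instead of the Gaussian weights, and the `penalty` value for out-of-image pixels is an assumption.

```python
def disparity_scanline(left, right, max_disp, half_win, penalty=255):
    """Sub-pixel disparity per pixel of one rectified scanline."""
    width = len(left)
    # 1) absolute differences for every candidate disparity d
    ad = [[abs(left[x] - right[x - d]) if x >= d else penalty
           for x in range(width)]
          for d in range(max_disp + 1)]
    # 2) aggregation: sum costs over a window (uniform weights here)
    agg = [[sum(row[max(0, x - half_win):x + half_win + 1])
            for x in range(width)]
           for row in ad]
    result = []
    for x in range(width):
        costs = [agg[d][x] for d in range(max_disp + 1)]
        # 3) minimum disparity search (winner takes all)
        d = min(range(max_disp + 1), key=costs.__getitem__)
        # 4) sub-pixel interpolation: parabola through the three costs around d
        offset = 0.0
        if 0 < d < max_disp:
            c_minus, c0, c_plus = costs[d - 1], costs[d], costs[d + 1]
            denom = c_minus - 2 * c0 + c_plus
            if denom > 0:
                offset = (c_minus - c_plus) / (2 * denom)
        result.append(d + offset)
    return result

# Synthetic test signal: the left line is the right line shifted by 3 pixels.
right = [x * x for x in range(40)]
left = [0, 0, 0] + right[:-3]
disp = disparity_scanline(left, right, max_disp=6, half_win=2)
print(round(disp[20], 2))
```

Steps 1 and 2 dominate the cost because they run once per candidate disparity, which matches the operation counts given for the mapping components.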
Component              Mult-Div          Add-sub-comp         Typical param.
debayer                6×W×H             4×W×H                W×H=1120×1120
contrast               2×W×H             2×W×H                -
rectify                8×W×H             6×W×H                -
edge detection         2×k²×W×H          2×(k²+20)×W×H        k=13
superimposition        0                 2×W×H                -
normalize              2×W×H             2×W×H                -
absolute differences   0                 6×D×W×H              D=200
normalize ADs          2×D×W×H           2×D×W×H              -
aggregation            2×l²×D×W×H        2×l²×D×W×H           l=19
min disparity search   0                 2×(D+1)×W×H          -
interpolation          W×H               13×W×H               -
map generation         3×W×H             5×W×H                -
map merge              9×W×H             6×W×H                -
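Plugging the typical parameters into these formulas shows where the work concentrates. A sketch that evaluates the per-pixel coefficients of the table; it reads `k2` and `l2` as k² and l² (window areas), which is an interpretation of the flattened slide text:

```python
W, H = 1120, 1120      # image size
K, D, L = 13, 200, 19  # edge window, disparity range, aggregation window

# component -> (mult-div, add-sub-comp) operations per pixel
OPS_PER_PIXEL = {
    "debayer": (6, 4),
    "contrast": (2, 2),
    "rectify": (8, 6),
    "edge detection": (2 * K**2, 2 * (K**2 + 20)),
    "superimposition": (0, 2),
    "normalize": (2, 2),
    "absolute differences": (0, 6 * D),
    "normalize ADs": (2 * D, 2 * D),
    "aggregation": (2 * L**2 * D, 2 * L**2 * D),
    "min disparity search": (0, 2 * (D + 1)),
    "interpolation": (1, 13),
    "map generation": (3, 5),
    "map merge": (9, 6),
}

mult_per_pixel = sum(m for m, _ in OPS_PER_PIXEL.values())
add_per_pixel = sum(a for _, a in OPS_PER_PIXEL.values())
total_ops = (mult_per_pixel + add_per_pixel) * W * H
aggregation_share = sum(OPS_PER_PIXEL["aggregation"]) / (mult_per_pixel + add_per_pixel)

print(f"total operations per map: {total_ops:.2e}")
print(f"aggregation share: {aggregation_share:.1%}")
```

With these parameters aggregation accounts for nearly all operations, so it is the natural first candidate for hardware acceleration.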
-
to FPGA: repetitive & computationally intensive functions
to CPU: high program complexity & lightweight functions
-
Target low-cost implementations
▪ especially w.r.t. memory: bottleneck for CV on FPGA
▪ resource reuse: decompose input data, process successively
Target sufficient speed-up (for ESA specs)
▪ pipelining on pixel-basis
▪ burst read of image, transform on-the-fly (1 datum/cycle)
▪ parallel memories & parallel processing elements
▪ parallel calculation of arithmetic formulas
Target configurability (tuning, adaptation)
▪ parametric VHDL: data size, accuracy, parallelization,
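The idea of transforming a burst-read image on-the-fly with little memory can be mimicked in software. A sketch, assuming a raster-scan pixel stream and a k×k box-sum filter as the example transform (the filter choice and all names are illustrative); it buffers only k image lines instead of the whole frame and does constant work per incoming pixel:

```python
def stream_box_filter(pixels, width, k):
    """Yield (row, col, sum) of each complete k×k window, one pixel at a time."""
    line_buf = [[0] * width for _ in range(k)]  # last k image lines only
    col_sum = [0] * width                       # per-column sum of last k rows
    for index, p in enumerate(pixels):
        row, x = divmod(index, width)
        if x == 0:
            acc = 0                             # window accumulator restarts per line
        slot = row % k
        old = line_buf[slot][x]                 # pixel from k rows ago
        line_buf[slot][x] = p
        col_sum[x] += p - old
        acc += col_sum[x]
        if x >= k:
            acc -= col_sum[x - k]               # column leaving the window
        if row >= k - 1 and x >= k - 1:
            yield row, x, acc

image = list(range(16))                         # 4x4 test frame, values 0..15
windows = list(stream_box_filter(image, width=4, k=2))
print(windows[0], windows[-1])
```

In hardware the line buffers would map to on-chip block RAMs and each loop iteration to one clock cycle, which is the "1 datum/cycle" target.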
-
[Figure: inputs X1, X2, X3, X4 feeding f(X)]
-
multiple accelerators on FPGA
▪ Disparity, Spacesweep
▪ SURF detector, SURF descriptor, SIFT descriptor, Harris, matching
➢ significant speedups 62x – 1111x
multiple CPU-FPGA pipelines
▪ 2 for mapping, 5 for localization
▪ speedup 16x – 444x, meet specs
3D map: accuracy 2cm at 4m depth, 120° FoV, 97% coverage
localization: accuracy 1.3m in 100m paths, and 5° in attitude
-
Localization at 1sec
▪ system speedup = 20x
Mapping at 8.4sec
▪ system speedup = 444x
accelerators’ speedup
xc6vlx240t@172MHz vs. 150 MIPS CPU
▪ SpaceSweep: 637x
▪ Disparity: 120x
▪ Harris detector: 75x
▪ SURF detector: 56x
▪ SIFT descriptor: 100x
▪ SURF descriptor: 84x
▪ SIFT matching: 180x
▪ BRIEF matching: 100x
▪ Communication 81 Mbps
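System-level speedup stays below the accelerators' own speedup because part of each pipeline still runs on the CPU. Amdahl's law gives a rough feel for this; the single-accelerator model and the example fractions below are assumptions for illustration, not measurements from the project:

```python
def system_speedup(accel_fraction: float, accel_speedup: float) -> float:
    """Amdahl's law: overall speedup when a fraction of the time is accelerated."""
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_speedup)

def accelerated_fraction(system: float, accel: float) -> float:
    """Fraction of original time that must be accelerated to reach `system`."""
    return (1.0 - 1.0 / system) / (1.0 - 1.0 / accel)

# Hypothetical example: one 100x accelerator covering 96% of the original time.
print(f"{system_speedup(0.96, 100.0):.1f}x")

# Fraction needed for a 20x system speedup with a 100x accelerator.
print(f"{accelerated_fraction(20.0, 100.0):.3f}")
```

Under this model, a 20x system speedup with a 100x accelerator needs about 96% of the original execution time to sit in the accelerated function; the remaining software and the communication link bound the result.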
-
FPGAs enable more complex/accurate/robust CV algorithms
▪ respecting given time constraints during rover traverse
FPGA acceleration would also enable bigger range for rovers
▪ acceleration = 10x at system level (HW/SW) vs rad-hard CPUs
▪ 100-1000x at function level vs conventional rad-hard CPUs
but, need considerable effort to optimize the design, especially when targeting resource-constrained space-grade devices
▪ efficiency-driven & demanding projects → manual coding
▪ understand the algorithm (in-depth profiling/analysis)
▪ design parallel architectures, think parallel
-
HIPNOS (app = spacecraft proximity operations)
-
High Performance Avionics Solution for Advanced and Complex GNC Systems
ESA, program GSP, low TRL
▪ scenario: VBN for e.Deorbit mission
▪ focus: avionics, COTS accelerators
▪ task: study & select best platform
▪ goal: design new avionics architecture + CV algorithm (estimate pose)
➢ accelerate 10x vs conventional
HIPNOS: July 2016 to October 2017
-
Active Debris Removal missions
▪ chaser autonomously tracks/syncs with uncooperative target → VBN
▪ e.Deorbit (frozen), rendezvous with this target:
very high computational needs!
▪ real-time processing of HD images
▪ 1 Mpix at 5-10 fps at critical stages
▪ 10x more than rovers, + hard limits
▪ increased accuracy, e.g., 1% error
▪ no markers → complex algorithms
▪ but, short LEO mission → COTS?
[Figure: ENVISAT, 2.5x2.5x10 m³, 8 tons, 2°/s spin, LEO]
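The figures on this slide translate into a pixel throughput and a target rotation between consecutive frames. A small sketch using only the numbers above (1 Mpix, 5-10 fps, 2°/s spin); function names are illustrative:

```python
PIXELS_PER_FRAME = 1_000_000   # 1 Mpix images
SPIN_DEG_PER_S = 2.0           # ENVISAT spin rate

def pixel_rate(fps: int) -> int:
    """Pixels per second that the pipeline must sustain."""
    return PIXELS_PER_FRAME * fps

def rotation_per_frame_deg(fps: int) -> float:
    """Target rotation between two consecutive frames, in degrees."""
    return SPIN_DEG_PER_S / fps

for fps in (5, 10):
    print(f"{fps} fps: {pixel_rate(fps) / 1e6:.0f} Mpix/s, "
          f"{rotation_per_frame_deg(fps):.1f} deg between frames")
```

A higher frame rate reduces the motion the pose tracker must bridge between frames, at the price of a proportionally higher processing load.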
-
RESOURCES
• tested on biggest Zynq7000 FPGA (xc7z100-2 of MMP)
• 36% LUTs, 48% DSPs, 77% RAMBs, Fmax>200MHz
▪ most demanding is Renderer (94% logic of design)
• power≈4.5W (peak 9W) (CPU@667MHz, PS@200MHz)
• rough estimations for other FPGA devices
• xc7z045/xc7z030 (smaller): maybe feasible, requires much optimization, tolerable penalty in time/accuracy
• zu19eg (big upcoming RT): easy fit, utilization
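The rough estimations for smaller devices can be approximated by rescaling the measured utilization. This sketch is not the authors' estimation method: the device capacities are assumptions taken from the public Zynq-7000 product overview (to be double-checked against the datasheet), RAMBs are assumed to be 36Kb blocks, and the design is assumed to port unchanged.

```python
# Assumed device capacities (LUTs, DSP slices, 36Kb block RAMs); verify before use.
DEVICES = {
    "xc7z100": {"LUT": 277_400, "DSP": 2_020, "BRAM": 755},
    "xc7z045": {"LUT": 218_600, "DSP": 900, "BRAM": 545},
    "xc7z030": {"LUT": 78_600, "DSP": 400, "BRAM": 265},
}

# Utilization measured on the xc7z100, as quoted on the slide.
UTILIZATION = {"LUT": 0.36, "DSP": 0.48, "BRAM": 0.77}

# Absolute resources used by the design on the xc7z100.
USED = {r: UTILIZATION[r] * DEVICES["xc7z100"][r] for r in UTILIZATION}

def projected(device: str) -> dict:
    """Projected utilization on another device if the design is ported as-is."""
    return {r: USED[r] / DEVICES[device][r] for r in USED}

for name in DEVICES:
    report = ", ".join(f"{r} {u:.0%}" for r, u in projected(name).items())
    print(f"{name}: {report}")
```

Any value above 100% means the unmodified design would not fit, consistent with the note that the smaller devices require much optimization.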
-
FROM TRADE-OFF STUDY
• latest space-grade CPUs 10x faster than predecessors, still slow for high-performance VBN (e.g., 0.1x)
• by offering best perf/Watt vs all platforms, FPGAs can bridge the gap with reasonable power (
-
now using Myriad2 as SoC acceleration platform (new ESA project)
board = EOT (from H2020)
▪ very small, lower power, very few I/Fs (+GPIOs!)
▪ now also being tested for radiation resilience (ESA)
-
contact
▪ Professor Dimitrios Soudris [email protected]
▪ Senior researcher George Lentaris [email protected]
▪ PhD student Kostantinos Maragos [email protected]
▪ PhD student Ioannis Stratakos [email protected]
▪ PhD student Ioannis Stamoulias [email protected]
▪ PhD student Vasileios Leon [email protected]
links
▪ http://www.microlab.ntua.gr/
▪ https://microlab.ntua.gr/academics/dimitrios-soudris/
▪ http://users.uoa.gr/~glentaris