DC8251 - SYNTHETIC DATA FOR TRAINING DEEP LEARNING...
NON-EXPORT CONTROLLED
THESE ITEM(S) / DATA HAVE BEEN REVIEWED IN ACCORDANCE WITH THE
INTERNATIONAL TRAFFIC IN ARMS REGULATIONS (ITAR), 22 CFR PART
120.11, AND THE EXPORT ADMINISTRATION REGULATIONS (EAR), 15 CFR
734(3)(b)(3), AND MAY BE RELEASED WITHOUT EXPORT RESTRICTIONS.
HARRIS.COM | #HARRISCORP
UNCLASSIFIED
DC8251 - SYNTHETIC DATA FOR TRAINING DEEP LEARNING REMOTE SENSING ALGORITHMS
WILL RORRER, PRODUCT MANAGER
Nvidia GPU Technology Conference – 22 – 24 Oct 2018
| 2DC8251 - Synthetic Label Data for Training
Deep Learning Remote Sensing AlgorithmsNON-Export Controlled Information UNCLASSIFIED
Agenda
• Background
• Humanitarian Assistance and Disaster Relief (HADR) Needs
• Harris Deep Learning
• The Label Data Burden
• Synthetic Training Data
“The GEOINT discipline has grown beyond the
limits of human interpretation and
explanation. The explosion of available data
diminishes the comparative advantage collection
provides. Instead, automated processing,
advancing tradecraft, human-machine
collaboration, and the ability to anticipate
behaviors will provide us a new advantage.”
Robert Cardillo,
Director of NGA
“We’re going to find ourselves in the not too
distant future swimming in sensors and drowning
in data”
Lt. Gen. David A. Deptula,
USAF Deputy Chief of Staff for ISR, 2010
“The skies will ‘darken’ with the hundreds of small
satellites to be launched by U.S. companies and
as procedures are developed to allow safe
operation of unmanned aerial vehicles in civil
airspace,”
Robert Cardillo,
Director – NGA 2015
“So just how big is this rising tide? If we were to attempt to manually exploit the
commercial satellite imagery we expect to have over the next 20 years, we would
need eight million imagery analysts. Even now, every day in just one combat
theater with a single sensor, we collect the data equivalent of three NFL
seasons – every game. In high definition!
Imagine a coach trying to understand the strategy of his opponents by watching
every play made by every team in every game for three seasons – all in one single
day. Because three more seasons will be coming tomorrow. That’s what we ask
our analysts to do – when we don’t augment them with automation. But with all this
data – and dramatic improvements in computing power – we have a phenomenal
opportunity to do and achieve even more.”
Robert Cardillo,
Director – NGA 2017
A call to action: the urgency behind the adoption of AI for remote sensing
14M training images
1,000 object categories
A call to action: the urgency behind the adoption of AI for remote sensing
Establishment of the Joint AI Center – 27 June 2018
Key Points:
• Chartered by Deputy Secretary of Defense Patrick Shanahan
• “Overarching goal of accelerating delivery of AI-enabled capabilities, scaling the Department-wide impact of AI, and synchronizing DoD AI activities to expand Joint Force advantages”
• Achieve goals by guiding National Mission Initiatives (NMIs)
• Humanitarian Assistance and Disaster Relief Mission
• Potential benefits:
• Detect emerging disasters
• Improve response
• Quantify impact
• Save lives
• Possible application:
• Automated satellite & airborne imagery analysis
JAIC National Mission Initiative: Developing and Applying AI for HADR
Image panels: Hurricane, Wildfire, Flood, Earthquake
Natural Disaster Statistics
NOAA National Centers for Environmental Information (NCEI) U.S. Billion-Dollar
Weather and Climate Disasters (2018). https://www.ncdc.noaa.gov/billions/
• 238 $1B+ natural disaster events from 1980 – 2018 totaling $1.5T+
• 11 separate $1B+ events impacted US Jan – Sept 2018
• Large scale events requiring large scale response
Examples of Harris Applications of Deep Learning
Analytic pipeline and label data paradox
Pipeline stages: Model Governance; Model Application; Model Refinement; Manage Observations; Higher Order Sense Making
Increased Volume and Usage
• Traditional Hand-Constructed Algorithms / Analytics: Expert Intensive & Mediocre Accuracy
• Basic Computer Vision Algorithms / Analytics: Expert Intensive & Some Accuracy Improvement
• Supervised Deep Learning Algorithms / Analytics: Less Expert Intensive & Large Accuracy Improvement, BUT Label Data Hungry
• Unsupervised Deep Learning Algorithms / Analytics: Technology Not Mature; Goal is Zero Label Data; Still Data Hungry
Supervised deep learning based algorithms represent the state of the art and are ready for widespread adoption IF the label data burden can be overcome
Traditional approaches for handling the label data burden
Manual Harvesting of Label Data (Individual or Crowdsourced)
Image panels: Positives, Negatives
Traditional approaches for handling the label data burden
Group Random Chips by Semantic Similarities
CURATE
Traditional approaches for handling the label data burden
Public data sets
• Natural Imagery:
‒ Common Objects in Context (COCO) http://cocodataset.org/
‒ Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) http://host.robots.ox.ac.uk/pascal/VOC/index.html
‒ ImageNet http://www.image-net.org/
• Overhead Imagery:
‒ Cars Overhead with Context (COWC) https://gdo152.llnl.gov/cowc/
‒ SpaceNet https://wwwtc.wpengine.com/spacenet
‒ xView http://xviewdataset.org/
xView Dataset
Pros and cons of traditional label data approaches
Method: Public Datasets
Pros:
• Minimal upfront work to begin training on classes
• Starting point for transfer learning
Cons:
• Limited to datatypes, classes, and conditions included in the dataset
Method: Grouping of Random Chips
Pros:
• Generate large number of coarsely labeled chips quickly
• Starting point for transfer learning
Cons:
• Requires significant manual curation after grouping
• Limited to classes and conditions present in the data
Method: Targeted Manual Labeling
Pros:
• Label by label human-level accuracy
• ‘Scalable’ with crowdsourcing
• Starting point for transfer learning
Cons:
• Time consuming
• Limited to classes and conditions present in the data
Motivations for a different label data source
Synthetic Label Data
• Algorithm Robustness
• Rare Events
• Rapid Algorithm Development
• Chain of Custody
https://www.wired.com/story/machine-learning-backdoors/
Motivation: Algorithm Robustness
What makes a “good” deep learning algorithm?
Answer: “Good” Labeled Data
To define what a ‘good’ label dataset is, first define how the desired algorithm is expected to be used
An example: the ubiquitous ‘Airplane Finder’
• If the algorithm is only expected to be applied to a very narrow distribution of images to make detections, a relatively narrow distribution of labeled training data is needed
• HOWEVER, if the algorithm is expected to be applied to a very wide distribution of images to make detections, a robust distribution of labeled training data is needed
A = brittle, B = brittle, C = robust = valuable
Algorithm robustness is largely driven by training label data robustness
Variations to Consider – Collection Geometry
Angles: These identify the angle at which the sensor is imaging the ground, as well as the angular location of the sun with respect to the ground and image. These features can be added without preprocessing. The following angles are provided:
• Off-nadir Angle: Angle in degrees (0-90°) between the point on the ground directly below the sensor and the center of the image swath.
• Target Azimuth: Angle in degrees (0-360°) of clockwise rotation off north to the image swath’s major axis.
• Sun Azimuth: Angle in degrees (0-360°) of clockwise rotation off north to the sun.
• Sun Elevation: Angle in degrees (0-90°) of elevation, measured from the horizontal, to the sun.
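As a rough illustration of how the four angles above might be carried as metadata alongside each training image, the sketch below defines a record that checks the stated ranges. The class name `CollectionGeometry`, its field names, and the `shadow_length` helper are assumptions made for illustration; none of them come from the presentation or from any particular dataset API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionGeometry:
    """Hypothetical per-image angle metadata, all values in degrees."""
    off_nadir: float       # 0-90
    target_azimuth: float  # 0-360, clockwise from north
    sun_azimuth: float     # 0-360, clockwise from north
    sun_elevation: float   # 0-90, measured from the horizontal

    def __post_init__(self):
        # Reject values outside the ranges given in the angle definitions.
        checks = [
            ("off_nadir", self.off_nadir, 90.0),
            ("target_azimuth", self.target_azimuth, 360.0),
            ("sun_azimuth", self.sun_azimuth, 360.0),
            ("sun_elevation", self.sun_elevation, 90.0),
        ]
        for name, value, upper in checks:
            if not 0.0 <= value <= upper:
                raise ValueError(f"{name}={value} outside 0-{upper:g} degrees")

    def shadow_length(self, object_height: float) -> float:
        """Shadow length on flat ground for an object of the given height."""
        if self.sun_elevation == 0.0:
            return math.inf
        return object_height / math.tan(math.radians(self.sun_elevation))
```

The shadow helper is included only to show why sun elevation matters for appearance: lower sun elevations produce longer shadows, which changes what a detector sees for the same object.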
High Off Nadir Image Examples
Image panels, off nadir angles: 51°, 32.7°, 34.5°, 61°, 61°
Increased deep learning algorithm robustness requires
exposure to a wide range of collection conditions
Variations to Consider – Many Different Sensor Models
The democratization of space
• Many new sensors flying – offering much more persistent coverage
• However, this results in many different sensor models each with their own characteristics
• To make deep learning algorithms robust, they will need exposure to these varieties of sensor models
Increased deep learning algorithm
robustness requires exposure to or ability
to quickly adapt to multiple sensor models
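One simple way to expose an algorithm to more than one sensor sampling, offered here only as a sketch, is to resample an existing image chip to a coarser ground sample distance. The example below block-averages a single-band chip by an integer factor. The function name `block_average` and the list-of-lists image representation are assumptions for illustration; a physics-based sensor model would also account for MTF, noise, and radiometry, which this sketch ignores.

```python
def block_average(chip, factor):
    """Downsample a 2D list of pixel values by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    rows = len(chip) // factor
    cols = (len(chip[0]) // factor) if chip else 0
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0.0
            for dr in range(factor):
                for dc in range(factor):
                    total += chip[r * factor + dr][c * factor + dc]
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out


# A 4x4 chip resampled to 2x2, as if collected at twice the ground sample distance.
chip = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
coarse = block_average(chip, 2)
```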
Algorithm Robustness: Example of real sensor and collection geometry variation
Figure: histogram titled “AOI Off Nadir Angle”. Horizontal axis: Off Nadir Angle, bins labeled 0.0 to 30.0 in steps of 2.5, plus “More”. Vertical axis ticks: 0 to 350.
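A histogram like this one can be produced from per-image metadata with a few lines of code. The sketch below bins off nadir angles into 2.5 degree bins from 0.0 to 30.0 with an overflow bin labeled "More". It assumes the common spreadsheet convention in which each bin is labeled by its upper edge; the function name `off_nadir_histogram` and that binning convention are assumptions, not details taken from the slide.

```python
def off_nadir_histogram(angles, bin_width=2.5, max_edge=30.0):
    """Count angles per bin, each bin labeled by its upper edge.

    A value equal to an edge is counted in that edge's bin. Values above
    max_edge are counted under "More".
    """
    n_bins = int(round(max_edge / bin_width))
    edges = [i * bin_width for i in range(n_bins + 1)]
    counts = {f"{e:.1f}": 0 for e in edges}
    counts["More"] = 0
    for angle in angles:
        if angle < 0:
            raise ValueError("off nadir angle must be non-negative")
        for e in edges:
            if angle <= e:
                counts[f"{e:.1f}"] += 1
                break
        else:
            # No edge was large enough, so the value falls in the overflow bin.
            counts["More"] += 1
    return counts
```

Comparing such a histogram for the training set against one for the imagery the algorithm will actually be applied to is one way to spot collection conditions the training data does not cover.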
Why is so much label data needed?
Figure: Training Data Variation compared with Variation in Data to be Analyzed. Each side spans Target Variation, Background Variation, Collection Variation, and Sensor Variation.
• Single / Very Few Labeled Data: Brittle Algorithm Performance Window
• Larger Collection of Manual / Crowd Sourced Labeled Data: Broader Algorithm Performance Window
• Variation in Training Data = Variation in Data to be Analyzed: Robust Algorithm Performance Window
A new approach to label data
Prior to the launch of a space-based imaging system, Harris generates imagery that simulates what the sensor will produce in operation
Harris has a decades-long legacy of providing high-fidelity, physics-based, radiometrically correct remote sensing modelling and simulation services
Harris’ work to scale deep learning for defense source information – synthetic label data generation
• 100% of training data synthesized using CAD models and Scene Simulator
• The trained model is applied to real imagery
• Successful detector produced for fighter jets in WV-2 Pan imagery
• Limiting factors: (1) content of scene generator and (2) quality of simulation
DIRSIG
Image panels:
• 10 am, 7 degree look angle, Jan 1, Scene Azimuth 0
• 2 pm, 7 degree look angle, Jan 1, Scene Azimuth 225
• 6 CAD models used
• Objects placed in scene at various geometries
• Heat Map for fighter jets in IKONOS Pan Imagery
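Because a synthetic scene is built from known object placements, the label for each object can in principle be derived from the placement itself rather than drawn by hand. The sketch below computes an axis-aligned bounding box for a rectangular object footprint rotated to a given heading. The function name `footprint_bbox` is hypothetical, the object is treated as a flat rectangle in image pixel coordinates, and perspective and off nadir effects are ignored; this is not the DIRSIG truth-generation mechanism.

```python
import math


def footprint_bbox(cx, cy, length, width, heading_deg):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a rotated rectangle.

    (cx, cy) is the footprint centre in pixels; heading_deg rotates the
    length axis away from the x axis.
    """
    theta = math.radians(heading_deg)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    # Half extents of the rotated rectangle projected onto the image axes.
    half_w = (length * cos_t + width * sin_t) / 2.0
    half_h = (length * sin_t + width * cos_t) / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# One object placed at three headings yields three labels with no manual work.
labels = [footprint_bbox(100.0, 100.0, 20.0, 10.0, h) for h in (0, 45, 90)]
```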
Automated synthetic labeled training data vision
Scene Modeling Collection Modeling
Automated Data Generation Workflow
Order of Battle
• Air
• Ground
• Naval
• Urban
Background Materials
• Concrete
• Asphalt
• Crushed Stone
• Dirt
• Vegetation
• Metal
• Plastic
• Glass
• Sand
Target Classes
• Planes
• Vehicles
• Ships
• People
• Facilities
Target Types
• Commercial
• Consumer
• Military
Target Configurations
• Open / Obscured
• Orientation
Scenarios
• Formations
• Specific Routes
• Dynamics
Atmosphere
• Tropical
• Desert
• Clouds
• Sun Conditions
Platform / Sensor Type
• Array Size
• Bandpass
• Sampling
• Scan Type
Platform Motion
Scene Location
Truth Generation
Sensor Modeling
Noise
MTF
• Optics
• Detector
Exposure
• Integration Time
Sensor Artifacts
• Failed Detectors
• Non-Uniformity
Ground Processing
• DRA
• Sharpening
• Registration Effects
• GANs
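To make the idea of an automated data generation workflow concrete, the sketch below draws random scene configurations from a few of the parameter lists above. It is only an illustration of sampling a parameter space: the dictionary keys, the numeric ranges, and the function names are assumptions, and nothing here calls a real scene simulator.

```python
import random

# Option lists copied from the slide. The keys and the numeric ranges used
# below are assumptions for illustration, not a real simulator interface.
SCENE_OPTIONS = {
    "target_class": ["Planes", "Vehicles", "Ships", "People", "Facilities"],
    "target_type": ["Commercial", "Consumer", "Military"],
    "background_material": [
        "Concrete", "Asphalt", "Crushed Stone", "Dirt", "Vegetation",
        "Metal", "Plastic", "Glass", "Sand",
    ],
    "atmosphere": ["Tropical", "Desert"],
    "obscured": [False, True],
}


def sample_scene_config(rng):
    """Draw one scene configuration using the supplied random generator."""
    config = {key: rng.choice(values) for key, values in SCENE_OPTIONS.items()}
    config["orientation_deg"] = rng.uniform(0.0, 360.0)
    config["sun_elevation_deg"] = rng.uniform(5.0, 90.0)
    return config


def sample_batch(count, seed=0):
    """Draw a reproducible batch of configurations."""
    rng = random.Random(seed)
    return [sample_scene_config(rng) for _ in range(count)]
```

Seeding the generator keeps a batch reproducible, so a given set of synthetic images can be regenerated later from the same configuration list.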
Pipeline for rapid build of new deep learning algorithms
Pipeline steps:
1 Select Target of Interest
2 Synthesize Training Data
3 Manage Data
4 Train DL Algorithm
5 Model Governance
6 Apply Model
7 Refine Model
8 Manage Observations
9 Higher Order Sense Making
Tools shown: Hydra, Deep Learning Frameworks, LYNX, DIRSIG
Step details (in slide order):
• CAD models of target of interest
• RIT DIRSIG
• Harris LYNX
• Scene generation
• Object insertion
• Augmentation
• Output physics-based synthetic training images
• Label data from movers
• System that ingests and manages all the training data in a form that DL algorithms can access
• Positives
• Negatives
• Hard Positives
• Hard Negatives
• Data Curation
• Selected framework on backend
• GSF web interface to execute training
• Training results presented
• Time to train presented
• Load newly trained model into algorithm marketplace and register it with algorithm governance
• Multiple algorithms registered, Harris-made as well as 3rd party
• Using Hydra/DAGR, imagery is passed to the model for detections to be made
• Using DAGR, demo the ability to evaluate true/false positives and true/false negatives
• Understanding information from movement
• Modify training set
• Update curation
• Data curation
• Observations managed by Hydra / DAGR
• Activity pattern recognition based on movement alone
• Correlation of PIA info
• Correlation of other INTs (SIGINT)
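The refinement step mentions evaluating true and false positives and negatives. A minimal sketch of how such counts are commonly summarized is shown below, using the standard definitions of precision, recall, and F1. The function name `detection_metrics` is an assumption; the presentation does not say which metrics are actually reported. True negatives are left out because these three metrics do not use them.

```python
def detection_metrics(true_pos, false_pos, false_neg):
    """Precision, recall, and F1 from detection counts.

    Each ratio is reported as 0.0 when its denominator is zero.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Tracking these values before and after each change to the training set gives a simple signal of whether a refinement helped.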
Test scenario for synthetic pipeline buildout
1 – Synthetic training data pipeline refinement
• Workflow focused
• Interfaces
• Usability
• Scalability
• Highside / Lowside
2 – Performance characterization
• Establish which variations have the biggest and least impact on CNN performance
• Leverage benchmark CNNs trained on ‘real’ data to compare against the performance of CNNs trained on synthetic data
• Tune synthetic pipeline accordingly
3 – 3rd party evaluation
• Comparison of different neural net architectures on performance when trained on synthetic data
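One possible way to approach the performance characterization item, sketched here under assumptions, is an ablation-style comparison: train one model with all variations present, then one model per variation with that variation held fixed, and rank the variations by how much the evaluation score drops. The function name `rank_variation_impact` and the ablation design itself are illustrative assumptions, not the method described in the presentation.

```python
def rank_variation_impact(baseline_score, ablated_scores):
    """Rank variation factors by how much the score drops without them.

    baseline_score: metric for a model trained with all variations present.
    ablated_scores: mapping of factor name to the metric for a model trained
        with that one variation held fixed.
    Returns (factor, drop) pairs, largest drop first.
    """
    drops = {
        name: baseline_score - score for name, score in ablated_scores.items()
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)
```

The factors at the top of the returned list are the ones where additional synthetic variation is most likely to pay off, which is one way to decide how to tune the pipeline.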
On-Going R&D, Next Steps
Will Rorrer
Machine Learning Product Manager
571-550-0580
Trademarks are registered marks of their respective companies.