AIAA Conference - Big Data Session_ Final - Jan 2016

25
L A N G L E Y R E S E A R C H C E N T E R Big Data Analytics and Machine Intelligence Capability Overview Manjula Ambur and Data Analytics Team Presentation at AIAA SciTech Conference January 2016

Transcript of AIAA Conference - Big Data Session_ Final - Jan 2016

Page 1: AIAA Conference - Big Data Session_ Final - Jan 2016

L A N G L E Y R E S E A R C H C E N T E R

Big Data Analytics and Machine Intelligence Capability Overview

Manjula Ambur and Data Analytics Team

Presentation at AIAA SciTech Conference January 2016

Page 2: AIAA Conference - Big Data Session_ Final - Jan 2016

NASA Langley Core Technical AreasAerosciences

Atmospheric Characterization

Entry, Descent & Landing

Intelligent Flight Systems

Measurement Systems

Systems Analysis & Concepts

Advanced Materials & Structural Systems

Page 3: AIAA Conference - Big Data Session_ Final - Jan 2016

Comprehensive Digital Transformation

Modeling & Simulation• Physics-based understanding and simulation – Improved

discipline tools • Integrated analysis and design of complex systems• Optimally combine testing and M&S

Open ideation and innovation Focused, relevant research Intelligent and rapid system designs Agile response to emerging missions

Advanced IT• Open, secure collaboration with NASA & partners• Networks handle burgeoning data• Data governance, architecture, and management

High Performance Computing• Next generation code development• Rapid Compute power for M&S and BDA&MI• Architecture for real-time analysis and design

Data Analytics & Machine Intelligence• Rapid synthesis and digestion of global scientific information for

knowledge extraction, insights and answers• Mining of diverse computational and experimental data sets for new

correlations, discoveries and advanced designs• Virtual/Digital Experts – Human Machine symbiosis

External collaboration is of paramount importance

Enable Innovative Solutions to Complex NASA Challenges in Aeronautics, Exploration, and Science

Page 4: AIAA Conference - Big Data Session_ Final - Jan 2016

Big Data Analytics and Machine Intelligence Vision: Virtual Research and Design Partner

Page 5: AIAA Conference - Big Data Session_ Final - Jan 2016

Projects and Pilots Towards Virtual Partners

Page 6: AIAA Conference - Big Data Session_ Final - Jan 2016

Two Key Areas for Virtual Partner – Data Intensive Scientific Discovery

Deriving new insights, correlations, and discoveries not otherwise possible from our diverse experimental and computational data sets

The Fourth Paradigm

Projects & Pilots ( cuts across Technical Areas)• Anomaly Detection in the Nondestructive Evaluation images of Materials• Predicting Flutter from Aeroelasticity Data• Cognitive Assessment of Crew/Pilot State• Knowledge Bot for Complex Simulation Software Optimization• Rapid Exploration of Aerospace Designs• Entry, Descent, and Landing Trajectory Data Analysis

The variety of techniques used in these projects and pilots represents a cross-cutting approach to solving complex, physics-based problems in multiple

aerospace domains with diverse datasets -

Page 7: AIAA Conference - Big Data Session_ Final - Jan 2016

Anomaly Detection in the Non-Destructive Evaluation Images of Materials

Predicting Flutter from Aeroelasticity Data

Develop techniques and algorithms to automatically detect anomalies during the nondestructive evaluation of materials

Goals• Significantly reduce SME analysis time and help experts discover

additional anomalies • Help to design better material compositions and structures

Techniques• Two-Dimensional Regression designed to detect anomalous pixels • Convolutional Neural Networks to classify the image data

Accomplishments & Next Steps• Algorithms are validated with real data sets and being enhanced • Deliver a tool with a good UI for SMEs to use as an ‘Assistant’ for anomaly

detection of composite materials analysis in March

Develop methods to automatically detect the onset of flutter during wind tunnel testing

Goals• Find new ways of predicting flutter in the time domain• Identify non-traditional predictor variables and unseen patterns• Better understand precursors to flutter and improve configurations

Techniques• Piecewise Regression to locate and track structural modes & coalescence• Time Series Motifs to identify signatures in the data that could represent

precursors to flutter

Accomplishments & Next Steps• Peak detection tested with multiple datasets• Several significant time series motifs detected• Generating synthetic data for validation of algorithms

Data Intensive Scientific Discovery Projects - 1

Page 8: AIAA Conference - Big Data Session_ Final - Jan 2016

Pilot Cognitive State Assessment Rapid Exploration of Aerospace Designs

Build classification models for predicting cognitive state using physiological data collected during flight simulations

Goals• Identify unsafe cognitive states in aircrew real-time• Apply results for more effective pilot training

Techniques• Ensemble of machine learning tools (deep neural network, gradient

boosting, random forest, support vector machine, decision tree)• Data pre-processing using detrending and power spectral density

Accomplishments & Next Steps• Initial data mapping, statistical analysis, and signals processing• Support classification efforts on single modalities• Explore combining multiple signal models using ensembling

Develop a generalized machine learning platform to be used for analyzing mod-sim data for design optimization

Goals• Provide surrogate modeling to explore the trade space of aerospace

vehicle designs with easy to use web interface• Use fast machine learning models instead of computationally-

intensive code for rapid exploration and optimization

Techniques• Supervised machine learning algorithms, SVM and Neural Networks,

will be trained on labeled data

Accomplishments & Next Steps• Python 2.7 with SKLearn algorithms are being used • Windows Server 2012 with PHP set up for web interface

Data Intensive Scientific Discovery Projects - 2

Page 9: AIAA Conference - Big Data Session_ Final - Jan 2016

Current State

SME pre-selects data to be analyzed and analyzes relying on traditional methods; Requires expertise and is time-consuming

Being Developed

Long term Vision

Algorithms that mimic SME knowledge:• Validate the algorithm • Save SME time

Application of algorithms to entire dataset and to other legacy datasets

Yields New Insights

Virtual Expert

Autonomous Assistant to SME that analyzes all possible data and augments decision making

Aerospace Data AnalyticsChallenge of Physics-Based Algorithms

All Data

SME-defined subset of data for analysis

Being Developed

Data Mining techniques to detect patterns and correlations which will be

validated by SMEs

Data Analytics Team and

SMEs working together

Page 10: AIAA Conference - Big Data Session_ Final - Jan 2016

Two Key Areas for Virtual Partner - Knowledge Analytics

Obtaining insights, identifying trends, aiding in discovery, and finding answers to specific questions by mining knowledge from scholarly,

web, and multimedia contentCognitive Computing

Knowledge Assistants Using Watson Content Analytics

Aerospace Innovation Advisor POCUsing Watson Discovery Advisor

• Carbon Nanotubes Research• Autonomous Flight Research • Space Radiation Research

Example Topics:• Hybrid Electric Propulsion • On Demand Mobility

Cognitive-based systems are able to build knowledge and learn, through understanding natural language, to reason and interact more naturally with human beings than traditional systems. They are also able to put content into context with confidence-weighted responses and

supporting evidence. Uses Natural language processing, machine learning and speech recognition technologies.

Page 11: AIAA Conference - Big Data Session_ Final - Jan 2016

• Digest and analyze thousands of articles without reading with ability to dissect the content interactively

• Automatically identify sub set of documents from large corpus and provide brief summaries

• Provide a means to rapidly identify trends, and connections

• Identify experts and connections among them at all levels and affiliations

• Explore technology gaps that could be leveraged

• Replace traditional methods of SMEs manually reviewing and tracking research

• Help to identify cross-domain leverages and research

Autonomous Flight

Carbon Nanotubes Research

Space Radiation Research

Knowledge Assistants Using Watson Content Analytics

130,000 articles metadata of ~ 20 year literature analyzedIdentified experts, trends, insights, and connectionsBuying scholarly content is a challenge

4000 metadata and full text articles analyzedIntegrate analysis of scholarly and informal webcontent to identify experts and new partnerships

1000 metadata and full text articles analyzed. Usingidentified possible duplications, connections andtechnology gaps. In the process of analyzing all sixelements of Human Research Program

Key Capabilities

Successfully demonstrated value and developed robust expertise; Buying licensed content is a challenge Being expanded to analyze corpus topics as Systems Design, Uncertainty Quantification and Entry Descent Landing

Page 12: AIAA Conference - Big Data Session_ Final - Jan 2016

Cognitive Computing : Systematic and repeatable approach to learning

Understand scientific and domain language

Adapt and learn quickly from inquires, results, selections and iteration

Compose and visualize

information at large

…built on a massively parallel Big data scalable architecture

Domain content Extraction at scale Visualize at scale Discover at scale Learn with speed

Generate new hypothesis and

discoveries

Page 13: AIAA Conference - Big Data Session_ Final - Jan 2016

Watson Discovery Advisor

Accelerate the discovery of new insights by synthesizing information in seconds

• Take advantage of massive sources of data

• Find answers to questions that have not been asked yet or answered before

• Find insights into hidden relationships and dig deeper

• Generate leads to hard questions and provide evidence to substantiate new claims

• Being used in medicine both as diagnostics/treatment and research advisors

Cognitive Technologies for Aerospace Engineering

Apply cognitive computing technologies that ‘understand’ massive amounts of information and enhance experts abilities

Starting to explore application to our domains: key challenges are adoption to engineering and licensing the scholarly content

Aerospace Innovation Advisor

Proof of Concept (March – June 16)Example Topics: Hybrid Electric Propulsion;

On Demand Mobility

Ames Research Center: Aircraft Dispatcher Assistant: Feasibility Study

Armstrong Flight Research Center: Pilot Assistant: Investigation

Johnson Space Center: Astronaut Health Investigations

LaRC is connected with all of these efforts

Page 14: AIAA Conference - Big Data Session_ Final - Jan 2016

Algorithms and Software

Page 15: AIAA Conference - Big Data Session_ Final - Jan 2016

Linear Regression

Application 1: Non-Destructive Evaluation (NDE) Image Analysis

Goal: Automate delamination detection

Method: Fit data with linear regression and detect outlier regions. Regression performed on 1D and 2D signals; Using C++ code and R

Application 2: Aeroelastic Flutter Data Analytics

Goal: Detect precursors and onset of aeroelastic flutter

Method: Fit best quadratics between structural modes to detect mode coalescence; Using MATLAB

Top: Linear regression of 1D-signals for anomaly detection in carbon fiber; Bottom: Mode identification in flutter time-series data using linear regression

Page 16: AIAA Conference - Big Data Session_ Final - Jan 2016

Gaussian Process

Application: Knowledge Bot for Optimizing Complex Simulation Software

Goal: Emulate simulation to predict convergence divergence

Method: Gaussian Process for emulator and to find next best point to maximum knowledge; Using Python

Justification: Approach followed in literature for weather simulation emulation

Create Initial Points

Evaluate Point(s) on Simulator(FUN3D)

Create/Update Emulator

Find Next Best Point

Methodology

Finding boundary of two circles; ~ 99% accuracy

Fun 3D; ~83% accuracy

Page 17: AIAA Conference - Big Data Session_ Final - Jan 2016

Time Series Motifs

Application: Pattern Mining of Time Domain Aeroelastic Flutter Data Goal: Identify flutter precursors to:• Create a dictionary of motifs for a given configuration• Classify data for use with machine learning

algorithms that will support a real-time ‘Flutter Assistant’

Method: Application of the Motif Enumeration using (MOEN) open source algorithm created by Dr. Abdullah Mueen and MATLAB

Justification: MOEN has been successfully applied to research problems in other scientific domains including robotics, biology, and seismology

In order to detect motifs across the various sensor signals, a given sensor’s output (Signal A) is compared to another sensor (Signal B) by creating a composite signal (Signal A/B).

The algorithm is then applied to the composite signal to detect the motifs (above right) common to both sensors. Significant motifs are identified by a physics-based selection process and then validated by SMEs.

Scott, Robert C., et al. "Aeroservoelastic Wind-Tunnel Test of the SUGAR Truss Braced Wing Wind-Tunnel Model." 56th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference. 2015.

Page 18: AIAA Conference - Big Data Session_ Final - Jan 2016

Deep Learning: Convolutional Neural Network (CNN)

Application: NDE Image Analysis to Segment Delaminations

Method: Convolutional encoder/decoder neural network; end-to-end training to map raw data to segmentation; Using Caffe and Lua/Torch

Justification: Very successful in medical image analysis such as wound segmentation (top right)

Results on Simulated Data

Results on Experimental Data

From Wang, Changhan, et al. "A unified framework for automatic wound segmentation and analysis with deep convolutional neural networks." Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE. IEEE, 2015.

Page 19: AIAA Conference - Big Data Session_ Final - Jan 2016

Artificial neural networks (ANN)Application 1: Crew State Monitoring

Goal: Build classification models capable of accurate, real-time prediction of aircrew cognitive state using physio data collected during flight simulations

Method: ANN trained to classify cognitive state

Application 2: Rapid Exploration of Aerospace Designs (READ)

Goal: Build classification / regression models on user-uploaded simulated data

Method: train ANNs on labeled data, use trained models for prediction and visualization

EEG ECG Galvanic Skin ResponseRespiration Rate Eye Tracking

Feature Generation

Input Layer

Hidden Layer

Output Layer / Classification

“Normal” State

ChannelizedAttention

DivertedAttention

Startle /Surprise

Page 20: AIAA Conference - Big Data Session_ Final - Jan 2016

Ensemble of Machine Learning TechniquesApplication 1: Non-Destructive Evaluation (NDE) Image Analysis

Goal: Automate delamination detection

Method: Combine several machine learning models into overall prediction using regression to determine if sample contains a delamination; Using Python/Scikit-Learn

Application 2: Pilot Cognitive State Assessment

Goal: Build classification models capable of real-time prediction of pilot cognitive state using physiological data collected during flight simulations

Method: Utilize 2-level Meta Model combining multiple classification algorithms to improve classification accuracy; Using Theano and Python

Random forests Extremely random forests Ada Boost

Gradient boosting

k – nearest neighbors

Fully grow k independent classification trees and combine predictions

Similar to random forests but split per node is also randomized

Fit consecutive weak learners based on classification tree stumps and combine predictions

Similar to Ada Boost with different loss function

Identify k closed points to test sample based on distance metric

EEG ECG Galvanic Skin ResponseRespiration Rate Eye Tracking

Feature Generation

Artificial Neural Network Gradient Boost Classifier Random Forest

Level 1 Models

Artificial Neural NetworkLevel 2 Meta Model

“Normal” State Channelized Attn Diverted Attn Startle /Surprise

Page 21: AIAA Conference - Big Data Session_ Final - Jan 2016

Clustering: K-Means

Application: Knowledge Analytics

Goal: Automatically group thousands of documents into useful clusters

Method: Utilizing IBM Watson Content Analytics; K-means is a scalable means of clustering large datasets

More than 130,000 documents are grouped into 20 clusters; Accurate by SMEs validation

Page 22: AIAA Conference - Big Data Session_ Final - Jan 2016

Key Insights, Challenges and Next Steps

Page 23: AIAA Conference - Big Data Session_ Final - Jan 2016

• Focus on ‘Big Analytics’; Big Data is not just about volume

• Data Science is a team effort - Computer Science; Statistics; Mathematics; Subject Matter Expertise/SME

• Strong collaboration with SMEs is essential and critical; algorithm development is iterative

• Machine learning techniques have to be adopted to our data; Researching thoroughly how similar problems are being addressed and adopting them is important

• Understanding the problem domain and associated data is essential, difficult & time consuming

• Use Pilots to learn to apply techniques to science/Eng. domains; Use a Phased Approach

• Leverage extensive work and research happening, and the open source platforms• NASA, Federal, Universities, Industry

• Application of data science/analytics to aerospace research domains is in early stages• Be ready for challenges and it is a journey with a huge potential

Key Insights

Page 24: AIAA Conference - Big Data Session_ Final - Jan 2016

• Trending a new path to develop the capability for very specialized disciplines• Using a mix of research, experimentation, innovation, and persistence

• Application of techniques to aerospace domains needs maturity and resources

• Difficulty in establishing ground truth and labeled data for ML models; data sets are diverse, complex and unique

• Takes multi years to get to achieve big goals; balancing that with short term milestones to demonstrate value and generate buy-in & resources

• Provide a suite of tools and techniques for broad use-Machine Learning/Analytics

• Expand big data movement to get broad buy in and propagating the expertise

Challenges and Next Steps

Page 25: AIAA Conference - Big Data Session_ Final - Jan 2016

Acknowledgements – Big Data Analytics Team

Data Analytics and Machine Learning Expertise : Manjula Ambur, Lin Chen, Christina Heinich, Charles Liles, Robert Milletich, Daniel Sammons, Ted Sidehamer, and Jeremy Yagle

Subject Matter Expertise: Damodar Ambur, Danette Allen, Eric Burke, Kyle Ellis, Christie Funk, Dana Hammond, Angela Harrivel, Jeff Herath, Patty Howell, Constantine Lukashin, Alan Pope, Brandi Quam, Cheryl Rose, Jamshid Samareh, Mark Sanetrik, Rob Scott, Lisa Scott-Carnell, Steve Scotti, Walt Silva, Mia Siochi, Chad Stephens, Scott Striepe, Marty Waszak, Bill Winfree, and Kristopher Wise