Modeling Time-Varying Data


Transcript of Modeling Time-Varying Data

Page 1: Modeling Time-Varying Data

LLNL-PRES-755086

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Jayaraman J. Thiagarajan

Modeling Time-Varying Data: Attention, Generative Models and Beyond

August 7, 2018

Computer Scientist

Page 2: Modeling Time-Varying Data


Predictive Modeling with Time-Varying Data is Ubiquitous

Time-Series Modeling is at the Heart of Clinical Diagnosis

Understanding Time-Varying Phenomena Critical in Science

Page 3: Modeling Time-Varying Data


Predictive Modeling with Time-Varying Data is Ubiquitous

Advances in Sensing are Driving IoT Technologies

Speech Processing is Integral to Modern AI Systems

Modeling time-varying data amounts to finding a mapping between the measurements and the underlying dynamic system

Page 4: Modeling Time-Varying Data


Predictive Modeling with Time-Varying Data is Ubiquitous and Challenging

▪ Creating representations for dynamic systems from observations is not straightforward
— No known priors
— Complex dependencies
— Multi-variate measurements
— Irregular sampling
— Missing values and measurement errors

▪ Making progressive predictions is computationally expensive

Page 5: Modeling Time-Varying Data


Deep Neural Networks Have Become the De-facto Solution for Sequence Modeling

Recurrent Neural Nets with LSTM or GRU units

Convolutional Neural Nets

Differentiable Neural Computers

Page 6: Modeling Time-Varying Data


Rethinking Deep Models for Sequential Data: Attention Models

▪ Can we parameterize the dependencies through simpler constructs than LSTMs?

▪ Temporal dependence can be viewed as a "network" structure

▪ Parameterized attention models are an effective alternative (a minimal sketch follows below)

Loosely based on human visual attention

Used in NLP to enhance LSTM-RNN by providing context
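For concreteness, here is a minimal sketch of scaled dot-product attention, the kind of simple parameterized construct referred to above. It is written in PyTorch; the tensor sizes are illustrative and not taken from the slides.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Each time step attends to every other step; the softmax weights
    expose the temporal-dependence "network" directly."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, T, T) pairwise similarities
    weights = F.softmax(scores, dim=-1)             # attention weights over time steps
    return weights @ v, weights

# Toy self-attention: 8 sequences, 50 time steps, 64 features
x = torch.randn(8, 50, 64)
out, attn = scaled_dot_product_attention(x, x, x)   # out: (8, 50, 64)
```

Unlike an LSTM, nothing here is recurrent: the attention weights are computed in parallel across the whole sequence, and they can be inspected directly as a dependency structure.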

Page 7: Modeling Time-Varying Data


Simply Attend and Diagnose – An Attention-only Architecture for Modeling EHR Data

▪ Stacked multi-head attention modules and 1D CNN feed-forward layers (sketched after this list)

▪ Encoding temporal order partially

▪ Effective training with skip connections and dropout

▪ Remarkable gains in reducing sequence modeling complexity
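Below is a minimal sketch of one such stacked unit, assuming a Transformer-style layout. It follows the bullets above (multi-head self-attention, a 1D-CNN feed-forward layer, skip connections, dropout) but is not the authors' exact implementation; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """One stacked unit in the spirit of the architecture above:
    multi-head self-attention plus a 1D-CNN feed-forward layer, with
    residual (skip) connections, layer norm, and dropout.
    A positional encoding added to the inputs (not shown) would
    inject the temporal order."""
    def __init__(self, d_model=128, n_heads=8, d_ff=256, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.conv_ff = nn.Sequential(      # feed-forward realized with 1D convs
            nn.Conv1d(d_model, d_ff, kernel_size=1), nn.ReLU(),
            nn.Conv1d(d_ff, d_model, kernel_size=1),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                  # x: (batch, time, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(a))   # skip connection around attention
        f = self.conv_ff(x.transpose(1, 2)).transpose(1, 2)
        return self.norm2(x + self.drop(f))  # skip connection around feed-forward
```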

Page 8: Modeling Time-Varying Data


Significant Complexity Reduction and State-of-the-art Performance

MIMIC-III Database – the largest public repository of ICU patient records

Tasks – mortality, length of stay, decompensation, phenotyping

State-of-the-art results on all tasks, outperforming RNN baselines

*AAAI 2018

Page 9: Modeling Time-Varying Data


Rethinking Deep Models for Sequential Data: Learn a Metric to Compare Time-Series Data

[Figure: triplet network – three weight-sharing networks trained with a triplet loss; example applications: face verification and face clustering]

**A not-so-trivial problem: how do we sample triplets?
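For reference, a minimal sketch of the triplet loss itself (the margin value here is an assumption):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull same-class (anchor, positive) embeddings together and push
    different-class (anchor, negative) embeddings apart by `margin`."""
    d_pos = F.pairwise_distance(anchor, positive)   # anchor vs. same class
    d_neg = F.pairwise_distance(anchor, negative)   # anchor vs. other class
    return F.relu(d_pos - d_neg + margin).mean()    # hinge on the distance gap
```

The sampling question is non-trivial because randomly drawn triplets soon become "easy" (the hinge is zero and contributes no gradient); strategies such as mining semi-hard negatives within a batch are a common response.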

Page 10: Modeling Time-Varying Data


Attention-Based Deep Metric Learning for Speaker Diarization

The Question of Who Speaks When?

[Figure: an audio track and its diarization result, with each segment assigned to a speaker]
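One plausible way to turn the learned metric into a diarization output, sketched here as an assumption rather than the paper's exact pipeline: embed short audio segments with the trained network, then cluster the embeddings so segments in the same cluster are attributed to the same speaker.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical (n_segments, d) embeddings of short audio segments,
# produced by the attention-based metric network
segment_embeddings = np.random.randn(200, 128)

# Agglomerative clustering under the learned metric; the cosine distance
# and the 0.5 cut threshold are assumptions, not values from the paper
dists = pdist(segment_embeddings, metric="cosine")
tree = linkage(dists, method="average")
speakers = fcluster(tree, t=0.5, criterion="distance")  # speaker label per segment
```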

Page 11: Modeling Time-Varying Data


Learned Metric Generalizes Effectively to Novel Data Distributions

State-of-the-art diarization results on challenging benchmarks

* Interspeech 2018

Page 12: Modeling Time-Varying Data


Rethinking Deep Models for Sequential Data: Unsupervised Pre-Training with Generative Models

▪ Predictive modeling is at the heart of clinical diagnosis – Discriminative Models
— Less robust to shifts in domain
— Prone to overfitting and sensitive to initialization, hyperparameters, etc.
— Cannot deal with out-of-distribution anomalies

Detecting heart conditions using limited-channel ECG

[Figure: the standard 12-channel ECG montage]

Page 13: Modeling Time-Varying Data


Generative Models Provide a Task-Independent Statistical Description of the Entire Distribution

▪ Estimate information (implicitly) about missing channels through a generative model

Predictive Model: random forests, ResNet with 1-D CNNs, InceptionNet with 1-D CNNs

What loss to use?
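One candidate answer, sketched below: pre-train a 1D-convolutional autoencoder on the limited-channel signals with a reconstruction (MSE) loss, then hand its encoder features to the predictive model. The architecture and loss here are illustrative assumptions, not necessarily the model in the EMBC paper.

```python
import torch
import torch.nn as nn

class ECGAutoencoder(nn.Module):
    """Hypothetical 1D-CNN autoencoder for unsupervised pre-training on
    limited-channel ECG; the encoder's features can then be reused by a
    predictive model (e.g., random forests or a 1-D ResNet)."""
    def __init__(self, in_ch=2, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_ch, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, latent, 7, stride=2, padding=3), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent, 32, 8, stride=2, padding=3), nn.ReLU(),
            nn.ConvTranspose1d(32, in_ch, 8, stride=2, padding=3),
        )

    def forward(self, x):                 # x: (batch, channels, time)
        z = self.encoder(x)
        return self.decoder(z), z

model = ECGAutoencoder()
x = torch.randn(4, 2, 1024)              # toy 2-channel ECG windows
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # one candidate answer to "what loss?"
```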

Page 14: Modeling Time-Varying Data


Significant Improvements in Disease Prediction and Generalization


*IEEE EMBC 2018

Page 15: Modeling Time-Varying Data


Conclusions

▪ Temporal modeling is a central, yet challenging problem in science and engineering

▪ Wide range of applications: robust predictions, sensing limitations, metric learning, anomaly detection, multi-task learning, correlation studies, interpolation, etc.

▪ Key advances in deep learning solutions for time-series problems: attention models, deep metric learning, generative modeling

Page 16: Modeling Time-Varying Data


Collaborators

Deepta Rajan, IBM Research

Huan Song, Bosch Research

Rushil Anirudh, LLNL

[1] "Attend and Diagnose: Clinical Time-series Modeling using Attention Models", AAAI 2018

[2] “Triplet Network with Attention for Speaker Diarization”, Interspeech 2018

[3] “A Generative Modeling Approach to Limited Channel ECG Classification”, IEEE EMBC 2018

Page 17: Modeling Time-Varying Data

Disclaimer

This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.