Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP....

23
Solar Flare Forecasting: A Novel Deep Learning Approach Justin Cai July 31, 2018

Transcript of Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP....

Page 1: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Solar Flare Forecasting: A Novel Deep Learning Approach

Justin Cai

July 31, 2018

Page 2: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

About me

I Junior at CU Boulder, studying computer science

I At LASP, I’m a student working in the Data Systems software engineeringgroup

I First got interested in machine learning in a project using social media topredict flu vaccination rates

I Outside of ML, I’m interested in software engineering, programminglanguages, and statistics

Page 3: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Why predict solar flares?

I Solar flares are events that occur near the surface of the sun

I The energy/radiation they release can be harmful to astronauts in space orelectronics and other hardware components of satellites and aircraft

I A prediction could inform astronauts to take shelter in a place where theradiation would not a↵ect them

Page 4: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Why use deep learning?

I Previous flare prediction models were being research and developedbefore/or during the revolution of deep learning, using simpler algorithmslike SVM and kNN

I The application of deep learning in problems like image recognitionimproved results significantly

I The goal of this project is to apply current deep learning methods to theflare prediction problem and find success over the simpler models

Figure: Error rate of the top models from ImageNet Large Scale Visual RecognitionCompetition (ILSVRC) with the depth of models. Deep networks have consistentlywon the competition every year, yielding lower and lower error rates.

Page 5: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Previous work on flare prediction

I Bobra & Couvidat achieved a true skill statistic (TSS) of 0.761 usingfeatures calculated from vector magnetic field data and a support vectormachine (SVM) with a RBF kernel

I Nishizuka et al. tested three di↵erent algorithms, SVM, k nearestneighbors (kNN), and extremely randomized trees (ERT). They usedsimilar features to Bobra & Couvidat, although they detected activeregions with their own algorithm. Additionally, they added UV brighteningfeatures, time derivatives of features, and flare history features. Theyachieved a TSS of 0.912 using kNN

Page 6: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - data

I To predict flares, we’ll need magnetic field dataI Michelson Doppler Imager (MDI) - for magnetic field, it can only records

line of sight, but has been recording for longerI Helioseismic and Magnetic Imager (HMI) - along with line of sight, can

record full vector data

I HMI released a derivative product that automatically tracks active regions,called Spaceweather HMI Active Region Patch (SHARP/HARP)

I Each SHARP is sequence of observations of an active region, givingvarious spaceweather quantities calculated from the vector magnetogram

Figure: Example active region snapshot. These three images show the westward,southward, and radial components of the vector magnetogram.

Page 7: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - data

HARPNUM: 52

6664

7.71.3...

7.6

3

7775

t1

,

2

6664

0.10.3...

0.2

3

7775

t2

,

2

6664

6.18.8...

0.7

3

7775

t3

, . . . ,

2

6664

8.90.1...

8.2

3

7775

t243

HARPNUM: 42

6664

8.71.6...

2.4

3

7775

t1

,

2

6664

3.57.5...

2.8

3

7775

t2

,

2

6664

6.31.6...

7.5

3

7775

t3

, . . . ,

2

6664

8.47.6...

5.8

3

7775

t298

HARPNUM: 32

6664

6.73.1...

4.2

3

7775

t1

,

2

6664

7.15.3...

8.0

3

7775

t2

,

2

6664

0.11.5...

5.5

3

7775

t3

, . . . ,

2

6664

0.61.0...

6.8

3

7775

t290

HARPNUM: 22

6664

2.34.7...

3.4

3

7775

t1

,

2

6664

1.07.6...

5.5

3

7775

t2

,

2

6664

2.20.6...

5.2

3

7775

t3

, . . . ,

2

6664

2.44.6...

2.6

3

7775

t401

HARPNUM: 12

6664

1.34.8...

5.5

3

7775

t1

,

2

6664

7.84.3...

8.2

3

7775

t2

,

2

6664

0.71.7...

4.4

3

7775

t3

, . . . ,

2

6664

0.97.6...

7.3

3

7775

t239

di↵erent time scales

Figure: Representation of collection of SHARP sequences.

Page 8: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - data

HARPNUM: 2T REC: 2012-06-05 00:00:00

⇥3.3 7.7 5.7 . . . 0.2

⇤HARPNUM: 2T REC: 2012-06-04 00:00:00

⇥0.4 5.6 8.2 . . . 4.7

⇤HARPNUM: 2T REC: 2012-06-03 00:00:00

⇥8.6 2.0 6.6 . . . 1.5

⇤HARPNUM: 2T REC: 2012-06-02 00:00:00

⇥1.5 2.3 0.1 . . . 2.4

⇤HARPNUM: 2T REC: 2012-06-01 00:00:00

⇥2.6 2.4 5.6 . . . 4.1

HARPNUM: 1T REC: 2012-05-05 00:00:00

⇥1.5 4.1 4.4 . . . 0.8

⇤HARPNUM: 1T REC: 2012-05-04 00:00:00

⇥6.3 3.6 0.6 . . . 4.6

⇤HARPNUM: 1T REC: 2012-05-03 00:00:00

⇥5.1 0.3 2.4 . . . 8.6

⇤HARPNUM: 1T REC: 2012-05-02 00:00:00

⇥5.1 0.7 2.2 . . . 3.8

⇤HARPNUM: 1T REC: 2012-05-01 00:00:00

⇥6.6 0.7 7.6 . . . 1.7

Figure: Representation of collection of SHARP observations.

Page 9: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - data

I Geostationary Operational Environmental Satellite (GOES) measuresX-ray flux, the metric which classifies flares

I They provide a catalog of flare events, giving information about the event,like time, class, and (if there was) what NOAA AR produced the event

I Most HARPs are associated with a NOAA AR, so the two data sets canbe joined on that fact

I Knowing this, we can assign a label to each SHARP observation indicatingif a {X ,M,X or M} class flare happened in {6, 12, 24, 48} hours

Page 10: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - MLP

I Multilayer perceptrons (MLP) are the most basic and fundamental deeplearning model

I Composed of layers, each of which are functions parameterized by ana�ne transformation and a nonlinear activation function

I In flare prediction, this model can be applied just like any of the simplermachine learning models. The model would SHARP observations andproduce an output of whether or not a flare will occur

I Without augmenting the original data, the model has “tunnel vision”, i.e.no information about past events

Figure: Diagram of a MLP.

Page 11: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - LSTM

I Long short term memory (LSTM) networks are a special type of recurrentneural network (RNN)

I RNNs process sequences; at each time step they produce an output and astate, the state and/or output along with the next time step are fed backinto the model and the process repeats

I The state allows for the network to remember things when processing eachstep

I LSTMs extend this with a concept of learnable gates, which limit howmuch a certain values can flow through the model.

I The gates allow LSTMs to forget their state when it becomes irrelevant

Figure: Diagram of a LSTM cell.

Page 12: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - LSTM

I LSTM are a natural fit for the problem, as our data is a collection ofsequences

I History of flares in a region has shown to be a strong predictor in previousmodels, which LSTMs could learn

I Looking at each time step independently is removing things like howquantities are changing over time, which LSTMs also could learn

f

t

= �(Wf

x

t

+ U

f

h

t�1 + b

f

)

i

t

= �(Wi

x

t

+ U

i

h

t�1 + b

i

)

o

t

= �(Wo

x

t

+ U

o

h

t�1 + b

o

)

c

t

= f

t

� ct�1 + i

t

� tanh(Wc

x

t

+ U

c

h

t�1 + b

c

)

h

t

= o

t

� tanh(ct

)

Figure: Equations for a LSTM cell. f

t

, it

, and o

t

are the forget, input, and outputgates. c

t

is the cell state, xt

is the input, and h

t

is the output. The subscript t refersto the input/output at that time step. W , U and b are weight matrices that arelearned during training.

Page 13: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Methods - metrics

I According to Bobra & Couvidat, they believe that true skill statistic (TSS)should be used when comparing the performances

I Di↵erence between recall and false alarm rateI Compared to other metrics, TSS is independent of the class-imbalance

ratio (N/P)

TSS =TP

TP + FN

� FP

FP + TN

Figure: Dependence of class-imbalance ratio in the test set and scores of a SVMclassifier from Bobra & Couvidat (2015).

Page 14: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Implementation

I Data contains SHARP observations at roughly an hourly cadence

I Preprocessed by taking the natural log of features with big scales(max(feature) > 108) and then standardizing each feature.

I All of the models were implemented in Keras - a high level deep learningframework

I Training models on a machine with two Titan V GPUs

Page 15: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

First training attempt

I The first attempts at training models produced low TSS scores

I The problem was the imbalance of positive events (a flare happened within24h) and negative events (a flare did not happen within 24h)

I Since there were so few positive events (2% of all observations are positiveevents), any predictor can achieve a low loss by predicting everything as anegative event.

I Two ways to fix this problem traditionally: weighted loss function andresampling (through undersampling and oversampling)

Figure: A visualization of undersampling and oversampling.

Page 16: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Resampling in flare prediction

I Undersampling and oversampling can be applied to SHARP observations,simply remove or duplicate individual observations

I However, applying these techniques to SHARP sequences is trickier

I Duplicating individual observations and inserting them into sequenceswould ruin the structure of the sequence

I Removing sequences that contain only negative events can help balancethe classes, but not perfectly balance

I You could duplicate sequences that have positive events, but depending onthe ratio of positive events to negative events within the sequence,duplication could lead to a bigger imbalance

Page 17: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Results

I Three models trained:I MLP trained on vector time steps with weighted loss (WL) - MLP w/ WLI MLP trained on over and undersampled vector time steps - MLP w/ RSI LSTM trained on vector sequences with WL - LSTM w/ WL

I For reference - Bobra & Couvidat achieved 0.76 and Nishizuka et al.achieved 0.91

Figure: TSS of trained models.

Page 18: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Results

I Oversampling and undersampling seem to be highly e↵ective

I The model with tunnel vision (had no information from the past) had thehighest TSS

I Possible hypotheses

I LSTMs are too complex - need a simpler way of incorporating pastinformation (see future experiments)

I Di↵erent activation functions - I trained the LSTMs with tanhs, while theMLPs with ReLUs. (changing to ReLU would bring the LSTM closer tothe MLP model)

I Past information is not as useful as we think/SHARP parameters arestronger predictors than we thought

I Bad hyperparameters

Page 19: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Future experiments

Figure: Regular RNN (left) vs teacher forcing RNN (right). The recurrent connectionin the teach forcing RNN comes from the output, instead of hidden layer.

Page 20: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Future experiments

2

40 1 01 �4 10 1 0

3

5 =

Figure: (Top) Example of convolution operation that detects edges. (Bottom)Example kernels from an image recognition CNN and which images produced thestrongest activation.

Page 21: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Future experiments

I CNN + time information - with a CNN, we can still apply all other formsof incorporating past information like a LSTM, teacher forcing RNN, anda window of images. These models would be very complex in the numberof features but could possibly be the most powerful models

I Add more features from feature engineering to MLP model

I Turn into a regression problem - take a full disk magnetogram andestimate X ray flux in the next 24h

Figure: Example of a 1 dimensional CNN. Each x would be the value of somespaceweather quantity at some time.

Page 22: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Acknowledgements

I would like to thank Wendy Carande, Jim Craft, Kim Kokkonen, LauraSandoval, Tracey Morland, and Andrew Jones for listening, supporting, andallowing me to work on this project.

Page 23: Solar Flare Forecasting: A Novel Deep Learning …bpoduval/psfiles/...Figure: Diagram of a MLP. Methods - LSTM I Long short term memory (LSTM) networks are a special type of recurrent

Q&A

email: [email protected]