
Page 1

Exploring Deep Learning Algorithms in Forecasting Severe Haze Events in Southeast Asia

Chien Wang, Laboratoire d’Aérologie (CNRS/UPS)

Page 2

• Why: Clearing land for palm oil plantations; drained peatlands in the area make things even worse

• Profits: Low-priced palm oil is used to make numerous daily necessities and food products

• Solution: The ultimate one seems quite obvious, though perhaps it is difficult to implement

Page 3

Actually, fire is not the whole story…

(Lee et al., ACP, 2017, 2018)


Page 4

So, it seems that forecasting the occurrence of severe haze ahead of time would be the most practical mitigation measure…

But, can we forecast it with confidence?

Page 5

Process-based modeling

Example:
• WRF-Chem, a high-resolution regional weather/climate model including chemistry and aerosols
• Simulation skill: ~80% for vis ≤ 10 km events (equivalent to training accuracy), with correction based on in-situ aerosol measurements
• Forecast skill: practically zero due to the lack of real-time emission estimates

Page 6

Should we try something else?
Using machine learning algorithms to forecast haze

(Lee et al., 2018, ACP)

• Haze event ≡ daily surface visibility < 10 km
• Data: abstract derivatives from meteorological data and satellite retrievals
• Certain advantages over “traditional” forecast models (e.g., low computational demand); task-centric vs. process-centric
• ~93% (training) accuracy in “same-day” forecasts using various algorithms
• ~84% (training) accuracy in “one-day” forecasts
• Applications using “standard” ML algorithms often rely on abstract models, while our knowledge (expert opinion) about extreme events is very limited

Page 7

“Traditional” machine learning

Deep learning: an “end-to-end” approach

Deep learning comes into the picture…

Convolutional neural networks: e.g., LeNet-5 (LeCun et al., 1998)

Page 8

Liu et al., 2016; arXiv:1605.01156v1: 2 convolution layers with 8 and 16 filters

CNNs have been applied to identify certain weather patterns, e.g.:

[Figure: tropical cyclones and atmospheric rivers, each with correctly classified (true positive) and misclassified (false negative) examples]

Page 9

• Supervised learning
• Training data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events with vis ≤ 7.72 km (p5)
• Inputs: 18 features on 40x40 or 60x60 grids from ERA-Interim reanalysis, 0.75° x 0.75°
• Output (2 classes of vis): 0: vis > 7.72 km (94.7%); 1: vis ≤ 7.72 km (5.3%)
• Output (3 classes of vis): 0: vis > 9.98 km (p25); 1: 7.72 km < vis ≤ 9.98 km; 2: vis ≤ 7.72 km

Forecasting Haze Events using Convolutional Neural Networks

[Figure: Singapore surface visibility in km, from the Global Surface Summary of the Day (Smith et al., 2011, BAMS); mean = 10.52, max = 24.94, min = 1.29 km]
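Both labelling schemes amount to thresholding daily visibility at the p5 (7.72 km) and p25 (9.98 km) values quoted above. A minimal NumPy sketch, assuming visibility is already a 1-D array in km; the function names are hypothetical, not the author's code:

```python
import numpy as np

# Thresholds quoted on the slide (percentiles of Singapore GSOD visibility, km)
P5_KM = 7.72   # p5: severe-haze threshold
P25_KM = 9.98  # p25: upper bound of the intermediate class


def label_two_class(vis_km):
    """0: vis > 7.72 km; 1: vis <= 7.72 km (hypothetical helper)."""
    vis = np.asarray(vis_km, dtype=float)
    return (vis <= P5_KM).astype(int)


def label_three_class(vis_km):
    """0: vis > 9.98 km; 1: 7.72 km < vis <= 9.98 km; 2: vis <= 7.72 km."""
    vis = np.asarray(vis_km, dtype=float)
    labels = np.zeros(vis.shape, dtype=int)
    labels[vis <= P25_KM] = 1  # intermediate class (severe cases overwritten next)
    labels[vis <= P5_KM] = 2   # severe haze
    return labels


vis = [12.0, 9.5, 7.72, 5.0, 9.98]
print(label_two_class(vis))    # [0 0 1 1 0]
print(label_three_class(vis))  # [0 1 2 2 1]
```

As a consistency check on the class balance, 679 / 12784 ≈ 0.053, matching the 5.3% quoted for class 1.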

Page 10

“HazeNet”: a 16-layer convolutional neural network

Layer | Size / conv filters | Kernel
input | 40x40x18 or 60x60x18 |
Conv1 + dropout | 92 | 10x10
Conv2 | 92 | 10x10
Maxpool | |
Conv3 + dropout | 192 | 6x6
Conv4 | 192 | 6x6
Maxpool | |
Conv5 + dropout | 384 | 3x3
Conv6 | 384 | 3x3
Maxpool | |
Conv7 + dropout | 384 | 3x3
Conv8 | 384 | 3x3
Maxpool | |
Conv9 + dropout | 512 | 3x3
Conv10 | 512 | 3x3
Maxpool | |
Conv11 + dropout | 512 | 3x3
Conv12 | 512 | 3x3
Maxpool | |
All-flat (flatten) | |
Dense1 | 4096 |
Dense2 + dropout | 4096 |
sigmoid/softmax | 2 or 3 |
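To see how the table fits together, the sketch below walks an input through the layer stack and reports each output shape and trainable-parameter count. It is framework-free so it runs anywhere, and it rests on assumptions the slide does not state: "same" padding in every convolution, 2x2 max-pooling with stride 2 and "same" padding, and one output unit per class. The names `walk_hazenet`, `CONV_BLOCKS` and `DENSE_UNITS` are hypothetical.

```python
import math

# Conv blocks from the table: (filters, kernel size). Each block is
# ConvA + dropout, ConvB, followed by one max-pooling layer.
CONV_BLOCKS = [(92, 10), (192, 6), (384, 3), (384, 3), (512, 3), (512, 3)]
DENSE_UNITS = [4096, 4096]


def walk_hazenet(height, width, channels=18, n_classes=2):
    """Return (layer name, output shape, trainable parameters) per layer.

    ASSUMPTIONS (not on the slide): 'same' padding in every convolution,
    2x2 max-pooling with stride 2 and 'same' padding, one output unit per
    class. Dropout (no weights) and batch normalization are omitted.
    """
    rows = []
    h, w, c = height, width, channels
    conv_id = 0
    for filters, k in CONV_BLOCKS:
        for _ in range(2):
            conv_id += 1
            params = k * k * c * filters + filters  # kernel weights + biases
            c = filters                             # 'same' padding keeps h and w
            rows.append((f"Conv{conv_id}", (h, w, c), params))
        h, w = math.ceil(h / 2), math.ceil(w / 2)   # 2x2 pool, stride 2, 'same'
        rows.append(("Maxpool", (h, w, c), 0))
    n_in = h * w * c
    rows.append(("Flatten", (n_in,), 0))
    for i, units in enumerate(DENSE_UNITS, start=1):
        rows.append((f"Dense{i}", (units,), n_in * units + units))
        n_in = units
    rows.append(("Output", (n_classes,), n_in * n_classes + n_classes))
    return rows


for name, shape, params in walk_hazenet(40, 40):
    print(f"{name:8s} {str(shape):16s} {params:>10,d}")
```

Under these assumptions both the 40x40 and the 60x60 inputs shrink to 1x1x512 after the sixth pooling layer, so Dense1 sees a 512-element vector in either case; with different padding choices the 40x40 input would not survive six poolings.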

Page 11

Inputs: 18 features, 40x40 map

[Figure: maps of the 18 input features over longitude and latitude. Example: August 10, 1982, visibility = 7.56 km (data are normalized). Panels: T1000, V10, Z500, LgScPrecip, TCWV, TCW, ConvPrecip, MCloud, Z850, RelHum, HCloud, BLH, U10, LCloud, SWVL3, TCloud, SWVL2, SWVL1]
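The slide says only that the data are normalized. One common way to turn 18 gridded fields into a network input is to stack them as channels and standardize each channel with statistics from the training set. A minimal NumPy sketch under those assumptions; the channel order, helper names and per-channel z-score are hypothetical choices, not the author's documented preprocessing:

```python
import numpy as np

# Panel names from the figure; the channel order here is an assumption.
FEATURES = ["T1000", "V10", "Z500", "LgScPrecip", "TCWV", "TCW", "ConvPrecip",
            "MCloud", "Z850", "RelHum", "HCloud", "BLH", "U10", "LCloud",
            "SWVL3", "TCloud", "SWVL2", "SWVL1"]


def stack_features(fields):
    """fields: dict name -> array (n_days, ny, nx); returns (n_days, ny, nx, 18)."""
    return np.stack([np.asarray(fields[name], dtype=np.float32) for name in FEATURES],
                    axis=-1)


def fit_channel_stats(x_train):
    """Per-feature mean and std over days and grid points (training set only)."""
    mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
    std = x_train.std(axis=(0, 1, 2), keepdims=True)
    return mean, np.where(std > 0, std, 1.0)  # guard against constant fields


def normalize(x, mean, std):
    """Z-score each channel with the training-set statistics."""
    return (x - mean) / std


# Synthetic stand-in data: 5 days on a 40x40 grid, each feature with its own scale
rng = np.random.default_rng(0)
fields = {name: rng.normal(loc=10.0 * i, scale=i + 1.0, size=(5, 40, 40))
          for i, name in enumerate(FEATURES)}
x = stack_features(fields)
mean, std = fit_channel_stats(x)
z = normalize(x, mean, std)
print(x.shape)  # (5, 40, 40, 18)
```

Fitting the statistics on the training split only, and reusing them for validation and test data, keeps information from the held-out samples out of the preprocessing.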

Page 12

Performance of the networks: HazeNet-16 with batch normalization (averaged over epochs 500-599):

Training accuracy = 0.999
Training loss = 0.056
Validation accuracy = 0.951
Validation loss = 0.413

Importance of the “hyperparameters”: overcoming overfitting (note the gap between training and validation loss)

Page 13

[Figure panels: effect of different network structures on training accuracy (H7, 1-day); effect of different activations on training scores]

Page 14

A closer look at the performance
Note that #class0 >> #class1

P5 class 1 (vis ≤ 7.72 km); test samples (33%) = 4219

For forecast window = 0 day, mean over the last 100 epochs:
Vacc = 0.951 (0.947 = frequency of class 0)
Prec = 0.575
Recall = 0.236
F1 = 0.330
Heidke Skill Score = 0.343

F1 score or F1 = 2 x (precision x recall) / (precision + recall)
HSS = ((tp + tn) - ecr) / (nsample - ecr)
ecr = ((tp + fn)*(tp + fp) + (tn + fn)*(tn + fp)) / nsample, the number of correct forecasts expected by chance

Frequency of class 0 ≈ the best accuracy of a no-skill forecast (one that always predicts class 0)

Vacc = validation accuracy
Prec or pre = precision
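The scores above follow directly from the four confusion-matrix counts. A minimal sketch of the slide's formulas in plain Python, with class 1 as the positive class; the `Confusion` type, the `scores` helper and the toy counts are hypothetical, chosen only to illustrate the arithmetic (they are not the study's results):

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts, class 1 (vis <= 7.72 km) as the positive class."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def _div(a: float, b: float) -> float:
    """Division that returns 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def scores(c: Confusion) -> dict:
    precision = _div(c.tp, c.tp + c.fp)
    recall = _div(c.tp, c.tp + c.fn)
    f1 = _div(2 * precision * recall, precision + recall)
    # ecr: number of correct forecasts expected by chance
    ecr = ((c.tp + c.fn) * (c.tp + c.fp) + (c.tn + c.fn) * (c.tn + c.fp)) / c.n
    hss = _div((c.tp + c.tn) - ecr, c.n - ecr)
    return {"accuracy": (c.tp + c.tn) / c.n, "precision": precision,
            "recall": recall, "f1": f1, "hss": hss}


print(scores(Confusion(tp=2, fp=1, fn=3, tn=14)))  # hypothetical counts
print(scores(Confusion(tp=0, fp=0, fn=5, tn=15)))  # "always class 0" forecast
```

The second call illustrates why accuracy alone is misleading with #class0 >> #class1: a forecast that never predicts haze still reaches 0.75 accuracy on this toy sample (the class-0 frequency), yet its HSS is exactly 0.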

Page 15

[Figure panels: true positive (TP), false positive (FP), true negative (TN), false negative (FN)]

Rich patterns captured for different events:

Example: V10, class-1 true positive (TP) events

What could help us advance our knowledge:

V10: averaged patterns corresponding to different events of class 1

What could we learn from the machine?
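One plausible reading of the averaged V10 patterns is a composite analysis: group the samples by confusion-matrix outcome and average the input field within each group. The slides do not say exactly how the averages were computed, so this NumPy sketch is an assumption; the function name and array layout are hypothetical.

```python
import numpy as np


def composite_by_outcome(fields, y_true, y_pred, positive=1):
    """Average 2-D fields (n_samples, ny, nx) over TP, FP, TN and FN samples.

    Returns a dict mapping each outcome to its mean field, or None if the
    outcome never occurs.
    """
    fields = np.asarray(fields, dtype=float)
    obs = np.asarray(y_true) == positive   # observed haze
    fcst = np.asarray(y_pred) == positive  # forecast haze
    masks = {"TP": obs & fcst, "FP": ~obs & fcst,
             "TN": ~obs & ~fcst, "FN": obs & ~fcst}
    return {k: (fields[m].mean(axis=0) if m.any() else None)
            for k, m in masks.items()}


# Toy example: sample i is a constant field equal to i
fields = np.arange(4)[:, None, None] * np.ones((4, 2, 2))
comp = composite_by_outcome(fields, y_true=[1, 1, 0, 0], y_pred=[1, 0, 1, 0])
print({k: float(v[0, 0]) for k, v in comp.items()})
```

Comparing the TP and FN composites is one way to ask which large-scale patterns the network associates with haze and which haze events it misses.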

Page 16

What could we learn from the machine?

[Figure panels: Fwin = 0, 1, 2, 3, 4 and 5 days]

Mean patterns identified for different forecast windows
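Fwin is the forecast window (0 days on the performance slide). A plausible way to build training pairs for a given window is to match the inputs on day t with the visibility class on day t + Fwin. The slides do not describe the exact pairing, so this sketch assumes aligned, gap-free daily records; `make_forecast_pairs` is a hypothetical helper.

```python
import numpy as np


def make_forecast_pairs(x, labels, fwin):
    """Pair inputs from day t with the visibility class on day t + fwin.

    Assumes x and labels are aligned, consecutive daily records with no gaps.
    """
    x, labels = np.asarray(x), np.asarray(labels)
    if not 0 <= fwin < len(labels):
        raise ValueError("fwin must be in [0, number of days)")
    n = len(labels) - fwin
    return x[:n], labels[fwin:]


days = np.arange(10)    # stand-in for daily input maps
labels = days * 10      # stand-in for daily class labels
x_in, y_out = make_forecast_pairs(days, labels, fwin=3)
print(x_in[:3], y_out[:3])  # [0 1 2] [30 40 50]
```

Each increase in Fwin drops one day from the end of the record, so the sample size shrinks slightly as the window grows.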

Page 17

[Figure: TCW, TP and MSL fields on 06/11/13 (c = 0), 06/14/13 (c = 1) and 06/18/13 (c = 2)]

Why…?

Page 18

Deterministic forecasting platform (T = 11, 14, 18)

The same deterministic formula or causal relation: T(t+1) = f[T(t), T(t-1), …, x_1, x_2, …, x_N]

DL forecasting platform (T = 11, 14, 18)

Predictor? Predictor?

Page 19

Aspect | Process-Orientated: WRF-Chem | Task-Orientated: Deep ConvNet
Training time | 25-5 km regional domain: 1 day = 630 core-hr, or 1 yr = 230K core-hr | 27-year daily data: < 2 hr using an Nvidia GPU
Forecasting time | N/A due to the lack of real-time emission data; otherwise same as above | Negligible
Data preparation | Initial conditions & 6-hourly boundary conditions; emissions | All “relevant” data
Code | 1 million+ lines of FORTRAN | < 40 lines of Python (using rich software libraries)
Benefits | Understanding detailed process connections | Identifying hidden features

You will have to deal with PARAMETERS in both platforms!

Page 20

Summary

• Deep CNNs have been deployed to “forecast” severe haze in Southeast Asia; perhaps we can say that the approach has passed the proof-of-concept stage

• The same network has recently been modified to explore forecasting intense lightning activity in Corsica; results are also promising

• Deep learning algorithms can elevate our knowledge base from a few cases to cover all available samples, benefiting our science

• Challenges in using meteorological data, e.g., scale-sensitive features, and rich features but still limited samples

• Current networks still produce a high number of misclassified cases (false negatives)

• New algorithms and network configurations are being proposed and will be tested