Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data:...
![Page 1: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/1.jpg)
Exploring Deep Learning Algorithms in Forecasting Severe Haze Events in Southeast Asia
Chien Wang, Laboratoire d’Aérologie (CNRS/UPS)
![Page 2: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/2.jpg)
• Why: Clearing land for palm oil plantations; drained peatlands in the area make things even worse
• Profits: Low-priced palm oil is used to make numerous daily necessities and food products
• Solution: The ultimate one seems quite obvious, though it is perhaps difficult to implement
![Page 3: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/3.jpg)
Actually, fire is not the whole story…
(Lee et al., ACP, 2017, 2018)
[Figure; x-axis: Year]
![Page 4: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/4.jpg)
So… it seems that forecasting the occurrence of severe haze ahead of time would be the most practical mitigation measure…
But, can we forecast it with confidence?
![Page 5: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/5.jpg)
Process-based modeling
Example:
• WRF-Chem, a high-resolution regional weather/climate model, including chemistry + aerosols
• Simulation skill: for vis ≤ 10 km events, ~80% (equivalent to training accuracy) with correction based on in-situ aerosol measurements
• Forecast skill: practically zero due to the lack of real-time emission estimates
![Page 6: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/6.jpg)
Should we try something else?
Using machine learning algorithms to forecast haze
(Lee et al., 2018, ACP)
• Haze event ≡ daily surface visibility < 10 km
• Data: abstract derivatives from meteorological data and satellite retrievals
• Certain advantages over “traditional” forecast models (e.g., low computational demand); task-centric vs. process-centric
• ~93% (training) accuracy in “same-day” forecasts using various algorithms
• ~84% (training) accuracy in “one-day” forecasts
• Applications using “standard” ML algorithms often rely on abstract models, while our knowledge (expert opinion) about extreme events is very limited
![Page 7: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/7.jpg)
“Traditional” Machine Learning
Deep learning: an “end-to-end” approach
Deep learning comes into the picture…
Convolutional neural networks: e.g., LeNet-5 (LeCun et al., 1998)
![Page 8: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/8.jpg)
Liu et al., 2016; arXiv:1605.01156v1
2 convolution layers with 8 and 16 filters
CNN has been applied to, e.g., identify certain weather patterns
[Figure panels: Tropical Cyclones and Atmospheric River, each with Correctly Classified (True Positive) and Misclassified (False Negative) examples]
![Page 9: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/9.jpg)
Forecasting Haze Events using Convolutional Neural Networks
• Supervised learning
• Training data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events with vis ≤ 7.72 km (p5)
• Inputs: 18 features, each a 40x40 or 60x60 map, from ERA-Interim reanalysis, 0.75° x 0.75°
• Output, 2 classes of vis: 0: vis > 7.72 km (94.7%); 1: vis ≤ 7.72 km (5.3%)
• Output, 3 classes of vis: 0: vis > 9.98 km (p25); 1: 7.72 km < vis ≤ 9.98 km; 2: vis ≤ 7.72 km
Singapore surface visibility in km, from Global Surface Summary of the Day (Smith et al., 2011, BAMS)
mean = 10.52, max = 24.94, min = 1.29 km
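The 2-class and 3-class targets defined on this slide can be sketched in a few lines of Python. This is an illustration only, not the author's code: the function name `label_visibility` and the sample values are hypothetical, and only the two thresholds (7.72 km and 9.98 km) come from the slide.

```python
from collections import Counter

P5_KM = 7.72    # 5th-percentile visibility threshold quoted on the slide
P25_KM = 9.98   # 25th-percentile visibility threshold quoted on the slide


def label_visibility(vis_km: float, n_classes: int = 2) -> int:
    """Map a daily visibility value (km) to a class index.

    2 classes: 0 if vis > p5, 1 if vis <= p5.
    3 classes: 0 if vis > p25, 1 if p5 < vis <= p25, 2 if vis <= p5.
    """
    if n_classes == 2:
        return 1 if vis_km <= P5_KM else 0
    if n_classes == 3:
        if vis_km <= P5_KM:
            return 2
        return 1 if vis_km <= P25_KM else 0
    raise ValueError("n_classes must be 2 or 3")


# Hypothetical sample values for illustration only (not GSOD records).
sample_vis = [10.52, 24.94, 1.29, 7.72, 9.98, 8.5]
labels_2 = [label_visibility(v, 2) for v in sample_vis]
labels_3 = [label_visibility(v, 3) for v in sample_vis]
class_counts = Counter(labels_3)
```

Applied to the real record, the share of class 1 in the 2-class setup should come out near 679/12784, i.e. about 5.3%, which is the imbalance the later slides return to.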
![Page 10: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/10.jpg)
“HazeNet”: a 16-layer convolutional neural network

| Layers | Size/Conv filters | Kernels |
|---|---|---|
| Input | 40x40x18 or 60x60x18 | |
| Conv1 + dropout | 92 | 10x10 |
| Conv2 | 92 | 10x10 |
| Maxpool | | |
| Conv3 + dropout | 192 | 6x6 |
| Conv4 | 192 | 6x6 |
| Maxpool | | |
| Conv5 + dropout | 384 | 3x3 |
| Conv6 | 384 | 3x3 |
| Maxpool | | |
| Conv7 + dropout | 384 | 3x3 |
| Conv8 | 384 | 3x3 |
| Maxpool | | |
| Conv9 + dropout | 512 | 3x3 |
| Conv10 | 512 | 3x3 |
| Maxpool | | |
| Conv11 + dropout | 512 | 3x3 |
| Conv12 | 512 | 3x3 |
| Maxpool | | |
| All-flat | | |
| Dense1 | 4096 | |
| Dense2 + dropout | 4096 | |
| sigmoid/softmax | 2 or 3 | |
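To get a feel for the size of the network in the table, the sketch below walks through the layers and tracks the feature-map shape and per-layer parameter counts. Several things here are assumptions rather than facts from the slide: stride-1 convolutions with "same" padding, 2x2 max pooling with ceiling rounding, one bias per filter or unit, and the helper name `trace_hazenet`. Filter counts, kernel sizes, and dense widths are taken from the table.

```python
import math

# (filters, kernel size) for each pair of convolutions, from the slide's table.
HAZENET_BLOCKS = [(92, 10), (192, 6), (384, 3), (384, 3), (512, 3), (512, 3)]
DENSE_UNITS = [4096, 4096]


def trace_hazenet(height, width, channels, n_classes):
    """Return (shape after the last pool, [(layer name, parameter count), ...]).

    Assumed, not stated on the slide: 'same' padding, stride 1,
    2x2 max pooling with ceil rounding, and bias terms everywhere.
    """
    params = []
    conv_index = 1
    for filters, kernel in HAZENET_BLOCKS:
        for _ in range(2):  # two convolutions per block, as in the table
            count = kernel * kernel * channels * filters + filters
            params.append((f"Conv{conv_index}", count))
            channels = filters
            conv_index += 1
        # 2x2 max pooling halves each spatial dimension (rounded up).
        height, width = math.ceil(height / 2), math.ceil(width / 2)
    units_in = height * width * channels  # the "All-flat" layer
    for i, units in enumerate(DENSE_UNITS, start=1):
        params.append((f"Dense{i}", units_in * units + units))
        units_in = units
    params.append(("Output", units_in * n_classes + n_classes))
    return (height, width, channels), params


shape_40, params_40 = trace_hazenet(40, 40, 18, n_classes=2)
shape_60, params_60 = trace_hazenet(60, 60, 18, n_classes=3)
total_40 = sum(count for _, count in params_40)
```

Under these assumptions both input sizes shrink to a 1x1x512 map before flattening, so the two configurations differ only in the output layer. A different padding or pooling choice would change the spatial sizes and the Dense1 count.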
![Page 11: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/11.jpg)
Inputs: 18 features, 40x40 map (axes: longitude, latitude)
Example: August 10, 1982, visibility = 7.56 km (data are normalized)
Feature panels: T1000, V10, Z500, LgScPrecip, TCWV, TCW, ConvPrecip, MCloud, Z850, RelHum, HCloud, BLH, U10, LCloud, SWVL3, TCloud, SWVL2, SWVL1
![Page 12: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/12.jpg)
Performance of the Networks:
HazeNet-16 with Batch Normalization (averaged over epochs 500-599):
Training accuracy = 0.999
Training loss = 0.056
Validation accuracy = 0.951
Validation loss = 0.413
Importance of the “hyperparameters”
Overcome the overfitting
![Page 13: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/13.jpg)
Different Network Structures on training accuracy
(H7, 1-Day)
Different Activations on training scores
![Page 14: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/14.jpg)
A closer look at the performance
Note that #class0 >> #class1
P5 Class 1 (vis ≤ 7.72 km)
Test samples (33%) = 4219
For forecast window = 0 day, last 100 epochs mean:
Vacc = 0.951 (0.947)
Prec = 0.575
Recall = 0.236
F1 = 0.330
Heidke Skill Score = 0.343
F1 score or F1 = 2 x (precision x recall) / (precision + recall)
HSS = ((tp + tn) - ecr)/(nsample - ecr);
ecr = ((tp + fn)*(tp + fp) + (tn + fn)*(tn + fp))/nsample
Frequency of class0 ~ the best accuracy of no-skill forecasting
Vacc = validation accuracy; Prec or pre = precision
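The scores defined on this slide follow directly from the four confusion-matrix counts. The sketch below implements the F1 and HSS formulas exactly as written above. The function name `skill_scores`, the zero-division guards, and the example counts are illustrative assumptions and are not the slide's data.

```python
def skill_scores(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall, F1 and Heidke Skill Score from counts."""
    nsample = tp + fp + tn + fn
    accuracy = (tp + tn) / nsample
    # Guards against empty denominators are a convention chosen here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    # ecr: number of correct forecasts expected by chance (slide's formula).
    ecr = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / nsample
    hss = ((tp + tn) - ecr) / (nsample - ecr) if nsample != ecr else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "hss": hss}


# Hypothetical counts for illustration only.
toy = skill_scores(tp=2, fp=1, tn=6, fn=1)
# A forecast that always predicts class 0 on an imbalanced sample.
no_skill = skill_scores(tp=0, fp=0, tn=7, fn=3)
```

The second call illustrates the note about class frequency: always predicting class 0 scores 70% accuracy on that toy sample, yet its HSS is 0, which is why accuracy alone says little when #class0 >> #class1.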
![Page 15: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/15.jpg)
What could we learn from the machine?
Rich patterns captured for different events:
Example: V10, class-1, true positive or TP events
What could help us to advance knowledge:
V10: Averaged patterns corresponding to different events of class-1
[Figure panels: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN)]
![Page 16: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/16.jpg)
What could we learn from the machine?
Mean patterns identified for different forecast windows
[Figure panels: Fwin = 0, 1, 2, 3, 4, and 5 days]
![Page 17: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/17.jpg)
Why…?
[Figure: TCW, TP, and MSL maps for 06/11/13 (c = 0), 06/14/13 (c = 1), and 06/18/13 (c = 2)]
![Page 18: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/18.jpg)
Deterministic forecasting platform (T=11, T=14, T=18)
The same deterministic formula or causal relation: T(t+1) = f[T(t), T(t-1), …, x1, x2, …, xN]
DL forecasting platform (T=11, T=14, T=18)
Predictor?
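The deterministic side of this comparison, one fixed rule applied step after step, can be sketched as follows. Everything here is hypothetical: `roll_forward`, the linear `toy_rule`, and the numbers are chosen only to make the recurrence T(t+1) = f[T(t), T(t-1), x] concrete, and none of it represents WRF-Chem or any real model.

```python
from typing import Callable, List, Sequence


def roll_forward(f: Callable[[float, float, float], float],
                 history: Sequence[float],
                 forcings: Sequence[float]) -> List[float]:
    """Apply the same rule f once per forcing value.

    history must hold at least the two most recent states [T(t-1), T(t)].
    """
    states = list(history)
    for x in forcings:
        states.append(f(states[-1], states[-2], x))
    return states[len(history):]


def toy_rule(t_now: float, t_prev: float, x: float) -> float:
    # Hypothetical linear rule: persistence, a trend term, and a forcing.
    return t_now + 0.5 * (t_now - t_prev) + x


forecast = roll_forward(toy_rule, history=[10.0, 11.0], forcings=[0.0, 1.0])
```

Each new state is fed back into the same function, so the forecast at every lead time is tied to one explicit causal relation. The question the slide raises is what plays that role on the DL platform.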
![Page 19: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/19.jpg)
| | Process-Orientated: WRF-Chem | Task-Orientated: Deep ConvNet |
|---|---|---|
| Training Time | 25-5 km regional domain: 1 day = 630 or 1 yr = 230K core-hr | 27-year daily data: < 2 hr using an Nvidia GPU |
| Forecasting Time | N/A due to the lack of real-time emission data; otherwise same as above | Negligible |
| Data Preparation | Initial conditions & 6-hourly boundary conditions; emissions | All “relevant” data |
| Code | 1 million+ lines FORTRAN | < 40 lines Python (using rich software libs) |
| Benefits | Understanding detailed process connections | Identifying hidden features |

You will have to deal with PARAMETERS in both platforms!
![Page 20: Exploring Deep Learning Algorithms in Forecasting …...ØSupervised learning ØTraining data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events](https://reader036.fdocuments.in/reader036/viewer/2022081611/5f026a9d7e708231d4042b23/html5/thumbnails/20.jpg)
Summary
• Deep CNNs have been deployed to “forecast” severe haze in Southeast Asia; perhaps we can say that the approach has passed the proof-of-concept stage
• The same network has recently been modified to explore forecasting intense lightning activity in Corsica; results are also promising
• Deep learning algorithms can elevate our knowledge base from a few cases to cover all available samples, benefiting our science
• Challenges remain in using meteorological data, e.g., scale-sensitive features, and rich features but still limited samples
• Current networks still produce a high number of misclassified cases (false negatives)
• New algorithms and network configurations are being proposed and will be tested