Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest...

37
Machine learning on Time Series Analysis Cheng-Yu Han Solutions Challenges

Transcript of Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest...

Page 1: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Machine learning onTime Series Analysis

Cheng-Yu HanSolutions

Challenges

Page 2: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Investing Gambling

Page 3: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization
Page 4: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization
Page 5: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

FinLab

Capital

Age

840w

1790w

3500w

60 years old(retire)

25 years old

Income per month…..5w (NTD)

Spent per month……..3w (NTD)

Investment…………….70% (of total capital)

Compound interest….5% (each year)

Personal Life Financial Plan

Capital simulation

investing

No investing

Page 6: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

FinLab

lanOutline

Page 7: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

FinLab

lanOutline

https://www.finlab.tw/mopcon.html

Page 8: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

OutlineFinancial Data

Features

Labels

Machine LearningModels

NN

LSTM

CNN

EvaluationBacktesting

Purged K-fold

Page 9: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Trading programming language

• Easy learning curve for the beginners• Integrated with language editor in platforms• Can be extend by external DLL

• Most of the functions are encrypted or the source code is not provided• Does not support statistic analysis or machine learning toolkit

Page 10: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Trading programming language

• Friendly statistic toolkit • Friendly statistic toolkit• Strong community and widely applied• Easy to deploy (Flask/Django/…)• More innovative data science applications

Page 11: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Artificial Intelligence papers

All of the papers available in the “artificial intelligence” section (arXiv)

3000

2000

1000

Page 12: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

ML algorithms in finance?

Page 13: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Financial Data (Features)Supervised Machine Learning

Color Weight Age Category

3.2 kg 2 cat

4.2 kg 5 cat

6.2 kg 4 dog

features labels

MLModel

Training

TestingColor Weight Age

3.2 kg 2

4.2 kg 5

6.2 kg 4

MLModel

True Answer Prediction

cat cat

cat dog

dog dog

features labels

Page 14: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Financial Data StructuresSupervised Machine LearningFinancial Data (Features)

Page 15: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

DemoFinancial Data (Features)

• Useful to combine other data types• Difficult to confirm data release date• Missing data is often backfilled• Consider multiple correction

Fundamental data

Financial Data Structures

Focusing on creating a portrait of a company

Trading dataMarket participant characteristic footprintTrading book, price, broker trading summary…etc

• Data often with timestamp• Generate extra features (ex: technical indicators)• Massive amount of data generated in one day• Some of the data is difficult to obtain

Page 16: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Creating Technical indicators

RSI

KD

Generate technical indicators Predict price movement

Price

Price historical data

Page 17: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Creating Technical indicatorsChallenging of Labeling the dataLabeling

Page 18: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Challenging of Labeling the data

pric

e

time

1

0

-1

𝑝 𝑡 + 𝜏

𝑝 𝑡 − 𝜏

𝑡 + 𝑤𝑡

• 𝜏 is a constant• Do not have stop-loss limits

Fixed time horizonA popular method in the literature

𝑝 𝑡 + 𝑤

𝑝 𝑡

Page 19: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Label Generation Methods

• Triple barrier [Prado 2018]

• Continuous trading signals [Dash 2016]

• Trading Point decision [Chang 2009]

[Prado 2018] Advances in Financial Machine Learning[Tsantekidis 2017] Using Deep Learning to Detect Price Change Indications in Financial Markets[Dash 2016] A hybrid stock trading framework integrating technical analysis with machine learning techniques[Chang 2009] Integrating a Piecewise Linear Representation Method and a Neural Network Model for Stock Trading Points Prediction

Page 20: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Triple barriers [Prado 2018]

pric

e

time

1

0

-1

𝑝 𝑡 + 𝑤

𝑝 𝑡

𝑝 𝑡 + 𝜏!

𝑝 𝑡 − 𝜏"

• Horizontal barriers are defined by profit-taking and stop-loss limit • 𝜏! and 𝜏" are dynamic according to estimated volatility

𝑡 + 𝑤𝑡

Page 21: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Continuous trading signals [Dash 2016]

time

pric

e

𝑝!,!#$max

𝑝!,!#$min

𝑝!,!#$min

𝑝!,!#$max0.5

1

0.5

0

• Using momentum of the stock price

• y(t)’s are continuous

• Provides more detailed information

𝑡 + 𝑤𝑡

Page 22: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Trading point decision

• Find the local minimum and maximum points

• Divide the time series into subsegments

• Threshold value d à length of trend

d

Page 23: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Trading point decision

• Find the local minimum and maximum points

• Divide the time series into subsegments

• Threshold value d à length of trend

Page 24: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

ML Models

Page 25: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

DemoNeural Network• Built to model the human brain• interpret numeric data through a kind of machine perception

Human neuron structure Single neuron model

y1= g (w1x1+w2x2+w0)

vg(v)

1w0

w1x1

x2

! g

w2

y1

Page 26: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Single node in neural network

! g y1

1w0

w1x1

x2

w2

Neural Network

Page 27: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Simplified expression

y1

1w0

w1x1

x2

w2

Neural Network

Page 28: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

A layer contain multiple neurons

1

x1

x2

y2

y1

y3

y4

Neural Network

Page 29: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Multi-layer deep neural network

1

x1

x2

y2

y3

y1

Deep Neural Network

Page 30: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

w1

w2

Cost function

Neural Network Optimization

Page 31: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Deep Neural Network Training Result2018-1-1 2019-7-1

Train

2006 ~ 2014 2016 ~ 2019-3-1

Validate

2015

Backtest

FeaturesScaled Technical Indicators

Asset

Data split

LabelsFixed time horizon

Taiwan CapitalizationWeighted Stock Index

benchmark

backtest

Page 32: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Model Interpretation

Page 33: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Deep Neural Network Training Result

https://arxiv.org/pdf/1602.04938.pdf

• Explain the predictions of your machine learning models• Approximate the predictions of the underlying black box model• Focuses on training local surrogate models to explain individual predictions

Page 34: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization
Page 35: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

DemoBacktest

• Survivor bias, lookahead bias, transection cost, outlier, overfitting• Finding the lottery tickets that won the last game• Solutions• Develop model for entire asset or classes• Use Bootstrap aggregating• Record every backtest conducted• Resist the temptation of reusing a failed strategy

Page 36: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

Conclusion

Financial DataFeatures

Labels

Machine LearningModels

Evaluation

NN

LSTM

CNN

Backtesting

Purged Validation

Machine Learning

Investing

Life Plan

Investing or gambling ?

Capital

Age

840w

1790w

3500w

60 years old25 years old

Page 37: Challenges Machinelearningon TimeSeriesAnalysis2006~2014 2016~2019-3-1 Validate 2015 Backtest Features ScaledTechnicalIndicators Asset Datasplit Labels Fixedtimehorizon Taiwan Capitalization

FinLab