8/10/2019 Time Series Forecasting by Using Wavelet Kernel SVM
TIME SERIES FORECASTING BY USING
WAVELET KERNEL SUPPORT VECTOR MACHINES
By Ali Habibnia ([email protected]), LSE Time Series Reading Group
Outline
! Introduction to Statistical Learning and SVM
! SVM & SVR Formulation
! Wavelet as a Kernel Function
! Study 1: Forecasting volatility based on wavelet support vector machine,
written by Ling-Bing Tang, Ling-Xiao Tang, and Huan-Ye Shen
! Study 2: Forecasting Volatility in Financial Markets by Introducing a Wavelet-Assisted SVR-GARCH Model,
written by Ali Habibnia
! Suggestions for further research + Q&A
SVMs History
! The study of Statistical Learning Theory was started in the 1960s by Vladimir Vapnik, well known as a founder (together with Alexey Chervonenkis) of this theory.
! He also developed the theory of support vector machines (for linear and non-linear input-output knowledge discovery) within the framework of statistical learning theory in the 1990s.
! Prof. Vapnik has been awarded the Benjamin Franklin Medal in Computer and Cognitive Science from the Franklin Institute.
History and motivation
! SVMs (a novel type of ANN) are supervised learning algorithms for:
! Pattern Recognition
! Regression Estimation, non-parametric (applications to function estimation, known as Support Vector Regression)
! Remarkable characteristics of SVMs:
! Good generalization performance: SVMs implement the Structural Risk Minimization principle,
which seeks to minimize the upper bound of the generalization error rather than
the training error.
! Absence of local minima: training an SVM is equivalent to solving a linearly constrained
quadratic programming problem; hence the solution of an SVM is unique and globally optimal.
! A simple geometrical interpretation in a high-dimensional feature space that is nonlinearly related to the input space.
The Advantages of SVM(R)
! Based on a strong and elegant theory:
In contrast to previous black-box learning approaches, SVMs allow for some interpretation and human understanding.
! Training is relatively easy: no local optima, unlike in neural networks.
Training time does not depend on the dimensionality of the feature space, only on the fixed input space, thanks to the kernel trick.
! Generally avoids over-fitting: the trade-off between complexity and error can be controlled explicitly.
! Generalizes well even in high-dimensional spaces under small training sample conditions. It is also robust to noise.
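Since the deck later turns to wavelets as kernel functions, a minimal sketch of the kernel trick in that setting may help: the translation-invariant Morlet-type wavelet kernel (one commonly cited admissible wavelet kernel in the wavelet-SVM literature) is evaluated directly from input-space differences, so the high-dimensional feature map is never materialized. The function name and the dilation parameter `a` below are illustrative choices, not taken from the slides:

```python
import numpy as np

def wavelet_kernel(x, z, a=1.0):
    """Morlet-type wavelet kernel:
    K(x, z) = prod_i cos(1.75 * d_i / a) * exp(-d_i**2 / (2 * a**2)),
    where d_i = x_i - z_i (translation-invariant form)."""
    d = np.asarray(x, float) - np.asarray(z, float)
    return np.prod(np.cos(1.75 * d / a) * np.exp(-d**2 / (2.0 * a**2)))

# The kernel equals 1 when x == z and decays (with oscillation) as the points separate.
x = np.array([0.3, -0.7, 1.2])
print(wavelet_kernel(x, x))        # 1.0
print(wavelet_kernel(x, x + 0.5))  # a value of magnitude < 1
```

Note that each evaluation costs only O(d) in the input dimension d, regardless of the (implicit) feature-space dimensionality — which is the point made above about training time and the kernel trick.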
Linear Classifiers
! g(x) is a linear function:
g(x) = w^T x + b
" The decision boundary w^T x + b = 0 is a hyperplane in the feature space, with w^T x + b > 0 on one side and w^T x + b < 0 on the other
" (Unit-length) normal vector of the hyperplane:
n = w / ||w||
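The decision rule above can be sketched in a few lines; the weight vector w and bias b here are hypothetical values chosen purely for illustration:

```python
import numpy as np

# Hypothetical weights and bias for illustration
w = np.array([2.0, -1.0])
b = 0.5

def g(x):
    """Linear discriminant g(x) = w^T x + b; its sign gives the predicted class."""
    return w @ x + b

# Points on either side of the hyperplane w^T x + b = 0
print(np.sign(g(np.array([1.0, 0.0]))))  # +1, since w^T x + b = 2.5 > 0
print(np.sign(g(np.array([0.0, 2.0]))))  # -1, since w^T x + b = -1.5 < 0

# Unit-length normal vector of the hyperplane: n = w / ||w||
n = w / np.linalg.norm(w)
print(np.linalg.norm(n))  # 1.0
```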
Linear Classifiers
! How would you classify these points using a linear discriminant function in order to minimize the error rate?
! Infinite number of answers!
Linear Classifiers
! Which one is the best?
Large Margin Linear Classifier
! The linear discriminant function (classifier) with the maximum margin is the best
! The margin is defined as the width by which the boundary can be increased before hitting a data point
! Why is it the best?
# Robust to outliers, and thus strong generalization ability
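The margin described above can be computed directly: for a separating hyperplane (w, b), the geometric margin is the smallest signed distance min_i y_i (w^T x_i + b) / ||w|| over the data. The toy data set and hyperplane below are illustrative assumptions, not from the slides:

```python
import numpy as np

# Toy separable data with a hypothetical separating hyperplane
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

# Geometric margin: smallest signed distance of any point to the hyperplane.
# The maximum-margin classifier is the (w, b) maximizing this quantity.
margin = np.min(y * (X @ w + b)) / np.linalg.norm(w)
print(margin)  # sqrt(2): the closest points lie at distance 2 / ||w||
```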
Large Margin Linear Classifier
! Given a set of data points {(x_i, y_i)}, i = 1, ..., n, where
For y_i = +1, w^T x_i + b > 0
For y_i = -1, w^T x_i + b < 0
" With a scale transformation on both w and b, the above is equivalent to
For y_i = +1, w^T x_i + b >= 1
For y_i = -1, w^T x_i + b <= -1
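A quick numeric check of the rescaled constraints may help: after scaling (w, b) so that the closest points satisfy the equalities, every point obeys y_i (w^T x_i + b) >= 1 in the compact form. The data and the (w, b) values below are hypothetical, chosen for illustration:

```python
import numpy as np

# Toy data; a hypothetical (w, b) rescaled so the closest points
# (the would-be support vectors) satisfy y_i (w^T x_i + b) = 1 exactly.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([0.5, 0.5]), 0.0  # rescaled version of w = (1, 1)

# Compact form of the two constraints: y_i (w^T x_i + b) >= 1 for all i
functional_margins = y * (X @ w + b)
print(functional_margins)              # the closest points hit exactly 1
print(np.all(functional_margins >= 1))  # True: all constraints satisfied
```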