INFORMS 2015

Post on 14-Feb-2017


1

INFORMS Philadelphia, November 2015

Bin Weng (Email: bzw0018@auburn.edu), Ph.D. Candidate in Industrial and Systems Engineering

Mohamed A. Ahmed (Email: mza0068@auburn.edu), M.S. Candidate in Industrial and Systems Engineering

Fadel M. Megahed (Email: fmegahed@auburn.edu), Assistant Professor of Industrial and Systems Engineering

Stock Market Prediction Using Disparate Data Sources

2 Stock Market Prediction: Why?
• The stock market is one of the most important ways for companies to raise money.
• About 48% of Americans were invested in the stock market as of 2015 (CNBC).
• The successful prediction of a stock’s future price could yield significant profit.

3 Stock Market Prediction: How?
• Guess?
• Fundamental Analysis
• Technical Analysis (Charting)
• Technological Methods

4 Stock Market Prediction

Ray Dalio’s $165B Bridgewater Associates will start a new artificial-intelligence unit to use predictive analysis for trades. (Bloomberg, 2015)

5 Related Works

Paper Index   Selected Papers
[1]   Predicting Financial Markets: Comparing Survey, News, Twitter and Search Engine Data
[2]   A fusion model of HMM, ANN and GA for stock market forecasting
[3]   Twitter mood predicts the stock market
[4]   Stock Market Prediction System with Modular Neural Networks
[5]   Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news
[6]   A Hybrid Machine Learning System for Stock Market Forecasting
[7]   Market Index and Stock Price Direction Prediction using Machine Learning Techniques: An empirical study on the KOSPI and HSI
[8]   Stock Market Prediction Using Disparate Data Sources (Proposed)

6 Related Works

Comparison columns:
Data: Market Data, Technical Indicator, Social Media News, Secondary Variable
Model: Time Series, Logistic Regression, Decision Trees, Neural Networks, Support Vector Machines
Type of Stock: IT Index, Mix of companies

Paper   Target
[1]     Price, Volume
[2]     Price
[3]     Movement
[4]     Buy and sell signal
[5]     Price, Volume
[6]     Movement
[7]     Movement
[8]     Movement

7 Research Motivation
• Which sources of data correlate most strongly with the stock market time series?
• Which logical target has the best prediction capability with regard to the stock movement?
• Which technological model is best at predicting the stock movement?
• Can we construct a better model using disparate data sources?

8 Data Sources

9 Process Overview

10 Data Sources: Social Media and Internet Data
• “Financial news articles play a large role in influencing the movement of a stock as humans react to the information.” (M. Nardo et al., 2015)
• “Data on changes in how often financially related Wikipedia pages were viewed have contained early signs of stock market moves.” (H. Moat et al., 2013)
• Blog communication exhibits remarkable predictive power. (M. Choudhury et al., 2008)

11 Data Sources: Secondary Variables
• Data from social media and the Internet typically have high variability, so secondary variables (e.g., Moving Average, Momentum, Relative Strength Index) are derived from them.
• Does the upward or downward movement in the predicting variables have an effect on the target movement?
• What range of the primary variables has predictive power over the targets?
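As a rough illustration of how such secondary variables could be derived from a daily count series (for example, Wikipedia page views), the sketch below uses common textbook definitions of the moving average, momentum, disparity, RSI, and an up/down "move" flag. The function names, window lengths, and sample counts are assumptions for illustration only; the slides do not give the exact formulas the authors used.

```python
def moving_average(values, n):
    """Simple n-day moving average; None until n observations exist."""
    return [None if i + 1 < n else sum(values[i + 1 - n:i + 1]) / n
            for i in range(len(values))]


def momentum(values, n):
    """Difference between today's value and the value n days earlier."""
    return [None if i < n else values[i] - values[i - n]
            for i in range(len(values))]


def disparity(values, n):
    """Today's value as a percentage of its n-day moving average.

    Assumes the moving average is never zero (true for positive counts).
    """
    ma = moving_average(values, n)
    return [None if m is None else 100.0 * v / m for v, m in zip(values, ma)]


def rsi(values, n):
    """RSI from simple averages of gains and losses over the last n changes."""
    out = [None] * len(values)
    for i in range(n, len(values)):
        changes = [values[j] - values[j - 1] for j in range(i - n + 1, i + 1)]
        gain = sum(c for c in changes if c > 0) / n
        loss = sum(-c for c in changes if c < 0) / n
        out[i] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out


def move(values):
    """1 if the value rose versus the previous day, else 0 (None on day 0)."""
    return [None] + [1 if b > a else 0 for a, b in zip(values, values[1:])]


# Made-up daily page-view counts, purely for illustration.
page_views = [120, 135, 128, 150, 170, 160, 155]
wiki_ma_3 = moving_average(page_views, 3)
wiki_momentum_1 = momentum(page_views, 1)
wiki_3_day_disparity = disparity(page_views, 3)
wiki_rsi = rsi(page_views, 3)
wiki_move = move(page_views)
```

Smoothing with a moving average addresses the variability mentioned in the first bullet, while the move flag turns a raw series into the kind of up/down signal the second bullet asks about.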

[Figure: Google News & Blogs series; date axis from 1/2/2014 to 6/25/2014, vertical axis from 0 to 3,500]

12 Target Matrix

Target Type   Method
1             Open (i+1) – Close (i)
2             Open (i+1) – Open (i)
3             Close (i+1) – Close (i)
4             Close (i+1) – Open (i)
5             Volume of trades moves as previous day
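A minimal sketch of how targets 1 to 4 could be turned into class labels from daily open and close prices. The binary encoding (1 when the difference is positive, 0 otherwise) is an assumption based on the 0/1 classes in the coincidence matrices; the function name and sample prices are hypothetical. Target 5 is left out because the slide does not give a formula for it.

```python
def movement_targets(opens, closes):
    """Binary targets 1-4 for each day i that has a following day i+1.

    Assumption: a target is 1 when the listed difference is positive, else 0.
    """
    targets = []
    for i in range(len(opens) - 1):
        targets.append({
            "target_1": int(opens[i + 1] - closes[i] > 0),   # Open(i+1) - Close(i)
            "target_2": int(opens[i + 1] - opens[i] > 0),    # Open(i+1) - Open(i)
            "target_3": int(closes[i + 1] - closes[i] > 0),  # Close(i+1) - Close(i)
            "target_4": int(closes[i + 1] - opens[i] > 0),   # Close(i+1) - Open(i)
        })
    return targets


# Made-up prices for three consecutive trading days.
opens = [100.0, 101.5, 100.8]
closes = [101.0, 100.5, 102.0]
labels = movement_targets(opens, closes)
```

The last day in the series gets no label, since every target needs the next day's open or close.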

13 Data Fusion

14 Feature Selection
• Simplification of model
• Shorter training times
• Improved accuracy
• Enhanced generalization by reducing overfitting

15 Feature Selection: Chord Diagram

16 Feature Selection
Method: Recursive feature elimination (RFE)
Coding: Python with multiple feature selection packages
Pseudo Code of RFE

* Code is available at https://github.com/binweng/SFS
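The general idea of RFE can be sketched in a few lines: fit a model on the current feature set, drop the feature with the smallest absolute weight, and repeat until the desired number of features remains. The sketch below is not the authors' code from the repository above. It swaps in a small ridge least-squares model written in plain Python as the estimator, and the helper names (`solve_linear`, `fit_weights`, `rfe`) and the toy data are assumptions for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(rows, y, cols, ridge=1e-6):
    """Ridge least-squares weights using only the columns listed in cols."""
    k = len(cols)
    gram = [[sum(r[cols[i]] * r[cols[j]] for r in rows) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(r[cols[i]] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(gram, rhs)


def rfe(rows, y, n_keep):
    """Repeatedly refit and drop the feature with the smallest |weight|."""
    cols = list(range(len(rows[0])))
    while len(cols) > n_keep:
        w = fit_weights(rows, y, cols)
        weakest = min(range(len(cols)), key=lambda i: abs(w[i]))
        del cols[weakest]
    return cols


# Toy data: the response depends only on features 0 and 2.
rows = [
    [1, 0, 2, 1], [0, 1, 1, -1], [2, 1, 0, 0], [-1, 2, 1, 1],
    [1, -1, -1, 2], [0, 2, -2, 1], [-2, 0, 1, -1], [1, 1, 1, 0],
]
y = [2 * r[0] - r[2] for r in rows]
selected = rfe(rows, y, n_keep=2)  # indices of the surviving features
```

Ranking by raw weight is only meaningful when the features are on comparable scales, so in practice the inputs would usually be standardized first.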

17 Feature Selection: Target Variables

Target 1
Close, Open, High, Low, P/E Ratio
Wiki_3_day_disparity, Wiki_5_day_disparity, Wiki_10_day_disparity, Wiki_Momentum_1, Wiki_ROC
Google_MA_5, Google_EMA_3, Google_3_Day_disparity, Google_5_day_disparity, RSI
Stochastic Oscillator, Wiki_RSI, Google_MA_4, Williams %R, Google_MA_3

Target 2
Close, Open, High, Low, P/E Ratio
Wiki_5_day_disparity, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move, Wiki_5day_disparity_Move
Google_EMA5_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Wiki_3_day_disparity
Stochastic Oscillator, RSI_Move, Wiki_RSI_Move, Google_MA_6, Google_Move

Target 3
Close, Open, High, P/E Ratio, Stochastic_Move
Wiki_Momentum_1, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move, Wiki_ROC_Move
Google_EMA5_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Wiki_10_day_disparity
RSI_Move, Wiki_RSI_Move, Wiki_3_day_disparity, Google_Move, Google_MA5_Move

Target 4
Close, Open, High, Low, P/E Ratio
RSI_Move, Wiki_10_day_Disparity, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move
Google_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Williams %R
Stochastic Oscillator, Stochastic_Move, Wiki_3day_disparity_Move, Wiki_ROC_Move, Wiki_RSI_Move

Target 5
Close, Open, High, Low, Williams %R
Wiki_Momentum_1, Wiki_RSI, Google_MA_2, Google_MA_3, Google_MA_4
Google_MA_9, Google_3_day_disparity, Google_5_day_disparity, Google_10_day_disparity, Wiki_10_day_disparity
Wiki_3_day_disparity, Wiki_5_day_disparity, Google_MA_6, Google_MA_7, Google_MA_8

18 Model Comparison

19 Model Comparison

Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/

20 Model Comparison

21 Experimental Result

Paper 1 – B. Nair et al., 2010
Paper 2 – A. Chen, 2003

22 Experimental Result
• Comparison of model accuracy by information input

23 Experimental Result
• Evaluate the model using AUC
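For reference, AUC can be computed directly from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain-Python illustration of that definition, not the authors' evaluation code; the function name and the sample labels and scores are made up.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels; scores: matching model scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores: three of the four positive/negative pairs
# are ranked correctly, so the result is 0.75.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which makes it less sensitive to class imbalance than raw accuracy.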

24 Experimental Result: Target Coincidence Matrix for SVM

Target 1
        Training        Testing
        0     1         0     1
0       55    113       60    95
1       27    229       34    183

Target 2
        Training        Testing
        0     1         0     1
0       160   28        156   39
1       37    180       32    164

Target 3
        Training        Testing
        0     1         0     1
0       147   46        164   32
1       30    172       31    174

Target 4
        Training        Testing
        0     1         0     1
0       150   31        165   34
1       34    172       31    179

Target 5
        Training        Testing
        0     1         0     1
0       177   29        183   37
1       130   61        125   54
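The coincidence matrices can be summarized as accuracies. The sketch below reads the Testing counts for each target as 2×2 matrices (for Target 1, row 0 is 60 and 95, row 1 is 34 and 183) and assumes, as is usual for a coincidence matrix, that the diagonal cells are the cases where the predicted and actual classes agree. The slide does not say which axis holds the actual class, which does not matter for accuracy. The dictionary and function names are hypothetical.

```python
# Testing counts from the slide, one 2x2 matrix per target.
testing = {
    "Target 1": [[60, 95], [34, 183]],
    "Target 2": [[156, 39], [32, 164]],
    "Target 3": [[164, 32], [31, 174]],
    "Target 4": [[165, 34], [31, 179]],
    "Target 5": [[183, 37], [125, 54]],
}


def accuracy(matrix):
    """Share of cases on the diagonal (assumed: predicted == actual)."""
    agree = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return agree / total


test_accuracy = {name: accuracy(m) for name, m in testing.items()}
for name, value in test_accuracy.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, Targets 2 to 4 come out well above Targets 1 and 5 on the testing counts.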

25 Target Matrix

Target Type   Method
1             Open (i+1) – Close (i)
2             Open (i+1) – Open (i)
3             Close (i+1) – Close (i)
4             Close (i+1) – Open (i)
5             Volume of trades as previous day

26 Evaluation: 10-fold cross validation
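A minimal sketch of 10-fold cross validation in plain Python: the samples are split into ten near-equal folds, each fold serves once as the test set while the other nine are used for training, and the scores are averaged. The slides do not say whether the folds were shuffled; the contiguous, unshuffled split here is an assumption, as are the function names and the `fit_and_score` callback.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, fit_and_score, k=10):
    """Average the score returned by fit_and_score(train_idx, test_idx)."""
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Hypothetical usage: a baseline that predicts the majority training class.
labels = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 2


def majority_baseline(train_idx, test_idx):
    ones = sum(labels[i] for i in train_idx)
    guess = 1 if 2 * ones >= len(train_idx) else 0
    return sum(labels[i] == guess for i in test_idx) / len(test_idx)


baseline_accuracy = cross_validate(len(labels), majority_baseline, k=10)
```

Any model can be plugged in through the callback, which keeps the splitting logic separate from the classifier being evaluated.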

27 Evaluation: Cross validation result

28 Evaluation

Accuracy: 82% - 89%

29 Moving Prediction

30 Conclusion

• Disparate sources of data help predict the stock market.

• Multiple targets’ prediction results can be used in conjunction to successfully track stock market movements.

• Decision tree and support vector machine models alternate as the best performer, depending on the combination of input data.

• With all types of input data combined, SVMs performed best.

31 Future Work
• Identify new sources of data that have a predictive effect on the movement of the stock market, such as Twitter sentiment and textual analysis of market news, and add them to a more inclusive form of this model.
• Include linguistic modeling, clustering, and control methods such as fuzzy theory to obtain predictions of the price range.

Fuzzy Membership Function

Fuzzy System
