Post on 31-Dec-2015
1
Machine Learning for Stock SelectionMachine Learning for Stock Selection
Robert J. YanRobert J. YanCharles X. LingCharles X. Ling
University of Western Ontario, CanadaUniversity of Western Ontario, Canada{jyan, cling}@csd.uwo.ca{jyan, cling}@csd.uwo.ca
Outline
– Introduction
– The stock selection task
– The Prototype Ranking method
– Experimental results
– Conclusions
Introduction
Objective:
– Use machine learning to select a small number of “good” stocks to form a portfolio
Research questions:
– Learning from a noisy dataset
– Learning from an imbalanced dataset
Our solution: Prototype Ranking
– A specially designed machine learning method
Stock Selection Task
Given information prior to week t, predict the performance of stocks in week t
– Training set:

            Predictor 1          Predictor 2          Predictor 3               Goal
Stock ID    Return of week t-1   Return of week t-2   Volume ratio of t-2/t-1   Return of week t

Learn a ranking function to rank the test data
– Select the n highest-ranked stocks to buy and the n lowest-ranked stocks to short-sell
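The selection rule on this slide (buy the n highest-ranked stocks, short-sell the n lowest) can be sketched in a few lines of Python. This is only an illustration: the function name `select_long_short`, the ticker symbols and the scores are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Tuple


def select_long_short(scores: Dict[str, float], n: int) -> Tuple[List[str], List[str]]:
    """Return (stocks to buy, stocks to short-sell) from predicted scores.

    `scores` maps a stock ID to the value produced by some ranking
    function; a higher score means a better expected return in week t.
    """
    if n < 0 or 2 * n > len(scores):
        raise ValueError("need at least 2 * n stocks to form both legs")
    # Rank stock IDs from highest to lowest predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Top n go to the long leg, bottom n to the short leg.
    return ranked[:n], ranked[len(ranked) - n:]


# Made-up scores for five hypothetical stocks.
example_scores = {"AAA": 0.021, "BBB": -0.013, "CCC": 0.004, "DDD": -0.030, "EEE": 0.015}
long_leg, short_leg = select_long_short(example_scores, n=2)
```

With these made-up scores, `long_leg` holds the two highest-scoring IDs and `short_leg` the two lowest.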
Prototype Ranking
Prototype Ranking (PR): a machine learning method designed for noisy and imbalanced stock data
The PR system:
Step 1. Find good “prototypes” in the training data
Step 2. Use k-NN on the prototypes to rank the test data
Step 1: Finding Prototypes
Prototypes: representative points
– Goal: discover the underlying density/clusters of the training samples by distributing prototypes in the sample space
– Reduce the data size
[Figure: samples, prototypes, and a prototype neighborhood]
Finding prototypes using competitive learning
General competitive learning:
Step 1: Randomly initialize a set of prototypes
Step 2: Search for the nearest prototypes
Step 3: Adjust the prototypes
Step 4: Output the prototypes
The hidden density of the training data is reflected in the prototypes
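The four general steps above can be sketched as a plain winner-take-all loop. This is a minimal sketch of generic competitive learning, not the authors' implementation: the learning rate, the number of epochs, the seeding and the choice to initialize prototypes at randomly drawn samples are all assumptions.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(
    samples: Sequence[Sequence[float]],
    num_prototypes: int,
    epochs: int = 20,          # assumed value
    learning_rate: float = 0.1,  # assumed value
    seed: int = 0,
) -> List[List[float]]:
    rng = random.Random(seed)
    data = [list(s) for s in samples]
    # Step 1: randomly initialize the prototypes (here: at random samples).
    prototypes = [list(s) for s in rng.sample(data, num_prototypes)]
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            # Step 2: search for the prototype nearest to this sample.
            winner = min(
                range(num_prototypes),
                key=lambda i: squared_distance(prototypes[i], x),
            )
            # Step 3: move the winning prototype a small step toward the sample.
            prototypes[winner] = [
                p + learning_rate * (xi - p) for p, xi in zip(prototypes[winner], x)
            ]
    # Step 4: output the prototypes.
    return prototypes
```

Because each update is a convex combination of a prototype and a sample, the prototypes stay inside the region covered by the samples, and regions with more samples attract more updates.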
Modifications for Stock data
In step 1: Initial prototypes are organized in a tree structure
– Fast nearest-prototype search
In step 2: Search for prototypes in the predictor space
– Better learning effect for prediction tasks
In step 3: Adjust prototypes in the goal attribute space
– Better learning effect on the imbalanced stock data
In step 4: Prune the prototype tree
– Prune child prototypes if they are similar to the parent
– Combine leaf prototypes to form the final prototypes
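One possible reading of the step 2 and step 3 modifications is sketched below: the winning prototype is found using the predictors only, and the update is then applied to the goal value that the prototype carries. The slides do not give the exact update rule, so this is an interpretation rather than the authors' procedure. The `Prototype` class, the function names and the learning rate are hypothetical, and the tree structure and pruning of steps 1 and 4 are left out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Prototype:
    predictors: List[float]  # position in predictor space
    goal: float              # goal attribute value (e.g. return of week t)


def nearest_by_predictors(prototypes: List[Prototype], x: Sequence[float]) -> Prototype:
    # Modified step 2 (assumed reading): distance uses the predictors only.
    return min(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p.predictors, x)),
    )


def update_goal(
    prototypes: List[Prototype],
    x: Sequence[float],
    y: float,
    learning_rate: float = 0.1,  # assumed value
) -> Prototype:
    winner = nearest_by_predictors(prototypes, x)
    # Modified step 3 (assumed reading): adjust the goal value toward
    # the sample's goal value; predictor coordinates are left unchanged.
    winner.goal += learning_rate * (y - winner.goal)
    return winner
```

Under this reading, a prototype's goal value tracks a running average of the goal values of the samples that fall in its neighborhood.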
Step 2: Predicting Test Data
The prediction is the weighted average of the k nearest prototypes
The model is updated online with new data
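A weighted average over the k nearest prototypes might look like the sketch below. The slides do not state the weighting scheme, so inverse-distance weights are an assumption here, as are the function name `predict_goal` and the representation of a prototype as a `(predictors, goal)` pair.

```python
import math
from typing import List, Sequence, Tuple

# A prototype is represented here as (predictor vector, goal value).
PrototypePair = Tuple[Sequence[float], float]


def predict_goal(prototypes: List[PrototypePair], x: Sequence[float], k: int = 3) -> float:
    """Predict the goal value of x from its k nearest prototypes."""
    if not 1 <= k <= len(prototypes):
        raise ValueError("k must be between 1 and the number of prototypes")
    # Distance from x to every prototype, measured in predictor space.
    by_distance = sorted((math.dist(p, x), goal) for p, goal in prototypes)
    nearest = by_distance[:k]
    # Assumed weighting: closer prototypes count more (inverse distance).
    weights = [1.0 / (d + 1e-12) for d, _ in nearest]
    return sum(w * g for w, (_, g) in zip(weights, nearest)) / sum(weights)
```

The predicted values for all test stocks can then be sorted to produce the ranking used for selection.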
Data
CRSP daily stock database
– 300 NYSE and AMEX stocks with the largest market capitalization
– From 1962 to 2004
Testing PR
Experiment 1: Larger portfolio, lower average return, lower risk (diversification)
Experiment 2: Is PR better than Cooper’s method?
Results of Experiment 1
[Figure: Weekly average return (%) vs. number of stocks in portfolio, 1978-2004]
[Figure: Risk, measured as weekly standard deviation (%), vs. number of stocks in portfolio, 1978-2004]
Experiment 2: Comparison to Cooper’s method
Cooper’s method (CP): A traditional non-ML method for stock selection…
Compare PR and CP in 10-stock portfolios
Results of Experiment 2
Measures:
– Average Return (Ret.)
– Sharpe Ratio (SR): a risk-adjusted return, SR = Ret. / Std.
[Figure: Ret. (%) and SR for the PR 10-stock portfolio and the CP 10-stock portfolio]
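The two measures can be computed from a series of weekly portfolio returns as sketched below, using the slide's definition SR = Ret. / Std. with no risk-free rate subtracted. The use of the sample standard deviation is an assumption, and the return series is made-up illustration data, not a result from the experiments.

```python
import statistics
from typing import Sequence, Tuple


def average_return_and_sharpe(weekly_returns: Sequence[float]) -> Tuple[float, float]:
    """Return (average return, Sharpe ratio) with SR = Ret. / Std."""
    ret = statistics.mean(weekly_returns)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(weekly_returns)
    if std == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return ret, ret / std


# Made-up weekly returns in percent, for illustration only.
ret, sr = average_return_and_sharpe([1.0, 2.0, 3.0])
```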