Cascading Investment Strategy Building via Multiview Learning
Abstract
Integrating domain-specific knowledge in finance with big data analytics and artificial
intelligence methods is an important issue that has become a focal point for both academia
and industry. This paper proposes a novel two-stage machine learning method for building
investment strategies, whose advantages include lower modelling cost and greater explanatory
power. We test the method empirically on five quantitative strategies that are discussed in
the literature or applied in practice, using data from the US and Chinese stock markets. The
empirical results show that our method improves performance dramatically for all tested
strategies, with an increase in the success rate of trading signals of about 6% to 17% and an
increase in their average return of about 1.4 to 6.89 times. In addition, the results show
that the multi-view learning method outperforms single-view methods and provides some
explanatory power for understanding how different kinds of information contribute to a
strategy.
Keywords: multi-view learning; intelligent decision; strategy optimization
JEL classification: C55; C61; G11
Paper #850341
1. Introduction
As stated in a J.P. Morgan report, one of the great changes in the marketplace is the
increasing adoption of quantitative investing techniques, including developing new
quantitative trading strategies, utilizing increasing amounts and different types of data,
and adopting new methods such as machine learning and artificial intelligence
(Kolanovic and Krishnamachari, 2017). Making use of big data analysis and machine
learning places high professional demands on the researchers and analysts currently
working in the market. There is a big gap between these new methods and the traditional
theoretical and empirical methods broadly used in financial markets. As a result, the two
kinds of methods tend to be used by different groups of researchers: the new methods by
researchers who know big data analysis but do not understand financial theory, and the
traditional methods by financial experts who may not know big data analysis and machine
learning very well. For example, machine learning and artificial intelligence are usually
used to forecast highly volatile financial time series, such as stock and futures prices,
returns and their volatility, because artificial neural networks are good nonlinear function
approximators. Recurrent neural networks were used in early work to predict stock prices
(Kamijo and Tanigawa, 1990) and price volatility (Hamid and Iqbal, 2004), while the
self-organizing fuzzy neural network (Bollen et al., 2011), deep recurrent neural networks,
long short-term memory (LSTM) neural networks and other new methods have since been
applied to financial markets, for example by Xiong et al. (2015) and Li, Bu and Wu (2017),
since they have shown remarkable results in tasks such as artificial handwriting
generation, language forecasting and speech recognition (Li, Bu and Wu, 2017). We find
that most work of this kind relies only on machine learning and artificial intelligence
methods and rarely draws on financial theory. On the other side, most papers published in
classic finance journals seldom adopt these new methods. With the rapid development of
big data analytics and artificial intelligence, how to integrate domain-specific knowledge
in finance with these methods remains an important issue for both academia and industry.
To utilize increasing amounts and different types of data and to adopt new methods,
researchers and analysts tend to develop ever more complex quantitative models. For
example, hybrid models and integrated methods make up a large part of the forecasting
literature (Lin et al., 2012; Wang et al., 2012; among others), especially when learning
methods are involved. An extraordinarily complex model not only places high professional
demands on researchers and analysts, who must have both domain-specific knowledge and
an understanding of complex models in big data analysis and artificial intelligence, but
also increases modelling and computational costs. Besides, a complex model built on
specific artificial intelligence methods has weaker explanatory power than simpler models
such as statistical and econometric models. Therefore, how to use the new methods more
rationally, so as to avoid the problems caused by large amounts of data and complex models
on the one hand and to improve the explanatory power of the models and provide more
theoretical insight on the other, is a topic worth exploring.
This paper investigates a method that addresses the above issues, taking the development
of quantitative investment strategies as a specific case. Instead of bringing all the
information together in a single very complex model, we propose a novel two-stage
multi-view learning method for building investment strategies. The general idea is to
separate the domain knowledge and the different models or methods into two stages. In the
first stage, researchers and analysts develop an investment strategy based on any theory
or model from their domain; in the second stage, the developed strategy is improved by the
multi-view learning method proposed in this paper, using a large set of data that reflects
domain-specific knowledge and the theoretical and empirical results in finance.
Our proposed method has several advantages. First, separating the modelling process
makes the methods or models in each stage independent of each other and thus easier to
build, especially in the first stage. Researchers or analysts can set up any single model
in the first stage, based on domain-specific knowledge and theoretical models in financial
economics and financial markets, on data-driven models such as statistical methods, or
even on machine learning models that they are good at. Second, much more valuable
information is brought into the models in the second stage, according to domain knowledge
and feature extraction methods such as data mining and text mining. The method can thus
overcome the limited data processing ability of some specific first-stage models, such as
econometric models. Third, the proposed multi-view learning method in the second stage
contributes more to the improvement of an investment strategy than methods that aggregate
all of the large-scale information in a single view. Fourth, our method provides some
explanatory power regarding which kinds of information are important for specific
strategies.
To examine the effectiveness of our method, we choose five different types of
strategies for the empirical analysis: a candlestick charting strategy, a pair trading
strategy, a technical indicator strategy, a statistical analysis strategy, and a
forecasting strategy. The first two come from the prior literature and use data from the
US stock market; the latter three are developed by ourselves according to technical
analysis, statistical analysis and a time series forecasting model, and use data from the
Chinese stock market. In the empirical analysis, we compare the investment performance of
our method with that of benchmark methods. Specifically, we compare the strategy developed
using our model, i.e., the final result after both stages, with the strategy developed
only from domain knowledge, i.e., the first-stage strategy. The empirical results show
that our method improves performance dramatically for all tested strategies, with an
increase in the success rate of trading signals of about 6% to 17% and an increase in
their average return of about 1.4 to 6.89 times. Moreover, by comparing our multi-view
learning method with single-view methods that use a logistic regression model and a random
forest algorithm, our empirical results confirm that the multi-view learning method
contributes more to investment strategy building. Finally, our method provides some
explanatory power for understanding which kinds of information contribute to the
improvement of an investment strategy.
The rest of this paper is organized as follows. Section 2 reviews the literature on
active investment strategies, which constitute the domain knowledge in finance for the
first stage of our method, and the recent literature on multi-view learning, which relates
to our second-stage method. Section 3 introduces the proposed framework of the two-stage
multi-view learning method for strategy building. Section 4 discusses the multi-view
learning method. Section 5 provides the empirical results of our method. Section 6
concludes.
2. Literature review
2.1 Active investment strategy
Most investment strategies are designed to exploit price patterns discovered by
different financial theories and models, such as the capital asset pricing model (CAPM)
(Sharpe, 1964; Lintner, 1965), the arbitrage pricing theory (APT) (Ross, 1976), and the
Fama-French three-factor and five-factor asset pricing models (Fama and French, 1993,
1996, 2015). To investigate the pricing mechanism of securities, a large body of research
discusses risk factors or anomalies in financial markets. The cross-sectional risk factors
include size and value effects (Fama and French, 1993, 1996, 2015), profitability and
investment patterns in average stock returns (Fama and French, 2015), the liquidity factor
(Pástor and Stambaugh, 2003), etc. Return reversals and momentum are two of the most
studied capital market phenomena in the literature; they can be examined through the
profits of zero net investment portfolios built from well-diversified “winner” or “loser”
portfolios. Return reversals are usually explained by market overreaction to information,
while momentum effects are explained by underreaction to information. Many studies examine
price patterns in US stock markets: for example, DeBondt and Thaler (1985) reported return
reversals over long horizons; Lehmann (1990), Lo and MacKinlay (1990) and Jegadeesh (1990)
reported short-term reversals at daily, weekly, and monthly levels; and Jegadeesh and
Titman (1993) reported a momentum effect for 3- to 12-month holding periods. Some recent
studies provide further evidence across different markets and asset classes. Asness et al.
(2013) examined value and momentum effects jointly across eight different markets and
asset classes and revealed value and momentum in government bonds and value effects in
currencies and commodities. Furthermore, recent studies propose time series momentum or
trend effects. Moskowitz et al. (2012) found that time series momentum can persist for
about a year in equity index, currency, commodity, and bond futures. Han, Zhou and Zhu
(2016) provided a trend factor that simultaneously captures short-, intermediate-, and
long-term stock price trends, and showed that this trend factor substantially outperforms
the well-known short-term reversal, momentum, and long-term reversal factors separately.
There are several branches of active strategies, including fundamental analysis,
technical analysis, statistical arbitrage, and strategies based on time series forecasting
models. These strategies are still widely used in financial markets, owing to market
inefficiency and investors' incomplete rationality. Technical analysis, also known as
"charting", has been a part of financial practice for many decades. Because of its highly
subjective nature, this discipline has not received the same level of academic scrutiny
and acceptance as more traditional approaches such as fundamental analysis. In rejecting
the Random Walk Hypothesis, technical analysis may well be an effective means for
extracting useful information from market prices. Lo and MacKinlay (1988), among others,
have shown that past prices may be used to forecast future returns to some degree, which
provides support for technical analysts. Building on the profitability of candlestick
trading strategies confirmed by Caginalp and Laurent (1998), Lu et al. (2012), Lu (2014)
and others, Lu et al. (2015) investigated what determines that profitability. Lo,
Mamaysky, and Wang (2000) proposed a systematic and automatic approach to technical
pattern recognition using nonparametric kernel regression, and found that several
technical indicators do provide incremental information and may have some practical value
in the US stock market. Some papers adopt artificial intelligence methods to improve the
performance of technical analysis. For example, Refenes (1995) proposed a genetic-based
global learning method in an FX trading system to find the best combination of technical
indicators for prediction and trading. Dunis and Zhou (1998) adopted genetic algorithms to
optimize parameters and develop an FX trading system for a simple technical trading
indicator, the RSI.
Statistical arbitrage strategies are developed to exploit properties of stock prices
and returns, such as momentum and value effects (Hogan et al., 2004) and mean reversion
(Elliot et al., 2005; Avellaneda and Lee, 2008; Focardi et al., 2016). Focardi et al.
(2016) introduced a new statistical arbitrage strategy based on dynamic factor models,
which exploits the mean-reverting properties of prices. Pair trading is a popular kind of
statistical arbitrage strategy for short-term speculation. There are two main issues in
implementing a pairs trading strategy: one is identifying assets whose prices move closely
together, and the other is determining the position and timing needed to form a trading
strategy. Several papers describe trading strategies and methods for identifying suitable
pairs. Gatev, Goetzmann and Rouwenhorst (2006) presented a detailed analysis of the
performance of pairs trading strategies within a global framework of cointegration and
found evidence of pairs trading profits. They linked the profitability to the presence of
a common factor in the returns, different from conventional risk measures.
Another type of investment strategy is formed from forecasting results. Forecasting
methods have developed from simple linear models, e.g., the widely used autoregressive
integrated moving average (ARIMA) models, to hybrid models, for example combining ARIMA
with artificial neural networks, and then to integrated forecasting methods and learning
methods. Nyberg (2013) employed dynamic probit models to predict U.S. bear and bull stock
markets with a monthly dataset; in a small-scale market timing experiment, the investment
strategy based on these predictions yielded higher portfolio returns than a buy-and-hold
strategy. Luo and Chen (2013) proposed a model combining a piecewise linearization
algorithm with a weighted support vector machine (SVM) algorithm to predict trading
signals for stocks on the Shanghai Exchange. Lin et al. (2012) employed empirical mode
decomposition (EMD) in a least squares support vector regression (LSSVR) algorithm to
forecast foreign exchange rates. Wang et al. (2012) presented an integrated approach that
combines an exponential smoothing model, ARIMA, a back propagation neural network and a
genetic algorithm to predict indexes in both China and the US.
2.2 Multi-view learning methods
Learning methods have been used in financial markets. Nevmyvaka et al. (2006)
presented an empirical application of reinforcement learning to the problem of
optimized trade execution in modern financial markets. Das and Banerjee (2011)
proposed the meta-learning algorithm (MLA), which combines the portfolio vectors
for the coming period generated by each base expert to form a final portfolio, as an
online portfolio selection method for a fund of funds.
In recent years, more and more scientific data analytics problems collect data from
multiple sources, and multi-view learning methods have been proposed in response. One of
the earliest schemes of multi-view learning is co-training, which trains alternately to
maximize the consistency between two distinct views of the unlabeled data (Blum and
Mitchell, 1998). Many variants, such as co-expectation-maximization (co-EM) (Nigam and
Ghani, 2000; Brefeld and Scheffer, 2004), co-regularization (Sindhwani et al., 2005;
Sindhwani and Rosenberg, 2008), co-regression (Zhou and Li, 2005a; Brefeld et al., 2006),
co-clustering (Bickel and Scheffer, 2004; Kumar et al., 2011; Kumar and Daumé III, 2011)
and graph-based co-training (Yu et al., 2011), have been developed, and they have mainly
been applied to problems in natural language processing and computer vision. Besides
co-training, multiple kernel learning and subspace learning-based approaches are also
important types of multi-view learning (Xu et al., 2013). Multiple kernel learning
performs well in object classification (Lin et al., 2007; Varma and Ray, 2007), object
detection (Longworth and Gales, 2008) and object recognition (Kembhavi et al., 2009).
Subspace learning also does well in facial expression recognition (Dhillon et al., 2011)
and sensing image classification (Zhang et al., 2012).
Furthermore, researchers combine multi-view learning with multi-task learning to form a
new learning paradigm that explores both task relatedness and view relatedness (Zhang et
al., 2013; Liu et al., 2016b; Zheng et al., 2015). Overall, multi-view learning
outperforms single-view learning in many problems to some extent and has better
generalization ability, which makes it a more promising learning paradigm.
3. Framework of two-stage machine learning method for strategy building
In this section, we discuss the two-stage machine learning method for investment
strategy building proposed in this paper, which completely separates the design and
development of the initial investment strategy from its improvement. In the first stage,
an investment strategy is developed and the initial trading signals are generated. Any
investment strategy can be formed, using any specific theory, model or price pattern based
on knowledge from whatever area the developer knows well. For example, an expert in
finance can make use of financial models such as the CAPM, the APT, or factor models in
the literature; someone skilled in statistics and econometric models can develop a trading
strategy based on statistical arbitrage or forecasting models; and someone skilled in
technical analysis can form a trading strategy from technical indicators. In the second
stage, the trading signals obtained from the first stage are optimized and the strategy is
improved using artificial intelligence methods. We propose a multi-view learning method
for the second stage in this paper. The initial trading signals formed in the first stage
are the inputs of the multi-view learning prediction models in the second stage. The
second-stage method is in effect a prediction and optimization step for trading signals:
we select and keep good signals that would make a profit and discard bad signals that
would make a loss. To improve the accuracy of predicting whether a trading signal is
profitable, we bring in a variety of information. We classify this information into
different groups according to its properties, and adopt the multi-view learning method to
optimize the trading signals and improve the performance of the initial strategy. Within
our framework, the second-stage multi-view learning method can be divided into several
sub-frames, including view construction and the multi-view learning method for strategy
optimization. Figure 1 presents the framework of our approach.
Figure 1: The framework of two-stage strategy optimization method
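The two-stage separation described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the moving-average rule in `stage_one`, the `toy_scorer` placeholder and the 0.5 threshold are all hypothetical choices made only to show how a first-stage strategy hands its signals to a second-stage filter.

```python
from typing import Callable, List, Sequence


def stage_one(prices: Sequence[float], window: int = 3) -> List[int]:
    """Hypothetical first-stage rule: emit a buy signal at time t when the
    price is above the average of the previous `window` prices."""
    signals = []
    for t in range(window, len(prices)):
        moving_average = sum(prices[t - window:t]) / window
        if prices[t] > moving_average:
            signals.append(t)
    return signals


def stage_two(signals: List[int],
              predict_profitable: Callable[[int], float],
              threshold: float = 0.5) -> List[int]:
    """Second stage: keep only the signals predicted to be profitable."""
    return [t for t in signals if predict_profitable(t) >= threshold]


prices = [10.0, 10.2, 10.1, 10.4, 10.3, 10.8, 10.6, 11.0]
initial_signals = stage_one(prices)


# Placeholder scorer standing in for the multi-view model of Section 4;
# a real scorer would be trained on labelled historical signals.
def toy_scorer(t: int) -> float:
    return 0.9 if t % 2 == 1 else 0.1


kept_signals = stage_two(initial_signals, toy_scorer)
print(initial_signals, kept_signals)
```

Because `stage_two` only needs the list of signals and a scoring function, the first-stage rule can be swapped for any other strategy without touching the second stage, which is the independence property the framework relies on.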
The advantages of our two-stage machine learning method for strategy development are
as follows. First, separating the strategy development process from the performance
optimization process reduces not only the computational cost but also the human resource
cost. Each stage of this two-stage framework is much easier than a very complex model that
tries to bring all information together, so that we can overcome the limited data
processing ability of some specific models. Also, the models in the two stages can be
developed by different people, which lowers the professional requirements for strategy
development and hence the human resource cost. Second, the two-stage framework can be
applied to the development and optimization of any active strategy, including very simple
and common strategies used in practice, academic models from the literature, and even
proprietary investment strategies developed by industry firms whose core information we do
not know exactly. The methods in the second stage are independent of those used in the
first stage, which greatly increases the application value of the two-stage method. Third,
we can add a lot of valuable information in the second stage no matter what strategy is
developed in the first stage, and thereby bring in the advantages of big data analysis.
Fourth, artificial intelligence models are brought in at the second stage for their high
predictive accuracy, which improves the performance of the strategy, while the method
proposed in this paper still retains explanatory power about why the second stage improves
the performance of the initial strategy. This explanatory power is very important: it can
improve the understanding of the initial strategy and shed light on the related risks of
the strategy.
4. Multi-view learning method
4.1 Multi-view construction
To solve the investment strategy improvement problem, we adopt co-training style
algorithms for the second stage. Co-training style algorithms, one of the earliest schemes
for multi-view learning, first introduced by Blum and Mitchell (1998), have several
properties that make them well suited to our framework and method for investment strategy
optimization. For example, as Xu, Tao and Xu (2013) stated, co-training style algorithms
usually train separate learners on distinct views, which are then forced to be consistent
across views, and this kind of approach can be regarded as a late combination of multiple
views. This property is very suitable for multi-view learning in the second stage, because
it separates the task of view construction from that of view combination (the learning
algorithm) easily and clearly, which is very important for our framework.
The main assumptions of co-training style algorithms summarized in Xu, Tao and Xu
(2013) are: (a) sufficiency: each view is sufficient for classification on its own;
(b) compatibility: the target functions in all views predict the same labels for
co-occurring features with high probability; and (c) conditional independence: the views
are conditionally independent given the class label. The conditional independence
assumption plays a critical role, but it is usually too strong to be satisfied in
practice, and several weaker alternatives have thus been considered. Abney (2002) argued
that weak dependence alone can lead to successful co-training. Wang and Zhou (2007) also
showed that when the diversity among the learners based on each view is large, the
performance of the learner can be improved by co-training style algorithms. Moreover, the
complementary principle of multi-view learning states that, in a multi-view setting, each
view of the data may contain some knowledge that other views do not have; therefore,
multiple views can be employed to describe the data comprehensively and accurately.
In this paper, we propose a new way of constructing views to solve the strategy
performance improvement problem, which is also based on the above assumptions, especially
sufficiency, weak dependence/diversity, and the complementary principle. The idea of view
construction in this paper is as follows. First, we try to collect all the related and
fundamental information that can affect trading strategies in financial markets. Second,
we divide all the information into groups according to its properties, and each group is a
view in the multi-view learning method. From the literature and practice of fundamental
analysis and technical analysis for active strategies, we know that there are
well-established approaches to securities analysis: top down and bottom up. Whether we
adopt a top-down or a bottom-up approach to stock analysis, we should always include
information on individual stocks, on sectors and industries, and on markets and the
economy in order to better understand individual stocks and the stock market. Also, in
developing trading strategies, stock selection, timing and asset allocation are very
important processes that can influence the final performance. Top-down and bottom-up
approaches contribute much to stock selection and even to timing, but we still need some
risk measures to contribute to asset allocation. Therefore, following securities analysis,
we can construct the views for stock market investment strategies as the view of
individual stocks, the view of sectors and industries, the view of market and economy, and
the view of risks. The view construction can be adjusted slightly to suit different
trading strategies. For example, for a pair trading strategy, the relationship between the
pair of stocks is much more important than sector or industry information. Therefore, we
can adjust the view construction to the view of target individual stocks (TV), the view of
associated individual stocks (AV), the view of markets (MV), and the view of risks (RV).
We can include quote information and historical trend information measured by technical
indicators in each of the individual, sector, and market views, and valuation information
from financial statements can also be added to the view of individual stocks.
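As a rough illustration of this grouping for a pair trading strategy, the sketch below assigns named features to the four views (TV, AV, MV, RV) and assembles one feature matrix per view plus the concatenated matrix. The feature names are hypothetical placeholders chosen for the example; the paper does not list its exact features in this section.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names for a pair trading strategy; the actual
# feature set is not specified in this section.
VIEWS: Dict[str, List[str]] = {
    "tv": ["target_return_5d", "target_rsi", "target_pe"],  # target stock
    "av": ["assoc_return_5d", "assoc_rsi"],                 # associated stock
    "mv": ["index_return_5d", "index_rsi"],                 # market
    "rv": ["volatility_20d", "beta"],                       # risk measures
}


def build_views(signals: List[Dict[str, float]]
                ) -> Tuple[Dict[str, List[List[float]]], List[List[float]]]:
    """Return one feature matrix per view and the concatenated matrix,
    each with one row per trading signal."""
    per_view: Dict[str, List[List[float]]] = {k: [] for k in VIEWS}
    full: List[List[float]] = []
    for s in signals:
        row: List[float] = []
        for k, names in VIEWS.items():
            x_k = [s[name] for name in names]  # features of this view
            per_view[k].append(x_k)
            row.extend(x_k)
        full.append(row)
    return per_view, full


# One toy signal with every feature set to zero except the target RSI.
signal = {name: 0.0 for names in VIEWS.values() for name in names}
signal["target_rsi"] = 55.0
per_view, full = build_views([signal])
```

Keeping the view definition in a single dictionary makes it easy to adjust the grouping for other strategy types, for example by replacing the associated-stock view with a sector and industry view.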
Because each view adds new information to the original trading strategy, which may
depend on some special patterns or anomalies of stock prices, the view construction
introduced here satisfies the sufficiency assumption. Moreover, because we construct the
views from different levels of information, the assumptions of weak dependence or
diversity and the complementary principle are satisfied.
The most widely used existing classes of view construction methods in multi-view
learning decompose the original set of features into multiple disjoint subsets to
construct each view, such as random approaches (Brefeld et al., 2005; Bickel and Scheffer,
2004; Brefeld and Scheffer, 2004; Tao et al., 2006), reshaping or decomposing algorithms
(Wang et al., 2011), and automatic feature set partitioning approaches (Chen et al.,
2011). Compared with these, our view construction approach incorporates domain knowledge
from financial economics. This is in line with our idea of integrating theories and
methods in finance, big data and artificial intelligence. Moreover, although view
construction bears some connection to mature feature selection algorithms, there are
significant differences between multi-view feature selection and single-view feature
selection. In multi-view feature selection, the relationships among multiple views should
be considered in addition to the information within each view (Xu, Tao and Xu, 2013). When
we construct multiple views using our approach, we also take this characteristic of
multi-view learning into account.
4.2 Multi-view learning method for strategy optimization
We generally assume that the features of the above-mentioned views can help improve
the prediction results, with different degrees of significance. Each view, if modeled
separately, generates its own prediction result. Since each view predicts the results from
a particular perspective, an ideal model needs to consider all the features and align the
prediction results. One possible solution is to concatenate the features from different
views, which, however, may cause overfitting because each view has its own distribution
and statistical properties (Xu et al., 2013). In addition, the prediction results from
different views may not always be consistent with each other, owing to their distinct
influences on the target securities. Therefore, it is necessary to regularize the
prediction results from different views by aligning them under certain conditions. For
example, for a statistical arbitrage strategy, the target security view may predict that a
signal is good, while the associated asset view may predict that it is bad. Our learning
method measures the strength of both views and forces them toward agreement. In this
paper, we extend the basic idea of multi-view learning, which is traditionally a focal
point of study in machine learning when handling the fusion of heterogeneous features from
separate views (Zhou et al., 2004; Cheng and Wang, 2007; Longworth and Gales, 2008).
Specifically, we first construct the prediction model for each separate view, and then
align the views under certain conditions, expressed as regularization terms in the overall
prediction.
The methods of investment strategy design mentioned in the framework part are well
known and need not be repeated here. In this section, we propose a multi-view learning
method for prediction. We first define some notation. We use bold capital letters and bold
lowercase letters to represent matrices and vectors respectively, and non-bold letters to
denote scalars. Parameters are Greek letters. All vectors are in column form unless stated
otherwise.
Every strategy consists of a group of trading signals, which are described by different
views, $\mathbf{X}^k = [\mathbf{x}_1^k, \mathbf{x}_2^k, \cdots, \mathbf{x}_N^k]^T \in R^{N \times D_k}$,
$k \in \{tv, av, mv, rv\}$, where $N$ is the number of trading signals in a strategy,
$\mathbf{x}_i^k \in R^{D_k}$ denotes the security features of view $k$ extracted from
signal $i$, $D_k$ denotes the feature dimension of view $k$, and $\{tv, av, mv, rv\}$
represents the view of target individual stocks (TV), the view of associated individual
stocks (AV), the view of markets (MV), and the view of risks (RV). The whole feature
matrix is written as
$\mathbf{X} = [\mathbf{X}^{tv}, \mathbf{X}^{av}, \mathbf{X}^{mv}, \mathbf{X}^{rv}] \in R^{N \times D}$,
where $D = D_{tv} + D_{av} + D_{mv} + D_{rv}$. The target vector of a strategy consists of
the labels of all signals, each indicating whether a trading signal is profitable or not,
and is denoted $\mathbf{y} = \{y_1, y_2, \cdots, y_N\}$.
In this paper, we apply the logistic regression model for simplicity to construct the
four view-specific predictions and the multi-view prediction. The view-specific prediction
functions are
$$f(\mathbf{X}^k) = \frac{1}{1 + e^{-\mathbf{X}^k \mathbf{w}_k}}, \quad k \in \{tv, av, mv, rv\}, \tag{1}$$
where $\mathbf{w}_k \in R^{D_k}$, $k \in \{tv, av, mv, rv\}$, denote the weight vectors of
the logistic mapping functions for the four different views. Then, the final prediction
results are obtained by the following function:
$$f(\mathbf{X}) = \frac{1}{1 + e^{-\mathbf{X}\mathbf{w}}} \tag{2}$$
where $\mathbf{w} \in R^{D}$ is the weight vector for the strategy.
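To make Eqs. (1) and (2) concrete, the following sketch evaluates the view-specific and overall logistic predictions on toy data. The feature values and weights are invented for illustration, and treating the overall weight vector as the view weights stacked in the order TV, AV, MV, RV is an assumption of this example rather than something stated explicitly in the text.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]
VIEW_ORDER = ["tv", "av", "mv", "rv"]


def predict(X: Matrix, w: List[float]) -> List[float]:
    """Logistic prediction 1 / (1 + exp(-X w)), one value per row of X."""
    return [1.0 / (1.0 + math.exp(-sum(x * v for x, v in zip(row, w))))
            for row in X]


# Toy data: two trading signals with made-up features and weights.
X_views: Dict[str, Matrix] = {
    "tv": [[1.0, 0.5], [-0.5, 0.2]],
    "av": [[0.3], [-0.7]],
    "mv": [[0.1], [0.4]],
    "rv": [[-0.2], [0.6]],
}
w_views: Dict[str, List[float]] = {
    "tv": [0.8, -0.4], "av": [0.5], "mv": [1.0], "rv": [-0.3],
}

# Eq. (1): one prediction vector per view.
view_predictions = {k: predict(X_views[k], w_views[k]) for k in VIEW_ORDER}

# Eq. (2): prediction from the concatenated features. Stacking the view
# weights into w is an assumption made for this illustration.
X = [[x for k in VIEW_ORDER for x in X_views[k][i]] for i in range(2)]
w = [v for k in VIEW_ORDER for v in w_views[k]]
overall_prediction = predict(X, w)
```

Under this stacking assumption, the overall linear score of a signal is the sum of its four view-specific scores, so a disagreement between views shows up as view predictions that fall on opposite sides of 0.5 for the same signal.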
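The information in the target security view, associated asset view, market view and risk
view describes the inherent characteristics of the same trading signal from
different aspects. We can therefore reinforce the learning performance by enforcing
agreement among their predictions. To reduce the computational cost while achieving
almost the same effect, we use three of the six possible pairwise regularization terms
among the four views. Following maximum likelihood estimation (MLE), we
define the following loss function:
$$J(\mathbf{w}) = \frac{1}{M}\sum_{i=1}^{M}\left(-y_i\mathbf{x}_i\mathbf{w} + \ln\left(1 + e^{\mathbf{x}_i\mathbf{w}}\right)\right) + \lambda_1\|\mathbf{X}^{t}\mathbf{w}^{t} - \mathbf{X}^{a}\mathbf{w}^{a}\|_2^2 + \lambda_2\|\mathbf{X}^{a}\mathbf{w}^{a} - \mathbf{X}^{m}\mathbf{w}^{m}\|_2^2 + \lambda_3\|\mathbf{X}^{m}\mathbf{w}^{m} - \mathbf{X}^{r}\mathbf{w}^{r}\|_2^2 \qquad (3)$$
where the superscripts $t$, $a$, $m$ and $r$ abbreviate $tv$, $av$, $mv$ and $rv$.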
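As a minimal sketch of how the loss in Eq.(3) can be evaluated, the following Python function computes the averaged negative log-likelihood and the three pairwise agreement penalties. The function name `multiview_loss`, the list-of-rows data layout and the view order (target, associated, market, risk) are illustrative assumptions and are not part of the original method description.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def softplus(z):
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def multiview_loss(views, y, weights, lambdas):
    """Evaluate Eq.(3) for four views.

    views   : [X_t, X_a, X_m, X_r], each a list of M feature rows (assumed layout)
    y       : list of M labels in {0, 1}
    weights : [w_t, w_a, w_m, w_r], one weight list per view
    lambdas : (lambda_1, lambda_2, lambda_3)
    """
    m = len(y)
    # Per-view scores X^k w^k for every signal.
    scores = [[dot(row, w) for row in X] for X, w in zip(views, weights)]
    # x_i w is the sum of the per-view scores, because X concatenates the views.
    total = [sum(s[i] for s in scores) for i in range(m)]
    likelihood_part = sum(-y[i] * total[i] + softplus(total[i]) for i in range(m)) / m
    # Agreement penalties between consecutive views: (t, a), (a, m), (m, r).
    penalty = sum(
        lam * sum((a - b) ** 2 for a, b in zip(scores[k], scores[k + 1]))
        for k, lam in enumerate(lambdas)
    )
    return likelihood_part + penalty
```

With all weights equal to zero the penalties vanish and the loss reduces to ln 2, which gives a quick sanity check of an implementation.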
4.3 Parameter estimation and hyper-parameter adjustment
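The optimization problem $\min_{\mathbf{w}} J(\mathbf{w})$ is convex with respect to $\mathbf{w}$, and we use the
gradient descent method to solve it. We decompose the loss
function into two parts as follows,
$$h(\mathbf{w}) = \frac{1}{M}\sum_{i=1}^{M}\left(-y_i\mathbf{x}_i\mathbf{w} + \ln\left(1 + e^{\mathbf{x}_i\mathbf{w}}\right)\right) \qquad (4)$$
$$g(\mathbf{w}) = \lambda_1\|\mathbf{X}^{t}\mathbf{w}^{t} - \mathbf{X}^{a}\mathbf{w}^{a}\|_2^2 + \lambda_2\|\mathbf{X}^{a}\mathbf{w}^{a} - \mathbf{X}^{m}\mathbf{w}^{m}\|_2^2 + \lambda_3\|\mathbf{X}^{m}\mathbf{w}^{m} - \mathbf{X}^{r}\mathbf{w}^{r}\|_2^2. \qquad (5)$$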
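The problem $\min_{\mathbf{w}} J(\mathbf{w})$ can then be rewritten as $\min_{\mathbf{w}} h(\mathbf{w}) + g(\mathbf{w})$. To derive the
iterative formula, the first part of the gradient is calculated as in Eq.(6),
$$\frac{\partial h(\mathbf{w})}{\partial w_j} = \frac{1}{M}\sum_{i=1}^{M}\left(h_{\mathbf{w}}(\mathbf{x}_i) - y_i\right)x_{i,j} \qquad (6)$$
where $h_{\mathbf{w}}(\mathbf{x}_i) = 1/(1 + e^{-\mathbf{x}_i\mathbf{w}})$, and $M$ is the number of samples. For the second
part of the gradient, we express it in matrix form as Eq.(7) with the notation defined
previously,
$$\frac{\partial g(\mathbf{w})}{\partial \mathbf{w}} = \begin{bmatrix}
\lambda_1\mathbf{X}_{tt} & -\lambda_1\mathbf{X}_{ta} & 0 & 0 \\
-\lambda_1\mathbf{X}_{at} & (\lambda_1 + \lambda_2)\mathbf{X}_{aa} & -\lambda_2\mathbf{X}_{am} & 0 \\
0 & -\lambda_2\mathbf{X}_{ma} & (\lambda_2 + \lambda_3)\mathbf{X}_{mm} & -\lambda_3\mathbf{X}_{mr} \\
0 & 0 & -\lambda_3\mathbf{X}_{rm} & \lambda_3\mathbf{X}_{rr}
\end{bmatrix}\mathbf{w} \qquad (7)$$
where $\mathbf{X}_{ij} = (\mathbf{X}^{i})^{T}\mathbf{X}^{j} \in R^{D_i \times D_j}$, $i, j \in \{t, a, m, r\}$. Eq.(7) holds up to the constant
factor 2 produced by differentiating the squared norms, which only rescales $\lambda_1$, $\lambda_2$
and $\lambda_3$. Thus, the global update is
$$\mathbf{w}^{n+1} = \mathbf{w}^{n} - \alpha\left(\frac{1}{M}\mathbf{X}^{T}\left(h_{\mathbf{w}^{n}}(\mathbf{X}) - \mathbf{y}\right) + \begin{bmatrix}
\lambda_1\mathbf{X}_{tt} & -\lambda_1\mathbf{X}_{ta} & 0 & 0 \\
-\lambda_1\mathbf{X}_{at} & (\lambda_1 + \lambda_2)\mathbf{X}_{aa} & -\lambda_2\mathbf{X}_{am} & 0 \\
0 & -\lambda_2\mathbf{X}_{ma} & (\lambda_2 + \lambda_3)\mathbf{X}_{mm} & -\lambda_3\mathbf{X}_{mr} \\
0 & 0 & -\lambda_3\mathbf{X}_{rm} & \lambda_3\mathbf{X}_{rr}
\end{bmatrix}\mathbf{w}^{n}\right) \qquad (8)$$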
where $\alpha$ denotes the learning rate. The iteration stops when either the change in the
weights between two consecutive iterations falls below a threshold or the maximum
number of iterations is reached.
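The update rule and the two stopping criteria can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `fit_multiview_lr`, the default learning rate, threshold and iteration cap are hypothetical, the data layout is a list of rows per view, and the factor 2 from differentiating the squared norms is kept explicitly.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_multiview_lr(views, y, lambdas, alpha=0.1, tol=1e-6, max_iter=5000):
    """Gradient descent on h(w) + g(w); returns one weight list per view."""
    m = len(y)
    dims = [len(X[0]) for X in views]
    weights = [[0.0] * d for d in dims]
    for _ in range(max_iter):  # stopping criterion 2: maximum number of iterations
        scores = [[sum(a * b for a, b in zip(row, w)) for row in X]
                  for X, w in zip(views, weights)]
        # Likelihood part: (h_w(x_i) - y_i) / M for every sample.
        resid = [(sigmoid(sum(s[i] for s in scores)) - y[i]) / m for i in range(m)]
        # Disagreement between consecutive views: (t, a), (a, m), (m, r).
        diffs = [[a - b for a, b in zip(scores[k], scores[k + 1])] for k in range(3)]
        largest_change = 0.0
        updated = []
        for k in range(4):
            coef = list(resid)
            if k < 3:  # view k is the first member of penalty k
                coef = [c + 2.0 * lambdas[k] * d for c, d in zip(coef, diffs[k])]
            if k > 0:  # view k is the second member of penalty k - 1
                coef = [c - 2.0 * lambdas[k - 1] * d for c, d in zip(coef, diffs[k - 1])]
            grad = [sum(coef[i] * views[k][i][j] for i in range(m)) for j in range(dims[k])]
            step = [alpha * gj for gj in grad]
            largest_change = max(largest_change, max(abs(s) for s in step))
            updated.append([wj - s for wj, s in zip(weights[k], step)])
        weights = updated
        if largest_change < tol:  # stopping criterion 1: weight-change threshold
            break
    return weights
```

All four weight blocks are updated from the same iterate before being replaced, which corresponds to applying the global update to the stacked vector in one step.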
We further develop a parameter-adjusting method to select the
hyper-parameters, i.e., the coefficients of the penalty
terms that are combined with the log-likelihood part, which we call the individual
optimality criterion. Each time we add one penalty to the likelihood part, so the
individual objective function is
$$\min_{\mathbf{w}} \; -\ln(L) + \lambda_j P_j, \quad j = 1, 2, 3 \qquad (9)$$
where $L$ and $P_j$ denote the likelihood and the $j$-th penalty, respectively. We then locate the
order of magnitude of the hyper-parameter $\lambda_j$ and search around it with a small step
length, such as 10e-4. Under the same optimization method and the same convergence
condition, the optimal hyper-parameter $\lambda_j^{*}$ is obtained when the objective function
in Eq.(4) reaches its minimum. The same process is repeated for each penalty,
and we then combine the penalties with the likelihood, so the objective function in Eq.(3) becomes
$$\min_{\mathbf{w}} \; \frac{1}{M}\sum_{i=1}^{M}\left(-y_i\mathbf{x}_i\mathbf{w} + \ln\left(1 + e^{\mathbf{x}_i\mathbf{w}}\right)\right) + \lambda_1^{*}\|\mathbf{X}^{t}\mathbf{w}^{t} - \mathbf{X}^{a}\mathbf{w}^{a}\|_2^2 + \lambda_2^{*}\|\mathbf{X}^{a}\mathbf{w}^{a} - \mathbf{X}^{m}\mathbf{w}^{m}\|_2^2 + \lambda_3^{*}\|\mathbf{X}^{m}\mathbf{w}^{m} - \mathbf{X}^{r}\mathbf{w}^{r}\|_2^2 \qquad (10)$$
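The two-pass search described above (first the order of magnitude, then a fine traversal around it) could be organized as in the following sketch. The function `select_lambda`, the exponent range, and the choice of a step proportional to the coarse optimum are assumptions made for illustration; `evaluate` stands for any routine that fits the model with one penalty and returns the selection criterion, with smaller values being better.

```python
def select_lambda(evaluate, exponents=range(-6, 1), step_fraction=0.1, n_steps=5):
    """Two-pass search for a single penalty coefficient lambda_j.

    evaluate(lam) -> criterion value for the model fitted with this lambda
                     (hypothetical callable supplied by the user).
    """
    # Pass 1: locate the order of magnitude on a power-of-ten grid.
    coarse = [10.0 ** e for e in exponents]
    centre = min(coarse, key=evaluate)
    # Pass 2: traverse around the coarse optimum with a small step.
    step = centre * step_fraction
    fine = [centre + i * step for i in range(-n_steps, n_steps + 1)]
    fine = [lam for lam in fine if lam > 0]
    return min(fine, key=evaluate)
```

Running the search once per penalty, with the other two penalties switched off, mirrors the individual criterion of Eq.(9).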
5. Data and Empirical results
5.1 Empirical cases of investment strategy in the first stage
In this section, we choose several trading strategies for the first stage. To
show that our proposed method can be applied to any type of trading strategy, we
use five types of strategies in the empirical analysis: two come from
the published literature and use samples from the US stock market, and three are
designed and developed by ourselves for the Chinese stock market. The trading strategies
in our paper cover technical strategies such as the candlestick charting
strategy and the technical indicator strategy, the pair trading strategy, the long-side trading
strategy based on statistical arbitrage, and a strategy based on a forecasting model. All the data
are from the WIND database.
5.1.1 Candlestick charting strategy in US stock market
Lu et al. (2015) tested a series of candlestick trading strategies with different trend
definitions and holding strategies to find the key profitable factor, and applied them to the
DJIA constituents. In our empirical study, we adopt the candlestick charting strategy
based on the three-day reversal pattern of the morning star (MS) and the ten-day exponential
moving average (EMA10) trend (Marshall et al., 2006, 2008; Lu et al., 2015), together
with the Marshall-Young-Rose-10 (MYR-10) holding strategy proposed by Marshall et al.
(2006) and adopted by Lu et al. (2015), which has been shown to be profitable in the US
market.
The principle of the three-day morning star pattern is that the downtrend continues
with a long black candle, the second day confirms the pessimistic market conditions with a
downward gap (the second candle can be black or white), and finally the third day closes
at the highest price of all three days (Lu et al., 2015). The pattern is defined as follows:
$$P_{day1}^{o} > P_{day1}^{c}; \quad |P_{day2}^{o} - P_{day2}^{c}| > 0; \quad P_{day1}^{c} > P_{day2}^{c} \text{ and } P_{day1}^{c} > P_{day2}^{o};$$
$$P_{day3}^{c} > P_{day3}^{o} \text{ and } P_{day3}^{c} > P_{day1}^{c} + (P_{day1}^{o} - P_{day1}^{c})/2, \qquad (11)$$
where $P_t^{c}$ and $P_t^{o}$ denote the closing price and the opening price at day $t$, respectively.
The EMA10 trend is defined as follows:
$$EMA_{10,t} = \alpha P_t^{c} + (1 - \alpha)EMA_{10,t-1} \qquad (12)$$
where $\alpha = 2/(10 + 1)$. When the closing price is above (below) EMA10, the trend is
upward (downward). A key feature of EMA10 is that market conditions are always
classified as moving either up or down. The MYR exit strategy sets a specific day to exit
the market, and the MYR-10 holding strategy leads to the following return:
$$R_{MYR-10} = \ln\left(\frac{P_{t+13}^{c}}{P_{t+4}^{o}}\right) \times 100\% \qquad (13)$$
The trading strategies are based on the idea that candlestick patterns play a
crucial role in signaling whether a trend will continue or reverse. If a three-day bullish
pattern occurs after a downtrend, or a three-day bearish pattern occurs after an uptrend, the
4th day is a profitable time to buy or to sell short. The holding
strategy then determines the exit rule and the holding period.
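The pattern test in Eq.(11), the EMA10 recursion in Eq.(12) and the MYR-10 return in Eq.(13) can be combined as in the sketch below. Several details are assumptions made only for illustration: the EMA is seeded with the first closing price, the downtrend is checked on the day before the pattern starts, and days are addressed by the list index `d3` of the third pattern day, so that entry is at the open of `d3 + 1` and exit at the close of `d3 + 10`.

```python
import math


def ema(closes, span=10):
    """EMA of Eq.(12) with alpha = 2 / (span + 1), seeded with the first close."""
    alpha = 2.0 / (span + 1)
    out = [closes[0]]
    for price in closes[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


def is_morning_star(opens, closes, d3):
    """Check Eq.(11) on the three days ending at index d3."""
    o1, c1 = opens[d3 - 2], closes[d3 - 2]
    o2, c2 = opens[d3 - 1], closes[d3 - 1]
    o3, c3 = opens[d3], closes[d3]
    return (o1 > c1 and abs(o2 - c2) > 0 and c1 > c2 and c1 > o2
            and c3 > o3 and c3 > c1 + (o1 - c1) / 2)


def morning_star_signals(opens, closes):
    """Indices of third pattern days that follow a downtrend (close below EMA10)."""
    trend = ema(closes)
    signals = []
    for d3 in range(3, len(closes)):
        downtrend = closes[d3 - 3] < trend[d3 - 3]  # assumed trend-check day
        if downtrend and is_morning_star(opens, closes, d3):
            signals.append(d3)
    return signals


def myr10_return(opens, closes, d3):
    """Eq.(13): buy at the open after the pattern, sell at the close ten days later."""
    return math.log(closes[d3 + 10] / opens[d3 + 1]) * 100
```

A signal is only tradable when at least ten further days of data exist after `d3`, so a caller would need to drop signals too close to the end of the sample.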
Lu et al. (2015) employed a daily sample of 26 component stocks of the Dow
Jones Industrial Average (DJIA) index, excluding stocks that did not exist for the
whole sample period from January 2, 1992 to December 31, 2012. Because of data
availability, this paper uses 29 component stocks of the DJIA index, likewise excluding
stocks that did not exist for the whole sample period from January 3, 2006 to
November 3, 2017. Lu et al. (2015) adjust the empirical results with a 0.5% total
transaction cost per round turn, whereas we do not apply a similar adjustment, in order to
keep a profitable result in the first stage. To make sure that the strategy used in this paper
duplicates the strategy in Lu et al. (2015), we compare the performance of the
original strategy and of our reproduction. The comparison in Table 1 shows that they
are very similar. First, the mean return and the winning ratio
are very close. Second, the average number of signals per year is 79 for the original
strategy and 73 for the reproduction, which is also very close. The difference
between the performance of the original strategy in the literature and that of the duplicated one is
mainly due to the different underlying stocks and the only partly overlapping sample periods.
Table 1: Comparison of original and duplicated candlestick charting strategy
Strategy           Sample period          Number of signals   Mean return   Winning ratio
Lu et al. (2015)   1992.1.2-2012.12.31    1657                0.07%         53.75%
Duplication        2006.1.3-2017.11.3     875                 0.10%         52.34%
Note: The winning ratio denotes the proportion of profitable trades among all trades.
5.1.2 Pair trading strategy in US stock market
Gatev, Goetzmann and Rouwenhorst (GGR, 2006) tested a pair trading strategy
that matches stocks into pairs with the minimum distance between normalized
historical prices. Suppose that prices obey a statistical model of the form
$$P_{it} = \beta_{ij}P_{jt} + \varepsilon_{it} \qquad (14)$$
where $P_{it}$ denotes the closing price of stock $i$ at time $t$. GGR (2006) chose a matching partner
for each stock by finding the security that minimizes the sum of squared deviations
between the two normalized price series during a 12-month formation period, and the pairs
were then traded in the following 6-month trading period. A position is opened when the prices
diverge by more than two historical standard deviations, and it is unwound at the
next crossing of the prices.
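The matching and trading rules can be sketched as follows. This is a simplified illustration, not the GGR (2006) procedure in full: prices are normalized by the first observation of the period, the historical standard deviation is the sample standard deviation of the formation-period spread, and the function names are hypothetical.

```python
import math


def normalize(prices):
    """Rebase a price series so that it starts at 1 (assumed normalization)."""
    base = prices[0]
    return [p / base for p in prices]


def ssd(a, b):
    """Sum of squared deviations between two normalized series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_partner(target, candidates):
    """Name of the candidate series closest to the target in the formation period."""
    return min(candidates, key=lambda name: ssd(target, candidates[name]))


def spread_std(a, b):
    """Sample standard deviation of the formation-period spread."""
    d = [x - y for x, y in zip(a, b)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((v - mean) ** 2 for v in d) / (len(d) - 1))


def pair_trades(a, b, sigma, k=2.0):
    """(open_day, close_day) pairs: open beyond k * sigma, close at the next crossing."""
    trades, open_day, sign = [], None, 0
    for t, (x, y) in enumerate(zip(a, b)):
        d = x - y
        if open_day is None:
            if abs(d) > k * sigma:
                open_day, sign = t, (1 if d > 0 else -1)
        elif d * sign <= 0:  # the normalized prices have crossed
            trades.append((open_day, t))
            open_day = None
    return trades
```

A position that is still open at the end of the trading period is not reported by this sketch and would need a separate closing rule.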
GGR (2006) tested all stocks in CRSP over the sample period from January 1962
to December 2002. We also use all the stocks listed on the New York Stock
Exchange (NYSE) and Nasdaq, but over the sample period from January 2006 to June
2017. Table 2 compares the average excess return and the share of observations with
negative excess return for the original and the duplicated strategy. The average excess
returns are quite similar, while the share of
observations with negative excess return is larger for the duplicated strategy than for
the original strategy in the literature. We attribute this to the period dependence of the
strategy: in the recent period, its success rate is somewhat lower than before.
Table 2: Comparison of original and duplicated pair trading strategy
                                Average excess return          Observations with excess return < 0
Strategy      Sample period     Top 5     Top 20    101-120    Top 5    Top 20    101-120
GGR (2006)    1962.1-2002.12    0.00463   0.00520   0.00503    26%      15%       21%
Duplication   2006.1-2017.06    0.00334   0.00488   0.00418    38%      26%       39%
Notes: The "top n" portfolios include the n pairs with the smallest distance measures, and the
portfolio "101-120" contains the 20 pairs after the top 100. The observations are monthly
excess returns.
5.1.3 Technical indicator strategy in Chinese stock market
We choose KDJ, one of the most widely used technical indicators, to build a simple
strategy in the stock market. The K index of KDJ is calculated according to the following
equations:
$$K(n)_t = \frac{m-1}{m}K(n)_{t-1} + \frac{1}{m}RSV(n)_t, \qquad (15)$$
$$RSV(n)_t = \frac{P_t^{c} - P_t^{l}(n)}{P_t^{h}(n) - P_t^{l}(n)} \times 100, \qquad (16)$$
where $P_t^{c}$, $P_t^{l}(n)$ and $P_t^{h}(n)$ represent the closing price at time $t$, and the lowest price
and the highest price among $n$ days, respectively. To form the technical trading strategy, we set the
parameters to $n=9$ and $m=3$. A buying signal is generated when $K_{t-1} < 30$ and $K_t > 30$,
and a selling signal is generated when $K_{t-1} > 70$ and $K_t < 70$. For a specific stock,
two successive buys without a sell between them are not allowed.
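Eqs.(15) and (16) and the crossing rules translate into a few lines of Python. The sketch below makes some assumptions that the text does not state: the window covers the current day and the previous n-1 days, the recursion is seeded with K = 50, RSV is set to 50 when the high equals the low, and a selling signal is only recorded while a position is held.

```python
def k_index(highs, lows, closes, n=9, m=3, k0=50.0):
    """K line of KDJ following Eqs.(15)-(16); k0 is an assumed seed value."""
    ks = []
    k_prev = k0
    for t in range(len(closes)):
        start = max(0, t - n + 1)
        lowest = min(lows[start:t + 1])
        highest = max(highs[start:t + 1])
        if highest == lowest:
            rsv = 50.0  # assumed neutral value for a flat window
        else:
            rsv = (closes[t] - lowest) / (highest - lowest) * 100
        k_prev = (m - 1) / m * k_prev + rsv / m
        ks.append(k_prev)
    return ks


def kdj_signals(ks, low=30.0, high=70.0):
    """Buy when K crosses above `low`, sell when K crosses below `high`."""
    signals, holding = [], False
    for t in range(1, len(ks)):
        if not holding and ks[t - 1] < low and ks[t] > low:
            signals.append((t, "buy"))
            holding = True  # blocks a second buy before the next sell
        elif holding and ks[t - 1] > high and ks[t] < high:
            signals.append((t, "sell"))
            holding = False
    return signals
```

The `holding` flag implements the restriction that two successive buys without a sell between them are not allowed.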
We use all the CSI 300 constituent stocks for the empirical analysis,
with a sample period from January 2015 to December 2016. Figure 2 shows the
performance of this strategy as a net value curve, compared with the corresponding CSI 300
index.
Figure 2: Performance of technical indicator strategy in Chinese stock market
Notes: The red solid line represents the net value curve of the strategy, and the blue dotted line is
the trend of CSI 300 index.
5.1.4 Unilateral statistical analysis strategy in Chinese A-share stock market
For the first stage, we can also develop an investment strategy based on our understanding
of the Chinese market, for example by forming a pair trading strategy in the Chinese stock market.
Some companies issue shares in both the Chinese A-share and B-share markets,
which creates arbitrage opportunities between the paired stocks. As our empirical sample, we
choose the stocks of this kind that are listed on the Shanghai Stock Exchange (SSE) and whose
A-share and B-share prices have a Pearson correlation coefficient higher than 0.7, which leads to
a total of 42 pairs of stocks. In the Chinese stock market, short selling was not allowed before
securities lending began around 2013, and even since then
the cost of short selling has been very high. Therefore, we develop a unilateral trading
strategy based on statistical arbitrage: we trade only the A shares, and
we only buy and hold, without short selling. When an A-share
stock is undervalued according to the price relationship between its A and B shares, we buy
the stock, and we sell it when the undervaluation disappears.
When the price spread meets the condition in Eq.(17), i.e., the spread
between the prices of the A share and the B share of stock $i$ falls below the lower bound of the regular
spread, we buy the relatively cheap A-share stock.
$$dp_{it} < \mu_{dp_{it}} - \sigma_{dp_{it}}, \qquad (17)$$
where $dp_{it}$ is the spread between the prices of the A share and the B share, defined as
$dp_{it} = P_{it}^{A} - P_{it}^{B}$, and $P_{it}^{A}$ and $P_{it}^{B}$ are the closing prices of the A share and the B share
of stock $i$, respectively. The right-hand side of the inequality is the lower bound of the regular
spread, which adopts a one-standard-deviation bound, i.e., $\mu_{dp_{it}}$ and $\sigma_{dp_{it}}$ are the mean and the
standard deviation of the price spreads $dp_{it}$ over the last $N$ days. Because we use closing prices to produce
the buying signals for this strategy, the long-side trades actually take place on the
next day. We assume that we can buy the stocks at their opening prices on day $t+1$, and
that every position must be closed within $T$ days. The selling signal for each holding
appears when Eq.(18) holds.
$$dp_{i(t+j)} > \mu_{dp_{i(t+j)}}, \quad j = 2, 3, \ldots, T+1. \qquad (18)$$
If the criterion is not met within $T$ days, the holding is sold on the last
day. We fix $N = 60$ and $T = 5$ in this strategy. The sample period is from January
2010 to December 2016. Figure 3 shows the performance of this strategy as a net value curve,
compared with the corresponding market index.
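The entry rule of Eq.(17) and the exit rule of Eq.(18) can be sketched as follows for a single stock. The sketch assumes that the rolling window includes the current day, that the sample standard deviation is used, and that no new position is opened while one is held. The function name and the returned tuple layout are hypothetical.

```python
from statistics import mean, stdev


def undervaluation_trades(spreads, n=60, hold=5):
    """Return (signal_day, entry_day, exit_day) triples for one A-B share pair.

    spreads : daily A-minus-B closing price spreads dp_t
    n       : rolling window length N
    hold    : maximum holding period T in days
    """
    trades = []
    t = n - 1
    while t < len(spreads) - 1:
        window = spreads[t - n + 1:t + 1]
        if spreads[t] < mean(window) - stdev(window):  # Eq.(17)
            last = min(t + hold + 1, len(spreads) - 1)
            exit_day = last  # forced exit if Eq.(18) never holds
            for day in range(t + 2, last + 1):
                if spreads[day] > mean(spreads[day - n + 1:day + 1]):  # Eq.(18)
                    exit_day = day
                    break
            trades.append((t, t + 1, exit_day))  # buy at the open of day t + 1
            t = exit_day
        else:
            t += 1
    return trades
```

With the parameters fixed in the text, a call would use `n=60` and `hold=5`; the test data for such a sketch can use a much shorter window.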
Figure 3: Performance of Chinese A-share undervaluation trading strategy
Notes: The red solid line represents the net value curve of the strategy, and the blue dotted line is
the trend of Shanghai Composite Index.
5.1.5 Forecasting strategy in Chinese stock market
As summarized in the literature review of this paper, forecasting methods
are used to forecast price movements, and some quantitative trading strategies are
developed based on this type of model. Here, we develop a trading strategy based on a
classic time series model, the autoregressive (AR) model, i.e.,
$$R_t = \beta_0 + \beta_1R_{t-1} + \cdots + \beta_pR_{t-p} + \varepsilon_t \qquad (19)$$
where $R_t = \ln(P_t/P_{t-5})$ is the five-day log return and $\varepsilon_t$ is the innovation. The lag
order $p$ is determined by AIC with a maximum of 5. We use a
rolling estimation method to produce the forecasts, with an
estimation window of 200 days and a rolling step of 1 day. Each time, the model is
re-estimated and provides 1-step-ahead and 2-step-ahead forecasts. If the stock is
predicted to rise on both of the next two trading days, a buying signal is generated, and we
assume that the stock is bought at the opening price of the next day. At the end of the
third day we sell the holding at the closing price.
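The rolling re-estimation and the two-step forecast rule can be illustrated with a deliberately simplified sketch. It fixes the lag order at p = 1 and estimates the model by ordinary least squares instead of selecting p by AIC, and it reads "predicted to rise" as a positive forecast return. Both choices, like the function names, are assumptions made to keep the example short.

```python
import math


def five_day_log_returns(prices):
    """R_t = ln(P_t / P_{t-5}) for every day with five days of history."""
    return [math.log(prices[t] / prices[t - 5]) for t in range(5, len(prices))]


def fit_ar1(returns):
    """OLS estimates (beta_0, beta_1) of R_t = beta_0 + beta_1 * R_{t-1} + e_t."""
    x, y = returns[:-1], returns[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx > 0:
        b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    else:
        b1 = 0.0  # constant regressor: fall back to the mean
    return my - b1 * mx, b1


def ar_buy_signals(returns, window=200):
    """Indices of the last observed day whenever both forecasts are positive."""
    signals = []
    for t in range(window, len(returns) + 1):
        b0, b1 = fit_ar1(returns[t - window:t])  # re-estimate on the rolling window
        f1 = b0 + b1 * returns[t - 1]            # 1-step-ahead forecast
        f2 = b0 + b1 * f1                        # 2-step-ahead forecast
        if f1 > 0 and f2 > 0:
            signals.append(t - 1)
    return signals
```

Replacing `fit_ar1` with an AR(p) estimator that selects p by AIC would bring the sketch closer to the setting described above.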
The empirical experiment is based on 40 stocks in total, namely the SSE 50 Index
constituent stocks after excluding the securities that have one or more days with no trading
over the period from March 10, 2010 to December 30, 2016. Because the estimation window
is 200 observations, the first forecast occurs on January 4, 2011. Figure 4 shows the
performance of this strategy as a net value curve, compared with the
corresponding market index, the SSE 50.
Figure 4: Performance of strategy based on forecasting model
Notes: The red solid line represents the net value curve of the strategy, and the blue dotted line is
the trend of SSE 50 Index.
5.2 Data and multi-view construction for the second stage
According to the view construction proposed in section 4.2, we discuss the
variables used in the different views in our empirical experiment. For the view of target
individual stocks, the quote information, financial statement information, and
technical indicators of individual stocks are considered. The quote information
includes the opening price, the highest price, the lowest price, the closing price, trading
volume, etc. Financial statement information includes financial variables such as
market value, gross revenue, net profit, net cash flow, earnings per share, return on
equity, debt-to-assets ratio, etc. The technical indicators can include all the indicators
that can be used to analyze stock prices, such as the widely used relative strength
indicator (RSI)1, KDJ2, moving average convergence-divergence (MACD)3, Bollinger
bands (BOLL)4, stop and reverse indicator (SAR)5, rate of change indicator (RC)6, +DI
in the directional movement indicator (PDI)7, bull and bear indicator (BBI)8, momentum
indicator (MTM)9, price and volume trend indicator (PVT)10, bias indicator (BIAS)11,
the active buying volume indicator of orders greater than 1 million RMB (BVI), the
active selling volume indicator of orders greater than 1 million RMB (SVI), and
self-defined technical indicators, such as the up or down trend indicator (TI)12, relative price
1 In the RSI calculation, the number of periods is 6.
2 In the KDJ calculation, the number of periods is 9, and K, D and J are all used.
3 In the MACD calculation, the long-term period is 26, the short-term period is 12,
and the moving average parameter is 9.
4 In the BOLL calculation, the moving average parameter is 26 and the bandwidth is two standard
deviations. The upper band, mid line and lower band are all used.
5 In the SAR calculation, the number of periods is 4, the adjusting coefficient is 0.02 and its upper
limit is 0.2.
6 $RC = \frac{P_t^{c}}{P_{t-50}^{c}} \times 100\%$.
7 $PDI = P_t^{h} - P_{t-1}^{h}$, where $P_t^{h}$ is the highest price for day t. If the value is negative, it is set to 0.
8 $BBI = (MA_3 + MA_6 + MA_{12} + MA_{24})/4$, where $MA_n$ is the moving average of n days' closing prices.
9 $MTM = P_t^{c} - P_{t-6}^{c}$, where $P_t^{c}$ is the closing price.
10 $PVT = \sum_{t=1}^{T}\left(\frac{P_t^{c} - P_{t-1}^{c}}{P_{t-1}^{c}} \times Volume_t\right)$, where $P_t^{c}$ is the closing price and $Volume_t$ is the trading
volume.
11 $BIAS = \frac{P_t^{c} - MA_{12}}{MA_{12}}$.
12 If $P_t^{c} < \min\{MA_{30}, MA_{90}\}$, the TI is downward; and if $P_t^{c} > \max\{MA_{30}, MA_{90}\}$, the TI is
upward.
indicator (RPI)13, falling point indicator (FPI)14, reversion point indicator (RP)15, etc.
For the view of associated individual stocks, besides the quote information,
financial statement information, and technical indicators mentioned above, we also
bring in different correlation measures and the characteristics of the price spread
between the target and associated individual stocks. This is because there are several kinds
of co-movement, such as contemporaneous correlation or a lead-lag relationship, between the
pair of stocks, which can contribute more information to price prediction. In our
empirical study, we measure the co-movement and lead-lag relationship with the Pearson
coefficient, Granger causality tests, and
common factor models used to measure the price discovery of two assets, namely the
modified information share (MIS) model proposed by Lien and Shrestha (2009) based
on Hasbrouck (1995), and the PT model proposed by Gonzalo and Granger (1995).
For the price spread, we consider
the mean of the price spread (MSP), the standard deviation of the price spread (STDSP), and the
upper and lower bounds of the price spread (UBSP, LBSP)16. The above measures are
estimated using the rolling window method with an estimation window of 120
observations. For the class of strategies like pairs trading and statistical arbitrage
13 $RPI = \frac{\max\{P_{t-60}^{c}, \cdots, P_t^{c}\} - P_t^{c}}{\max\{P_{t-60}^{c}, \cdots, P_t^{c}\} - \min\{P_{t-60}^{c}, \cdots, P_t^{c}\}}$.
14 If $P_t^{c} < MA_5$ then $FPI = 1$, otherwise $FPI = 0$.
15 If $MA_{90} > \max\{MA_5, MA_{10}, P_t^{c}\}$ and $MA_{10} < MA_{30}$ then $RP = 1$, otherwise $RP = 0$.
16 $UBSP = MSP + c \times STDSP$, $LBSP = MSP - c \times STDSP$, $c = 0.5, 1, 1.5, 2, 2.5, 3$.
strategies, for which a pair of stocks is readily available, the view of associated individual
stocks includes all the information mentioned above, and the correlation measures are
estimated between the target and the associated individual stocks. For other classes of
strategies, for which no natural pair of stocks exists, only correlation measures are included
in this view, and they are estimated between the target asset and the
market index.
For the view of markets, we bring in a collection of indicators that reflect the trend
and fluctuation of markets, usually index values and technical indicators of market
indices. For example, we incorporate the Shanghai Composite Index, SSE 50 Index,
CSI 300 Index and Shenwan industry sector indices for the Chinese stock market, and the DJIA
and 10 sector indices17 for the US stock market. We adopt the index value and some
technical indicators of each index, such as the moving average (MA), TI, FPI, RPI, etc.
The view of risks includes various risk measures of securities and market indices,
which may affect the stability of an investment strategy. We bring in several measures
in our empirical study, such as total risk measures and a market risk measure. We
adopt the price volatility of stocks and market indices estimated by the exponentially
weighted moving average (EWMA) model, and the standard deviation of daily returns of
individual stocks over 26 trading days. The market risk is measured by the beta coefficient
estimated from the market index model using a 60-month return series. Besides, we
17 The sector indices include the Dow Jones US Basic Materials Index, Dow Jones US Consumer Goods Index, Dow Jones
US Consumer Services Index, Dow Jones US Financials Index, Dow Jones US Health Care Index, Dow Jones US
Industrials Index, Dow Jones US Oil & Gas Index, Dow Jones US Technology Index, Dow Jones US
Telecommunications Index and Dow Jones US Utilities Index.
also include the standard deviation of trading volume over 10 trading days in the view
of risks. All the data are from the WIND database.
5.3 Investment strategy optimization in second stage
Based on the trading signals produced by the original strategies in the first stage, we
use the multi-view learning algorithm to improve their performance by keeping the
signals with a high probability of making a profit and excluding those that are
predicted to make a loss. Therefore, we use the success rate and the average return of
the signals to evaluate the performance of each trading strategy.
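The two evaluation measures, and the filtering step that keeps only the signals predicted to be profitable, can be written compactly. In the sketch below the probability threshold of 0.5, the function names, and the convention that a signal is profitable when its return is strictly positive are assumptions.

```python
def filter_signals(returns, probabilities, threshold=0.5):
    """Keep the returns of signals whose predicted profit probability exceeds the threshold."""
    return [r for r, p in zip(returns, probabilities) if p > threshold]


def evaluate_signals(returns):
    """Success rate (%) and average return of a list of per-signal returns (%)."""
    n = len(returns)
    if n == 0:
        return None  # no signals kept, so the measures are undefined
    wins = sum(1 for r in returns if r > 0)
    return {
        "signals": n,
        "profitable": wins,
        "failing": n - wins,
        "success_rate": 100.0 * wins / n,
        "average_return": sum(returns) / n,
    }
```

For instance, 94 profitable signals out of 173 give a success rate of 100 × 94 / 173 ≈ 54.34%.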
To validate the proposed framework and the multi-view learning logistic
regression model (MultiLR), we first compare the performance of the optimized strategy after
the second stage with that of the original strategy formed in the first stage.
Second, to better understand the importance and contribution of the different views to
the strategy improvement, we set up several single-view logistic regression
models. Each applies the logistic regression model with the features of only one view,
i.e., the target security view, associated asset view, market view or risk view, and these
models are denoted TVLR, AVLR, MVLR and RVLR, respectively. We then
compare the results of the MultiLR model with those of the TVLR, AVLR, MVLR and RVLR models.
Third, to confirm that the multi-view learning method in the second stage contributes to
the performance improvement of trading strategies, we compare the MultiLR model with
other benchmark models that share the same information as the multi-view learning
model, namely a single-view model and the random forest
classification method (RF). The single-view model is a logistic regression on the features of all views
combined into a single view,
denoted the LR model. To set up and estimate the models in the second stage, we
apply max normalization to all variables.
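The text does not spell out the normalization formula. One common reading of max normalization, used here purely as an assumption, divides every variable by its largest absolute value so that all features lie in [-1, 1]. The function name and the dictionary layout are hypothetical.

```python
def max_normalize(columns):
    """Scale each variable by its maximum absolute value (assumed definition).

    columns : dict mapping a variable name to its list of observations
    """
    scaled = {}
    for name, values in columns.items():
        scale = max(abs(v) for v in values)
        # A column of zeros has nothing to scale and is left at zero.
        scaled[name] = [v / scale if scale > 0 else 0.0 for v in values]
    return scaled
```

In an out-of-sample experiment the scale would normally be computed on the training signals only and then reused for the testing signals.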
The empirical results of the above comparisons for the different trading strategies are
reported in Table 3 to Table 7. From the comparisons of the strategies before and after applying the
proposed optimization method, and of the multi-view learning method with the
single-view methods and the random forest classification, we obtain the following results.
First, our method dramatically improves the performance of the original investment
strategies through the MultiLR model, raising the success rate by about 6 to 17 percentage
points and significantly increasing the average return of trading signals.
Second, through the comparisons of the MultiLR model with the models on each single view,
we find that different kinds of trading strategies depend on different information. For
statistical arbitrage, the characteristics of the target stock and its associated stock play an
important role, while the features in the risk view and market view are slightly
weaker. This holds for both such strategies in this paper, in the US market as well as in the
Chinese market, as shown in Table 4 and Table 6. It is consistent with the literature,
because a perfect statistical arbitrage or pair trading strategy should
exclude market risk and not be affected by fundamental information. Our
empirical results show that even imperfect statistical arbitrage is barely affected by
market information and other risks, as shown in Table 6. As for the technical trading
strategies and the forecasting strategy, the results are similar to each other: the characteristics of
the traded security itself and of the market or the risks contribute more to the
performance improvement. These results help us better understand each kind of
trading strategy and have important implications for investment risk management,
since our method tells us what kind of information we should follow and
take into consideration.
Third, through comparisons of the multi-view learning method with the single-view
methods, we find that the multi-view learning method contributes more than the other
methods to the performance improvement of the strategies. The MultiLR model outperforms
the LR model for all strategies in this paper, and it outperforms the RF
model in four of the five strategies, the only exception being the forecasting strategy in the
Chinese stock market. For this strategy, the MultiLR model performs almost as well
as, but slightly worse than, the RF model. Even so, the MultiLR model
proposed in this paper has an advantage over the RF model, because our multi-view method has
better explanatory ability than the random forest classification algorithm, which is
essentially a black box. In addition, the results of our multi-view method are more stable than
those of the random forest classification algorithm, whose training involves randomness.
Table 3: Comparisons of our methods with other methods using candlestick charting
strategy in US market
Candlestick charting strategy in US stock market
                            Original
                            strategy   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
No. of trading signals      173        119     102     143     159     16      121     68
No. of profitable signals   94         66      58      80      88      10      67      42
No. of failing signals      79         53      44      63      71      6       54      26
Success rate (%)            54.34      55.46   56.86   55.94   55.35   62.5    55.37   61.76
Signal average return (%)   0.27       0.34    0.31    0.35    0.30    1.11    0.18    0.56
Notes: From January of 2006 to November of 2017 the candlestick pattern strategy signaled 875
times. According to the time order, the first 702 trading signals are the training set and the
other 173 signals form the testing set. The table only shows the results from the testing set.
Table 4: Comparisons of our methods with other methods using pair trading strategy in
US market
Pair trading strategy in US stock market
                            Original
                            strategy   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
No. of trading signals      135        69      94      126     98      116     43      90
No. of profitable signals   73         39      55      72      53      66      26      55
No. of failing signals      62         30      39      54      45      50      17      35
Success rate (%)            54.07      56.52   58.51   57.14   54.08   56.90   60.47   61.11
Signal average return (%)   0.07       -0.01   0.09    0.11    0.15    0.13    -0.20   0.17
Notes: From January of 2006 to June of 2017, the top 20 portfolio of the pair trading strategy had
23 6-month trade periods and signaled 1,171 times. The 135 signals from the last three 6-month
trade periods form the testing set. The table only shows the results from the testing set.
Table 5: Comparisons of our methods with other methods using technical trading
strategy in Chinese stock market
Technical indicator strategy in Chinese stock market
                            Original
                            strategy   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
No. of trading signals      926        555     197     18      390     0       364     304
No. of profitable signals   513        342     117     9       233     0       217     188
No. of failing signals      413        213     80      9       157     0       147     116
Success rate (%)            55.40      61.62   59.39   50.00   59.74   N/A     59.62   61.84
Signal average return (%)   1.63       2.56    3.28    -0.59   2.93    N/A     2.97    3.77
Notes: From January of 2015 to December of 2016, the technical trading strategy in Chinese
market signaled 2,726 times (we choose a profitable period). According to the time order, the first
1,800 trading signals are the training set and the other 926 signals form the testing set. The table
only shows the results from the testing set.
Table 6: Comparisons of our methods with other methods using Chinese A-share
undervaluation trading strategy
Unilateral statistical analysis strategy in Chinese A-share stock market
                            Original
                            strategy   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
No. of signals              3123       1475    62      290     0       1       399     297
No. of profitable signals   1654       910     40      188     0       1       262     210
No. of failing signals      1469       565     22      102     0       0       137     87
Success rate (%)            52.96      61.69   64.52   64.83   N/A     100.00  65.66   70.71
Signal average return (%)   0.69       1.57    3.74    3.45    N/A     18.27   3.88    4.76
Notes: From January of 2010 to December of 2016, Chinese A-share undervaluation strategy
signaled 18,123 times. According to the time order, the first 15,000 trading signals are the training
set and the other 3,123 signals form the testing set. The table only shows the results from the testing
set.
Table 7: Comparisons of our methods with other methods using forecasting strategy in
Chinese stock market
Forecasting strategy in Chinese stock market
                            Original
                            strategy   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
No. of trading signals      2463       608     133     0       433     0       195     189
No. of profitable signals   1293       382     77      0       234     0       118     116
No. of failing signals      1170       226     56      0       199     0       77      73
Success rate (%)            52.50      62.83   57.90   N/A     54.04   N/A     60.51   61.38
Signal average return (%)   0.15       0.52    0.24    N/A     0.05    N/A     0.19    0.21
Notes: From January of 2011 to December of 2016 the forecasting strategy in Chinese market
signaled 21,463 times. According to the time order, the first 19,000 trading signals are the training
set and the other 2,463 signals form the testing set. The table only shows the results from the testing
set.
6. Conclusion
This paper addresses the important issue of how to integrate
domain-specific knowledge in finance with methods of big data analytics and artificial
intelligence. We propose a novel method for building investment strategies,
called the two-stage multi-view learning method, which provides a new way to solve the
integration issue and avoids the problems of conventional methods that try to
aggregate all the information in one very complex model, such as high professional
requirements and high modelling cost. Moreover, the method provides some
explanatory power for understanding strategies, which artificial intelligence methods
often lack.
We choose five different types of strategies, namely the candlestick charting
strategy, pair trading strategy, technical indicator strategy, long-side statistical analysis
strategy, and forecasting strategy, and use data from different markets to test our
method. The empirical results show that our method improves the performance
dramatically for all tested strategies, with an increase in success rate of about 6 to 17
percentage points and an average return for trading signals of about 1.4 to 6.89 times that of
the original strategies. Moreover, our empirical
results reveal that the multi-view learning method outperforms single-view methods in
building investment strategies. In addition, the empirical results confirm that our
method can provide some explanatory power for understanding what kinds of
information contribute to the improvement of an investment strategy.
The method proposed in this paper contributes not only to academic research
but also to industry applications. Our method can be extended easily to investigate
other financial market issues, because its main idea is to separate
the domain knowledge and the different models or methods into different stages, which
handles the integration issue better than a single very complex model. Furthermore,
our method is suitable for any type of investment strategy, such as fundamental
analysis strategies, risk premia investing strategies, technical analysis strategies, strategies that merge
fundamental and quantitative investment styles, and quantitative investment strategies based
on models. Because our method significantly improves investment strategies,
as evaluated by the success rate and average return of trading signals,
our two-stage multi-view learning method is of great value for industry practice.
Acknowledgments
This work is supported by grants from the National Natural Science Foundation
of China [grant numbers 71671012, 71373001, 71701007, 71531001], the National High
Technology Research and Development Program of China (SS2014AA012303), and
the Fundamental Research Funds for the Central Universities (Junjie Wu).
References
Asness C S, Moskowitz T J, Pedersen L H. Value and momentum everywhere[J]. The
Journal of Finance, 2013, 68(3): 929-985.
Bickel S, Scheffer T. Multi-view clustering[C]//ICDM. 2004, 4: 19-26.
Blum A, Mitchell T. Combining labeled and unlabeled data with co-training[C]//
Proceedings of the eleventh annual conference on Computational learning theory.
ACM, 1998: 92-100.
Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market[J]. Journal of
computational science, 2011, 2(1): 1-8.
Brefeld U, Büscher C, Scheffer T. Multi-view discriminative sequential learning[C]//
ECML. 2005, 3720: 60-71.
Brefeld U, Gärtner T, Scheffer T, Wrobel S. Efficient co-regularised least squares
regression[C]//Proceedings of the 23rd international conference on Machine
learning. ACM, 2006: 137-144.
Brefeld U, Scheffer T. Co-EM support vector learning[C]//Proceedings of the
twenty-first international conference on Machine learning. ACM, 2004: 16.
Caginalp G, Laurent H. The predictive power of price patterns[J]. Applied
Mathematical Finance, 1998, 5: 181-205.
Chen M, Chen Y, Weinberger K Q. Automatic feature decomposition for single view
co-training[C]//Proceedings of the 28th International Conference on Machine
Learning (ICML-11). 2011: 953-960.
Cheng J, Wang K. Active learning for image retrieval with Co-SVM[J]. Pattern
recognition, 2007, 40(1): 330-334.
Das P, Banerjee A. Meta optimization and its application to portfolio selection[C]//
Proceedings of the 17th ACM SIGKDD international conference on Knowledge
discovery and data mining. ACM, 2011: 1163-1171.
De Bondt W F M, Thaler R. Does the stock market overreact?[J]. The Journal of
Finance, 1985, 40(3): 793-805.
Dhillon P, Foster D P, Ungar L H. Multi-view learning of word embeddings via
cca[C]//Advances in Neural Information Processing Systems. 2011: 199-207.
Dunis C, Zhou B. Nonlinear modelling of high frequency financial time series[M].
John Wiley & Sons Inc, 1998.
Elliott R, Van Der Hoek J, Malcolm W. Pairs trading[J]. Quantitative Finance, 2005,
5(3): 271-276.
Fama E F, French K R. A five-factor asset pricing model[J]. Journal of Financial
Economics, 2015, 116(1): 1-22.
Fama E F, French K R. Common risk factors in the returns on stocks and bonds[J].
Journal of Financial Economics, 1993, 33(1): 3-56.
Fama E F, French K R. Multifactor explanations of asset pricing anomalies[J]. The
Journal of Finance, 1996, 51(1): 55-84.
Focardi S M, Fabozzi F J, Mitov I K. A new approach to statistical arbitrage:
Strategies based on dynamic factor models of prices and their performance[J].
Journal of Banking & Finance, 2016, 65: 134-155.
Gatev E, Goetzmann W N, Rouwenhorst K G. Pairs trading: Performance of a
relative-value arbitrage rule[J]. The Review of Financial Studies, 2006, 19(3):
797-827.
Hamid S A, Iqbal Z. Using neural networks for forecasting volatility of S&P 500
Index futures prices[J]. Journal of Business Research, 2004, 57(10): 1116-1125.
Han Y, Zhou G, Zhu Y. A trend factor: Any economic gains from using information
over investment horizons?[J]. Journal of Financial Economics, 2016, 122(2):
352-375.
Hogan S, Jarrow R, Teo M, Warachka M. Testing market efficiency using statistical
arbitrage with applications to momentum and value strategies[J]. Journal of
Financial Economics, 2004, 73(3): 525-565.
Jegadeesh N, Titman S. Returns to buying winners and selling losers: Implications for
stock market efficiency[J]. The Journal of Finance, 1993, 48(1): 65-91.
Jegadeesh N. Evidence of predictable behavior of security returns[J]. The Journal of
Finance, 1990, 45(3): 881-898.
Kamijo K, Tanigawa T. Stock price pattern recognition-a recurrent neural network
approach[C]//Neural Networks, 1990., 1990 IJCNN International Joint
Conference on. IEEE, 1990: 215-221.
Kembhavi A, Siddiquie B, Miezianko R, et al. Incremental multiple kernel learning
for object recognition[C]// IEEE 12th International Conference on Computer
Vision. IEEE, 2009: 638-645.
Kolanovic M, Krishnamachari R T. Big data and AI strategies: Machine learning and
alternative data approach to investing[R]. J.P. Morgan, 2017.
Kumar A, Daumé H. A co-training approach for multi-view spectral clustering[C]//
Proceedings of the 28th International Conference on Machine Learning
(ICML-11). 2011: 393-400.
Kumar A, Rai P, Daumé H. Co-regularized multi-view spectral
clustering[C]//Advances in Neural Information Processing Systems. 2011:
1413-1421.
Lehmann B N. Fads, martingales, and market efficiency[J]. The Quarterly Journal of
Economics, 1990, 105(1): 1-28.
Li J, Bu H, Wu J. Sentiment-aware stock market prediction: A deep learning
method[C]//Service Systems and Service Management (ICSSSM), 2017
International Conference on. IEEE, 2017: 1-6.
Lin C S, Chiu S H, Lin T Y. Empirical mode decomposition–based least squares
support vector regression for foreign exchange rate forecasting[J]. Economic
Modelling, 2012, 29(6): 2583-2590.
Lin Y Y, Liu T L, Fuh C S. Local ensemble kernel learning for object category
recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition
(CVPR'07). IEEE, 2007: 1-8.
Lintner J. The valuation of risk assets and the selection of risky investments in stock
portfolios and capital budgets[J]. The Review of Economics and Statistics, 1965,
47(1): 13-37.
Liu Y, Zheng Y, Liang Y, Liu S M, Rosenblum D S. Urban water quality prediction
based on multi-task multi-view learning[C]//Proceedings of the Twenty-Fifth
International Joint Conference on Artificial Intelligence. AAAI Press, 2016:
2576-2582.
Lo A W, MacKinlay A C. Stock market prices do not follow random walks: Evidence
from a simple specification test[J]. The Review of Financial Studies, 1988, 1(1):
41-66.
Lo A W, MacKinlay A C. When are contrarian profits due to stock market
overreaction?[J]. The Review of Financial Studies, 1990, 3(2): 175-205.
Lo A W, Mamaysky H, Wang J. Foundations of technical analysis: Computational
algorithms, statistical inference, and empirical implementation[J]. The Journal of
Finance, 2000, 55(4): 1705-1765.
Longworth C, Gales M J F. Multiple kernel learning for speaker verification[C]//IEEE
International Conference on Acoustics Speech and Signal Processing. IEEE, 2008:
1581-1584.
Lu T H, Chen Y C, Hsu Y C. Trend definition or holding strategy: What determines
the profitability of candlestick charting?[J]. Journal of Banking & Finance, 2015,
61: 172-183.
Lu T H, Shiu Y M, Liu T C. Profitable candlestick trading strategies—The evidence
from a new perspective[J]. Review of Financial Economics, 2012, 21(2): 63-68.
Lu T H. The profitability of candlestick charting in the Taiwan stock market[J].
Pacific-Basin Finance Journal, 2014, 26: 65-78.
Luo L, Chen X. Integrating piecewise linear representation and weighted support
vector machine for stock trading signal prediction[J]. Applied Soft Computing,
2013, 13(2): 806-816.
Marshall B R, Young M R, Cahan R. Are candlestick technical trading strategies
profitable in the Japanese equity market?[J]. Review of Quantitative Finance and
Accounting, 2008, 31(2): 191-207.
Marshall B R, Young M R, Rose L C. Candlestick technical trading strategies: can
they create value for investors?[J]. Journal of Banking & Finance, 2006, 30(8):
2303-2323.
Moskowitz T J, Ooi Y H, Pedersen L H. Time series momentum[J]. Journal of
Financial Economics, 2012, 104(2): 228-250.
Nevmyvaka Y, Feng Y, Kearns M. Reinforcement learning for optimized trade
execution[C]//Proceedings of the 23rd international conference on Machine
learning. ACM, 2006: 673-680.
Nigam K, Ghani R. Analyzing the effectiveness and applicability of
co-training[C]//Proceedings of the ninth international conference on Information
and knowledge management. ACM, 2000: 86-93.
Nyberg H. Predicting bear and bull stock markets with dynamic binary time series
models[J]. Journal of Banking & Finance, 2013, 37(9): 3351-3363.
Pástor Ľ, Stambaugh R F. Liquidity risk and expected stock returns[J]. Journal of
Political Economy, 2003, 111(3): 642-685.
Refenes A P. Neural networks in the capital markets[M]. John Wiley & Sons, Inc.,
1994.
Ross S A. The arbitrage theory of capital asset pricing[J]. Journal of Economic
Theory, 1976, 13(3): 341-360.
Sharpe W F. Capital asset prices: A theory of market equilibrium under conditions of
risk[J]. The Journal of Finance, 1964, 19(3): 425-442.
Sindhwani V, Niyogi P, Belkin M. A co-regularization approach to semi-supervised
learning with multiple views[C]//Proceedings of ICML workshop on learning with
multiple views. 2005: 74-79.
Sindhwani V, Rosenberg D S. An RKHS for multi-view learning and manifold
co-regularization[C]//Proceedings of the 25th international conference on
Machine learning. ACM, 2008: 976-983.
Tao D, Tang X, Li X, Wu X. Asymmetric bagging and random subspace for support
vector machines-based relevance feedback in image retrieval[J]. IEEE
transactions on pattern analysis and machine intelligence, 2006, 28(7): 1088-1099.
Varma M, Ray D. Learning the discriminative power-invariance trade-off[C]//
International conference on computer vision, 2007: 1-8.
Wang J J, Wang J Z, Zhang Z G, Guo S P. Stock index forecasting based on a hybrid
model[J]. Omega, 2012, 40(6): 758-766.
Wang Z, Chen S, Gao D. A novel multi-view learning developed from single-view
patterns[J]. Pattern Recognition, 2011, 44(10): 2395-2413.
Xiong R, Nichols E P, Shen Y. Deep learning stock volatility with google domestic
trends[J]. arXiv preprint arXiv:1512.04916, 2015.
Xu C, Tao D, Xu C. A survey on multi-view learning[J]. arXiv preprint
arXiv:1304.5634, 2013.
Yu S, Krishnapuram B, Rosales R, Rao R. Bayesian co-training[J]. Journal of
Machine Learning Research, 2011, 12(Sep): 2649-2680.
Zhang L, Zhang L, Tao D, Huang X. On combining multiple features for
hyperspectral remote sensing image classification[J]. IEEE Transactions on
Geoscience and Remote Sensing, 2012, 50(3): 879-893.
Zhang W, Zhang K, Gu P, Xue X. Multi-View Embedding Learning for Incompletely
Labeled Data[C]//IJCAI. 2013: 1910-1916.
Zheng Y, Yi X, Li M, Chang E. Forecasting fine-grained air quality based on big
data[C]//Proceedings of the 21th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, 2015: 2267-2276.
Zhou Z, Chen K, Jiang Y. Exploiting unlabeled data in content-based image
retrieval[J]. Lecture Notes in Computer Science, 2004: 525-536.
Zhou Z H, Li M. Semi-Supervised Regression with Co-Training[C]//IJCAI. 2005, 5:
908-913.