Paper #850341

Cascading Investment Strategy Building via Multiview Learning

Abstract

Integrating domain-specific knowledge in finance with methods of big data analytics and artificial intelligence to contribute to financial research is an important issue, which has become a focal point of concern for both academia and industry. This paper proposes a novel two-stage machine learning method for building investment strategies, which has several advantages such as decreasing the modelling cost and improving explanatory power. We conduct an empirical analysis to test our method on five quantitative strategies discussed in the literature or applied in practice, using data from the US and Chinese stock markets. The empirical results show that our method improves performance dramatically for all tested strategies, with an increase in the success rate of trading signals of about 6% to 17% and in their average return of about 1.4 to 6.89 times. In addition, the results reveal that the multi-view learning method performs better than single-view methods and can provide some explanatory power for understanding the contribution of information to a strategy.

Keywords: multi-view learning; intelligent decision; strategy optimization

JEL classification: C55; C61; G11

1. Introduction

As stated in a J.P. Morgan report, one of the great changes in the marketplace is the increasing adoption of quantitative investing techniques, including developing new quantitative trading strategies, utilizing increasing amounts and different types of data, and adopting new methods such as machine learning and artificial intelligence (Kolanovic and Krishnamachari, 2017). Making use of big data analysis and machine learning methods places high professional demands on the researchers and analysts currently in the market. There is a big gap between the new methods mentioned above and the traditional theoretical and empirical methods broadly used in financial markets. As a result, these two kinds of research methods are used by different groups of researchers: the new methods are used by researchers who know big data analysis but do not understand financial theory, while the traditional methods are used by financial experts who may not know big data analysis and machine learning methods very well. For example, machine learning and artificial intelligence are usually used to forecast highly volatile financial time series, such as stock and futures prices, returns and their volatility, because artificial neural networks are good nonlinear function approximators. Recurrent neural networks were used in early works to predict stock prices (Kamijo and Tanigawa, 1990) and price volatility (Hamid and Iqbal, 2004), while self-organizing fuzzy neural networks (Bollen et al., 2011), deep recurrent neural networks, long short-term memory (LSTM) neural networks and other new methods have been applied to financial markets, as in the work of Xiong et al. (2015) and Li, Bu and Wu (2017), since they have shown remarkable results in tasks such as artificial handwriting generation, language forecasting and speech recognition (Li, Bu and Wu, 2017). We find that most works of this kind only make use of machine learning and artificial intelligence methods and rarely depend on financial theories. On the other hand, most papers published in classic financial journals seldom adopt these new methods. With the rapid development of big data analytics and artificial intelligence, how to integrate domain-specific knowledge in finance with these methods is still an important issue for both academia and industry.

To achieve this goal of utilizing increasing amounts and different types of data and adopting new methods, researchers and analysts tend to develop more and more complex quantitative models. For example, hybrid models and integrated methods make up a large part of the forecasting literature (Lin et al., 2012; Wang et al., 2012; among others), especially when learning methods are included. An extraordinarily complex model not only places high professional demands on researchers and analysts, who must have the domain-specific knowledge and understand the complex models in big data analysis and artificial intelligence, but also increases the modelling and computational costs. Besides, a complex model built on specific artificial intelligence methods has weaker explanatory power than simpler models such as statistical and econometric models. Therefore, how to use new methods better and more rationally, so as to avoid the problems caused by large amounts of data and complex models on the one hand, and to improve the explanatory power of the models and provide more insights into theory on the other hand, is a topic worth exploring.

This paper investigates a method that can address the above issues, taking the development of quantitative investment strategies as a specific case for discussion. Instead of bringing together all the information in a single very complex model, we propose a novel method, namely a two-stage multi-view learning method for building investment strategies. The general idea is to separate the domain knowledge and the different models or methods into two stages. In the first stage, researchers and analysts can develop an investment strategy based on any theory or model from their domain; in the second stage, the developed strategy is improved by the multi-view learning method proposed in this paper, based on a large set of data that reflects domain-specific knowledge and the theoretical and empirical results in finance.

Our proposed method has several advantages. First, the separation of the modelling process makes the methods or models in each stage independent of each other, and thus easier to build, especially in the first stage. Researchers or analysts can set up any single model in the first stage, based on domain-specific knowledge and theoretical models in financial economics and financial markets, or on data-driven models such as statistical methods or even machine learning models that they are good at. Second, much more valuable information is brought into the models in the second stage, drawing on domain knowledge and on feature extraction methods such as data mining and text mining. Thus, this method can overcome the limited data processing ability of some specific first-stage models, such as econometric models. Third, the proposed multi-view learning method in the second stage can contribute more to the improvement of an investment strategy than methods that aggregate all the large-scale information in a single view. Fourth, our method can provide some explanatory power regarding which kinds of information are important for specific strategies.

To examine the effectiveness of our method, we choose five different types of strategies for the empirical analysis: a candlestick charting strategy, a pair trading strategy, a technical indicator strategy, a statistical analysis strategy, and a forecasting strategy. The first two come from the prior literature and use data from the US stock market, and the latter three are developed by ourselves according to technical analysis, statistical analysis and a time series forecasting model, using data from the Chinese stock market. In the empirical analysis, we compare the results of our method with those of benchmark methods according to investment performance. We compare the result of the strategy developed using our model, i.e., the final result after the two stages, with that of the strategy developed only on the basis of domain knowledge, i.e., the first-stage strategy. The empirical results show that our method improves performance dramatically for all tested strategies, with an increase in the success rate of trading signals of about 6% to 17% and in their average return of about 1.4 to 6.89 times. Moreover, by comparing our multi-view learning method with single-view methods using the logistic regression model and the random forest algorithm, our empirical results confirm that the multi-view learning method contributes more to investment strategy building. Finally, our method can provide some explanatory power for understanding which kinds of information contribute to the improvement of an investment strategy.

The rest of this paper is organized as follows. Section 2 reviews the literature on active investment strategies, which constitute the domain knowledge in finance for the first stage of our method, and the recent literature on multi-view learning, which is related to our second-stage method. Section 3 introduces the proposed framework of the two-stage multi-view learning method for strategy building. Section 4 discusses the multi-view learning method. Section 5 provides the empirical results of our method. Section 6 concludes.

2. Literature review

2.1 Active investment strategy

Most investment strategies are designed to exploit price patterns identified by financial theories and models, such as the capital asset pricing model (CAPM) (Sharpe, 1964; Lintner, 1965), arbitrage pricing theory (APT) (Ross, 1976), and the Fama-French three-factor and five-factor asset pricing models (Fama and French, 1993, 1996, 2015). To investigate the pricing mechanism of securities, many studies discuss risk factors or anomalies in financial markets. The

cross-sectional risk factors include size and value effects (Fama and French, 1993,

1996, 2015), profitability and investment patterns in average stock returns (Fama and

French, 2015), liquidity factor (Pástor and Stambaugh, 2003), etc. Return reversals

and momentum are two of the most studied capital market phenomena in the literature

that can be examined through the profits of zero net investment portfolios using well

diversified “winners” or “losers” portfolios. Return reversals are usually explained by

market overreaction to information, while momentum effects are explained by

underreaction to information. Many studies examine price patterns in US stock markets. For example, DeBondt and Thaler (1985) reported return reversals over long horizons; Lehmann (1990), Lo and MacKinlay (1990) and Jegadeesh (1990) reported short-term reversals at daily, weekly, and monthly levels; and Jegadeesh and Titman (1993) reported a momentum effect for 3- to 12-month holding periods. Some recent studies provide more evidence from different markets and asset classes. Asness et al. (2013) examined the value and momentum effects jointly across eight different markets and asset classes and revealed value and momentum effects in government bonds and value effects in currencies and commodities. Furthermore, recent studies propose a time series momentum or trend effect. Moskowitz et al. (2012) reported that time series momentum can persist for about a year in equity index, currency, commodity, and bond futures. Han, Zhou and Zhu (2016) provided a trend factor that simultaneously captures the short-, intermediate-, and long-term stock price trends, and showed that this trend factor substantially outperforms the well-known short-term reversal, momentum, and long-term reversal factors separately.

There are several branches for active strategies, including fundamental analysis,

technical analysis, statistical arbitrage, and strategies based on time series forecasting

models, etc. These strategies are still widely used in financial markets, owing to market inefficiency and to investors not being fully rational.

Technical analysis, also known as "charting", has been a part of financial practice for

many decades. Because of the highly subjective nature of technical analysis, this

discipline has not received the same level of academic scrutiny and acceptance as more

traditional approaches such as fundamental analysis. In rejecting the Random Walk

Hypothesis, technical analysis may well be an effective means for extracting useful

information from market prices. Lo and MacKinlay (1988) among other papers have

shown that past prices may be used to forecast future returns to some degree, which

provides support for technical analysts. Building on the profitability of candlestick trading strategies confirmed by Caginalp and Laurent (1998), Lu et al. (2012), Lu (2014) and others, Lu et al. (2015) investigated what determines that profitability. Lo, Mamaysky, and Wang (2000) proposed a systematic and

automatic approach for technical pattern recognition using nonparametric kernel

regression, and they found that several technical indicators do provide incremental

information and may have some practical value in US stock market. Some papers try to

adopt artificial intelligence methods to improve the performance of technical analysis.

For example, Refenes (1995) proposed a genetic-based global learning method in an FX trading system in order to find the best combination of technical indicators for prediction and trading. Dunis and Zhou (1998) adopted genetic algorithms to optimize parameters and develop an FX trading system for a simple technical trading indicator, RSI.

Statistical arbitrage strategies are developed to exploit the properties of stock prices and returns, such as the momentum and value effects (Hogan et al., 2004), the

mean-reverting process (Elliot et al., 2005; Avellaneda and Lee, 2008; Focardi et al.,

2016) and so on. Focardi et al. (2016) introduced a new statistical arbitrage strategy

based on dynamic factor models, which exploits the mean-reverting properties of prices.

Pair trading is a popular kind of statistical arbitrage strategy for short-term speculation. There are two main issues in implementing a pairs trading strategy: one is

identifying those assets whose prices move close together, and the other one is

determining the position and timing to form a trading strategy. Several papers describe

trading strategies and methods for identifying suitable pairs. Gatev, Goetzmann and Rouwenhorst (2006) presented a detailed analysis of the performance of pairs trading strategies within a global framework of cointegration and found evidence of pairs trading profits. They linked the profitability to the presence of a common factor in the returns, different from conventional risk measures.

Another type of investment strategy is formed according to forecasting results. Forecasting methods have developed from simple linear models, e.g., the widely used autoregressive integrated moving average (ARIMA) models, to hybrid models, for example, the hybrid model of ARIMA and artificial neural networks, and then to integrated forecasting methods and learning methods. Nyberg (2013) employed dynamic probit models to predict U.S. bear and bull stock markets with a monthly dataset, and the investment strategy based on them was shown to yield higher portfolio returns than the buy-and-hold trading strategy in a small-scale market timing experiment. Luo and Chen (2013) proposed a model combining a piecewise linearization algorithm and a weighted support vector machine (SVM) algorithm to predict the trading signals of stocks on the Shanghai Stock Exchange. Lin et al. (2012) employed empirical mode

decomposition (EMD) in least square support vector regression (LSSVR) algorithm to

forecast foreign exchange rate. Wang et al. (2012) presented an integrated approach

which contains exponential smoothing model, ARIMA, back propagation neural

network and genetic algorithm to predict indexes both in China and US.

2.2 Multi-view learning methods

Learning methods have been used in financial markets. Nevmyvaka et al. (2006)

presented an empirical application of reinforcement learning to the problem of

optimized trade execution in modern financial markets. Das and Banerjee (2011)

proposed the meta-learning algorithm (MLA), which combines the portfolio vectors

for the coming period generated by each base expert to form a final portfolio, as an

online portfolio selection method for a fund of funds.

In recent years, more and more scientific data analytics problems involve data collected from multiple sources, and multi-view learning methods have been proposed to handle them.

One of the earliest schemes of multi-view learning is co-training which trains

alternately to maximize the consistency between two distinct views of the unlabeled

data (Blum and Mitchell, 1998). Many variants like co-expectation-maximization

(co-EM) (Nigam and Ghani, 2000; Brefeld and Scheffer, 2004), co-regularization

(Sindhwani et al., 2005; Sindhwani and Rosenberg, 2008), co-regression (Zhou and Li,

2005a; Brefeld et al., 2006), co-clustering (Bickel and Scheffer, 2004; Kumar et al.,

2011; Kumar and Daumé III, 2011) and graph-based co-training (Yu et al., 2011)

have been developed and they have been mainly applied to solve the problems in

natural language processing and computer vision. Besides co-training, multiple kernel

learning and subspace learning-based approaches are also important types of

multi-view learning method (Xu et al., 2013). Multiple kernel learning performs well in problems of object classification (Lin et al., 2007; Varma and Ray,

2007), object detection (Longworth and Gales, 2008) and object recognition

(Kembhavi et al., 2009). Subspace learning also does well in the facial expression

recognition problem (Dhillon et al., 2011) and sensing image classification problem

(Zhang et al., 2012).

Furthermore, researchers combine multi-view learning with multi-task learning to

form a new learning paradigm to explore both task relatedness and view relatedness

(Zhang et al., 2013; Liu et al., 2016b; Zheng et al., 2015). Overall, multi-view learning outperforms single-view learning to some extent in many problems and has better generalization ability, which makes it a more promising learning paradigm.

3. Framework of two-stage machine learning method for strategy building

In this section, we discuss the two-stage machine learning method for investment strategy building proposed in this paper, which completely separates the design and development of the initial investment strategy from the improvement of the strategy. In the first stage, we develop an investment strategy and generate the initial trading signals. Any investment strategy can be formed, using any specific theory, model or price pattern based on knowledge from whatever area the developer knows well. For example, an expert in finance can make use of financial models such as the CAPM, the APT and the factor models in the literature; someone who is good at statistics and econometric models can develop a trading strategy according to statistical arbitrage or forecasting models; and someone who is good at technical analysis can form a trading strategy according to some technical indicators. In the second stage, we optimize the trading signals obtained from the first stage and improve the strategy using artificial intelligence methods. We propose a multi-view learning method for the second stage in this paper. The initial trading signals formed in the first stage are the inputs of the multi-view learning prediction models in the second stage. The second-stage method is in effect a prediction and optimization of trading signals, so that we can select and keep good signals that would make a profit and discard bad signals that would make a loss. To improve the accuracy of predicting whether a trading signal is profitable or not, we bring in a variety of information. According to the properties of the information, we classify it into different groups and adopt the multi-view learning method to optimize the trading signals and improve the performance of the initial strategy. The multi-view learning method in the second stage of our framework can be divided into several sub-frames, including view construction and the multi-view learning method for strategy optimization. Figure 1 presents the framework of our approach.

Figure 1: The framework of two-stage strategy optimization method
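The division of labour between the two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the moving-average crossover rule, the function names `first_stage_signals` and `second_stage_filter`, and the 0.5 threshold are all hypothetical, and the probabilities passed to the second stage stand in for the output of the multi-view model described in Section 4.

```python
import numpy as np

def first_stage_signals(prices, short=3, long=5):
    """Stage 1 (hypothetical rule): emit a buy signal on day t when the
    short moving average crosses above the long moving average."""
    prices = np.asarray(prices, dtype=float)
    signals = []
    for t in range(long, len(prices)):
        short_now = prices[t - short + 1 : t + 1].mean()
        long_now = prices[t - long + 1 : t + 1].mean()
        short_prev = prices[t - short : t].mean()
        long_prev = prices[t - long : t].mean()
        if short_prev <= long_prev and short_now > long_now:
            signals.append(t)
    return signals

def second_stage_filter(signals, prob_profitable, threshold=0.5):
    """Stage 2: keep only the signals whose predicted probability of
    being profitable reaches the threshold; discard the rest."""
    return [s for s, p in zip(signals, prob_profitable) if p >= threshold]
```

Because the second stage only consumes the list of signals, the first-stage rule could be swapped for any other strategy without touching the filter.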

The advantages of our two-stage machine learning method for strategy development are as follows. First, separating the strategy development process from the performance optimization process reduces not only the computational cost but also the human resource cost. Each stage of this two-stage framework is much easier than a very complex model that tries to bring all information together, so we can break through the limited data processing ability of some specific models. Also, the model in each stage can be developed by a different person, which reduces the professional requirements for strategy development and hence the human resource cost. Second, the two-stage framework of investment strategy optimization can be applied to the development and optimization of any active strategy, including very simple and common strategies used in practice, academic models from the literature, and even private investment strategies developed by companies in the industry, whose core information we may not know exactly. The methods in the second stage are independent of those used in the first stage, which greatly increases the application value of the two-stage method. Third, we can add a lot of valuable information in the second stage no matter what strategy is developed in the first stage, and thereby bring in the advantages of big data analysis. Fourth, we bring in artificial intelligence models in the second stage to exploit their high predictive accuracy and improve the performance of the strategy, while the method proposed in this paper still maintains explanatory power about why the second stage can improve the performance of the initial strategy. This explanatory power is very important, as it can improve the understanding of the initial strategy and shed light on the related risks of the strategy.

4. Multi-view learning method

4.1 Multi-view construction

To solve the investment strategy improvement problem, we adopt co-training style algorithms for the second stage. Co-training style algorithms, one of the earliest schemes for multi-view learning, first introduced by Blum and Mitchell (1998), have several properties that are very suitable for our framework and method for investment strategy optimization. For example, as Xu, Tao and Xu (2013) stated, co-training style algorithms usually train separate learners on distinct views, which are then forced to be consistent across views, and this kind of approach can be regarded as a late combination of multiple views. This property is very suitable for multi-view learning in the second stage, because it separates the task of view construction from that of view combination (the learning algorithm) easily and clearly, which is very important for our framework.

The main assumptions of co-training style algorithms summarized in Xu, Tao and

Xu (2013) include: (a) Sufficiency, each view is sufficient for classification on its own;

(b) Compatibility, the target functions in all views predict the same labels for

co-occurring features with high probability, and (c) Conditional independence: the

views are conditionally independent given the class label. The conditional

independence assumption plays a critical role, but it is usually too strong to be

satisfied in practice and several weaker alternatives have thus been considered. Abney (2002) argued that weak dependence alone can lead to successful co-training. Wang and Zhou (2007) also showed that when the diversity among the learners based on each view is large, the performance of the learners can be improved by

co-training style algorithms. Moreover, the complementary principle of the

multi-view learning states that in a multi-view setting, each view of the data may

contain some knowledge that other views do not have; therefore, multiple views can

be employed to comprehensively and accurately describe the data.

In this paper, we propose a new way of constructing views to solve the strategy performance improvement problem, which is also based on the above assumptions, especially sufficiency, weak dependence or diversity, and the complementary principle. The idea of view construction in this paper is as follows. First, we try to collect all the related and fundamental information that can affect trading strategies in financial markets. Second, we divide all the information into groups according to its properties, and each group is a view in the multi-view learning method. From the literature and practice of fundamental analysis and technical analysis for active strategies, we know that there are two well-established approaches to securities analysis: top down and bottom up. Whether we adopt the top-down or the bottom-up approach for stock analysis, for example, we should always include information on individual stocks, sectors and industries, and markets and the economy to better understand individual stocks and the stock market. Also, in developing trading strategies, stock selection, timing and asset allocation are very important processes that can influence the final performance. The top-down and bottom-up approaches can contribute much to stock selection and even to timing, but we still need some risk measures to contribute to asset allocation. Therefore, following securities analysis, we can construct the views for stock market investment strategies as the view of individual stocks, the view of sectors and industries, the view of market and economy, and the view of risks. The view construction can be adjusted slightly to suit different trading strategies. For example, for a pair trading strategy, the relationship between the pair of stocks is much more important than the sector or industry information. Therefore, we can adjust the view construction to the view of target individual stocks (TV), the view of associated individual stocks (AV), the view of markets (MV), and the view of risks (RV). We can include quote information and historical trend information measured by technical indicators in the individual, sector, and market views, and valuation information based on financial statements can also be added to the view of individual stocks.
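As an illustration of this grouping, the sketch below splits a feature table into the four views used for a pair trading strategy. Only the view names (TV, AV, MV, RV) come from the text; the feature names in `VIEW_COLUMNS` and the helper `build_views` are hypothetical placeholders chosen for the example.

```python
import numpy as np

# Hypothetical feature names; only the four view labels follow the text.
VIEW_COLUMNS = {
    "tv": ["target_return_5d", "target_rsi"],    # target individual stock
    "av": ["assoc_return_5d", "pair_spread"],    # associated individual stock
    "mv": ["index_return_5d"],                   # market
    "rv": ["target_volatility_20d"],             # risk
}

def build_views(data, columns, view_columns=VIEW_COLUMNS):
    """Split an (N, D) feature array into one (N, D_k) block per view.

    `columns` lists the feature name of each column of `data`.
    """
    data = np.asarray(data, dtype=float)
    index = {name: j for j, name in enumerate(columns)}
    return {
        view: data[:, [index[name] for name in names]]
        for view, names in view_columns.items()
    }
```

Each row of `data` corresponds to one first-stage trading signal, so every view describes the same signals from a different level of information.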

Because each view can add new information to the original trading strategy, which may depend on some special patterns or anomalies of stock prices, the view construction introduced here satisfies the sufficiency assumption. Moreover, because we construct the views according to different levels of information, the assumptions of weak dependence or diversity and the complementary principle are satisfied.

The most widely used existing classes of view construction methods in multi-view learning decompose the original set of features into multiple disjoint subsets to construct each view, such as random approaches (Brefeld et al., 2005; Bickel and Scheffer, 2004; Brefeld and Scheffer, 2004; Tao et al., 2006), reshaping or decomposing algorithms (Wang et al., 2011), and automatic feature set partitioning approaches (Chen et al., 2011). Compared with these, our view construction approach incorporates the domain knowledge of financial economics. This is in line with our idea of integrating theories and methods in finance, big data and artificial intelligence. Moreover, although view construction bears some connection with mature feature selection algorithms, there are significant differences between multi-view feature selection and single-view feature selection. In multi-view feature selection, the relationships among multiple views should be additionally considered, besides the information within each view (Xu, Tao and Xu, 2013). When we construct multiple views using our approach, we also consider this characteristic of the multi-view learning method.

4.2 Multi-view learning method for strategy optimization

We generally assume that the features of the above-mentioned views can help improve the prediction results to different degrees. Each view, if modeled separately, can generate a unique prediction result. Since each view predicts the results from a particular perspective, an ideal model needs to consider all the features and align the prediction results. One possible solution is to concatenate the features from different views, which, however, may cause overfitting because each view has its own distribution and statistical properties (Xu et al., 2013). In addition, the prediction results from different views may not always be consistent with each other because of their distinct influences on the target securities. Therefore, it is necessary to regularize the prediction results from different views by aligning them under certain conditions. For example, for a statistical arbitrage strategy, the target security view may predict that a signal is good, while the associated asset view may predict that the signal is bad. Our learning method measures the power of both views and forces them to reach an agreement. In this paper, we extend the basic idea of multi-view learning, which has traditionally been a focal point of study in machine learning for handling the fusion of heterogeneous features from separate views (Zhou et al., 2004; Cheng and Wang, 2007; Longworth and Gales, 2008). Specifically, we first construct the prediction model for each separate view, and then align the views under certain conditions, expressed as regularization terms in the overall prediction.

The methods of investment strategy design mentioned in the framework part are well known and need not be repeated here. In this section, we propose a multi-view learning method for prediction. We first define some notation. We use bold capital letters and bold lowercase letters to represent matrices and vectors, respectively, and non-bold letters to denote scalars. Parameters are denoted by Greek letters. All vectors are in column form unless stated otherwise.

Every strategy consists of a group of trading signals, which are described by different views, $\mathbf{X}^k = [\mathbf{x}_1^k, \mathbf{x}_2^k, \cdots, \mathbf{x}_N^k]^T \in \mathbb{R}^{N \times D_k}$, $k \in \{tv, av, mv, rv\}$, where $N$ is the number of trading signals in a strategy, $\mathbf{x}_i^k \in \mathbb{R}^{D_k}$ denotes the security features of view $k$ extracted from signal $i$, $D_k$ denotes the feature dimension of view $k$, and $\{tv, av, mv, rv\}$ represents the view of target individual stocks (TV), the view of associated individual stocks (AV), the view of markets (MV), and the view of risks (RV). The whole feature matrix is written as $\mathbf{X} = [\mathbf{X}^{tv}, \mathbf{X}^{av}, \mathbf{X}^{mv}, \mathbf{X}^{rv}] \in \mathbb{R}^{N \times D}$, where $D = D_{tv} + D_{av} + D_{mv} + D_{rv}$. The target vector of the strategy consists of the labels of all signals, each indicating whether a trading signal is profitable or not, and is denoted as $\mathbf{y} = [y_1, y_2, \cdots, y_N]^T$.

In this paper we apply the logistic regression model, for simplicity, to construct the four view-specific predictions and the multi-view prediction. The view-specific prediction functions are

$$f(\mathbf{X}^k) = \frac{1}{1 + e^{-\mathbf{X}^k \mathbf{w}^k}}, \quad k \in \{tv, av, mv, rv\}, \tag{1}$$

where $\mathbf{w}^k \in \mathbb{R}^{D_k}$, $k \in \{tv, av, mv, rv\}$, denote the weights of the logistic mapping functions for the four different views. Then, the final prediction results are obtained by the following function:

$$f(\mathbf{X}) = \frac{1}{1 + e^{-\mathbf{X}\mathbf{w}}}, \tag{2}$$

where $\mathbf{w} \in \mathbb{R}^{D}$ is the weight vector for the strategy.

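The information in the target security view, associated asset view, market view and risk
view describes the inherent characteristics of the same trading signal from different
aspects, so we can reinforce the learning performance by enforcing agreement among their
prediction results. To reduce the computational cost while reaching almost the same effect,
instead of using all six pairwise regularization terms among the four views we keep half of
them. According to maximum likelihood estimation (MLE), we can define the following loss
function:

$$J(\boldsymbol{w}) = \frac{1}{M} \sum_{i=1}^{M} \left( -y_i \boldsymbol{x}_i \boldsymbol{w} + \ln\left(1 + e^{\boldsymbol{x}_i \boldsymbol{w}}\right) \right) + \lambda_1 \|\boldsymbol{X}_t \boldsymbol{w}_t - \boldsymbol{X}_a \boldsymbol{w}_a\|_2^2 + \lambda_2 \|\boldsymbol{X}_a \boldsymbol{w}_a - \boldsymbol{X}_m \boldsymbol{w}_m\|_2^2 + \lambda_3 \|\boldsymbol{X}_m \boldsymbol{w}_m - \boldsymbol{X}_r \boldsymbol{w}_r\|_2^2 \tag{3}$$

where $\boldsymbol{x}_i$ is the $i$-th row of $\boldsymbol{X}$, $\boldsymbol{X}_t, \boldsymbol{X}_a, \boldsymbol{X}_m, \boldsymbol{X}_r$ abbreviate
$\boldsymbol{X}^{tv}, \boldsymbol{X}^{av}, \boldsymbol{X}^{mv}, \boldsymbol{X}^{rv}$, and $\boldsymbol{w}_t, \boldsymbol{w}_a, \boldsymbol{w}_m, \boldsymbol{w}_r$ are the corresponding blocks of $\boldsymbol{w}$.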
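As an illustration of Eqs. (1) to (3), the following minimal Python sketch evaluates the loss for a toy example. The function names, the 0/1 coding of the labels and the toy numbers are assumptions made for illustration; only the structure of the loss follows the text.

```python
import math

def view_scores(X_view, w_view):
    """Linear scores X^k w^k of one view, one value per trading signal."""
    return [sum(x * w for x, w in zip(row, w_view)) for row in X_view]

def sigmoid(z):
    """Logistic function of Eqs. (1)-(2), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

def multiview_loss(views, weights, y, lambdas):
    """Eq. (3): mean logistic loss on all features plus three agreement penalties.

    views   -- [X_t, X_a, X_m, X_r], each a list of feature rows
    weights -- [w_t, w_a, w_m, w_r], the matching blocks of w
    y       -- labels, 1 = profitable signal, 0 = not (assumed coding)
    lambdas -- (lambda_1, lambda_2, lambda_3)
    """
    scores = [view_scores(X, w) for X, w in zip(views, weights)]
    # x_i w on the concatenated features is the sum of the per-view scores.
    totals = [sum(per_view) for per_view in zip(*scores)]
    # ln(1 + e^z) evaluated as max(z, 0) + ln(1 + e^{-|z|}) for stability.
    nll = sum(-yi * z + max(z, 0.0) + math.log1p(math.exp(-abs(z)))
              for yi, z in zip(y, totals)) / len(y)
    penalty = sum(lam * sum((a - b) ** 2 for a, b in zip(scores[k], scores[k + 1]))
                  for k, lam in enumerate(lambdas))
    return nll + penalty

# Toy data: two signals, one feature per view (illustrative numbers only).
views = [[[1.0], [-1.0]], [[0.5], [-0.5]], [[2.0], [-1.0]], [[1.0], [0.0]]]
y = [1, 0]
zero = [[0.0], [0.0], [0.0], [0.0]]
ones = [[1.0], [1.0], [1.0], [1.0]]
loss_at_zero = multiview_loss(views, zero, y, (0.1, 0.1, 0.1))
loss_at_ones = multiview_loss(views, ones, y, (0.1, 0.1, 0.1))
loss_no_penalty = multiview_loss(views, ones, y, (0.0, 0.0, 0.0))
```

With all weights at zero every signal is predicted with probability 0.5, so the loss equals ln 2 and the penalties vanish. The gap between `loss_at_ones` and `loss_no_penalty` is the disagreement penalty between neighbouring views.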

4.3 Parameter estimation and hyper-parameter adjustment

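The optimization problem $\min_{\boldsymbol{w}} J(\boldsymbol{w})$ is convex with respect to $\boldsymbol{w}$, and we use the
gradient descent method to solve it. We decompose the loss function into two parts as
follows,

$$h(\boldsymbol{w}) = \frac{1}{M} \sum_{i=1}^{M} \left( -y_i \boldsymbol{x}_i \boldsymbol{w} + \ln\left(1 + e^{\boldsymbol{x}_i \boldsymbol{w}}\right) \right) \tag{4}$$

$$g(\boldsymbol{w}) = \lambda_1 \|\boldsymbol{X}_t \boldsymbol{w}_t - \boldsymbol{X}_a \boldsymbol{w}_a\|_2^2 + \lambda_2 \|\boldsymbol{X}_a \boldsymbol{w}_a - \boldsymbol{X}_m \boldsymbol{w}_m\|_2^2 + \lambda_3 \|\boldsymbol{X}_m \boldsymbol{w}_m - \boldsymbol{X}_r \boldsymbol{w}_r\|_2^2. \tag{5}$$

The problem $\min_{\boldsymbol{w}} J(\boldsymbol{w})$ can then be rewritten as $\min_{\boldsymbol{w}} h(\boldsymbol{w}) + g(\boldsymbol{w})$. To derive the
iterative formula, the first part of the gradient is calculated as in Eq. (6).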

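$$\frac{\partial h(\boldsymbol{w})}{\partial w_j} = \frac{1}{M} \sum_{i=1}^{M} \left( h_{\boldsymbol{w}}(\boldsymbol{x}_i) - y_i \right) x_{i,j} \tag{6}$$

where $h_{\boldsymbol{w}}(\boldsymbol{x}_i) = 1/(1 + e^{-\boldsymbol{x}_i \boldsymbol{w}})$, and $M$ is the number of samples. For the second
part of the gradient, we express it in matrix form as Eq. (7) with the notation defined
previously.

$$\frac{\partial g(\boldsymbol{w})}{\partial \boldsymbol{w}} =
\begin{bmatrix}
\lambda_1 \boldsymbol{X}_{tt} & -\lambda_1 \boldsymbol{X}_{ta} & 0 & 0 \\
-\lambda_1 \boldsymbol{X}_{at} & (\lambda_1 + \lambda_2) \boldsymbol{X}_{aa} & -\lambda_2 \boldsymbol{X}_{am} & 0 \\
0 & -\lambda_2 \boldsymbol{X}_{ma} & (\lambda_2 + \lambda_3) \boldsymbol{X}_{mm} & -\lambda_3 \boldsymbol{X}_{mr} \\
0 & 0 & -\lambda_3 \boldsymbol{X}_{rm} & \lambda_3 \boldsymbol{X}_{rr}
\end{bmatrix} \boldsymbol{w} \tag{7}$$

where $\boldsymbol{X}_{ij} = (\boldsymbol{X}_i)^T \boldsymbol{X}_j \in R^{D_i \times D_j}$, $i, j \in \{t, a, m, r\}$. Thus, the global update rule is

$$\boldsymbol{w}^{n+1} = \boldsymbol{w}^{n} - \alpha \left( \frac{1}{M} \boldsymbol{X}^T \left( h_{\boldsymbol{w}^n}(\boldsymbol{X}) - \boldsymbol{y} \right) +
\begin{bmatrix}
\lambda_1 \boldsymbol{X}_{tt} & -\lambda_1 \boldsymbol{X}_{ta} & 0 & 0 \\
-\lambda_1 \boldsymbol{X}_{at} & (\lambda_1 + \lambda_2) \boldsymbol{X}_{aa} & -\lambda_2 \boldsymbol{X}_{am} & 0 \\
0 & -\lambda_2 \boldsymbol{X}_{ma} & (\lambda_2 + \lambda_3) \boldsymbol{X}_{mm} & -\lambda_3 \boldsymbol{X}_{mr} \\
0 & 0 & -\lambda_3 \boldsymbol{X}_{rm} & \lambda_3 \boldsymbol{X}_{rr}
\end{bmatrix} \boldsymbol{w}^{n} \right) \tag{8}$$

where $h_{\boldsymbol{w}^n}(\boldsymbol{X})$ stacks the predictions $h_{\boldsymbol{w}^n}(\boldsymbol{x}_i)$ into a vector and $\alpha$ denotes the
learning rate. Both a threshold on the weight change between two consecutive iterations
and a maximum number of iterations are employed as stopping criteria.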
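The update rule and the two stopping criteria can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the toy data and the default learning rate are hypothetical, and the penalty gradient is written with its exact factor of 2, which Eq. (7) appears to absorb into the coefficients.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

def scores(X, w):
    """X w for a row-major matrix X."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

def xt_times(X, v):
    """X^T v for a row-major matrix X."""
    return [sum(row[j] * vi for row, vi in zip(X, v)) for j in range(len(X[0]))]

def loss(views, weights, y, lambdas):
    """Objective of Eq. (3) for the given weight blocks."""
    s = [scores(X, w) for X, w in zip(views, weights)]
    totals = [sum(c) for c in zip(*s)]
    nll = sum(-yi * z + max(z, 0.0) + math.log1p(math.exp(-abs(z)))
              for yi, z in zip(y, totals)) / len(y)
    pen = sum(lam * sum((a - b) ** 2 for a, b in zip(s[k], s[k + 1]))
              for k, lam in enumerate(lambdas))
    return nll + pen

def fit_multiview(views, y, lambdas, alpha=0.1, tol=1e-6, max_iter=5000):
    """Gradient descent on Eq. (3); returns the weight blocks [w_t, w_a, w_m, w_r]."""
    M = len(y)
    weights = [[0.0] * len(X[0]) for X in views]
    for _ in range(max_iter):  # stopping criterion 2: iteration budget
        s = [scores(X, w) for X, w in zip(views, weights)]
        resid = [sigmoid(sum(c)) - yi for c, yi in zip(zip(*s), y)]
        # Likelihood part: (1/M) X_k^T (h_w(X) - y) for each weight block.
        grads = [[g / M for g in xt_times(X, resid)] for X in views]
        # Penalty part: derivative of lambda * ||X_k w_k - X_{k+1} w_{k+1}||^2.
        for k, lam in enumerate(lambdas):
            diff = [a - b for a, b in zip(s[k], s[k + 1])]
            for j, g in enumerate(xt_times(views[k], diff)):
                grads[k][j] += 2.0 * lam * g
            for j, g in enumerate(xt_times(views[k + 1], diff)):
                grads[k + 1][j] -= 2.0 * lam * g
        largest_change = 0.0
        for w, g in zip(weights, grads):
            for j in range(len(w)):
                w[j] -= alpha * g[j]
                largest_change = max(largest_change, abs(alpha * g[j]))
        if largest_change < tol:  # stopping criterion 1: small weight change
            break
    return weights

# Toy data: four signals, one feature per view (illustrative numbers only).
tv = [[1.0], [0.8], [-1.0], [-0.7]]
av = [[0.9], [1.0], [-0.8], [-1.0]]
mv = [[1.0], [0.6], [-0.9], [-0.8]]
rv = [[0.7], [1.0], [-1.0], [-0.9]]
views, y, lambdas = [tv, av, mv, rv], [1, 1, 0, 0], (0.01, 0.01, 0.01)
start_loss = loss(views, [[0.0]] * 4, y, lambdas)
fitted = fit_multiview(views, y, lambdas)
final_loss = loss(views, fitted, y, lambdas)
```

Because the objective is convex and the step size is small relative to the curvature of this toy problem, the loss after fitting is lower than at the zero starting point.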

We further develop a tuning procedure, called the individual optimality criterion, to
select the hyper-parameters, i.e., the coefficients of the penalty terms, one at a time
against the log-likelihood part. Each time we add a single penalty to the likelihood part,
so the individual objective function is

$$\min_{\boldsymbol{w}} \; -\ln(L) + \lambda_j P_j, \quad j = 1, 2, 3 \tag{9}$$

where $L$ and $P_j$ denote the likelihood and the $j$-th penalty, respectively. We then locate
the order of magnitude of the hyper-parameter $\lambda_j$ and traverse its neighbourhood with a
small step length, such as 10e-4. Under the same optimization method and the same
convergence condition, the optimal hyper-parameter $\lambda_j^*$ is obtained when the objective
function in Eq. (4) reaches its minimum. The same process is repeated for each penalty, and
we then combine the penalties with the likelihood, so the objective function in Eq. (3)
becomes

$$\min_{\boldsymbol{w}} \; \frac{1}{M} \sum_{i=1}^{M} \left( -y_i \boldsymbol{x}_i \boldsymbol{w} + \ln\left(1 + e^{\boldsymbol{x}_i \boldsymbol{w}}\right) \right) + \lambda_1^* \|\boldsymbol{X}_t \boldsymbol{w}_t - \boldsymbol{X}_a \boldsymbol{w}_a\|_2^2 + \lambda_2^* \|\boldsymbol{X}_a \boldsymbol{w}_a - \boldsymbol{X}_m \boldsymbol{w}_m\|_2^2 + \lambda_3^* \|\boldsymbol{X}_m \boldsymbol{w}_m - \boldsymbol{X}_r \boldsymbol{w}_r\|_2^2 \tag{10}$$
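One possible reading of this two-pass search is sketched below. The function `select_lambda`, its default grid and the stand-in objective `toy_loss` are hypothetical; in practice `loss_after_fit` would train the model with a single penalty as in Eq. (9) and return the value of Eq. (4) at the fitted weights.

```python
def select_lambda(loss_after_fit, exponents=range(-6, 1), step=1e-3, n_steps=5):
    """Two-pass search for one penalty coefficient lambda_j.

    loss_after_fit(lam) is assumed to fit the model with that single penalty
    and to return the likelihood part of the loss at the fitted weights.
    """
    # Pass 1: locate the order of magnitude on a grid of powers of ten.
    coarse = min((10.0 ** e for e in exponents), key=loss_after_fit)
    # Pass 2: traverse a neighbourhood of the coarse optimum with a small step.
    candidates = [coarse + i * step for i in range(-n_steps, n_steps + 1)]
    candidates = [c for c in candidates if c > 0]
    return min(candidates, key=loss_after_fit)

def toy_loss(lam):
    """Stand-in for 'fit, then evaluate Eq. (4)'; purely illustrative."""
    return (lam - 0.012) ** 2

best_lambda = select_lambda(toy_loss)
```

With the stand-in objective, the coarse pass selects 0.01 and the fine pass moves to about 0.012. Running the same search once per penalty yields the three coefficients used in Eq. (10).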

5. Data and Empirical results

5.1 Empirical cases of investment strategy in the first stage

In this section, we choose several trading strategies for the first stage. To show that
our proposed method can be used with any type of trading strategy, we choose five types of
strategies for the empirical analysis: two come from the published literature and use
samples from the US stock market, and three are designed and developed by ourselves for the
Chinese stock market. The trading strategies in our paper cover technical strategies such
as the candlestick charting strategy and the technical indicator strategy, the pair trading
strategy, a long-side trading strategy based on statistical arbitrage, and a strategy based
on a forecasting model. All the data are from the WIND database.

5.1.1 Candlestick charting strategy in US stock market

Lu et al. (2015) tested a series of candlestick trading strategies with different trend
definitions and holding strategies to find the key profitability factors, and applied them
to DJIA constituents. In our empirical study, we adopt the candlestick charting strategy
based on the three-day reversal pattern of the morning star (MS) and the ten-day
exponential moving average (EMA10) trend (Marshall et al., 2006, 2008; Lu et al., 2015),
together with the Marshall-Young-Rose-10 (MYR-10) holding strategy proposed by Marshall et
al. (2006) and adopted by Lu et al. (2015), which has been shown to be profitable in the US
market.

The principle of the three-day morning star pattern is that the downtrend continues with a
long black candle, the second day confirms the pessimistic market conditions with a
downward gap (the second candle can be black or white), and finally the third day closes at
the highest price of all three days (Lu et al., 2015). The pattern is expressed as follows.

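$$\begin{aligned}
& P_{day1}^{o} > P_{day1}^{c}; \quad |P_{day2}^{o} - P_{day2}^{c}| > 0; \quad P_{day1}^{c} > P_{day2}^{c} \ \text{and} \ P_{day1}^{c} > P_{day2}^{o}; \\
& P_{day3}^{c} > P_{day3}^{o} \ \text{and} \ P_{day3}^{c} > P_{day1}^{c} + (P_{day1}^{o} - P_{day1}^{c})/2,
\end{aligned} \tag{11}$$

where $P_t^c$ and $P_t^o$ denote the closing price and the opening price on day $t$, respectively.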
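The conditions of the morning star pattern translate directly into a boolean check. The sketch below is illustrative only: the function name and the toy prices are assumptions and are not taken from Lu et al. (2015).

```python
def is_morning_star(opens, closes):
    """Check the three-day morning star conditions for days 1, 2 and 3.

    opens and closes each hold three prices, ordered day 1, day 2, day 3.
    """
    o1, o2, o3 = opens
    c1, c2, c3 = closes
    day1_black = o1 > c1                     # day 1 closes below its open
    day2_has_body = abs(o2 - c2) > 0         # day 2 is not a doji
    day2_gaps_down = c1 > c2 and c1 > o2     # day 2 body lies below day 1 close
    day3_white = c3 > o3                     # day 3 closes above its open
    day3_recovers = c3 > c1 + (o1 - c1) / 2  # above the midpoint of day 1 body
    return (day1_black and day2_has_body and day2_gaps_down
            and day3_white and day3_recovers)

# Toy prices (illustrative only).
pattern_found = is_morning_star(opens=(10.0, 8.5, 8.8), closes=(9.0, 8.7, 9.8))
weak_third_day = is_morning_star(opens=(10.0, 8.5, 8.8), closes=(9.0, 8.7, 9.4))
```

In the second call the third close of 9.4 stays below the midpoint 9.5 of the first candle's body, so the pattern is rejected.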

The EMA10 trend is defined as follows:

$$EMA_{10,t} = \alpha P_t^c + (1 - \alpha) EMA_{10,t-1} \tag{12}$$

where $\alpha = 2/(10 + 1)$. When the closing price is above (below) EMA10, the trend is
upward (downward); a key feature of EMA10 is that market conditions are always alternating,
moving either up or down. The MYR exit strategy sets a specific day to exit the market, and
the MYR-10 holding strategy leads to the following return:

$$R_{MYR-10} = \ln\left( \frac{P_{t+13}^{c}}{P_{t+4}^{o}} \right) \times 100\% \tag{13}$$
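A small sketch of the EMA10 trend and the MYR-10 return is given below. Seeding the average with the first closing price and returning `None` when the close equals the average are assumptions; the index `t` is used exactly as in the return formula.

```python
import math

def ema(closes, span=10):
    """Exponential moving average with alpha = 2 / (span + 1).

    The series is seeded with the first close, which is an assumption.
    """
    alpha = 2.0 / (span + 1)
    out = [closes[0]]
    for price in closes[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out

def trend(close, ema_value):
    """'up' if the close is above the EMA, 'down' if below, None if equal."""
    if close > ema_value:
        return "up"
    if close < ema_value:
        return "down"
    return None

def myr10_return(opens, closes, t):
    """Log return in percent from the open of day t+4 to the close of day t+13."""
    return math.log(closes[t + 13] / opens[t + 4]) * 100.0

# Toy series (illustrative only).
flat_opens = [100.0] * 14
flat_closes = [100.0] * 14
rising_closes = [100.0] * 13 + [110.0]
```

For example, `ema([0.0, 11.0])` gives `[0.0, 2.0]` up to rounding, because alpha is 2/11.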

The trading strategies build on the idea that candlestick patterns play a crucial role in
signaling whether a trend will continue or reverse. If a three-day bullish pattern occurs
after a downtrend, or a three-day bearish pattern occurs after an uptrend, the fourth day is
a profitable time to buy or to sell short, respectively. The holding strategy then
determines the exit rule and the holding period.


Lu et al. (2015) employed a daily sample of 26 component stocks of the Dow Jones Industrial
Average (DJIA) index, excluding stocks that did not exist for the whole sample period from
January 2, 1992 to December 31, 2012. Because of data availability, this paper uses 29
component stocks of the DJIA index, again excluding stocks that did not exist for the whole
sample period, which runs from January 3, 2006 to November 3, 2017. Lu et al. (2015) adjust
the empirical results with a 0.5% total transaction cost per round turn, whereas we do not
apply a similar adjustment in our trials, so as to keep a profitable result in the first
stage. To make sure that the strategy used in this paper replicates the strategy in Lu et
al. (2015), we compare the performance of the original strategy with that of our
reproduction. The comparison in Table 1 shows that they are very similar. First, the mean
returns and the winning ratios are very close. Second, the average number of signals per
year is 79 for the original strategy and 73 for the reproduction, which are also close. The
remaining differences between the original strategy in the literature and the duplicated
one are mainly due to the different underlying stocks and the different sample periods.

Table 1: Comparison of original and duplicated candlestick charting strategy

Strategy           Sample period         Number of signals   Mean return   Winning
Lu et al. (2015)   1992.1.2-2012.12.31   1657                0.07%         53.75%
Duplication        2006.1.3-2017.11.3    875                 0.10%         52.34%

Note: Winning denotes the proportion of profitable trades among all trades.

5.1.2 Pair trading strategy in US stock market

Gatev, Goetzmann and Rouwenhorst (2006), hereafter GGR (2006), tested a pair trading
strategy that matches stocks into pairs with minimum distance between normalized historical
prices. Suppose that prices obey a statistical model of the form

$$P_{it} = \beta_{ij} P_{jt} + \varepsilon_{it} \tag{14}$$

where $P_{it}$ denotes the closing price of stock $i$ at time $t$. GGR (2006) chose a matching
partner for each stock by finding the security that minimizes the sum of squared deviations
between the two normalized price series during a 12-month formation period, and the pairs
were traded in the following 6-month trading period. A position is opened when the prices
diverge by more than two historical standard deviations, and it is unwound at the next
crossing of the prices.
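The matching and trading rules can be sketched as follows. Several details are assumptions made for illustration: prices are normalized by their first value, the standard deviation is the population standard deviation of the formation-period spread, and a "crossing" is taken to mean that the normalized spread returns to zero or changes sign.

```python
import statistics

def normalize(prices):
    """Rescale a price series to start at 1 (an assumed normalization)."""
    return [p / prices[0] for p in prices]

def ssd(a, b):
    """Sum of squared deviations between two normalized price series."""
    return sum((x - y) ** 2 for x, y in zip(normalize(a), normalize(b)))

def best_partner(target, candidates):
    """Name of the candidate series with minimum distance to the target."""
    return min(candidates, key=lambda name: ssd(target, candidates[name]))

def trade_events(spread_formation, spread_trading, k=2.0):
    """Open when |spread| exceeds k formation-period standard deviations,
    close at the next crossing of the two normalized prices."""
    threshold = k * statistics.pstdev(spread_formation)
    position, events = 0, []
    for t, s in enumerate(spread_trading):
        if position == 0 and abs(s) > threshold:
            position = -1 if s > 0 else 1  # short the spread when it is high
            events.append((t, "open"))
        elif position != 0 and s * position >= 0:
            position = 0                   # spread reached or crossed zero
            events.append((t, "close"))
    return events

# Toy data (illustrative only).
partner = best_partner([10.0, 11.0, 12.0],
                       {"A": [20.0, 22.0, 24.0], "B": [5.0, 5.0, 5.0]})
events = trade_events([-0.1, 0.1, -0.1, 0.1], [0.05, 0.25, 0.1, -0.01, 0.0])
```

Here candidate "A" moves in exact proportion to the target and is therefore selected, and the toy spread opens a position at index 1 and closes it at index 3.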

GGR (2006) tested all stocks in CRSP using the sample period from January 1962 to December
2002. We also choose all the stocks listed on the New York Stock Exchange (NYSE) and Nasdaq,
but use the sample period from January 2006 to June 2017. Table 2 compares the average
excess return and the share of observations with negative excess return for the original
and the duplicated strategy. The average excess returns are quite similar, while the share
of observations with negative excess return is larger for the duplicated strategy than for
the original strategy in the literature. We attribute this to the period dependence of the
strategy: in the recent period, its success rate is somewhat lower than before.


Table 2: Comparison of original and duplicated pair trading strategy

                               Average excess return         Observations with excess return < 0
Strategy      Sample period    Top 5     Top 20    101-120   Top 5     Top 20    101-120
GGR (2006)    1962.1-2002.12   0.00463   0.00520   0.00503   26%       15%       21%
Duplication   2006.1-2017.06   0.00334   0.00488   0.00418   38%       26%       39%

Notes: The "top n" portfolios include the n pairs with the least distance measures, and the
portfolio "101-120" studies the 20 pairs after the top 100. The observations are monthly
excess returns.

5.1.3 Technical indicator strategy in Chinese stock market

We choose KDJ, one of the most widely used technical indicators, to build a simple strategy
in the stock market. The K index of KDJ is calculated according to the following equations:

$$K(n)_t = \frac{m-1}{m} K(n)_{t-1} + \frac{1}{m} RSV(n)_t, \tag{15}$$

$$RSV(n)_t = \frac{P_t^c - P_t^l(n)}{P_t^h(n) - P_t^l(n)} \times 100, \tag{16}$$

where $P_t^c$, $P_t^l(n)$ and $P_t^h(n)$ represent the closing price at time $t$, and the lowest price
and the highest price over the last $n$ days, respectively. To form the technical trading
strategy, we set the parameters to $n = 9$ and $m = 3$. A buying signal is generated when
$K_{t-1} < 30$ and $K_t > 30$, and a selling signal is generated when $K_{t-1} > 70$ and $K_t < 70$.
For a specific stock, two successive buys without a sell between them are not allowed.
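The K line and the crossing rules can be sketched as below. The starting value of 50 for K, the value assigned to RSV when the highest and lowest prices coincide, and the restriction that a sell is only recorded while a position is held are assumptions made for illustration.

```python
def k_series(closes, highs, lows, n=9, m=3, k0=50.0):
    """K line of the KDJ indicator; k0 is an assumed starting value."""
    ks, k_prev = [], k0
    for t in range(len(closes)):
        lo = min(lows[max(0, t - n + 1): t + 1])
        hi = max(highs[max(0, t - n + 1): t + 1])
        # RSV is undefined when hi == lo; 50 is used as a neutral stand-in.
        rsv = 100.0 * (closes[t] - lo) / (hi - lo) if hi > lo else 50.0
        k_prev = (m - 1) / m * k_prev + rsv / m
        ks.append(k_prev)
    return ks

def kdj_signals(ks, buy_level=30.0, sell_level=70.0):
    """Buy when K crosses above buy_level, sell when K crosses below sell_level.

    A second buy is blocked until the previous position has been sold.
    """
    signals, holding = [], False
    for t in range(1, len(ks)):
        if not holding and ks[t - 1] < buy_level < ks[t]:
            signals.append((t, "buy"))
            holding = True
        elif holding and ks[t - 1] > sell_level > ks[t]:
            signals.append((t, "sell"))
            holding = False
    return signals

# Toy K values (illustrative only).
signals = kdj_signals([20.0, 35.0, 40.0, 75.0, 65.0, 25.0, 35.0])
blocked = kdj_signals([20.0, 35.0, 25.0, 35.0])
```

In the second toy series K crosses above 30 twice, but the second crossing produces no signal because no sell occurred in between.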

We use all the constituent stocks of the Chinese CSI 300 index for the empirical analysis,
with a study sample from January 2015 to December 2016. Figure 2 shows the performance of
this strategy as a net value curve, compared with the corresponding CSI 300 index.


Figure 2: Performance of technical indicator strategy in Chinese stock market

Notes: The red solid line represents the net value curve of the strategy, and the blue dotted line is

the trend of CSI 300 index.

5.1.4 Unilateral statistical analysis strategy in Chinese A-share stock market

For the first stage, we can also develop an investment strategy based on our understanding
of the Chinese market, for example by forming a pair trading strategy in the Chinese stock
market. Some companies issue shares in both the Chinese A-share and B-share markets, which
creates arbitrage opportunities between the paired stocks. As our empirical sample we choose
such stocks that are listed on the Shanghai Stock Exchange (SSE) and whose A-share and
B-share prices have a Pearson correlation coefficient higher than 0.7, which leads to a
total of 42 pairs of stocks. In the Chinese stock market, short selling was not allowed
before securities lending began around 2013, and even since then the cost of short selling
has been very high. Therefore, we develop a unilateral trading strategy based on statistical
arbitrage: we trade only in Chinese A-shares and only take long positions, without short
selling. When an A-share stock is undervalued according to the price relationship between
its A and B shares, we buy the stock, and we sell it when the undervaluation disappears.


When the price spread meets the condition in Eq. (17), i.e., the spread between the prices
of the A share and the B share of stock $i$ falls below the lower bound of the regular
spread, we buy the relatively underpriced A share.

$$dp_{it} < \mu_{dp_{it}} - \sigma_{dp_{it}}, \tag{17}$$

where $dp_{it}$ is the spread between the prices of the A share and the B share, defined as
$dp_{it} = P_{it}^{A} - P_{it}^{B}$, and $P_{it}^{A}$ and $P_{it}^{B}$ are the closing prices of the A share and the B
share of stock $i$, respectively. The right-hand side of the inequality is the lower bound
of the regular spread, for which we adopt a one-standard-deviation bound, i.e., $\mu_{dp_{it}}$ and
$\sigma_{dp_{it}}$ are the mean and the standard deviation of the price spreads $dp_{it}$ over the last $N$
days. Because we use closing prices to produce the buying signals, the long-side trades
actually take place on the next day. We assume that we can buy the stocks at their opening
prices on day $t+1$, and that every position must be closed within $T$ days. The selling
signal for each held stock appears when Eq. (18) holds.

$$dp_{i(t+j)} > \mu_{dp_{i(t+j)}}, \quad j = 2, 3, \ldots, T + 1. \tag{18}$$

If the criterion is not met within $T$ days, the held stock is sold on the last day. We fix
$N = 60$ and $T = 5$ in this strategy. The study sample is from January 2010 to December 2016.
Figure 3 shows the performance of this strategy as a net value curve, compared with the
trend of the corresponding market index.
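The entry and exit rules of Eqs. (17) and (18) can be sketched as follows. The rolling window is assumed to end on the day being tested and the population standard deviation is used; both are assumptions, as are the function names and the toy spreads (which use a short window of 3 days instead of N = 60).

```python
import statistics

def buy_signal(spread, t, n=60):
    """Eq. (17): the spread on day t is below its rolling mean minus one
    standard deviation, both computed over the n days ending on day t."""
    window = spread[t - n + 1: t + 1]
    return spread[t] < statistics.mean(window) - statistics.pstdev(window)

def exit_day(spread, t, n=60, T=5):
    """Eq. (18): first day t+j, j = 2..T+1, on which the spread exceeds its
    rolling mean; otherwise the last permitted day t+T+1."""
    for j in range(2, T + 2):
        window = spread[t + j - n + 1: t + j + 1]
        if spread[t + j] > statistics.mean(window):
            return t + j
    return t + T + 1

# Toy spreads (illustrative only).
reverting = [1.0, 1.0, 0.0, 0.2, 0.9, 1.0, 1.0, 1.0, 1.0]
falling = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0]
entry = buy_signal(reverting, 2, n=3)
exit_reverting = exit_day(reverting, 2, n=3, T=5)
exit_falling = exit_day(falling, 2, n=3, T=5)
```

In the reverting series the position signalled on day 2 is closed on day 4, while in the steadily falling series the exit condition never holds and the position is closed on the last permitted day.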


Figure 3: Performance of Chinese A-share undervaluation trading strategy


Notes: The net value curve of the strategy is compared with the trend of the Shanghai
Composite Index.

5.1.5 Forecasting strategy in Chinese stock market

As summarized in the literature review of this paper, forecasting methods are used to
forecast price movements, and some quantitative trading strategies are developed on the
basis of such models. Here, we develop a trading strategy based on a classic time series
model, the autoregressive (AR) model, i.e.,

$$R_t = \beta_0 + \beta_1 R_{t-1} + \cdots + \beta_p R_{t-p} + \varepsilon_t \tag{19}$$

where $R_t = \ln(P_t / P_{t-5})$ is the five-day log return and $\varepsilon_t$ is the innovation. The lag
order $p$ is determined by AIC with a maximum of 5. We use a rolling estimation method to
produce the forecasts, with an estimation window of 200 days and a rolling step of 1 day.
Each time, the model is re-estimated and provides 1-step-ahead and 2-step-ahead forecasts.
If the stock is predicted to rise on each of the next two trading days, a buying signal is
generated, and we assume that the stock is bought at the opening price of the next day. At
the end of the third day we sell the holding at the closing price.
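One estimation step of this rolling procedure can be sketched in plain Python. The sketch assumes ordinary least squares estimation and the AIC form n ln(RSS/n) + 2(p + 1), neither of which is specified in the text; the simulated return series is purely illustrative.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_ar(r, p):
    """OLS fit of r_t = b0 + b1 r_{t-1} + ... + bp r_{t-p}; returns (coefficients, AIC)."""
    rows = [[1.0] + [r[t - j] for j in range(1, p + 1)] for t in range(p, len(r))]
    ys = r[p:]
    XtX = [[sum(a[i] * a[j] for a in rows) for j in range(p + 1)] for i in range(p + 1)]
    Xty = [sum(a[i] * y for a, y in zip(rows, ys)) for i in range(p + 1)]
    beta = solve(XtX, Xty)
    rss = sum((y - sum(b * x for b, x in zip(beta, a))) ** 2 for a, y in zip(rows, ys))
    n = len(ys)
    aic = n * math.log(max(rss, 1e-300) / n) + 2 * (p + 1)
    return beta, aic

def forecast_two_steps(r, max_lag=5):
    """Choose p in 1..max_lag by AIC, then forecast one and two steps ahead."""
    p = min(range(1, max_lag + 1), key=lambda k: fit_ar(r, k)[1])
    beta, _ = fit_ar(r, p)
    hist, out = list(r), []
    for _ in range(2):
        nxt = beta[0] + sum(beta[j] * hist[-j] for j in range(1, p + 1))
        out.append(nxt)
        hist.append(nxt)  # the 2-step forecast reuses the 1-step forecast
    return out

# One estimation window of 200 simulated returns (illustrative only).
rng = random.Random(0)
returns = [0.0]
for _ in range(199):
    returns.append(0.5 * returns[-1] + rng.gauss(0.0, 0.01))
forecasts = forecast_two_steps(returns)
```

How the two forecasts are turned into the "rising on each of the next two days" condition depends on how the five-day returns are compared, which the text leaves open; the sketch therefore stops at the forecasts themselves.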

The empirical experiment is based on 40 stocks in total, namely the SSE 50 Index constituent
stocks after excluding the securities that have one or more days with no trades over the
period from March 10, 2010 to December 30, 2016. Because the estimation window is 200
observations, the first forecast occurs on January 4, 2011. Figure 4 shows the performance
of this strategy as a net value curve, compared with the corresponding market index, the
SSE 50.

Figure 4: Performance of strategy based on forecasting model

Notes: The net value curve of the strategy is compared with the trend of the SSE 50 Index.

5.2 Data and multi-view construction for the second stage

According to the view construction proposed in Section 4.2, we discuss the variables used in
the different views in our empirical experiment. For the view of target individual stocks,
we consider the quote information, the financial statement information, and the technical
indicators of individual stocks. The quote information includes the opening price, the
highest price, the lowest price, the closing price, trading volume, etc. The financial
statement information includes financial variables such as market value, gross revenue, net
profit, net cash flow, earnings per share, return on equity, debt-to-assets ratio, etc. The
technical indicators can include all the indicators that can be used to analyze stock
prices, such as the widely used relative strength indicator (RSI)¹, KDJ², moving average
convergence-divergence (MACD)³, Bollinger bands (BOLL)⁴, stop and reverse indicator (SAR)⁵,
rate of change indicator (RC)⁶, +DI in the directional movement indicator (PDI)⁷, bull and
bear indicator (BBI)⁸, momentum indicator (MTM)⁹, price and volume trend indicator (PVT)¹⁰,
bias indicator (BIAS)¹¹, the active buying volume indicator of orders greater than 1 million
RMB (BVI), the active selling volume indicator of orders greater than 1 million RMB (SVI),
and self-defined technical indicators, such as the up or down trend indicator (TI)¹², relative
price indicator (RPI)¹³, falling point indicator (FPI)¹⁴, reversion point indicator (RP)¹⁵, etc.

¹ In the RSI calculation, the number of periods is 6.
² In the KDJ calculation, the number of periods is 9, and K, D and J are all used.
³ In the MACD calculation, the long-term period is 26, the short-term period is 12, and the
moving average parameter is 9.
⁴ In the BOLL calculation, the moving average parameter is 26 and the bandwidth is two
standard deviations. The upper band, the mid line and the lower band are all used.
⁵ In the SAR calculation, the number of periods is 4, the adjusting coefficient is 0.02, and
its upper limit is 0.2.
⁶ $RC = \frac{P_t^c}{P_{t-50}^c} \times 100\%$.
⁷ $PDI = P_t^h - P_{t-1}^h$, where $P_t^h$ is the highest price for day $t$. If the value is negative, it is set to 0.
⁸ $BBI = (MA_3 + MA_6 + MA_{12} + MA_{24})/4$, where $MA_n$ is the moving average of $n$ days' closing prices.
⁹ $MTM = P_t^c - P_{t-6}^c$, where $P_t^c$ is the closing price.
¹⁰ $PVT = \sum_{t=1}^{T} \left( \frac{P_t^c - P_{t-1}^c}{P_{t-1}^c} \times Volume_t \right)$, where $P_t^c$ is the closing price and $Volume_t$ is the trading volume.
¹¹ $BIAS = \frac{P_t^c - MA_{12}}{MA_{12}}$.
¹² If $P_t^c < \min\{MA_{30}, MA_{90}\}$, the TI is downward; and if $P_t^c > \max\{MA_{30}, MA_{90}\}$, the TI is upward.
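Several of the footnoted indicators reduce to short formulas over the price history. The sketch below computes BBI, MTM, BIAS and PDI with the footnoted parameters; the function names and the use of a simple (unweighted) moving average are assumptions.

```python
def moving_average(closes, n, t):
    """Simple average of the n closing prices ending at index t."""
    return sum(closes[t - n + 1: t + 1]) / n

def bbi(closes, t):
    """Bull and bear indicator: mean of the 3-, 6-, 12- and 24-day averages."""
    return sum(moving_average(closes, n, t) for n in (3, 6, 12, 24)) / 4

def mtm(closes, t):
    """Momentum: current close minus the close six days earlier."""
    return closes[t] - closes[t - 6]

def bias(closes, t):
    """Relative deviation of the close from its 12-day moving average."""
    ma12 = moving_average(closes, 12, t)
    return (closes[t] - ma12) / ma12

def pdi(highs, t):
    """Rise of the daily high over the previous high, floored at zero."""
    return max(highs[t] - highs[t - 1], 0.0)

# Toy closing prices 1, 2, ..., 30 (illustrative only).
closes = [float(i) for i in range(1, 31)]
latest = len(closes) - 1
```

For the toy series, the four moving averages at the last index are 29, 27.5, 24.5 and 18.5, so `bbi(closes, latest)` is 24.875.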

For the view of associated individual stocks, besides the quote information, financial
statement information, and technical indicators mentioned above, we also bring in different
correlation measures and the characteristics of the price spread between the target and
associated individual stocks. This is because there are certain kinds of co-movement, such
as contemporaneous correlation or a lead-lag relationship, between the pair of stocks, which
can contribute more information to price prediction. The co-movement and lead-lag
relationship are measured by different methods. In our empirical study, we consider the
Pearson coefficient, Granger causality tests, and common factor models used to measure the
price discovery of two assets, such as the modified information share (MIS) model proposed
by Lien and Shrestha (2009) based on Hasbrouck (1995), and the PT model proposed by Gonzalo
and Granger (1995). For the price spread, we consider the mean of the price spread (MSP),
the standard deviation of the price spread (STDSP), and the upper and lower bounds of the
price spread (UBSP, LBSP)¹⁶. The above measures are estimated using the rolling window method
with an estimation window of 120 observations. For classes of strategies such as pairs
trading and statistical arbitrage, in which a pair of stocks is easily identified, the view
of associated individual stocks includes all the information mentioned above, and the
correlation measures are estimated between the target and the associated individual stocks.
For other classes of strategies, in which no paired stock can be found, only correlation
measures are included in this view, and they are estimated between the target asset and the
market index.

¹³ $RPI = \frac{\max\{P_{t-60}^c, \cdots, P_t^c\} - P_t^c}{\max\{P_{t-60}^c, \cdots, P_t^c\} - \min\{P_{t-60}^c, \cdots, P_t^c\}}$.
¹⁴ If $P_t^c < MA_5$, then $FPI = 1$; otherwise $FPI = 0$.
¹⁵ If $MA_{90} > \max\{MA_5, MA_{10}, P_t^c\}$ and $MA_{10} < MA_{30}$, then $RP = 1$; otherwise $RP = 0$.
¹⁶ $UBSP = MSP + c \times STDSP$, $LBSP = MSP - c \times STDSP$, $c = 0.5, 1, 1.5, 2, 2.5, 3$.

For the view of markets, we bring in a collection of indicators that reflect the trend and
fluctuation of markets, usually index values and technical indicators of market indices.
For example, we incorporate the Shanghai Composite Index, the SSE 50 Index, the CSI 300
Index and the Shenwan industry sector indices for the Chinese stock market, and the DJIA and
10 sector indices¹⁷ for the US stock market. We adopt the index value and some technical
indicators of each index, such as moving average (MA), TI, FPI, RPI, etc.

The view of risks includes various risk measures of securities and market indices, which
may affect the stability of an investment strategy. We bring in several measures in our
empirical study, such as the total risk measure and the market risk measure. We adopt the
price volatility of stock prices and market indices estimated by the exponentially weighted
moving average (EWMA) model, and the standard deviation of the daily returns of individual
stocks over 26 trading days. The market risk is measured by the beta coefficient estimated
from the market index model using a 60-month return series. Besides, we also bring the
standard deviation of trading volume over 10 trading days into the view of risks. All the
data are from the WIND database.

¹⁷ The sector indices include the Dow Jones US Basic Materials Index, Dow Jones US Consumer
Goods Index, Dow Jones US Consumer Services Index, Dow Jones US Financials Index, Dow Jones
US Health Care Index, Dow Jones US Industrials Index, Dow Jones US Oil & Gas Index, Dow
Jones US Technology Index, Dow Jones US Telecommunications Index and Dow Jones US Utilities
Index.

5.3 Investment strategy optimization in second stage

Based on the trading signals produced by the original strategies in the first stage, we use
the multi-view learning algorithm to improve their performance by keeping the signals with
a high probability of making a profit and excluding the ones that are predicted to make a
loss. Therefore, we can use the success rate and the average return of signals to evaluate
the performance of each trading strategy.

First, to validate the proposed framework and the multi-view learning logistic regression
model (MultiLR), we compare the performance of the optimized strategy after the second stage
with that of the original strategy formed in the first stage. Second, to better understand
the importance and contribution of the different views to the strategy improvement, we also
set up several single-view logistic regression models. We apply the logistic regression
model with the features of only one view, i.e., the target security view, associated asset
view, market view or risk view, and denote these models as TVLR, AVLR, MVLR and RVLR,
respectively. We then compare the results of the MultiLR model with those of the TVLR, AVLR,
MVLR and RVLR models. Third, to confirm that the multi-view learning method in the second
stage contributes to the performance improvement of trading strategies, we compare the
MultiLR model with other benchmark models that share the same information as the multi-view
learning model, namely a single-view model and the random forest classification method (RF).
For the single-view model, we apply the logistic regression model to all features in all
views, i.e., we combine all features into a single view, and denote it as the LR model. To
set up and estimate the models in the second stage, we apply max normalization to all
variables.
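The text does not define max normalization further. One common reading, assumed here, is to divide every feature column by its maximum absolute value so that all variables fall in [-1, 1]; the function name and the toy columns are hypothetical.

```python
def max_normalize(columns):
    """Scale each feature column by its maximum absolute value.

    Columns that are identically zero are returned unchanged as zeros.
    """
    scaled = []
    for col in columns:
        peak = max(abs(v) for v in col)
        scaled.append([v / peak if peak else 0.0 for v in col])
    return scaled

# Toy feature columns (illustrative only).
normalized = max_normalize([[2.0, -4.0, 1.0], [0.0, 0.0]])
```

With a chronological split such as the one used in the tables, the maxima would presumably be computed on the training signals only and then reused for the testing signals, to avoid using future information.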

The empirical results of the above comparisons for the different trading strategies are
reported in Table 3 to Table 7. By comparing the strategies before and after applying the
proposed optimization method, and the multi-view learning method with single-view methods
and random forest classification, we obtain the following results.

First, our method dramatically improves the performance of the original investment
strategies through the MultiLR model, raising the success rate by about 6 to 17 percentage
points and significantly increasing the average return of trading signals.

Second, by comparing the MultiLR model with the models built on each view, we find that
different kinds of trading strategies depend on different information. For statistical
arbitrage, the characteristics of the target stock and its associated stock play an
important role, while the features in the risk view and the market view perform slightly
weaker; this holds for both such strategies in this paper, in the US market as well as the
Chinese market, as shown in Table 4 and Table 6. This is consistent with the literature,
because a perfect statistical arbitrage or pair trading strategy should exclude market risk
and not be affected by fundamental information. Our empirical results show that even
imperfect statistical arbitrage is barely affected by market information and other risks,
as shown in Table 6. As for the technical trading strategies and the forecasting strategy,
the results are similar to each other: the characteristics of the traded security itself
and of the market or the risks contribute more to the performance improvement. These results
help us better understand each kind of trading strategy and have important implications for
investment risk management, in that our method tells us what kind of information we should
follow and take into consideration.

Third, by comparing the multi-view learning method with single-view methods, we show that
the multi-view learning method contributes more than the other methods to the performance
improvement of strategies. The MultiLR model outperforms the LR model for all strategies in
this paper, and outperforms the RF model in four of the five strategies, the only exception
being the forecasting strategy in the Chinese stock market, where the MultiLR model performs
similarly to, but slightly worse than, the RF model. Even so, the MultiLR model proposed in
this paper retains an advantage over the RF model, because our multi-view method is more
interpretable than the random forest classification algorithm, which is essentially a black
box. In addition, the results of our multi-view method are more stable than those of the
random forest classification algorithm, whose training involves randomness.

Table 3: Comparisons of our methods with other methods using candlestick charting strategy
in US market

Candlestick charting strategy in US stock market

                            Original   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
No. of trading signals      173        119     102     143     159     16      121     68
No. of profitable signals   94         66      58      80      88      10      67      42
No. of failing signals      79         53      44      63      71      6       54      26
Success rate (%)            54.34      55.46   56.86   55.94   55.35   62.50   55.37   61.76
Signal average return (%)   0.27       0.34    0.31    0.35    0.30    1.11    0.18    0.56

Notes: From January 2006 to November 2017 the candlestick pattern strategy signaled 875
times. In chronological order, the first 702 trading signals form the training set and the
other 173 signals form the testing set. The table only shows the results from the testing set.
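The success rates in Table 3 follow directly from the signal counts, as the short check below illustrates. The dictionary keys are labels chosen for this sketch; "Original" denotes the first-stage strategy before filtering.

```python
# (profitable signals, total signals) per model, copied from Table 3.
table3 = {
    "Original": (94, 173), "RF": (66, 119), "TVLR": (58, 102), "AVLR": (80, 143),
    "MVLR": (88, 159), "RVLR": (10, 16), "LR": (67, 121), "MultiLR": (42, 68),
}

# Success rate in percent, rounded to two decimals as in the table.
success = {name: round(100 * wins / total, 2) for name, (wins, total) in table3.items()}

# Improvement of MultiLR over the original strategy, in percentage points.
gain = round(success["MultiLR"] - success["Original"], 2)
```

For this strategy the filter keeps 68 of the 173 test signals and raises the success rate from 54.34% to 61.76%, a gain of 7.42 percentage points.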


Table 4: Comparisons of our methods with other methods using pair trading strategy in US
market

Pair trading strategy in US stock market

                            Original   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
Success rate (%)            54.07      56.52   58.51   57.14   54.08   56.90   60.47   61.11
Signal average return (%)   0.07       -0.01   0.09    0.11    0.15    0.13    -0.20   0.17

Notes: From January 2006 to June 2017, the top 20 portfolio of the pair trading strategy had
23 6-month trade periods and signaled 1,171 times. The 135 signals from the last three
6-month trade periods form the testing set. The table only shows the results from the
testing set.

Table 5: Comparisons of our methods with other methods using technical trading strategy in
Chinese stock market

Technical indicator strategy in Chinese stock market

                            Original   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
Success rate (%)            55.40      61.62   59.39   50.00   59.74   N/A     59.62   61.84
Signal average return (%)   1.63       2.56    3.28    -0.59   2.93    N/A     2.97    3.77

Notes: From January 2015 to December 2016, the technical trading strategy in the Chinese
market signaled 2,726 times (we choose a profitable period). In chronological order, the
first 1,800 trading signals form the training set and the other 926 signals form the testing
set. The table only shows the results from the testing set.

Table 6: Comparisons of our methods with other methods using Chinese A-share undervaluation
trading strategy

Unilateral statistical analysis strategy in Chinese A-share stock market

                            Original   RF      TVLR    AVLR    MVLR    RVLR     LR      MultiLR
No. of signals              3123       1475    62      290     0       1        399     297
Success rate (%)            52.96      61.69   64.52   64.83   N/A     100.00   65.66   70.71
Signal average return (%)   0.69       1.57    3.74    3.45    N/A     18.27    3.88    4.76

Notes: From January 2010 to December 2016, the Chinese A-share undervaluation strategy
signaled 18,123 times. In chronological order, the first 15,000 trading signals form the
training set and the other 3,123 signals form the testing set. The table only shows the
results from the testing set.


Table 7: Comparisons of our methods with other methods using forecasting strategy in Chinese
stock market

Forecasting strategy in Chinese stock market

                            Original   RF      TVLR    AVLR    MVLR    RVLR    LR      MultiLR
Success rate (%)            52.50      62.83   57.90   N/A     54.04   N/A     60.51   61.38
Signal average return (%)   0.15       0.52    0.24    N/A     0.05    N/A     0.19    0.21

Notes: From January 2011 to December 2016 the forecasting strategy in the Chinese market
signaled 21,463 times. In chronological order, the first 19,000 trading signals form the
training set and the other 2,463 signals form the testing set. The table only shows the
results from the testing set.

6. Conclusion

This paper addresses the important issue of how to integrate domain-specific knowledge in
finance with methods of big data analytics and artificial intelligence. We propose a novel
method for investment strategy building, called the two-stage multi-view learning method,
which provides a new way to solve the integration issue and avoids the problems of
conventional methods that try to aggregate all the information in a single very complex
model, such as demanding requirements and high modelling cost. Moreover, the method provides
some explanatory power for understanding strategies, which artificial intelligence methods
often lack.

We choose five different types of strategies, including the candlestick charting strategy,
pair trading strategy, technical indicator strategy, long-side statistical analysis
strategy, and forecasting strategy, and adopt data from different markets to test our
method. The empirical results show that our method dramatically improves the performance of
all tested strategies, raising the success rate by about 6 to 17 percentage points and the
average return of trading signals to about 1.4 to 6.89 times that of the original signals.
Moreover, our empirical results reveal that the multi-view learning method outperforms
single-view methods in building investment strategies. In addition, the empirical results
confirm that our method can provide some explanatory power for understanding what kinds of
information contribute to the improvement of an investment strategy.

The method proposed in this paper contributes to both academic research and industry
application. It can easily be extended to investigate other financial market issues,
because its main idea, separating the domain knowledge and the different models or methods
into different stages, solves the integration issue better than approaches that rely on a
single very complex model. Furthermore, our method is suitable for any type of investment
strategy, such as fundamental analysis strategies, risk premia investing strategies,
technical analysis strategies, strategies merging fundamental and quantitative investment
styles, model-based quantitative investment strategies, etc. Given the significant
improvement in success rate and average signal return that it brings to investment
strategies, our two-stage multi-view learning method is of great value for industry
practice.

Acknowledgments

This work is supported by grants from the National Natural Science Foundation

of China [grant numbers 71671012, 71373001, 71701007, 71531001], National High

Technology Research and Development Program of China (SS2014AA012303), and

Fundamental Research Funds for the Central Universities (Junjie Wu).

References

Asness C S, Moskowitz T J, Pedersen L H. Value and momentum everywhere[J]. The

Journal of Finance, 2013, 68(3): 929-985.

Bickel S, Scheffer T. Multi-view clustering[C]//ICDM. 2004, 4: 19-26.

Blum A, Mitchell T. Combining labeled and unlabeled data with co-training[C]//

Proceedings of the eleventh annual conference on Computational learning theory.

ACM, 1998: 92-100.

Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market[J]. Journal of

computational science, 2011, 2(1): 1-8.

Brefeld U, Büscher C, Scheffer T. Multi-view discriminative sequential learning[C]//

ECML. 2005, 3720: 60-71.

Brefeld U, Gärtner T, Scheffer T, Wrobel S. Efficient co-regularised least squares

regression[C]//Proceedings of the 23rd international conference on Machine

learning. ACM, 2006: 137-144.

Brefeld U, Scheffer T. Co-EM support vector learning[C]//Proceedings of the

twenty-first international conference on Machine learning. ACM, 2004: 16.

Caginalp G, Laurent H. The predictive power of price patterns[J]. Applied

Mathematical Finance, 1998, 5: 181-205.

Chen M, Chen Y, Weinberger K Q. Automatic feature decomposition for single view

co-training[C]//Proceedings of the 28th International Conference on Machine

Learning (ICML-11). 2011: 953-960.

Cheng J, Wang K. Active learning for image retrieval with Co-SVM[J]. Pattern

recognition, 2007, 40(1): 330-334.

Das P, Banerjee A. Meta optimization and its application to portfolio selection[C]//

Proceedings of the 17th ACM SIGKDD international conference on Knowledge

discovery and data mining. ACM, 2011: 1163-1171.

De Bondt W F M, Thaler R. Does the stock market overreact?[J]. The Journal of

Finance, 1985: 793-805.

Dhillon P, Foster D P, Ungar L H. Multi-view learning of word embeddings via

cca[C]//Advances in Neural Information Processing Systems. 2011: 199-207.

Dunis C, Zhou B. Nonlinear modelling of high frequency financial time series[M].

John Wiley & Sons Inc, 1998.

Elliott R J, Van Der Hoek J, Malcolm W P. Pairs trading[J]. Quantitative Finance, 2005,

5(3): 271-276.

Fama E F, French K R. A five-factor asset pricing model[J]. Journal of Financial

Economics, 2015, 116(1): 1-22.

Fama E F, French K R. Common risk factors in the returns on stocks and bonds[J].

Journal of Financial Economics, 1993, 33(1): 3-56.

Fama E F, French K R. Multifactor explanations of asset pricing anomalies[J]. The

Journal of Finance, 1996, 51(1): 55-84.

Focardi S M, Fabozzi F J, Mitov I K. A new approach to statistical arbitrage:

Strategies based on dynamic factor models of prices and their performance[J].

Journal of Banking & Finance, 2016, 65: 134-155.

Gatev E, Goetzmann W N, Rouwenhorst K G. Pairs trading: Performance of a

relative-value arbitrage rule[J]. The Review of Financial Studies, 2006, 19(3):

797-827.

Hamid S A, Iqbal Z. Using neural networks for forecasting volatility of S&P 500

Index futures prices[J]. Journal of Business Research, 2004, 57(10): 1116-1125.

Han Y, Zhou G, Zhu Y. A trend factor: Any economic gains from using information

over investment horizons?[J]. Journal of Financial Economics, 2016, 122(2):

352-375.

Hogan S, Jarrow R, Teo M, Warachka M. Testing market efficiency using statistical

arbitrage with applications to momentum and value strategies[J]. Journal of

Financial Economics, 2004, 73(3): 525-565.

Jegadeesh N, Titman S. Returns to buying winners and selling losers: Implications for

stock market efficiency[J]. The Journal of Finance, 1993, 48(1): 65-91.

Jegadeesh N. Evidence of predictable behavior of security returns[J]. The Journal of

Finance, 1990, 45(3): 881-898.

Kamijo K, Tanigawa T. Stock price pattern recognition-a recurrent neural network

approach[C]//1990 IJCNN International Joint Conference on Neural Networks.

IEEE, 1990: 215-221.

Kembhavi A, Siddiquie B, Miezianko R, et al. Incremental multiple kernel learning

for object recognition[C]// IEEE 12th International Conference on Computer

Vision. IEEE, 2009: 638-645.

Kolanovic M, Krishnamachari R T. Big data and AI strategies: Machine learning and

alternative data approach to investing[R]. J.P. Morgan, 2017.

Kumar A, Daumé H. A co-training approach for multi-view spectral clustering[C]//

Proceedings of the 28th International Conference on Machine Learning

(ICML-11). 2011: 393-400.

Kumar A, Rai P, Daumé H. Co-regularized multi-view spectral

clustering[C]//Advances in neural information processing systems. 2011:

1413-1421.

Lehmann B N. Fads, martingales, and market efficiency[J]. The Quarterly Journal of

Economics, 1990, 105(1): 1-28.

Li J, Bu H, Wu J. Sentiment-aware stock market prediction: A deep learning

method[C]//2017 International Conference on Service Systems and Service

Management (ICSSSM). IEEE, 2017: 1-6.

Lin C S, Chiu S H, Lin T Y. Empirical mode decomposition–based least squares

support vector regression for foreign exchange rate forecasting[J]. Economic

Modelling, 2012, 29(6): 2583-2590.

Lin Y Y, Liu T L, Fuh C S. Local ensemble kernel learning for object category

recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition

(CVPR'07). IEEE, 2007: 1-8.

Lintner J. The valuation of risk assets and the selection of risky investments in stock

portfolios and capital budgets[J]. The Review of Economics and Statistics, 1965:

13-37.

Liu Y, Zheng Y, Liang Y, Liu S M, Rosenblum D S. Urban water quality prediction

based on multi-task multi-view learning[C]//Proceedings of the Twenty-Fifth

International Joint Conference on Artificial Intelligence. AAAI Press, 2016:

2576-2582.

Lo A W, MacKinlay A C. Stock market prices do not follow random walks: Evidence

from a simple specification test[J]. The Review of Financial Studies, 1988, 1(1):

41-66.

Lo A W, MacKinlay A C. When are contrarian profits due to stock market

overreaction?[J]. The Review of Financial Studies, 1990, 3(2): 175-205.

Lo A W, Mamaysky H, Wang J. Foundations of technical analysis: Computational

algorithms, statistical inference, and empirical implementation[J]. The Journal of

Finance, 2000, 55(4): 1705-1765.

Longworth C, Gales M J F. Multiple kernel learning for speaker verification[C]//IEEE

International Conference on Acoustics Speech and Signal Processing. IEEE, 2008:

1581-1584.

Lu T H, Chen Y C, Hsu Y C. Trend definition or holding strategy: What determines

the profitability of candlestick charting?[J]. Journal of Banking & Finance, 2015,

61: 172-183.

Lu T H, Shiu Y M, Liu T C. Profitable candlestick trading strategies—The evidence

from a new perspective[J]. Review of Financial Economics, 2012, 21(2): 63-68.

Lu T H. The profitability of candlestick charting in the Taiwan stock market[J].

Pacific-Basin Finance Journal, 2014, 26: 65-78.

Luo L, Chen X. Integrating piecewise linear representation and weighted support

vector machine for stock trading signal prediction[J]. Applied Soft Computing,

2013, 13(2): 806-816.

Marshall B R, Young M R, Cahan R. Are candlestick technical trading strategies

profitable in the Japanese equity market?[J]. Review of Quantitative Finance and

Accounting, 2008, 31(2): 191-207.

Marshall B R, Young M R, Rose L C. Candlestick technical trading strategies: can

they create value for investors?[J]. Journal of Banking & Finance, 2006, 30(8):

2303-2323.

Moskowitz T J, Ooi Y H, Pedersen L H. Time series momentum[J]. Journal of

Financial Economics, 2012, 104(2): 228-250.

Nevmyvaka Y, Feng Y, Kearns M. Reinforcement learning for optimized trade

execution[C]//Proceedings of the 23rd international conference on Machine

learning. ACM, 2006: 673-680.

Nigam K, Ghani R. Analyzing the effectiveness and applicability of

co-training[C]//Proceedings of the ninth international conference on Information

and knowledge management. ACM, 2000: 86-93.

Nyberg H. Predicting bear and bull stock markets with dynamic binary time series

models[J]. Journal of Banking & Finance, 2013, 37(9): 3351-3363.

Pástor Ľ, Stambaugh R F. Liquidity risk and expected stock returns[J]. Journal of

Political Economy, 2003, 111(3): 642-685.

Refenes A P. Neural networks in the capital markets[M]. John Wiley & Sons, Inc.,

1994.

Ross S A. The arbitrage theory of capital asset pricing[J]. Journal of Economic

Theory, 1976, 13(3): 341-360.

Sharpe W F. Capital asset prices: A theory of market equilibrium under conditions of

risk[J]. The Journal of Finance, 1964, 19(3): 425-442.

Sindhwani V, Niyogi P, Belkin M. A co-regularization approach to semi-supervised

learning with multiple views[C]//Proceedings of ICML workshop on learning with

multiple views. 2005: 74-79.

Sindhwani V, Rosenberg D S. An RKHS for multi-view learning and manifold

co-regularization[C]//Proceedings of the 25th international conference on

Machine learning. ACM, 2008: 976-983.

Tao D, Tang X, Li X, Wu X. Asymmetric bagging and random subspace for support

vector machines-based relevance feedback in image retrieval[J]. IEEE

transactions on pattern analysis and machine intelligence, 2006, 28(7): 1088-1099.

Varma M, Ray D. Learning the discriminative power-invariance trade-off[C]//

International conference on computer vision, 2007: 1-8.

Wang J J, Wang J Z, Zhang Z G, Guo S P. Stock index forecasting based on a hybrid

model[J]. Omega, 2012, 40(6): 758-766.

Wang Z, Chen S, Gao D. A novel multi-view learning developed from single-view

patterns[J]. Pattern Recognition, 2011, 44(10): 2395-2413.

Xiong R, Nichols E P, Shen Y. Deep learning stock volatility with google domestic

trends[J]. arXiv preprint arXiv:1512.04916, 2015.

Xu C, Tao D, Xu C. A survey on multi-view learning[J]. arXiv preprint

arXiv:1304.5634, 2013.

Yu S, Krishnapuram B, Rosales R, Rao R. Bayesian co-training[J]. Journal of

Machine Learning Research, 2011, 12(Sep): 2649-2680.

Zhang L, Zhang L, Tao D, Huang X. On combining multiple features for

hyperspectral remote sensing image classification[J]. IEEE Transactions on

Geoscience and Remote Sensing, 2012, 50(3): 879-893.

Zhang W, Zhang K, Gu P, Xue X. Multi-View Embedding Learning for Incompletely

Labeled Data[C]//IJCAI. 2013: 1910-1916.

Zheng Y, Yi X, Li M, Chang E. Forecasting fine-grained air quality based on big

data[C]//Proceedings of the 21th ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining. ACM, 2015: 2267-2276.

Zhou Z, Chen K, Jiang Y. Exploiting unlabeled data in content-based image

retrieval[J]. Lecture Notes in Computer Science, 2004: 525-536.

Zhou Z H, Li M. Semi-Supervised Regression with Co-Training[C]//IJCAI. 2005, 5:

908-913.
