Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article...

15
Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading Strategies Thiago Christiano Silva , 1,2 Benjamin Miranda Tabak , 3 and Idamar Magalhães Ferreira 3 1 Universidade Cat´ olica de Bras´ ılia, Distrito Federal, Brazil 2 Department of Computing and Mathematics, Faculty of Philosophy, Sciences, and Literatures in Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil 3 FGV/EPPG Escola de Pol´ ıticas P´ ublicas e Governo, Fundação Get´ ulio Vargas, School of Public Policy and Government, Getulio Vargas Foundation, Distrito Federal, Brazil Correspondence should be addressed to Benjamin Miranda Tabak; [email protected] Received 27 August 2019; Revised 9 November 2019; Accepted 21 November 2019; Published 26 December 2019 Academic Editor: Jos´ e Manuel Gal´ an Copyright © 2019 iago Christiano Silva et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. We model investor behavior by training machine learning techniques with financial data comprising more than 13,000 investors of a large bank in Brazil over 2016 to 2018. We take high-frequency data on every sell or buy operation of these investors on a daily basis, allowing us to fully track these investment decisions over time. We then analyze whether these investment changes correlate with the IBOVESPA index. We find that investors decide their investment strategies using recent past price changes. ere is some degree of heterogeneity in investment decisions. Overall, we find evidence of mean-reverting investment strategies. We also find evidence that female investors and higher academic degree have a less pronounced mean-reverting strategy behavior com- paratively to male investors and those with lower academic degree. Finally, this paper provides a general methodological approach to mitigate potential biases arising from ad-hoc design decisions of discarding or introducing variables in empirical econometrics. For that, we use feature selection techniques from machine learning to identify relevant variables in an objective and concise way. 1. Introduction is paper studies the determinants of investors’ behavior in the stock market using transaction-level data on buy and sell operations of investors. Our data contains detailed in- formation of the investor’s identity and her socioeconomic characteristics, the investment value, and variation due to the buy or sell operation over time. e data is confidential and comes from a large and representative Brazilian bank. With this rich dataset, we are able to study how investors respond to changes in the Brazilian stock market due to variations of its market index, called IBOVESPA. We use historical variations of the IBOVESPA index with different horizons (window length) to test which one better predicts the investors’ behavior. To mitigate potential concerns due to subjective de- cisions by the analyst—and also to prevent discarding a potentially relevant predictor—we opt to use an objective approach to identify those horizons that best explain in- vestors’ buy or sell operations. For that, we use a robust feature selection technique borrowed from the machine learning literature called elastic net. e great advantage of the elastic net comes by the simplicity of its loss function (just like a regression) and also the robustness in preventing overfitting by optimally using a convex combination of the Lasso and Ridge regularization methods. Overfitting can occur as the algorithm may learn the dynamics of the variable of interest and fit very well the training dataset but with poor predictability in other datasets. Evaluating the potential for overfitting is essential for researchers as it may undermine the model. We understand that our method seeks to avoid, to some extent, the perils of overfitting. e Ridge and the Lasso algorithms impose penalties for large weights in the model [1]. In this way, they tend to reduce the Hindawi Complexity Volume 2019, Article ID 4325125, 14 pages https://doi.org/10.1155/2019/4325125

Transcript of Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article...

Page 1: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

Research ArticleModeling Investor Behavior Using Machine LearningMean-Reversion and Momentum Trading Strategies

Thiago Christiano Silva 12 Benjamin Miranda Tabak 3

and Idamar Magalhatildees Ferreira3

1Universidade Catolica de Brasılia Distrito Federal Brazil2Department of Computing and Mathematics Faculty of Philosophy Sciences and Literatures in Ribeiratildeo PretoUniversidade de Satildeo Paulo Satildeo Paulo Brazil3FGVEPPG Escola de Polıticas Publicas e Governo Fundaccedilatildeo Getulio Vargas School of Public Policy and GovernmentGetulio Vargas Foundation Distrito Federal Brazil

Correspondence should be addressed to Benjamin Miranda Tabak benjaminmtabakgmailcom

Received 27 August 2019 Revised 9 November 2019 Accepted 21 November 2019 Published 26 December 2019

Academic Editor Jose Manuel Galan

Copyright copy 2019shyiago Christiano Silva et alshyis is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

We model investor behavior by training machine learning techniques with nancial data comprising more than 13000 investorsof a large bank in Brazil over 2016 to 2018 We take high-frequency data on every sell or buy operation of these investors on a dailybasis allowing us to fully track these investment decisions over timeWe then analyze whether these investment changes correlatewith the IBOVESPA indexWe nd that investors decide their investment strategies using recent past price changesshyere is somedegree of heterogeneity in investment decisions Overall we nd evidence of mean-reverting investment strategies We also ndevidence that female investors and higher academic degree have a less pronounced mean-reverting strategy behavior com-paratively to male investors and those with lower academic degree Finally this paper provides a general methodological approachto mitigate potential biases arising from ad-hoc design decisions of discarding or introducing variables in empirical econometricsFor that we use feature selection techniques from machine learning to identify relevant variables in an objective and concise way

1 Introduction

shyis paper studies the determinants of investorsrsquo behavior inthe stock market using transaction-level data on buy and selloperations of investors Our data contains detailed in-formation of the investorrsquos identity and her socioeconomiccharacteristics the investment value and variation due tothe buy or sell operation over time shye data is condentialand comes from a large and representative Brazilian bankWith this rich dataset we are able to study how investorsrespond to changes in the Brazilian stock market due tovariations of its market index called IBOVESPA We usehistorical variations of the IBOVESPA index with dierenthorizons (window length) to test which one better predictsthe investorsrsquo behavior

To mitigate potential concerns due to subjective de-cisions by the analystmdashand also to prevent discarding a

potentially relevant predictormdashwe opt to use an objectiveapproach to identify those horizons that best explain in-vestorsrsquo buy or sell operations For that we use a robustfeature selection technique borrowed from the machinelearning literature called elastic net shye great advantage ofthe elastic net comes by the simplicity of its loss function(just like a regression) and also the robustness in preventingovertting by optimally using a convex combination of theLasso and Ridge regularization methods Overtting canoccur as the algorithm may learn the dynamics of thevariable of interest and t very well the training dataset butwith poor predictability in other datasets Evaluating thepotential for overtting is essential for researchers as it mayundermine the model We understand that our methodseeks to avoid to some extent the perils of overtting shyeRidge and the Lasso algorithms impose penalties for largeweights in the model [1] In this way they tend to reduce the

HindawiComplexityVolume 2019 Article ID 4325125 14 pageshttpsdoiorg10115520194325125

modelrsquos complexity and hence are able to minimize concernsabout overfitting

Investors tend to trade using different strategies such asbuy-and-hold (passive strategy) or an active strategy inwhich they seek to outperform a benchmark for example amarket index If investors trade using active strategies theymay use two different and well-known approaches a mean-reverting or a momentum strategy See [2ndash4] for seminalcontributions in these two strategies

In the first case they react to market swings by bettingthat the market will mean-revert +erefore they assume thetrend will change and therefore will sell after substantialupward changes and buy after downward changes In themomentum strategy they will bet that the trend will persist+us they will increase investments in the stock market afteran increase in the market index

While we understand that they may be other tradingstrategies we focus on the mean-reversion and momentumstrategies because they are well established in the literatureand serve as building blocks of many other strategies +ereis a large body of the literature that discusses their use indifferent contexts [5ndash11] In addition they are easily testablein empirical specifications +erefore we seek to understandif investors decide to hold their stocks or sell them afternegativepositive shocks

+e issue of how investors will behave on average isempirical +ere is a large body of the literature that discussespredictability for the stockmarket [12ndash16] In addition there isanother strand focusing on cognitive biases and excess tradingon equity and other financial assets [17ndash23] Our data con-taining transaction-level operations of buy and sell operationpermit us to follow each investorsrsquo decisions over time andtherefore test whether they use mean-reverting or momentumstrategy after changes in the stock market index

It is essential to notice that if traders use such strategiesthey may induce higher volatility in the market with theiractions In theory market changes should occur as newinformation arrives which is economically relevant to es-timate future profits and dividend distribution Howeverprice substantially changes over time and volatility is higherthan we would expect in a rational market +erefore weassume that the tradersrsquo decision to trade excessively willinduce higher volatility in the market Investorsrsquo decisionsthat follow different trading strategies may generate complexpatterns in prices and volatility+eymay induce long-rangecorrelation short-term predictability and chaotic dynamicsin prices over time+ere is a large body of the literature thatattempts to explain complex macrobehavior of systemsusing a composition of local rules For that agent-basedmodelling has been extensively used to explain price andvolatility using artificial markets [24] Using agent-basedmodelling LeBaron [25] explores structural (macro) fea-tures that emerge in a market where participants adapt andevolve over time while Bertella et al [26] study the effect ofinvestorrsquos behavioral bias in prices Understanding howinvestors behave and perform trading strategies is the firststep for better understanding the complexity that is intrinsicto financial markets Our paper also contributes to thismatter

To identify the most relevant predictors that explaininvestorsrsquo behavior we depart from using traditional panel-data econometric techniques and goodness-of-fit measuresand instead employ more robust methodologies borrowedfrom the machine learning literature Contrasting to usualeconometrics techniques that summarize relationships usinglinear regression analysis machine learning offers a set oftools that can potentially capture nonlinear relationshipsbetween the data According to Varian [27] bridging the gapbetween machine learning and econometrics is a naturaltendency mainly because of the presence of large amount ofdata and the rising complexitymdashpotentially highly non-linearmdashbetween data relationships Our work contributes tothis endeavour by providing a real case study of a financialdataset using machine learning techniques

Comparatively to econometrics machine learningtechniques have strong model selection techniques mainlythrough the use of cross-validation techniques which are atype of repeated resampling in random subsets of the datasetInitially the cross-validation procedure divides the data setinto two disjoint and complete subsets the training set andthe test set All the modelrsquos parameters are tuned using onlythe training set After the model is selected (tuned) using thetraining data we run it against the test set to check itsaccuracy or some other performance metric +e rationale isthat by training the model with some data and testingagainst another subset we are estimating the modelrsquos out-of-sample prediction power and not simply learning the data+e test set therefore would be a simulation of real (pro-duction) data and the modelrsquos performance on this datasetwould represent a rough estimate of actual performance ofthe model in real unseen data

Since our data set comprises more than 350000 ob-servations representing individual investorrsquos movementswith respect to their investments over 2016 to 2018 we applyregularization techniques to prevent modelrsquos overfittingduring the feature selection procedure with training dataFor that we apply an elastic net procedure [1] to control forthe modelrsquos complexity Elastic net is a generalization of theRidge (-norm) and Lasso (L1-norm) and hence is morerobust It uses an optimal convex combination of both typesof regularization Lasso tends to shrink the majority of thenonrelevant regressors to zero while keeping only the mostimportant regressors as nonzero In contrast Ridge tends tooutput nonzero coefficients for almost all regressors Byusing both regularization schemes we are able to enjoy thepositive characteristics of both schemes

Regularization is an important issue in large data setsbecause it prevents methods with high variance and low biasfrom overfitting [28 29] +is is the well-known bias-var-iance trade off in the machine learning literature [30] Whilelow bias prevents overfitting it can generate underfitting ofthe data set In contrast high-variance methods can learnnoise from the data and let go the true relationships of thedata set Low bias favors low model complexity at the cost ofa potential overfitting High variance tends to successfullycapture smoothly nonlinear relationships between the dataat the expense of a potential overfitting Examples of low-bias algorithms are the linear regression or neural networks

2 Complexity

with a single layer Examples of high-variance algorithms aredecision trees and multilayer neural networks It is im-portant to first set the rationale behind the regularizationprocess from the viewpoint of our financial data set of buyand sell operations On the one hand a strand in the eco-nomics literature advocates that the agentsrsquo decisions arecompletely rational in that decisions are taken by consid-ering all information from the market (complete in-formation) [31] On the other hand another body of theliterature argues that investors cannot potentially considerevery single information from the market when taking theirdecisions because (i) the agent does not have completeinformation and (ii) even if the agent did have completeinformation she would be unable to perform all requiredcalculations In this way they would naturally focus on themost relevant variables In this case we say that investorshave a bounded rationality term first coined by Simon [32]We can frame these two theories into the two types ofregularization frameworks used in this paper Investors withunbounded rationality ie that consider all potential var-iables would better be modeled by a Ridge regularizationprocedure because it does not tend to place zero importanceon any variables In contrast investors with bounded ra-tionality would be better modeled by a Lasso regularizationbecause it would choose a few (and more relevant) variablesand set the remainder as zero By using a weighted convexcombination of both Ridge and Lasso regularization pro-cedures we are effectively considering both cases in ourestimation process

While Brazil does not have well-developed stock marketsas advanced economies it is an important emerging countrythat due to its size and relative importance to its peersdeserves to be studied In addition capital markets have beenincreasing in the last years (according to the BMampFBovespawhich is the Brazilian stock exchange the number of in-vestors increased almost 20 from 2017 to 2018) whichstrengthen the relevance of our work Our main resultssuggest that investors use a mean-reverting trading strategy+erefore they reduce their investments after positivechanges in the IBOVESPA and increase it after negativechanges

We also test whether investorsrsquo biological and socio-economic characteristics explain their trading behavior Interms of schooling educated investors in theory shouldbehave in more rational ways and trade less frequently whenthere is no new information arriving continuously in themarket at least those that are not relevant regarding po-tential for future profits +erefore we would expect theseinvestors to have a smaller reaction to price fluctuations Wealso test dissimilarities in investment decision makingarising from the gender Neyse et al [33] and Lundeberget al [34] partly attribute investment differences amongmales and females due to systematic changes in over-confidence Excessive overconfidence is associated withhigher levels of testosterone which is more pronounced inmales Overconfidence may induce investors to take onhigher risks leading them to look for higher returns in theshort term In this way we would expect a less sensitivebehavior of females to changes of past IBOVESPA variations

as they would value more fundamentals and look for yieldsin the longer term Our empirical analyses corroborate theseviews

Several papers have studied investor behavior Onish-chenko and Ulku [35] show a change in foreign investorswhich have become more sophisticated +ey find thatforeign investors in Korea do not chase returns as theprevious literature normally reports +eir results suggest atransition from positive to negative feedback trading overtime Abreu [36] finds that investors that buy warrants havespecific characteristics such as young age and less educatedor investors with gambling attitudes (and overconfidence)(see also [37ndash41]) To the best of our knowledge our paper isone of the first that uses machine-learning techniques tounveil what are the characteristics that matter the most forexplaining investor behavior at the disaggregated level Westudy the reaction of investors to market changes and testwhether they employ momentum or mean-revertingstrategies

2 Data

We collect and match several unique proprietary and publicdatasets Our sample consists of public information from theIBOVESPA index investor-specific information and aproprietary customer database from a large Brazilian bankwith investor-specific matched daily transactions on buy andsell operations in the IBOVESPA stock exchange market+e last two datasets are confidential

+e first source is the IBOVESPA index of the Brazilianstock exchange (BMampFBovespa) +is index is consideredthe stock market benchmark for Brazil We have 747 days inour sample spanning over the years of 2016 to 2018

+e second source is the investor registration informationsuch as their profession degree of education and equityInformation is from the database of the home broker andcustomer relationship management (CRM) solution Our dataset is comprehensive and encompasses 13634 investors

+e last source provides each transaction made by eachinvestor on BMampFBOVESPA and on each of the daysbetween January 2 2016 and December 31 2018 as well astheir daily holdings We observe their daily trading activitieson investment decisions+is rich data set enables us to keeptrack of investorsrsquo buy and sell operations over time andtherefore permits us to test whether they use the mean-reverting trading strategy or the momentum trading strategyas a response to IBOVESPA index changes +ese are twocommon trading strategies that have been discussed in theliterature [42 43] Other strategies exist which may be morecomplex in nature and difficult to model and they are notthe object of our analysis One such example would berational traders that employ fundamental analysis andforecast future profits of traded companies to estimate theirpotential to distribute dividends and can value these stocks+e sample has 1099985 trading decisions (change in theinvestment volume) We also have 358176 customerholdings over time

Table 1 reports summary statistics of our data on in-vestorrsquos daily decisions on their investments We can see that

Complexity 3

there is a large range of daily investment variations goingfrom minus 100 to almost 500 On average we see a positiveinvestment variation (9267) We also show the IBO-VESPA index level and its variations in the last 1 2 3 5 and30 days We will use these IBOVESPA index changes tocheck how they correlate with the investment variationsvariable One underlying hypothesis is that investors look atthe IBOVESPA index to decide on their trading decisions

Figures 1(a) and 1(b) portray weekday heatmapsshowing the average daily investment changes in 2016broken down by investorrsquos gender and education level (see[44]) First we observe the richness of our data set in whichthere is a large heterogeneity of investorrsquos decisions on theirinvestment on a daily basis Second though there is asimilarity on how investors decide on their investments formales and females and for those with higher and lowereducation we observe some discrepancies in some occa-sions suggesting that these are two important features thatwe should study in our empirical analysis Besides thissubjective analysis our feature selection procedure willcorroborate such vision using an objective and quantitativeapproach For instance we observe that on average in-vestors mostly buy by the beginning or end of the weekwhile they sell on Wednesdays +ere is evidence of be-havioral changes of investors over weekdays in the stockmarket For instance Pena [45] studies the effect of reformon the Spanish stock exchange market +ey find thatbefore such reform there were positive abnormal excessreturns on Mondays effect of which disappeared followingthat reform

Figure 2 shows how investments are split across Brazilianstates over time As we can see there is also some hetero-geneity across investors residing in different states whichsuggests that we may have to control for state origin ofinvestors For instance there are some large investmentvariations in the Northern region of Brazil

Figure 3 depicts the distribution of investment variationacross different Brazilian states broken down by investorrsquosgender (female or male) Each distribution conditioned tothe state and gender integrates to one Interestingly most ofthe distributions have three persist modals that occur notonly across different states but also for different genders+emodals are centered at the zero (no investment variation)and at plusmn 30 investment variation marks While in mostcases the profiles of investment changes of both male andfemale largely coincide there are some notable exceptionsFor instance in less developed regionsmdashsuch as the Northand Northeastmdashthe distributions of investment decisions of

males and female significantly differ at some dates Overallmales tend to change more their investment positions rel-atively to females However such feature is even morepronounced in less developed regions

Figure 4 displays the same distribution of investmentvariation across different Brazilian states but now brokendown by the investorrsquos academic education We considerinvestors with higher education and those with high school orbelow Again the three-modal distribution found when webroke down the distribution by investorrsquos gender also show upwhen we look at their educational levels In more developedregionsmdashsuch as the Southeast and Southmdashinvestorsrsquo decisionare roughly the same regardless of their academic educationallevels Such similarity reinforces the divergence of academicdegree and the level of financial literacy especially in tradingIn contrast we observe a large heterogeneity in the Northregion less educated investors tend to vary their investmentpositions more than investors those with higher education

3 Feature Selection Using Machine Learning

In this section we analyze the predictive power of our at-tributes in explaining investor responses to changes in theBrazilian stock market index We use different time ag-gregations of changes in the IBOVESPA index which is thefinancial index that is carefully looked by investors whendeciding their investment strategies in Brazil We use 2- 3-and 5-day IBOVESPA index variations as well as 3- and 5-day IBOVESPA index averages +is analysis sheds light onhow investors look at IBOVESPA changes when decidingtheir trading strategies in the stock market It is an empiricalopen question to test whether investors take the very short-term changes such as 2- or 3-day or a more prolongedwindow such as 5-day changes

To test the predictive power we use data-driven machinelearning methods to identify the most relevant attributes[46ndash48] Since we have data from 13247 investors fromJanuary 1 2016 to December 31 2018 on a daily basis weneed first to purge out any macroeconomic factor that couldbe affecting all investors in the same manner over this timeframe +is becomes even more important due to the factthat Brazil was facing a recession from 2014Q4 to 2016Q4and therefore our sample contains part of that period Weperform this preprocessing to homogenize the data distri-bution since machine learning methods best perform oncross-sectional data [30 49]

To remove time factors homogeneously faced by in-vestors in a period we use a static panel-data specification

Table 1 Summary statistics of our panel data on investorsrsquo daily trading decisions over the period of 2016 to 2018

Statistic N Mean St Dev Min Pctl (25) Median Pctl (75) MaxInvestment variation () 356172 9267 68389 minus 100000 minus 6680 0510 8770 4998791-day IBOVESPA variation 356172 0145 1528 minus 4870 minus 0740 0110 1010 66002-day IBOVESPA variation 355796 0323 2160 minus 6550 minus 1140 0290 1600 91303-day IBOVESPA variation 355419 0472 2598 minus 7950 minus 1140 0540 2020 108805-day IBOVESPA variation 354588 0781 3274 minus 8250 minus 1230 0770 2750 1687030-day IBOVESPA variation 343592 4863 8282 minus 19060 minus 0740 5240 10500 28770IBOVESPA index 356176 0145 1528 minus 4870 minus 0740 0110 1010 6600

4 Complexity

Female Male

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8Investment variation ()

(a)

High school Higher education

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8 12Investment variation ()

(b)

Figure 1 Heatmap showing average daily investment changes () in 2016 broken down by investorrsquos gender (a) and education degree(b) We take average values of investment decisions that took place in every weekday within the month Due to the existence of somelarge absolute values and to improve readability we winsorize the investment variation distribution by 5 at each side before taking theaverage

Feb

16

May

16

Aug

16

Nov

16

Feb

17

May

17

Aug

17

Nov

17

Feb

18

May

18

Aug

18

Nov

18

Time

RegionMidwestNorthNortheast

SouthSoutheast

0

100

200

300

400

Inve

stmen

t var

iatio

n (

)

Figure 2 In this figure we aggregate investment made by all investors by states to which they belong As we can we have investors from allparts of Brazil in our database

Complexity 5

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Northeast

minus25 0 25 50minus50Investment variation ()

North

minus25 0 25 50minus50Investment variation ()

TypeFemaleMale

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Southeast

minus25 0 25 50minus50Investment variation ()

Figure 3 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos gender (female or male) Due to the existence of some large absolute values and to improve readability we winsorize theinvestment variation distribution by 5 at each side

6 Complexity

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

0 25 50minus25Investment variation ()

North

0 25 50minus25Investment variation ()

Northeast

0 25 50minus25Investment variation ()

Southeast

0 25 50minus25Investment variation ()

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

TypeHigh schoolHigher education

0 25 50minus25Investment variation ()

Figure 4 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos education (higher education or high school) Due to the existence of some large absolute values and to improvereadability we winsorize the investment variation distribution by 5 at each side

Complexity 7

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 2: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

modelrsquos complexity and hence are able to minimize concernsabout overfitting

Investors tend to trade using different strategies such asbuy-and-hold (passive strategy) or an active strategy inwhich they seek to outperform a benchmark for example amarket index If investors trade using active strategies theymay use two different and well-known approaches a mean-reverting or a momentum strategy See [2ndash4] for seminalcontributions in these two strategies

In the first case they react to market swings by bettingthat the market will mean-revert +erefore they assume thetrend will change and therefore will sell after substantialupward changes and buy after downward changes In themomentum strategy they will bet that the trend will persist+us they will increase investments in the stock market afteran increase in the market index

While we understand that they may be other tradingstrategies we focus on the mean-reversion and momentumstrategies because they are well established in the literatureand serve as building blocks of many other strategies +ereis a large body of the literature that discusses their use indifferent contexts [5ndash11] In addition they are easily testablein empirical specifications +erefore we seek to understandif investors decide to hold their stocks or sell them afternegativepositive shocks

+e issue of how investors will behave on average isempirical +ere is a large body of the literature that discussespredictability for the stockmarket [12ndash16] In addition there isanother strand focusing on cognitive biases and excess tradingon equity and other financial assets [17ndash23] Our data con-taining transaction-level operations of buy and sell operationpermit us to follow each investorsrsquo decisions over time andtherefore test whether they use mean-reverting or momentumstrategy after changes in the stock market index

It is essential to notice that if traders use such strategiesthey may induce higher volatility in the market with theiractions In theory market changes should occur as newinformation arrives which is economically relevant to es-timate future profits and dividend distribution Howeverprice substantially changes over time and volatility is higherthan we would expect in a rational market +erefore weassume that the tradersrsquo decision to trade excessively willinduce higher volatility in the market Investorsrsquo decisionsthat follow different trading strategies may generate complexpatterns in prices and volatility+eymay induce long-rangecorrelation short-term predictability and chaotic dynamicsin prices over time+ere is a large body of the literature thatattempts to explain complex macrobehavior of systemsusing a composition of local rules For that agent-basedmodelling has been extensively used to explain price andvolatility using artificial markets [24] Using agent-basedmodelling LeBaron [25] explores structural (macro) fea-tures that emerge in a market where participants adapt andevolve over time while Bertella et al [26] study the effect ofinvestorrsquos behavioral bias in prices Understanding howinvestors behave and perform trading strategies is the firststep for better understanding the complexity that is intrinsicto financial markets Our paper also contributes to thismatter

To identify the most relevant predictors that explaininvestorsrsquo behavior we depart from using traditional panel-data econometric techniques and goodness-of-fit measuresand instead employ more robust methodologies borrowedfrom the machine learning literature Contrasting to usualeconometrics techniques that summarize relationships usinglinear regression analysis machine learning offers a set oftools that can potentially capture nonlinear relationshipsbetween the data According to Varian [27] bridging the gapbetween machine learning and econometrics is a naturaltendency mainly because of the presence of large amount ofdata and the rising complexitymdashpotentially highly non-linearmdashbetween data relationships Our work contributes tothis endeavour by providing a real case study of a financialdataset using machine learning techniques

Comparatively to econometrics machine learningtechniques have strong model selection techniques mainlythrough the use of cross-validation techniques which are atype of repeated resampling in random subsets of the datasetInitially the cross-validation procedure divides the data setinto two disjoint and complete subsets the training set andthe test set All the modelrsquos parameters are tuned using onlythe training set After the model is selected (tuned) using thetraining data we run it against the test set to check itsaccuracy or some other performance metric +e rationale isthat by training the model with some data and testingagainst another subset we are estimating the modelrsquos out-of-sample prediction power and not simply learning the data+e test set therefore would be a simulation of real (pro-duction) data and the modelrsquos performance on this datasetwould represent a rough estimate of actual performance ofthe model in real unseen data

Since our data set comprises more than 350000 ob-servations representing individual investorrsquos movementswith respect to their investments over 2016 to 2018 we applyregularization techniques to prevent modelrsquos overfittingduring the feature selection procedure with training dataFor that we apply an elastic net procedure [1] to control forthe modelrsquos complexity Elastic net is a generalization of theRidge (-norm) and Lasso (L1-norm) and hence is morerobust It uses an optimal convex combination of both typesof regularization Lasso tends to shrink the majority of thenonrelevant regressors to zero while keeping only the mostimportant regressors as nonzero In contrast Ridge tends tooutput nonzero coefficients for almost all regressors Byusing both regularization schemes we are able to enjoy thepositive characteristics of both schemes

Regularization is an important issue in large data setsbecause it prevents methods with high variance and low biasfrom overfitting [28 29] +is is the well-known bias-var-iance trade off in the machine learning literature [30] Whilelow bias prevents overfitting it can generate underfitting ofthe data set In contrast high-variance methods can learnnoise from the data and let go the true relationships of thedata set Low bias favors low model complexity at the cost ofa potential overfitting High variance tends to successfullycapture smoothly nonlinear relationships between the dataat the expense of a potential overfitting Examples of low-bias algorithms are the linear regression or neural networks

2 Complexity

with a single layer Examples of high-variance algorithms aredecision trees and multilayer neural networks It is im-portant to first set the rationale behind the regularizationprocess from the viewpoint of our financial data set of buyand sell operations On the one hand a strand in the eco-nomics literature advocates that the agentsrsquo decisions arecompletely rational in that decisions are taken by consid-ering all information from the market (complete in-formation) [31] On the other hand another body of theliterature argues that investors cannot potentially considerevery single information from the market when taking theirdecisions because (i) the agent does not have completeinformation and (ii) even if the agent did have completeinformation she would be unable to perform all requiredcalculations In this way they would naturally focus on themost relevant variables In this case we say that investorshave a bounded rationality term first coined by Simon [32]We can frame these two theories into the two types ofregularization frameworks used in this paper Investors withunbounded rationality ie that consider all potential var-iables would better be modeled by a Ridge regularizationprocedure because it does not tend to place zero importanceon any variables In contrast investors with bounded ra-tionality would be better modeled by a Lasso regularizationbecause it would choose a few (and more relevant) variablesand set the remainder as zero By using a weighted convexcombination of both Ridge and Lasso regularization pro-cedures we are effectively considering both cases in ourestimation process

While Brazil does not have well-developed stock marketsas advanced economies it is an important emerging countrythat due to its size and relative importance to its peersdeserves to be studied In addition capital markets have beenincreasing in the last years (according to the BMampFBovespawhich is the Brazilian stock exchange the number of in-vestors increased almost 20 from 2017 to 2018) whichstrengthen the relevance of our work Our main resultssuggest that investors use a mean-reverting trading strategy+erefore they reduce their investments after positivechanges in the IBOVESPA and increase it after negativechanges

We also test whether investorsrsquo biological and socio-economic characteristics explain their trading behavior Interms of schooling educated investors in theory shouldbehave in more rational ways and trade less frequently whenthere is no new information arriving continuously in themarket at least those that are not relevant regarding po-tential for future profits +erefore we would expect theseinvestors to have a smaller reaction to price fluctuations Wealso test dissimilarities in investment decision makingarising from the gender Neyse et al [33] and Lundeberget al [34] partly attribute investment differences amongmales and females due to systematic changes in over-confidence Excessive overconfidence is associated withhigher levels of testosterone which is more pronounced inmales Overconfidence may induce investors to take onhigher risks leading them to look for higher returns in theshort term In this way we would expect a less sensitivebehavior of females to changes of past IBOVESPA variations

as they would value more fundamentals and look for yieldsin the longer term Our empirical analyses corroborate theseviews

Several papers have studied investor behavior Onish-chenko and Ulku [35] show a change in foreign investorswhich have become more sophisticated +ey find thatforeign investors in Korea do not chase returns as theprevious literature normally reports +eir results suggest atransition from positive to negative feedback trading overtime Abreu [36] finds that investors that buy warrants havespecific characteristics such as young age and less educatedor investors with gambling attitudes (and overconfidence)(see also [37ndash41]) To the best of our knowledge our paper isone of the first that uses machine-learning techniques tounveil what are the characteristics that matter the most forexplaining investor behavior at the disaggregated level Westudy the reaction of investors to market changes and testwhether they employ momentum or mean-revertingstrategies

2 Data

We collect and match several unique proprietary and publicdatasets Our sample consists of public information from theIBOVESPA index investor-specific information and aproprietary customer database from a large Brazilian bankwith investor-specific matched daily transactions on buy andsell operations in the IBOVESPA stock exchange market+e last two datasets are confidential

+e first source is the IBOVESPA index of the Brazilianstock exchange (BMampFBovespa) +is index is consideredthe stock market benchmark for Brazil We have 747 days inour sample spanning over the years of 2016 to 2018

+e second source is the investor registration informationsuch as their profession degree of education and equityInformation is from the database of the home broker andcustomer relationship management (CRM) solution Our dataset is comprehensive and encompasses 13634 investors

+e last source provides each transaction made by eachinvestor on BMampFBOVESPA and on each of the daysbetween January 2 2016 and December 31 2018 as well astheir daily holdings We observe their daily trading activitieson investment decisions+is rich data set enables us to keeptrack of investorsrsquo buy and sell operations over time andtherefore permits us to test whether they use the mean-reverting trading strategy or the momentum trading strategyas a response to IBOVESPA index changes +ese are twocommon trading strategies that have been discussed in theliterature [42 43] Other strategies exist which may be morecomplex in nature and difficult to model and they are notthe object of our analysis One such example would berational traders that employ fundamental analysis andforecast future profits of traded companies to estimate theirpotential to distribute dividends and can value these stocks+e sample has 1099985 trading decisions (change in theinvestment volume) We also have 358176 customerholdings over time

Table 1 reports summary statistics of our data on in-vestorrsquos daily decisions on their investments We can see that

Complexity 3

there is a large range of daily investment variations goingfrom minus 100 to almost 500 On average we see a positiveinvestment variation (9267) We also show the IBO-VESPA index level and its variations in the last 1 2 3 5 and30 days We will use these IBOVESPA index changes tocheck how they correlate with the investment variationsvariable One underlying hypothesis is that investors look atthe IBOVESPA index to decide on their trading decisions

Figures 1(a) and 1(b) portray weekday heatmapsshowing the average daily investment changes in 2016broken down by investorrsquos gender and education level (see[44]) First we observe the richness of our data set in whichthere is a large heterogeneity of investorrsquos decisions on theirinvestment on a daily basis Second though there is asimilarity on how investors decide on their investments formales and females and for those with higher and lowereducation we observe some discrepancies in some occa-sions suggesting that these are two important features thatwe should study in our empirical analysis Besides thissubjective analysis our feature selection procedure willcorroborate such vision using an objective and quantitativeapproach For instance we observe that on average in-vestors mostly buy by the beginning or end of the weekwhile they sell on Wednesdays +ere is evidence of be-havioral changes of investors over weekdays in the stockmarket For instance Pena [45] studies the effect of reformon the Spanish stock exchange market +ey find thatbefore such reform there were positive abnormal excessreturns on Mondays effect of which disappeared followingthat reform

Figure 2 shows how investments are split across Brazilianstates over time As we can see there is also some hetero-geneity across investors residing in different states whichsuggests that we may have to control for state origin ofinvestors For instance there are some large investmentvariations in the Northern region of Brazil

Figure 3 depicts the distribution of investment variationacross different Brazilian states broken down by investorrsquosgender (female or male) Each distribution conditioned tothe state and gender integrates to one Interestingly most ofthe distributions have three persist modals that occur notonly across different states but also for different genders+emodals are centered at the zero (no investment variation)and at plusmn 30 investment variation marks While in mostcases the profiles of investment changes of both male andfemale largely coincide there are some notable exceptionsFor instance in less developed regionsmdashsuch as the Northand Northeastmdashthe distributions of investment decisions of

males and female significantly differ at some dates Overallmales tend to change more their investment positions rel-atively to females However such feature is even morepronounced in less developed regions

Figure 4 displays the same distribution of investmentvariation across different Brazilian states but now brokendown by the investorrsquos academic education We considerinvestors with higher education and those with high school orbelow Again the three-modal distribution found when webroke down the distribution by investorrsquos gender also show upwhen we look at their educational levels In more developedregionsmdashsuch as the Southeast and Southmdashinvestorsrsquo decisionare roughly the same regardless of their academic educationallevels Such similarity reinforces the divergence of academicdegree and the level of financial literacy especially in tradingIn contrast we observe a large heterogeneity in the Northregion less educated investors tend to vary their investmentpositions more than investors those with higher education

3 Feature Selection Using Machine Learning

In this section we analyze the predictive power of our at-tributes in explaining investor responses to changes in theBrazilian stock market index We use different time ag-gregations of changes in the IBOVESPA index which is thefinancial index that is carefully looked by investors whendeciding their investment strategies in Brazil We use 2- 3-and 5-day IBOVESPA index variations as well as 3- and 5-day IBOVESPA index averages +is analysis sheds light onhow investors look at IBOVESPA changes when decidingtheir trading strategies in the stock market It is an empiricalopen question to test whether investors take the very short-term changes such as 2- or 3-day or a more prolongedwindow such as 5-day changes

To test the predictive power we use data-driven machinelearning methods to identify the most relevant attributes[46ndash48] Since we have data from 13247 investors fromJanuary 1 2016 to December 31 2018 on a daily basis weneed first to purge out any macroeconomic factor that couldbe affecting all investors in the same manner over this timeframe +is becomes even more important due to the factthat Brazil was facing a recession from 2014Q4 to 2016Q4and therefore our sample contains part of that period Weperform this preprocessing to homogenize the data distri-bution since machine learning methods best perform oncross-sectional data [30 49]

To remove time factors homogeneously faced by in-vestors in a period we use a static panel-data specification

Table 1 Summary statistics of our panel data on investorsrsquo daily trading decisions over the period of 2016 to 2018

Statistic N Mean St Dev Min Pctl (25) Median Pctl (75) MaxInvestment variation () 356172 9267 68389 minus 100000 minus 6680 0510 8770 4998791-day IBOVESPA variation 356172 0145 1528 minus 4870 minus 0740 0110 1010 66002-day IBOVESPA variation 355796 0323 2160 minus 6550 minus 1140 0290 1600 91303-day IBOVESPA variation 355419 0472 2598 minus 7950 minus 1140 0540 2020 108805-day IBOVESPA variation 354588 0781 3274 minus 8250 minus 1230 0770 2750 1687030-day IBOVESPA variation 343592 4863 8282 minus 19060 minus 0740 5240 10500 28770IBOVESPA index 356176 0145 1528 minus 4870 minus 0740 0110 1010 6600

4 Complexity

Female Male

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8Investment variation ()

(a)

High school Higher education

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8 12Investment variation ()

(b)

Figure 1 Heatmap showing average daily investment changes () in 2016 broken down by investorrsquos gender (a) and education degree(b) We take average values of investment decisions that took place in every weekday within the month Due to the existence of somelarge absolute values and to improve readability we winsorize the investment variation distribution by 5 at each side before taking theaverage

Feb

16

May

16

Aug

16

Nov

16

Feb

17

May

17

Aug

17

Nov

17

Feb

18

May

18

Aug

18

Nov

18

Time

RegionMidwestNorthNortheast

SouthSoutheast

0

100

200

300

400

Inve

stmen

t var

iatio

n (

)

Figure 2 In this figure we aggregate investment made by all investors by states to which they belong As we can we have investors from allparts of Brazil in our database

Complexity 5

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Northeast

minus25 0 25 50minus50Investment variation ()

North

minus25 0 25 50minus50Investment variation ()

TypeFemaleMale

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Southeast

minus25 0 25 50minus50Investment variation ()

Figure 3 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos gender (female or male) Due to the existence of some large absolute values and to improve readability we winsorize theinvestment variation distribution by 5 at each side

6 Complexity

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

0 25 50minus25Investment variation ()

North

0 25 50minus25Investment variation ()

Northeast

0 25 50minus25Investment variation ()

Southeast

0 25 50minus25Investment variation ()

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

TypeHigh schoolHigher education

0 25 50minus25Investment variation ()

Figure 4 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos education (higher education or high school) Due to the existence of some large absolute values and to improvereadability we winsorize the investment variation distribution by 5 at each side

Complexity 7

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 3: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

with a single layer Examples of high-variance algorithms aredecision trees and multilayer neural networks It is im-portant to first set the rationale behind the regularizationprocess from the viewpoint of our financial data set of buyand sell operations On the one hand a strand in the eco-nomics literature advocates that the agentsrsquo decisions arecompletely rational in that decisions are taken by consid-ering all information from the market (complete in-formation) [31] On the other hand another body of theliterature argues that investors cannot potentially considerevery single information from the market when taking theirdecisions because (i) the agent does not have completeinformation and (ii) even if the agent did have completeinformation she would be unable to perform all requiredcalculations In this way they would naturally focus on themost relevant variables In this case we say that investorshave a bounded rationality term first coined by Simon [32]We can frame these two theories into the two types ofregularization frameworks used in this paper Investors withunbounded rationality ie that consider all potential var-iables would better be modeled by a Ridge regularizationprocedure because it does not tend to place zero importanceon any variables In contrast investors with bounded ra-tionality would be better modeled by a Lasso regularizationbecause it would choose a few (and more relevant) variablesand set the remainder as zero By using a weighted convexcombination of both Ridge and Lasso regularization pro-cedures we are effectively considering both cases in ourestimation process

While Brazil does not have well-developed stock marketsas advanced economies it is an important emerging countrythat due to its size and relative importance to its peersdeserves to be studied In addition capital markets have beenincreasing in the last years (according to the BMampFBovespawhich is the Brazilian stock exchange the number of in-vestors increased almost 20 from 2017 to 2018) whichstrengthen the relevance of our work Our main resultssuggest that investors use a mean-reverting trading strategy+erefore they reduce their investments after positivechanges in the IBOVESPA and increase it after negativechanges

We also test whether investorsrsquo biological and socio-economic characteristics explain their trading behavior Interms of schooling educated investors in theory shouldbehave in more rational ways and trade less frequently whenthere is no new information arriving continuously in themarket at least those that are not relevant regarding po-tential for future profits +erefore we would expect theseinvestors to have a smaller reaction to price fluctuations Wealso test dissimilarities in investment decision makingarising from the gender Neyse et al [33] and Lundeberget al [34] partly attribute investment differences amongmales and females due to systematic changes in over-confidence Excessive overconfidence is associated withhigher levels of testosterone which is more pronounced inmales Overconfidence may induce investors to take onhigher risks leading them to look for higher returns in theshort term In this way we would expect a less sensitivebehavior of females to changes of past IBOVESPA variations

as they would value more fundamentals and look for yieldsin the longer term Our empirical analyses corroborate theseviews

Several papers have studied investor behavior Onish-chenko and Ulku [35] show a change in foreign investorswhich have become more sophisticated +ey find thatforeign investors in Korea do not chase returns as theprevious literature normally reports +eir results suggest atransition from positive to negative feedback trading overtime Abreu [36] finds that investors that buy warrants havespecific characteristics such as young age and less educatedor investors with gambling attitudes (and overconfidence)(see also [37ndash41]) To the best of our knowledge our paper isone of the first that uses machine-learning techniques tounveil what are the characteristics that matter the most forexplaining investor behavior at the disaggregated level Westudy the reaction of investors to market changes and testwhether they employ momentum or mean-revertingstrategies

2 Data

We collect and match several unique proprietary and publicdatasets Our sample consists of public information from theIBOVESPA index investor-specific information and aproprietary customer database from a large Brazilian bankwith investor-specific matched daily transactions on buy andsell operations in the IBOVESPA stock exchange market+e last two datasets are confidential

+e first source is the IBOVESPA index of the Brazilianstock exchange (BMampFBovespa) +is index is consideredthe stock market benchmark for Brazil We have 747 days inour sample spanning over the years of 2016 to 2018

+e second source is the investor registration informationsuch as their profession degree of education and equityInformation is from the database of the home broker andcustomer relationship management (CRM) solution Our dataset is comprehensive and encompasses 13634 investors

+e last source provides each transaction made by eachinvestor on BMampFBOVESPA and on each of the daysbetween January 2 2016 and December 31 2018 as well astheir daily holdings We observe their daily trading activitieson investment decisions+is rich data set enables us to keeptrack of investorsrsquo buy and sell operations over time andtherefore permits us to test whether they use the mean-reverting trading strategy or the momentum trading strategyas a response to IBOVESPA index changes +ese are twocommon trading strategies that have been discussed in theliterature [42 43] Other strategies exist which may be morecomplex in nature and difficult to model and they are notthe object of our analysis One such example would berational traders that employ fundamental analysis andforecast future profits of traded companies to estimate theirpotential to distribute dividends and can value these stocks+e sample has 1099985 trading decisions (change in theinvestment volume) We also have 358176 customerholdings over time

Table 1 reports summary statistics of our data on in-vestorrsquos daily decisions on their investments We can see that

Complexity 3

there is a large range of daily investment variations goingfrom minus 100 to almost 500 On average we see a positiveinvestment variation (9267) We also show the IBO-VESPA index level and its variations in the last 1 2 3 5 and30 days We will use these IBOVESPA index changes tocheck how they correlate with the investment variationsvariable One underlying hypothesis is that investors look atthe IBOVESPA index to decide on their trading decisions

Figures 1(a) and 1(b) portray weekday heatmapsshowing the average daily investment changes in 2016broken down by investorrsquos gender and education level (see[44]) First we observe the richness of our data set in whichthere is a large heterogeneity of investorrsquos decisions on theirinvestment on a daily basis Second though there is asimilarity on how investors decide on their investments formales and females and for those with higher and lowereducation we observe some discrepancies in some occa-sions suggesting that these are two important features thatwe should study in our empirical analysis Besides thissubjective analysis our feature selection procedure willcorroborate such vision using an objective and quantitativeapproach For instance we observe that on average in-vestors mostly buy by the beginning or end of the weekwhile they sell on Wednesdays +ere is evidence of be-havioral changes of investors over weekdays in the stockmarket For instance Pena [45] studies the effect of reformon the Spanish stock exchange market +ey find thatbefore such reform there were positive abnormal excessreturns on Mondays effect of which disappeared followingthat reform

Figure 2 shows how investments are split across Brazilianstates over time As we can see there is also some hetero-geneity across investors residing in different states whichsuggests that we may have to control for state origin ofinvestors For instance there are some large investmentvariations in the Northern region of Brazil

Figure 3 depicts the distribution of investment variationacross different Brazilian states broken down by investorrsquosgender (female or male) Each distribution conditioned tothe state and gender integrates to one Interestingly most ofthe distributions have three persist modals that occur notonly across different states but also for different genders+emodals are centered at the zero (no investment variation)and at plusmn 30 investment variation marks While in mostcases the profiles of investment changes of both male andfemale largely coincide there are some notable exceptionsFor instance in less developed regionsmdashsuch as the Northand Northeastmdashthe distributions of investment decisions of

males and female significantly differ at some dates Overallmales tend to change more their investment positions rel-atively to females However such feature is even morepronounced in less developed regions

Figure 4 displays the same distribution of investmentvariation across different Brazilian states but now brokendown by the investorrsquos academic education We considerinvestors with higher education and those with high school orbelow Again the three-modal distribution found when webroke down the distribution by investorrsquos gender also show upwhen we look at their educational levels In more developedregionsmdashsuch as the Southeast and Southmdashinvestorsrsquo decisionare roughly the same regardless of their academic educationallevels Such similarity reinforces the divergence of academicdegree and the level of financial literacy especially in tradingIn contrast we observe a large heterogeneity in the Northregion less educated investors tend to vary their investmentpositions more than investors those with higher education

3 Feature Selection Using Machine Learning

In this section we analyze the predictive power of our at-tributes in explaining investor responses to changes in theBrazilian stock market index We use different time ag-gregations of changes in the IBOVESPA index which is thefinancial index that is carefully looked by investors whendeciding their investment strategies in Brazil We use 2- 3-and 5-day IBOVESPA index variations as well as 3- and 5-day IBOVESPA index averages +is analysis sheds light onhow investors look at IBOVESPA changes when decidingtheir trading strategies in the stock market It is an empiricalopen question to test whether investors take the very short-term changes such as 2- or 3-day or a more prolongedwindow such as 5-day changes

To test the predictive power we use data-driven machinelearning methods to identify the most relevant attributes[46ndash48] Since we have data from 13247 investors fromJanuary 1 2016 to December 31 2018 on a daily basis weneed first to purge out any macroeconomic factor that couldbe affecting all investors in the same manner over this timeframe +is becomes even more important due to the factthat Brazil was facing a recession from 2014Q4 to 2016Q4and therefore our sample contains part of that period Weperform this preprocessing to homogenize the data distri-bution since machine learning methods best perform oncross-sectional data [30 49]

To remove time factors homogeneously faced by in-vestors in a period we use a static panel-data specification

Table 1 Summary statistics of our panel data on investorsrsquo daily trading decisions over the period of 2016 to 2018

Statistic N Mean St Dev Min Pctl (25) Median Pctl (75) MaxInvestment variation () 356172 9267 68389 minus 100000 minus 6680 0510 8770 4998791-day IBOVESPA variation 356172 0145 1528 minus 4870 minus 0740 0110 1010 66002-day IBOVESPA variation 355796 0323 2160 minus 6550 minus 1140 0290 1600 91303-day IBOVESPA variation 355419 0472 2598 minus 7950 minus 1140 0540 2020 108805-day IBOVESPA variation 354588 0781 3274 minus 8250 minus 1230 0770 2750 1687030-day IBOVESPA variation 343592 4863 8282 minus 19060 minus 0740 5240 10500 28770IBOVESPA index 356176 0145 1528 minus 4870 minus 0740 0110 1010 6600

4 Complexity

Female Male

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8Investment variation ()

(a)

High school Higher education

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8 12Investment variation ()

(b)

Figure 1 Heatmap showing average daily investment changes () in 2016 broken down by investorrsquos gender (a) and education degree(b) We take average values of investment decisions that took place in every weekday within the month Due to the existence of somelarge absolute values and to improve readability we winsorize the investment variation distribution by 5 at each side before taking theaverage

Feb

16

May

16

Aug

16

Nov

16

Feb

17

May

17

Aug

17

Nov

17

Feb

18

May

18

Aug

18

Nov

18

Time

RegionMidwestNorthNortheast

SouthSoutheast

0

100

200

300

400

Inve

stmen

t var

iatio

n (

)

Figure 2 In this figure we aggregate investment made by all investors by states to which they belong As we can we have investors from allparts of Brazil in our database

Complexity 5

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Northeast

minus25 0 25 50minus50Investment variation ()

North

minus25 0 25 50minus50Investment variation ()

TypeFemaleMale

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Southeast

minus25 0 25 50minus50Investment variation ()

Figure 3 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos gender (female or male) Due to the existence of some large absolute values and to improve readability we winsorize theinvestment variation distribution by 5 at each side

6 Complexity

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

0 25 50minus25Investment variation ()

North

0 25 50minus25Investment variation ()

Northeast

0 25 50minus25Investment variation ()

Southeast

0 25 50minus25Investment variation ()

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

TypeHigh schoolHigher education

0 25 50minus25Investment variation ()

Figure 4 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos education (higher education or high school) Due to the existence of some large absolute values and to improvereadability we winsorize the investment variation distribution by 5 at each side

Complexity 7

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 4: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

there is a large range of daily investment variations goingfrom minus 100 to almost 500 On average we see a positiveinvestment variation (9267) We also show the IBO-VESPA index level and its variations in the last 1 2 3 5 and30 days We will use these IBOVESPA index changes tocheck how they correlate with the investment variationsvariable One underlying hypothesis is that investors look atthe IBOVESPA index to decide on their trading decisions

Figures 1(a) and 1(b) portray weekday heatmapsshowing the average daily investment changes in 2016broken down by investorrsquos gender and education level (see[44]) First we observe the richness of our data set in whichthere is a large heterogeneity of investorrsquos decisions on theirinvestment on a daily basis Second though there is asimilarity on how investors decide on their investments formales and females and for those with higher and lowereducation we observe some discrepancies in some occa-sions suggesting that these are two important features thatwe should study in our empirical analysis Besides thissubjective analysis our feature selection procedure willcorroborate such vision using an objective and quantitativeapproach For instance we observe that on average in-vestors mostly buy by the beginning or end of the weekwhile they sell on Wednesdays +ere is evidence of be-havioral changes of investors over weekdays in the stockmarket For instance Pena [45] studies the effect of reformon the Spanish stock exchange market +ey find thatbefore such reform there were positive abnormal excessreturns on Mondays effect of which disappeared followingthat reform

Figure 2 shows how investments are split across Brazilianstates over time As we can see there is also some hetero-geneity across investors residing in different states whichsuggests that we may have to control for state origin ofinvestors For instance there are some large investmentvariations in the Northern region of Brazil

Figure 3 depicts the distribution of investment variationacross different Brazilian states broken down by investorrsquosgender (female or male) Each distribution conditioned tothe state and gender integrates to one Interestingly most ofthe distributions have three persist modals that occur notonly across different states but also for different genders+emodals are centered at the zero (no investment variation)and at plusmn 30 investment variation marks While in mostcases the profiles of investment changes of both male andfemale largely coincide there are some notable exceptionsFor instance in less developed regionsmdashsuch as the Northand Northeastmdashthe distributions of investment decisions of

males and female significantly differ at some dates Overallmales tend to change more their investment positions rel-atively to females However such feature is even morepronounced in less developed regions

Figure 4 displays the same distribution of investmentvariation across different Brazilian states but now brokendown by the investorrsquos academic education We considerinvestors with higher education and those with high school orbelow Again the three-modal distribution found when webroke down the distribution by investorrsquos gender also show upwhen we look at their educational levels In more developedregionsmdashsuch as the Southeast and Southmdashinvestorsrsquo decisionare roughly the same regardless of their academic educationallevels Such similarity reinforces the divergence of academicdegree and the level of financial literacy especially in tradingIn contrast we observe a large heterogeneity in the Northregion less educated investors tend to vary their investmentpositions more than investors those with higher education

3 Feature Selection Using Machine Learning

In this section we analyze the predictive power of our at-tributes in explaining investor responses to changes in theBrazilian stock market index We use different time ag-gregations of changes in the IBOVESPA index which is thefinancial index that is carefully looked by investors whendeciding their investment strategies in Brazil We use 2- 3-and 5-day IBOVESPA index variations as well as 3- and 5-day IBOVESPA index averages +is analysis sheds light onhow investors look at IBOVESPA changes when decidingtheir trading strategies in the stock market It is an empiricalopen question to test whether investors take the very short-term changes such as 2- or 3-day or a more prolongedwindow such as 5-day changes

To test the predictive power we use data-driven machinelearning methods to identify the most relevant attributes[46ndash48] Since we have data from 13247 investors fromJanuary 1 2016 to December 31 2018 on a daily basis weneed first to purge out any macroeconomic factor that couldbe affecting all investors in the same manner over this timeframe +is becomes even more important due to the factthat Brazil was facing a recession from 2014Q4 to 2016Q4and therefore our sample contains part of that period Weperform this preprocessing to homogenize the data distri-bution since machine learning methods best perform oncross-sectional data [30 49]

To remove time factors homogeneously faced by in-vestors in a period we use a static panel-data specification

Table 1 Summary statistics of our panel data on investorsrsquo daily trading decisions over the period of 2016 to 2018

Statistic N Mean St Dev Min Pctl (25) Median Pctl (75) MaxInvestment variation () 356172 9267 68389 minus 100000 minus 6680 0510 8770 4998791-day IBOVESPA variation 356172 0145 1528 minus 4870 minus 0740 0110 1010 66002-day IBOVESPA variation 355796 0323 2160 minus 6550 minus 1140 0290 1600 91303-day IBOVESPA variation 355419 0472 2598 minus 7950 minus 1140 0540 2020 108805-day IBOVESPA variation 354588 0781 3274 minus 8250 minus 1230 0770 2750 1687030-day IBOVESPA variation 343592 4863 8282 minus 19060 minus 0740 5240 10500 28770IBOVESPA index 356176 0145 1528 minus 4870 minus 0740 0110 1010 6600

4 Complexity

Female Male

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8Investment variation ()

(a)

High school Higher education

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8 12Investment variation ()

(b)

Figure 1 Heatmap showing average daily investment changes () in 2016 broken down by investorrsquos gender (a) and education degree(b) We take average values of investment decisions that took place in every weekday within the month Due to the existence of somelarge absolute values and to improve readability we winsorize the investment variation distribution by 5 at each side before taking theaverage

Feb

16

May

16

Aug

16

Nov

16

Feb

17

May

17

Aug

17

Nov

17

Feb

18

May

18

Aug

18

Nov

18

Time

RegionMidwestNorthNortheast

SouthSoutheast

0

100

200

300

400

Inve

stmen

t var

iatio

n (

)

Figure 2 In this figure we aggregate investment made by all investors by states to which they belong As we can we have investors from allparts of Brazil in our database

Complexity 5

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Northeast

minus25 0 25 50minus50Investment variation ()

North

minus25 0 25 50minus50Investment variation ()

TypeFemaleMale

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Southeast

minus25 0 25 50minus50Investment variation ()

Figure 3 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos gender (female or male) Due to the existence of some large absolute values and to improve readability we winsorize theinvestment variation distribution by 5 at each side

6 Complexity

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

0 25 50minus25Investment variation ()

North

0 25 50minus25Investment variation ()

Northeast

0 25 50minus25Investment variation ()

Southeast

0 25 50minus25Investment variation ()

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

TypeHigh schoolHigher education

0 25 50minus25Investment variation ()

Figure 4 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos education (higher education or high school) Due to the existence of some large absolute values and to improvereadability we winsorize the investment variation distribution by 5 at each side

Complexity 7

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 5: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

Female Male

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8Investment variation ()

(a)

High school Higher education

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

Monday

Tuesday

Wednesday

ursday

Friday

Month

Day

0 4 8 12Investment variation ()

(b)

Figure 1 Heatmap showing average daily investment changes () in 2016 broken down by investorrsquos gender (a) and education degree(b) We take average values of investment decisions that took place in every weekday within the month Due to the existence of somelarge absolute values and to improve readability we winsorize the investment variation distribution by 5 at each side before taking theaverage

Feb

16

May

16

Aug

16

Nov

16

Feb

17

May

17

Aug

17

Nov

17

Feb

18

May

18

Aug

18

Nov

18

Time

RegionMidwestNorthNortheast

SouthSoutheast

0

100

200

300

400

Inve

stmen

t var

iatio

n (

)

Figure 2 In this figure we aggregate investment made by all investors by states to which they belong As we can we have investors from allparts of Brazil in our database

Complexity 5

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Northeast

minus25 0 25 50minus50Investment variation ()

North

minus25 0 25 50minus50Investment variation ()

TypeFemaleMale

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Southeast

minus25 0 25 50minus50Investment variation ()

Figure 3 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos gender (female or male) Due to the existence of some large absolute values and to improve readability we winsorize theinvestment variation distribution by 5 at each side

6 Complexity

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

0 25 50minus25Investment variation ()

North

0 25 50minus25Investment variation ()

Northeast

0 25 50minus25Investment variation ()

Southeast

0 25 50minus25Investment variation ()

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

TypeHigh schoolHigher education

0 25 50minus25Investment variation ()

Figure 4 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos education (higher education or high school) Due to the existence of some large absolute values and to improvereadability we winsorize the investment variation distribution by 5 at each side

Complexity 7

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 6: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Northeast

minus25 0 25 50minus50Investment variation ()

North

minus25 0 25 50minus50Investment variation ()

TypeFemaleMale

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

minus25 0 25 50minus50Investment variation ()

Southeast

minus25 0 25 50minus50Investment variation ()

Figure 3 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos gender (female or male) Due to the existence of some large absolute values and to improve readability we winsorize theinvestment variation distribution by 5 at each side

6 Complexity

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

0 25 50minus25Investment variation ()

North

0 25 50minus25Investment variation ()

Northeast

0 25 50minus25Investment variation ()

Southeast

0 25 50minus25Investment variation ()

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

TypeHigh schoolHigher education

0 25 50minus25Investment variation ()

Figure 4 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos education (higher education or high school) Due to the existence of some large absolute values and to improvereadability we winsorize the investment variation distribution by 5 at each side

Complexity 7

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 7: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

Midwest

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

0 25 50minus25Investment variation ()

North

0 25 50minus25Investment variation ()

Northeast

0 25 50minus25Investment variation ()

Southeast

0 25 50minus25Investment variation ()

South

2016minus03

2016minus06

2016minus09

2016minus12

2017minus09

2017minus12

2018minus03

2018minus06

2018minus09

2018minus12

Tim

e

TypeHigh schoolHigher education

0 25 50minus25Investment variation ()

Figure 4 Distribution of investment variation across different Brazilian states (Midwest North Northeast South and Southeast) brokendown by investorrsquos education (higher education or high school) Due to the existence of some large absolute values and to improvereadability we winsorize the investment variation distribution by 5 at each side

Complexity 7

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 8: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

with time fixed effects to purge out macroeconomic com-ponents as follows

Δyit αt + ϵit (1)

in which Δyit denotes the portfolio volume variation in thestock market of investor i at time t αt represents time fixedeffects and ϵit is the residual In this specification we in-terpret the residual ϵit as any variation of investor irsquosportfolio volume at time t that is not due to any timecommon factor such as the underlying macroeconomicscenario By using ϵit instead of Δyit we can effectively treatthe data as a large cross-sectional unit Hence we are able tofully use machine learning methods at their best setupwhich we discuss further

We choose an elastic net regression to estimate theimportance of each attribute in the model Such regressionoptimally combines L2-norm (Ridge) and L1-norm (Lasso)regularization +erefore we are able to prevent anyoverfitting in our empirical model Moreover we use aconvex combination of L1-norm which tends to shrinksthe majority of the nonrelevant regressors to zero and keepthe most important nonzero and L2-norm which tends tooutput nonzero and approximate coefficients for all thesimilar regressors By using both regularization schemeswe are able to enjoy the positive characteristics of bothschemes

To select the most important attributes we use the re-sidual ϵit the investment volume variation of investor i attime t not due to common time factors as dependentvariable and different IBOVESPA index time aggregationsand investorsrsquo biological and education characteristics asindependent variables as follows

ϵit βTmiddot Xit + errorit (2)

in which Xit is a vector composed of past IBOVESPAchanges with different windows (1- 2- 3- 5- 10- 20- and30-day IBOVESPA changes) and investorsrsquo characteristics(state of residency gender and level of schooling) +e termerrorit is the standard error According to the elastic netprocedure we select β that minimizes the following lossfunction L(β)

L(β) 1113944t1

T

1113944i1

N

ϵit minus 1113944j1

p

βjx(j)it

⎛⎝ ⎞⎠

2

+ λ (1 minus α)||β||222

+ α||β||211113890 1113891

(3)

in which t isin 1 N index times on a daily basis (fromJanuary 1 2016 to December 31 2018) and i index in-vestors +e term x

(j)it indexes the jth regressor of investor i

at time t +e operators ||||1 and ||||2 indicate L1-and L2-norms taken over the vector input

+e first expression in (3) denotes the traditional datafitting error (residuals) while the second represents theregularization term Parameter λ modulates the importanceof the traditional and regularization terms +e term αcontrols the convex mixture of L1 and L2 regularization +eregularization works by penalizing large β coefficients+erefore it shrinks the estimated coefficients and the

overall fit function becomes smoother over the datadistribution

In the elastic net regression a takes values in between 0and 1 We optimally tune a and λ using a nested cross-validation procedure with k 10 folds and 100 independentrepeats for statistical robustness [29 49] In this procedurewe use k minus 1 9 folds for training and the remaining fold fortesting +is procedure is cycled k times such that each foldappears exactly once for testing Such methodology enablesus to tune the regularization parameters while preventingoverfitting of the model We optimize a over the grid searchspace 0 005 010 1 and λ over 0 01 02 5 Asstandard practice we preprocess all regressors by applying aZ-score standardization over all the data points using pre-determined values extracted only from the training data (soas to prevent data leakage from the test set)

Figure 5 shows our results for the importance of differenttime aggregations of the IBOVESPA index in explaining in-vestorsrsquo behavior +e optimal regularization parameters wereλ 01 and α 035 We normalize the coefficients in termsof the most important attribute +e attribute ldquo1-day IBO-VESPA variationrdquo is the most powerful predictor forexplaining investorsrsquo behavior followed by ldquo2-day IBOVESPAvariationrdquo and ldquo5-day IBOVESPA variationrdquo +is suggeststhat investors prefer to base their investment decisions usingshort-term variations of the stock market index Even thoughmore prolonged periods of IBOVESPA index changes areimportantmdashsuch as 10- 20- and 30-day variationsmdashthey aremuch less important than the short-term variations In ad-dition we find that investorsrsquo gender and schooling level arealso important characteristics for explaining buy and selloperations over the Brazilian stock exchange market in theperiod from 2016 to 2018 We also observe that some regionalvariables are important such as Santa Catarina Rio de JaneiroDistrito FederalMinas Gerais and Parana and Satildeo Paulo+ismay suggest a different mass of investorsrsquo composition acrossdifferent states

Our feature selection procedure gives us an objective wayof identifying potentially important variables that should beaccounted for in our econometric exercise Such tool takentogether with the analystrsquos expertise to assess their validity interms of relationship with the analyzed measure is an im-portant step in producing econometric methods in a morereliable manner Our results point that we should control forinvestorsrsquo characteristics (gender and schooling level) andalso past IBOVESPA variations +e investorrsquos state is notimportant because we will employ a panel-data analysis withfixed effects at the investor level +erefore the investorrsquosstate is collinear with the investor fixed effect and would bedropped during the estimations

4 Econometric Analysis with SelectedVariables

In the previous section we have found that short-termvariations of the IBOVESPA index are better predictors forbuy or sell operations in the Brazilian stock exchange marketthan long-term variations+e feature selection procedure isa transparent way of choosing relevant variables in an ob-jective way However such method does not provide an

8 Complexity

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 9: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

answer as to whether each variable contributes positively ornegatively to the target variable ie the investment decisionof the investor (buy or sell) In this section we look at suchdirection and estimate the magnitude of the most relevantvariables found by our feature selection technique

In Section 41 we first test whether investment decisionsof Brazilian investors better fit to a mean-reversal or mo-mentum strategy For that we regress total investmentvariations of investors against past variations of the IBO-VESPA index For robustness we use 1- 2- 3- 5- and 30-day variations of the IBOVESPA index Our regressions areat the investor level which enables us to control for un-observed time-invariant characteristics of each Brazilianinvestor which would otherwise be impractical in case wehad aggregate data like most existing studies +erein wefind that the mean-reversal technique better explains buy orsell operations in the Brazilian stock market during theperiod from 2016 to 2018 Our results corroborate thefindings of our feature selection technique short-termvariations explain more buy or sell operations than long-term variations

In Sections 42 and 43 we study the determinants thateither soften or exacerbate the mean-reversal behavior ofBrazilian investors by looking at the role of gender and levelof schooling respectively of investors +ese exercisesconnect with the existing literature on the influence ofsocioeconomic and biological features in shaping the be-havior of economic agents

41 Do Investors Use a Mean-Reversion or a MomentumStrategy in 8eir Buy and Sell Operations To answer howinvestors respond to changes in the IBOVESPA index werun the following econometric specification

Δyit αi + αtlowast + βΔIBOVESPAt + ηit (4)

in which Δyit is the portfolio volume variation of investor iat time t +ere is a positive variation (Δyit gt 0) when in-vestor i buys more stocks at time t and a negative variation(Δyit lt 0) when she sells Alternatively when the investorholds her investment over time then (Δyit 0) +e factorηit is the standard error term

Our coefficient of interest is β which captures investorsrsquoresponses to variations of the IBOVESPA index denoted asΔIBOVESPAt We test whether investors use a mean-re-versal or momentum strategy as follows (we discard thehypothesis that investorsrsquo buy and sell decisions are un-related to variations of the IBOVESPA index because ourfeature selection technique identified past variations of theIBOVESPA index as the most relevant predictors of in-vestor-specific investment variations)

(i) If investors use amean-reversal strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by sell operations in such a way that theinvestment volume of investors on average decreases(Δyit lt 0) +erefore a mean-reversal strategy istranslated by a negative β coefficient (βlt 0)

(ii) If investors use a momentum strategy then increasesin the IBOVESPA indexmdashie ΔIBOVESPAt gt 0mdashare followed by buy operations in such a way thatthe investment volume of investors on averageincreases (Δyit gt 0) +erefore a momentum strat-egy is translated by a positive β coefficient (βgt 0)

As there is persistence of the past IBOVESPA indexvariations by construction we test how investorsrsquo in-vestment volume respond to 1- 2- 3- 5- and 30-dayvariations of the IBOVESPA index in an independentmanner +is empirical design strategy prevents standarderrors to get overly inflated due to high pairwise correlationof these regressors

+e term αi represents investor fixed effects and ab-sorbs any nonobserved time-invariant characteristic ofeach investor in the sample +is mitigates potentialomitted variables that could bias our results such as in-vestorsrsquo skill which is hard to measure We should notethat any omitted variable that is time variant would not beabsorbed by the investor fixed effect +erefore while theintroduction of such fixed effect mitigates omitted variablebias it does not completely avoid it For instance if in-vestorsrsquo skill significantly increases over time then wewould have an omitted variable bias Since our panel spansa relatively small time periodmdash2016 to 2018mdashit is fair toassume that investorsrsquo skill remains roughly constant +e

Satildeo Paulo StateRio Grande do Sul State

20-day IBOVESPA variationParanaacute State

Minas Gerais StateDistrito Federal State

10-day IBOVESPA variationRio de Janeiro State

Santa Catarina State30-day IBOVESPA variation

Male investorHighly educated investor

3-day IBOVESPA variation5-day IBOVESPA variation2-day IBOVESPA variation1-day IBOVESPA variation

25 50 75 1000Importance ( normalized)

Figure 5 Feature selection results using an elastic net procedure with L2 and L1 regularization Coefficients are normalized in terms of themost important attribute (ldquo1-day IBOVESPA variationrdquo)

Complexity 9

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 10: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

term αtlowast connotes time-fixed effects at the year-monthlevel which absorbs any homogeneous time-variant effectsuch as the Brazilian recession or month-wise exchangerate fluctuations Since our panel frequency is on a dailybasis we cannot add a time fixed effect at the same fre-quency because our coefficient of interestmdashβmdashwould getabsorbed by the time fixed effects as it only varies acrosstime To prevent such problem we use a less granular timefixed effects namely month-year

Our data set contains 13247 investors in a large andrepresentative bank in Brazil and 610 time points Due to thisconfiguration we follow Petersen [50] and double-clusterstandard errors at the investor and time levels +is is arobust strategy that is important for panels with a largenumber of individuals and time points because it mitigatesheteroscedasticity and serial correlation Finally our data isin percentage terms

Table 2 reports our estimates of Regression (4) Weobserve that a 1 percent increase of the IBOVESPA indexassociates with an average decrease of 9693 of the investorportfolio volume when we look at the 1-day IBOVESPAvariation +e results remain with a statistically significantcoefficient across different lengths of past IBOVESPA var-iations (2- 3- and 5-day variations) except for 30-dayvariations in which the statistical significance vanishesMoreover the magnitude of the coefficient reduces as we useless recent past variations of the IBOVESPA index which isconsistent with the view that investors in our sample aremore concerned with short-term rather long-term variationsof the IBOVESPA index +e negative and statisticallysignificant sign corroborates the hypothesis that investorsuse mean-reverting trading strategies in which they tend to

sell after substantial upward changes of the IBOVESPAindex and tend to buy after downward changes

42 Does Gender Impact Investors Responsiveness to IBO-VESPA IndexChanges We have showed empirical evidencethat investorsrsquo strategy on average better fit to a mean-reverting behavior in the Brazilian stockmarket+at is theytend to sell after positive changes of the IBOVESPA indexand buy after negative changes In this section we askwhether the sensitiveness of investors to the IBOVESPAindex depends on their biological characteristics in specialtheir gender Biological factorsmdashespecially gendermdashhavebeen extensively explored in investment decision-makingNotable works relating biological factors including genderare Hira and Loibl [51] Lundeberg et al [34] Neyse et al[33] and Sunden and Surette [52] +is paper providesfurther evidence of the existence of such gender gap ininvestment decisions using a microdata on investor-matched buy and sell operations

In this line of research Neyse et al [33] and Lundeberget al [34] partly attribute behavioral differences among malesand females due to systematic changes in overconfidenceExcessive overconfidence is associated with higher levels oftestosterone which is more pronounced in males Over-confidence may induce investors to take on higher risksleading them to look for higher returns in the short term Inthis way we would expect that females be less sensitive tochanges of past IBOVESPA variations as they would valuemore fundamentals and look for yields in the longer term+erefore short-term variations of the IBOVESPA indiceswould explain less their buy or sell operations comparatively

Table 2 Output from Regression (4) We ask how investors respond to changes in the IBOVESPA index We only use changes rather thanpast averages because the former has greater prediction power as reported by our feature selection procedure +e dependent variable is thevariation of portfolio investment volume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA index variations +e panel is on a daily frequency basis FollowingPetersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 and lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 9693lowastlowastlowast

(1580)2-day variation minus 4656lowastlowastlowast

(1160)3-day variation minus 2400lowastlowast

(0964)5-day variation minus 2265lowastlowastlowast

(0852)30-day variation 0058

(0680)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0037 0036 0036 0035 0033Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

10 Complexity

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 11: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

to males To empirically answer this question we constructthe following econometric specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Femalei + ηit(5)

in which Femalei is a dummy variable that takes the value of1 when investor i is female and 0 otherwise We do not addthe investorrsquos gender alone in (5) because it would beabsorbed by the investor fixed effects αi Our coefficient ofinterest is β2 which captures any behavioral deviation offemales to changes of the IBOVESPA index with respect tothe average of the entire sample (male and female) If β2 gt 0then the mean-reversal strategy is less pronounced to fe-males while β2 lt 0 indicates a more accentuated behaviortowards the mean-reversal strategy In the case β2 0 thenfemales and males respond on average equivalently tochanges of the IBOVESPA index Following the discussionon overconfidence and its influence on short-term decisionsover males and females our hypothesis is that β2 gt 0

Table 3 reports our estimates of Regression (5) Ourprevious results relating the mean-reversal strategy of in-vestors in the Brazilian stock market remain the same Weobserve that the interaction of changes in the IBOVESPAindex and the dummy female is positive and statisticallysignificant +is empirical finding corroborates the view thatfemale investors have a less pronounced mean-reversalstrategy than males as they look at longer-term returns andare less attentive to short-term variations of the IBOVESPAindex which could arise due to noisy information Forinstance looking at Specification (1) a 1 percent positivechange in the IBOVESPA index associates with a decrease ofminus 10345 + 6543 minus 3802 of the invested volume of femaleinvestors In contrast the entire sample (males and females)decreases their portfolio volume on average by minus 10345for a 1 percent positive change in the IBOVESPA indexInterestingly even though statistically insignificant 30-dayvariations of the IBOVESPA index are positively associatedwith investment volumes for females suggesting traits of amomentum strategy +is is also suggestive evidence that

Table 3 Output from Regression (5)We ask whether female investors have different sensitiveness with respect to their investment portfolioto IBOVESPA index changes We only use changes rather than past averages because the former has greater prediction power as reported byour feature selection procedure+e dependent variable is the variation of portfolio investment volume of investor i at time t in the Brazilianstock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3) 5- (4) and 30-day (5) IBOVESPA indexvariations as well as their interaction with the investorrsquos gender+e panel is on a daily frequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010 lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔIBOVESPAt with1-day variation minus 10345lowastlowastlowast

(1754)2-day variation minus 5019lowastlowastlowast

(1275)3-day variation minus 2650lowastlowast

(1055)5-day variation minus 2315lowastlowastlowast

(0886)30-day variation minus 0001

(0693)Interactions of ΔIBOVESPAt with gender1-day variation middot Female 6543lowastlowastlowast

(2392)2-day variation middot Female 3708lowastlowastlowast

(1264)3-day variation middot Female 2585lowastlowast

(1212)5-day variation middot Female 0517

(1538)30-day variation middot Female 0572

(0831)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0039 0037 0036 0035 0034Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

Complexity 11

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 12: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

females tend to look at longer horizons when taking in-vestment decisions

43 Does Formal Education Impact Investors Responsivenessto IBOVESPA Index Changes In this section we look at howformal education (academic degree or level of schooling) caninfluence investorsrsquo sensitiveness to IBOVESPA index varia-tions +ere are several works in the behavioral finance lit-erature that attempt to establish a connection between level ofschooling and investorsrsquo awareness of stock markets and theirdecision-making determinants We highlight the researchstudies of Grinblatt et al [53] and Guiso and Jappelli [54] Intheory educated investors should behave in more rationalways and trade less frequently when there is no new relevantinformation arriving in the market but noises +erefore wewould expect these investors to have a smaller reaction to pricefluctuations as they are able to better identify informationfrom noise To empirically test this behavior we run thefollowing specification

Δyit αi + αtlowast + β1ΔIBOVESPAt + β2ΔIBOVESPAt

times Higher Educationi + ηit(6)

in which Higher Educationi is a dummy variable that takesthe value of 1 when investor i has a higher education (at leastcollege degree) and 0 otherwise (high school or a lowerdegree) Our coefficient of interest is β2 which captures anybehavioral deviation of investors with higher formal edu-cation to changes of the IBOVESPA index with respect to theaverage of the entire sample +e hypothesis is that β2 lt 0 inwhich more educated investors tend to better discern in-formation from noise out of variations of the IBOVESPAindex and therefore the mean-reversal strategy would be lesspronounced

Table 4 reports our estimates of Regression (6) Onaverage the mean-reversal strategy remains We note thatthe interaction of changes of the IBOVESPA index and thedummy higher education is positive and statistically sig-nificant +is suggests that investors with higher academic

Table 4 +is table reports output from Regression (6) We ask whether investors with higher academic degree have different sensitivenesswith respect to their investment portfolio to IBOVESPA index changes We only use changes rather than past averages because the formerhas greater prediction power as reported by our feature selection procedure +e dependent variable is the variation of portfolio investmentvolume of investor i at time t in the Brazilian stock market from the beginning of 2016 to the end of 2018 Regressors are 1- (1) 2- (2) 3- (3)5- (4) and 30-day (5) IBOVESPA index variations as well as their interaction with the investorrsquos academic degree +e panel is on a dailyfrequency basis Following Petersen [50] we double-cluster standard errors at the investor and time levels Significance levels lowastplt 010lowastlowastplt 005 lowastlowastlowastplt 001

Dependent variableInvestor portfolio volume variation (Δyit)

(1) (2) (3) (4) (5)

Regressor ΔΔIBOVESPAt with1-day variation minus 10347lowastlowastlowast

(1795)2-day variation minus 5136lowastlowastlowast

(1299)3-day variation minus 2750lowastlowast

(1076)5-day variation minus 2565lowastlowastlowast

(0931)30-day variation 0085

(0695)Interactions of ΔIBOVESPAt with academic degree1-day variation middot Higher education 5347lowastlowastlowast

(1520)2-day variation middot Higher education 3915lowastlowastlowast

(1040)3-day variation middot Higher education 2864lowast

(1573)5-day variation middot Higher education 2471lowast

(1398)30-day variation middot Higher education minus 0237

(0647)Fixed effectsInvestor Yes Yes Yes Yes YesMonth-year Yes Yes Yes Yes YesObservations 356172 355796 355419 354588 343592R2 0038 0036 0036 0035 0035Error clustering Investor Investor Investor Investor Investor

Time Time Time Time Time

12 Complexity

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 13: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

degree have a less pronounced mean-reversal strategy thanless educated investors which corroborates our hypothesisLooking at Specification (3) we observe a positive thoughmarginally significant relationship between IBOVESPAchanges and investment volume (minus 2750 + 2864 0114) formore educated investors suggesting traits of a momentumstrategy

5 Conclusions

We employ machine learning techniques together witheconometrics techniques to model investor behavior using aunique dataset for investors that focus on stock marketinvestments We propose a methodological approach to linkmachine learning methods widely used in computer scienceto standard econometric techniques commonly employed insocial sciences

Using the unique data set with high-frequency dailyinvestment decision of a broad set of investors in Brazil weprovide evidence that investors look at past performance of abenchmark stock index in order to decide their investmentdecisions Investors seem to prefer mean-reverting strategiesin the short-run rather than momentum +is may be as-sociated with the disposition effect - investors prefer to sellthe winners and buy the losers [55 56] Furthermore re-search could exploit alternative explanations for thisbehavior

In addition we study the determinants that either softenor exacerbate the mean-reversal behavior of Brazilian in-vestors by looking at the role of gender and level ofschoolingWe find that females andmore educated investorsare less sensitive to changes of past IBOVESPA variationswhich is consistent with the literature on behavioral finance

+is paper highlights the importance of using non-traditional methods in econometric analysis +e use ofmachine learning methods permits us to automate the oftensubjective process of choosing which variables are importantin any econometric analysis By using a feature selectionschememdashsuch as the elastic net in this papermdashwe are able toidentify those attributes that best describe how investorsdecide to buy or sell their positions in an objective andstatistically correct manner In addition to that the businessspecialist can always assess these variables pointed out asmost important to analyze their economic meaning

Data Availability

+e data is confidential

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+iago C Silva (grant no 4085462018-2) and Benjamin MTabak (grant no 3105412018-2 4251232018-9) gratefullyacknowledge financial support from the CNPq foundation

References

[1] H Zou and T Hastie ldquoRegularization and variable selectionvia the elastic netrdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 67 no 2 pp 301ndash3202005

[2] E F Fama and K R French ldquoPermanent and temporarycomponents of stock pricesrdquo Journal of Political Economyvol 96 no 2 pp 246ndash273 1988

[3] A W Lo and A C MacKinlay ldquoStock market prices do notfollow random walks evidence from a simple specificationtestrdquo Review of Financial Studies vol 1 no 1 pp 41ndash66 1988

[4] J M Poterba and L H Summers ldquoMean reversion in stockpricesrdquo Journal of Financial Economics vol 22 no 1pp 27ndash59 1988

[5] J Conrad and G Kaul ldquoAn anatomy of trading strategiesrdquoReview of Financial Studies vol 11 no 3 pp 489ndash519 1998

[6] C Engel and J D Hamilton ldquoLong swings in the dollar arethey in the data and do markets know itrdquo 8e AmericanEconomic Review vol 80 pp 689ndash713 1990

[7] N Jegadeesh and S Titman ldquoMomentumrdquo Annual Review ofFinancial Economics vol 3 no 1 pp 493ndash509 2011

[8] R S J Koijen J C Rodrıguez and A Sbuelz ldquoMomentumand mean reversion in strategic asset allocationrdquo Manage-ment Science vol 55 no 7 pp 1199ndash1213 2009

[9] M Morrin J Jacoby G V Johar X He A Kuss andD Mazursky ldquoTaking stock of stockbrokers exploring mo-mentum versus contrarian investor strategies and profilesrdquoJournal of Consumer Research vol 29 no 2 pp 188ndash198 2002

[10] J Okunev andDWhite ldquoDomomentum-based strategies stillwork in foreign currency marketsrdquo 8e Journal of Financialand Quantitative Analysis vol 38 no 2 pp 425ndash447 2003

[11] D Schiereck W De Bondt and M Weber ldquoContrarian andmomentum strategies in Germanyrdquo Financial AnalystsJournal vol 55 no 6 pp 104ndash116 1999

[12] D O Cajueiro and B M Tabak ldquoTesting for predictability inequity returns for European transition marketsrdquo EconomicSystems vol 30 no 1 pp 56ndash78 2006

[13] D O Cajueiro and B M Tabak ldquoTesting for time-varyinglong-range dependence in real state equity returnsrdquo ChaosSolitons amp Fractals vol 38 no 1 pp 293ndash307 2008

[14] E J Chang E J A Lima and B M Tabak ldquoTesting forpredictability in emerging equity marketsrdquo Emerging MarketsReview vol 5 no 3 pp 295ndash316 2004

[15] A Sensoy K Ozturk E Hacihasanoglu and B M TabakldquoNot all emerging markets are the same a classification ap-proach with correlation based networksrdquo Journal of FinancialStability vol 33 pp 163ndash186 2017

[16] B M Tabak and E J A Lima ldquoMarket efficiency of Brazilianexchange rate evidence from variance ratio statistics andtechnical trading rulesrdquo European Journal of OperationalResearch vol 194 no 3 pp 814ndash820 2009

[17] C M Boya ldquoFrom efficient markets to adaptive marketsevidence from the French stock exchangerdquo Research in In-ternational Business and Finance vol 49 pp 156ndash165 2019

[18] R Ding and P Cheng ldquoSpeculative trading price pressureand overvaluationrdquo Journal of International Financial Mar-kets Institutions and Money vol 21 no 3 pp 419ndash442 2011

[19] E Lee and N Piqueira ldquoBehavioral biases of informed tradersevidence from insider trading on the 52-week highrdquo Journal ofEmpirical Finance vol 52 pp 56ndash75 2019

[20] T-Y Pak and P Babiarz ldquoDoes cognitive aging affect port-folio choicerdquo Journal of Economic Psychology vol 66pp 1ndash12 2018

Complexity 13

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 14: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

[21] T Suzuki and Y Ohkura ldquoFinancial technical indicator basedon chaotic bagging predictors for adaptive stock selection inJapanese and American marketsrdquo Physica A Statistical Me-chanics and its Applications vol 442 pp 50ndash66 2016

[22] A Urquhart and F McGroarty ldquoAre stock markets reallyefficient evidence of the adaptive market hypothesisrdquo In-ternational Review of Financial Analysis vol 47 pp 39ndash492016

[23] X Xiong Y Meng X Li and D Shen ldquoAn empirical analysisof the adaptive market hypothesis with calendar effects ev-idence from Chinardquo Finance Research Letters vol 31 2019

[24] H Takahashi and T Terano ldquoAnalyzing the influence ofoverconfident investors on financial markets through agent-based modelrdquo in Intelligent Data Engineering and AutomatedLearning-IDEAL 2007 H Yin P Tino E CorchadoW Byrne and X Yao Eds Springer Berlin HeidelbergGermany 2007

[25] B LeBaron ldquoEmpirical regularities from interacting long- andshort-memory investors in an agent-based stock marketrdquoIEEE Transactions on Evolutionary Computation vol 5 no 5pp 442ndash455 2001

[26] M A Bertella F R Pires L Feng and H E StanleyldquoConfidence and the stock market an agent-based approachrdquoPLoS One vol 9 no 1 Article ID e83488 2014

[27] H R Varian ldquoBig data new tricks for econometricsrdquo Journalof Economic Perspectives vol 28 no 2 pp 3ndash28 2014

[28] G James D Witten T Hastie and R Tibshirani An In-troduction to Statistical Learning With Applications in RSpringer Publishing Company Incorporated Switzerland2014

[29] T C Silva and L Zhao Machine Learning in ComplexNetworks Springer Publishing Company IncorporatedSwitzerland 1st edition 2016

[30] T Hastie R Tibshirani and J Friedman 8e Elements ofStatistical Learning Data Mining Inference and PredictionSpringer Berlin Heidelberg Germany 2nd edition 2009

[31] K J Arrow and G Debreu ldquoExistence of an equilibrium for acompetitive economyrdquo Econometrica vol 22 no 3pp 265ndash290 1954

[32] H A Simon Models of Man Social and Rational- Mathe-matical Essays on Rational Human Behavior in a Social Set-ting Wiley Hoboken NJ USA 1957

[33] L Neyse S Bosworth P Ring and U Schmidt ldquoOver-confidence incentives and digit ratiordquo Scientific Reportsvol 6 no 1 2016

[34] M A Lundeberg P W Fox and J Pun coha ldquoHighlyconfident but wrong gender differences and similarities inconfidence judgmentsrdquo Journal of Educational Psychologyvol 86 no 1 pp 114ndash121 1994

[35] O Onishchenko and N Ulku ldquoForeign investor tradingbehavior has evolvedrdquo Journal of Multinational FinancialManagement vol 51 pp 98ndash115 2019

[36] M Abreu ldquoHow biased is the behavior of the individualinvestor in warrantsrdquo Research in International Business andFinance vol 47 pp 139ndash149 2019

[37] J-C Li Y-X Li N-S Tang and D-C Mei ldquo+e roles ofmean residence time on herd behavior in a financial marketrdquoPhysica A Statistical Mechanics and Its Applications vol 462pp 350ndash357 2016

[38] C Liu and X Li ldquoMedia coverage and investor scare behaviordiffusionrdquo Physica A Statistical Mechanics and Its Applica-tions vol 527 p 121398 2019

[39] KW Park S H Jeong and J Y J Oh ldquoForeigners at the gateforeign investor trading and the disposition effect of domestic

individual investorsrdquo 8e North American Journal of Eco-nomics and Finance vol 49 pp 165ndash180 2019

[40] Y Shi Y-r Tang and W Long ldquoSentiment contagionanalysis of interacting investors evidence from Chinarsquos stockforumrdquo Physica A Statistical Mechanics and its Applicationsvol 523 pp 246ndash259 2019

[41] J R Wei J P Huang and P M Hui ldquoAn agent-based modelof stock markets incorporating momentum investorsrdquoPhysica A Statistical Mechanics and its Applications vol 392no 12 pp 2728ndash2735 2013

[42] R J Balvers and Y Wu ldquoMomentum and mean reversionacross national equity marketsrdquo Journal of Empirical Financevol 13 no 1 pp 24ndash48 2006

[43] D B Chaves and V Viswanathan ldquoMomentum and mean-reversion in commodity spot and futures marketsrdquo Journal ofCommodity Markets vol 3 no 1 pp 39ndash53 2016

[44] B M Barber and T Odean ldquoBoys will be boys genderoverconfidence and common stock investmentrdquo 8e Quar-terly Journal of Economics vol 116 no 1 pp 261ndash292 2001

[45] J I PeNtildea ldquoDaily seasonalities and stock market reforms inSpainrdquo Applied Financial Economics vol 5 no 6 pp 419ndash423 1995

[46] T C Silva and L Liang Zhao ldquoNetwork-based high level dataclassificationrdquo IEEE Transactions on Neural Networks andLearning Systems vol 23 no 6 pp 954ndash970 2012

[47] T C Silva and L Liang Zhao ldquoNetwork-based stochasticsemisupervised learningrdquo IEEE Transactions on Neural Net-works and Learning Systems vol 23 no 3 pp 451ndash466 2012

[48] T C Silva and L Liang Zhao ldquoStochastic competitive learningin complex networksrdquo IEEE Transactions on Neural Networksand Learning Systems vol 23 no 3 pp 385ndash398 2012

[49] C M Bishop Pattern Recognition and Machine Learning(Information Science and Statistics) Springer-Verlag BerlinHeidelberg Germany 2006

[50] M A Petersen ldquoEstimating standard errors in finance paneldata sets comparing approachesrdquo Review of Financial Studiesvol 22 no 1 pp 435ndash480 2009

[51] T K Hira and C Loibl ldquoGender differences in investmentbehaviorrdquo in Handbook of Consumer Finance ResearchJ J Xiao Ed Springer New York NY USA 2008

[52] A E Sunden and B J Surette ldquoGender differences in theallocation of assets in retirement savings plansrdquo8eAmericanEconomic Review vol 88 pp 207ndash211 1998

[53] M Grinblatt M Keloharju and J Linnainmaa ldquoIQ and stockmarket participationrdquo 8e Journal of Finance vol 66 no 6pp 2121ndash2164 2011

[54] L Guiso and T Jappelli ldquoAwareness and stock market par-ticipationrdquo Review of Finance vol 9 no 4 pp 537ndash567 2005

[55] N Barberis andW Xiong ldquoWhat drives the disposition effectan analysis of a long-standing preference-based explanationrdquo8e Journal of Finance vol 64 no 2 pp 751ndash784 2009

[56] H Shefrin and M Statman ldquo+e disposition to sell winnerstoo early and ride losers too long theory and evidencerdquo 8eJournal of Finance vol 40 no 3 pp 777ndash790 1985

14 Complexity

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 15: Modeling Investor Behavior Using Machine Learning: Mean … · 2019. 12. 26. · Research Article Modeling Investor Behavior Using Machine Learning: Mean-Reversion and Momentum Trading

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom