Britton Deets, Jordan Argueta, Jiajian Ou


Abstract

Choosing what to invest in, and how, is a problem most people face at some point in their lives. Our project seeks to answer these questions, in part, by examining three automated investing methods. We compare the three algorithms by recording their performance relative to each other and to a basic investing strategy, buy-and-hold investing. We also weigh these results against the computational efficiency of each algorithm.


Preface

We aim to develop three algorithms that will work in any market, since the concept of trading is universal. These algorithms are designed to outperform traditional buy-and-hold investing by entering and exiting the market based on certain signals. In addition, they can run in real time without human interaction. For the most part, we have achieved these short-term goals. However, because of time constraints, we had to make several small compromises.

First, instead of running analysis and performance tests across all markets, we chose only the U.S. stock market for this project. This may affect the standard deviation of overall performance, because other markets (foreign or domestic) can be more or less volatile. Second, although these algorithms are capable of running and trading on real-time data, we decided to feed them simulated data from historical charts (2016–2017). The reasoning is that it generally takes a year or more to see significant gains in a mature market, and waiting for real-time data to arrive day by day within this project's timeframe could produce inconsistent results. Third, analyzing every stock in the market would take an enormous amount of time and resources, so we carefully selected six stocks from different industries and market capitalizations (company sizes) to represent a sample of the overall market. For instance, Bank of America represents the banking industry and large capitalization, and Johnson & Johnson represents the healthcare industry. We believe this diversity makes our test results more representative.

Finally, we chose this topic because of our combined interest in financial markets and the applicability of automated trading algorithms in modern financial markets. We believe that developing algorithms that interact with and interpret stock data is beneficial not only for learning about algorithm design and analysis, but also for the tangible benefits of automated strategies in the stock market.


Informal Problem Statement

Our goal is to maximize total capital gains from buying and selling stocks according to each algorithm. If we make n trades, the gain or loss c from each trade is summed to give the total gain or loss C. The limiting factor in our problem is the number of trades the algorithms are allowed to execute. In our case, we let each algorithm run over one year of trading, simulated using historical data.

Formal Problem Statement

Variables:

● Total number of trades: $n$; the trade with number $i$ is $n_i$
● Change for trade $n_i$: $c_i$ = exit price - entry price
● Sum of changes: $C = c_1 + c_2 + \dots + c_n = \sum_{i=1}^{n} c_i$
● $t$ = time for the algorithm to run (size of the backtest file)
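To make the notation concrete, here is a minimal Python sketch that computes C from a list of trades. The function name and the (entry, exit) tuple format are illustrative assumptions, and the sketch follows the definition above literally, so it only fits long trades (for a short trade the sign of c_i would flip).

```python
def total_change(trades):
    """Return C, the sum of c_i = exit price - entry price over all trades.

    `trades` is a list of (entry_price, exit_price) tuples for long trades.
    """
    return sum(exit_price - entry_price for entry_price, exit_price in trades)


# Hypothetical example with n = 3 trades
trades = [(100.0, 103.0), (103.0, 101.5), (101.5, 104.0)]
print(total_change(trades))  # → 4.0
```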


Pattern Searching Algorithm

The Pattern Searching algorithm is designed to search through historical data and find patterns in price action. The assumption behind this algorithm is that, for certain financial assets, price action patterns recur with consistent resulting price changes. That is, for a given price pattern, there is a certain probability that the next price change will be up or down. This assumption can be tested by backtesting the patterns found in historical price action. For this project, the pattern searching algorithm was reduced to its most basic form: it disregards the magnitude of each price change and records only its direction, 1 for up and 0 for down. For example, to test the pattern 010, we look for sequences in the data that move (down, up, down), record whether the following price change is up or down, and factor that outcome into the probability associated with that pattern. A demonstration is shown in Figure 1.

Figure 1. Pattern Searching example statistics
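The up/down encoding described above can be sketched in Python as follows. This is an illustrative sketch, not our project code: the helper names are hypothetical, and treating an unchanged close as a down move ('0') is our assumption.

```python
def to_bits(closes):
    """Encode each close-to-close move as '1' (up) or '0' (down)."""
    # Assumption: an unchanged close is encoded as '0'.
    return "".join("1" if cur > prev else "0"
                   for prev, cur in zip(closes, closes[1:]))


def next_moves_after(bits, pattern):
    """Return the move ('1' or '0') that followed each occurrence of `pattern`."""
    k = len(pattern)
    return [bits[i + k] for i in range(len(bits) - k)
            if bits[i:i + k] == pattern]


closes = [10, 9, 10, 9, 10, 11, 10, 9, 10]
bits = to_bits(closes)
print(bits, next_moves_after(bits, "010"))  # → 01011001 ['1']
```

The share of '1's in the returned list is the pattern's percent-up statistic.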

As shown in Figure 1, the pattern “11” had a 54.40% probability of being followed by a positive price change. The algorithm goes one step further by calculating the expected return of each pattern from the percent-up statistic and the average percent change of the resulting candles, using the following formula:

$$\text{Expected Return} = (P_u \cdot C_u) + (P_d \cdot C_d)$$

where

● $P_u$ = percent up (share of occurrences followed by an up candle)
● $P_d$ = percent down
● $C_u$ = average change of up candles
● $C_d$ = average change of down candles (a negative number)
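A minimal sketch of the expected-return formula, assuming the inputs are the percent changes of the candles that followed each occurrence of a pattern (flat candles are counted as down, which is an assumption):

```python
def expected_return(next_changes):
    """Expected return E = Pu*Cu + Pd*Cd for one pattern.

    `next_changes` holds the percent change of the candle that followed each
    occurrence of the pattern. Cd is the mean of the down moves, so it is <= 0.
    """
    n = len(next_changes)
    if n == 0:
        return 0.0
    ups = [c for c in next_changes if c > 0]
    downs = [c for c in next_changes if c <= 0]  # assumption: flat counts as down
    pu, pd = len(ups) / n, len(downs) / n
    cu = sum(ups) / len(ups) if ups else 0.0
    cd = sum(downs) / len(downs) if downs else 0.0
    return pu * cu + pd * cd


print(expected_return([1.0, 2.0, -1.0, -0.5]))  # → 0.375
```

With these definitions, $P_u C_u$ is the sum of the up moves divided by the number of occurrences (and likewise for $P_d C_d$), so the expected return equals the plain average of all following changes: (1.0 + 2.0 - 1.0 - 0.5) / 4 = 0.375.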


How it works

Once the algorithm has been trained on a historical price data set and has computed an expected return for each pattern, it can begin live trading. The program executes trades based on the value of the expected return. If the expected return is positive, the algorithm buys the asset with all the capital available in the account and sells at the end of the chosen time period, which in this case is one trading day. If the value is negative, the algorithm short sells the asset. (Short selling means borrowing shares from a broker and selling them now, with the obligation to buy them back later and return them; money is made by selling while the price is high and buying back when it is lower.) Step by step, the algorithm works as follows: a pattern occurs in the testing data (or in real-time stock data); the algorithm looks the pattern up in its trained pattern list; it retrieves the expected return for that pattern; and it executes a long or short trade depending on whether that value is positive or negative.
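The daily trade decision can be sketched as below. The function name and the simplifications (entering at the open and exiting at the close of the same day, fractional shares, no fees or borrowing costs for the short) are assumptions made for illustration only.

```python
def trade_one_day(capital, expected, open_price, close_price):
    """Return the account balance after one day's round-trip trade.

    Long with all capital if `expected` > 0, short if < 0, flat if == 0.
    Assumptions: enter at the open, exit at the close, fractional shares,
    no commissions, no borrowing fees on the short side.
    """
    shares = capital / open_price
    if expected > 0:   # buy at the open, sell at the close
        return capital + shares * (close_price - open_price)
    if expected < 0:   # sell borrowed shares at the open, buy them back at the close
        return capital + shares * (open_price - close_price)
    return capital


print(trade_one_day(100_000.0, 0.4, 50.0, 51.0))   # → 102000.0 (long, price rose)
print(trade_one_day(100_000.0, -0.2, 50.0, 49.0))  # → 102000.0 (short, price fell)
```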

Analyzing Big(O)

The algorithm can be broken down into three main pieces: pattern generation, training, and execution. Pattern generation is the most expensive part of the process in terms of computational complexity: it is $O(2^n)$, where $n$ is the length of the patterns to be generated.

Figure 2. Pattern creation time efficiency


This part of the algorithm must generate every pattern of length $n$ over the symbols {1, 0}. Since each position has two possible values, there are $2^n$ such patterns (the size of the symbol set raised to the power of the pattern length). For this project we used patterns of length 5, giving $2^5 = 32$ patterns. The computational cost grows exponentially as the pattern length increases. Fortunately, patterns in historical stock prices longer than about 5 or 6 have no statistically significant predictive power, so running the algorithm beyond this threshold is almost meaningless. Nonetheless, this step is still $O(2^n)$.
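Python's `itertools.product` enumerates exactly these $2^n$ strings (strictly speaking they are sequences with repetition rather than permutations). A short sketch with a hypothetical `generate_patterns` helper:

```python
from itertools import product


def generate_patterns(length):
    """Return all 2**length up/down patterns over the symbols '0' and '1'."""
    return ["".join(bits) for bits in product("01", repeat=length)]


patterns = generate_patterns(5)
print(len(patterns), patterns[:3])  # → 32 ['00000', '00001', '00010']
```

The list doubles with each extra symbol of pattern length, which is the exponential cost described above.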

The next part of the algorithm is much less time-consuming. It trains the program by iterating through a list of historical prices: for each window of length 5, we check the list of patterns generated in the first stage and update the resulting price change statistics. Its time complexity is $O(2^5 \cdot n)$, where $n$ is now the number of historical prices; because the pattern length is fixed, $2^5$ is a constant and this simplifies to $O(n)$.
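A minimal sketch of the training pass, assuming daily closing prices as input. Instead of scanning the list of 32 patterns for each window, it uses a dictionary keyed by pattern; this is a swapped-in design choice, not the method described above, though both are O(n) when the pattern length is fixed.

```python
from collections import defaultdict


def train(closes, length=5):
    """Map each observed up/down pattern to the percent changes that followed it.

    One pass over the data: each window of `length` moves is encoded as a bit
    string, and the percent change of the next candle is recorded under it.
    """
    changes = [(cur - prev) / prev * 100 for prev, cur in zip(closes, closes[1:])]
    bits = "".join("1" if c > 0 else "0" for c in changes)  # flat counts as down
    stats = defaultdict(list)
    for i in range(len(bits) - length):
        stats[bits[i:i + length]].append(changes[i + length])
    return dict(stats)
```

Each window costs a constant amount of work (a slice of length 5 and a dictionary update), so training is linear in the number of prices. Each list in the result can then be reduced to a single expected return per pattern.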

The execution portion of the algorithm is also $O(n)$. For each pattern passed into the program, it checks the pattern list from the training step and trades based on the expected return found. For each pattern, the algorithm iterates over a list of $2^5 = 32$ entries and performs a few constant-time operations to buy or sell the stock. If we run the program for $n$ time periods, the time complexity is $O(n \cdot (2^5 + k))$, where $k < 5$ is the number of trading operations per period, which simplifies to $O(n)$.

Analyzing Performance

Figure 3. Returns for Pattern Searching algorithm

This algorithm performed better than our benchmark, the traditional buy-and-hold method of investing. It also lowered the standard deviation of returns, which for investors means less risk and variability. An investor using the pattern searching algorithm would therefore see returns that are less volatile and more predictable.

Figure 4. Graph of Pattern Searching returns

As Figure 4 shows, the returns are not all positive, but they were less extreme than those of the buy-and-hold alternative. The graph fluctuates heavily because the program executes a round-trip trade (buy and sell) every day. This contrasts with the RSI algorithm's trades, graphed in Figure 10. More detailed performance and comparison data are available in the conclusion.


Relative Strength Index Detecting Algorithm

The main tool used in this automated trading algorithm is the Relative Strength Index (RSI). The RSI was first introduced in Commodities Magazine in 1978 by technical analyst J. Welles Wilder. It is a price momentum indicator that compares the magnitude of recent gains and losses over a specified period of time, and it is one of the most commonly used indicators in financial markets. While there are many unknowns in the financial market, an asset's price chart is a universal language on which traders base critical decisions. Like many other indicators, the RSI tends to become more reliable the more technical analysts use it, because market participants are largely technical analysts of varying skill reacting to the same signals. In this sense, the price chart and RSI chart of any asset are records of market consensus.

In simple terms, the RSI is a signaling tool that shows whether the price of a stock or commodity is moving too far, too quickly. If a stock's price rises too much within a relatively short period, the stock is generally classified as overbought. Conversely, if its price falls too much too quickly, it is oversold. Because traders monitor price action constantly, the market almost always reacts to these two extremes, which can cause the price to reverse direction suddenly. Overbought conditions are generally a sell signal, because the market collectively believes the stock is overvalued. Oversold conditions are the opposite: the market believes the stock is undervalued and a price rebound is imminent, so traders rush to fill the order book with buy orders, which temporarily drives the price up.

This automated trading algorithm is built specifically to detect the oversold scenario because, in a rational market, a small price recovery after an oversold condition is highly likely.

Using the RSI Formula

The Relative Strength Index is a number between 0 and 100, calculated using the formula in Figure 5.

Figure 5. The RSI Formula.
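Because Figure 5 is an image, we restate the RSI formula as it is commonly defined (for example by StockCharts.com), assuming Figure 5 shows this standard form. Losses are taken as positive magnitudes, and sources differ on whether the averages are simple averages or use Wilder's smoothing:

$$
RS = \frac{\text{average gain over the last } N \text{ periods}}{\text{average loss over the last } N \text{ periods}},
\qquad
RSI = 100 - \frac{100}{1 + RS},
\qquad N = 14
$$

If the average loss is zero, RS is undefined and the RSI is taken to be 100.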


It is theoretically possible for the RSI to reach 0 or 100, but a rational market tends not to behave this way. An RSI of 0 indicates that an asset is extremely oversold, and an RSI of 100 indicates an extreme overbought condition. In a rational, healthy market, the RSI tends to fluctuate between 20 and 80. Larger assets tend to be less volatile, so their RSI fluctuates less. For example, Apple's market capitalization is over 800 billion U.S. dollars, so it generally takes a large amount of money to move its price. As a result, its RSI tends to oscillate within a channel between 30 and 70 (Figure 6).

Figure 6. Apple price chart and RSI from TradingView.com

We can immediately see that whenever Apple's weekly RSI nears the lower boundary (RSI 30), the price moves upward almost immediately. Apple has been in an uptrend for years and receives a high level of attention from traders, who see oversold conditions as buying opportunities. When the RSI approaches 30, the market collectively rushes in to “prevent” the price from falling further, and this sudden buying pressure drives the price up. This phenomenon is known in the financial world as an oversold bounce, and oversold bounces happen frequently across a variety of markets. See more examples in Figure 7 and Figure 8.


Figure 7. Johnson & Johnson RSI and price chart in a weekly timeframe.

Figure 8. Office Depot RSI and Price Chart in a daily timeframe.


Incorporating RSI into the Algorithm

The main function of this automated trading algorithm is to take the place of a trader who uses the RSI as his or her main tool. The logic is that when the RSI falls below a certain threshold, a buy order is placed; conversely, a sell order is placed when the RSI crosses an upper limit.
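The threshold logic can be sketched as a small Python function. The 30/70 defaults are taken from the Apple channel discussed earlier; the function name and the use of strict inequalities are illustrative assumptions.

```python
def rsi_signal(rsi, lower=30.0, upper=70.0):
    """Map an RSI reading to 'buy', 'sell', or 'hold' using a fixed channel."""
    if rsi < lower:
        return "buy"    # oversold
    if rsi > upper:
        return "sell"   # overbought
    return "hold"


print(rsi_signal(27.5))  # → buy
```

Other assets would need their own thresholds, matching their own healthy RSI channels.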

The data processing structure of this trading algorithm is minimalistic (Figure 9). The RSI timeframe this algorithm uses is 14 candlesticks, which means that when calculating the RSI, the algorithm always uses the current candlestick and the previous 13. Because the window is fixed at 14 candlesticks, each calculation takes constant time. This constant-time behavior can be particularly beneficial in markets without closing hours, such as the foreign exchange market.

Figure 9. The simplified data structure of the trading algorithm.

There are three main components in this data processing structure: data receiving, data checking, and trade executing. The data receiving component gathers real-time data from a source on the internet, such as Google Finance or Nasdaq. Once the connection is established, each update takes O(1) time. The real-time data received are converted into a candlestick, which holds all the candlestick properties such as the open price, close price, and period high; this conversion is also O(1). For this project, the data gathering process was modified to receive data from a pre-built price simulator. All price data are historical data from Nasdaq (Nasdaq); this substitution does not affect the algorithm's overall functionality or efficiency.
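The O(1) candlestick conversion can be sketched as a small class that folds each incoming price tick into the current candle. The `Candle` class and its methods are hypothetical and only illustrate why each update is constant time; a real feed would also need a rule for when to start a new candle.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """One candlestick, built incrementally from incoming price ticks."""
    open: float
    high: float
    low: float
    close: float

    @classmethod
    def start(cls, price):
        """Open a new candle at the first tick of a period."""
        return cls(open=price, high=price, low=price, close=price)

    def update(self, price):
        """Fold one more tick into the candle; constant work per tick."""
        self.high = max(self.high, price)
        self.low = min(self.low, price)
        self.close = price


candle = Candle.start(10.0)
for tick in (10.5, 9.8, 10.2):
    candle.update(tick)
print(candle)  # → Candle(open=10.0, high=10.5, low=9.8, close=10.2)
```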


The data checking component is the backbone of this algorithm, and all of the RSI comparison strategies described above are embedded in it. Each incoming candlestick is combined with the previous 13 candlesticks to produce an RSI value. Since the number of candlesticks involved in this calculation is always 14, the runtime is O(1).
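A minimal sketch of a fixed-window RSI in Python. It is an illustration under stated assumptions, not the project's code: it uses simple averages of gains and losses (often called Cutler's RSI) rather than Wilder's smoothed averages, and since 14 price changes require 15 closing prices, it keeps `period + 1` closes in a bounded `collections.deque`. Because the window size is fixed, each update does a constant amount of work.

```python
from collections import deque


class RollingRSI:
    """Simple-average RSI over the last `period` price changes."""

    def __init__(self, period=14):
        self.period = period
        self.closes = deque(maxlen=period + 1)  # oldest close drops out automatically

    def update(self, close):
        """Add a close; return the RSI, or None until the window is full."""
        self.closes.append(close)
        if len(self.closes) <= self.period:
            return None
        prices = list(self.closes)  # period + 1 items: constant for a fixed period
        gains = losses = 0.0
        for prev, cur in zip(prices, prices[1:]):
            change = cur - prev
            if change > 0:
                gains += change
            else:
                losses -= change  # store losses as positive magnitudes
        if losses == 0:
            return 100.0
        rs = gains / losses  # ratio of the averages: the 1/period factors cancel
        return 100.0 - 100.0 / (1.0 + rs)


r = RollingRSI(period=2)
print([r.update(p) for p in (10, 11, 10)])  # → [None, None, 50.0]
```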

The trading component uses the RSI value from the checking component to determine whether the buying or selling condition is met. These conditions are based on the targeted asset's moving averages and RSI fluctuation channel. For instance, Apple's healthy RSI channel is 30 to 70; if the returned RSI value is below 30, an oversold condition is present, and this triggers the algorithm's buy function. After a buy order is executed, the algorithm has two possible actions: holding or selling. These actions can be adjusted to the operator's preference, and the trading style can be shifted from conservative to aggressive by increasing the expected profit percentage. Each newly generated RSI value goes through the same fixed number of conditional statements, so this component's runtime is O(1).
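The buy-then-hold-or-sell behavior with an adjustable profit target might look like the following sketch. The class name, the parameters `lower` and `target_pct`, and the long-only, no-stop-loss design are all hypothetical choices for illustration.

```python
class RsiTrader:
    """Long-only sketch: buy on an oversold RSI, sell once a profit target is hit."""

    def __init__(self, lower=30.0, target_pct=5.0):
        self.lower = lower            # hypothetical RSI buy threshold
        self.target_pct = target_pct  # hypothetical profit percentage that triggers a sale
        self.entry = None             # entry price while a position is open

    def on_candle(self, rsi, close):
        """Return 'buy', 'sell', or 'hold' for this candle (constant work per call)."""
        if self.entry is None:
            if rsi is not None and rsi < self.lower:
                self.entry = close
                return "buy"
            return "hold"
        if close >= self.entry * (1 + self.target_pct / 100):
            self.entry = None
            return "sell"
        return "hold"


trader = RsiTrader(lower=30.0, target_pct=5.0)
for rsi, close in [(25.0, 100.0), (40.0, 103.0), (55.0, 106.0)]:
    print(trader.on_candle(rsi, close))  # buy, hold, sell on separate lines
```

A larger `target_pct` keeps the position open until a bigger gain appears. Without a stop loss, a position could be held indefinitely if the target is never reached.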

Altogether, the algorithm runs in constant time per candlestick, since each of its main components does. On longer timeframes, such as hourly or daily, runtime efficiency may not matter much. However, in micro trading (executing multiple complete trades within a split second), runtime efficiency plays a large role in profitability.

Testing and Results

Finally, an automated trading algorithm must prove its worth by making profits. Because of time constraints, this algorithm was tested on one year of simulated data from historical charts, which lets us compare its performance to traditional long-term investing. Each candlestick represents one day of data, and there are 235 candlesticks in the test, because exchanges close on holidays and weekends. Six diversified stocks were chosen to keep the test fair: Facebook, Johnson & Johnson, Office Depot, Bank of America, GoPro, and JCPenney. The initial balance for the test is 100,000 U.S. dollars, and the daily account balance is graphed for comparison (Figure 10).


Figure 10. BAC, FB, and JNJ performance chart.

Only three of the six stocks are shown in the chart, for readability. One of the main things that stands out from these graphs is that very few trades were executed over the one-year period. For Bank of America (BAC), only two trades were completed, and Johnson & Johnson (JNJ) completed three. This shows that the algorithm's trade conditions were very conservative: the RSI values for these stocks rarely dropped below the lower bound of their healthy range. The number of trades can be increased by loosening the trade conditions. With its current trading style, 5 out of 6 stocks were profitable using this algorithm.
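For reference, the buy-and-hold benchmark return can be computed as in this sketch. Buying whole shares at the first close with the $100,000 starting balance and selling at the last close, with no fees, are our assumptions about how the benchmark is simulated.

```python
def buy_and_hold_return(closes, initial=100_000.0):
    """Percent return from buying at the first close and selling at the last.

    Assumptions: whole shares only, leftover cash stays uninvested, no fees.
    """
    shares = int(initial // closes[0])
    cash = initial - shares * closes[0]
    final_value = cash + shares * closes[-1]
    return (final_value - initial) / initial * 100


print(buy_and_hold_return([40.0, 50.0, 30.0]))  # → -25.0
```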


Keyword Trading Algorithm

The Keyword Trading Algorithm (KTA) was created to assess the importance of public opinion in predicting stock values. Public opinion is one of the largest unquantifiable variables affecting a stock's success: what the public thinks or says about a company can heavily influence the price at which its shares trade. It is important to examine mass consumer opinion because consumers generate revenue within the market. If consumers are buying a company's products, the company is generating revenue, so it can be inferred that consumer spending is directly correlated with a business's revenue. For publicly traded companies, fluctuations in stock value can also be driven by the amount of attention the company receives across different types of media. An example is Apple in the weeks before and after its keynote events. At these events, Apple announces its latest products, which generates heavy media coverage and market predictions (Booton). Apple's stock price rises on anticipation alone, as consumers eagerly await the new announcements (Booton). In cases like these, the Keyword Trading Algorithm can examine public opinion and predict whether it is an opportune time to buy, hold, or sell.

How KTA Works:

To calculate such an abstract variable, the KTA searches for articles referencing a specific company on a specific day. A pool of articles is returned, and three articles are randomly selected from that pool and compiled into a single text file. The text file is then scanned for duplicate words, and each word is assigned a value equal to its number of occurrences. For example, if Johnson & Johnson were searched, the algorithm would return an array of words with the number of times each was repeated across all three articles. The output would look like the text below:

[johnson=26, our=22, JNJ=19, the=18, and=13, information=8, news=7, sell=3, for=6, privacy=6]

Once the array is returned, it is checked against two other arrays in search of keywords. Keywords are a predetermined list of words, each of which carries a value depending on its meaning. There are two types of keywords, positive and negative, and they are used to calculate a variable named opinionScore. The opinionScore is the final number that determines whether the algorithm buys, holds, or sells. Before every trade, the KTA must complete these steps to produce an accurate assessment.
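The article selection and word counting steps can be sketched as follows. The function name is hypothetical, fetching and scraping the articles is left out, and lower-casing words and splitting on runs of letters are assumed tokenization rules that may differ from the original.

```python
import random
import re
from collections import Counter


def word_counts(articles, k=3, seed=None):
    """Randomly pick k articles, merge their text, and count every word."""
    rng = random.Random(seed)
    chosen = rng.sample(articles, min(k, len(articles)))
    text = " ".join(chosen)
    # Assumed tokenization: lower-case runs of letters.
    return Counter(re.findall(r"[a-z]+", text.lower()))


pool = ["Analysts say sell JNJ.", "Why investors sell.", "JNJ news today."]
print(word_counts(pool)["sell"])  # → 2
```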

In the Johnson & Johnson example, the word “sell” appeared three times in the text file, so it is assigned a weight of 3 (an int). The word “sell” is then checked against the negativeArray first. The negativeArray is a list of keywords that subtract from the opinionScore according to the weight of the word. If the word is not found in the negativeArray, the algorithm repeats the step for the positiveArray, a list of keywords that add to the overall opinionScore. Each array contains twenty words (Figure 11).

Figure 11. Keyword Arrays.

In the Johnson & Johnson example, the only keyword found was “sell,” which is one of the words in the negativeArray. This gives an opinionScore of -3 for the date on which the algorithm was executed. If the opinionScore is positive, the algorithm sells its current volume; if it is zero, the algorithm holds its current volume; and if it is negative, the algorithm buys. In this case the opinionScore is negative, so the algorithm is instructed to buy. This is implemented in Figure 12.

Figure 12. OpinionScore Calculation.
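Combining the keyword weighting with the trading rule gives the sketch below. The keyword sets are hypothetical placeholders (the real arrays in Figure 11 hold twenty words each), negative keywords are checked before positive ones as described in the text, and the buy/hold/sell rule is read as depending on the sign of the score. Python sets are used so each keyword lookup is constant time.

```python
# Hypothetical placeholder keywords; the real arrays (Figure 11) hold twenty each.
NEGATIVE_KEYWORDS = {"sell", "loss", "lawsuit", "downgrade"}
POSITIVE_KEYWORDS = {"buy", "growth", "upgrade", "record"}


def opinion_score(counts):
    """Add the weight of positive keywords and subtract that of negative ones."""
    score = 0
    for word, weight in counts.items():
        if word in NEGATIVE_KEYWORDS:      # negative list is checked first
            score -= weight
        elif word in POSITIVE_KEYWORDS:
            score += weight
    return score


def kta_action(score):
    """Contrarian rule: buy on negative opinion, sell on positive, hold at zero."""
    if score < 0:
        return "buy"
    if score > 0:
        return "sell"
    return "hold"


jnj = {"johnson": 26, "our": 22, "jnj": 19, "the": 18, "sell": 3, "news": 7}
score = opinion_score(jnj)
print(score, kta_action(score))  # → -3 buy
```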

The reasoning behind this method of trading is to buy when public opinion is low, on the assumption that the price will also be low: when there is an influx of sellers, the stock's value can drop significantly. The rationale for selling during a period of high public opinion is that there will most likely be an influx of buyers, pushing the market toward a peak. The overall goal of the algorithm is to sell when the public is buying and buy when the public is selling. Note that when the opinionScore stays at zero, we assume the public is indifferent toward the stock, and it is in our best interest to hold.


Analyzing Big(O)

The algorithm consists of three critical parts: Searching, Calculating Opinion, and Trading. The first part is the Searching method. The KTA must iterate through n Google search results that meet the time and date requirements to build a pool to select from. This selection process is O(n) because the number of URLs it must read through varies. Although the algorithm has a time cap for iterating through URLs, n is not a constant; the cap only prevents the algorithm from iterating through the entire Google search. Once the URLs have been found, the algorithm compiles the selected pages into one text file: through a process called web scraping, the text of each website is added to the file. This process is also O(n) but occurs separately, so the Big(O) for this part is O(n + n), which simplifies to O(n).

The next part of the algorithm is the Calculating Opinion section. This part iterates through the $n$ words in the text file. For each word, the algorithm counts its duplicates by iterating through the whole text file again. This makes Calculating Opinion the most demanding part of the algorithm, at $O(n^2)$. After finding the duplicates, the function compares the duplicates array to both the negative and the positive keyword arrays; this step is $O(40n + 40)$. The duplicates array lists up to $n$ repeated words from the text file, and the number assigned to each word (its count) is called its weight. To calculate the opinionScore, the algorithm must determine which words appear in both the duplicates array and the keyword arrays. There are 40 keywords in total, each with a positive or negative value, and all 40 must be iterated through before the calculation can be made. The calculation itself is simple addition, which is $O(1)$, performed for each matching keyword using the weight of the corresponding duplicate word. Altogether, $O(n^2 + 40n + 40)$ simplifies to $O(n^2)$.
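To illustrate the complexity difference, the sketch below contrasts the quadratic counting described above with a single pass using a hash map (`collections.Counter`), which runs in O(n) expected time. The hashed version is a possible improvement, not the implementation behind the results reported here.

```python
from collections import Counter


def count_duplicates_naive(words):
    """For every word, rescan the whole list: n scans of length n, O(n^2)."""
    return {word: words.count(word) for word in words}


def count_duplicates_hashed(words):
    """One pass with a hash map: O(n) expected time."""
    return dict(Counter(words))


words = "sell the stock sell the news".split()
print(count_duplicates_naive(words) == count_duplicates_hashed(words))  # → True
```

With the hashed version, the remaining steps of Calculating Opinion are linear in n, so this part would no longer dominate the KTA's running time.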

The final part of the algorithm is the least taxing of all the components. Depending on the opinionScore calculated in the Calculating Opinion section, the algorithm decides what to do: as stated earlier, if the opinionScore is negative the algorithm buys stock, if it is positive the algorithm sells its stock, and if it is zero the stock is held until further notice. Because this is implemented with a single if/else chain, this part is O(1).


KTA Results

After running year-long trading simulations on six different stocks, we found that the Keyword Trading Algorithm performed better than the benchmark trading method. The buy-and-hold benchmark showed an average loss of 28.75% over a year of buying, holding, and selling. This benchmark was applied to six stocks from various markets to provide a traditional baseline for measuring success. When the Keyword Trading Algorithm ran the same simulations on the same data, the results showed a loss of only 1.71%. The large-cap stocks, Bank of America, Facebook, and Johnson & Johnson, showed similar results to one another, with KTA profits varying slightly, between 3% and 4%. On the small-cap stocks, the KTA matched the benchmark in terms of whether each stock gained or lost, as seen in Figure 14. However, unlike the benchmark, the KTA reduced the amount of money lost on every small-cap stock.

The data from the six year-long simulations can be seen in Figure 13. Each simulation represents a year of trading starting at $100,000 USD. The graph shows that the large-cap stocks benefited the most from the algorithm: Bank of America, Facebook, and Johnson & Johnson all showed profits that grew at an almost linear rate. Although the small-cap stocks did not receive the same benefits, the algorithm reduced the rate of loss as those stocks lost value. JCPenney, GoPro, and Office Depot all declined, but compared to the benchmark the algorithm reduced the risk of trading them. Although the algorithm is $O(n^2)$, it provides consistent, low-risk trading. Each trade can take anywhere from 0.387 to 2.763 seconds. The runtime varies this widely because of the algorithm's Google search step; occasionally it would hang for a second because of its reliance on an internet connection.

Figure 13. Graph of KTA Results.


Conclusion

Figure 14. The results of different investing methods.

By comparing our algorithms' performance and profitability to the traditional method of investing, we can determine whether we have achieved our objectives. The benchmark, the traditional method of buying and holding for the long term, uses the same stock portfolio. Although some stocks in the portfolio performed well, with gains of 20% and above, overall profits are dragged down severely by underperformers. Overall, long-term investing suffers a 28.75% loss, a significant portion of the original investment. In contrast, all three of our algorithms did better than the benchmark. Although our algorithms work differently, one thing worth noting is that all of them avoided JCPenney's (JCP) severe price crash. All investors want to make profits one way or another in the financial market, but what they want most is not to lose money. We can conclude that all three of our algorithms outperformed the benchmark, and for the most part we achieved our goals. Among them, one algorithm is noticeably more efficient and effective than the other two. The Relative Strength Index (RSI) algorithm was the only one with a positive average return for the year, as seen in Figure 14. It did not produce the highest yields, but it made a profit on almost every stock; its only loss was on Johnson & Johnson. Besides being the most consistent, it also runs in $O(1)$ time per candlestick. Keyword Trading ($O(n^2)$) and Pattern Searching ($O(2^n)$ pattern generation, with linear training and execution) are both more expensive and did not produce gains as consistent as the RSI algorithm's.


Future Work

In future work on the Pattern Searching algorithm, the pattern generation portion of the program could be made more robust. By focusing only on price action, we miss many other factors that may need to be considered to find a pattern with predictive power. The training data could also be optimized for length and time period; more extensive testing might show that the algorithm performs better over certain periods with certain training data. Another possible change is price granularity. We used daily price action to test the algorithm, but price action differs across timeframes, so truly optimizing performance would require testing other timeframes. Overall, this algorithm performed better than the traditional method of investing, but changes are needed before it is truly actionable on the stock market.


Works Cited

Booton, Jennifer. “Calm the iPhone 8 Buzz, Apple Analyst Warns.” MarketWatch, 22 Feb. 2017, http://www.marketwatch.com/story/calm-the-iphone-8-buzz-apple-analyst-warns-2017-02-21. Accessed 29 Oct. 2017.

“NASDAQ's Homepage for Retail Investors.” Nasdaq, http://www.nasdaq.com/. Accessed 30 Sept. 2017.

“Relative Strength Index (RSI).” StockCharts.com, StockCharts.com Inc. Web. Accessed 24 Sept. 2017.

Wilder, J. Welles. New Concepts in Technical Trading Systems. 1978. Print. Accessed 24 Sept. 2017.
