FT 500 Assignment

Study of Financial Use Cases influenced by Machine Learning

Presented By:

Ankit Jain (Group Leader)

Haresh Jani

Gurpreet Kaur Bilkhu

Pratiksha Mishra

INTRODUCTION TO MACHINE LEARNING (ML):

ML is a technique within Artificial Intelligence (AI) in which a computer autonomously learns from the data and information fed to it, using algorithms. The computer analyses the data to decide which moves offer the best chance of achieving the desired result, improving its chances of success over time; moves that lead to success are fed back into the algorithm.

ML has many use cases, such as: Recommendation Systems; Natural Language Processing; Medical Diagnosis; Object Recognition and Tracking; Mining ‘Big Data’ (Analytics); and Classification and Clustering of Data (Fraud Detection, Sequence Mining, etc.).

All of these are based on the concept of learning from past data and predicting the outcome for an unseen or new situation, much the way humans learn. The advantage computers have is that they can process data at far greater scale and complexity, and draw useful inferences from it, in ways that are simply impossible for human beings.

In today’s environment, where trillions of gigabytes of data are generated every day, it is impossible for humans to process it all and draw useful inferences. ML techniques beat humans on both scale and complexity, and as these systems keep learning over time, their predictions may well surpass human performance.

ML in some form has been implemented in almost every industry, and a trader can likewise use it to gain an advantage and improve profits; ML and trading go hand in hand. Top traders and hedge fund managers have been using ML algorithms extensively, including Deep Learning, to make better predictions and earn higher returns.

This report examines the use of ML techniques in share trading and price prediction.

USE CASE: TRADING AND PREDICTION OF STOCK PRICES

PROBLEM STATEMENT AND UNDERSTANDING THE PROBLEM:

People engaged in share trading must weigh many factors at the macro and micro levels: global, country-specific, industry-specific, company-specific, and so on. News floods in continuously, affecting share prices and stock market behaviour, and a trader has to manage many data sources and identify new patterns to generate trading ideas and make better trading or investing decisions.

Individual investors, funds and large trading houses face thousands of stocks to pick from every day, which is a daunting task. Using ML, one can do all the number crunching: scan news media, social media, blogs and real-time quotes, screen thousands of stocks in real time, and surface the best ideas. This is where ML technology comes in.

With so much real-time streaming news, mining information from these unstructured data sets becomes very important, and traders need new technology to handle it. With ML and Deep Learning, one can now analyse these unstructured data sets and extract trading insights that were previously out of reach.

Further, empirical results in this area are often barely or borderline statistically significant, partly because markets, being driven in part by human emotion, involve a large degree of error. One can address this today with various ML techniques, including Natural Language Processing, which lets a computer understand the semantics and meaning of how people say something; applied to news, this could be something positive or negative about certain companies, and that is what one calls Sentiment Analysis. A tool can be built that leverages all the sentiment collected from traders, news and blogs, together with data collected from transactions. For example, one can collect insider-trading data sets to know which CEO or CXO is buying or selling stock in which company; integrating this transaction data with traders' sentiment yields a better score of how people think about a set of stocks.

If a trader wanted to acquire an equity position, knowing whether to buy now or to wait and re-evaluate in N seconds could allow the trader to purchase the stock at a lower price than was previously expected. Over time this could make a significant difference to the profitability of the strategy. As there is little incentive to publish such methods in the academic literature, there is a lack of published working models; the monetary incentive to instead sell them to a trading firm is much greater.

This may result in a glut of research arguing that the market is efficient, and thus unpredictable; this is called the “reverse file-drawer” bias. It is also possible that many traditional forms of stock market prediction are simply inadequate, or that sponsoring companies do not wish to divulge successful applications. Nevertheless, the existence of inefficiency and moments of predictability can be demonstrated using millions of stock transactions.

When predicting stock price direction, practitioners typically use one of three approaches. The first is the fundamental approach, which examines the economic factors that drive the price of a stock (e.g. a company’s financial statements such as the balance sheet or income statement). The second approach is to use traditional technical analysis to anticipate what others are thinking based on the price and volume of the stock: indicators are computed from past prices and volumes, and these are used to foresee future changes in prices. The goal of technical analysis is to identify regularities by extracting patterns from noisy data and by inspecting the stock charts visually. Studies show that 80% to 90% of polled professionals and individual investors rely on at least some form of technical analysis. With recent breakthroughs in technology and algorithms, technical analysis has morphed into a more quantitative and statistical approach; this is called quantitative technical analysis, and it is the third approach to predicting market direction. Whereas traditional technical analysis is visual, quantitative technical analysis is numerical, which allows one to easily program the rules into a computer.
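To make quantitative technical analysis concrete, here is a minimal Python sketch (our illustration, not taken from any cited study) that computes two classic indicators, a simple moving average and momentum, and turns them into a naive programmable signal; the price series and window lengths are assumptions chosen purely for illustration, and the pandas library is assumed to be installed.

import pandas as pd

# Illustrative daily closing prices (assumed data).
prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1,
                    102.7, 104.0, 105.2, 104.8, 106.1])

# Simple moving average over a 3-day window: smooths out short-term noise.
sma = prices.rolling(window=3).mean()

# Momentum: difference between today's close and the close 3 days ago.
momentum = prices.diff(periods=3)

# A naive, programmable rule: +1 (buy) when price is above its SMA with
# positive momentum, -1 (sell) when below with negative momentum, else 0.
signal = (((prices > sma) & (momentum > 0)).astype(int)
          - ((prices < sma) & (momentum < 0)).astype(int))

print(pd.DataFrame({"close": prices, "sma3": sma,
                    "mom3": momentum, "signal": signal}))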

Markets do not remain stable; indicators that are highly predictive at one moment may cease to be so as more traders spot the patterns and implement them in their trading approaches. Widespread adoption of a particular trading strategy is enough to drive the price up or down and eliminate the pattern. This concept drift complicates the learning of models and is unique to streaming data. As the concept changes, model performance may decrease, requiring an update of the training data and/or a change in the quantitative technical analysis indicators used as attributes. Modern ML classification techniques provide solutions, and these, combined with quantitative technical analysis, allow one to outperform existing published methods of predicting stock market direction.
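One common response to concept drift, sketched below on assumed synthetic data, is to retrain the classifier periodically on a sliding window of the most recent observations. The window and step sizes and the use of scikit-learn's LogisticRegression are illustrative choices, not a prescription; in the synthetic stream the true relationship flips halfway through, so accuracy dips at the drift point and recovers once the window has refilled.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stream of two indicator features; the label rule drifts at t=1000.
X = rng.normal(size=(2000, 2))
drift = np.arange(2000) >= 1000
y = np.where(drift, X[:, 0] - X[:, 1] > 0, X[:, 0] + X[:, 1] > 0).astype(int)

window, step = 300, 100  # retrain on the last `window` points every `step` points
for t in range(window, 2000 - step, step):
    model = LogisticRegression().fit(X[t - window:t], y[t - window:t])
    acc = model.score(X[t:t + step], y[t:t + step])  # accuracy on the next chunk
    print(f"t={t:4d}  next-chunk accuracy={acc:.2f}")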

According to a very interesting study, the robo-advisory and financial planning done today assumes that a person will stick to a strategy for 30 or 35 years, but the study shows that most people change their strategy every 3 to 5 years; the assumption underlying these robo-advisors therefore does not hold for all users. So one needs to build new technology that takes people's behaviour into consideration and produces a more adaptive asset allocator. ML and Deep Learning are allowing financial firms and traders to analyse unstructured data (such as financial information on news sites, blogs and social media) and reveal patterns not previously identifiable by human eyes alone, allowing an entirely new approach to, and accuracy in, trading decisions. There is a need for robo-advisors that better take into account and integrate individuals' behaviour patterns alongside their stated financial objectives, resulting in more adaptive and targeted investments.

Algorithmic & High Frequency Trading & ML:

Technological innovation accelerated the emergence of trade transactions dominated by predetermined algorithms, in which certain trades are executed when conditions specified in the algorithm are met, checked against market data received from external sources. High Frequency Trading (HFT) is a specialized case of Algorithmic Trading (AT) involving the frequent turnover of many small positions in a security. In fact, ML is the very basis of AT and HFT.

Technological innovation also delivered low latency for trade confirmations, which opened another channel of algorithmic trading: co-location. Algorithmic and high-frequency trading depend entirely on information advantage and on moving information and trade commands at lightning speed; success or loss hinges on the flow of information. An HFT strategy rests on knowing something, or algorithmically decoding some signal, before everyone else does. Lately, the limiting factor in fast trading has been not computing power but communication power; thus, firms are paying to construct ultrafast cables between financial centers. Pasquale argues that technology does not necessarily drive markets toward the goal of ever-faster trading: the social practice of buying and selling stocks is fundamentally malleable, and technology makes it ever more malleable. The pursuit of speed of ordering for its own sake has now reached the point that it rewards the purchasing power of certain traders (their ability to buy access to mountain-spanning cables) over their skill at allocating capital. Most HFT firms are run by scientists and engineers, and it is unlikely that they pay close attention to economic fundamentals or maintain a map of market structure that updates as fundamentals change. Evidence has established that although HFT helps synchronize prices, it is not a panacea for markets: when prices are tightly connected to one another, errors can quickly propagate throughout the financial system if safeguards are not in place, and if shared misconceptions exist among investors, they are amplified so that prices are less accurate overall.

Finally, synchronization can create spurious structure in markets if information about the changing relationships of securities does not make its way to the high-frequency domain. It has also been shown in the past that AT and HFT can be used to manipulate markets using techniques such as quote stuffing, layering (spoofing) and momentum ignition. Evidence suggests that market manipulation algorithms lead to decreased liquidity, higher trading costs, increased short-term volatility, degraded performance and fill rates, and massive price moves backed by false volume.

MACHINE LEARNING TECHNIQUES FOR SHARE TRADING:

Although there are various ML techniques that can be used for share trading and price-movement prediction, we have chosen to focus on two: Clustering (unsupervised clustering of news) and Sentiment Analysis. Both techniques are discussed below.

A) CLUSTERING:

Introduction: Clustering is the method of identifying similar groups of data in a data set; entities in each group are more similar to entities of that group than to those of other groups. This section covers the types of clustering, the main clustering algorithms, and a comparison of two of the most commonly used clustering methods. Put differently, clustering is the task of dividing the population of data points into groups such that data points in the same group are more similar to one another than to those in other groups; in simple words, the aim is to segregate groups with similar traits and assign them to clusters.

Types of Clustering:

Broadly speaking, clustering can be divided into two subgroups:

Hard Clustering: Each data point either belongs to a cluster completely or not. For example, each customer of a retail store is put into exactly one of, say, 10 segments.

Soft Clustering: Instead of putting each data point into exactly one cluster, a probability or likelihood of that data point belonging to each cluster is assigned. In the retail example, each customer is assigned a probability of belonging to each of the store's 10 clusters.

Types of clustering algorithms: Since the task of clustering is subjective, the means available for achieving it are plentiful; every methodology follows a different set of rules for defining ‘similarity’ among data points. In fact, more than 100 clustering algorithms are known, but only a few are popularly used:

Connectivity models: As the name suggests, these models are based on the notion that data points closer together in data space are more similar to each other than data points lying farther away. They can follow two approaches: start with every data point in its own cluster and aggregate clusters as the distance decreases, or start with all data points in a single cluster and partition it as the distance increases. The choice of distance function is subjective. These models are very easy to interpret but lack the scalability to handle big datasets. Examples are the hierarchical clustering algorithm and its variants.

Centroid models: These are iterative clustering algorithms in which the notion of similarity is derived from the closeness of a data point to the centroid of a cluster. The K-Means clustering algorithm is a popular algorithm in this category. In these models, the number of clusters required at the end has to be specified beforehand, which makes it important to have prior knowledge of the dataset. These models run iteratively to find a local optimum.

Distribution models: These clustering models are based on the notion of how probable it is that all data points in a cluster belong to the same distribution (for example, Gaussian). These models often suffer from overfitting. A popular example is the Expectation-Maximization algorithm, which uses multivariate normal distributions.

Density models: These models search the data space for regions of varying density, isolate the different density regions, and assign the data points within a region to the same cluster. Popular examples are DBSCAN and OPTICS.

K-Means Clustering: K-Means is an iterative clustering algorithm that aims to find a local optimum in each iteration. The algorithm works in five steps: (1) choose the number of clusters k; (2) randomly initialise k centroids; (3) assign each point to its nearest centroid; (4) recompute each centroid as the mean of the points assigned to it; (5) repeat steps 3 and 4 until the assignments stop changing.
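As an illustration, the following minimal sketch (not part of the original assignment) runs K-Means on synthetic two-dimensional data, using the scikit-learn library; the three blobs and the assumed k of 3 are invented for demonstration.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic data: three blobs of points around different centres.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# k must be chosen beforehand, as noted above for centroid models.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("centroids:\n", km.cluster_centers_)
print("first 10 labels:", km.labels_[:10])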

Page 7: FT 500 Assignment Study of Financial Use Cases influenced ...fintechprofiles.com/FT01/Gurpreetkaur/FT400-Gurpreet Grp6 ML.pdf · FT 500 Assignment Study of Financial Use Cases influenced

Page | 7

Hierarchical Clustering: Hierarchical clustering, as the name suggests, is an algorithm that builds a hierarchy of clusters. It starts with every data point assigned to a cluster of its own; the two nearest clusters are then merged into one, and the algorithm terminates when only a single cluster is left.

Applications of Clustering: Clustering has a large number of applications spread across various domains. Some of the most popular are: Recommendation Engines, Market Segmentation, Social Network Analysis, Search Result Grouping, Medical Imaging, Image Segmentation, and Anomaly Detection.

Improving Supervised Learning Algorithms with Clustering: Clustering is an unsupervised machine learning approach, but it can also be used to improve the accuracy of supervised machine learning algorithms: cluster the data points into similar groups and use the cluster labels as an additional independent variable in the supervised model. A sketch of this idea follows.
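The following toy sketch illustrates the idea on assumed synthetic data: K-Means cluster labels, fitted on the training inputs only, are appended as an extra feature column before training a classifier. Whether accuracy actually improves depends on the data set; the feature counts, the choice of k and the use of a random forest are arbitrary assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic labelled data stands in for any labelled financial data set.
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit clusters on the training inputs only, then add the label as a column.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_tr)
X_tr_aug = np.column_stack([X_tr, km.predict(X_tr)])
X_te_aug = np.column_stack([X_te, km.predict(X_te)])

base = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
aug = RandomForestClassifier(random_state=0).fit(X_tr_aug, y_tr)
print("accuracy without cluster feature:", base.score(X_te, y_te))
print("accuracy with cluster feature:   ", aug.score(X_te_aug, y_te))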

B) SENTIMENT ANALYSIS:

Sentiment analysis (also known as opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media, and to healthcare materials, for applications that range from marketing to customer service to clinical medicine.

Generally speaking, sentiment analysis aims to determine the attitude of a person with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event. The attitude may be a judgment or evaluation, affective state, or the intended emotional communication.

Sentiment Classification Methodology:

A human analysis component is required in sentiment analysis, as automated systems are not able to analyze the historical tendencies of the individual commenter or of the platform, and comments are therefore often classified incorrectly in their expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans. However, humans often disagree, and it is argued that inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.

Sometimes, the structure of sentiments and topics is fairly complex. Moreover, the problem of sentiment analysis is non-monotonic with respect to sentence extension and stop-word substitution. To address this issue, a number of rule-based and reasoning-based approaches have been applied to sentiment analysis, including defeasible logic programming. There are also a number of tree-traversal rules applied to the syntactic parse tree to extract the topicality of sentiment in an open-domain setting.

To better fit market needs, evaluation of sentiment analysis has moved to more task-based measures, formulated together with representatives from PR agencies and market research professionals. The focus in e.g. the RepLab evaluation data set is less on the content of the text under consideration and more on the effect of the text in question on brand reputation.

Web 2.0:

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and actioning it appropriately, many are now looking to the field of sentiment analysis. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If Web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing the data mining of all the content that is being published. One step towards this aim is being made in research: several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social network discussions.

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes. Even though short text strings might be a problem, sentiment analysis within microblogging has shown that Twitter can be seen as a valid online indicator of political sentiment. Tweets' political sentiment demonstrates close correspondence to parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape.

Application in Recommender System:

Sentiment analysis has proven to be a valuable technique for recommender systems. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets: for example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.

In many social networking services and e-commerce websites, users can provide text reviews, comments or feedback on items. These user-generated texts provide a rich source of users' sentiments and opinions about numerous products and items. Potentially, for an item, such text can reveal both the related features/aspects of the item and the users' sentiment on each feature. The features/aspects described in the text play the same role as the meta-data in content-based filtering, but the former are more valuable for the recommender system: since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features, the ones that significantly influence the user's experience of the item, whereas the meta-data of the item (usually provided by producers rather than consumers) may ignore features that matter to users. For different items with common features, a user may express different sentiments, and a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference for the items.

Based on the features/aspects and the sentiments extracted from user-generated text, a hybrid recommender system can be constructed. There are two types of motivation for recommending a candidate item to a user: first, that the candidate item has numerous features in common with the user's preferred items; and second, that the candidate item receives high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility, so these items are also likely to be preferred by the user. On the other hand, for a feature shared by two candidate items, other users may give positive sentiment to one and negative sentiment to the other; clearly, the more highly evaluated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item.
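A toy sketch of such a combined ranking score follows; the binary feature vectors, the per-item sentiment values and the blend weight alpha are all invented purely for illustration.

import numpy as np

# Illustrative binary feature vectors for two candidate items.
item_features = {
    "item_a": np.array([1, 1, 0, 1]),
    "item_b": np.array([1, 0, 1, 0]),
}
preferred = np.array([1, 1, 0, 0])  # features of an item the user already likes

# Illustrative average sentiment (0..1) that reviewers expressed per item.
sentiment = {"item_a": 0.4, "item_b": 0.9}

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

alpha = 0.5  # blend weight between similarity and sentiment (assumed)
for item, feats in item_features.items():
    score = alpha * cosine(feats, preferred) + (1 - alpha) * sentiment[item]
    print(item, "combined score:", round(score, 3))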

Beyond the difficulty of the sentiment analysis itself, applying sentiment analysis to reviews or feedback also faces the challenge of spam and biased reviews. One direction of work focuses on evaluating the helpfulness of each review: a poorly written review or feedback item is hardly helpful to a recommender system, and a review can be designed to hinder the sales of a target product and thus harm the recommender system even if it is well written.

Use Case - Stock Markets & Indian Scenario:

There are various ups and downs in the Indian stock market, so in order to invest money in shares it is essential for investors to predict market conditions. In the Indian scenario, the Sensex and the Nifty are the two major indicators used for stock market prediction: the Sensex for BSE (Bombay Stock Exchange) companies and the Nifty for NSE (National Stock Exchange) companies. The major problem for investors is that predicting market conditions depends on regularly checking and testing predicted Sensex and Nifty values. Finding the future trend of a stock is a crucial task because stock trends depend on a number of factors. It is assumed that news articles and stock prices are related to each other and that news has the capacity to move a stock's trend; studying this relationship thoroughly suggests that stock trends can be predicted from news articles and previous price history.

As news articles capture sentiment about the current market, this sentiment detection can be automated: based on the words in the news articles one can compute an overall news polarity. If the news is positive, its market impact is good, so the stock price has a higher chance of going up; if the news is negative, it may push the stock price into a downtrend. A polarity detection algorithm may be used to label news initially and build the training set; for this, a dictionary-based approach may be used, with dictionaries of positive and negative words created from general and finance-specific sentiment-carrying words. Based on the collected data, classification models can be implemented, tested under different scenarios, and compared, measuring accuracy under the different algorithms; accuracy under SVM and Naive Bayes methods is considerable. Given any news article, the model can then arrive at a polarity, which in turn predicts the stock trend. A minimal sketch of this labelling-plus-classification pipeline follows.
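In the sketch below, the positive/negative word lists, the toy headlines and the choice of Multinomial Naive Bayes (an SVM could be substituted) are our illustrative assumptions; a real system would bootstrap its training set from thousands of labelled finance headlines.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny finance-specific sentiment dictionaries (assumed for illustration).
POSITIVE = {"surge", "profit", "record", "upgrade", "beat"}
NEGATIVE = {"loss", "fraud", "downgrade", "slump", "miss"}

def dictionary_polarity(text):
    # Label a headline by counting sentiment-carrying words.
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0  # 1 = positive news, 0 = negative news

headlines = [
    "quarterly profit at record high after upgrade",
    "bank reports heavy loss amid fraud probe",
    "analysts beat estimates and shares surge",
    "ratings downgrade triggers slump in stock",
]
labels = [dictionary_polarity(h) for h in headlines]  # bootstrap the train set

vec = CountVectorizer()
X = vec.fit_transform(headlines)
clf = MultinomialNB().fit(X, labels)

new = ["company announces record profit"]
print("predicted polarity:", clf.predict(vec.transform(new))[0])  # 1 = positive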

In the Indian scenario, to demonstrate sentiment analysis for the stock market, live Sensex and Nifty values fetched from a server at different time intervals can be used to predict the market's status. For this purpose one may use the Python scripting language, which offers a fast execution environment; this will help investors decide which shares money should be invested in, and it will also help in maintaining the economic balance of the share market. In future, this work can be extended by running these Python scripts with more advanced functions.
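As one possible starting point, the sketch below fetches recent values of both indices in Python, assuming the third-party yfinance package and Yahoo Finance's index symbols ^BSESN (Sensex) and ^NSEI (Nifty 50); an internet connection is required, and any other data vendor could be substituted.

import yfinance as yf  # third-party package: pip install yfinance

# Yahoo Finance symbols for the two Indian benchmark indices (assumed).
TICKERS = {"Sensex": "^BSESN", "Nifty 50": "^NSEI"}

for name, symbol in TICKERS.items():
    hist = yf.Ticker(symbol).history(period="5d")  # last five trading days
    latest, first = hist["Close"].iloc[-1], hist["Close"].iloc[0]
    print(f"{name}: last close {latest:.2f}, 5-day change {latest - first:+.2f}")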

A Framework for Stock Prediction

Example 1:

What are the factors that cause changes in stock prices?

A very simplistic response to the above question would be that the price of a stock is a function of its demand and supply. However, the reality is that there are various factors that affect the price of stocks, including but not limited to industry/sectoral cycles, GDP, inflation, news, competitors, regulations, etc.

In this example we will see the impact of news articles, i.e. qualitative/non-quantifiable information, on the prices of stocks.

The efficient-market hypothesis (EMH) is a theory in financial economics that states that asset prices fully reflect all available information. A direct implication is that it is impossible to "beat the market" consistently on a risk-adjusted basis, since market prices should only react to new information. There are basically three forms of market efficiency:

1. Weak: Historical prices have no bearing on future prices.

2. Semi-strong: Share prices adjust to publicly available information very rapidly.

3. Strong: Share prices reflect all information, public & private.

Methodology:

Articles on a given stock can be collected over the period for which the analysis needs to be conducted. The analysis could be filtered and broken down into the following:

1. Bag of words: In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping the multiplicity.

2. Co-reference resolution of noun phrases: Co-reference occurs when two or more expressions in a text refer to the same person or thing.

This information is then processed by Support Vector Machines, which are supervised learning models with associated learning algorithms that analyse data for classification and regression analysis. The evaluation can be done on two metrics, closeness and directional accuracy (a sketch of this pipeline follows the list):

1. Closeness measures the margin, i.e. the distance between the hyperplane and the closest data points; the larger the margin, the better the hyperplane fits the data.

2. Directional accuracy measures how well the direction of the movements of the stock prices is predicted.
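The following hedged sketch pairs bag-of-words features from toy news snippets with a linear SVM that predicts the subsequent direction of the price move; the snippets, their labels and the use of scikit-learn's LinearSVC are illustrative assumptions, not the assignment's actual data or model.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy training set: news snippets and the direction the stock moved afterwards
# (1 = up, 0 = down). Real work would use thousands of time-stamped articles.
texts = [
    "strong earnings and raised guidance",
    "regulator opens probe into accounting",
    "new product launch receives positive reviews",
    "ceo resigns amid governance concerns",
]
direction = [1, 0, 1, 0]

vec = CountVectorizer()  # bag of words: order ignored, multiplicity kept
X = vec.fit_transform(texts)

svm = LinearSVC().fit(X, direction)  # linear SVM fits a separating hyperplane

test = ["earnings guidance raised on strong demand"]
print("predicted direction:", svm.predict(vec.transform(test))[0])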

Using a multi-modal approach, Sentiment Analysis could be used to classify whether the opinion document/article expresses a positive or negative opinion or sentiment.

Where to find the datasets?

It is difficult to obtain datasets where stock prices are available not only daily but for every minute/second of the day, which is necessary to fully understand the impact on the stock price before/after the release of a news item.

Example 2:

Prediction of the price of a share on the basis of the index and other such variables:

Attempts to forecast stock prices have been studied and a number of methodologies have been proposed and applied. Among them, the artificial neural network (ANN) has been widely used in classification problems.

An ANN has three layers in all: an input, a hidden and an output layer. The main task of a neural network is to find optimal weights; in this algorithm, the weights are adjusted so as to minimise the loss function. Most researchers agree that the higher the number of hidden layers, the better the performance after optimizing the weights; however, more layers can cause the over-fitting problem, so the number of layers should be determined properly through experiments.
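To illustrate, the sketch below trains a small three-layer network (one hidden layer) on synthetic data standing in for index and lagged-return features; the architecture, the synthetic relationship and the use of scikit-learn's MLPClassifier are our assumptions, not the assignment's actual model or data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Synthetic features: today's index return and the stock's own lagged return.
X = rng.normal(size=(1500, 2))
# Assumed relationship: the stock tends to follow the index, plus noise.
y = (0.8 * X[:, 0] + 0.2 * X[:, 1] + 0.3 * rng.normal(size=1500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One hidden layer of 16 units: input layer -> hidden layer -> output layer.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)  # weights are adjusted iteratively to minimise the loss
print("directional accuracy on held-out data:", round(net.score(X_te, y_te), 3))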

Where to find the datasets?

For this analysis, historical data for IndusInd Bank Ltd was downloaded from Quandl, and the Bank Nifty was considered to be the indicator. Similarly, we could add more variables, which would make the problem more complex.

Source: Quandl.com

Conclusion:

Predicting stock prices has been a major challenge for organisations and researchers all across the world. In this report we have attempted to describe the various methodologies with which investors the world over are using ML in stock prediction.

Use Case 02 – Credit Card Fraud Detection

Understanding Problem Statement

“Credit Card Fraud” is a wide-ranging term for theft and fraud committed using a credit card or any similar payment mechanism as a fraudulent source of funds in a transaction. The purpose may be to obtain goods without paying, or to obtain unauthorized funds from an account. Although the incidence of credit card fraud is limited to about 0.1% of all card transactions, it has resulted in huge financial losses, as the fraudulent transactions have been large-value transactions.

Defining Problem Statement

The purpose of fraud detection systems is to evaluate and classify financial transactions. In the case of credit and debit cards, a system has to check thousands of transactions per second and flag those that are potentially fraudulent. Note that a set of tested transactions can be divided into four groups:

1. True-positives. Transactions that were fraudulent and which the system correctly classified as fraudulent.

2. False-positives. Transactions that were legitimate, but which the system incorrectly classified as fraudulent.

3. True-negatives. Transactions that were legitimate and which the system correctly classified as legitimate.

4. False-negatives. Transactions that were fraudulent, but which the system incorrectly classified as legitimate.

The table below summarizes these classification results:

                          Actually fraudulent    Actually legitimate
Classified fraudulent     True-positive          False-positive
Classified legitimate     False-negative         True-negative

Methodology

The goal is to minimize the number of false-positives and false-negatives (incorrect classifications); however, these two measures very often work against each other. For instance, we can reduce the number of false-negatives by making the system more “suspicious” (thereby flagging more transactions as fraudulent), but flagging a larger number of transactions would increase the number of false-positives, as many of these transactions would be legitimate.

Any change in these two measures might have significant consequences for a bank. A bank uses a fraud detection system that flags “suspicious” transactions; later, a subset of these transactions is confirmed as fraudulent. The bank could drastically reduce fraud by flagging all suspicious transactions (i.e., reducing the number of false-negatives), but this is not feasible for two reasons:

1. A battalion of fraud prevention officers would be needed to review all the flagged transactions.

2. The bank would annoy its customers by blocking many legitimate transactions (i.e., increasing the number of false-positives, as not all suspicious transactions turn out to be fraudulent).

Hence, the system's level of suspicion cannot be set too high. However, although there will be fewer transactions to review and fewer upset customers when the suspicion level is lowered, this might result in an increased number of false-negatives (which will increase the financial losses of the bank). This problem is compounded by the fact that many new fraud patterns emerge each year, and a bank might not recognize them until the fraud has already occurred.

Where can I find the dataset?

You can find data for testing and validating fraudulent-transaction detection at any of the following links:

https://www.kaggle.com/c/dam-assignment-3-credit-card-default/data

https://www.kaggle.com/benjaminlott/credit-card-ml-rpart-xgboost-tsne-attempts/data

http://weka.8497.n7.nabble.com/file/n23121/credit_fruad.arff (Data is in arff format)

The variables from the dataset that could be useful for our purpose are as follows:

Customer Identifier Number

Authentication Type

Current Bank Balance

Average Bank Balance

Average Daily Spending

Location of credit card use

Time of credit card use with respect to location

Number of times the credit card has been used

Number of overdrafts

Average daily overdraft

Building Detection Systems with Machine Learning

The dataset, once selected, has to be used for training in order to find patterns among all transactions; a pattern indicating that a transaction is fraudulent would later be used to flag such transactions during prediction. A machine learning algorithm called the ‘GENETIC ALGORITHM’ is helpful for training on the dataset, and it works as follows.

The initial population is selected randomly from the sample space, which contains many populations. The fitness value is calculated for each member of the population, and the population is sorted. Selection is carried out through the tournament method. Crossover is performed using single-point probability. Mutation mutates the new offspring using a uniform probability measure. In elitism selection, the best solutions are passed on to the next generation. The new population is generated and undergoes the same process until the maximum number of generations is reached. [1]

The Genetic Algorithm based training model is combined with a rule base as a method to classify new transactions. A minimal sketch of the GA loop described above follows.
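In the sketch below, the GA evolves the two thresholds of a toy flagging rule via tournament selection, single-point crossover, uniform mutation and elitism, matching the quoted description; the labelled transactions, the fitness function and every parameter are invented purely for illustration.

import random

random.seed(0)

# Toy labelled transactions: (amount, balance_ratio) -> 1 = fraud, 0 = legit.
DATA = [((950, 0.9), 1), ((30, 0.1), 0), ((1200, 0.8), 1),
        ((45, 0.2), 0), ((700, 0.7), 1), ((60, 0.3), 0)]

SCALES = (100.0, 0.1)  # mutation step per gene (amount threshold, ratio threshold)

def fitness(ind):
    # Accuracy of the rule "flag if amount > t1 and balance_ratio > t2".
    t1, t2 = ind
    hits = sum((amt > t1 and r > t2) == bool(label) for (amt, r), label in DATA)
    return hits / len(DATA)

def tournament(pop, k=3):
    # Tournament selection: best of k randomly drawn individuals.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Single-point crossover between two parent tuples.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(ind, rate=0.2):
    # Uniform mutation: each gene is perturbed with probability `rate`.
    return tuple(g + random.gauss(0, s) if random.random() < rate else g
                 for g, s in zip(ind, SCALES))

pop = [(random.uniform(0, 2000), random.uniform(0, 1)) for _ in range(20)]
for generation in range(30):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:2]  # elitism: the best solutions pass to the next generation
    children = [mutate(crossover(tournament(pop), tournament(pop)))
                for _ in range(len(pop) - len(elite))]
    pop = elite + children

best = max(pop, key=fitness)
print("best thresholds:", tuple(round(g, 2) for g in best),
      "fitness:", fitness(best))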

The fraudulent transactions are flagged by the prediction module, in which every transaction is assigned a “suspicion score”. For this complex classification task, it might be prudent to build and train several prediction models and then combine them into one module.

In order to make a decision, the prediction module can use several models that are based on different prediction methods. Each model assigns a suspicion score in the range of 0 to 1: the higher the score, the greater the probability that the transaction is fraudulent.

Once each of the models produces a score for the transaction, the prediction module then combines the scores. Each model has a weight that represents the importance of its prediction, and the sum of all the weights equals 1; therefore, the combined suspicion score is also in the range of 0 to 1. The final decision of “fraudulent” versus “legitimate” is based on whether the weighted score is higher than the “suspicion threshold” parameter of the system.

The diagram provides an example of the prediction process, which is based on four models and a voting system (averaging of scores):

In this example, the four models have weights of 0.3, 0.2, 0.15, and 0.35, respectively. The individual scores are combined to come up with the final suspicion score, which is a weighted average. The prediction module may then have different suspicion thresholds that define how a transaction should be processed; for example, the thresholds may be as follows (a short sketch applying them comes after the list):

➢ Not greater than 0.6 – grant authorization.

➢ Greater than 0.6, but not greater than 0.9 – grant authorization, but flag the transaction for later review by a fraud prevention officer.

➢ Greater than 0.9 – deny authorization and call customer.
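The sketch below applies the example's weights (0.3, 0.2, 0.15, 0.35) and thresholds (0.6, 0.9); the individual model scores for the incoming transaction are made-up inputs.

WEIGHTS = [0.3, 0.2, 0.15, 0.35]        # one weight per model; they sum to 1

def combined_score(scores):
    # Weighted average of the individual models' suspicion scores.
    return sum(w * s for w, s in zip(WEIGHTS, scores))

def decide(score, alpha=0.6, beta=0.9):
    # Apply the suspicion thresholds listed above.
    if score <= alpha:
        return "grant authorization"
    if score <= beta:
        return "grant authorization, flag for review"
    return "deny authorization and call customer"

# Made-up scores from the four models for one incoming transaction.
scores = [0.7, 0.4, 0.9, 0.8]
s = combined_score(scores)  # 0.3*0.7 + 0.2*0.4 + 0.15*0.9 + 0.35*0.8 = 0.705
print(round(s, 3), "->", decide(s))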

To measure the effectiveness of the prediction module over some period of time, the system can compute several performance ratios.

1. True-positive ratio: Correctly classified fraudulent transactions / total number of fraudulent transactions.

2. True-negative ratio: Correctly classified legitimate transactions / total number of legitimate transactions.

3. False-positive ratio: Incorrectly classified legitimate transactions / total number of legitimate transactions.

4. False-negative ratio: Incorrectly classified fraudulent transactions / total number of fraudulent transactions.

The true-positive ratio for January was 45.02%, which means that during the month of January the system correctly classified 45.02% of the fraudulent transactions (2,375 fraudulent transactions were correctly classified out of a total of 5,276). During the same period, the false-positive ratio was 0.43%, which means that the system incorrectly classified 0.43% of the legitimate transactions during January (10,088 legitimate transactions were incorrectly classified out of a total of 2,363,893).
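The short computation below reproduces these figures from the raw counts, to make the ratio definitions concrete.

# Counts from the January example above.
tp, total_fraud = 2375, 5276        # correctly classified fraud / all fraud
fp, total_legit = 10088, 2363893    # incorrectly classified legit / all legit

print(f"true-positive ratio:  {tp / total_fraud:.2%}")   # ~45.02%
print(f"false-negative ratio: {1 - tp / total_fraud:.2%}")
print(f"false-positive ratio: {fp / total_legit:.2%}")   # ~0.43%
print(f"true-negative ratio:  {1 - fp / total_legit:.2%}")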

It might be desirable to use several prediction methods for the problem of detecting fraudulent transactions.

One of the more popular prediction methods used for this type of problem is based on rules, which are defined and maintained for classifying new transactions. Some other widely used methods for fraud detection include artificial neural networks, fuzzy logic, and decision trees, all of which have their own advantages and disadvantages.

Based on the rules in the rule base, a decision tree, fuzzy logic system, neural network model, or decision table is used to make the final prediction. Such selective processing speeds up the total classification/prediction time, which is important in time-critical applications: consider a credit card fraud detection system that has to decide whether a transaction is fraudulent within a fraction of a second.

As the models used for detecting fraudulent transactions are trained and tested on historical data, the performance of these prediction models depends on the quality and quantity of available data. A separate module is required to allow the prediction module to learn and recognize new fraud patterns as they occur. This module is also responsible for continuously adjusting the weights of the individual prediction models, thereby ensuring that the more useful (i.e., precise) models exert more influence on the final prediction. Another module can be used to recommend the best suspicion thresholds, if need be. Recall that the prediction module has different thresholds for the final suspicion score, which define various courses of action:

a. If the score is not greater than α, grant authorization.

b. If the score is greater than α but not greater than β, grant authorization, but flag the transaction for later review by a fraud prevention officer.

c. If the score is greater than β, deny authorization and call the customer.

It is important to minimize losses (thus minimizing the number of false-negatives), but it is also very important to take customer satisfaction into account (thus minimizing the number of false-positives). This module should find the optimal values of α and β (assuming two thresholds in the decision-making process) so as to minimize both false-negatives and false-positives.

Conclusion

Machine Learning can reinforce the prediction module in the credit card fraud detection workflow, and would thereby improve the downstream modules drastically, providing efficient results.

References

[1] https://pdfs.semanticscholar.org/d579/9e9e81a48a1cb9ea230f133bc8ee4baf48f1.pdf