Case Studies in Creating Quant Models from Large Scale Unstructured Text by Sameena Shah

Case Studies in Creating Quant Models from Large Scale Unstructured Text, Dr. Sameena Shah

Page 1: Case Studies in Creating Quant Models from Large Scale Unstructured Text by Sameena Shah

DISRUPTIONS

•  LARGE SCALE DATA ANALYSIS:
   –  Hadoop, Spark

•  NATURAL LANGUAGE PROCESSING:
   –  Sentiment, context, text mining

•  NOVEL/EFFICIENT ALGORITHMS:
   –  Deep Learning, Topic Modeling

•  NOVEL DATA SETS:
   –  Twitter, satellite images

•  Accessibility

Some large scale textual datasets

•  Social Media

•  SEC filings

•  News

•  Courtwires

•  Patents

ANALYZING UNSTRUCTURED TEXT IN SEC FILINGS

•  All public companies, domestic and foreign, trading on any US exchange must file the following with the SEC: registration statements, periodic reports, insider trading forms, and other forms describing any significant changes.

•  These filings typically contain financial statements as well as large amounts of ‘unstructured text’ describing the past, present, and anticipated future performance of the company.

For example, this text can reveal whether a company changed its accounting methods to inflate its earnings, changed its fiscal year end to include some extra sales, shifted some expenses to a later period, included revenues that are not yet payable, or expensed or capitalized certain items.

Can we create an automated system that identifies “abnormal” sentences in filings, thereby alerting regulators and investors faster?

•  Recognizing such sentences usually requires deep domain expertise, even for humans.

•  The value is clear, but:
   –  The data exceeds 3 TB in compressed format.
   –  Running on a small subset of the data on a dual-core machine, we estimated that processing the full data set would take a few months.

Text modeling on Hadoop

•  Reading compressed files through a custom input reader

•  Parsing of sections

•  Division into sentences and comparison across different reference groups

•  Scoring each sentence with respect to the reference group model

•  Measuring the divergence of scores from the distribution of the reference group

•  All this in under 30 minutes for 8 years of filings
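The scoring and divergence steps above can be sketched on a single machine in Python. This is a minimal illustration under stated assumptions, not the Hadoop implementation described in the talk: it assumes a unigram language model with add-one smoothing as the reference group model, scores each sentence by its mean per-token log-probability, and treats a sentence as “abnormal” when its score lies more than `z_threshold` standard deviations below the reference group's mean. The names `UnigramModel` and `flag_abnormal`, the tokenizer, the example sentences, and the default threshold of 2.0 are all hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, stdev


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", sentence.lower())


class UnigramModel:
    """Unigram language model with add-one smoothing, built from a reference group."""

    def __init__(self, sentences):
        self.counts = Counter(tok for s in sentences for tok in tokenize(s))
        self.total = sum(self.counts.values())
        # One extra vocabulary slot reserves probability mass for unseen tokens.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, token):
        return math.log((self.counts[token] + 1) / (self.total + self.vocab_size))

    def score(self, sentence):
        # Mean per-token log-probability, so sentence length does not dominate.
        toks = tokenize(sentence)
        if not toks:
            return 0.0
        return sum(self.log_prob(t) for t in toks) / len(toks)


def flag_abnormal(reference, candidates, z_threshold=2.0):
    """Return (sentence, z) for candidates scoring far below the reference distribution.

    Needs at least two reference sentences so that a standard deviation exists.
    """
    model = UnigramModel(reference)
    ref_scores = [model.score(s) for s in reference]
    mu = mean(ref_scores)
    sigma = stdev(ref_scores) or 1e-12  # avoid dividing by zero if all scores coincide
    flagged = []
    for s in candidates:
        z = (model.score(s) - mu) / sigma
        if z < -z_threshold:
            flagged.append((s, round(z, 2)))
    return flagged


reference = [
    "Revenue is recognized when goods are delivered to the customer.",
    "Revenue is recognized when services are rendered to the customer.",
    "Revenue is recognized when the customer accepts the goods.",
    "Revenue is recognized when title passes to the customer.",
]
candidates = [
    "Revenue is recognized when goods are delivered to the customer.",
    "We changed our fiscal year end to include additional shipments.",
]
for sentence, z in flag_abnormal(reference, candidates):
    print(f"{z:>7}  {sentence}")
```

For simplicity the reference sentences are scored by the same model they were used to build; a more careful version would hold some of them out. On Hadoop, the per-sentence scoring would plausibly sit in the map phase, with reference-group statistics aggregated in the reduce phase.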

TEXT PROCESSING

•  Use of text processing techniques to check for:
   –  Clarity in overall disclosure compared to peers
   –  Redundancy in language
   –  Comparison of the language model across sector and market-cap peers
   –  Comparison of the model with the company's own historical model
   –  Overly vague or ‘boilerplate’ revenue recognition disclosures
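Two of the checks listed above can be made concrete with simple measures. The sketch below assumes Jensen-Shannon divergence between word-frequency distributions as one way to compare a company's current language model with its historical model (or with peers), and measures redundancy as the share of sentences that repeat an earlier sentence verbatim. Neither measure is specified in the talk; both, and the helper names, are illustrative assumptions.

```python
import math
import re
from collections import Counter


def word_distribution(text):
    """Relative word frequencies of a document (hypothetical lowercase tokenizer)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl_to_m(a):
        # Only words with a[w] > 0 contribute, and m[w] > 0 for every such word.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def redundancy(text):
    """Share of sentences that repeat an earlier sentence verbatim (case-insensitive)."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return 1 - len(set(sentences)) / len(sentences)


current = "Revenue is recognized when goods ship. Revenue is recognized when goods ship."
prior = "Revenue is recognized when title passes to the customer."
print(round(js_divergence(word_distribution(current), word_distribution(prior)), 3))
print(redundancy(current))  # 0.5
```

A large divergence from the company's own history, or near-identical wording across many peers, might be one hint worth a closer look; the thresholds for either would have to be calibrated on real filings.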

SIGNALS FROM SOCIAL MEDIA

Winning Traders

•  Questions:
   –  Can we find good traders and follow them to make money?

•  Method:
   –  Identify trading-related tweets (buy/sell a specific stock)
   –  Evaluate traders based on past performance
   –  Follow their trades
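The three method steps can be sketched as follows. The sketch assumes that a trading-related tweet contains an explicit “buy” or “sell” followed by a `$TICKER` cashtag, and that the forward return of each call is supplied by the caller as a dictionary keyed by ticker and date. The regular expression, the forward-return input, the `min_calls` cutoff, and all names are hypothetical, not the method used in the study.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "buy"/"buying"/"sell"/"selling" followed by a $TICKER cashtag.
SIGNAL = re.compile(r"\b(buy|sell)(?:ing)?\s+\$([A-Z]{1,5})\b", re.IGNORECASE)


def extract_signal(text):
    """Return (direction, ticker) for a trading-related tweet, else None."""
    m = SIGNAL.search(text)
    if not m:
        return None
    direction = 1 if m.group(1).lower() == "buy" else -1
    return direction, m.group(2).upper()


def rank_traders(tweets, forward_returns, min_calls=2):
    """Rank users by the average signed forward return of their past calls.

    tweets: iterable of (user, date, text)
    forward_returns: {(ticker, date): return over the holding period}, an assumed input
    """
    pnl = defaultdict(list)
    for user, date, text in tweets:
        sig = extract_signal(text)
        if sig is None:
            continue  # not a trading-related tweet
        direction, ticker = sig
        r = forward_returns.get((ticker, date))
        if r is not None:
            # A correct "sell" call on a falling stock earns a positive score.
            pnl[user].append(direction * r)
    scores = {u: sum(v) / len(v) for u, v in pnl.items() if len(v) >= min_calls}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tweets = [
    ("alice", "2013-01-02", "Buying $AAPL here"),
    ("alice", "2013-01-03", "sell $MSFT, looks weak"),
    ("bob", "2013-01-02", "buy $MSFT"),
    ("bob", "2013-01-03", "buy $AAPL"),
    ("carol", "2013-01-02", "Nice weather today"),
]
forward_returns = {
    ("AAPL", "2013-01-02"): 0.03,
    ("MSFT", "2013-01-03"): -0.02,
    ("MSFT", "2013-01-02"): -0.01,
    ("AAPL", "2013-01-03"): 0.01,
}
print(rank_traders(tweets, forward_returns))
```

The traders at the top of the ranking would be the ones to follow. In a backtest, the ranking should be computed only from calls made before the period being traded, to avoid look-ahead bias.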

Why Do People Express Their Trading Positions?

•  Everyone has an opinion!

•  Positive motivations:
   –  Enhance reputation/brand
   –  Build a network by attracting other experts
   –  Benefit personal trading positions

•  Negative motivations:
   –  Hired to promote a position
   –  Nothing else to do…

The Winners strategy gained 9.48% while the S&P 500 lost 3.55%, a difference of 13.03 percentage points.

Costs do cost you! (0.2% per transaction)
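The 13.03-point gap is simply the difference of the two returns, 9.48 - (-3.55). The cost function below is an illustrative assumption about how a proportional 0.2% charge compounds across transactions; the slide does not say how many transactions the strategy made, so the example uses a hypothetical 10% gross return and 50 transactions.

```python
def excess_return(strategy_pct, benchmark_pct):
    """Gap between strategy and benchmark returns, in percentage points."""
    return strategy_pct - benchmark_pct


def net_of_costs(gross_pct, n_transactions, cost_per_transaction=0.002):
    """Return (in %) after charging a proportional cost on every transaction.

    Assumes each transaction multiplies wealth by (1 - cost_per_transaction).
    """
    wealth = (1 + gross_pct / 100) * (1 - cost_per_transaction) ** n_transactions
    return (wealth - 1) * 100


print(round(excess_return(9.48, -3.55), 2))  # 13.03
# Hypothetical: a 10% gross return traded through 50 transactions at 0.2% each.
print(round(net_of_costs(10.0, 50), 2))  # -0.48
```

Under this assumption, roughly 50 transactions at 0.2% each are enough to turn a 10% gross gain into a small loss, which is why trade frequency matters as much as signal quality.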

Conclusions

•  While Twitter's signal-to-noise ratio is very low, targeted data collection and mining can be more promising.

•  In event-based sentiment analysis, we assumed that stock-market-related tweets posted after bad (good) news have negative (positive) polarity. The resulting labeled data can be used to train a supervised model.

•  User-based analysis (following traders with a good trading record, as judged from their tweets) also showed that adopting these traders' moves in the market could be a winning strategy.

•  M. Makrehchi, S. Shah, W. Liao. Stock prediction using Event information from Twitter. In Web Intelligence, 2013.
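The event-based labeling described in the conclusions is a form of weak supervision. The sketch assumes that events arrive as timestamped polarities (+1 for good news, -1 for bad news), labels every tweet posted within a 24-hour window after an event with that event's polarity, and then trains a multinomial Naive Bayes classifier as one possible supervised model. The window length, the choice of Naive Bayes, the example tweets, and all names are assumptions for illustration, not the setup of the cited paper.

```python
import math
import re
from collections import Counter
from datetime import datetime, timedelta


def label_by_events(tweets, events, window=timedelta(hours=24)):
    """Weakly label tweets: +1 if posted after good news, -1 after bad news.

    tweets: list of (timestamp, text); events: list of (timestamp, polarity).
    The first matching event wins; tweets outside every window are dropped.
    """
    labeled = []
    for t_time, text in tweets:
        for e_time, polarity in events:
            if e_time <= t_time <= e_time + window:
                labeled.append((text, polarity))
                break
    return labeled


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over labels +1 and -1."""

    def fit(self, labeled):
        self.word_counts = {+1: Counter(), -1: Counter()}
        self.doc_counts = Counter()
        for text, y in labeled:
            self.doc_counts[y] += 1
            self.word_counts[y].update(tokens(text))
        self.vocab = set(self.word_counts[+1]) | set(self.word_counts[-1])
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best, best_lp = None, -math.inf
        for y in (+1, -1):
            total = sum(self.word_counts[y].values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            lp = math.log((self.doc_counts[y] + 1) / (n_docs + 2))
            for w in tokens(text):
                lp += math.log((self.word_counts[y][w] + 1) / (total + len(self.vocab) + 1))
            if lp > best_lp:
                best, best_lp = y, lp
        return best


events = [(datetime(2013, 1, 2, 9), +1), (datetime(2013, 1, 5, 9), -1)]
tweets = [
    (datetime(2013, 1, 2, 12), "great earnings, going higher"),
    (datetime(2013, 1, 2, 15), "strong quarter, love it"),
    (datetime(2013, 1, 5, 10), "terrible guidance, dumping shares"),
    (datetime(2013, 1, 5, 18), "awful miss, going lower"),
    (datetime(2013, 1, 8, 12), "anyone watching the game"),
]
model = NaiveBayes().fit(label_by_events(tweets, events))
print(model.predict("strong earnings"), model.predict("awful lower guidance"))
```

The labels are noisy by construction, since not every tweet posted after good news is positive; the hope behind this kind of weak supervision is that volume compensates for label noise.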

Q & A

The Winners strategy gained 19.76% while the S&P 500 lost 3.55%.