Applying Data Mining for News Analytics

Applying Data Mining for News Analytics Vasko Yordanov


Transcript of Applying Data Mining for News Analytics

Page 1: Applying Data Mining for News Analytics

Applying Data Mining for News Analytics

Vasko Yordanov

Page 2: Applying Data Mining for News Analytics

The Problem

• Increased volume of online financial data which causes useful investment information to be “lost”

• Traditional means of news delivery and analysis are becoming obsolete.

• Online information is unstructured

Page 3: Applying Data Mining for News Analytics

Information Overload


Thousands of blogospheres

Millions of newsfeeds

Thousands of syndicators

Information is not uniform:

• Structured data: news

• Unstructured data: blogs

• Mixed: syndicators

• Dynamic: the information stream changes and flows constantly

• Quality: search engine optimization dilutes quality

Page 4: Applying Data Mining for News Analytics

Solution:

• Machine Interpretation of news.

• Quantify the news to be used by “news-flow” trading algos.

• Detect “news sentiment”:

-Stocks react to market sentiment

Incorporate the above into a “smart” news feed aggregator

Page 5: Applying Data Mining for News Analytics

Opportunity:

• Social media can offer glimpses of information well before it reaches mainstream media.

Ex: The emergency landing of an airplane on the Hudson River was first reported on Twitter.

Such data is real time and instant.

• Real-time analysis of Twitter, blogs, and news gives you a view of what the public sees

Page 6: Applying Data Mining for News Analytics

• Reacting appropriately to breaking news events can give traders a significant edge over the rest of the market, if they act on it faster than the competition.

• Quantitative trading groups say that in the near future they will look to develop models that use the historical impact of news events on stock performance to predict the effect real-time events may have on future performance.
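
A minimal sketch of the idea of using historical impact to estimate the effect of a new event. The event types, the return figures, and the function names `average_impact` and `expected_move` are all hypothetical; a real model would be far richer than a per-type average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (event type, stock return over the following day).
HISTORY = [
    ("earnings_beat", 0.04),
    ("earnings_beat", 0.02),
    ("fda_approval", 0.10),
    ("fda_approval", 0.20),
    ("lawsuit", -0.03),
]

def average_impact(history):
    """Return the mean historical return for each event type."""
    by_type = defaultdict(list)
    for event_type, ret in history:
        by_type[event_type].append(ret)
    return {event_type: mean(rets) for event_type, rets in by_type.items()}

def expected_move(event_type, impact_table):
    """Look up the expected return for a new event; None if never seen."""
    return impact_table.get(event_type)

impact = average_impact(HISTORY)
print(expected_move("fda_approval", impact))
```

An event type with no history returns None, so the caller has to decide what to do with unseen events rather than trade on a guess.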

Page 7: Applying Data Mining for News Analytics

Ex: Real time news for Enviro Technologies:

Page 8: Applying Data Mining for News Analytics
Page 9: Applying Data Mining for News Analytics

Example:

Avastin is Genentech’s trade name for Bevacizumab, an anti-angiogenic drug that has been approved for use against colorectal cancer since 2004. On 14 March 2005, the National Cancer Institute (NCI) posted the results of Phase 3 trials using Bevacizumab combined with chemotherapy for patients with advanced lung cancer. Four hours later, Genentech issued a press release.

The immediate result was a 25% jump in the company’s stock price. Anyone who had made the connection between Bevacizumab and Genentech within the four-hour window could have had a significant lead on the market.
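
Making that connection automatically amounts to linking names in a headline to a company. A minimal sketch, assuming a hand-built alias table; the table contents, the sample headline, and the function name `linked_companies` are hypothetical, and a production system would use a proper entity-identification service rather than exact word matches.

```python
import re

# Hypothetical alias table: names a headline might use -> company.
ALIASES = {
    "genentech": "Genentech",
    "avastin": "Genentech",
    "bevacizumab": "Genentech",
}

def linked_companies(text, aliases=ALIASES):
    """Return the sorted companies whose aliases appear as words in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({aliases[w] for w in words if w in aliases})

headline = "NCI posts Phase 3 results for Bevacizumab in advanced lung cancer"
print(linked_companies(headline))
```

The headline never names the company, yet the drug name alone is enough to flag it.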

Page 10: Applying Data Mining for News Analytics

Incorporate news into trading algorithms

• News flow itself is an important trading signal, following the old adage “There is no smoke without fire”.

• The sheer volume of news items can be just as much an indicator as the actual information they convey. A sudden rush of headlines suggests that volatility may increase (uncertainty breeds volatility).

• Identify the news “sentiment”

• Data mine Events of Interest (EOI)
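
One very simple way to quantify “sentiment” is a word-list score. This is only a sketch under stated assumptions: the word lists and the function name `sentiment_score` are hypothetical, and real sentiment analysis must handle negation, context, and far larger vocabularies.

```python
import re

# Hypothetical word lists; a real lexicon would be much larger.
POSITIVE = {"beats", "approval", "surges", "upgrade", "record"}
NEGATIVE = {"misses", "lawsuit", "plunges", "downgrade", "recall"}

def sentiment_score(headline):
    """Score a headline in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z]+", headline.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # No sentiment words found: treat the headline as neutral.
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("Company beats estimates, stock surges"))
```

A numeric score like this is what lets a trading algorithm consume the news as a signal.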

Page 11: Applying Data Mining for News Analytics

What is an event?

• An event is a significant change

• An event is detected by observing a pattern in data acquired over time from multiple sources:

-Stock tickers, news, blogs, ...

• Observe an anomaly:

-The number of blog posts per day about a company is 50% higher in the last day than the daily average over the last year
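
The anomaly rule above can be sketched as a comparison of the latest daily count against the historical daily average. The function name `is_volume_anomaly`, the `threshold` parameter, and the sample counts are hypothetical; treating "50% higher" as "more than 50% above the average of all earlier days" is an assumption.

```python
from statistics import mean

def is_volume_anomaly(daily_counts, threshold=0.5):
    """Flag the latest day if it exceeds the prior daily average by threshold.

    daily_counts is chronological; the last entry is the latest day.
    """
    if len(daily_counts) < 2:
        return False  # no history to compare against
    *history, latest = daily_counts
    baseline = mean(history)
    if baseline == 0:
        return latest > 0  # any mention is unusual after total silence
    return latest > baseline * (1 + threshold)

# Hypothetical data: about a year at 10 posts per day, then a spike to 16.
counts = [10] * 364 + [16]
print(is_volume_anomaly(counts))
```

The same check applies to any of the sources on this slide, such as headlines per day or ticker mentions.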

Page 12: Applying Data Mining for News Analytics

How?

• Employ advances in emerging AI technologies such as natural language processing, sentiment analysis, entity identification, and the Semantic Web.

• This area is heating up, and sophisticated tools already exist, such as Thomson Reuters’ “OpenCalais” service.

Page 13: Applying Data Mining for News Analytics

The Market for Unstructured Data

• Only 2% of firms employ electronic trading strategies that use unstructured data in a machine-readable format, estimates Aite Group.

• Some content is free; for paid content, firms will spend more than $75 million globally in 2009 and over $141 million by 2011, estimates Aite Group.

Page 14: Applying Data Mining for News Analytics

Use emerging, cutting-edge technology:

Semantic Web:

• “The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for machines to understand content” –Wikipedia

• Contextual search for more relevant information

– Proprietary algorithm to search and render information using Semantic Web standards

– Google just acquired Freebase, formerly the “poster child” of Semantic Web linked data, which contained and linked the world’s knowledge. This is likely to cause a shift similar to WWW adoption.


Page 15: Applying Data Mining for News Analytics

Competitors:

• Relegence’s “FirstTrack” service

• Collective Intellect

• Bloomberg news “heat” analytics.