Applying Data Mining for News Analytics

Post on 30-Nov-2014

Vasko Yordanov

The Problem

• The growing volume of online financial data causes useful investment information to be “lost”.

• Traditional means of news delivery and analysis are becoming obsolete.

• Online information is unstructured

Information Overload

Thousands of blogospheres

Millions of newsfeeds

Thousands of syndicators

Information is not uniform:

• Structured data: news

• Unstructured data: blogs

• Mixed: syndicators

• Dynamic: the information stream changes and flows constantly

• Quality: search engine optimization dilutes quality

Solution:

• Machine Interpretation of news.

• Quantify the news to be used by “news-flow” trading algos.

• Detect “news sentiment”:

-Stocks react to market sentiment

Incorporate the above into a “smart” news feed aggregator.
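One way to "quantify" a headline, as the bullets above suggest, is a simple word-list score. The sketch below is only an illustration: the word lists and the function name `sentiment_score` are assumptions made for this example, not part of the original talk, and a production system would use a curated financial lexicon or a trained model.

```python
import re

# Hypothetical toy word lists, chosen only for illustration.
POSITIVE = {"beats", "surges", "approval", "upgrade", "gain", "record"}
NEGATIVE = {"misses", "plunges", "recall", "downgrade", "loss", "lawsuit"}


def sentiment_score(headline: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words.

    Returns 0.0 when no word from either list appears in the headline.
    """
    words = re.findall(r"[a-z]+", headline.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    for h in ("Company beats estimates, stock surges",
              "Regulator orders recall after lawsuit"):
        print(h, "->", sentiment_score(h))
```

A numeric score like this is something a news-flow trading algorithm can consume directly, which is the point of quantifying the news in the first place.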

Opportunity:

• Social media can offer glimpses of information well before it reaches mainstream media.

Ex: The emergency landing of an airplane in the Hudson River was first reported on Twitter.

Such data is real time and instant.

• Real-time analysis of Twitter, blogs, and news gives you a view of what the public sees.

• Reacting appropriately to breaking news events can give traders a significant edge over the rest of the market, if they act on it faster than the competition.

• Quantitative trading groups say that in the near future they will look to develop models that use the historical impact of news events on stock performance to predict the effect real-time events may have on future performance

Ex: Real time news for Enviro Technologies:

Example:

Avastin is Genentech’s trade name for Bevacizumab, an anti-angiogenic drug that has been approved for use against colorectal cancer since 2004. On 14 March 2005, the National Cancer Institute (NCI) posted the results of Phase 3 trials using Bevacizumab combined with chemotherapy for patients with advanced lung cancer. Four hours later, Genentech issued a press release.

The immediate result was a 25% hike in the company’s stock price. Anyone who had made the connection between Bevacizumab and Genentech within the four-hour window could have had a significant market lead.
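Making the connection between a drug name and its company automatically amounts to an alias lookup. The following is a minimal sketch under stated assumptions: the table `ALIAS_TO_COMPANY` and the function `companies_mentioned` are hypothetical names introduced here, and the table holds only the names from the example above. A real system would rely on a large, maintained entity database.

```python
import re

# Illustrative alias table built only from the example above.
ALIAS_TO_COMPANY = {
    "avastin": "Genentech",
    "bevacizumab": "Genentech",
    "genentech": "Genentech",
}


def companies_mentioned(text: str) -> set:
    """Return the set of companies whose known aliases appear in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {ALIAS_TO_COMPANY[t] for t in tokens if t in ALIAS_TO_COMPANY}


if __name__ == "__main__":
    headline = "NCI posts Phase 3 results for Bevacizumab in advanced lung cancer"
    print(companies_mentioned(headline))
```

With such a lookup, a headline that never names the company can still be routed to the right stock.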

Incorporate news into trading algorithms

• News flow itself is an important trading signal, following the old adage “There is no smoke without fire”.

• The sheer volume of news items can be just as much an indicator as the actual information they convey. A sudden rush of headlines suggests that volatility may increase (uncertainty breeds volatility).

• Identify the news “sentiment”

• Data mine Events of Interest (EOI)

What is an event?

• An event is a significant change

• An event is detected by observing a pattern in data acquired over time from multiple sources:

- Stock tickers, news, blogs, ...

• Observe an anomaly:

- The number of blog posts about a company in the last day is 50% higher than the daily average over the last year
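The anomaly check described above can be sketched as a comparison of the latest daily count against a trailing average. This is a minimal illustration, not the presenter's method: the function name `is_volume_anomaly`, the input format, and the default threshold of 0.5 (50%) are assumptions made for the example.

```python
from statistics import mean


def is_volume_anomaly(daily_counts, threshold=0.5):
    """Flag the latest day if its count exceeds the historical mean by more than `threshold`.

    daily_counts: chronological list of mentions per day; the last element
    is the most recent day and all earlier elements form the baseline.
    """
    if len(daily_counts) < 2:
        return False  # no history to compare against
    *history, latest = daily_counts
    baseline = mean(history)
    if baseline == 0:
        return latest > 0  # any mention is a change from silence
    return latest > baseline * (1 + threshold)


if __name__ == "__main__":
    counts = [10, 12, 9, 11, 10, 25]  # hypothetical blog-post counts
    print(is_volume_anomaly(counts))
```

The same check applies to any of the sources listed above, for example headline counts or tick volumes, by swapping in a different count series.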

How?

• Employ advances in emerging AI technologies such as Natural Language Processing, Sentiment Analysis, Entity Identification, and the Semantic Web.

• This field is growing quickly, and sophisticated tools already exist, such as Thomson Reuters’s “OpenCalais” service.

The Market for Unstructured Data

• Only 2% of firms employing electronic trading strategies use unstructured data in a machine-readable format, estimates Aite Group.

• Some content is free; for paid content, firms will spend more than $75 million globally in 2009 and over $141 million by 2011, estimates Aite Group.

Use emerging, cutting-edge technology:

Semantic Web:

• “The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for machines to understand content” – Wikipedia

• Contextual search for more relevant information

– Proprietary algorithm to search and render information using Semantic Web standards

– Google just acquired Freebase, formerly the “poster child” of Semantic Web linked data, which contains and links the world’s knowledge. This is likely to cause a shift similar to WWW adoption.

Competitors:

• Relegence’s “FirstTrack” service

• Collective Intellect

• Bloomberg news “heat” analytics.