Predicting the News of Tomorrow Using Patterns in Web Search Queries

Kira Radinsky, Sagie Davidovich, Shaul Markovitch
Computer Science Department, Technion – Israel Institute of Technology


Goal

“We find that changes in oil prices strongly predict future stock market returns in many countries in the world... The impact of this predictability on stock returns tends to be large.” (“Striking Oil: Another Puzzle?”, Gerben Driesprong, Benjamin Maat and Ben Jacobsen)

Oil Peaks and Stock Market Crashes

NEW YORK – Crude-oil futures shot up as commodities markets benefited from a surge in investor confidence. Light, sweet crude for January delivery settled $4.57, or 9.2%, higher at $54.50 a barrel on the New York Mercantile Exchange. January Brent crude on the ICE futures exchange settled $4.74, or 9.6%, higher at $53.93 a barrel.

Humans can predict events.
Can it be done automatically?

Solution Outline

• Identify events that occur today
  • More than 0.5 billion daily searches on the web (2008)
  • Many queries are related to current events
• Analyze which events tended to follow today's events in the past
  • History repeats itself
  • Query log archives

Knowledge Sources

• Google Hot Trends
• Technorati
• Online news (Newzingo)

[Figure: plots with a time axis running from July 2008 to September 2008]

Identifying Events

[Figure: search-volume time series with peaks labeled Hurricane Ivan, Hurricane Wilma, Hurricane Dean, Hurricane Gustav, and Hurricane Katrina]

Peak Detection Algorithm

Each maximum point m_y has at most two neighboring minimum points. We consider a maximum point a peak if:

1. The local maximum satisfies m_y > Δ1 (high-pass filter).
2. The difference between m_y and the lowest of its neighboring minimum points is above Δ2.
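The two conditions above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `detect_peaks`, the handling of plateaus, and the decision to ignore maxima at the ends of the series are assumptions made for the example.

```python
from typing import List, Sequence


def detect_peaks(series: Sequence[float], delta1: float, delta2: float) -> List[int]:
    """Return indices of interior local maxima that satisfy both peak conditions."""
    n = len(series)
    peaks = []
    for i in range(1, n - 1):  # assumption: boundary points are never peaks
        # Local maximum: strictly above the left neighbor, not below the right one.
        if not (series[i] > series[i - 1] and series[i] >= series[i + 1]):
            continue
        # Condition 1: the maximum must exceed delta1 (high-pass filter).
        if series[i] <= delta1:
            continue
        # Walk downhill on each side to reach the neighboring minimum points.
        left = i
        while left > 0 and series[left - 1] <= series[left]:
            left -= 1
        right = i
        while right < n - 1 and series[right + 1] <= series[right]:
            right += 1
        lowest = min(series[left], series[right])
        # Condition 2: height above the lowest neighboring minimum exceeds delta2.
        if series[i] - lowest > delta2:
            peaks.append(i)
    return peaks
```

For example, `detect_peaks([1, 2, 10, 3, 2, 4, 3, 1, 9, 1], delta1=5, delta2=3)` returns `[2, 8]`: the small bump of height 4 at index 5 is rejected by the first condition.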

Prediction

Indication Weight

1. How many of the peaks of w2 (the future candidate) appeared k days after w1 (today's term).
2. Saliency of w1: the significance of the peak in the search volume.
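The first component can be illustrated with a small counting routine. This is a sketch under stated assumptions: peaks are given as day indices, "k days after" is read as exactly k days after a peak of w1, and the count is normalized by the number of w2 peaks. The slides do not specify the normalization, and the name `lagged_peak_fraction` is hypothetical.

```python
from typing import Iterable


def lagged_peak_fraction(peaks_w1: Iterable[int], peaks_w2: Iterable[int], k: int) -> float:
    """Fraction of w2's peak days that fall exactly k days after a w1 peak day."""
    w1_days = set(peaks_w1)
    w2_days = list(peaks_w2)
    if not w2_days:
        return 0.0  # assumption: a term with no peaks gives no indication
    hits = sum(1 for day in w2_days if day - k in w1_days)
    return hits / len(w2_days)
```

With `peaks_w1 = [10, 40, 90]`, `peaks_w2 = [12, 42, 70, 95]` and `k = 2`, two of the four w2 peaks follow a w1 peak by two days, so the function returns `0.5`.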

[Figure: graph of terms (hurricane, Storm, Flood, Weather, Evacuation, Gas, Economics, Taliban, War, South Asia, china, pope, texans) connected by weighted edges with values 0.85, 0.40, 0.10, 0.36, 0.12, 0.30, 0.05, 0.01, 0.08]

Goal: for each candidate term, evaluate the likelihood that it will appear in the future, given today's terms.

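[Equation with labeled parts: "Likelihood to appear in k days", "Future candidate terms", "Today's salient terms", and "Indication weight on the candidate". It is written with the terms $w_1$ and $w_2$ at times $t$ and $t+k$, using conditional probabilities such as $P(w_2^{t+k} \mid w_1^{t})$. Example values shown: 0.9, 0.7.]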
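One way to read the goal stated above is as a ranking problem: each candidate term receives a score built from the indication weights of today's salient terms. The sketch below is an assumption, not the slide's equation. It combines the two components as a saliency-weighted sum, and the names `score_candidates`, `rank_candidates`, and the `indication` callback are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def score_candidates(
    saliency_today: Dict[str, float],
    candidates: Sequence[str],
    indication: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score each candidate w2 by summing indication(w1, w2) * saliency(w1).

    ASSUMPTION: the weighted sum is an illustrative combination rule only.
    """
    return {
        w2: sum(indication(w1, w2) * s for w1, s in saliency_today.items())
        for w2 in candidates
    }


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Order candidates by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))
```

The `indication` callback could be backed by a lagged peak count for a fixed k; keeping it as a parameter lets the same ranking code be reused with other weighting schemes.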

[Figure: the terms "Hurricane" and "Gas", illustrated by two news excerpts]

Oil, Gas May Soar as Storm Shuts U.S. Gulf Production
Crude-oil and natural-gas prices may soar after Hurricane Katrina moved into production regions of the Gulf of Mexico, forcing companies including Exxon Mobil Corp. and Chevron Corp. to close operations.

Gas Prices Rise as Industry Assesses Storm Damage
HOUSTON – Gasoline prices rose Saturday by an average of five cents a gallon across the country as the oil industry anticipated disruptions at several refineries along the Texas coast because of Hurricane Ike.

Empirical Methodology

• Tested on an aggregation of 4,500 online news sources
• What does "to appear in the news" mean? A term appears significantly more often than its average over the past year
• Evaluation measure: precision at 100
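The two definitions above can be sketched as follows. The slides do not say how "significantly more" is measured, so the threshold used here (more than `z` standard deviations above the past-year mean) is an assumption, as are the function names.

```python
from statistics import mean, pstdev
from typing import Sequence, Set


def appears_in_news(history: Sequence[float], today: float, z: float = 2.0) -> bool:
    """True when today's count exceeds the historical mean by more than z std devs.

    ASSUMPTION: the z-score threshold stands in for "significantly more".
    """
    return today > mean(history) + z * pstdev(history)


def precision_at_k(ranked_terms: Sequence[str], appeared: Set[str], k: int = 100) -> float:
    """Share of the top-k predicted terms that actually appeared in the news."""
    top = list(ranked_terms[:k])
    if not top:
        return 0.0
    # Divides by the number of predictions made, which is k when at least k exist.
    return sum(1 for term in top if term in appeared) / len(top)
```

For instance, `precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=4)` returns `0.5`, since two of the four predicted terms appeared.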

Empirical Evaluation

• Baseline method: what happens today happens tomorrow
• Each point is how many of the 100 predicted terms appeared
• A total of 30 days of experiments
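A minimal sketch of this baseline and its per-day count is shown below. It assumes that "what happens today" means today's most frequent terms and that each day's outcome is a set of terms that appeared; both readings, and the function names, are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Set


def persistence_baseline(today_counts: Dict[str, int], k: int = 100) -> List[str]:
    """Predict that tomorrow's terms are today's k most frequent terms."""
    ranked = sorted(today_counts, key=lambda term: (-today_counts[term], term))
    return ranked[:k]


def hits_per_day(predictions: Sequence[Sequence[str]], appeared: Sequence[Set[str]]) -> List[int]:
    """For each test day, count how many predicted terms actually appeared."""
    return [
        sum(1 for term in predicted if term in seen)
        for predicted, seen in zip(predictions, appeared)
    ]
```

Each entry of the list returned by `hits_per_day` corresponds to one plotted point, one per experiment day.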

Empirical Evaluation

• Baseline method: what happens today happens tomorrow
• Each point is an average of results from 30 days of tests

Empirical Evaluation

• Baseline-related: 100 terms related to today's terms are selected at random
• Each point is how many of the 100 selected terms appeared
• A total of 30 days of experiments

[Figure legend: Baseline - Related]

Empirical Evaluation

• Cross-correlation: not using indication weights
• Each point is how many of the 100 predicted terms appeared
• A total of 30 days of experiments
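The slides do not define the cross-correlation variant beyond its name. As an illustration only, the sketch below computes a standard lagged Pearson correlation between two search-volume series, which is one common way to relate a term today to another term k days later. The function name and the choice of Pearson correlation are assumptions.

```python
from math import sqrt
from typing import Sequence


def lagged_correlation(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Pearson correlation between x[t] and y[t + k] for a lag k >= 0."""
    if k < 0 or len(x) != len(y) or len(x) - k < 2:
        raise ValueError("need equal-length series and at least two overlapping points")
    a = list(x[: len(x) - k])  # x at time t
    b = list(y[k:])            # y at time t + k
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant series carries no correlation signal
    return cov / sqrt(var_a * var_b)
```

When `y` is a copy of `x` delayed by two days, the correlation at `k = 2` is close to 1, while the correlation at `k = 0` is much lower.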

Conclusions

A new method for predicting global future events using their patterns in the past.

A novel application of an aggregated collection of search queries, represented as a time series for each search term.

A testing methodology for evaluating such news prediction algorithms.