Event data in forecasting models: Where does it come from, what can it do?
Philip A. Schrodt
Parus Analytics, Charlottesville, Virginia, USA
schrodt735@gmail.com
Paper presented at the Conference on Forecasting and Early Warning of Conflict, Peace Research Institute Oslo
April 22, 2015
Why is event data suddenly attracting attention after 50 years?
- Rifkin [NYT March 2014]: the most disruptive technologies in the current environment combine network effects with zero marginal cost
- Key: zero marginal cost, even though open source software is still "free-as-in-puppy"
- Examples:
  - Operating systems: Linux
  - General-purpose programming: gcc, Python
  - Statistical software: R
  - Encyclopedia: Wikipedia
  - Scientific typesetting and presentations: LaTeX
EL:DIABLO (Event Location: Dataset in a Box, Linux Option)
- Open source: https://openeventdata.github.io
- Full modular open-source pipeline to produce daily event data from web sources: http://phoenixdata.org
- Scraper working from a white-list of RSS feeds and web pages
- Event coding by any of several coders: TABARI, PETRARCH, others
- Geolocation: "Cliff" open-source geolocator
- "One-a-day" deduplication keeping the URLs of all duplicates (see the sketch after this list)
- Designed for deployment on inexpensive Linux cloud systems
- Supported by the Open Event Data Alliance: http://openeventdata.org
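As a rough illustration of the "one-a-day" deduplication step, the sketch below keeps a single record per incident key while retaining the URLs of every duplicate report. The record layout (date, source, target, CAMEO code, URL) is a hypothetical simplification for the example, not the actual EL:DIABLO schema.

```python
def one_a_day_filter(events):
    """Keep one record per (date, source, target, code) key, retaining
    the URLs of all duplicate reports on the record that is kept.

    `events` is an iterable of dicts with 'date', 'source', 'target',
    'code', and 'url' keys -- a hypothetical record layout."""
    kept = {}
    for ev in events:
        key = (ev["date"], ev["source"], ev["target"], ev["code"])
        if key not in kept:
            kept[key] = {**ev, "duplicate_urls": []}
        else:
            kept[key]["duplicate_urls"].append(ev["url"])
    return list(kept.values())
```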
An incident must first generate one or more texts
This is the biggest challenge to accuracy. At least the following factors are involved:
- A reporter actually witnesses, or learns about, the incident
- An editor thinks the incident is "newsworthy": this has a bimodal distribution of routine incidents, such as announcements and meetings, and high-intensity incidents: "when it bleeds, it leads"
- The report is not formally or informally censored
- The report corresponds to actual events, rather than being created for propaganda or entertainment purposes
- News coverage is biased toward certain geographical regions and generally "follows the money"
- Reports will be amplified if they are repeated in additional sources
Humans use multiple sources to create narratives
- Redundant information is automatically discarded
- Sources are assessed for reliability and validity
- Obscure sources can be used to "connect the dots"
- Episodic processing in humans provides a pleasant dopamine hit when you put together a "median narrative": this is why people read novels and watch movies
Machines latch on to anything that looks like an event
This must be filtered
Implications of one-a-day filtering
- The expected number of correct codings of a single incident increases exponentially with the number of reports, but is asymptotic to 1 (see the sketch below)
- The expected number of incorrect codings increases linearly and is bounded only by the number of distinct codes
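To make the two expectations concrete: if each of n independent reports of an incident is coded correctly with probability p, one-a-day filtering collapses the correct duplicates into a single record, so the expected number of correct codings is 1 - (1 - p)^n, which rises exponentially toward 1. Coding errors tend to differ from one another, so they rarely collapse and accumulate roughly linearly. The probabilities in this sketch are illustrative, not estimates from the slides.

```python
def expected_codings(n_reports, p_correct=0.7, p_error=0.05):
    """Toy model of one-a-day filtering with illustrative probabilities.

    Correct codings collapse to one record, so the expectation is the
    chance that at least one report is coded correctly: 1 - (1 - p)^n,
    asymptotic to 1. Each coding error tends to produce a different
    spurious code, so errors accumulate roughly linearly with n."""
    e_correct = 1 - (1 - p_correct) ** n_reports
    e_incorrect = p_error * n_reports
    return e_correct, e_incorrect

for n in (1, 2, 5, 10, 20):
    print(n, expected_codings(n))
```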
Tension between two approaches to using machines [Isaacson]
- "Artificial intelligence" [Turing, McCarthy]: figure out how to get machines to think like humans
- "Computers are tools" [Hopper, Jobs]: design systems to optimally complement human capabilities
Does this affect the common uses of event data?
- Trends and monitoring: probably okay, at least for sophisticated users
- Narratives and trigger models: a disaster
- Structural substitution models: seem to work pretty well, because these are usually based on approaches that extract signal from noise
- Time series models: also work well, again because these have explicit error models
- Big Data approaches: who knows?
Weighted correlation between two data sets
\[
\mathrm{wtcorr} = \sum_{i=1}^{A-1} \sum_{j=i}^{A} \frac{n_{i,j}}{N}\, r_{i,j} \tag{1}
\]
where
- $A$ = number of actors
- $n_{i,j}$ = number of events involving dyad $(i,j)$
- $N$ = total number of events in the two data sets that involve the undirected dyads in $A \times A$
- $r_{i,j}$ = correlation on various measures: counts and Goldstein-Reising scores
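The sketch below is a direct transcription of equation (1) into code, assuming `counts[i][j]` and `corrs[i][j]` hold $n_{i,j}$ and $r_{i,j}$ for the undirected dyads; those names and the data layout are assumptions for the example, not part of the original analysis.

```python
def weighted_correlation(counts, corrs, n_total):
    """Equation (1): wtcorr = sum_{i=1}^{A-1} sum_{j=i}^{A} (n_ij / N) * r_ij.

    counts[i][j] -- number of events involving dyad (i, j)
    corrs[i][j]  -- correlation r_ij between the two data sets on that
                    dyad (e.g. on counts or Goldstein-Reising scores)
    n_total      -- N, total events involving the undirected dyads"""
    A = len(counts)
    wtcorr = 0.0
    for i in range(A - 1):        # i = 1, ..., A-1 in equation (1)
        for j in range(i, A):     # j = i, ..., A
            wtcorr += (counts[i][j] / n_total) * corrs[i][j]
    return wtcorr
```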
[Slide: correlations over time, total counts and Goldstein-Reising totals]
[Slide: correlations over time, pentacode counts]
[Slide: dyads with highest correlations]
[Slide: dyads with lowest correlations]
What is to be done: Part 1
- Build open-access gold standard cases, then use the estimated classification matrices for statistical adjustments (a sketch follows this list)
- Systematically assess the trade-offs in multiple-source data, or create more sophisticated filters
- Evaluate the utility of multiple-data-set methods such as multiple systems estimation
- Systematically assess the native-language versus machine-translation issue
- Extend CAMEO and standardize sub-state actor codes: canonical CAMEO is too complicated, but ICEWS sub-state actors are too simple
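One standard way to use classification matrices estimated from gold standard cases is misclassification correction on aggregate proportions: if C[k, j] is the probability that a true category-j event is coded as category k, then p_obs = C p_true, and an estimate of the true distribution can be recovered by solving the linear system. This is a sketch of that general technique, not the specific adjustment proposed in the talk; the matrix and proportions are invented for illustration.

```python
import numpy as np

# C[k, j]: probability that a true category-j event is coded as
# category k, estimated from gold standard cases (numbers invented
# purely for illustration).
C = np.array([[0.9, 0.2],
              [0.1, 0.8]])

p_obs = np.array([0.5, 0.5])     # observed coded-category proportions

# p_obs = C @ p_true, so solve the linear system for p_true.
p_true = np.linalg.solve(C, p_obs)
print(p_true)                    # ~[0.43, 0.57]; in practice results may
                                 # need clipping and renormalizing
```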
What is to be done: Part 2
- Automated verb phrase recognition and extraction: this will also be required to extend CAMEO. Entity identification, in contrast, is largely a solved problem (ICEWS: 100,000 actors in dictionary)
- Establish a user-friendly open-source collaboration platform for dictionary development
- Systematically explore aggregation methods: ICEWS has 10,742 aggregations, which is too many
- Solve, or at least improve upon, the open-source geocoding issue
- Develop event-specific coding modules
Thank you
Email: schrodt735@gmail.com
Slides: http://eventdata.parusanalytics.com/presentations.html
Data: http://phoenixdata.org
Software: https://openeventdata.github.io/
Papers: http://eventdata.parusanalytics.com/papers.html