Talk of Europe @ DHBenelux2015

The possibilities and challenges of using linked data for academic research

The case of the Talk of Europe project

Laura Hollink, Centrum Wiskunde & Informatica, Amsterdam
Martijn Kleppe, Erasmus University Rotterdam
Max Kemman, University of Luxembourg
Astrid van Aggelen, VU University Amsterdam
Willem van Hage, SynerScope, Helvoirt


European Parliament as Linked Data

• Goal: publish the plenary debates of the European Parliament as Linked Open Data

• Why is this important?
A. It enables large-scale analysis across time spans.
B. For residents of the European Union, access to the proceedings of the European Parliament is a formal right.

• Linked Data: a format for publishing data on the Web, with URIs as permanent identifiers, designed for connecting pieces of data.

Data

14M statements about the 30K speeches by 3K speakers in 1K session days that were held in the EU parliament between 1999 and 2014

Links

• Country names

• Members of Parliament

• Members of Parliament + Parties

• Members of Parliament: online database with background information about MEPs: “committee, party group and delegation membership, as well as leadership positions” [An Automated Database of the European Parliament. Bjørn Høyland, Indraneel Sircar, and Simon Hix, European Union Politics 10(1):143-152, 2009.]

Example 1: speeches that contain a certain keyword

Query: all speeches that contain the phrase “open data”

“… So let us go for open data, let us go for utilisation of all the instruments available to that end! …”

“… but there too governments are encouraging the use of open data to increase transparency, accountability and citizen participation …”

“… We already have many open data projects in the Member States and local authorities …”
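A keyword query like the one above can be sketched as a small Python helper that builds the SPARQL text. This is a minimal sketch, not the project's actual query: the predicate URI `http://example.org/vocab/spokenText` is a hypothetical placeholder, and the real vocabulary should be taken from the example queries on the project website. `CONTAINS`, `LCASE` and `STR` are standard SPARQL 1.1 functions.

```python
def build_phrase_query(phrase: str, text_predicate: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT query for speeches whose text contains `phrase`."""
    # Escape backslashes and double quotes so the phrase stays a valid
    # SPARQL string literal.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?speech ?text WHERE {\n"
        f"  ?speech <{text_predicate}> ?text .\n"
        # Case-insensitive substring match on the speech text.
        f'  FILTER(CONTAINS(LCASE(STR(?text)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Hypothetical predicate URI, used only for illustration.
TEXT_PREDICATE = "http://example.org/vocab/spokenText"

query = build_phrase_query("open data", TEXT_PREDICATE)
print(query)
```

The resulting string can be pasted into the query editor once the placeholder predicate is replaced by the one the dataset really uses.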

Example 2: speeches that contain a certain keyword by date

[Figure: Mentions of 'human rights'. x-axis: dates (1999 to 2013); y-axis: frequency (0 to 800).]

Example 2: speeches that contain a certain keyword by country

[Figure: Mentions of 'human rights' by country. x-axis: country codes AT BE BG CY CZ DE DK EE ES FI FR GB GR HR HU IE IT LT LU LV MT NL PL PT RO SE SI SK; y-axis ticks from 0 to 7000.]
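Counts per year or per country such as those charted above can be derived from a query result with a few lines of Python. The sketch below assumes the result arrives in the standard SPARQL 1.1 Query Results JSON layout; the variable names `date` and `country` and all sample rows are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def count_by(results: dict, variable: str, transform=lambda value: value) -> Counter:
    """Count result rows per (transformed) value of `variable`."""
    counts = Counter()
    for row in results["results"]["bindings"]:
        # Unbound variables are simply absent from a row, so skip those rows.
        if variable in row:
            counts[transform(row[variable]["value"])] += 1
    return counts


# Made-up sample in the SPARQL JSON results layout; not real data.
sample = {
    "head": {"vars": ["speech", "date", "country"]},
    "results": {"bindings": [
        {"speech": {"type": "uri", "value": "http://example.org/speech/1"},
         "date": {"type": "literal", "value": "2004-03-10"},
         "country": {"type": "literal", "value": "NL"}},
        {"speech": {"type": "uri", "value": "http://example.org/speech/2"},
         "date": {"type": "literal", "value": "2004-11-17"},
         "country": {"type": "literal", "value": "DE"}},
        {"speech": {"type": "uri", "value": "http://example.org/speech/3"},
         "date": {"type": "literal", "value": "2009-05-06"},
         "country": {"type": "literal", "value": "NL"}},
    ]},
}

# Keep only the year part of an ISO date such as "2004-03-10".
per_year = count_by(sample, "date", transform=lambda date: date[:4])
per_country = count_by(sample, "country")
print(per_year, per_country)
```

The same grouping could also be done inside the query with `GROUP BY` and `COUNT`; doing it client-side keeps the query simple at the cost of transferring every matching row.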

Example 3: background info about the MEPs

• MEPs that were not born in Europe.

Members of Parliament

Integrate data from the EU parliament with external datasets
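The idea of integrating parliament data with an external dataset can be sketched in plain Python as a join over shared identifiers. Everything in this sketch is an assumption made for illustration: the `example.org` URIs, the names, the country codes and the `born_outside` helper are invented, and the real data would be queried from the linked datasets rather than held in dictionaries.

```python
# All URIs and facts below are invented placeholders, not real data.
EP_NAMES = {
    "http://example.org/ep/mep/1": "MEP One",
    "http://example.org/ep/mep/2": "MEP Two",
    "http://example.org/ep/mep/3": "MEP Three",
}

# Links from the parliament dataset to an external dataset.
# MEP Three has no link, which mirrors an incomplete link set.
LINKS = {
    "http://example.org/ep/mep/1": "http://example.org/external/person/a",
    "http://example.org/ep/mep/2": "http://example.org/external/person/b",
}

# Background knowledge that only the external dataset holds.
EXTERNAL_BIRTH_COUNTRY = {
    "http://example.org/external/person/a": "NL",
    "http://example.org/external/person/b": "XX",  # placeholder non-European code
}

EUROPEAN_COUNTRIES = {"NL", "DE", "FR"}  # deliberately tiny, for illustration


def born_outside(names, links, birth_country, inside):
    """Return names of MEPs whose linked birth country is not in `inside`."""
    found = []
    for mep_uri, name in names.items():
        external_uri = links.get(mep_uri)
        country = birth_country.get(external_uri)
        # MEPs without a link or without a birth country are skipped silently,
        # so the answer is only as complete as the links and the external data.
        if country is not None and country not in inside:
            found.append(name)
    return sorted(found)


result = born_outside(EP_NAMES, LINKS, EXTERNAL_BIRTH_COUNTRY, EUROPEAN_COUNTRIES)
print(result)
```

Note that the unlinked MEP never appears in the result, whatever their place of birth; this is one reason the completeness of the links matters for research use.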

What other knowledge do we have available?

• GDP, region, population density, neighbouring countries

• Age, religion, education, spouse / children

• Previous occupations, place of birth/residence

• Speeches in the Italian parliament

• Membership of committees, leadership positions

DEMO tomorrow 14:20-16:00

Discussion: What happens if we use linked data as source data for research?


Implications for use?

Credibility

• Who created it? How?

• The quality may vary: EP vs. Wikipedia

Completeness

• How complete is it? Is there a way to tell how complete it is?

• Completeness may vary: EP vs. Wikipedia

Update frequency

• When was the data last updated?

• Update frequency may vary: EP vs. “An automated database of the EP”

Credibility, completeness, update frequency of the links

• Who made them? How? When? How complete are they?

Message: the need for dataset evaluation is exacerbated when using linked data

How to use this data, in practice

The bad news: we don’t have a friendly user interface :’(

The good news: our data + all sources we link to are openly available for everyone :)

Options for use:

1. Tell us what you want to know and we will write you a query.

2. Go to our website, copy-paste an example query into the query editor.

3. Go to our website, write a SPARQL query in the query editor.

4. Query our SPARQL endpoint programmatically.

Website: http://talkofeurope.eu/data/
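Option 4 could look roughly like the Python sketch below, which follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter and an `Accept` header asking for JSON results). The endpoint address `http://example.org/sparql` is a placeholder, not the project's endpoint; the real address should be taken from the website above. Only the request is built here; `run_query` is defined but not called, so nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON result set (performs network I/O)."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder endpoint; replace with the address given on the project website.
ENDPOINT = "http://example.org/sparql"
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"

request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

With the real endpoint filled in, `run_query(ENDPOINT, QUERY)["results"]["bindings"]` would hold one dictionary per result row.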

Use of the data during three Creative Camps

• 3 events of one week each, where people are invited to work with our data on-site.

• Outcome CC #1 in Hilversum:

• Links to the Italian parliament.

• Detection of people who speak about an unusual mix of topics.

• Sentiment analysis

Talk of Europe team

Martijn Kleppe Henri Beunders Max Kemman Jill Briggeman

Astrid van Aggelen

Laura Hollink

Marnix van Berchum