Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linked Open Data for the Humanities and Social Sciences

Use cases: linking government data to news data in the PoliMedia and Talk of Europe projects

Laura Hollink, Centrum Wiskunde & Informatica (CWI)

KU Leuven guest lecture, November 10, 2016


Linked Open Data in the SSH?

Example question:

How did the debate about the financial crisis in Greece develop?

Searching the proceedings of the European Parliament

[Figure: "Greece" in the plenary meetings of the European Parliament; x-axis: year (1999-2013); y-axis: number of mentions (0-200)]

Searching through newspaper archives

Mentions of “Griekenland” (Dutch for “Greece”) in the Dutch newspaper De Telegraaf

Search volumes of a search engine

Frequency of the query “Greece” on Google

http://www.google.com/trends


We need:

✦ open access to data
✦ to combine sources
✦ more complex queries


Example question:

Which political debate in the post-war period has attracted most media attention?

“De Indonesische Quaestie” (“The Indonesian Question”)

To answer this question we need to go through all newspaper articles about all political debates…

We need:

✦ open access to data
✦ to combine sources
✦ more complex queries


Example question:

What are the differences between different media?

Example question:

Has the coverage changed over time?

Linked Open Data

A very brief introduction…

A method of publishing structured data on the Web in such a way that it can be linked and queried by computers as well as people.

✦ open access to data
✦ to combine sources
✦ more complex queries

Structured data

Thing      Type  Population  Airport
Amsterdam  City  1364422     Schiphol
…          …     …           …

Comparable to the data one may find in a database table

Represented as RDF triples:

ex:Amsterdam a ex:City .
ex:Amsterdam dbo:populationUrban "1364422"^^xsd:integer .
ex:Amsterdam dbp:cityServed ex:Schiphol .
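To make the step from a table row to triples concrete, here is a minimal Python sketch. It assumes triples are plain (subject, predicate, object) tuples of strings with the ex:, dbo: and dbp: prefixes from the example; the helper name `row_to_triples` is hypothetical, and `rdf:type` is the predicate that Turtle abbreviates as `a`.

```python
# Minimal sketch (hypothetical helper): one table row becomes one triple per
# column, all sharing the row's "Thing" as their subject.
def row_to_triples(row):
    subject = "ex:" + row["Thing"]
    return [
        (subject, "rdf:type", "ex:" + row["Type"]),  # Turtle writes this as "a"
        (subject, "dbo:populationUrban", row["Population"]),
        (subject, "dbp:cityServed", "ex:" + row["Airport"]),
    ]


row = {"Thing": "Amsterdam", "Type": "City", "Population": 1364422, "Airport": "Schiphol"}
for triple in row_to_triples(row):
    print(triple)
```

Each column becomes a predicate; a real conversion would use full URIs and typed literals such as xsd:integer.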

On the Web

Everything is identified by URIs (documents, concepts, instances, links), for example:
http://example.org/cities#Amsterdam
http://example.org/City
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://dbpedia.org/ontology/population

Triples can be distributed over the Web:

<http://example.org/cities#Amsterdam> a ex:City .
<http://example.org/cities#Amsterdam> dbo:populationUrban "1364422" .
<http://example.org/cities#Amsterdam> dbp:cityServed ex:Schiphol .

Forming a graph

[Figure: graph in which the node Amsterdam is linked by "is a" to City, by "has population" to "1364422", and by "has airport" to Schiphol]

The Web of Data vs. the Web of Documents


Note the differences between the Web of Data and a database:
• Non-unique naming assumption
• Open World assumption
• Everyone can say anything about anything
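The first two differences can be made concrete with a toy Python sketch; the graph, the URI `ex:OtherAirport` and the function names are hypothetical. A database answers "false" for a fact it does not store, whereas on the Web of Data a missing triple only means "unknown". Likewise, without owl:sameAs or owl:differentFrom statements, n distinct URIs may denote anywhere from 1 to n distinct things.

```python
# Hypothetical toy graph
GRAPH = {
    ("ex:Amsterdam", "rdf:type", "ex:City"),
    ("ex:Amsterdam", "dbp:cityServed", "ex:Schiphol"),
}


def ask_closed_world(triple, graph=GRAPH):
    """Database view: a fact that is not stored is false."""
    return triple in graph


def ask_open_world(triple, graph=GRAPH):
    """Web of Data view: a fact that is not stored is unknown (None)."""
    return True if triple in graph else None


def possible_count(uris):
    """Without owl:sameAs / owl:differentFrom statements, n distinct URIs
    may denote anywhere from 1 to n distinct things."""
    n = len(set(uris))
    return (1 if n else 0, n)


question = ("ex:Amsterdam", "dbp:cityServed", "ex:OtherAirport")  # hypothetical
print(ask_closed_world(question))  # False
print(ask_open_world(question))    # None
print(possible_count(["http://example.org/cities#Amsterdam",
                      "http://dbpedia.org/resource/Amsterdam"]))  # (1, 2)
```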

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Querying Linked Open Data

• A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language”

• See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/sparql11-query/

Data:
:JamesDean :playedIn :Giant .

Query                          Result
:JamesDean :playedIn ?what .   ?what = :Giant
?who :playedIn :Giant .        ?who = :JamesDean
:JamesDean ?what :Giant .      ?what = :playedIn
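The matching step behind these three queries can be sketched in Python, assuming an in-memory list of triples and treating terms that start with `?` as variables. `match` is a hypothetical helper that handles a single triple pattern; real SPARQL engines additionally join the bindings of several patterns.

```python
# Minimal sketch of SPARQL-style triple pattern matching over an in-memory graph.
DATA = [(":JamesDean", ":playedIn", ":Giant")]


def match(pattern, graph):
    """Return one binding dict per triple that matches the pattern."""
    results = []
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if bindings.get(term, value) != value:
                    break
                bindings[term] = value
            elif term != value:  # constants must match exactly
                break
        else:
            results.append(bindings)
    return results


print(match((":JamesDean", ":playedIn", "?what"), DATA))  # [{'?what': ':Giant'}]
print(match(("?who", ":playedIn", ":Giant"), DATA))       # [{'?who': ':JamesDean'}]
print(match((":JamesDean", "?what", ":Giant"), DATA))     # [{'?what': ':playedIn'}]
```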

Two example projects of Linked Open Data in SSH: data modelling and linking in the PoliMedia and Talk of Europe projects

Linking government data to news data

Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches.

Roughly 1.8 million radio news bulletins between 1937 and 1984

(We only use 1945-1995)

Archives of hundreds of newspapers, with tens of millions of articles published between 1618 and 1995.

(We only use 1945-1995)

Links in PoliMedia

News item “is about” speech

• 3 million links

Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF

[Figure: RDF graph of a Dutch parliamentary debate. The debate nl.proc.sgd.d.194519460000002 (rdf:type Debate; dc:date 1945-11-20; dc:language Dutch; dc:publisher http://statengeneraaldigitaal.nl/; dc:source http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002 and http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf; labelled "Handelingen Verenigde Vergadering...", i.e. "Proceedings of the Joint Session...") hasPart nl.proc.sgd.d.194519460000002.1 (rdf:type PartOfDebate), which is followed by nl.proc.sgd.d.194519460000002.2 via hasSubsequentPartOfDebate.
The part of the debate hasPart a DebateContext (nl.proc.sgd.d.194519460000002.1.1, hasText "De voorzitter opent de vergadering…", "The President opens the meeting…") and speeches (nl.proc.sgd.d.194519460000002.1.2, rdf:type Speech, followed via hasSubsequentSpeech by nl.proc.sgd.d.194519460000002.1.3).
A speech hasSpokenText "Mijnheer de Voorzitter, de Commissie van …" ("Mr. President, the Committee of …"), hasSpeaker (sem:hasActor) Speaker_0006, and is coveredIn the newspaper article http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr.
The speaker has hasParty Party_kvp and hasRole member_of_parliament, and is described as a Politician: foaf:firstName "Joannes Antonius James", foaf:lastName "Barge", rdfs:label "Barge", dc:source http://resolver.politicalmashup.nl/nl.m.00064.
Party_kvp has rdf:type Party, hasAcronym "KVP" and hasFullName "Katholieke Volkspartij" (Catholic People's Party).]
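As a hedged illustration of why the nested hasPart structure is useful, the sketch below collects all speeches of a debate by following hasPart links transitively, similar in spirit to the SPARQL 1.1 property path `hasPart+`. The triples are a simplified, hypothetical subset of the example above (identifiers shortened, namespaces omitted), and the helper names are illustrative.

```python
# Simplified, hypothetical subset of the PoliMedia example (IDs shortened).
TRIPLES = [
    ("d.194519460000002", "rdf:type", "Debate"),
    ("d.194519460000002", "hasPart", "d.194519460000002.1"),
    ("d.194519460000002.1", "rdf:type", "PartOfDebate"),
    ("d.194519460000002.1", "hasPart", "d.194519460000002.1.1"),
    ("d.194519460000002.1.1", "rdf:type", "DebateContext"),
    ("d.194519460000002.1", "hasPart", "d.194519460000002.1.2"),
    ("d.194519460000002.1.2", "rdf:type", "Speech"),
    ("d.194519460000002.1", "hasPart", "d.194519460000002.1.3"),
    ("d.194519460000002.1.3", "rdf:type", "Speech"),
]


def parts(node, triples):
    """All resources reachable from node via one or more hasPart links."""
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "hasPart" and o not in found:
                found.append(o)
                stack.append(o)
    return found


def speeches(debate, triples):
    """The parts of a debate that are typed as Speech."""
    typed = {(s, o) for s, p, o in triples if p == "rdf:type"}
    return sorted(x for x in parts(debate, triples) if (x, "Speech") in typed)


print(speeches("d.194519460000002", TRIPLES))
# ['d.194519460000002.1.2', 'd.194519460000002.1.3']
```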

Step 2: Discovering links between politics and news

[Figure: linking pipeline. From the debates, topics and named entities are detected in the speeches. The topics, named entities, name of speaker and date of debate are used to create queries; the queries are used to search the newspaper archive, which returns candidate articles; ranking the candidate articles yields links between speeches and articles.]

Intuition 1: The name of the speaker should appear in the article, and the article should be published within a week of the debate.

Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
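The two intuitions can be sketched as a filter followed by a ranking. This is a hypothetical Python illustration, not PoliMedia's actual implementation: it assumes that "within a week" means on the day of the debate or up to 7 days later, and it uses Jaccard overlap of topic/entity terms as the similarity measure. All data are invented.

```python
from datetime import date


def is_candidate(article, speech, max_days=7):
    """Intuition 1: the speaker's name occurs in the article, and the article
    appears on the day of the debate or up to max_days later (assumed window)."""
    delta = (article["date"] - speech["date"]).days
    named = speech["speaker"].lower() in article["text"].lower()
    return named and 0 <= delta <= max_days


def overlap(article, speech):
    """Intuition 2: Jaccard overlap of topic/entity terms (an assumed measure)."""
    a, s = set(article["terms"]), set(speech["terms"])
    union = a | s
    return len(a & s) / len(union) if union else 0.0


def rank_articles(speech, articles):
    """Filter with intuition 1, then sort by intuition 2, best first."""
    candidates = [a for a in articles if is_candidate(a, speech)]
    return sorted(candidates, key=lambda a: overlap(a, speech), reverse=True)


# Hypothetical example data
speech = {"speaker": "Barge", "date": date(1945, 11, 20),
          "terms": ["Indonesia", "KVP", "committee"]}
articles = [
    {"id": "a1", "date": date(1945, 11, 21), "text": "Barge on Indonesia",
     "terms": ["Indonesia", "KVP"]},
    {"id": "a2", "date": date(1945, 11, 22), "text": "Speech by Barge",
     "terms": ["budget"]},
    {"id": "a3", "date": date(1946, 1, 5), "text": "Barge on Indonesia again",
     "terms": ["Indonesia", "KVP", "committee"]},
]
print([a["id"] for a in rank_articles(speech, articles)])  # ['a1', 'a2']
```

Article a3 has the best overlap but falls outside the one-week window, so intuition 1 removes it before ranking.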

Representation of links

architecten skos:exactMatch architects

[Figure: the same link represented as a resource of its own. Link 001 has concept1 "architecten", concept2 "architects", link type skos:exactMatch, link method "handmatig" (manual) and author L. Hollink.]

• This is an example of the “design pattern” referred to as n-ary relations or relations as classes.

• It allows us to save provenance information about the statements we create.
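A hedged Python sketch of the pattern: the plain triple has no slot for provenance, while the link-as-resource version can carry its method and author and can then be queried on them. Only skos:exactMatch comes from the slide; the property names `concept1`, `linkType`, `linkMethod` and so on are hypothetical.

```python
def link_as_resource(link_id, concept1, concept2, link_type, method, author):
    """Describe one link as a node with its own properties (hypothetical names)."""
    return [
        (link_id, "concept1", concept1),
        (link_id, "concept2", concept2),
        (link_id, "linkType", link_type),
        (link_id, "linkMethod", method),
        (link_id, "author", author),
    ]


# A plain triple can state the link, but has no slot for who made it or how.
direct = ("architecten", "skos:exactMatch", "architects")

link = link_as_resource("Link001", "architecten", "architects",
                        "skos:exactMatch", "manual", "L. Hollink")


def manual_links(triples):
    """A provenance query: which links were created manually?"""
    return sorted(s for s, p, o in triples if p == "linkMethod" and o == "manual")


print(manual_links(link))  # ['Link001']
```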

Evaluation of Links

Recall that we aim to use the links to answer a research question.

Can we still do that if there are errors in the links?

How many errors are acceptable?

We need to know the quality!

How would you determine the quality of the links?

1. Manually rating (a sample of) mappings

• relatively cheap and easy to interpret

• only precision, no recall

2. Comparison to manually found links

• precision and recall

• more expensive! (but: crowdsourcing?)
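Method 2 can be sketched as a set comparison in Python, assuming each link is a (speech, article) pair; the data are hypothetical. With method 1 only the precision part is available, because without a manually built reference set there is nothing to compute recall against.

```python
def precision_recall(found, reference):
    """found: links produced automatically; reference: links found manually."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical links
found = {("speech1", "art1"), ("speech1", "art2"), ("speech2", "art3"), ("speech2", "art4")}
reference = {("speech1", "art1"), ("speech2", "art3"), ("speech2", "art5")}
print(precision_recall(found, reference))  # (0.5, 0.6666666666666666)
```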

Evaluation of links in PoliMedia

How good are the links?

• We asked 2 raters to manually score pairs of newspaper articles and speeches.

• A pilot study showed that we needed more than a 2-point scale.

• Inter-rater agreement: 0.5, acceptable but not high.

• Score: 80% (the share of links rated 1 or 2 in setting 3)

Score                                Setting 1  Setting 2  Setting 3
I don’t know                         0.14       0.15       0.08
0 - unrelated                        0.38       0.23       0.12
1 - related                          0.29       0.36       0.36
2 - explicit mention of the debate   0.19       0.26       0.44
1 + 2                                0.48       0.62       0.80

How many links did we miss?

• We asked the raters to manually search the archives of the National Library for related articles.

• Score: 62%
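The slides do not say which agreement measure was used. Cohen's kappa is a common choice for two raters, so the Python sketch below assumes it; kappa corrects raw agreement for the agreement expected by chance. The ratings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings on the 0/1/2 scale from the table above
ratings_a = [2, 2, 1, 0, 1, 2, 0, 1]
ratings_b = [2, 1, 1, 0, 2, 2, 0, 0]
print(round(cohens_kappa(ratings_a, ratings_b), 2))  # 0.44
```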

Results

• An open data set of Dutch parliamentary debates,

• with almost 3 million links between 450,000 speeches and 1.5 million newspaper articles and radio bulletins at the National Library.

• accessible through a Web demonstrator and through a SPARQL endpoint

Demo

Online database: SPARQL endpoint

• A service to query a knowledge base using the SPARQL query language.

Example query: “All speeches with more than 60 associated news items.”
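In SPARQL 1.1 such a question is typically answered with GROUP BY and HAVING over the links. The Python sketch below mirrors that logic on a hypothetical list of (speech, news item) pairs, counting each distinct pair once; the function name and data are illustrative, not PoliMedia's schema.

```python
from collections import Counter


def speeches_with_many_links(links, threshold=60):
    """links: (speech, news_item) pairs. Count distinct news items per speech
    and keep speeches with more than `threshold` of them."""
    counts = Counter(speech for speech, _ in set(links))
    return sorted(speech for speech, n in counts.items() if n > threshold)


# Hypothetical links: speech_A has 61 news items, speech_B has exactly 60
links = [("speech_A", f"article_{i}") for i in range(61)]
links += [("speech_B", f"article_{i}") for i in range(60)]
print(speeches_with_many_links(links))  # ['speech_A']
```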

The European Parliament as Linked Open Data

Laura Hollink Centrum Wiskunde & Informatica, Amsterdam Astrid van Aggelen VU University Amsterdam Martijn Kleppe Erasmus University Rotterdam Henri Beunders Erasmus University Rotterdam Jill Briggeman Erasmus University Rotterdam Max Kemman University of Luxembourg

Talk of Europe goals

• To publish the entire plenary debates of the European Parliament as Linked Open Data

• To improve access to the data

• To enable large-scale analysis across time spans

‣ For residents of the European Union, access to the proceedings of the European Parliament is a formal right.

Step 1: Translate the European parliamentary debates to Linked Open Data

14M RDF statements about the 30K speeches, in 23 languages, by 3K speakers, on 1K session days held in the European Parliament between 1999 and 2014

Modelling debates as events, not documents

[Figure: the nested events of a plenary debate. The session lp:eu/plenary/Session/2013-11 (lpv:month "11"^^xsd:gMonth, lpv:year "2013"^^xsd:gYear) dc:hasPart the session day lp:eu/plenary/SessionDay/2013-11-20 (dc:date "2013-11-20"^^xsd:date), which dc:hasPart the agenda item lp:eu/plenary/2013-11-20/AgendaItem_6 (lpv:number 6), which dc:hasPart the speech lp:eu/plenary/2013-11-20/Speech_103 (lpv:number 103). Each part points back to its parent with dc:isPartOf, and siblings are ordered with lpv:hasSubsequent (AgendaItem_6 to AgendaItem_7, Speech_103 to Speech_104). The resources have rdf:type lpv:eu/plenary/Session, lpv:eu/plenary/SessionDay, lpv:eu/plenary/AgendaItem and lpv:eu/plenary/Speech, respectively.]

PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
PREFIX lp: <http://purl.org/linkedpolitics/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
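Because each event gets its own URI, the URIs can be derived from the date and position of the event. The Python sketch below assumes the patterns shown in the figure (Session/2013-11, SessionDay/2013-11-20, 2013-11-20/AgendaItem_6, 2013-11-20/Speech_103). The helper names are hypothetical, and the real Talk of Europe conversion pipeline may construct its URIs differently.

```python
from datetime import date

LP = "http://purl.org/linkedpolitics/"


def event_uris(day, agenda_item, speech):
    """Hypothetical helper: URIs for the nested events, following the lp: patterns."""
    d = day.isoformat()
    return {
        "session": f"{LP}eu/plenary/Session/{day:%Y-%m}",
        "session_day": f"{LP}eu/plenary/SessionDay/{d}",
        "agenda_item": f"{LP}eu/plenary/{d}/AgendaItem_{agenda_item}",
        "speech": f"{LP}eu/plenary/{d}/Speech_{speech}",
    }


def hierarchy_triples(uris):
    """dc:hasPart from parent to child and dc:isPartOf from child to parent."""
    chain = [uris["session"], uris["session_day"], uris["agenda_item"], uris["speech"]]
    triples = []
    for parent, child in zip(chain, chain[1:]):
        triples.append((parent, "dc:hasPart", child))
        triples.append((child, "dc:isPartOf", parent))
    return triples


uris = event_uris(date(2013, 11, 20), 6, 103)
print(uris["speech"])  # http://purl.org/linkedpolitics/eu/plenary/2013-11-20/Speech_103
```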

How to relate a speech to the party of the speaker?

[Figure: lp:eu/plenary/2009-10-21/Speech_140 lpv:speaker lp:EUmember_1023; lp:EUmember_1023 lpv:hasParty lp:EUParty/SomeParty]

Why is this not a good solution?

1. A person might be a member of more than one party (at different times).

2. Since there is no link between a speech and a party, queries for all speeches spoken by the members of a certain party become very complicated.

How to relate a speech to the party of the speaker?

[Figure: the speaker's memberships are modelled as political functions. lp:EUmember_1023 has two lpv:politicalFunction links, each to a resource of rdf:type lpv:PoliticalFunction with its own lpv:role, lpv:institution, lpv:beginning and lpv:end:
• lp:political-Function101: lpv:role lp:Role/member, lpv:institution lp:EUParty/NI, lpv:beginning "2007-11-14"^^xsd:date, lpv:end "2011-11-26"^^xsd:date
• lp:political-Function102: lpv:role lp:Role/substitute, lpv:institution lp:EUCommittee/Committee_on_Legal_Affairs, lpv:beginning "2009-07-16"^^xsd:date, lpv:end "2011-11-26"^^xsd:date
The speech lp:eu/plenary/2009-10-21/Speech_140 has lpv:speaker lp:EUmember_1023 and is linked with lpv:spokenAs to the political functions the speaker held on that day.]

Note: this is another example of the design pattern called n-ary relations or relations as classes.
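With time-bounded political functions, finding the speaker's party at the time of a speech becomes a simple interval check. The Python sketch below uses the values as read from the figure above, plus one hypothetical earlier membership (political-Function100, EUParty/SomeParty) to show a change of party over time; the function names are illustrative.

```python
from datetime import date

# Political functions of lp:EUmember_1023 as n-ary relations; Function101/102 as
# read from the figure, Function100 is a hypothetical earlier party membership.
FUNCTIONS = [
    {"id": "political-Function100", "role": "member", "institution": "EUParty/SomeParty",
     "beginning": date(2004, 7, 20), "end": date(2007, 11, 13)},
    {"id": "political-Function101", "role": "member", "institution": "EUParty/NI",
     "beginning": date(2007, 11, 14), "end": date(2011, 11, 26)},
    {"id": "political-Function102", "role": "substitute",
     "institution": "EUCommittee/Committee_on_Legal_Affairs",
     "beginning": date(2009, 7, 16), "end": date(2011, 11, 26)},
]


def spoken_as(speech_date, functions):
    """Functions valid on the day of the speech (candidates for lpv:spokenAs)."""
    return [f["id"] for f in functions if f["beginning"] <= speech_date <= f["end"]]


def party_on(speech_date, functions):
    """Party memberships of the speaker that were valid on the given date."""
    return [f["institution"] for f in functions
            if f["institution"].startswith("EUParty/")
            and f["beginning"] <= speech_date <= f["end"]]


print(spoken_as(date(2009, 10, 21), FUNCTIONS))
# ['political-Function101', 'political-Function102']
print(party_on(date(2009, 10, 21), FUNCTIONS))  # ['EUParty/NI']
print(party_on(date(2005, 3, 1), FUNCTIONS))    # ['EUParty/SomeParty']
```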

Step 2: create links to external data sources


(links made by the EC)

Linking Members of Parliament to Wikipedia / DBpedia

How?

• String matching is the most important feature in the linking process.

• “nearly all [alignment systems] use a string similarity metric” [12]

• Stop-word removal and stemming are not helpful! Nor is using WordNet synonyms. [12]

[12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.
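String matching of MEP names against DBpedia labels can be sketched with Python's standard difflib.SequenceMatcher. This is one possible similarity metric, not necessarily the one used in Talk of Europe. In line with the observation above, names are only lower-cased and stripped of accents, with no stop-word removal or stemming; the 0.85 threshold and the candidate labels are arbitrary assumptions.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case and strip accents; no stop-word removal or stemming."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def best_match(mep_name, candidate_labels, threshold=0.85):
    """Return the most similar label, or None if nothing is similar enough."""
    if not candidate_labels:
        return None
    target = normalise(mep_name)
    score, label = max(
        (SequenceMatcher(None, target, normalise(c)).ratio(), c) for c in candidate_labels
    )
    return label if score >= threshold else None


candidates = ["Judith Sargentini", "Judith Sargent Murray", "Sargent Shriver"]
print(best_match("Judith SARGENTINI", candidates))  # Judith Sargentini
print(best_match("José Bové", ["Jose Bove"]))       # Jose Bove
```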

http://www.dbpedia.org/page/Judith_Sargentini

Example query 1: speeches that contain a certain keyword

Query: all speeches that contain the phrase “open data”

…. So let us go for open data, let us go for utilisation of all the instruments available to that end! …..

…. but there too governments are encouraging the use of open data to increase transparency, accountability and citizen participation ….

…. We already have many open data projects in the Member States and local authorities…..

Example 2: speeches that contain a certain keyword, by date

[Figure: "Slovenia" in the plenary meetings of the European Parliament; x-axis: year (1999-2013); y-axis: number of mentions (0-100)]

[Figure: Mentions of 'human rights'; x-axis: dates (1999-2013); y-axis: frequency (0-800)]

Example 3: speeches that contain a certain keyword, by country

[Figure: Mentions of 'human rights' by country; x-axis: country codes (AT, BE, BG, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK); y-axis: number of mentions (0-7000)]
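Examples 2 and 3 differ only in the grouping key. The Python sketch below counts case-insensitive keyword occurrences and groups them by a caller-supplied key (year or country); the speeches and their fields are hypothetical. Note that plain substring matching also counts "Slovenian" as a mention of "Slovenia", which may or may not be what a researcher wants.

```python
from collections import Counter


def mentions_by(speeches, keyword, key):
    """Count case-insensitive occurrences of keyword, grouped by key(speech)."""
    kw = keyword.lower()
    counts = Counter()
    for speech in speeches:
        n = speech["text"].lower().count(kw)
        if n:
            counts[key(speech)] += n
    return dict(sorted(counts.items()))


# Hypothetical speeches; only the fields used here
speeches = [
    {"date": "2004-05-03", "country": "SI", "text": "Welcome, Slovenia!"},
    {"date": "2004-11-16", "country": "DE", "text": "Human rights in Slovenia"},
    {"date": "2008-01-16", "country": "SI", "text": "The Slovenian presidency begins"},
    {"date": "2008-03-12", "country": "FR", "text": "Budget discussion"},
]
by_year = mentions_by(speeches, "Slovenia", key=lambda s: s["date"][:4])
by_country = mentions_by(speeches, "Slovenia", key=lambda s: s["country"])
print(by_year)     # {'2004': 2, '2008': 1}
print(by_country)  # {'DE': 1, 'SI': 2}
```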

Example 4: the number of speeches per EU country

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
SELECT ?c (COUNT(?x) AS ?count)
WHERE {
  ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech> .
  ?x lpv:speaker ?p .
  ?p lpv:countryOfRepresentation ?c .
}
GROUP BY ?c
LIMIT 50

Example 5: include data from an external source

Query: MEPs who were born outside Europe.

Members of Parliament

(DBpedia contains info on birthplace, birth date, schools, careers, residence, family, etc.)
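Combining local data with an external source amounts to a join over the links. The Python sketch below uses hypothetical dictionaries in place of the MEP-to-DBpedia links and of birthplace facts fetched from DBpedia. It also reflects an open-world caveat: an MEP without a known birthplace is left out of the answer rather than assumed to be born in Europe.

```python
# Hypothetical data standing in for MEP-to-DBpedia links and DBpedia facts
MEP_LINKS = {
    "lp:EUmember_A": "dbr:Person_A",
    "lp:EUmember_B": "dbr:Person_B",
    "lp:EUmember_C": "dbr:Person_C",
}
BIRTHPLACE = {"dbr:Person_A": "dbr:Lyon", "dbr:Person_B": "dbr:Nairobi"}
CONTINENT = {"dbr:Lyon": "Europe", "dbr:Nairobi": "Africa"}


def born_outside_europe(links, birthplace, continent):
    """Join local MEPs with external facts; skip MEPs whose birthplace is unknown."""
    result = []
    for mep, person in sorted(links.items()):
        place = birthplace.get(person)
        if place is not None and continent.get(place) not in (None, "Europe"):
            result.append(mep)
    return result


print(born_outside_europe(MEP_LINKS, BIRTHPLACE, CONTINENT))  # ['lp:EUmember_B']
```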

Intermezzo: one-question quiz on reasoning on the Web of Data

Question: What can we conclude from this graph?
A. Stihler is a member of exactly 3 parties
B. Stihler is a member of at least 3 parties
C. Stihler is a member of at most 3 parties
D. None of the above
E. All of the above
F. Other, namely ….

[Figure: http://purl.org/linkedpolitics/EUmember_4545 (foaf:name "Catherine Stihler") has three :memberOf links, to http://purl.org/linkedpolitics/EUParty/PES, http://dbpedia.org/resource/Party_of_European_Socialists and http://dbpedia.org/resource/Progressive_Alliance_of_Socialists_and_Democrats]

Results

• An open data set of EU parliamentary debates,

• with links to other sources on the Web of Data

• accessible through a SPARQL endpoint

Reflection: to what extent can we now answer these questions?

How did the debate about the financial crisis in Greece develop?

Which political event has attracted most media attention?

What are the differences between different media?

Has the coverage changed over time?


We can, but:
• What is the influence of the selection of newspapers available at the National Library?
• What was the quality of the digitisation process (OCR)?
• How good is our linking approach (based on automatically detected entities and topics)?
• How much can we trust the quality of external sources?

➡ How to handle these uncertainties is one of our research questions. We call this Tool Criticism.

Research directions at CWI

Transparent, reproducible analysis of large volumes of connected, heterogeneous, multimodal data.

1. How do we automatically link heterogeneous datasets?

2. How do we interpret links between datasets of different quality and certainty?

3. How do we handle the fact that knowledge evolves?

4. How do we design interfaces that allow scholars to study the datasets

• including the links between them?

• while assessing the reliability of the findings?


Data Science - Big Data - Web of Data

PoliMedia demo: http://polimedia.nl/ PoliMedia project video: https://youtu.be/u24oRCj7xrQ

Talk of Europe project: http://talkofeurope.eu/ Talk of Europe data: purl.org/linkedpolitics Talk of Europe project video: https://youtu.be/GxA53gkCe0o

My website: http://homepages.cwi.nl/~hollink/

A. van Aggelen, L. Hollink, M. Kemman, M. Kleppe & H. Beunders. The debates of the European Parliament as Linked Open Data. Semantic Web Journal. In press, 2016.

M. Kleppe, L. Hollink, J. Oomen, M. Kemman, D. Juric, J. Blom, H. Beunders. PoliMedia - Improving the Analyses of Radio & Newspaper coverage of Political Debates. First prize winner of the LinkedUp Veni Competition, presented at the Open Knowledge Conference (OKCon), Geneva, September 2013.

I’d be happy to answer any questions!