From unstructured data to structured journalism
Giuseppe Futia
Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)
April 12, 2016
Master in Giornalismo "Giorgio Bocca" di Torino
Nexa Center for Internet & Society at Politecnico di Torino
Website: http://nexa.polito.it/
Communication Manager: website, social media, mailing list
Research Fellow
GitHub account: https://github.com/giuseppefutia
Start with Why
Presentation of Jonathan Stray (journalist, data scientist)
YouTube Video:
https://www.youtube.com/watch?v=z4wHiv4bs-Y
Who said What?
Best tool for multi-lingual journalists
#newsHack 2016, organized by BBC Connected Studio
Team
• 1 Product manager
• 1 Software engineer
• 2 Researchers
• And journalists…?
New York Times, BBC, Washington Post
Source: Poynter.org
Using "machine learning," technologists at news outlets around the world are helping newsrooms eliminate extra time-consuming tasks and giving humans more time to do what they do best: reporting the news (Poynter.org)
Juicer (BBC News Labs)
Linked Data Cloud
Source: https://en.wikipedia.org/wiki/Linked_data
Knowledge Map (Washington Post)
Panama Papers leak
Source: Wired.com
Panama papers leak
• 11.5 million documents
– 4.8 million emails
– 4 million database entries
– 2 million PDFs
– 1 million images
– 320,000 text documents
• 100 news organisations and 400 journalists
Panama papers processing
• Sort and organise the files
• Index these files
• Extract all of the metadata
• Investigate the data from a big data and analytics perspective
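The first two steps above (sorting and indexing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used on the Panama Papers: the extension-to-category table, the function names and the in-memory index are all hypothetical, and metadata extraction is left out.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to a coarse document category.
CATEGORIES = {
    ".eml": "email",
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".txt": "text",
}


def sort_by_type(filenames):
    """Group file names by category, based only on their extension."""
    groups = defaultdict(list)
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        groups[CATEGORIES.get(ext, "other")].append(name)
    return dict(groups)


def build_index(documents):
    """Build an inverted index: lowercase word -> set of document names."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

For example, after `idx = build_index({"a.txt": "Offshore company in Panama", "b.txt": "Company registry"})`, calling `search(idx, "panama company")` returns only `a.txt`. A real investigation would rely on a dedicated search engine rather than an in-memory dictionary.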
Panama papers result
• The final database: 30 per cent of the original data size
• Extract entities: first names and surnames
• Analytics to find how these names relate to the documents
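As a rough illustration of the last two points, the sketch below extracts candidate person names and records which documents mention them. It rests on a deliberately naive assumption (a name is two consecutive capitalised words) and is not the entity extraction used by the investigation; all function names are hypothetical.

```python
import re
from collections import defaultdict

# Naive assumption: a person name is two consecutive capitalised words.
# This also matches non-names such as "The Company"; real systems use
# trained named-entity recognisers instead of a regular expression.
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def extract_names(text):
    """Return the set of (first name, surname) pairs found in the text."""
    return set(NAME_PATTERN.findall(text))


def names_to_documents(documents):
    """Map each extracted name to the set of document ids mentioning it."""
    mapping = defaultdict(set)
    for doc_id, text in documents.items():
        for name in extract_names(text):
            mapping[name].add(doc_id)
    return mapping


def shared_documents(mapping, name_a, name_b):
    """Return the documents in which both names appear."""
    return mapping.get(name_a, set()) & mapping.get(name_b, set())
```

Given two documents, "Payment from Mario Rossi to Anna Bianchi." and "Mario Rossi signed the contract.", the mapping links Mario Rossi to both and Anna Bianchi to the first only, so `shared_documents` surfaces the document that connects the two people.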
TellMeFirst http://tellmefirst.polito.it
Public Contracts http://public-contracts.nexacenter.org/
Data journalism as a framework
BBC News Labs Project
“To help news organisations curate stories that scale, adapt and connect across platforms and use cases”
Thanks!
GitHub Repository
https://github.com/giuseppefutia/