07 verheul texcavator
Toine Pieters and Jaap Verheul, Utrecht University, The Netherlands
Texcavator: Text Mining Historical Newspapers
Overview
• Translantis research project: concept of reference cultures, digital humanities
• Texcavator tool: requirements, features, configuration
• Texcavator use cases
• Future ambitions: challenges, cultural text mining
KB Big Data Conference 24 March 2015
TRANSLANTIS.NL
Translantis research project
Translantis
Topic: emergence of the United States in public discourse in the Netherlands, 1890-1990
Concept: transnational reference cultures
Method: digital humanities text mining
Translantis.nl
Culture Mining
Culture
• Ideas
• Knowledge
• Practices
Public Sphere
• Public Opinion
• Citizens engaging in enlightened debate
Public Media
• Periodicals
• Radio
• TV
• Internet
Digitized Newspapers
• Sample of 10% of all printed newspapers
Mediation
Texcavator
• Generic tool for cultural text mining and big data research
• Enables scholars to systematically search very large quantities of textual data in a reliable and reproducible way
• Supports exploration and contextualization
• Serves multiple user groups: a wide community of historians using big data, the Translantis team (NWO-funded), and the Asymmetrical Encounters team (HERA-funded)
Features
• Direct access to big data repository
• Integrated text-mining tools: Boolean search, Named Entity Recognition, sentiment mining, stemming
• Real-time visualization of search results: dynamic word clouds (with export of underlying data), timelines (normalized, bursts)
• Input-output storage
• Close and distant reading
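The word-cloud feature with export of underlying data can be illustrated with a minimal sketch. This is not Texcavator's own code: the stopword list, the function names `term_frequencies` and `to_csv`, and the sample sentences are all assumptions made for illustration.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real Dutch list is much longer.
STOPWORDS = {"de", "het", "een", "en", "van", "in"}


def term_frequencies(texts, top_n=10):
    """Count lowercase word tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Letters only, with internal hyphens kept so "coca-cola" stays one token.
        tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


def to_csv(frequencies):
    """Serialize (term, count) pairs as CSV text: the data behind a word cloud."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "count"])
    writer.writerows(frequencies)
    return buffer.getvalue()


# Invented sample sentences, not taken from the newspaper archive.
sample = [
    "Coca-Cola in de winkel en Coca-Cola in het café",
    "Een reclame van Coca-Cola",
]
freqs = term_frequencies(sample)
print(to_csv(freqs))
```

The term weights that drive the size of each word in a cloud are the counts in this table, so exporting them keeps the visualization reproducible.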
Current configuration
• Digitized newspapers (National Library): 9 million pages
• Elasticsearch (500GB): real-time, scalable indexing
• xTAS: eXtensible Text Analysis Suite
• Texcavator interface
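As a rough illustration of how a Boolean search with a date range might be expressed against an Elasticsearch index, the sketch below only builds a request body as a Python dictionary. It uses the Query DSL of recent Elasticsearch versions, which may differ from the version used in 2015. The field names `text` and `date` and the helper `build_query` are assumptions, not Texcavator's actual index mapping or code.

```python
import json


def build_query(expression, start_year, end_year,
                text_field="text", date_field="date"):
    """Build a search body: Boolean expression, date filter, yearly counts.

    text_field and date_field are hypothetical field names.
    """
    return {
        "query": {
            "bool": {
                # query_string accepts operators such as AND, OR, NOT.
                "must": {
                    "query_string": {
                        "query": expression,
                        "default_field": text_field,
                    }
                },
                # Restrict hits to the period under study.
                "filter": {
                    "range": {
                        date_field: {
                            "gte": f"{start_year}-01-01",
                            "lte": f"{end_year}-12-31",
                        }
                    }
                },
            }
        },
        # One bucket per calendar year, the raw material for a timeline.
        "aggs": {
            "per_year": {
                "date_histogram": {
                    "field": date_field,
                    "calendar_interval": "year",
                }
            }
        },
        # Only the aggregation is needed, not the matching documents.
        "size": 0,
    }


body = build_query('"coca-cola" AND amerika', 1890, 1990)
print(json.dumps(body, indent=2))
```

Keeping the query as plain data makes it easy to store alongside the results, which supports the reproducibility goal stated earlier.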
Buffalo Bill
Coca-Cola
Taylorism
Use cases
Records and word cloud
Timeline + cloud of one “burst” (1965)
Normalized timeline
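A normalized timeline divides the hits per year by the total number of documents for that year, so that growth of the archive is not mistaken for growth of interest. A "burst" can then be flagged as a year that stands far above the rest. The sketch below is one simple way to do this, assuming a mean-plus-standard-deviations rule; the slides do not say which burst criterion Texcavator uses, and all counts here are invented.

```python
from statistics import mean, pstdev


def normalize(hits_per_year, totals_per_year):
    """Return hits per 1,000 documents for each year with a known total."""
    return {
        year: 1000 * hits / totals_per_year[year]
        for year, hits in hits_per_year.items()
        if totals_per_year.get(year)
    }


def find_bursts(series, threshold=2.0):
    """Flag years whose value exceeds mean + threshold * standard deviation."""
    values = list(series.values())
    cutoff = mean(values) + threshold * pstdev(values)
    return sorted(year for year, value in series.items() if value > cutoff)


# Invented counts for illustration only; not Texcavator output.
hits = {1960: 10, 1961: 12, 1962: 11, 1963: 9, 1964: 10,
        1965: 60, 1966: 12, 1967: 11, 1968: 10, 1969: 12}
totals = {year: 10000 for year in hits}
totals[1965] = 12000  # a year with more pages scanned

rates = normalize(hits, totals)
print(find_bursts(rates))
```

The threshold is a judgment call: a lower value flags more years, a higher value only the most extreme peaks.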
Access to original
Configuration
Visualizing historical change
Soft drinks
References to Coca-Cola and America in advertisements. Explain the peaks and troughs.
Soft drinks
References to Coca-Cola without America in advertisements. Explain the peak.
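The two soft-drink timelines correspond to two Boolean selections: advertisements that mention Coca-Cola together with America, and advertisements that mention Coca-Cola without it. A minimal sketch of that split follows; the helper names, the list of context terms, and the sample advertisements are invented for illustration.

```python
import re


def mentions(text, term):
    """True if term occurs in text as a whole word, ignoring case."""
    pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def split_by_cooccurrence(ads, term, context_terms):
    """Partition ads mentioning `term` by whether any context term co-occurs."""
    with_context, without_context = [], []
    for ad in ads:
        if not mentions(ad, term):
            continue  # the ad is outside both selections
        if any(mentions(ad, c) for c in context_terms):
            with_context.append(ad)      # term AND context
        else:
            without_context.append(ad)   # term AND NOT context
    return with_context, without_context


# Invented sample advertisements, not archive records.
ads = [
    "Coca-Cola, de drank uit Amerika",
    "Drink Coca-Cola, heerlijk verfrissend",
    "Amerikaansche sigaretten te koop",
]
both, only = split_by_cooccurrence(ads, "coca-cola", ["amerika", "amerikaansche"])
print(len(both), len(only))
```

Counting each group per year and plotting the two series side by side gives the kind of comparison the slides ask the audience to explain.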
Topic modeling and GIS
Taylorism
Voyant word cloud of the “wetenschappelijke bedrijfsleiding” (scientific management) dataset
References over time within the “wetenschappelijke bedrijfsleiding” dataset to “Taylor”, “taylor-stelsel”, “Taylor-systeem”
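Counting references to several spelling variants over time can be sketched with a single regular expression. The pattern below covers the three variants named on the slide and tolerates a stray space after the hyphen, as OCR or line breaks sometimes introduce; the function name and the sample records are invented for illustration.

```python
import re
from collections import Counter

# Matches "Taylor", "taylor-stelsel" and "Taylor-systeem", case-insensitively,
# and also "Taylor- systeem" with a stray space after the hyphen.
PATTERN = re.compile(r"\btaylor(?:-\s?(?:stelsel|systeem))?\b", re.IGNORECASE)


def references_per_year(records):
    """Sum pattern matches per year over (year, text) pairs."""
    counts = Counter()
    for year, text in records:
        counts[year] += len(PATTERN.findall(text))
    return dict(sorted(counts.items()))


# Invented sample records, not taken from the dataset.
records = [
    (1913, "Het Taylor-stelsel en de arbeider"),
    (1913, "Taylor over wetenschappelijke bedrijfsleiding"),
    (1921, "Kritiek op het Taylor- systeem"),
]
timeline = references_per_year(records)
print(timeline)
```

Because the trailing word boundary is required, longer words such as "Taylorisme" are not counted; whether they should be is a research decision rather than a technical one.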
Challenges & Opportunities
Ambitions
Challenges
• Software development: stable version of Texcavator, intuitive interface, additional features
• Technological: processor and server capacity, data exchange and standardization (metatags), OCR
• Scientific: combining close and distant reading, reproducibility
Cultural Text Mining
• Mining of cultural aspects of entities and events: concepts, mentalities, ideas, utopias, etc.
• Mining for meaning: towards digital conceptual history or digital history of mentalities
• Addressing macro-historical questions: trends, patterns, and structures in debates; circulation of knowledge; emergence of transnational reference cultures
Thank you!