All WP Meeting Athens - Preliminary Results of the Contextualisation - Klaus Thoden
co-funded by the European Union
Preliminary Results from the Contextualization
Dominique Ritze, Klaus Thoden
27.11.13
Contextualization in Year 1
• Baseline
• Identification of global identifiers
• Authority and type of identity
• BBAW, SBB, NLI, UB Frankfurt, MPIWG, ÖNB:
– mostly contextualization of persons and corporate bodies
• What more can we do?
Sources
• GND/VIAF – Persons, corporations, titles
• LCSH, DDC – Subject headings
• Wikipedia/DBpedia – Everything
• Geonames – Places
• InPho – Argumentation structure
• ISIL – Libraries
• CERL – Historical places, printers
Wikipedia
<subfield code="a">Aus der Bibliothek des Prinzen Eugen von Savoyen</subfield>
Wikipedia/DBpedia
DDC
The 1914 - 1918 Collection of the American Jewish Joint Distribution Committee is comprised of the records of the New York headquarters for the period from the Joint's origins providing emergency relief through World War I.
DDC
Workflow
• Ingestion through Omnom
• Contextualization in DM2E Triplestore
• Common input vocabulary – but not really consistent
• Saved as independent triples – no change of original data
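The last point can be illustrated with a minimal sketch: contextualization links live in their own set of triples, and the provider's data is never modified. The URIs, the use of owl:sameAs as the linking predicate, and the helper names `make_link` and `merged_view` are assumptions made for illustration; the slides do not say how the DM2E triplestore represents the links.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Assumed linking predicate; the slides do not name the property used.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical provider data, kept immutable so it cannot be changed.
ORIGINAL = frozenset({
    Triple("http://example.org/item/1", DC_CREATOR, "http://example.org/agent/1"),
})


def make_link(local_uri: str, authority_uri: str) -> Triple:
    """Create an independent link triple; the original data is not touched."""
    return Triple(local_uri, OWL_SAME_AS, authority_uri)


# Contextualization results are stored separately from the original data.
links = {make_link("http://example.org/agent/1", "http://example.org/authority/42")}


def merged_view(original, extra):
    """Combine both sets for querying without modifying either of them."""
    return set(original) | set(extra)
```

Because the links are a separate set, they can be dropped or regenerated at any time while the ingested data stays as delivered.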
SILK Demo
• Workbench to create Linkage Rules with a GUI
• Transformations and Normalizations
• Similarity metrics to compare values
• Aggregators to combine various comparisons
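The three building blocks listed above can be sketched in plain Python. This is not Silk's API or rule language. The function names, the difflib-based metric and the threshold of 0.8 are assumptions chosen to show how a transformation, a similarity metric and an aggregator fit together.

```python
import string
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Transformation: lowercase, drop ASCII punctuation, collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(value.lower().translate(table).split())


def string_similarity(a: str, b: str) -> float:
    """Similarity metric in [0, 1], computed on the normalized values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def equality(a: str, b: str) -> float:
    """Strict metric: 1.0 if the normalized values are identical, else 0.0."""
    return 1.0 if normalize(a) == normalize(b) else 0.0


def aggregate_min(scores):
    """Aggregator: every comparison has to score well."""
    return min(scores)


def aggregate_average(scores):
    """Aggregator: comparisons may compensate for each other."""
    return sum(scores) / len(scores)


def is_link(scores, threshold=0.8, aggregate=aggregate_min):
    """Accept a candidate pair if the aggregated score reaches the threshold."""
    return aggregate(scores) >= threshold
```

A rule would then collect one score per compared property, for example `[string_similarity(name_a, name_b), equality(year_a, year_b)]`, and pass the list to `is_link`. The choice of aggregator decides whether one weak comparison is enough to reject the pair.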
Structured Data
Unique Identifier
• project data: a1, GND "118650130"
• GND: a2, GND "118650130"
• Comparison: equals
Structured Data
Datatype Properties
• project data: a1, name "C. Brodley"
• GND: a2, name "Brodley, Carla"
• Comparison: similarity
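The two name forms on this slide differ in word order and abbreviation, so a plain string comparison would score them poorly. One possible approach, sketched here under the assumption that names come either as "Surname, Forename" or as "Forename Surname", is to reduce each name to surname and first initial before comparing. `name_key` and `name_similarity` are hypothetical helpers, not part of Silk.

```python
from difflib import SequenceMatcher


def name_key(name: str) -> tuple[str, str]:
    """Reduce a personal name to (surname, first initial), both lowercased."""
    name = name.strip()
    if "," in name:
        # "Brodley, Carla": the surname comes first
        surname, _, forename = name.partition(",")
    else:
        # "C. Brodley": the last word is taken as the surname
        forename, _, surname = name.rpartition(" ")
    return surname.strip().lower(), forename.strip().lower()[:1]


def name_similarity(a: str, b: str) -> float:
    """Compare surnames; conflicting initials rule the pair out."""
    surname_a, initial_a = name_key(a)
    surname_b, initial_b = name_key(b)
    if initial_a and initial_b and initial_a != initial_b:
        return 0.0
    return SequenceMatcher(None, surname_a, surname_b).ratio()


score = name_similarity("C. Brodley", "Brodley, Carla")
```

With this normalization the two forms from the slide are treated as the same name. The normalization is deliberately lossy, which is why such a score should only propose candidates rather than confirm identity.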
Structured Data
Combination of Datatype Properties
• project data: a1, name "C. Brodley", year "1991"
• GND: a2, name "Brodley, Carla", year "1991"
• Comparison: similarity of name and similarity of year
Structured Data
Excluding Links
• project data: a1, name "C. Brodley"
• GND: a2, name "Brodley, Carla"
• Comparison: similarity of name
• Dates: year of birth "1956", year of death "1820"
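The exclusion idea can be sketched as a veto that overrides a good name score: if one record gives a year of birth that lies after the other record's year of death, the two cannot describe the same person. Which record carries which date is an assumption here, as are the `Person` class, the helper names and the threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    birth: Optional[int] = None
    death: Optional[int] = None


def excluded(a: Person, b: Person) -> bool:
    """True if the known dates rule out that a and b are the same person."""
    if a.birth is not None and b.death is not None and a.birth > b.death:
        return True
    if b.birth is not None and a.death is not None and b.birth > a.death:
        return True
    return False


def accept_link(a: Person, b: Person, name_score: float, threshold: float = 0.8) -> bool:
    """A high name score is not enough; the dates must not contradict each other."""
    return name_score >= threshold and not excluded(a, b)


# Values from the slide; the assignment of dates to records is assumed.
a1 = Person("C. Brodley", birth=1956)
a2 = Person("Brodley, Carla", death=1820)
rejected = not accept_link(a1, a2, name_score=1.0)
```

Missing dates never exclude a pair in this sketch, so sparse metadata leads to more candidate links that have to be checked by hand.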
Limitations
• Needs high computing power
• No on-the-fly change of linkage rules
• Not well-suited for structured data
• Sparse metadata: get information out of transcriptions? Named Entity Recognition?
• Know your data!
• Results have to be checked.