Semantic Retrieval of the TIB|AV-Portal

Dr. Sven Strobel IATUL 2015 July 9, 2015; Hannover Semantic Retrieval of the TIB|AV-Portal


Page 1: Semantic Retrieval of the TIB|AV-Portal

Dr. Sven Strobel, IATUL 2015

July 9, 2015; Hannover

Semantic Retrieval of the TIB|AV-Portal

Page 2: Semantic Retrieval of the TIB|AV-Portal

Contents

1. TIB|AV-Portal
2. Automatic Video Analysis
3. Named-Entity Recognition
4. Metadata and Retrieval
5. Semantic Retrieval

Page 3: Semantic Retrieval of the TIB|AV-Portal

1. TIB|AV-Portal

av.getinfo.de

Profile

• Free web-based portal for scientific videos from the realms of science & technology

• Automatic video analysis: scene, text, speech and image recognition

Development

• Competence Centre for Non-Textual Materials at TIB in cooperation with Hasso Plattner Institute

• 2011-2014; Launch: April 2014

Target Group

• Teachers, students, researchers

Page 4: Semantic Retrieval of the TIB|AV-Portal

1. TIB|AV-Portal

av.getinfo.de

Content

• 2900 videos / 1900 film credits with external links (June 2015)

• Most videos under open access

Focus of the collection

• Videos from the fields of engineering, architecture, chemistry, informatics, mathematics and physics (TIB core subjects)

• Recordings of lectures and conferences, experiments, interviews, animations, simulations etc.


Page 6: Semantic Retrieval of the TIB|AV-Portal

2. Automatic Video Analysis

• DOI assignment: permanent linking / citability

• Time-related video segments

• Full-text search in the OCR transcript

• Full-text search in the speech transcript

• Search for image motifs

• Named-entity recognition: linking textual metadata (OCR / speech transcripts) with the GND ontology


Page 8: Semantic Retrieval of the TIB|AV-Portal

3. Named-Entity Recognition

Definition

Named-Entity Recognition: linking automatically extracted textual metadata with terms of a knowledge base

Textual Metadata

• OCR transcript

• Speech transcript

Knowledge Base

• 63 000 GND subject headings

• GND subject sections for the 6 TIB core subjects

Page 9: Semantic Retrieval of the TIB|AV-Portal

Textual Metadata: Speech transcript

Page 10: Semantic Retrieval of the TIB|AV-Portal

Knowledge Base


Video segments indexed by GND subject headings

Page 11: Semantic Retrieval of the TIB|AV-Portal

Algorithm of Named-Entity Recognition

[Figure labels: "context", "disambiguate", "GND: Thermodynamik"]

Figure is based on slide 37 from Steinmetz, N.; Sack, H.: Cross-Lingual Semantic Mapping of Authority Files. Presentation held at 'Semantic Web in Libraries 2013'. Hamburg (2013).
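
The disambiguation step named in the figure can be illustrated with a minimal sketch. It assumes a toy knowledge base in which every candidate heading carries a set of context words, and it picks the candidate whose context words overlap most with the words surrounding the term in the transcript. The candidate entries, the context words and the function name `disambiguate` are hypothetical; the slides do not specify how the portal's algorithm actually scores candidates.

```python
# Toy knowledge base: surface term -> candidate headings with context words.
# All entries are illustrative assumptions, not real GND data.
CANDIDATES = {
    "entropie": [
        {"heading": "Thermodynamik",
         "context": {"wärme", "temperatur", "energie", "gas"}},
        {"heading": "Informationstheorie",
         "context": {"bit", "nachricht", "kanal", "code"}},
    ],
}


def disambiguate(term, transcript_tokens):
    """Return the candidate heading whose context overlaps most with the transcript."""
    candidates = CANDIDATES.get(term.lower(), [])
    if not candidates:
        return None  # term is not in the knowledge base
    tokens = {t.lower() for t in transcript_tokens}
    # Score each candidate by the number of shared context words.
    best = max(candidates, key=lambda c: len(c["context"] & tokens))
    return best["heading"]


segment = "Die Entropie hängt von Wärme und Temperatur ab".split()
print(disambiguate("Entropie", segment))
```

With the sample segment, the words "Wärme" and "Temperatur" match the context of the first candidate, so the term is linked to "Thermodynamik".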

Page 12: Semantic Retrieval of the TIB|AV-Portal


Benefits of Named-Entity Recognition

• Fine-grained descriptions of the video segments enable pinpoint segment-based searches within the video content.

• Linking textual metadata with the GND ontology enables a semantic search.


Page 14: Semantic Retrieval of the TIB|AV-Portal

4. Metadata and Retrieval: Taxonomic Schema

Metadata

Manual metadata

- Coarse-grained
- Highly reliable
- Search for 'classical' metadata (title, author...)

Automatic metadata

- Fine-grained
- Less reliable
- OCR transcript: keyword-based full-text search in the written content of the video
- Speech transcript: keyword-based full-text search in the spoken content of the video
- Automatic indexing

Page 15: Semantic Retrieval of the TIB|AV-Portal

4. Metadata and Retrieval: Taxonomic Schema


Page 17: Semantic Retrieval of the TIB|AV-Portal

5. Semantic Retrieval

• Semantic search is based on the TIB|AV-Portal knowledge base. This knowledge base includes, among other things:

  • 63 356 GND subject headings plus synonyms

  • English translations of the GND subject headings from DBpedia, LCSH, MACS and WTI Thesaurus

Page 18: Semantic Retrieval of the TIB|AV-Portal

Textual Query

• When the user enters a search term, all available synonyms and English (or German) translations from the TIB|AV-Portal knowledge base are automatically included in the query.


Page 19: Semantic Retrieval of the TIB|AV-Portal

Textual Query

Example: a search for "Wärmelehre" (German for "theory of heat") also matches:

• "Thermodynamik" (speech transcript)

• "Thermodynamics" (speech transcript)

• "Thermodynamik" (GND term)

• "Thermodynamics" (manual metadata)
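
The query expansion described for textual queries can be sketched as follows. This is a minimal illustration, assuming a knowledge base that groups a preferred term with its synonyms and translations; the data layout, the single sample entry and the function name `expand_query` are assumptions and do not reflect the portal's implementation.

```python
# Toy knowledge base: each concept groups a preferred term with synonyms
# and translations. The entry is illustrative, not actual GND data.
KNOWLEDGE_BASE = [
    {
        "preferred": "Thermodynamik",
        "synonyms": ["Wärmelehre"],
        "translations": ["Thermodynamics"],
    },
]


def expand_query(term):
    """Return the search term plus all labels of every concept it matches."""
    expanded = [term]
    needle = term.casefold()
    for concept in KNOWLEDGE_BASE:
        labels = [concept["preferred"], *concept["synonyms"], *concept["translations"]]
        if needle in (label.casefold() for label in labels):
            for label in labels:
                # Skip labels already present, ignoring case.
                if label.casefold() not in (e.casefold() for e in expanded):
                    expanded.append(label)
    return expanded


print(expand_query("Wärmelehre"))
# ['Wärmelehre', 'Thermodynamik', 'Thermodynamics']
```

The expanded list would then be sent to the full-text index as alternatives, so a video whose transcript only contains "Thermodynamics" is still found for the query "Wärmelehre".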

Page 20: Semantic Retrieval of the TIB|AV-Portal

Semantic faceted search

Refine search results

Facets:

• Subject

• Language

• Author & contributors

• Publisher

• Licence

• Year of Publication

• Person

• Organization

• Image motif

Page 21: Semantic Retrieval of the TIB|AV-Portal

Semantic faceted search

Search index

• Facet terms are terms from GND.

• Search index stores:

  • URI of the GND term

  • ID of the video

  • Position in the video to which that term was assigned

• No keyword-based search but rather an 'entity' search

Search returns videos that contain the selected facet term and highlights the corresponding video segments
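
A search index of this kind can be sketched as a mapping from the URI of a GND term to the videos and positions annotated with it. The sketch below is an illustration only: the class and field names, the use of seconds for positions, and the placeholder URI and video IDs are all assumptions, not the portal's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    video_id: str      # ID of the video
    position_s: float  # position in the video (assumed to be in seconds)


class EntityIndex:
    """Maps the URI of a GND term to the video positions annotated with it."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, gnd_uri, video_id, position_s):
        self._postings[gnd_uri].append(Posting(video_id, position_s))

    def search(self, gnd_uri):
        """Return {video_id: sorted positions} for the selected facet term."""
        hits = defaultdict(list)
        for posting in self._postings.get(gnd_uri, []):
            hits[posting.video_id].append(posting.position_s)
        return {vid: sorted(pos) for vid, pos in hits.items()}


# Placeholder URI and IDs, not real GND or portal identifiers.
THERMO = "https://example.org/gnd/thermodynamik"
index = EntityIndex()
index.add(THERMO, "video-1", 95.0)
index.add(THERMO, "video-1", 12.5)
index.add(THERMO, "video-2", 300.0)
print(index.search(THERMO))
# {'video-1': [12.5, 95.0], 'video-2': [300.0]}
```

Because the lookup key is a URI rather than a string from the transcript, the search matches the entity itself, and the stored positions are what allow the matching segments of each video to be highlighted.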

Page 22: Semantic Retrieval of the TIB|AV-Portal

Semantic faceted search


Example of a search result

Page 23: Semantic Retrieval of the TIB|AV-Portal

Semantic faceted search

Benefits

Improving Recall

• By clicking on a facet, that term, synonyms of that term and translations of that term are included in the query.

Improving Precision

• GND facet terms are disambiguated.

Page 24: Semantic Retrieval of the TIB|AV-Portal

Thank you for your attention!