SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Enabling Exploratory Video Search by Semantic Video Analysis
LWA 2011, Magdeburg, 30 Sep 2011
Dr. Harald Sack, Hasso-Plattner-Institut for IT-Systems Engineering, University of Potsdam
Friday, 30 September 2011

description

Presentation slides from LWA 2011 in Magdeburg, 30 Sep 2011. http://lwa2011.cs.uni-magdeburg.de/

Transcript of SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Page 1: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Enabling Exploratory Video Search by Semantic Video Analysis

LWA 2011, Magdeburg, 30 Sep 2011

Dr. Harald Sack, Hasso-Plattner-Institut for IT-Systems Engineering

University of Potsdam

Page 2: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, LWA 2011, Magdeburg, 30. Sep. 2011

■ HPI was founded in October 1998 as a Public-Private Partnership

■ HPI research and teaching is focused on IT Systems Engineering

■ 10 professors and 100 scientific coworkers
■ 450 Bachelor / Master students
■ HPI is winner of the CHE Ranking 2010

http://hpi.uni-potsdam.de/

Page 3: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

■ Research Topics
□ Semantic Web Technologies
□ Ontological Engineering
□ Information Retrieval
□ Multimedia Analysis & Retrieval
□ Social Networking
□ Data/Information Visualization

■ Research Projects

Semantic Technologies & Multimedia Retrieval


Page 5: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Overview
(1) Searching Audiovisual Data
(2) Semantic Multimedia Analysis
(3) Explorative Semantic Search
(4) SeMEX - Semantic Multimedia Explorer

SEMEX - Enabling Exploratory Video Search by Semantic Video Analysis, LWA 2011, Magdeburg, 30. Sep 2011

Page 6: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

The Google Challenge...

Page 7: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, Workshop ,Corporate Semantic Web‘, XInnovations 2011, Berlin, 19. Sep. 2011

Google Multimedia Search

Page 8: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

How does Google find Multimedia?


Page 12: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


...
<a href="/mission_pages/shuttle/shuttlemissions/sts134/multimedia/index.html">
  <IMG WIDTH="100" ALT="Close-up view of Endeavour's crew cabin prior to docking with the International Space Station" TITLE="Close-up view of Endeavour's crew cabin prior to docking with the International Space Station" SRC="/images/content/549665main_2011-05-18_1600_100-75.jpg" HEIGHT="75" ALIGN="Bottom" BORDER="0" />
</a>
<p><a href="/mission_pages/shuttle/shuttlemissions/sts134/multimedia/index.html">&rsaquo;&nbsp;STS-134 Multimedia</a></p>
...

‣Google Multimedia Search relies on link context

How does Google find Multimedia?


Page 13: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


How to Search in Multimedia Archives?


Page 14: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Step 1: Digitization of analog data

Step 2: Annotation with (text-based) metadata

How to Search in Multimedia Archives?

Page 15: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

How to Search in Multimedia Archives?
• manual annotation with text-based descriptive metadata

Page 16: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

How to Search in Multimedia Archives?
• manual annotation with text-based descriptive metadata

...how to extract metadata in an automated way?

Page 17: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Overview
(1) Searching Audiovisual Data
(2) Semantic Multimedia Analysis
(3) Explorative Semantic Search
(4) SeMEX - Semantic Multimedia Explorer

SEMEX - Enabling Exploratory Video Search by Semantic Video Analysis, LWA 2011, Magdeburg, 30. Sep 2011

Page 18: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Automated Audiovisual Analysis


Page 26: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Automated Audiovisual Analysis

(annotated video frame)
• Face Detection
• Overlay Text
• Scene Text
• Logo Detection
• Genre Analysis — Classification: Studio / Indoor / News Show
• Audio Mining — structural analysis, Automated Speech Recognition, speaker identification

Page 27: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Automated Audiovisual Analysis

• Visual Analysis
  • Structural Analysis
  • Intelligent Character Recognition (ICR)
    • Character/Logo Detection
    • Character Filtering
    • Character Recognition
  • Genre Analysis & Categorization
  • Face / Body / Object
    • Detection
    • Tracking
    • Clustering

• Audio Analysis
  • Structural Analysis
  • Speaker Detection
  • Automated Speech Recognition (ASR)


Page 29: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

video

• Decomposition of time-based media into meaningful media fragments of coherent content that can be used as basic elements for indexing and classification


Page 33: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

video

• Decomposition of time-based media into meaningful media fragments of coherent content that can be used as basic elements for indexing and classification

scenes → shots → subshots → frames / key frames

Page 34: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of
    • Hard Cuts
    • Defects, e.g., Drop Outs, White Outs, etc.
    • Soft Cuts, e.g., Fade-In/Out, Dissolve, Wipe, Cross-Fade, etc.
  • Automated Structural Analysis based on
    • Analytical Shot Boundary Detection
    • Machine-Learning-Based Shot Detection

Page 35: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of Hard Cuts based on
    • Luminance/Chrominance Histogram Differences & Derivatives
    • Edge Distribution/Density

(figure: frame strip, frames 573-578)

Page 36: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

Hard cut criterion: a hard cut at frame i is detected iff, for all subregions a,

D_a(i, i-1) > th_a(i)   and   D_a(i+1, i) < th_a(i)

with the adaptive threshold

th_a(i) = \alpha \cdot \left[ \left( \sum_{k=i-W}^{i+W-1} D_a(k, k-1) \right) - D_a(i, i-1) \right] + \beta

where
• D_a(i, i-1) ... histogram difference (L2 norm) between frames i and i-1 of subregion a
• th_a(i) ... adaptive threshold for frame i of subregion a
• window size 4 (W = 2); each frame is decomposed into a = 4 subregions
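The criterion above can be sketched in a few lines of Python; the histogram input and the values of alpha and beta are illustrative placeholders, not the parameters used in the talk:

```python
import numpy as np

def hard_cuts(hists, W=2, alpha=1.0, beta=0.05):
    """Detect hard cuts from per-frame, per-subregion histograms.

    hists: array of shape (frames, subregions, bins), e.g. luminance
    histograms of the a=4 subregions; alpha/beta are illustrative.
    """
    # D[i, a] = L2 histogram difference between frames i and i-1 in subregion a
    D = np.linalg.norm(hists[1:] - hists[:-1], axis=2)
    D = np.vstack([np.zeros((1, hists.shape[1])), D])  # pad so D[i] refers to (i, i-1)

    cuts = []
    for i in range(W, len(hists) - W - 1):
        window = D[i - W : i + W].sum(axis=0)   # sum_{k=i-W}^{i+W-1} D_a(k, k-1)
        th = alpha * (window - D[i]) + beta     # adaptive threshold th_a(i)
        if np.all(D[i] > th) and np.all(D[i + 1] < th):
            cuts.append(i)                      # cut at frame i (change from frame i-1)
    return cuts
```

Both conditions must hold in every subregion, which suppresses false positives from localized motion in a single subregion.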


Page 37: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of Defects, e.g., Drop Outs / White Outs

Drop Out → Histogram/Chrominance Difference Analysis
Flashlight / White Out → Histogram/Chrominance Difference Analysis

Page 38: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of Defects, e.g., Drop Outs / White Outs
  • Luminance/Chrominance Histogram Differences & Derivatives

Page 39: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of Soft Cuts, e.g., Fade Out / Fade In

(figure: frame strips showing a Fade Out and a Fade In)

Page 40: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of Soft Cuts, e.g., Fade Out / Fade In
  • Features applied for machine learning:
    • luminance histogram (Fade In / Fade Out)
      • luminance average Yμ and luminance variance Yσ² follow distinct patterns
    • image decomposition
      • component-based analysis to distinguish regional and global changes in image content
    • entropy
    • motion vectors
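As a rough illustration of the Yμ/Yσ² pattern: during a fade to black, the per-frame luminance variance shrinks monotonically towards zero while the mean decreases. A minimal sketch, with a made-up tolerance rather than a trained classifier:

```python
import numpy as np

def is_fade_out(frames, eps=1e-3):
    """Heuristic fade-out test on a window of grayscale frames.

    During a fade to black, luminance mean and variance both decrease
    monotonically and the variance ends near zero; eps is illustrative.
    """
    means = np.array([f.mean() for f in frames])
    varis = np.array([f.var() for f in frames])
    decreasing = np.all(np.diff(means) <= eps) and np.all(np.diff(varis) <= eps)
    return bool(decreasing and varis[-1] < eps)
```

In the machine-learning setting described on the slide, such statistics would be features fed to a classifier rather than hard-coded thresholds.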


Page 41: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of Soft Cuts, e.g., Fade Out / Fade In
  • Features deployed for machine learning:
    • luminance/chrominance histogram
    • entropy
    • motion vectors


Page 43: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Structural Analysis

• Shot Boundary Detection
  • Automated Identification of Soft Cuts, e.g., Fade Out / Fade In
  • Features deployed for machine learning:
    • luminance/chrominance histogram
    • entropy
    • motion vectors
    • image decomposition
      • compute average motion vectors for all areas
      • identify camera movements (zoom, pan, etc.) and moving objects


Page 45: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

• Preprocessing
  • Character Identification
  • Text Preprocessing
    • Text Filtering
    • Adaption of script geometry (Deskew)
    • Image quality enhancement
• Optical Character Recognition (OCR)
  • Standard OCR software (OCRopus)
• Postprocessing
  • Lexical analysis
  • Statistical / context-based filtering

Example overlay text: "Ermittlungen nach Bombenfunden" (investigations following bomb discoveries)

Page 46: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

• Character Identification
  • Robust filter to extract text candidate frames
  • 25 fps results in 90,000 frames per 60 min
    • too expensive for single-frame preprocessing & OCR
    • fast and robust text identification for preprocessing

• Features used for text identification:
  • edge detection
    • DCT / Fourier Transformation
    • Sobel / Canny Edge Filter
    • horizontal and vertical edge distribution
  • Local Binary Patterns (LBP)
  • Histogram of Oriented Gradients
  • stroke width analysis

Page 47: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

• Stroke Width Transformation
  • based on edge filtering as a preprocessing step
  • for each edge pixel a stroke is projected along its gradient direction until another edge pixel is hit
  • all pixels along the stroke receive the same stroke width value (color)

Page 48: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

• Stroke Width Transformation
  • based on edge filtering as a preprocessing step
  • for each edge pixel a stroke is projected along its gradient direction until another edge pixel is hit
  • all pixels along the stroke receive the same stroke width value (color)
  • connected component analysis groups pixels with similar stroke width values
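The projection step can be sketched as follows. This is a simplified version: the full Stroke Width Transform also checks that the gradient at the opposite edge is roughly antiparallel, which is omitted here, and the parameter names are illustrative:

```python
import numpy as np

def stroke_width_transform(edges, grad_x, grad_y, max_width=30):
    """Simplified Stroke Width Transform sketch.

    edges: boolean edge map (e.g. from a Canny filter); grad_x/grad_y:
    gradient components per pixel. From every edge pixel a ray is cast
    along the gradient until another edge pixel is hit; every pixel on
    the ray keeps the smallest stroke width (ray length) seen so far.
    """
    h, w = edges.shape
    swt = np.full((h, w), np.inf)
    for y, x in zip(*np.nonzero(edges)):
        gx, gy = grad_x[y, x], grad_y[y, x]
        norm = np.hypot(gx, gy)
        if norm == 0:
            continue
        dx, dy = gx / norm, gy / norm
        ray = [(y, x)]
        for step in range(1, max_width):
            cx, cy = int(round(x + dx * step)), int(round(y + dy * step))
            if not (0 <= cx < w and 0 <= cy < h):
                break
            ray.append((cy, cx))
            if edges[cy, cx]:                       # opposite edge reached
                width = np.hypot(cx - x, cy - y)
                for ry, rx in ray:
                    swt[ry, rx] = min(swt[ry, rx], width)
                break
    return swt
```

Pixels belonging to the same character end up with similar stroke width values, which is what the connected component grouping on this slide exploits.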



Page 50: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

• Preprocessing
  • Text Preprocessing
    • Text Filtering

(figure: Original Image → Bounding Box)


Page 51: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

Advanced Image Enhancement

• Preprocessing
  • Text Preprocessing
    • Quality Enhancement


Page 52: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

Standard OCR (OCRopus)

• Optical Character Recognition (OCR)
  • Standard OCR software (OCRopus)

Page 53: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Intelligent Character Recognition

Context-based Spell Correction

• Postprocessing
  • Lexical analysis
  • Statistical / context-based filtering
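The idea behind context-based spell correction of OCR output can be sketched with a toy lexicon; the words and their co-occurrence sets below are invented for illustration:

```python
from difflib import get_close_matches

# Invented lexicon with co-occurring context words per entry; in practice
# this would be derived from a corpus or domain dictionary.
LEXICON = {
    "Ermittlungen": {"Polizei", "Bombenfunden"},
    "Ermahnungen": {"Lehrer"},
    "Bombenfunden": {"Ermittlungen", "Polizei"},
}

def correct(token, context, cutoff=0.6):
    """Pick the lexicon word closest to the OCR token (lexical analysis);
    break ties by overlap with the surrounding context words."""
    candidates = get_close_matches(token, LEXICON, n=3, cutoff=cutoff)
    if not candidates:
        return token        # nothing close enough: keep the OCR output
    return max(candidates, key=lambda c: len(LEXICON[c] & set(context)))
```

For example, the classic "rn"-for-"m" OCR confusion in "Errnittlungen" is resolved to "Ermittlungen" because that candidate both matches lexically and co-occurs with the context word "Bombenfunden".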


Page 54: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


• Result: Multimedia data with spatiotemporal Annotations

Metadata Extraction (Automated Audiovisual Analysis)

Metadata (e.g., MPEG-7):

...
<Video>
  <TemporalDecomposition>
    <VideoSegment>
      <TextAnnotation>
        <KeywordAnnotation>
          <Keyword>Astronaut</Keyword>
        </KeywordAnnotation>
      </TextAnnotation>
      <MediaTime>
        <MediaTimePoint>T00:05:05:0F25</MediaTimePoint>
        <MediaDuration>PT00H00M31S0N25F</MediaDuration>
      </MediaTime>
      ...
    </VideoSegment>
  </TemporalDecomposition>
</Video>
...


Page 55: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Metadata Extraction (Automated Audiovisual Analysis)

• Result: Multimedia data with spatiotemporal annotations

Metadata (e.g., MPEG-7):

...
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords>480 150 620 480</Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
  ...
</SpatialDecomposition>
...


Page 56: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


But what about semantic metadata..?



Page 57: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

• MPEG-7 has been re-engineered to become an OWL-DL ontology (2007: Arndt et al., COMM model)


Multimedia Ontologies

• Localize a region → Draw a bounding box

• Annotate the content → Interpret the content → Tag 'Astronaut'

Page 58: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Multimedia Ontologies
Example: Tagging with an MPEG-7 Ontology

(RDF graph figure: an mpeg7:image depicts "Man on the Moon"; its mpeg7:spatial_decomposition yields Reg1, of rdf:type mpeg7:StillRegion, which mpeg7:depicts dbpedia:Astronaut and carries an mpeg7:SpatialMask with mpeg7:polygon / mpeg7:Coords)

Page 59: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Named Entity Recognition

(entity graph figure: "is a" relations connecting the entity Neil Armstrong to the classes Astronaut, Person, Occupation, Science, and Employment — Entities vs. Classes)

Named Entity Recognition: "locating and classifying atomic elements ... into predefined categories such as names, persons, organizations, locations, expressions of time, quantities, monetary values, etc." — C. J. van Rijsbergen, Information Retrieval (1979)



Page 62: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Semantic Multimedia Analysis

Video Analysis / Metadata Extraction
(figure: timeline with metadata attached to video segments)

Entity Recognition / Mapping
• e.g., person xy, location yz, event abc
• e.g., bibliographical data, geographical data, encyclopedic data, ..

Page 63: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Semantic Multimedia Analysis

Named Entity Recognition
• Mapping keyterms (text) to semantic entities
• Context Analysis and Disambiguation


Page 65: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Semantic Multimedia Analysis

Named Entity Recognition
• Mapping keyterms (text) to semantic entities
• Context Analysis and Disambiguation

Keyterm / User Tag: Jaguar

Candidate Semantic Entities:
• Jaguar (Car)?
• Jaguar (Cat)?
• Jaguar (OS)?
• Jaguar (Aircraft)?
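A toy version of this context-based choice between the Jaguar candidates; the candidate term sets are invented for illustration (a real system would derive them from Wikipedia article texts and page link graphs):

```python
# Invented bag-of-words descriptions per candidate entity; a real system
# would build these from Wikipedia/DBpedia article texts and link graphs.
CANDIDATES = {
    "Jaguar (Car)": {"engine", "british", "luxury", "vehicle"},
    "Jaguar (Cat)": {"predator", "rainforest", "animal", "america"},
    "Jaguar (OS)": {"apple", "mac", "operating", "system"},
    "Jaguar (Aircraft)": {"jet", "attack", "aircraft", "fighter"},
}

def disambiguate(context_tags):
    """Pick the candidate whose description overlaps most with the tags
    co-occurring with the ambiguous keyterm."""
    return max(CANDIDATES, key=lambda name: len(CANDIDATES[name] & context_tags))
```

With co-occurring tags like "rainforest" and "animal", the cat wins; with "apple" and "mac", the operating system does.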


Page 66: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


RDF graph to find relations between entities co-occurring in a text, maintaining the hypothesis that disambiguation of co-occurring elements in a text can be obtained by finding connected elements in an RDF graph [7]. In order to regard the special compilation of non-textual data, static and user-generated metadata in audio-visual content, our novel approach combines the use of semantic technologies and Linked Data with linguistic methods.

III. METHOD

According to a study about the structure and characteristics of folksonomy tags [8], an average of 83% of user-generated tags are single terms. Also, an average of 82% of the reviewed tags are nouns. Based on these study results, we ignore tag practices such as camel case ("barackObama") and treat tags as subjects or categories describing a resource. As a tag could also be part of a group of nouns representing an entity or a name ("flying machine", "albert einstein"), the tags, stored as single words without any given order, have to be combined into term groups of two or more terms to find all appropriate entities. Hence, every tag or group of tags within a given context may represent a distinct entity. The term combination process and subsequent mapping of terms and term groups to entities are described in sect. III-B.

To disambiguate ambiguous terms we combine two methods: a co-occurrence analysis of the terms of the context in Wikipedia articles and an analysis of the page link graph of the Wikipedia articles of the entity candidates. The scores of both analysis steps are combined into a total score.

A. Context Definition

Metadata exists in a certain context and has to be interpreted according to this context. For tags of audio-visual content we identified two dimensions:

• temporal dimension
• user-centered dimension

In the temporal dimension a context can be defined as the entire video, a segment, or a single timestamp in the video. The user-centered dimension classifies a context by how many users created the concerning metadata - only tags by a certain user, or all tags regardless of user. Fig. 1 shows the combinations of the two dimensions of context for metadata in audio-visual content and the interpretation regarding the significance of a context.

Audio-visual content also provides the opportunity to supply spatial information. Thus, tags in the same region of a video frame are considered related to each other. In the current approach we did not consider this context dimension.

To describe our approach we use a sample context of our test set (see sect. IV). This sample context is composed of tags by only one user at a certain timestamp in the video. The video containing this sample context is a presentation by Dr. Garik Israelian at the TED conference[3] entitled "How spectroscopy could reveal alien life"[4]. Our sample context consists of the tags "hubble", "spitzer", "carbon", "dioxide", "methan", "co2", and "water".

Figure 1. Dimensions of context definition in audio-visual content

B. Preprocessing

Term Combination: Our combination algorithm takes all tags of a specified spatio-temporal context (at a certain timestamp / in a certain segment of a video, of a single URL/image) and generates every possible combination of at most three terms of the context in every possible order. In that way we make sure to rectify groups of single terms that belong together. We chose to generate combinations of three words to make sure to also hit named entities consisting of more than two words, such as "public key cryptography" or "alberto santos dumont". About 90% of the DBpedia [9] labels consist of at most three words, but less than 5% consist of 4 words. Due to these numbers and performance issues we decided to limit the number of terms to be combined to three. Subsequently in this paper, by terms we will refer to single terms as well as generated term groups. The number c of combinations is calculated by c = \sum_{k=1}^{j} \frac{n!}{(n-k)!}.

For our sample context containing 7 tags and at most 3 terms in a combination (j = 3), c = 7 + 42 + 210 = 259 combinations are generated.
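The combination step can be sketched as follows; the function name and the joining of tags with spaces are our assumptions, while the limit j = 3 and the sample tags come from the text:

```python
from itertools import permutations

def combine_terms(tags, max_len=3):
    """Generate every ordered combination of 1..max_len distinct tags,
    each joined into one candidate term (order matters, no repetition)."""
    terms = []
    for k in range(1, max_len + 1):
        for combo in permutations(tags, k):
            terms.append(" ".join(combo))
    return terms

tags = ["hubble", "spitzer", "carbon", "dioxide", "methan", "co2", "water"]
candidates = combine_terms(tags)
# c = 7!/6! + 7!/5! + 7!/4! = 7 + 42 + 210 = 259
print(len(candidates))  # → 259
```

Note that ordered combinations are needed because tags are stored without any given order, so both "carbon dioxide" and "dioxide carbon" must be tried against the label index.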

Term Mapping: The terms then have to be mapped to semantic entities. For our approach we use entities of the Linked Open Data Cloud [10], in particular of DBpedia, version 3.5.1.

DBpedia provides labels for the identification of distinct entities in 92 languages. We use English and German as well as Finnish labels, as we noticed that neither the English nor the German labels contain important acronyms as labels, but the Finnish language version does. As tagging users prefer to keep it simple and short [2], resources dealing with "Domain Name System" would rather be tagged with "DNS" than "Domain Name System".
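A minimal sketch of this label-to-URI matching step; the index below is a toy stand-in, whereas the actual system matches against the full English, German, and Finnish label dumps of DBpedia 3.5.1:

```python
# Toy label index; in the real system this is built from DBpedia label dumps.
LABEL_INDEX = {
    "dns": "http://dbpedia.org/resource/Domain_Name_System",  # acronym, via Finnish labels
    "domain name system": "http://dbpedia.org/resource/Domain_Name_System",
}

def map_term(term):
    """Simple case-insensitive exact string match of a term against known labels."""
    return LABEL_INDEX.get(term.lower())

print(map_term("DNS"))  # → http://dbpedia.org/resource/Domain_Name_System
```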

After simple string matching of the terms of the context to DBpedia URIs, the URIs are revised for redirects and

3 http://www.ted.com
4 http://yovisto.com/play/14415

Context Analysis and Disambiguation
What defines a Context in AV-Data?

• Temporal Coherence
• Spatial Coherence
• Provenance

Semantic Multimedia Analysis

Freitag, 30. September 11

Page 67: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, LWA 2011, Magdeburg, 30. Sep. 2011

RDF graph to find relations between entities co-occurring in a text, maintaining the hypothesis that disambiguation of co-occurring elements in a text can be obtained by finding connected elements in an RDF graph [7]. In order to regard the special compilation of non-textual data, static and user-generated metadata in audio-visual content, our novel approach combines the use of semantic technologies and Linked Data with linguistic methods.

III. METHOD

According to a study about structure and characteristics of folksonomy tags [8], an average of 83% of user-generated tags are single terms. Also, an average of 82% of the reviewed tags are nouns. Based on these study results, we ignore tag practices such as camel case ("barackObama") and treat tags as subjects or categories describing a resource. As a tag could also be part of a group of nouns representing an entity or a name ("flying machine", "albert einstein"), the tags, stored as single words without any given order, have to be combined into term groups of two or more terms to find all appropriate entities. Hence, every tag or group of tags within a given context may represent a distinct entity. The term combination process and subsequent mapping of terms and term groups to entities are described in sect. III-B.

To disambiguate ambiguous terms we combine two methods: a co-occurrence analysis of the terms of the context in Wikipedia articles, and an analysis of the page link graph of the Wikipedia articles of entity candidates. The scores of both analysis steps are combined into a total score.
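The score combination can be sketched as below; the text only states that the two scores are combined into a total score, so the linear weighting `alpha`, the function name, and all numeric scores are our illustrative assumptions:

```python
def total_score(cooc_score, linkgraph_score, alpha=0.5):
    """Combine the Wikipedia co-occurrence score and the page-link-graph
    score of an entity candidate; the linear weighting is an assumption."""
    return alpha * cooc_score + (1 - alpha) * linkgraph_score

# Choose the best entity candidate for an ambiguous tag such as "hubble";
# the candidate scores below are made up for illustration.
candidates = {
    "Hubble_Space_Telescope": (0.9, 0.8),
    "Edwin_Hubble": (0.4, 0.3),
}
best = max(candidates, key=lambda uri: total_score(*candidates[uri]))
print(best)  # → Hubble_Space_Telescope
```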

A. Context Definition

Metadata exists in a certain context and has to be interpreted according to this context. For tags of audio-visual content we identified two dimensions:

• temporal dimension
• user-centered dimension
In the temporal dimension a context can be defined as the entire video, a segment, or a single timestamp in the video.



Spatial Dimension


Page 68: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Temporal Dimension

Spatial Dimension


Page 69: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


User-centered Dimension

Temporal Dimension

Spatial Dimension


Page 70: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Semantic Graph Analysis: the keyterm / user tag "jaguar" occurs in a context with the tags "1956", "Steve", "McQueen", "rim", and "wheel". Candidate entities from the LOD Cloud: Jaguar (Car), Jaguar (Cat), Jaguar (OS); in this context, Jaguar (Car) is the best match.


Page 71: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Overview
(1) Searching Audiovisual Data
(2) Semantic Multimedia Analysis
(3) Explorative Semantic Search
(4) SeMEX - Semantic Multimedia Explorer

SEMEX - Enabling Exploratory Video Search by Semantic Video Analysis
LWA 2011, Magdeburg, 30. Sep 2011


Page 72: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, LDW 2011, Magdeburg, 30. Sep. 2011

Searching is not always just searching


Page 73: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


a simple example:

"I'm looking for a book by Ernest Hemingway with the title 'For Whom the Bell Tolls' in the first German edition..."


Page 74: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Wem die Stunde schlägt. - Ernest H E M I N G W A Y. (Stockholm usw., Bermann-Fischer Verlag, 1941) 560 S. 8“

II 1, 2506, 34548



Page 75: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


...but what if...

I really liked the book 'For Whom the Bell Tolls' but I have no idea what I should read next...


Page 76: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis




Page 77: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Exploratory Search
• What if the user does not know which query string to use?
• What if the user is looking for complex answers?
• What if the user does not know the domain he/she is looking for?
• What if the user wants to know all(!) about a specific topic?

• ...'Browsing' instead of 'Searching'
• ...to find something by chance -> Serendipity
• ...to get an overview
• ...enable content-based navigation


Page 78: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Video Analysis / Metadata Extraction: metadata is extracted along the video timeline (time-aligned metadata per segment).
Entity Recognition / Mapping: extracted metadata (e.g., person xy, location yz, event abc) is mapped to entities enriched with, e.g., bibliographical data, geographical data, encyclopedic data, ...
Exploratory Multimedia Search


Page 79: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

http://linkeddata.org/

Data is a precious thing and will last longer than the systems themselves. (Tim Berners-Lee)

The Web of Data - The Semantic Web


Page 80: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


dbpedia:For_Whom_the_Bell_Tolls

What facts for dbpedia:For_Whom_the_Bell_Tolls are relevant?

http://dbpedia.org/page/For_Whom_the_Bell_Tolls

DBpedia - the Semantic Wikipedia

...use heuristics

Page 81: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Exploratory Multimedia Search

dbpedia:For_Whom_the_Bell_Tolls --dbpedia-owl:author--> dbpedia:Ernest_Hemingway



Page 88: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Exploratory Multimedia Search


dbpedia:For_Whom_the_Bell_Tolls --dbpedia-owl:author--> dbpedia:Ernest_Hemingway
dbpedia:Raymond_Carver --dbpedia-owl:influenced_by--> dbpedia:Ernest_Hemingway
dbpedia:Jack_Kerouac --dbpedia-owl:influenced_by--> dbpedia:Ernest_Hemingway
dbpedia:Jerome_D._Salinger --dbpedia-owl:influenced_by--> dbpedia:Ernest_Hemingway


Page 89: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Exploratory Multimedia Search


dbpedia:Jack_Kerouac   dbpedia:Raymond_Carver   dbpedia:Jerome_D._Salinger



Page 92: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis

Exploratory Multimedia Search


dbpedia:Jack_Kerouac   dbpedia:Raymond_Carver   dbpedia:Jerome_D._Salinger

dbpedia-owl:notableWork (one edge from each author to one of his notable works)
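The exploration path built up on these slides can be sketched over a small in-memory triple set; a real system would query the DBpedia SPARQL endpoint instead, and the concrete notable works are our illustrative fill-ins (the slides show them only as images):

```python
# Triples mirroring the slides: (subject, predicate, object).
TRIPLES = {
    ("For_Whom_the_Bell_Tolls", "author", "Ernest_Hemingway"),
    ("Raymond_Carver", "influenced_by", "Ernest_Hemingway"),
    ("Jack_Kerouac", "influenced_by", "Ernest_Hemingway"),
    ("Jerome_D._Salinger", "influenced_by", "Ernest_Hemingway"),
    ("Jack_Kerouac", "notableWork", "On_the_Road"),            # illustrative
    ("Jerome_D._Salinger", "notableWork", "The_Catcher_in_the_Rye"),  # illustrative
}

def subjects(pred, obj):
    """All subjects s such that (s, pred, obj) is in the triple set."""
    return {s for (s, p, o) in TRIPLES if p == pred and o == obj}

# "I really liked 'For Whom the Bell Tolls' -- what should I read next?"
author = next(o for (s, p, o) in TRIPLES
              if s == "For_Whom_the_Bell_Tolls" and p == "author")
influenced = subjects("influenced_by", author)
recommendations = {o for s in influenced
                     for (s2, p, o) in TRIPLES
                     if s2 == s and p == "notableWork"}
print(sorted(influenced))
print(sorted(recommendations))
```

This is exactly the serendipity effect of exploratory search: starting from a single known entity, graph traversal surfaces related works the user never queried for.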


Page 93: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Overview
(1) Searching Audiovisual Data
(2) Semantic Multimedia Analysis
(3) Explorative Semantic Search
(4) SeMEX - Semantic Multimedia Explorer

SEMEX - Enabling Exploratory Video Search by Semantic Video Analysis
LWA 2011, Magdeburg, 30. Sep 2011


Page 95: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


http://mediaglobe.yovisto.com:8080


Page 96: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Overview
(1) Searching Audiovisual Data
(2) Semantic Multimedia Analysis
(3) Explorative Semantic Search
(4) SeMEX - Semantic Multimedia Explorer

SEMEX - Enabling Exploratory Video Search by Semantic Video Analysis
LWA 2011, Magdeburg, 30. Sep 2011


Page 97: SEMEX: Enabling Exploratory Video Search by Semantic Video Analysis


Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html
http://www.yovisto.com/
Blog: http://moresemantic.blogspot.com/
E-Mail: [email protected], [email protected]
Twitter: lysander07 / biblionomicon / yovisto

Thank you very much for your attention!
