Mediaglobe - Media Analysis Methods

Post on 27-Jun-2015


Dr. Harald Sack

Hasso Plattner Institute for IT Systems Engineering

University of Potsdam

Media Analysis Methods

• HPI was founded in October 1998 as a public-private partnership

• Research and teaching at HPI focus on IT systems engineering

• 10 professors and 100 research staff

• 450 bachelor's and master's students

• HPI is the winner of the CHE Ranking 2010

http://hpi.uni-potsdam.de/

Hasso Plattner Institute for Software Systems Engineering, University of Potsdam

• Research topics:

• Semantic Web Technologies

• Ontological Engineering

• Information Retrieval

• Multimedia Analysis & Retrieval

• Social Networking

• Data/Information Visualization

• Research projects:

Hasso Plattner Institute for Software Systems Engineering

Semantic Technologies & Multimedia Retrieval Research Group

Conceptual Workflow

Semantic Search Engine

Media Analysis
‣ Structural Video Analysis
‣ Intelligent Character Recognition
‣ Face Detection & Clustering
‣ Audio Mining
‣ Visual Concept Detection

Semantic Analysis
‣ Named Entity Recognition
‣ Context Analysis
‣ Semantic Annotation

Graphical User Interface
‣ Faceted Search
‣ Explorative Search
‣ Fine-grained User Annotation

Distribution / Production
‣ Media Asset Management

Digitization | Metadata | Rights

Media Analysis Methods

‣ Structural Video Analysis
‣ Intelligent Character Recognition
‣ Visual Concept Detection
‣ Face Detection & Clustering
‣ Audio Mining

Structural Video Analysis

‣ Decomposition of the AV data into media fragments, taking content coherence into account

[Figure: hierarchy of video, scenes, shots, subshots, and frames, with key frames]

Shot Boundary Detection

‣ Automatic identification of
  ‣ Hard cuts
  ‣ Defects, e.g. drop-outs, white-outs, etc.
  ‣ Soft cuts, e.g. fade-in/out, dissolve, wipe, cross-fade, etc.

‣ Analytical shot boundary detection
  ‣ Based on luminance, chrominance, and edge distribution, combined with further characteristic image features
  ‣ Adaptive threshold computation

‣ Shot boundary detection with machine learning methods
  ‣ Support Vector Machines
  ‣ Random Forests classifier

Shot Boundary Detection

‣ Identification of hard cuts based on
  ‣ Luminance, chrominance, and derivatives
  ‣ Edge distribution, edge density

[Figure: consecutive frames 573 to 578]

Shot Boundary Detection

‣ Hard cut at frame i: both conditions must hold for all subregions a

$$D_a(i, i-1) > th_a(i)$$

$$D_a(i+1, i) < th_a(i)$$

‣ Adaptive threshold

$$th_a(i) = \alpha \cdot \left( \sum_{k=i-W}^{i+W-1} D_a(k, k-1) - D_a(i, i-1) \right) + \beta$$

‣ Window size = 4 (W = 2)
‣ Each frame is decomposed into 4 subregions (a = 1, ..., 4)
‣ D_a(i, i-1): histogram difference (L2 norm) between frames i and i-1 of subregion a
‣ th_a(i): adaptive threshold for frame i of subregion a

[Figure: frame sequence i-3 to i+2, each frame divided into subregions 1 to 4]
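The hard-cut rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold is read as th_a(i) = α · (Σ_{k=i−W}^{i+W−1} D_a(k, k−1) − D_a(i, i−1)) + β, the four subregions are taken to be the image quadrants, and the values of `alpha`, `beta`, and `bins` are placeholders that the slides do not specify.

```python
import numpy as np

def subregion_hist_diffs(frames, bins=16):
    """Return D[a, i]: L2 distance between the normalised grey-level
    histograms of frames i and i-1 in quadrant a (D[:, 0] stays 0)."""
    n = len(frames)
    h, w = frames[0].shape
    # Assumption: the 4 subregions are the four image quadrants.
    quads = [(slice(0, h // 2), slice(0, w // 2)),
             (slice(0, h // 2), slice(w // 2, w)),
             (slice(h // 2, h), slice(0, w // 2)),
             (slice(h // 2, h), slice(w // 2, w))]
    D = np.zeros((4, n))
    for a, (rows, cols) in enumerate(quads):
        hists = []
        for f in frames:
            hist, _ = np.histogram(f[rows, cols], bins=bins, range=(0, 256))
            hists.append(hist / hist.sum())
        hists = np.array(hists)
        D[a, 1:] = np.linalg.norm(hists[1:] - hists[:-1], axis=1)
    return D

def detect_hard_cuts(frames, W=2, alpha=1.0, beta=0.1):
    """Indices i where frame i starts a new shot.
    alpha and beta are illustrative values, not taken from the slides."""
    D = subregion_hist_diffs(frames)
    n = D.shape[1]
    cuts = []
    for i in range(W + 1, n - W):
        window = D[:, i - W:i + W].sum(axis=1)   # k = i-W .. i+W-1
        th = alpha * (window - D[:, i]) + beta   # adaptive threshold per subregion
        if np.all(D[:, i] > th) and np.all(D[:, i + 1] < th):
            cuts.append(i)
    return cuts
```

Because the current difference is subtracted from the window sum, the threshold reflects only the activity in the neighbouring frames, so an isolated peak stands out while sustained motion raises the bar.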

Shot Boundary Detection

‣ Identification and differentiation of defects
  ‣ Drop-out: histogram/chrominance difference analysis
  ‣ Flashlight / white-out: histogram/chrominance difference analysis

[Figure: frame sequence i, i+1, i+8, ..., i+13]


Shot Boundary Detection

‣ Identification of soft cuts, e.g. fade-in/fade-out

‣ Image features used:
  ‣ Luminance histograms
  ‣ Entropy curve
  ‣ Motion vectors



Fig. 1. Workflow of the proposed text detection method. (b) is the vertical edge map of (a). (c) is the vertical dilation map of (b). (d) is the binary map of (c). (e) is the result map of the subsequent connected component analysis. (f) shows the binary map after the adaptive projection profile refinement. (g) is the final detection result.

for text detection of natural scene images. The operator computes for each pixel the width of the most likely stroke containing the pixel. The output of the operator is a stroke-feature map, which has the same size as the input image, while each pixel represents the corresponding stroke width value of the input image.

3. TEXT DETECTION IN VIDEO IMAGES

Text detection is the first task of video OCR. Our approach determines whether a single frame of a video file contains text lines, for which a tight bounding box is returned. In order to manage detected text lines efficiently, we have defined a class "text line object" with the following properties: bounding box location (the top-left corner position) and bounding box size. After the first round of text detection, the refinement and the verification procedures ensure the validity of the detection results in order to reduce false alarms.

3.1. Text detector

Before performing the text detection process, a Gaussian smoothing filter is applied to the images that have an entropy value larger than a predefined threshold T_entr. For our purpose, T_entr = 5.25 has proven to work best.
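A minimal sketch of this entropy-gated smoothing step is shown below. It assumes that the entropy is the Shannon entropy (in bits) of the 8-bit grey-level histogram, which the excerpt does not state, and the kernel parameters `sigma` and `RADIUS` are illustrative choices.

```python
import numpy as np

T_ENTR = 5.25   # threshold quoted in the text
RADIUS = 2      # assumed kernel radius (5-tap kernel)

def image_entropy(gray):
    """Shannon entropy in bits of the grey-level histogram (assumed definition)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def gaussian_kernel(sigma=1.0, radius=RADIUS):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def smooth_if_busy(gray, threshold=T_ENTR, sigma=1.0):
    """Return the frame unchanged if its entropy is at or below the
    threshold, otherwise a Gaussian-smoothed copy."""
    if image_entropy(gray) <= threshold:
        return gray
    k = gaussian_kernel(sigma)
    padded = np.pad(gray.astype(float), RADIUS, mode="edge")
    # Separable filter: convolve every row, then every column.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Only visually busy frames pay the cost of the filter; low-entropy frames pass through untouched.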

We have developed an edge-based text detector, subsequently referred to as the edge text detector. The advantage of our detector is its computational efficiency compared to machine-learning-based approaches, because no computationally expensive training period is required. However, for visually different video sequences a parameter adaptation has to be performed. The best-suited parameter combination for our method was learned from test runs on the given test data.

Fig. 2. Workflow of the proposed adaptive text line refinement procedure

The processing workflow for a single frame is depicted in Fig. 1 (a-e). First, a vertical edge map is produced using a Sobel filter [8] (cf. Fig. 1 (b)). Then, a morphological dilation operation is applied to link the vertical character edges together (cf. Fig. 1 (c)). Let MinW denote the detected minimal text line width. A rectangular kernel of size 1×MinW is defined for the vertical dilation operator. Subsequently, a binary mask is generated using Otsu's thresholding method [9]. Ultimately, we create a binary map after Connected Component
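The first three steps of this workflow (vertical edge map, dilation, Otsu binarization) can be sketched with NumPy alone. This is an illustration, not the authors' implementation: the 1×MinW kernel is interpreted as one row by MinW columns, `min_w=8` is a placeholder value, and the connected component step is left out because the excerpt breaks off there.

```python
import numpy as np

def vertical_edge_map(gray):
    """Absolute response of the horizontal-gradient Sobel kernel,
    which is strong on vertical strokes."""
    g = np.pad(gray.astype(float), 1, mode="edge")
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    return np.abs(gx)

def dilate_rows(img, width):
    """Grey-scale dilation with a 1 x width rectangle (running maximum per row)."""
    left = width // 2
    right = width - 1 - left
    padded = np.pad(img, ((0, 0), (left, right)), mode="constant")
    stacked = np.stack([padded[:, k:k + img.shape[1]] for k in range(width)])
    return stacked.max(axis=0)

def otsu_threshold(values, bins=256):
    """Threshold that maximises the between-class variance of the histogram."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                 # pixels at or below each bin
    w1 = w0[-1] - w0                     # pixels above each bin
    csum = np.cumsum(hist * centers)
    valid = (w0 > 0) & (w1 > 0)
    if not valid.any():                  # constant image: nothing to separate
        return float(np.max(values))
    m0 = csum[valid] / w0[valid]
    m1 = (csum[-1] - csum[valid]) / w1[valid]
    between = w0[valid] * w1[valid] * (m0 - m1) ** 2
    return float(edges[1:][valid][np.argmax(between)])

def text_mask(gray, min_w=8):
    """Binary mask of text candidate regions; min_w is a hypothetical MinW."""
    edges = vertical_edge_map(gray)
    linked = dilate_rows(edges, min_w)
    return linked > otsu_threshold(linked)
```

The dilation width controls how far apart two character strokes may be while still merging into one candidate region, which is why it is tied to the minimal text line width.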

Intelligent Character Recognition

‣ Compared with traditional print OCR, video OCR is a demanding task
  ‣ Heterogeneous/low contrast
  ‣ Poor lighting conditions
  ‣ Distorted and occluded text
  ‣ Compression artifacts
  ‣ etc.

‣ Preprocessing
  ‣ Character identification
  ‣ Text preprocessing
  ‣ Text filtering
  ‣ Adaptation of script geometry (deskew)
  ‣ Image quality enhancement

‣ Optical Character Recognition (OCR)
  ‣ Standard OCR software (OCRopus)

‣ Postprocessing
  ‣ Lexical analysis
  ‣ Statistical / context-based filtering

Intelligent Character Recognition

‣ Character identification
  ‣ Robust filter methods for efficient extraction of text candidates

‣ 25 fps results in 90,000 frames per 60 minutes
  ‣ Too expensive to run complete filtering and OCR on every frame

[Figure: example frame with the text "Rostock" passing through Text Filtering, Image Quality Enhancement, and OCR]

Intelligent Character Recognition

‣ Analytical character identification
  • Edge-based detection
    • DCT / Fourier transformation
    • Sobel / Canny edge filter
    • Histogram of Oriented Gradients
    • Constant gradient variance
  • Texture-based detection
    • Local Binary Patterns
    • Spatial variance
  • Region-based detection
    • Connected component analysis
    • Stroke width analysis

‣ Analytical textbox filtering
  ‣ Horizontal & vertical projection profile
  ‣ Stroke-width-analysis-based verification

[Figure: frame, then frame with candidate textboxes]
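As an illustration of projection-profile filtering, the sketch below splits a binary candidate textbox into runs of rows (or columns) that contain enough foreground pixels. The function name and the `min_fill` ratio are hypothetical; the slides do not describe the actual refinement rule.

```python
import numpy as np

def split_by_profile(mask, axis, min_fill=0.1):
    """Return (start, end) runs along `axis` (0 = rows, 1 = columns) where
    the projection profile exceeds min_fill; `end` is exclusive.
    min_fill is an assumed fill ratio, not a value from the slides."""
    # Fraction of foreground pixels in every row (axis=0) or column (axis=1).
    profile = mask.sum(axis=1 - axis) / mask.shape[1 - axis]
    active = profile > min_fill
    padded = np.concatenate(([False], active, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))
```

Applied along the rows, this separates stacked text lines inside one candidate box; applied along the columns, it trims empty margins to the left and right.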

Intelligent Character Recognition

[Figure: frame with candidate textboxes, then frame with verified textboxes]

Intelligent Character Recognition

‣ Analytical edge-based character identification

Intelligent Character Recognition

‣ Character binarization & normalization

[Figure: original video frames, then textbox quality enhancement, then textbox normalization and binarization]

Intelligent Character Recognition

‣ Standard optical character recognition
  ‣ OCRopus 0.4.4 (open source, Apache License v2.0)
  ‣ Tesseract 3.01 (open source, Apache License v2.0)

Raw OCR results for the quality-enhanced, normalized textboxes:

Ueutsche Bank

Weubrandenburg

Intelligent Character Recognition

‣ OCR post-processing
  ‣ OCR-adapted spell correction (hunspell 1.3.2, open source, GNU LGPL)
  ‣ Context-based spell correction (see context-based Named Entity Recognition, work package AP 4.1.5)

Raw OCR results:

Ueutsche Bank

Weubrandenburg

OCR results after spell correction (OCR-adapted hunspell):

Deutsche Bank

Neubrandenburg
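The effect of lexicon-based spell correction on these raw results can be illustrated with a small stand-in. The sketch uses plain Levenshtein distance against a toy lexicon; it does not call hunspell, and `correct`, `LEXICON`, and `max_dist` are hypothetical names and values chosen for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Toy lexicon for the example; a real system would use a full dictionary.
LEXICON = ["Deutsche", "Bank", "Neubrandenburg", "Rostock"]

def correct(word, lexicon=LEXICON, max_dist=2):
    """Replace word by the closest lexicon entry if it is near enough."""
    if word in lexicon:
        return word
    best = min(lexicon, key=lambda w: edit_distance(word.lower(), w.lower()))
    if edit_distance(word.lower(), best.lower()) <= max_dist:
        return best
    return word

print(" ".join(correct(w) for w in "Ueutsche Bank".split()))
print(correct("Weubrandenburg"))
```

An OCR-adapted variant would presumably weight substitutions between visually similar glyphs (such as D and U) lower than other edits; that weighting is not modelled here.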


Visual Concept Detection

‣ Adaptation of the "bag of words" approach from text retrieval
  ‣ Dictionary / codeword vocabulary
  ‣ Sentences are represented as vectors over the dictionary

‣ Discretization of a single frame using the codewords
  ‣ Represent the frame as a histogram of the 4000 codeword frequencies

‣ Concept assignment by a machine learning method (here: Support Vector Machines)
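The discretization step can be sketched as follows: each local descriptor of a frame is assigned to its nearest codeword, and the frame is represented by the normalised histogram of codeword counts. How the descriptors are extracted and how the codebook is learned are not covered by the slides, so both are simply passed in as arrays here; `bow_histogram` is a hypothetical name.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Bag-of-visual-words histogram of one frame.

    descriptors: (n, d) array of local feature vectors of the frame
    codebook:    (k, d) array of codewords (k = 4000 on the slides)
    """
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2,
    # which avoids building an (n, k, d) intermediate array.
    d2 = ((descriptors ** 2).sum(axis=1)[:, None]
          - 2.0 * descriptors @ codebook.T
          + (codebook ** 2).sum(axis=1)[None, :])
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return counts / counts.sum()
```

The resulting fixed-length vector is what a classifier such as the Support Vector Machine named above would receive as input, one vector per frame.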


Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html

Blog: http://moresemantic.blogspot.com/

E-Mail: harald.sack@hpi.uni-potsdam.de

Twitter: @lysander07 / @biblionomicon / @yovisto