Mediaglobe - Media Analysis Methods

Dr. Harald Sack

Hasso-Plattner-Institut for IT-Systems Engineering

University of Potsdam


• HPI was founded in October 1998 as a public-private partnership

• HPI research and teaching focus on IT systems engineering

• 10 professors & 100 research staff

• 450 bachelor's / master's students

• HPI is the winner of the CHE Ranking 2010

http://hpi.uni-potsdam.de/


• Research topics:

• Semantic Web Technologies

• Ontological Engineering

• Information Retrieval

• Multimedia Analysis & Retrieval

• Social Networking

• Data/Information Visualization

• Research projects:

Hasso Plattner Institute for Software Systems Engineering

Semantic Technologies & Multimedia Retrieval Research Group

Conceptual Workflow

Semantic Search Engine

Media Analysis
‣Structural Video Analysis
‣Intelligent Character Recognition
‣Face Detection & Clustering
‣Audio Mining
‣Visual Concept Detection

Semantic Analysis
‣Named Entity Recognition
‣Context Analysis
‣Semantic Annotation

Graphical User Interface
‣Faceted Search
‣Explorative Search
‣Fine-grained User Annotation

Distribution / Production
‣Media Asset Management

Digitization | Metadata | Rights

Media Analysis Methods

‣Structural Video Analysis
‣Intelligent Character Recognition
‣Face Detection & Clustering
‣Audio Mining
‣Visual Concept Detection



Structural Video Analysis

‣Decomposition of the AV data into media fragments, taking content coherence into account

[Figure: hierarchy levels video, scenes, shots, subshots, frames, and key frames]

Shot Boundary Detection

‣Automatic identification of
  ‣Hard cuts
  ‣Defects, e.g. drop outs, white outs, etc.
  ‣Soft cuts, e.g. fade-in/out, dissolve, wipe, cross-fade, etc.

‣Analytical shot boundary detection
  ‣Based on luminance, chrominance, and edge distribution, combined with further characteristic image properties
  ‣Adaptive threshold computation

‣Shot boundary detection with machine learning methods
  ‣Support Vector Machines
  ‣Random Forest classifier

‣Identification of hard cuts based on
  ‣Luminance, chrominance, and derivatives
  ‣Edge distribution, edge density

[Figure: frames 573 to 578 along the time axis, labeled i-3 to i+2, each split into subregions 1 to 4]

Hard cut: declared at frame i if both of the following conditions are true for all subregions a:

$$D_a(i, i-1) > th_a(i)$$

$$D_a(i+1, i) < th_a(i)$$

‣Window size = 4 (W = 2)
‣Decompose each frame into 4 subregions (a = 1, ..., 4)
‣$D_a(i, i-1)$ ... histogram difference (L2 norm) between frames i and i-1 of subregion a
‣$th_a(i)$ ... adaptive threshold for frame i of subregion a

Adaptive Threshold

$$th_a(i) = \alpha \cdot \left( \sum_{k=i-W}^{i+W-1} D_a(k, k-1) - D_a(i, i-1) \right) + \beta$$
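The hard cut rule and the adaptive threshold can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the input layout `region_hists[a][i]`, and the default values for `alpha` and `beta` are hypothetical, since the slides give no concrete parameter values.

```python
import math

def hist_diff(h1, h2):
    """L2 norm of the difference between two histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(h1, h2)))

def adaptive_threshold(d, i, w, alpha, beta):
    """th(i) = alpha * (sum_{k=i-W}^{i+W-1} D(k, k-1) - D(i, i-1)) + beta."""
    window_sum = sum(d[k] for k in range(i - w, i + w))
    return alpha * (window_sum - d[i]) + beta

def detect_hard_cuts(region_hists, w=2, alpha=1.0, beta=0.1):
    """region_hists[a][i] is the histogram of subregion a in frame i.

    Returns every frame index i for which a hard cut lies between
    frame i-1 and frame i. alpha and beta are assumed values.
    """
    n = len(region_hists[0])
    # diffs[a][i] holds D_a(i, i-1); index 0 is unused padding.
    diffs = [[0.0] + [hist_diff(h[i], h[i - 1]) for i in range(1, n)]
             for h in region_hists]
    cuts = []
    # Keep the whole window and frame i+1 inside the sequence.
    for i in range(w + 1, min(n - w, n - 2) + 1):
        is_cut = True
        for d in diffs:
            th = adaptive_threshold(d, i, w, alpha, beta)
            if not (d[i] > th and d[i + 1] < th):
                is_cut = False
                break
        if is_cut:
            cuts.append(i)
    return cuts

# Two-bin toy histograms: the content changes abruptly at frame 5.
same = [[10, 0]] * 5 + [[0, 10]] * 5
print(detect_hard_cuts([same] * 4))
```

Requiring the condition in all subregions is what keeps a local disturbance in a single subregion from being reported as a cut.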


‣Identification and differentiation of defects
  ‣Drop out: histogram/chrominance difference analysis
  ‣Flashlight / white out: histogram/chrominance difference analysis

[Figure: frame sequence i, i+1, and i+8 to i+13 illustrating a drop out and a flashlight / white out]


‣Identification of soft cuts, e.g. fade-in/fade-out

‣Image features used:
  ‣Luminance histograms
  ‣Entropy over time
  ‣Motion vectors
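As a rough illustration of how the entropy feature could indicate a fade-out, the sketch below computes the Shannon entropy of a frame's luminance histogram and looks for runs where it falls steadily to nearly zero (a monochrome frame). The decision rule and the parameters `min_len` and `eps` are assumptions made for this example; the slides only name the features and also rely on luminance histograms and motion vectors.

```python
import math

def luminance_entropy(pixels, bins=256):
    """Shannon entropy (in bits) of the luminance histogram of one frame.

    pixels is a flat list of integer luminance values in [0, bins).
    """
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in hist if c)

def find_fade_outs(entropies, min_len=3, eps=0.05):
    """Return (start, end) frame ranges with strictly falling entropy
    that span at least min_len frames and end at or below eps."""
    fades = []
    start = 0
    for i in range(1, len(entropies) + 1):
        falling = i < len(entropies) and entropies[i] < entropies[i - 1]
        if not falling:
            end = i - 1
            if end - start + 1 >= min_len and entropies[end] <= eps:
                fades.append((start, end))
            start = i
    return fades

# Per-frame entropies of a clip that fades to black and then cuts back.
print(find_fade_outs([2.0, 2.0, 1.5, 1.0, 0.4, 0.0, 0.0, 2.0]))
```

A fade-in could be found the same way by scanning the entropy sequence in reverse.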



Fig. 1. Workflow of the proposed text detection method. (b) is the vertical edge map of (a). (c) is the vertical dilation map of (b). (d) is the binary map of (c). (e) is the result map of the subsequent connected component analysis. (f) shows the binary map after the adaptive projection profile refinement. (g) is the final detection result.

for text detection of natural scene images. The operator computes for each pixel the width of the most likely stroke containing the pixel. The output of the operator is a stroke-feature map, which has the same size as the input image, while each pixel represents the corresponding stroke width value of the input image.

3. TEXT DETECTION IN VIDEO IMAGES

Text detection is the first task of video OCR. Our approach determines whether a single frame of a video file contains text lines, for which a tight bounding box is returned. In order to manage detected text lines efficiently, we have defined a class "text line object" with the following properties: bounding box location (the top-left corner position) and bounding box size. After the first round of text detection, the refinement and verification procedures ensure the validity of the detection results in order to reduce false alarms.

3.1. Text detector

Before performing the text detection process, a Gaussian smoothing filter is applied to the images that have an entropy value larger than a predefined threshold T_entr. For our purpose, T_entr = 5.25 has proven to work best.

We have developed an edge based text detector, subsequently referred to as the edge text detector. The advantage of our detector is its computational efficiency compared to machine learning based approaches, because no computationally expensive training period is required. However, for visually different video sequences a parameter adaptation has to be performed. The best suited parameter combination of our method was learned from test runs on the given test data.

Fig. 2. Workflow of the proposed adaptive text line refinement procedure

The processing workflow for a single frame is depicted in Fig. 1 (a-e). First, a vertical edge map is produced using a Sobel filter [8] (cf. Fig. 1 (b)). Then, a morphological dilation operation is applied to link the vertical character edges together (cf. Fig. 1 (c)). Let MinW denote the detected minimal text line width. A rectangular kernel of size 1×MinW is defined for the vertical dilation operator. Subsequently, a binary mask is generated by using Otsu's thresholding method [9]. Ultimately, we create a binary map after Connected Component

Intelligent Character Recognition

‣Compared with traditional print OCR, video OCR is a demanding task
  ‣Heterogeneous/low contrast
  ‣Poor lighting conditions
  ‣Distorted and occluded text
  ‣Compression artifacts
  ‣etc.

‣Preprocessing
  ‣Character Identification
  ‣Text Preprocessing
  ‣Text Filtering
  ‣Adaptation of script geometry (deskew)
  ‣Image Quality Enhancement

‣Optical Character Recognition (OCR)
  ‣Standard OCR software (OCRopus)

‣Postprocessing
  ‣Lexical analysis
  ‣Statistical / context based filtering


[Figure: processing chain for the on-screen text "Rostock": Text Filtering, Image Quality Enhancement, OCR]

‣Character Identification
  ‣Robust filter methods for efficient extraction of text candidates

‣25 fps results in 90,000 individual frames per 60 minutes
  ‣Too expensive to fully filter and OCR every single frame


[Figure: a frame and the same frame with candidate text boxes]

Analytical Character Identification

• Edge Based Detection
  • DCT / Fourier Transformation
  • Sobel / Canny Edge Filter
  • Histogram of Oriented Gradients
  • Constant Gradient Variance

• Texture Based Detection
  • Local Binary Patterns
  • Spatial Variance

• Region Based Detection
  • Connected Component Analysis
  • Stroke Width Analysis

‣Analytical Textbox Filtering
  ‣Horizontal & Vertical Projection Profile
  ‣Stroke Width Analysis Based Verification
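The projection profile step can be sketched as follows: row sums of a binary candidate mask separate individual text lines, and column sums within each line tighten the box horizontally. The helper names and the `min_fill` ratio are hypothetical; the slides name the technique but not its exact rule.

```python
def row_profile(mask):
    """Horizontal projection profile: foreground pixels per row."""
    return [sum(row) for row in mask]

def col_profile(mask):
    """Vertical projection profile: foreground pixels per column."""
    return [sum(col) for col in zip(*mask)]

def runs_above(profile, threshold):
    """Inclusive (start, end) index ranges where profile > threshold."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

def refine_box(mask, min_fill=0.3):
    """Split a candidate box into tight (top, left, bottom, right) boxes.

    Rows count as text when they hold more than min_fill times the
    densest row's pixel count (min_fill is an assumed parameter).
    """
    boxes = []
    rows = row_profile(mask)
    for top, bottom in runs_above(rows, min_fill * max(rows, default=0)):
        band = mask[top:bottom + 1]
        for left, right in runs_above(col_profile(band), 0):
            boxes.append((top, left, bottom, right))
    return boxes

mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(refine_box(mask))
```

A stroke width based verification, as listed above, would then accept or reject each refined box; it is not covered by this sketch.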




[Figure: frame with candidate text boxes and the same frame with verified text boxes]


‣Analytical Edge Based Character Identification
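A minimal pure-Python sketch of an edge based candidate detector in the spirit of the workflow quoted earlier (Sobel edge map, dilation with a 1×MinW kernel, Otsu thresholding, connected component analysis). It is not the authors' implementation: the function names are hypothetical, the kernel is read here as one row high and MinW columns wide, the entropy-dependent Gaussian smoothing is omitted, and the default `min_w=5` is an assumed value.

```python
def sobel_vertical_edges(img):
    """Absolute horizontal-gradient response, which highlights vertical edges."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            out[y][x] = min(255, abs(gx))
    return out

def dilate_rows(img, width):
    """Grey-scale dilation: maximum over a horizontal window of `width` pixels."""
    half = width // 2
    return [[max(row[max(0, x - half):x + half + 1]) for x in range(len(row))]
            for row in img]

def otsu_threshold(img):
    """Threshold t maximizing between-class variance; foreground is > t."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def connected_boxes(mask):
    """Bounding boxes (top, left, bottom, right) of 4-connected components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            stack, seen[y0][x0] = [(y0, x0)], True
            top, left, bottom, right = y0, x0, y0, x0
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def detect_text_candidates(img, min_w=5):
    """img is a list of rows of 8-bit grey values; returns candidate boxes."""
    linked = dilate_rows(sobel_vertical_edges(img), min_w)
    t = otsu_threshold(linked)
    return connected_boxes([[1 if v > t else 0 for v in row] for row in linked])
```

Because the dilation widens the edge response, the returned boxes are looser than the strokes themselves, which is why a refinement and verification stage follows candidate detection.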


‣Character Binarization & Normalization

[Figure: Original Video Frames, Textbox Quality Enhancement, Textbox Normalization and Binarization]


‣Standard Optical Character Recognition
  ‣OCRopus 0.4.4 (Open Source, Apache License v2.0)
  ‣Tesseract 3.01 (Open Source, Apache License v2.0)

[Figure: quality enhanced, normalized text boxes and their raw OCR results]

Raw OCR results: "Ueutsche Bank", "Weubrandenburg"


‣OCR Post Processing
  ‣OCR-adapted spell correction (hunspell 1.3.2, Open Source, GNU LGPL)
  ‣Context-based spell correction (see context-based Named Entity Recognition, work package AP 4.1.5)

Raw OCR results: "Ueutsche Bank", "Weubrandenburg"

OCR results after OCR-adapted hunspell spell correction: "Deutsche Bank", "Neubrandenburg"
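To illustrate what "OCR-adapted" spell correction can mean, the sketch below uses a weighted edit distance in which substitutions between visually similar glyphs are cheaper than other edits. It is a stand-in for illustration only and does not use hunspell; the confusion pairs, the costs, and the tiny lexicon are all assumptions.

```python
# Hypothetical pairs of glyphs that an OCR engine might confuse.
CONFUSABLE = {frozenset("UD"), frozenset("WN"), frozenset("0O"),
              frozenset("1l"), frozenset("5S")}

def sub_cost(a, b):
    """Substitution cost: free for a match, cheap for look-alike glyphs."""
    if a == b:
        return 0.0
    return 0.3 if frozenset((a, b)) in CONFUSABLE else 1.0

def ocr_distance(s, t):
    """Edit distance with OCR-aware substitution costs (row-wise DP)."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,               # delete a
                           cur[j - 1] + 1.0,            # insert b
                           prev[j - 1] + sub_cost(a, b)))
        prev = cur
    return prev[-1]

def correct(token, lexicon, max_cost=1.0):
    """Replace token by the closest lexicon entry if it is close enough."""
    if token in lexicon or not lexicon:
        return token
    best = min(lexicon, key=lambda w: ocr_distance(token, w))
    return best if ocr_distance(token, best) <= max_cost else token

def correct_line(line, lexicon):
    return " ".join(correct(tok, lexicon) for tok in line.split())

lexicon = {"Deutsche", "Bank", "Neubrandenburg", "Rostock"}
print(correct_line("Ueutsche Bank", lexicon))
print(correct_line("Weubrandenburg", lexicon))
```

The `max_cost` cutoff leaves tokens unchanged when no lexicon entry is close, which matters for names that the dictionary does not contain.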

Visual Concept Detection

‣Adaptation of the "bag of words" approach from text retrieval
  ‣Dictionary / codeword vocabulary
  ‣Sentences are represented as vectors over the dictionary



‣Discretization of a single frame using the codewords
‣Represent each frame as a histogram of the 4000 codeword frequencies

‣Concept assignment by a machine learning method (here: Support Vector Machines)
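The discretization step can be sketched as nearest-codeword assignment followed by counting. The example uses a toy codebook of three 2-D codewords instead of the 4000 codewords mentioned above; the function names, the squared Euclidean distance, and the normalization to relative frequencies are assumptions. The resulting histogram is the feature vector a classifier such as an SVM would receive; training that classifier is outside this sketch.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2
                                 for d, c in zip(descriptor, codebook[k])))

def bow_histogram(descriptors, codebook):
    """Relative codeword frequencies for the local descriptors of one frame."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(codebook)

# Toy data: three codewords and four local descriptors of one frame.
codebook = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 0), (9, 9), (0, 1), (1, 9)]
print(bow_histogram(descriptors, codebook))
```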


Contact:

Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html

Blog: http://moresemantic.blogspot.com/

E-Mail: [email protected]

Twitter: @lysander07 / @biblionomicon / @yovisto
