Mediaglobe - Media Analysis Methods
Dr. Harald Sack
Hasso-Plattner-Institut for IT-Systems Engineering
University of Potsdam
Media Analysis Methods
• The HPI was founded in October 1998 as a public-private partnership
• HPI research and teaching focus on IT Systems Engineering
• 10 professors & 100 research staff
• 450 Bachelor / Master students
• HPI is the winner of the CHE Ranking 2010
http://hpi.uni-potsdam.de/
Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
Semantic Technologies & Multimedia Retrieval Research Group
• Research topics
  • Semantic Web Technologies
  • Ontological Engineering
  • Information Retrieval
  • Multimedia Analysis & Retrieval
  • Social Networking
  • Data/Information Visualization
• Research projects:
Semantic Search Engine: Conceptual Workflow
Media Analysis
  ‣ Structural Video Analysis
  ‣ Intelligent Character Recognition
  ‣ Face Detection & Clustering
  ‣ Audio Mining
  ‣ Visual Concept Detection
Semantic Analysis
  ‣ Named Entity Recognition
  ‣ Context Analysis
  ‣ Semantic Annotation
Graphical User Interface
  ‣ Faceted Search
  ‣ Explorative Search
  ‣ Fine-grained User Annotation
Distribution / Production
  ‣ Media Asset Management
Digitization | Metadata | Rights
Media Analysis Methods
  ‣ Structural Video Analysis
  ‣ Intelligent Character Recognition
  ‣ Visual Concept Detection
  ‣ Face Detection & Clustering
  ‣ Audio Mining
Structural Video Analysis
  ‣ Decomposition of the AV data into media fragments, taking content coherence into account
[Figure: temporal hierarchy video → scenes → shots → subshots → frames, with key frames]
Shot Boundary Detection
  ‣ Automatic identification of
    ‣ Hard cuts
    ‣ Defects, e.g. drop-outs, white-outs, etc.
    ‣ Soft cuts, e.g. fade-in/out, dissolve, wipe, cross-fade, etc.
  ‣ Analytical shot boundary detection
    ‣ Based on the luminance, chrominance, and edge distributions, combined with further characteristic image features
    ‣ Adaptive threshold computation
  ‣ Shot boundary detection with machine learning methods
    ‣ Support Vector Machines
    ‣ Random Forest classifier
Shot Boundary Detection
  ‣ Identification of hard cuts based on
    ‣ Luminance, chrominance, and their derivatives
    ‣ Edge distribution, edge density
[Figure: consecutive frames 573 to 578]
Shot Boundary Detection: Adaptive Threshold
Hard cut at frame i if both of the following conditions hold for all subregions a:
\[ D_a(i, i-1) > th_a(i) \quad \text{and} \quad D_a(i+1, i) < th_a(i) \]
with the adaptive threshold
\[ th_a(i) = \alpha \cdot \left( \sum_{k=i-W}^{i+W-1} D_a(k, k-1) \; - \; D_a(i, i-1) \right) + \beta \]
  ‣ Window size = 4 (W = 2)
  ‣ Each frame is decomposed into 4 subregions (a = 1, ..., 4)
  ‣ D_a(i, i-1): histogram difference (L2 norm) between frames i and i-1 in subregion a
  ‣ th_a(i): adaptive threshold for frame i in subregion a
[Figure: frames i-3 to i+2, each split into subregions 1 to 4]
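The hard-cut rule above can be sketched in a few lines of Python. This is a minimal illustration, not the Mediaglobe implementation: it assumes the per-subregion differences D_a(k, k-1) have already been computed and are stored in lists where index k holds D_a(k, k-1) (index 0 is an unused placeholder). The function names and the values of alpha and beta are assumptions chosen for the example; the slides do not state them.

```python
def adaptive_threshold(diffs, i, W=2, alpha=2.0, beta=1.0):
    """th_a(i) = alpha * (sum_{k=i-W}^{i+W-1} D_a(k, k-1) - D_a(i, i-1)) + beta.

    diffs[k] holds D_a(k, k-1); alpha and beta are illustrative values.
    """
    window = diffs[i - W:i + W]  # k = i-W .. i+W-1
    return alpha * (sum(window) - diffs[i]) + beta


def is_hard_cut(region_diffs, i, W=2, alpha=2.0, beta=1.0):
    """True if both conditions hold in every subregion a."""
    for diffs in region_diffs:
        th = adaptive_threshold(diffs, i, W, alpha, beta)
        if not (diffs[i] > th and diffs[i + 1] < th):
            return False
    return True


def find_hard_cuts(region_diffs, W=2, alpha=2.0, beta=1.0):
    """Return all frame indices i that satisfy the hard-cut rule."""
    n = len(region_diffs[0])
    first = W + 1                # the window must not touch placeholder index 0
    last = min(n - W, n - 2)     # the window and frame i+1 must both exist
    return [i for i in range(first, last + 1)
            if is_hard_cut(region_diffs, i, W, alpha, beta)]


# Four subregions with a difference spike between frames 4 and 5.
spike = [0, 1, 1, 1, 1, 20, 1, 1, 1, 1]
cuts = find_hard_cuts([spike, spike, spike, spike])
```

Subtracting D_a(i, i-1) from the window sum keeps the candidate frame from raising its own threshold, so a single large spike stands out against its neighbours.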
Shot Boundary Detection
  ‣ Identification and differentiation of defects
    ‣ Drop-out: histogram/chrominance difference analysis
    ‣ Flashlight / white-out: histogram/chrominance difference analysis
[Figure: frame sequence i, i+1, ..., i+8 to i+13]
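The slides only name histogram/chrominance difference analysis as the means of telling defects from cuts. One plausible heuristic, shown here purely as an assumption, is that a flash or drop-out is transient: a few frames after the disturbance the picture returns to what it looked like before, whereas after a real cut it does not. The helper names, `max_defect_len`, and `tol` are hypothetical.

```python
def l2_diff(h1, h2):
    """L2 distance between two (normalized) histograms."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5


def classify_spike(hists, i, max_defect_len=12, tol=0.1):
    """Classify a histogram-difference spike between frames i-1 and i.

    Assumption: if any frame within max_defect_len frames after i is
    again similar to frame i-1, the spike was a transient defect
    (flash, white-out, drop-out); otherwise it is treated as a cut.
    """
    reference = hists[i - 1]
    last = min(len(hists), i + max_defect_len + 1)
    for j in range(i + 1, last):
        if l2_diff(reference, hists[j]) <= tol:
            return "defect"
    return "cut"


# Toy 4-bin histograms: a two-frame white-out versus a real content change.
normal = [0.5, 0.5, 0.0, 0.0]
white = [0.0, 0.0, 0.0, 1.0]
other = [0.0, 0.0, 1.0, 0.0]
flash_sequence = [normal] * 3 + [white] * 2 + [normal] * 3
cut_sequence = [normal] * 3 + [other] * 5
```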
Shot Boundary Detection
  ‣ Identification of soft cuts, e.g. fade-in/fade-out
  ‣ Image features used:
    ‣ Luminance histograms
    ‣ Entropy over time
    ‣ Motion vectors
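As a rough illustration of how the entropy curve can reveal a fade-out: the Shannon entropy of the luminance histogram falls steadily as the picture fades to a uniform frame. The sketch below assumes 8-bit luminance values and a simple "strictly falling run that ends near zero" rule. The parameters `min_len` and `floor` are assumptions, and motion vectors are not modelled.

```python
import math


def luminance_entropy(pixels, bins=256):
    """Shannon entropy (in bits) of the luminance histogram of one frame."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in hist if c)


def find_fade_outs(entropies, min_len=3, floor=0.5):
    """Return (start, end) frame ranges where the entropy falls strictly
    for at least min_len frames and ends at or below floor."""
    fades = []
    start = 0
    for i in range(1, len(entropies) + 1):
        falling = i < len(entropies) and entropies[i] < entropies[i - 1]
        if not falling:
            end = i - 1
            if end - start + 1 >= min_len and entropies[end] <= floor:
                fades.append((start, end))
            start = i
    return fades


# Entropy per frame: stable content, a fade to black, then a new shot.
curve = [4.0, 4.1, 3.0, 2.0, 1.0, 0.0, 0.0, 3.9]
fades = find_fade_outs(curve)
```

A fade-in can be found the same way by scanning the reversed curve.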
Fig. 1. Workflow of the proposed text detection method. (b) is the vertical edge map of (a). (c) is the vertical dilation map of (b). (d) is the binary map of (c). (e) is the result map of the subsequent connected component analysis. (f) shows the binary map after the adaptive projection profile refinement. (g) is the final detection result.
for text detection of natural scene images. The operator computes for each pixel the width of the most likely stroke containing the pixel. The output of the operator is a stroke-feature map, which has the same size as the input image, while each pixel represents the corresponding stroke width value of the input image.
3. TEXT DETECTION IN VIDEO IMAGES
Text detection is the first task of video OCR. Our approach determines whether a single frame of a video file contains text lines, for which a tight bounding box is returned. In order to manage detected text lines efficiently, we have defined a class "text line object" with the following properties: bounding box location (the top-left corner position) and bounding box size. After the first round of text detection, the refinement and verification procedures ensure the validity of the detection results in order to reduce false alarms.
3.1. Text detector
Before performing the text detection process, a Gaussian smoothing filter is applied to images that have an entropy value larger than a predefined threshold T_entr. For our purpose, T_entr = 5.25 has proven to work best.
We have developed an edge-based text detector, subsequently referred to as the edge text detector. The advantage of our detector is its computational efficiency compared to machine-learning-based approaches, because no computationally expensive training period is required. However, for visually different video sequences a parameter adaptation has to be performed. The best-suited parameter combination of our method was learned from test runs on the given test data.
Fig. 2. Workflow of the proposed adaptive text line refinement procedure
The processing workflow for a single frame is depicted in Fig. 1 (a-e). First, a vertical edge map is produced using a Sobel filter [8] (cf. Fig. 1 (b)). Then, a morphological dilation operation is applied to link the vertical character edges together (cf. Fig. 1 (c)). Let MinW denote the detected minimal text line width. A rectangular kernel of size 1×MinW is defined for the vertical dilation operator. Subsequently, a binary mask is generated using Otsu's thresholding method [9]. Ultimately, we create a binary map after Connected Component
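The first three steps of the excerpt (vertical edge map, dilation, Otsu binarization) can be sketched in plain Python on a grayscale image given as a list of rows. This is a simplified illustration, not the authors' code. It assumes the 1×MinW kernel is one row high and MinW columns wide, clips the Sobel response to 0..255, and leaves out the Gaussian pre-smoothing and the connected component analysis. All function names are hypothetical.

```python
def sobel_vertical_edges(img):
    """Absolute horizontal Sobel gradient, which responds to vertical edges.
    Border pixels are left at 0; values are clipped to 255."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            out[y][x] = min(255, abs(gx))
    return out


def dilate_rows(img, min_w):
    """Grey-level dilation with a 1 x min_w kernel (exact for odd min_w):
    each pixel becomes the maximum of its horizontal neighbourhood."""
    w = len(img[0])
    half = min_w // 2
    return [[max(row[max(0, x - half):min(w, x + half + 1)]) for x in range(w)]
            for row in img]


def otsu_threshold(img):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(v * c for v, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def text_candidate_mask(img, min_w):
    """Edge map -> dilation -> Otsu binarization (1 = text candidate)."""
    dilated = dilate_rows(sobel_vertical_edges(img), min_w)
    t = otsu_threshold(dilated)
    return [[1 if v > t else 0 for v in row] for row in dilated]


# Toy frame: three one-pixel-wide vertical strokes on a black background.
frame = [[0] * 12 for _ in range(7)]
for y in (2, 3, 4):
    for x in (2, 5, 8):
        frame[y][x] = 255
mask = text_candidate_mask(frame, 3)
```

In the toy frame the dilation fills the gaps between neighbouring stroke edges, so the three strokes merge into one connected candidate region.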
Intelligent Character Recognition
  ‣ Compared to traditional print OCR, video OCR is a demanding task
    ‣ Heterogeneous / low contrast
    ‣ Poor lighting conditions
    ‣ Distorted and occluded text
    ‣ Compression artifacts
    ‣ etc.
  ‣ Preprocessing
    ‣ Character identification
    ‣ Text preprocessing
    ‣ Text filtering
    ‣ Adaptation of script geometry (deskew)
    ‣ Image quality enhancement
  ‣ Optical Character Recognition (OCR)
    ‣ Standard OCR software (OCRopus)
  ‣ Postprocessing
    ‣ Lexical analysis
    ‣ Statistical / context-based filtering
Intelligent Character Recognition
  ‣ Character identification
    ‣ Robust filter methods for efficient extraction of text candidates
  ‣ 25 fps results in 90,000 frames per 60 minutes
    ‣ Too costly for full filtering & OCR of every single frame
[Figure: example frame containing the text "Rostock", processed by Text Filtering → Image Quality Enhancement → OCR]
Intelligent Character Recognition
  ‣ Analytical character identification
    • Edge-based detection
      • DCT / Fourier transform
      • Sobel / Canny edge filter
      • Histogram of Oriented Gradients
      • Constant gradient variance
    • Texture-based detection
      • Local Binary Patterns
      • Spatial variance
    • Region-based detection
      • Connected component analysis
      • Stroke width analysis
  ‣ Analytical textbox filtering
    ‣ Horizontal & vertical projection profile
    ‣ Stroke width analysis based verification
[Figure: frame → frame with candidate textboxes → frame with verified textboxes]
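Projection-profile filtering can be illustrated on a binary candidate mask: the horizontal profile (row sums) separates text lines, and the vertical profile (column sums) within each line tightens the box left and right. The sketch below is a minimal version under the assumption of a fixed `min_count` threshold. The adaptive refinement and the stroke-width verification named on the slide are not reproduced, and the function names are hypothetical.

```python
def horizontal_profile(mask):
    """Number of foreground pixels in each row."""
    return [sum(row) for row in mask]


def vertical_profile(mask):
    """Number of foreground pixels in each column."""
    return [sum(col) for col in zip(*mask)]


def runs_at_least(profile, min_count):
    """Inclusive (start, end) index runs where profile >= min_count."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v >= min_count and start is None:
            start = i
        elif v < min_count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def refine_textboxes(mask, min_count=1):
    """Split a candidate region into tight (top, left, bottom, right) boxes."""
    boxes = []
    for top, bottom in runs_at_least(horizontal_profile(mask), min_count):
        band = mask[top:bottom + 1]
        for left, right in runs_at_least(vertical_profile(band), min_count):
            boxes.append((top, left, bottom, right))
    return boxes


# One candidate region that actually contains two separate text blocks.
candidate = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = refine_textboxes(candidate)
```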
Intelligent Character Recognition
  ‣ Analytical edge-based character identification
Intelligent Character Recognition
  ‣ Character binarization & normalization
[Figure: original video frames → textbox quality enhancement → textbox normalization and binarization]
Intelligent Character Recognition
  ‣ Standard Optical Character Recognition
    ‣ OCRopus 0.4.4 (open source, Apache License v2.0)
    ‣ Tesseract 3.01 (open source, Apache License v2.0)
[Figure: quality-enhanced, normalized textboxes → raw OCR results]
Raw OCR results: "Ueutsche Bank", "Weubrandenburg"
Intelligent Character Recognition
  ‣ OCR post-processing
    ‣ OCR-adapted spell correction (hunspell 1.3.2, open source, GNU LGPL)
    ‣ Context-based spell correction (see context-based Named Entity Recognition, work package AP 4.1.5)
Raw OCR results: "Ueutsche Bank", "Weubrandenburg"
OCR results after spell correction (OCR-adapted hunspell): "Deutsche Bank", "Neubrandenburg"
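The idea behind OCR-adapted spell correction can be sketched with a weighted edit distance in which substitutions between visually similar characters are cheaper than arbitrary ones. This does not use or imitate the hunspell API. The confusion pairs, the costs, and the tiny lexicon are assumptions made up for this example, chosen so that the two raw results from the slide are corrected.

```python
# Hypothetical OCR confusion pairs (illustrative, not taken from hunspell).
OCR_CONFUSIONS = {("U", "D"), ("W", "N"), ("l", "I"), ("0", "O")}


def substitution_cost(a, b):
    """Cheap substitution for typical OCR confusions, full cost otherwise."""
    if a == b:
        return 0.0
    if (a, b) in OCR_CONFUSIONS or (b, a) in OCR_CONFUSIONS:
        return 0.5
    return 1.0


def weighted_edit_distance(s, t):
    """Levenshtein distance with OCR-aware substitution costs."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,        # delete a
                           cur[j - 1] + 1.0,     # insert b
                           prev[j - 1] + substitution_cost(a, b)))
        prev = cur
    return prev[-1]


def correct(word, lexicon, max_dist=1.0):
    """Replace word by the closest lexicon entry if it is close enough."""
    best = min(lexicon, key=lambda w: weighted_edit_distance(word, w))
    return best if weighted_edit_distance(word, best) <= max_dist else word


lexicon = ["Deutsche", "Bank", "Neubrandenburg", "Rostock"]
fixed = [" ".join(correct(w, lexicon) for w in text.split())
         for text in ["Ueutsche Bank", "Weubrandenburg"]]
```

Words with no close lexicon entry are left unchanged, which matters for names the dictionary does not cover.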
Visual Concept Detection
  ‣ Adaptation of the "bag of words" approach from text retrieval
    ‣ Dictionary / codeword vocabulary
    ‣ Sentences are represented as vectors over the dictionary
  ‣ Discretization of a single frame using the codewords
    ‣ Each frame is represented as a histogram of the 4000 codeword frequencies
  ‣ Concept assignment via a machine learning method (here: Support Vector Machines)
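The discretization step can be illustrated as follows: every local descriptor extracted from a frame is assigned to its nearest codeword, and the frame is then described by the normalized histogram of codeword counts. The sketch uses a toy codebook of three 2-D codewords instead of the 4000 mentioned above. How descriptors are extracted and how the codebook is learned are outside its scope, and the function names are hypothetical. The resulting histogram is what a classifier such as an SVM would receive as input.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2
                                 for d, c in zip(descriptor, codebook[k])))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword frequencies for one frame."""
    hist = [0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_codeword(descriptor, codebook)] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * len(codebook)


# Toy example: three codewords and four local descriptors from one frame.
codebook = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 1), (0, 2), (9, 9), (1, 9)]
frame_vector = bow_histogram(descriptors, codebook)
```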
Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam
Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html
Blog: http://moresemantic.blogspot.com/
E-Mail: [email protected]
Twitter: @lysander07 / @biblionomicon / @yovisto