Post on 02-Nov-2014
Harald Sack
Internet Technologies and Systems (ITS)
Future Internet Technologies / Semantic Technologies
Hasso-Plattner-Institute for IT Systems Engineering
Research Seminar, Oct 5th, 2010
Mediaglobe & CONTENTUS from 10,000 feet above ground
Dr. Harald Sack, Mediaglobe & Contentus, Research Seminar, 5. Oct. 2010, Hasso-Plattner Institute for IT Systems Engineering, Potsdam
• Semantic Technologies & Multimedia Retrieval
• Theseus Research Program
• Project Mediaglobe
• Project Theseus/Contentus
Mediaglobe & Contentus from 10,000 feet above ground
Semantic Technologies & Multimedia Retrieval
• 2009/01 started with 1 senior researcher ...
• 2009/03 Jörg Waitelonis
• 2009/12 Zalan Kramer
• 2010/01 Johannes Hercher
• 2010/03 Bernhard Quehl
• 2010/03 Haojin Yang
• 2010/05 Nadine Ludwig, Johannes Osterhoff
• 2010/07 Magnus Knuth
• 2010/09 Joscha Jäger
• 2010/11 N.N.
Semantic Technologies & Multimedia Retrieval
• Research Topics
• Semantic Web Technologies
• Ontological Engineering
• Information Retrieval
• Multimedia Retrieval
• Multimedia Analysis
• Social Networking
• Data/Information Visualization
Semantic Technologies & Multimedia Retrieval
• Research Projects
• Semantic Technologies & Multimedia Retrieval
• Theseus Research Program
• Project Mediaglobe
• Project Theseus/Contentus
Mediaglobe & Contentus from 10,000 feet above ground
• THESEUS - New Technologies for the Internet of Services
• GOAL: to develop a new Internet-based infrastructure that makes better use of the knowledge available on the Internet.
• FOCUS: Computational Linguistics and Semantic Technologies
• Overall Budget: 200 million euros / Time Frame: 2007 - 2012
• Partners:
Theseus Research Program
antibodies-online GmbH / Averbis GmbH / B2M Software AG / Blue Order Technologies AG / CIM Aachen GmbH / defa-spektrum GmbH / Deutsche Thomson oHG / DISY Informationssysteme GmbH / Empolis GmbH / EXAPT Systemtechnik GmbH / Festo AG & Co. KG / Festool GmbH / Fraunhofer-Gesellschaft / German National Library / German Research Center for Artificial Intelligence (DFKI) / Hasso-Plattner-Institut für Softwaresystemtechnik (HPI) GmbH / Hessian Telemedia Technology Competence Center (httc e.V.) / imc information multimedia communication AG / InfoChem Gesellschaft für chemische Information mbH / Infoman AG / Institut für Rundfunktechnik GmbH / intelligent views gmbh / jCOM1 AG / Karlsruhe Institute of Technology (KIT) / Ligmatech Automationssysteme GmbH / Ludwig-Maximilians-Universität (LMU) / Medien Bildungsgesellschaft Babelsberg GmbH / Metris GmbH / mufin GmbH / neofonie GmbH / ontoprise GmbH / raumobil GmbH / Research Center for Information Technology Karlsruhe (FZI) / RESprotect GmbH / RWTH Aachen University / SAP AG / SEEBURGER AG / Siemens AG / Sterling SIHI GmbH / Technische Universität Darmstadt / Technische Universität Dresden / Technische Universität München / Transinsight GmbH / Universität des Saarlandes / Universität Freiburg / Universität Karlsruhe (TH) / Universität Leipzig / Universität Stuttgart / Universitätsklinikum Erlangen / VDMA – Verband Deutscher Maschinen- und Anlagenbau e.V. / Yellowmap AG
www.theseus-programm.de
Theseus Research Program
Theseus Research Program
THESEUS Core Technology Cluster
• WP1: CTC Management (HHI)
• WP2: Video, Audio, Metadata, Platforms (HHI)
• WP3: Ontology Management (FZI)
• WP4: Semantic Access to Media and Services (DFKI)
• WP5: User Interface, Visualization (IGD)
• WP6: Statistical Machine Learning (Siemens)
• WP7: DRM/IPR Management (IIS)
• WP8: Evaluation (IDMT)
THESEUS Use Cases
• ALEXANDRIA - A Knowledge Platform on the Internet
• CONTENTUS - Technologies for the Library of the Future
• MEDICO - Intelligent Searches in Medical Databases
• ORDO - Order in a Digital World
• PROCESSUS - Making Better Use of Corporate Knowledge
• TEXO - An Infrastructure for Web-Based Services
THESEUS SME 2009
• MEDIAGLOBE + 11 other projects
• Semantic Technologies & Multimedia Retrieval
• Theseus Research Program
• Project Mediaglobe
• Project Theseus/Contentus
Mediaglobe & Contentus from 10,000 feet above ground
• THESEUS SME Project
• Affiliated with THESEUS/CONTENTUS
• Sept 2009 – Aug 2011 / to be extended until June 2012
• 4 Partners / Budget: 2.5 million euros
• Topic
• Open Up Audiovisual Media Archives with historic & documentary content
• Enable exploratory and semantic search in Audiovisual Media Archives
• Business Cases
• Semantic Search Engine Infrastructure and Services for
• Media Archives,
• Broadcasters and Producers
Project Mediaglobe - About
www.projekt-mediaglobe.de
Project Mediaglobe - Partners
Project Management Research & Development
AV Archive Media Asset Management System
Project Mediaglobe - Topics
Automated Media Analysis
Semantic Search
Digitization of AV Media
Rights Management
Media Archive Requirements
User Interface Design
Metadata Engineering
Project Mediaglobe - Topics
Topic: Requirement Analysis and Media Census
Data collection from > 200 AV archives in Germany about digitization, online distribution, and rights management
Topic: Efficient Digitization of AV Archives
Workflow definition and evaluation, best practices
Topic: Software Enabled Digital Rights Management
Workflow definition and best practices for unique determination of copyrights
Topic: Automated AV Media Analysis
Extraction of textual and semantic metadata for semantic search
Project Mediaglobe - Topics
Topic: Metadata Engineering
Definition, interlinking, and validation of a (semantic) metadata model for media archives
Topic: Semantic Search
Combining semantic metadata of heterogeneous provenance into a semantic search index to enable high precision/recall multimedia retrieval and exploratory search
Topic: User Interface Design
Support of innovative search strategies with semantic data/information visualization
Project Mediaglobe - Responsibilities
• Structural AV-Segmentation
• Intelligent Character Recognition
• Face/Body Detection
• Genre Detection
• Speaker Detection
• Automated Speech Recognition

• Ontology Design
• Entity-Mapping / Schema Mapping
• Semantic Enabled Retrieval
• Exploratory Search
• GUI Design
• Data/Information Visualization

• Media Asset Management
• Distribution
Project Mediaglobe - HPI Research
Automated Media Analysis
Structural Analysis
Intelligent Character Recognition
Face Detection + Tracking
Audio Analysis
Genre Analysis
Semantic Analysis
Context Analysis
Entity Mapping
Evaluation Framework
Media Transcoding
Persistent Storage
UIMA - Unstructured Information Management Architecture
digitized AV-Media
Project Mediaglobe - HPI Research
Media Transcoding
Archival and Distribution
• SD - DVCpro 50
• HD - DVCpro HD
Processing
• MPEG4/AVC
• Downscaling
Evaluation Framework
• Accurate manual annotation of 25 video clips (750 min) from defa spektrum archive
• TREC video test datasets
Project Mediaglobe - HPI Research
video
scenes
shots
subshots
frames
Structural Analysis
Project Mediaglobe - HPI Research
Structural Analysis
shots
• Shot Boundary Detection
• Identification of
  • Hard Cuts
  • Drop Outs
  • Soft Cuts, e.g., Dissolve, Wipe, Cross-Fade
Analytical Shot Boundary Detection
• Analysis of Luminance/Chrominance Histograms
• Analysis of Edge Distribution
• Analysis of Motion Vectors
Machine Learning
• Classification of Hard/Soft Cuts based on Image Features
• Random Trees
• Support Vector Machines
histogram differences
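The histogram-based analysis named above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the 16-bin luminance histogram, and the threshold of 0.5 are assumptions chosen for the example, and it only flags hard cuts (soft cuts would need a windowed comparison).

```python
import numpy as np

def luminance_histogram(frame, bins=16):
    """Normalized luminance histogram of an RGB frame (H x W x 3, uint8)."""
    # ITU-R BT.601 luma weights
    luma = frame[..., 0] * 0.299 + frame[..., 1] * 0.587 + frame[..., 2] * 0.114
    hist, _ = np.histogram(luma, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_hard_cuts(frames, threshold=0.5):
    """Return indices i where a cut is assumed between frame i-1 and frame i."""
    hists = [luminance_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        # L1 distance between consecutive histograms, scaled to [0, 1]
        diff = 0.5 * np.abs(hists[i] - hists[i - 1]).sum()
        if diff > threshold:  # threshold is an assumed value
            cuts.append(i)
    return cuts

# Synthetic clip: 5 dark frames followed by 5 bright frames
dark = [np.full((4, 4, 3), 20, dtype=np.uint8) for _ in range(5)]
bright = [np.full((4, 4, 3), 220, dtype=np.uint8) for _ in range(5)]
cuts = detect_hard_cuts(dark + bright)
```

On the synthetic clip, the only large histogram difference lies between the last dark and the first bright frame.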
Project Mediaglobe - HPI Research
Structural Analysis
Analytical Shot Boundary Detection
• How to differentiate between Soft Cuts and Camera Rotation, Pan, and Zoom?
• Analysis of Motion Vectors
Project Mediaglobe - HPI Research
Structural Analysis
(Preliminary) Evaluation
• Yovisto/Mediaglobe
• CTC 2 - Shot Detection (HHI)
• Advene Shot Detection
• Student seminar project (analytical analysis, AL)
• Student seminar project (machine learning, ML)

System               Recall   Precision   F1 measure
yovisto/mediaglobe   0.76     0.77        0.75
Advene               0.64     0.76        0.67
HHI                  0.78     0.77        0.77
Students AL          0.72     0.78        0.71
Students ML          0.80     0.81        0.80
new                  0.87     0.83        0.85
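For reference, recall, precision, and F1 for shot boundaries can be computed as in the following sketch. The matching rule (a detection counts as correct if it lies within a tolerance of two frames of a not yet matched reference boundary) is an assumption; the slides do not state how the evaluation matched boundaries or whether scores were averaged per clip.

```python
def evaluate_boundaries(detected, ground_truth, tolerance=2):
    """Match detected boundaries to ground truth within +/- tolerance frames."""
    unmatched = sorted(ground_truth)
    true_pos = 0
    for d in sorted(detected):
        # Greedily match each detection to the closest still-unmatched reference
        candidates = [g for g in unmatched if abs(g - d) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda g: abs(g - d))
            unmatched.remove(best)
            true_pos += 1
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical frame indices, not data from the evaluation above
precision, recall, f1 = evaluate_boundaries([10, 51, 120, 160], [10, 50, 90, 120])
```

In this made-up example three of four detections match and three of four reference boundaries are found, so all three scores equal 0.75.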
Project Mediaglobe - HPI Research
• Preprocessing
  • Keyframe extraction
  • Script identification
  • Script filtering
  • Adaptation of script geometry (Deskew)
  • Image quality enhancement
• Optical Character Recognition (OCR)
  • with standard software (Tesseract)
• Postprocessing
  • Keyterm spotting
  • Lexical analysis
  • Statistical filtering
Intelligent Character Recognition
Example on-screen text: Prof. Rudolf Agsten, LDPD
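One way to read the postprocessing steps (lexical analysis, statistical filtering) is sketched below: raw OCR strings from several keyframes are tokenized, and a token is kept if it appears in a lexicon or recurs in at least two keyframes. The function name, the lexicon, the recurrence threshold, and the noisy OCR strings are all hypothetical; the slides do not specify the actual filters.

```python
import re
from collections import Counter

def filter_ocr_tokens(ocr_by_keyframe, lexicon, min_frames=2):
    """Keep tokens that are in the lexicon or recur in several keyframes."""
    frame_counts = Counter()
    for text in ocr_by_keyframe:
        # Runs of at least three letters; each token counts once per keyframe
        tokens = {t.lower() for t in re.findall(r"[^\W\d_]{3,}", text)}
        frame_counts.update(tokens)
    return sorted(
        t for t, n in frame_counts.items()
        if t in lexicon or n >= min_frames
    )

# Made-up OCR output for three keyframes showing the same caption
ocr_by_keyframe = [
    "Prof. Rudolf Agsten ~|l LDPD",
    "Prof. Rudolf Agsten LDPD xq",
    "Rudo1f Agsten",
]
keyterms = filter_ocr_tokens(ocr_by_keyframe, lexicon={"prof"})
```

The misread fragment from the third keyframe occurs only once and is not in the lexicon, so it is dropped, while the recurring caption words survive.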
Project Mediaglobe - HPI Research
Intelligent Character Recognition
(a) Original
(b) DCT
(c) Weighted DCT
(d) Normalized
(e) Binarized
(f) Mask after erosion & dilation
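The panel sequence (a) to (f) suggests a DCT-based text localization. The sketch below follows that sequence under stated assumptions: 8 x 8 blocks, a weighting that grows with the coefficient index (so the DC term gets weight 0), a threshold of 0.5, and a 3 x 3 structuring element applied to the block map. None of these parameters come from the slides.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_energy(gray, n=8):
    """(b), (c): weighted sum of absolute DCT coefficients per n x n block."""
    d = dct_matrix(n)
    # Assumed weighting: sum of coefficient indices, DC term gets weight 0
    weights = np.add.outer(np.arange(n), np.arange(n)).astype(float)
    rows, cols = gray.shape[0] // n, gray.shape[1] // n
    energy = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = gray[r * n:(r + 1) * n, c * n:(c + 1) * n].astype(float)
            coeffs = d @ block @ d.T  # 2-D DCT of the block
            energy[r, c] = (weights * np.abs(coeffs)).sum()
    return energy

def shift_stack(mask):
    """Stack the 3 x 3 neighbourhood of every cell (padded with False)."""
    p = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def text_mask(gray, threshold=0.5):
    energy = block_energy(gray)
    peak = energy.max()
    normalized = energy / peak if peak > 0 else energy  # (d) normalize
    binary = normalized > threshold                     # (e) binarize
    eroded = shift_stack(binary).all(axis=0)            # (f) erosion ...
    return shift_stack(eroded).any(axis=0)              # ... then dilation

# Synthetic frame: flat background, a high-frequency "text" region, one speck
img = np.full((64, 64), 128, dtype=np.uint8)
yy, xx = np.mgrid[0:64, 0:64]
checker = (((yy + xx) % 2) * 255).astype(np.uint8)
img[16:48, 8:56] = checker[16:48, 8:56]    # block rows 2-5, columns 1-6
img[56:64, 56:64] = checker[56:64, 56:64]  # isolated block (7, 7)
mask = text_mask(img)
```

The erosion removes the isolated high-energy block, and the dilation restores the extent of the larger region, which is the usual motivation for applying the two operations in this order.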
Project Mediaglobe - HPI Research
Intelligent Character Recognition
(h) Sequence 1
(i) Sequence 2
(j) Adapted sequence 1
(k) Adapted sequence 2
Project Mediaglobe - HPI Research
Metadata Engineering
• Requirement Analysis
• Semantic Data Modelling
• Vocabulary Inter-Linking
• MPEG-7 Compliance
Project Mediaglobe - HPI Research
• Entity Mapping
  • Mapping keyterms (text) to semantic entities
  • Context Analysis and Disambiguation
User Tag: Truman
Candidate entities in the LOD Cloud:
• Truman Capote?
• Harry S. Truman?
• Truman, Minnesota?
• The Truman Show?
Metadata Engineering
Project Mediaglobe - HPI Research
• Entity Mapping
  • Mapping keyterms (text) to semantic entities
  • Context Analysis and Disambiguation
Truman
Potsdam
Eisenhower
Inauguration
Context Graph Analysis
Metadata Engineering
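A toy version of context-based disambiguation for the "Truman" example: each candidate entity is scored by the overlap between the co-occurring context terms and a set of terms related to the candidate. The related-term sets are hand-written stand-ins for links in the LOD Cloud, and Jaccard overlap is an assumed scoring function, not the method used in the project.

```python
# Hypothetical related-term sets standing in for LOD Cloud neighbours
CANDIDATES = {
    "Harry S. Truman": {"Potsdam", "Eisenhower", "Inauguration", "President"},
    "Truman Capote": {"Novel", "Author", "New York"},
    "Truman, Minnesota": {"Minnesota", "City"},
    "The Truman Show": {"Film", "Jim Carrey"},
}

def disambiguate(context_terms, candidates):
    """Score each candidate by Jaccard overlap with the context terms."""
    context = set(context_terms)
    scores = {
        entity: len(context & neighbours) / len(context | neighbours)
        for entity, neighbours in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

best, scores = disambiguate(["Potsdam", "Eisenhower", "Inauguration"], CANDIDATES)
```

With the context terms from the slide, only one candidate shares any related terms, so the keyterm maps to that entity.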
Project Mediaglobe - HPI Research
Automated Media Analysis
Semantic Search
• Creation of a Semantic Search Index
• Query String Mapping and Refinement
• Faceted Search
• Search by Timeline
• Geographical Search
• Exploratory Search
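How faceted, timeline, and geographical filters might combine over entity-annotated clips is sketched below. The `Clip` record, the archive entries, and the function names are hypothetical; a real semantic search index would be far richer than an in-memory list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clip:
    title: str
    year: int
    place: str
    entities: frozenset

# Made-up archive entries for illustration only
ARCHIVE = [
    Clip("Newsreel A", 1945, "Potsdam", frozenset({"Harry S. Truman"})),
    Clip("Newsreel B", 1953, "Washington",
         frozenset({"Harry S. Truman", "Dwight D. Eisenhower"})),
    Clip("Documentary C", 1961, "Berlin", frozenset({"Berlin Wall"})),
]

def search(archive, entity=None, place=None, years=None):
    """Apply optional entity, place, and timeline filters conjunctively."""
    results = []
    for clip in archive:
        if entity is not None and entity not in clip.entities:
            continue
        if place is not None and clip.place != place:
            continue
        if years is not None and not (years[0] <= clip.year <= years[1]):
            continue
        results.append(clip)
    return results

def facet_counts(clips):
    """Count clips per place, e.g. to label the options of a place facet."""
    counts = {}
    for clip in clips:
        counts[clip.place] = counts.get(clip.place, 0) + 1
    return counts

hits = search(ARCHIVE, entity="Harry S. Truman", years=(1940, 1950))
places = facet_counts(search(ARCHIVE, entity="Harry S. Truman"))
```

Searching by entity instead of by raw string is what lets the timeline and place facets narrow the same result set consistently.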
Project Mediaglobe - HPI Research
User Interface Design
Project Mediaglobe - HPI Research
User Interface Design
Project Mediaglobe - HPI Research
User Interface Design
• Semantic Technologies & Multimedia Retrieval
• Theseus Research Program
• Project Mediaglobe
• Project Theseus/Contentus
Mediaglobe & Contentus from 10,000 feet above ground
CONTENTUS
• Use Case (among 5 others) of the German Theseus Research Program
• Time Frame: 2007 - 2012
• 7 Project Partners
• Supported by the Federal Ministry of Economics and Technology (Bundesministerium für Wirtschaft und Technologie)
Motivation
• Deterioration of Media (Books, Video, Records, DVD, CD, ...)
• Enormous amount of multimedia objects
• High costs and manpower to drive a digitizing workflow
• Almost no internet-based linking of cultural goods
Project Goals
• Development of concepts and technologies for "Next Generation Multimedia Libraries"
  • Automatic quality control & restoration
  • Automatic metadata generation
  • Semi-automatic semantic linking
  • Incorporation of social networks and expert communities
Contentus Process Chain HPI Research
Contentus Service Platform
Contentus Process Chain
Backend Media Processing
Selected Contentus Components
Face Detection / Dirt Detection & Removal
Selected Contentus Components
Face Detection / Scratch Detection & Removal
Selected Contentus Components
Layout Detection / OCR Preprocessing
Selected Contentus Components
Audio Analysis / Audio Annotation
Contentus SMMS Process Chain
Backend Media Processing
Frontend Processing
SMMS GUI DEMO - D2
• Semantic Technologies & Multimedia Retrieval
• Project Mediaglobe
• Project Theseus/Contentus
Mediaglobe & Contentus from 10,000 feet above ground
Thank you for your Attention!