medialab
PISA – Proof of Concept
Production, Indexing and Search of Audiovisual Material
2
PISA - Positioning
PISA – Production and Indexing of Audiovisual Media
! 30 man-years
! Virtual Modelling
! Computer Assisted Manufacturing
! Unsupervised Feature Extraction
! Search Engine Technology
3
Context - Digital Media Production
Production Platform
Suprastructure – Metadata Management
Infrastructure – Networks and Storage
Production and distribution
Ingest
Media Asset Management
Editing
Playout
Mastering
4
Digital Asset Management, Content Management…
Production Platform
Suprastructure – Metadata Management
Infrastructure - Networks and Storage
Production and distribution
5
User Expectations
Production Platform
Data General
Metadata
Communication (Information)
Suprastructure – Metadata Management
Infrastructure – Networks and Storage
Production and distribution
Media Production
• Mass-production
• Anywhere, anytime, on any device
• Personalisation
The ideal search engine
• retrieves all relevant items (recall 100%)
• without false positives (precision 100%)
• enables instant access to digital media
• while respecting intellectual property rights.
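The two quality measures named above can be made concrete with a small calculation. The following is a minimal sketch, assuming a query result and a hand-made list of relevant items; the clip identifiers are hypothetical and only serve as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # By convention an empty set counts as a perfect score here.
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical example: four clips returned, three clips actually relevant.
retrieved = ["clip1", "clip2", "clip3", "clip4"]
relevant = ["clip1", "clip2", "clip5"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

The ideal search engine described on this slide would return exactly the relevant set, giving 1.0 for both values.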
6
Archiving – Disclosure, Annotation,…
archive number : ALG 20010813 1
fragment number : 1
series : 1000 ZONNEN EN GARNALEN
tape number : E03024404
format : DBCM
fragment title : 1000 ZONNEN & GARNALEN
picture : KL/PALPLUS
fragment duration : 18 20
text : 0'00" TOURIST REPORTAGE MAGAZINE, OVERVIEW OF
TOPICS; OPENING TITLES, TOURIST REPORTAGE MAGAZINE,
OVERVIEW OF TOPICS
0'50" TODAY : ARTIST LUC HOFKENS DESIGNED AN OASIS
ON HIS ROOF TERRACE IN BORGERHOUT THAT RECALLS THE
GRAND CANYON; INTERVIEW WITH LUC AND HIS WIFE
MARILOU; EXTERIOR SHOT OF ROOF WITH SURROUNDINGS, EXTERIOR
OF WORKER'S HOUSE, PAN ACROSS ROCK FACES, CRATES WITH WATER,
PLANTING, PHOTO ALBUM SHOWING PROGRESS OF THE WORKS
4'00" JUNIOR : KLAARTJE ALAERTS, AGE 13, WANTS TO BECOME AN
ASTRONAUT; SHE VISITS THE EURO SPACE CENTER WITH SPACE SHUTTLES,
ROCKETS; SIMULATION IN SPACE SHUTTLE; INTERVIEW; HAS SEEN A
UFO; BUILDS A SMALL ROCKET HERSELF AND LAUNCHES IT
7'50" DE SCHEURKALENDER : ARCHIVE ADVERTISING FILM IBM;
INTERVIEW MAURICE DE WILDE, FIRST PERSONAL COMPUTER
keywords : BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND
CANYON (NATURAL AREA); ROOF; TERRACE; INTERVIEW; EURO
SPACE CENTER; SPACE TRAVEL; PC; BOAT TRIP; WEALTH;
PASSENGER; GASTRONOMY; RESTAURANT; STAFF;
HOLIDAY; INTERIOR SHOT; SHIP; BECKERS LEEN; VRT;
LOTTO; RADIO PRESENTER; SOUND STUDIO; INVENTION;
BARBECUE; CONCRETE MIXER; IBM; COMMERCIAL
rights holder : VRT
Search screen FILM Set: 16 Count: 1
page 1 of 3
keywords: ibm and vrt
archive number: -
broadcast year: month: day:
fragment number: fragment duration:
series:
format: tape number:
episode: episode number:
programme: broadcast date:
fragment title:
text:
category:
recording date: recording number:
journalist: rights holder:
SETS
The strings required for the operation are not defined
F11 F12 F13 F14 F17 F18 F19 F20 Ent
Exit Sets Refset Show Previous Next/Clear Thesaurus Command Search
7
8
Web 2.0 – « User Generated Content », « Social Tagging »?
9
Catch-22
-> “Annotation” is a subjective interpretation, and
thus it is not scalable
-> Automated processing of information is a key
differentiator, but it requires correct and
structured metadata
-> Product Engineering is the source of structured
and meaningful information, but creative staff
are not receptive to technology
10
Objectives - Proof of Concept
• One Set of Numbers(!)
• Model Driven Development
• Computer Assisted Manufacturing
• Unsupervised Feature Extraction
• Efficient Search and Retrieval
Develop an extensible data model and a consistent application
framework, accessible via an intuitive user interface
(! Digitizing analogue and disintegrated information flows)
11
PISA - Overview
Abstract
Information
Footage
Concept
Virtual
Model
Model Driven Development:
• Setting (Stage properties, light)
• Character
• Synthetic Speech
• Sound effects
• Character animation
• Virtual camera
Virtual Modelling
Automated Production Realisation
• Ingest
• Editing
• Mastering
• Reproduction to alternative distribution channels
Computer Assisted Design
Computer Assisted Manufacturing
Script Editing
• Parse scenario
• Shooting script editor
• Storyboard
Reverse Engineering
• Shot segmentation
• Video footprint and reuse detection
• Biometric face detection
• Background analysis
• Speech-to-text
Interpretation
• Character identification
• Background categorisation and identification
• Topic and event detection
Intelligent Analysis and Quantization
Quantization
Analysis
Indexing
Retrieval
• Timecode based indexing
• Geo-temporal reference
• Taxonomy based indexing and search
• Faceted search
Search Engine
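Taxonomy-based search can be illustrated with a small query-expansion routine. This is only a sketch under stated assumptions: the taxonomy, the synonym table and the index contents below are hypothetical, and the slides do not describe how PISA stores or traverses its taxonomy.

```python
# Hypothetical mini-taxonomy: preferred term -> narrower terms.
NARROWER = {
    "space travel": ["rocket", "space shuttle"],
    "computer": ["pc"],
}
# Hypothetical synonym table: variant -> preferred term.
SYNONYMS = {"personal computer": "pc", "spaceflight": "space travel"}

# Hypothetical inverted index: term -> set of item identifiers.
INDEX = {
    "pc": {"ALG-1"},
    "rocket": {"ALG-1"},
    "computer": {"ALG-2"},
}


def expand(term):
    """Map a term to its preferred form and add all narrower terms."""
    preferred = SYNONYMS.get(term.lower(), term.lower())
    result, stack = [], [preferred]
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(NARROWER.get(current, []))
    return result


def search(term):
    """Return the identifiers indexed under the term or any narrower term."""
    ids = set()
    for expanded in expand(term):
        ids |= INDEX.get(expanded, set())
    return sorted(ids)


print(search("spaceflight"))  # ['ALG-1']
print(search("computer"))     # ['ALG-1', 'ALG-2']
```

A query for a broad term then also finds items that were annotated with a narrower one, which a plain keyword match would miss.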
12
The Search Client
13
The Search Engine
Media Asset
Management System
(Ardome)
Search Engine
(Lucene/SOLR)
! Search federation by system integration
! Faceted search
! Integrated application of keywords
! Intuitive and structured presentation of results
! Random access to audiovisual material
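Search federation and faceted search, as listed above, can be sketched together. The example assumes in-memory stand-ins for the three libraries named on this slide; the records, field names and matching rule are hypothetical and do not reflect the actual interfaces of Ardome, Basisplus or the EBU feed.

```python
from collections import Counter

# In-memory stand-ins for the three libraries (hypothetical records).
LIBRARIES = {
    "Basisplus": [{"id": "B1", "title": "IBM commercial", "year": 2001}],
    "Ardome": [
        {"id": "A1", "title": "IBM news item", "year": 2008},
        {"id": "A2", "title": "Lotto draw", "year": 2008},
    ],
    "EBU": [{"id": "E1", "title": "IBM raw footage", "year": 2008}],
}


def federated_search(query, libraries=LIBRARIES):
    """Query every library, merge the hits and count facet values."""
    needle = query.lower()
    hits = []
    for source, records in libraries.items():
        for record in records:
            if needle in record["title"].lower():
                # Tag each hit with its origin so results can be grouped.
                hits.append({**record, "source": source})
    facets = {
        "source": Counter(hit["source"] for hit in hits),
        "year": Counter(hit["year"] for hit in hits),
    }
    return hits, facets


hits, facets = federated_search("ibm")
print([hit["id"] for hit in hits])  # ['B1', 'A1', 'E1']
print(dict(facets["year"]))         # {2001: 1, 2008: 2}
```

The facet counts are what a search client would show next to the result list so that the user can narrow down by library or by year.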
Search Client
(Custom Development)
Legacy Video Library
(Basisplus)
Current news items
(Ardome)
Raw Material
(EBU Superpop)
<NewsML-G2>
14
The Annotation Client
15
Computer Assisted Analysis
16
Intelligent Analysis
Media Asset
Management
(Ardome)
Unsupervised feature extraction provides time-coded attributes:
! Shot segmentation and keyframe extraction
! Audio segmentation and speaker recognition
! Subtitle processing and speech recognition
! Taxonomy-driven topic detection
! Face recognition
! Scene recognition
! Copy detection
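As an illustration of how a time-coded attribute could be produced, the sketch below detects shot boundaries by thresholding the difference between consecutive frame histograms. This is a common textbook technique chosen for illustration, not necessarily the method used in PISA; the threshold, the toy histograms and the frame rate of 25 fps are assumptions.

```python
def shot_boundaries(histograms, threshold=0.5):
    """Return the indices of frames that start a new shot.

    Each histogram is a normalised list of bin values (summing to 1),
    so the halved L1 distance between two histograms lies in [0, 1].
    """
    cuts = [0] if histograms else []
    for i in range(1, len(histograms)):
        previous, current = histograms[i - 1], histograms[i]
        distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if distance > threshold:
            cuts.append(i)
    return cuts


def to_timecode(frame, fps=25):
    """Convert a frame index to HH:MM:SS:FF (assumes a fixed frame rate)."""
    seconds, frames = divmod(frame, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# Toy input: three frames of one shot, then two frames of another.
frames = [[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] * 2
cuts = shot_boundaries(frames)
print(cuts)                            # [0, 3]
print([to_timecode(c) for c in cuts])  # ['00:00:00:00', '00:00:00:03']
```

Each boundary becomes a time-coded attribute that can be stored alongside the asset and used later as an entry point for random access.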
Shot Segmentation
Speech Recognition
Face Detection
Topic Detection
Media
Production
Media Asset
Management System
(Ardome)
Search Engine
(Lucene/SOLR)
Legacy Video Library
(Basisplus)
Current news items
(Ardome)
Raw Material
(EBU Superpop)
<NewsML-G2>
17
Conclusion
! Enterprise search – structured metadata, limited number of libraries, limited number
of records per library, dependencies between objects
! Intelligent search federation is aware of the media production process: scripts,
webpages, subtitles and formal annotation may represent the same editorial object
! Random access to audiovisual material requires an index based on timecode and
not on « word position in a document »
! Ontology-driven application logic is essential to enable semantic awareness, i.e.
resolving synonyms and disambiguating homonyms
! The perfect search engine is not for sale yet and requires design and development
from the ground up.
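The difference between a timecode-based index and a word-position index can be shown with a small inverted index whose postings point to a start time in the asset. This is a minimal sketch, assuming segments derived from the time markers of the archive record shown earlier (0'50", 4'00", 7'50"); the segment texts are shortened and the data structure is an assumption, not the PISA implementation.

```python
from collections import defaultdict

# (asset id, start in seconds, segment text); illustrative data only.
SEGMENTS = [
    ("E03024404", 0, "overview of topics"),
    ("E03024404", 50, "artist roof terrace oasis interview"),
    ("E03024404", 240, "euro space center rocket interview"),
    ("E03024404", 470, "ibm commercial first personal computer"),
]


def build_index(segments):
    """Map each word to the (asset, start time) pairs where it occurs."""
    index = defaultdict(list)
    for asset, start, text in segments:
        for word in set(text.lower().split()):
            index[word].append((asset, start))
    return index


def lookup(index, word):
    """Return the entry points for a word, ordered by asset and time."""
    return sorted(index.get(word.lower(), []))


index = build_index(SEGMENTS)
print(lookup(index, "interview"))  # [('E03024404', 50), ('E03024404', 240)]
print(lookup(index, "ibm"))        # [('E03024404', 470)]
```

Because each posting carries a time offset instead of a word position, a search client can jump straight to the matching point in the video rather than to the start of the tape.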
18
From « Metadata » to CAD/CAM
?
19
Scoop
20
Hype Cycle 2008…
21
! http://medialab.vrt.be/pisa
! http://projects.ibbt.be/pisa