Dr. Qi Tian
Department of Computer ScienceThe University of Texas at San Antonio
October 31, 2007
Multimedia Information Retrieval
Qi Tian Univ. of Texas at San Antonio
OverviewProblems Our WorkTrends and Directions
Outline
Qi Tian Univ. of Texas at San Antonio
MotivationWith the explosive growth of digital media data, there is a huge demand for new tools and systems that enables average users to more efficiently and more effectively search, access, process, manage, author and share these digital media contents.
Multimedia Information RetrievalMultimedia Information Retrieval
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Text-based Information RetrievalToo many images to annotateHigh cost of human interpretationSubjectivity of visual content, e.g., “A picture is worth a thousand words”
Content-based Retrievalautomatically retrieves images, video, and audio based on the visual and audio content
HistoryConference on Database Applications of Pictorial Applications in1979NSF workshop in 1992More active field since 1997 when Internet and web browsing became popular
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
A VERY DIVERSIFIED FIELD!
Data TypesText, hypertext, image, audio, graphics, animation, paintings, video/movie, rich text, spread sheet, slides, combinations of these and user interaction
Research ProblemsSystems, content, services, user, evaluation, implementation, social/business, applications
MethodologiesDatabase, information retrieval, signal and image processing, graphics, vision, human-computer interaction, machine learning, statistical modeling, data mining, pattern analysis, data fusion, social sciences, and domain knowledge for applications
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Overview
Content-based Image Retrieval
Content-based Video Retrieval
Content-based Audio Retrieval
Qi Tian Univ. of Texas at San Antonio
Approaches
Hierarchical LevelsHierarchical Levels
High-level Bridge the semantic gap, integration of context and content, hybrid (text and content) approaches
Mid-levelActive Learning, Boosting, Incremental Learning
Low-levelFeature Extraction and Representation, Dimension Reduction and Selection.
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
metadata
User Interface
Similarity ranking
memory
Feature weightingVisual
C++
Feature Extraction
C/C++
Image DB
Color
Texture
structureOff-line
Automatic
ColorColor histogramColor momentsColor correlogram
TextureTamura textureCo-occurrence matricesGabor featuresWavelet moments
ShapeFourier descriptor
StructureEdge-based features
Content-based Image Retrieval
QBIC by IBM AlmadenMARS by UIUC
WISE by StanfordWISE by Stanford
WebSEEK by ColumbiaWebSEEK by Columbia
Qi Tian Univ. of Texas at San Antonio
Approaches
Many CBIR systems have been built..Many CBIR systems have been built..
The growing list: ADL, AltaVista Photofinder, Amore, ASSERT, BDLP, Blobworld, CANDID, C-bird, Chabot, CBVQ, DrawSearch, Excalibur Visual RetrievalWare, FIDS, FIR, FOCUS, ImageFinder,
ImageMiner, ImageRETRO, ImageRover, ImageScape, Jacob, LCPD, MARS, MetaSEEk, MIR, NETRA, Photobook, Picasso, PicHunter, PicToSeek, QBIC, Quicklook2, SIMBA, SQUID, Surfimage, SYNAPSE,
TODAI, VIR Image Engine, VisualSEEk, VP Image Retrieval System, WebSEEk, WebSeer, WISE…
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Content-based Video Retrieval
Traditional Video RetrievalQuery-by-textual keyword
Automatic Visual Concept Detectione.g., indoor/outdoor, Sky, Car, Building, US-flagExample concepts:Airplane, Building, Car, Crowd, Desert, Explosion, Outdoor, People, Vehicle, Violence
Video Retrieval – SceneHow to recognize a scene? Context – Use Proto-Concepts to describe context– Machine learning to link context to concepts
Water
Sky SkySky
Water
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Content-based Audio Retrievalto search sounds by their features in the waveform, statistics, or transform domains
Speech, Music, Environment Audio, Silence
ApplicationsEntertainment
Film making - searching sound effects TV/radio studio - editing programs Karaoke, music stores, or online shopping
- query by humming the melodyAudio/video archive management
Segmenting and indexing of raw recordingsSearching and browsing audio/video clips
SurveillanceMonitoring criminal or emergent eventsFilm rating
glass breaking,explosion, cry, shot, …
Alarming!!!
Short time energy
Zero-crossing rate
Pitch period
MFCC
Spectrogram
LPC
MIRMIR
Xiong, Zhou, Tian, Rui and Huang, “Semantic Retrieval of Video”, IEEE SP Mag., March 2006
Top 10 Problems in MIR Top 10 Problems in MIR
Bridge the Semantic Gaphigh level concept (sites, objects, events) and low-level visual/audio features (color, texture, shape and structure, layout; motion; audio - pitch, energy, etc.).
How to Best Combine Human Intelligence and Machine Intelligence.Keep human in the loop, e.g. Relevance Feedback
New Query ParadigmsQuery by keywords, similarity, sketching an object, sketching a trajectory, painting a rough image, etc. Can we think of useful new paradigms?
Multimedia Data MiningSearching for interesting/unusual patterns and correlations in multimedia has many important applications, including Web Search Engines and dealing with intelligence data. Work to date on Data Mining has been mainly in Text data.
How to Use Unlabeled DataActive learning, e.g., in Relevance FeedbackLabel propagation, e.g., image/video annotation
MIRMIR
Qi Tian Univ. of Texas at San Antonio
Top 10 Problems (continued) Top 10 Problems (continued)
Using Virtual Reality Visualization To HelpCan we use 3D audio/visual visualization techniques to help a user to navigate through the data space to browse and to retrieve?e.g., 3D MARS
Incremental LearningChange the parameters of the retrieval algorithms incrementally,not needing to start from scratch every time we have new data.
Structuring Very Large DatabasesResearchers in audio/visual scene analysis and those in Databases and Information Retrieval should really collaborate CLOSELY to find good ways of structuring very large multimedia databases for efficient retrieval and search.
Performance Evaluatione.g., TRECVID for video retrieval, how about image retrieval?
What Are the Killer Applications of Multimedia Retrieval?e.g., medical multimedia document management
Qi Tian Univ. of Texas at San Antonio
Approaches
Our Recent WorkOur Recent Work
UserFeature Extraction
Data Modeling
Similarity Estimation
•Dimension reduction
•Statistical estimation
•Dimension reduction
•Statistical estimation•Classification
•Ranking/indexing
•Classification
•Ranking/indexing
Training
Image DatabaseImage Database
Discriminant Analysis
Useful in other applicationsUseful in other applications
ApproachesApproaches
Qi Tian Univ. of Texas at San Antonio
Our Recent Work in CBIR Our Recent Work in CBIR
Semantic Subspace ProjectionBridge the semantic gap
Hybrid Discriminant AnalysisLearn high dimensional data with small samples
Adaptive Discriminant ProjectionAdaptively learn a projection from data distribution
Distance Measure for Similarity EstimationInvestigate the relations between probabilistic distributions, distance metric, and mean estimation
Qi Tian Univ. of Texas at San Antonio
Our Recent Work in CBIROur Recent Work in CBIR
Related Publications
Journalso IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)o IEEE Transactions on Circuits and Systems for Video Technology (CSVT)o IEEE Multimedia (MM)o International Journal on Computer Vision (IJCV)o Pattern Recognition (PR)o ACM Transactions on Knowledge Discovery from Data (TKDD)o IEEE Transactions on Computational Biology and Bioinformatics
Conferenceso ACM Multimedia
o IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)
o International Conference on Pattern Recognition (ICPR)
o Others including ICME, ICIP, ICASSP, MIR, CIVR
Qi Tian Univ. of Texas at San Antonio
Web Image Search and Mining
Image Annotation
Affective Video Retrieval
Information Fusion in MIR
Integration of Context and Content for Multimedia Management
Multimodal Emotion Recognition
Current Directions
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Web SearchWeb Search 1.0 – Traditional Text Retrieval
Web Search 2.0 – Page-level Relevance Ranking
Web Search 3.0 – Object-level Structured Search
Object Level Vertical Search (MSRA Libra: http://libra.msra.cn/
Live Product Search (http://products.live.com)
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Image AnnotationPhoto sharing through the Internet has become a common practice.
flickr.com: 19.5 million photos (30% growth/month), 2005Photo.net and airliners.net: millions of images
Most image search engines relies on textual descriptions of the images, e.g., Google, Yahoo, MSN
In general, people do not spend time labeling or annotating their personal photos
Qi Tian Univ. of Texas at San Antonio
Image Annotation
Image Annotation SystemA statistical model that can relate words to image features.Sketch an image, extract feature vectors.Descriptive words--top words ranked according to likelihood.
Current workLi, Wang (alipr.com 2006), Blei, Jordan (2003); Vasconcelos (UCSD-SML2007), Zhang et al. (MSRA 2005), Li et al. (MSRA 2007)
Promising Direction: Web Image Annotation – an integration of IR and Content Analysis
Can computer do this?BuildingSkyLake LandscapeTree
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Affective Video RetrievalAffective:
“a feeling or emotion as distinguished from cognition, thought, or action”
Real Multimedia RetrievalSearch for a subset of nicest holiday pictures to show them to friendsSelecting the most appropriate background music for the given situationSearch for the most impressive video clipsSearch for the most appealing photographs of one and the same contentSearch for all film comedies I like most
Alternative approachSearch for
MoodMatches to users’ profileLike/dislike, interest/no interest
Affective Video Retrieval
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Underlying Idea
VideoTemporal content flowContinuous transitions from one affective state to another
Temporal measurement ofArousalValence
Combining the two curves into the Affect Curve
Arousal
t
Valencet
Arousal
Valence
Affect curve
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Affective Media Content Characterization
Media Feature extraction Arousal = f1 (feature values)Valence = f2 (feature values)
Mapping AV values onto the 2D affect space
Affective content characterizationRomantic
“feel-good”
Hilariousfun
Suspensehorror
A bit somber,medium excitement
Hanjalic & Xu
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Hanjalic & Xu
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Information Fusion in MIRFusion: A merging of diverse, distinct, or separate elements into a
unified whole (Merriam-Webster dictionary).
Feature Extraction Module:Multiple features ->vectorsConcatenated vectorFeature Fusion: more discriminating hyperspace can be found in the new vector
Matching Module:One type of classifiers for multiple features orMultiple types of classifiers for one feature orBothThe output score can be combined
Decision Module:The output decision of each classifier can be combined
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Information Fusion in MIRTwo Forms
Multi-modalitye.g. Video clip->visual information, audio information, textural informationMulti-modality fusion occurs at feature extraction module.Single source information may be represented by multiple features,e.g. Color image->color, texture, shape
Multi-Classifiers (Ensemble of Classifiers)A set of classifiers are trained to solve the same problemApplied on single or multiple source of informationA single type of base classifiers orDifferent types of classifiers (Bayesian, K-NN, SVM)
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Information Fusion in MIRTrends and DirectionsTrends and Directions
Fusion SchemesThe prediction of multiple classifiers need to be integrated into one fused decision by Fusion Scheme.
The output of different classifiers need to be normalized.Rule-based
Decision is made by a simple operation on the output of all classifierse.g. Max, Min, Sum, Mean (Matching Module)e.g. AND, OR (Decision Module)
Learning-basedThe output of all classifiers is fed into a learning process to obtain the final decisione.g. Decision Tree, Neutral Network
No one is guaranteed to be the best empirically or theoretically.
ApplicationsMultimodal Biometrics Fusion (face, fingerprint, iris, palmprint, voice, hand geometry)Audio/visual fusion for multimodal emotion recognition
Qi Tian Univ. of Texas at San Antonio
Integration of Context and Content for Multimedia Management
Trends and DirectionsTrends and Directions
from Carlson & Hatfield, 1992
Qi Tian Univ. of Texas at San Antonio
Integration of Context and Content for Multimedia Management
Trends and DirectionsTrends and Directions
An increasing number of active research in this direction
Crucial to human-human communication and human understanding of multimedia.
without context it is difficult for a human to recognize variousobjects,
Enable that the (semi)automatic content analysis and indexing methods become more powerful in managing multimedia data
Contextual information e.g., cell ID for the mobile phone location, GPS integrated in adigital camera, camera parameters, time information, and identity of the producer
Qi Tian Univ. of Texas at San Antonio
Integration of Context and Content for Multimedia Management (T-MM Special Issue)
Trends and DirectionsTrends and Directions
Topics of interest include:Contextual metadata extraction Models for temporal context, spatial context, imaging context (e.g., camera metadata), social context, and so onWeb context for online multimedia annotation, browsing, sharing and reuseContext tagging systems, e.g., geotagging, voice annotationContext-aware inference algorithms Context-aware multi-modal fusion systems (text, document, image, video, metadata, etc.)Models for combining contextual and content informationContext-aware interfaces and collaborationNovel methods to support and enhance social interaction, including innovative ideas like social, affective computing, and experience capture.Applications such as using context and similarity for face and location identificationContext-aware mobile media technology and applicationsUsing context to browse and navigate large media collections
Projects in High ImpactFeatures
Web-basedLarge ScaleUser participated
A combination of internet and multimediaExamples:
Photo SearchHome photo management, mobile photo search, face search
Millions Books ProjectTremendous Space for Search and Multimedia Data Mining
Human-centered Multimedia SearchSearch for UCC data (user preference, profile, and opinion), Social search (public relations, names, personalization)
Qi Tian Univ. of Texas at San Antonio
AcknowledgementCurrent Collaborators• Academia
University of Illinois
University of Amsterdam
Chinese Academy of Science
University of Science and Technology of
China
• IndustryInstitute of Infocomm Research, Singapore
HP Labs, Palo Alto, CA
Microsoft Research (MSR) Redmond
Microsoft Research Asia (MSRA)
NEC Labs America
Kodak Research Lab
IBM T.J. Watson Research Ctr
Funding Agencies
• Army Research Office (ARO)
• Department of Homeland Security (DHS)
• San Antonio Life Science Institute (SALSI)
• Center for Infrastructure Assurance and Security (CIAS)
Students
• Jerry Yu (2007, Kodak Research, NJ)
• Yijuan Lu (2008 expected)
• Yuning Xu
Qi Tian Univ. of Texas at San Antonio
Q & AQ & A
Thank you !
Questions ?
Top Related