Multimedia Retrieval






Outline
- Overview
- Indexing Multimedia
- Generative Models & MMIR: probabilistic retrieval; language models, GMMs
- Experiments: Corel experiments, TREC Video benchmark

Indexing Multimedia

A Wealth of Information
- Speech, audio, images
- Temporal composition
- Database with associated information, e.g. a player profile (id, gender, name, country, history, picture) and a biography

User Interaction
- The user poses a query (query text), gives feedback, and evaluates
- The database provides views on the results and gives examples (video segments)

Indexing Multimedia
- Manually added descriptions: metadata
- Analysis of associated data: speech, captions, OCR, ...
- Content-based retrieval: approximate retrieval, domain-specific techniques

Limitations of Metadata
- Vocabulary problem: different people describe different aspects ("dark" vs. "somber", "dark" vs. "evening")
- Encoding specificity problem: a single person describes different aspects in different situations
- Many aspects of multimedia simply cannot be expressed unambiguously: processes in the left brain (analytic, verbal) vs. the right brain (aesthetic, synthetic, nonverbal)

Approximate Retrieval
- Based on similarity: "find all objects that are similar to this one"
- Distance function; representations capture some (syntactic) meaning of the object
- Query-by-Example paradigm: feature extraction of low-level features into an N-dimensional space, then ranking and display against the query image

So, Retrieval?!
- "IR is about satisfying vague information needs provided by users (imprecisely specified in ambiguous natural language) by satisfying them approximately against information provided by authors (specified in the same ambiguous natural language)" (Smeaton)

No Exact Science!
- Evaluation is not done analytically but experimentally: real users (specifying requests), test collections (real document collections), benchmarks (TREC: Text REtrieval Conference)
- Measures: precision, recall, ...
- Known-item queries: query -> results

The Semantic Gap
- raw multimedia data -> features -> concepts?
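The Query-by-Example loop above (extract feature vectors into an N-dimensional space, then rank by a distance function) can be sketched as follows; the object ids and 3-D "feature vectors" are invented for illustration:

```python
import math

def euclidean(a, b):
    """Distance between two feature vectors in the N-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_example(query_vec, collection):
    """Rank database objects by similarity to the query example.

    `collection` maps object ids to low-level feature vectors, e.g.
    colour histograms produced by some feature extractor (hypothetical).
    """
    return sorted(collection, key=lambda oid: euclidean(query_vec, collection[oid]))

# Toy 3-D feature space: the most similar object comes first.
db = {"img_a": [0.9, 0.1, 0.0],
      "img_b": [0.1, 0.8, 0.1],
      "img_c": [0.2, 0.7, 0.15]}
print(rank_by_example([0.15, 0.75, 0.1], db))  # img_b first, img_a last
```

Note that the ranking only captures the syntactic similarity of the representations; the semantic gap between features and concepts remains.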
Observation
- Automatic approaches are successful under two conditions: the query example is derived from the same source as the target objects, and a domain-specific detector is at hand

1. Generic Detectors

Retrieval Process
- Database -> query parsing -> detector/feature selection -> filtering -> ranking
- Query parsing yields the query type, nouns, and adjectives
- Detector/feature selection: camera operations; people, names; natural/physical objects (invariant color spaces); parameterized detectors; people detector

Example (Topic 41)
- Query text: "Other examples of overhead zooming in views of canyons in the Western United States"
- Query parsing: nouns, adjectives, the query type ("Find + names")
- Detectors: camera operations (pan, zoom, tilt, ...), people (face based), names (VideoOCR), natural objects (color space selection), physical objects (color space selection), monologues, press conferences, and interviews (each specifically designed)
- Domain-specific detectors focus on a narrow slice of "the universe and everything"

2. Domain Knowledge
- Player segmentation: original image -> initial segmentation -> final segmentation
- Advanced queries: "Show clips from tennis matches, starring Sampras, playing close to the net"

3. Get to Know Your Users

Mirror Approach
- Gather the users' knowledge; introduce semi-automatic processes for the selection and combination of feature models
- Local information: relevance feedback from a user
- Global information: thesauri constructed from all users
- Pipeline: feature extraction of low-level features into an N-dimensional space -> clustering -> identify groups -> thesauri/concepts -> ranking -> display

Representation
- Groups of feature vectors are conceptually equivalent to words in text retrieval
- So techniques from text retrieval can now be applied to multimedia data as if it were text!
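The query-parsing step that maps query terms to detectors might be sketched as a simple keyword lookup; the term-to-detector table and detector names below are hypothetical stand-ins, not the actual rules used by the system described above:

```python
# Hypothetical mapping from query terms to available detectors; the real
# system parses nouns/adjectives and selects among its parameterized
# detectors (camera operations, VideoOCR, face detection, ...).
DETECTORS = {
    "zooming": "camera_operations",
    "overhead": "camera_operations",
    "canyons": "natural_objects",
    "sampras": "people_names",
}

def parse_query(text):
    """Return the sorted set of detectors triggered by the query terms."""
    words = text.lower().split()
    return sorted({DETECTORS[w] for w in words if w in DETECTORS})

print(parse_query("Other examples of overhead zooming in views of canyons"))
# -> ['camera_operations', 'natural_objects']
```

A real query parser would also handle morphology, phrases, and the query type; this sketch only illustrates the selection idea.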
Query Formulation
- Clusters are internal representations, not suited for user interaction
- Use automatic query formulation based on global information (the thesaurus) and local information (user feedback)

Interactive Query Process
- Select relevant clusters from the thesaurus
- Search the collection
- Improve the results by adapting the query: remove clusters occurring in irrelevant images, add clusters occurring in relevant images

Assign Semantics: Visual Thesaurus
- Correct cluster representing "tree, forest" (Glcm_47)
- Incoherent cluster (Fractal_23)
- Mis-labeled cluster (Gabor_20)

Learning
- Short-term: adapt the query to better reflect this user's information need
- Long-term: adapt the thesaurus and clustering to improve the system for all users
- The thesaurus is used only after feedback

4. Nobody Is Unique!

Collaborative Filtering
- Also known as social information filtering
- Compare user judgments; recommend the differences between similar users
- People's tastes are not randomly distributed: "you are what you buy" (Amazon)
- Benefits over the content-based approach: overcomes the problem of finding suitable features to represent e.g. art or music; serendipity; an implicit mechanism for qualitative aspects like style
- Problems: large groups, broad domains

5. Ask for Help

Query Articulation
- Feature extraction into an N-dimensional space: how to articulate the query? What is the query semantics?
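A minimal collaborative-filtering sketch along these lines: compare users by the similarity of their judgments and recommend what the most similar user liked. The users, items, and ratings are invented, and real systems use far more robust similarity and aggregation schemes:

```python
import math

def cosine(u, v):
    """Cosine similarity of two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den

def recommend(target, others, threshold=4):
    """Recommend well-rated items of the most similar user not yet seen by `target`."""
    best = max(others, key=lambda u: cosine(target, u))
    return [item for item, rating in best.items()
            if item not in target and rating >= threshold]

# Invented example: Alice's taste overlaps with Bob's, not Carol's.
alice = {"jazz_cd": 5, "art_book": 4}
bob   = {"jazz_cd": 5, "art_book": 5, "blues_cd": 5}
carol = {"thriller": 5, "cookbook": 4}
print(recommend(alice, [bob, carol]))  # -> ['blues_cd']
```

Note how no content features of the items are needed, which is exactly the advantage for hard-to-describe media like art and music.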
Details Matter

Problem Statement
- Feature vectors capture global aspects of the whole image; overall image characteristics dominate the feature vectors
- Hypothesis: users are interested in details, while the background is irrelevant

Image Spots
- Image spots articulate desired image details: foreground/background colors, colors forming shapes, enclosure of shapes by background colors
- Multi-spot queries define the spatial relations between a number of spots
- Compared representations: Hist16, Hist, Spot, Spot+Hist

Example Queries
- A: simple spot query: "black sky"
- B: articulated multi-spot query: "black sky" above "monochrome ground"
- C: histogram search in the "black sky" images

Complicating Factors
- What are good feature models?
- What are good ranking functions?
- Queries are subjective!

Probabilistic Approaches

Generative Models
- A statistical model for generating data: a probability distribution over samples in a given "language" M
- For a two-element sample s1 s2: P(s1 s2 | M) = P(s1 | M) P(s2 | M, s1)
(slides credit: Victor Lavrenko, Aug. 2002)

aka Language Modelling
- Basic question: what is the likelihood that this document is relevant to this query?
- P(rel | I,Q) = P(I,Q | rel) P(rel) / P(I,Q)
- In information retrieval: P(I,Q | rel) = P(Q | I,rel) P(I | rel)

Language Modelling: Not Just English!
- Also the "language" of an author (Hiemstra or Robertson?), a newspaper (Guardian or Times?), a text document, an image
- Parsimonious language models explicitly address the relation between the levels of language models that are typically used for smoothing

Unigram and Higher-Order Models
- Unigram models: P(t1 ... tn | M) = P(t1 | M) * ... * P(tn | M)
- N-gram models: each term is conditioned on its predecessors
- Other models: grammar-based models, etc.
- Mixture models: combine several component models, P(t) = sum_k P(M_k) P(t | M_k)

The Fundamental Problem
- Usually we don't know the model M, but we have a sample representative of that model
- First estimate a model from the sample, then compute the observation probability P(observation | M(sample))

Indexing: Determine Models
- Estimate Gaussian mixture models from the images using EM
- Based on feature vectors with colour, texture, and position information from pixel blocks
- Fixed number of components
- Docs -> Models

Retrieval: Use Query Likelihood
- Query: which of the models is most likely to generate these 24 samples?
- Probabilistic image retrieval: rank the document models by P(Q | M1), P(Q | M2), P(Q | M3), P(Q | M4)
- Topic models: given the query model, rank the documents by P(D1 | Q,M), P(D2 | Q,M), ...

Probabilistic Retrieval Model
- Text: rank using the probability of drawing the query terms from the document models
- Images: rank using the probability of drawing the query blocks from the document models
- Multi-modal: rank using the joint probability of drawing the query samples from the document models

Unigram Language Models (LM): Urn Metaphor
- P(query | urn) ~ the product of the single-draw probabilities, e.g. 4/9 * 2/9 * 4/9 * 3/9

Generative Models and IR
- Rank models (documents) by the probability of generating the query Q:
  P(Q | D1) = 4/9 * 2/9 * 4/9 * 3/9 = 96/9^4
  P(Q | D2) = 3/9 * 3/9 * 3/9 * 3/9 = 81/9^4
  P(Q | D3) = 2/9 * 3/9 * 2/9 * 4/9 = 48/9^4
  P(Q | D4) = 2/9 * 5/9 * 2/9 * 2/9 = 40/9^4

The Zero-Frequency Problem
- Suppose some event is not in our sample: the model will assign zero probability to that event, and to any set of events involving the unseen event
- This happens frequently with language; it is incorrect to infer zero probabilities, especially when dealing with incomplete samples
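The urn example can be run as a small script; it also makes the zero-frequency problem visible and, as an option, interpolates with a background (collection) model to fix it. The smoothing weight used here is an illustrative choice:

```python
from collections import Counter

def lm_score(query, doc, collection, lam=1.0):
    """P(Q|D) under a unigram language model.

    lam = 1.0 gives pure maximum-likelihood estimates (the urn metaphor);
    lam < 1.0 interpolates with the collection model, so unseen query
    terms no longer zero out the whole score.
    """
    d, c = Counter(doc), Counter(collection)
    score = 1.0
    for t in query:
        p_doc = d[t] / len(doc)
        p_col = c[t] / len(collection)
        score *= lam * p_doc + (1 - lam) * p_col
    return score

# A 9-ball urn with 4 a's, 2 b's, and 3 c's, matching 4/9 * 2/9 * 4/9 * 3/9.
doc = ["a"] * 4 + ["b"] * 2 + ["c"] * 3
print(lm_score(["a", "b", "a", "c"], doc, doc))            # 96/9^4
print(lm_score(["a", "b", "a", "z"], doc, doc))            # 0.0: zero frequency
print(lm_score(["a", "b", "a", "z"], doc, doc + ["z"], lam=0.8))  # > 0 after smoothing
```

The same scoring applies unchanged whether the "terms" are words, pixel-block samples, or both, which is the basis of the multi-modal ranking above.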
Smoothing
- Idea: shift part of the probability mass to unseen events
- Interpolation with a background model ("general English"): lambda * P(t | D) + (1 - lambda) * P(t | C)
- The background model reflects the expected frequency of events and plays the role of IDF

Hierarchical Language Model
- Smoothed over multiple levels:
  alpha * P(T | Shot) + beta * P(T | Scene) + gamma * P(T | Video) + (1 - alpha - beta - gamma) * P(T | Collection)
- Also common in XML retrieval: the element score is smoothed with that of the containing article

Image Models
- The urn metaphor is not directly useful: drawing pixels is useless (pixels carry no semantics), and drawing pixel blocks is not effective (the chances of drawing the exact query blocks from a document are slim)
- Instead, use Gaussian mixture models (GMMs) with a fixed number of Gaussian components/clusters/"concepts"
- Key-frame representation; query model: split the colour channels (Y, Cb, Cr), take samples of DCT coefficients and position, and fit with the EM algorithm

Expectation-Maximisation (EM)
- Iteratively estimate the component assignments (E-step) and re-estimate the component parameters (M-step), e.g. for components 1-3 (animation)

VisualSEEk
- Querying by image regions and spatial layout
- Joint content-based/spatial querying
- Automated region extraction; direct indexing of color features; image query process

Color Similarity
- Color-space similarity: measure the closeness in the HSV color space
- Color histograms: Minkowski-metric distance between h_q and h_t
- Histogram quadratic distance (used in the QBIC project): measures the weighted similarity between histograms by computing the cross-similarity between colors; computationally expensive

Color Sets
- A compact alternative to color histograms
- Transform RGB to HSV; quantize the HSV color space to 2 hues, 2 saturations, and 2 values; assign a unique index m to each quantized HSV color => an eight-dimensional binary space
- A color set is a selection from the eight colors

Color Sets and Back-Projection
- Processing stages: color selection (1 color, 2 colors, ..., until the salient regions are extracted); back-projection onto the image (map each I[x,y] to the most similar color in the color set); thresholding; labeling

Color Set Query Strategy
- Given the query color set: perform several range queries on the query color set's colors, take the intersection of the resulting lists, and minimize the sum of attributes in the intersection list

Single Region Query
- Region absolute location, fixed query location: the Euclidean distance between the centroids
- Bounded query location: d_{q,t} = 0 if the centroid falls within the designated area; otherwise the Euclidean distance between the centroids

Index Structure
- Centroid location: spatial access via a spatial quad-tree
- Rectangle (MBR) location: spatial access via R-trees
- Size comparison: area; the absolute distance between two regions
- Spatial extent: measure the distance between the widths and heights of the MBRs; much simpler than shape information

Multiple Regions Query
- Region relative location: use 2D-strings
- Spatial invariance: provide scaling/rotation; approximate rotation invariance by providing different projections
- Strategy: perform the query on all attributes except location; find the intersection of the region lists; for each candidate image, generate its 2D-string and compare it to the 2D-string of the query image

Query Formulation
- User tools: sketch regions, position them on the query grid, assign colors, size, and absolute location; boundaries may also be assigned
- Query examples 1-3 (figures)

Blobworld
- An image is treated as a few blobs: image regions that are roughly homogeneous with respect to color and texture

Grouping Pixels into Regions
- For each pixel, form a vector of color, texture, and position features (an 8-D space)
- Model the distribution of the pixels: use the EM algorithm to fit a mixture-of-Gaussians model to the data
- Perform spatial grouping: connected pixels belonging to the same color/texture cluster

Describing the Regions
- For each region, store its color histogram
- Match the color of two regions using the quadratic distance between their histograms x and y:
  d^2_hist(x, y) = (x - y)^T A (x - y)
  where A is a symmetric matrix of weights between 0 and 1 representing the similarity between bins i and j

Querying in Blobworld
- The user composes a query by submitting an image, seeing its Blobworld representation, selecting the relevant blobs to match, and specifying the relative importance of the blob features
- Atomic query: specify a particular blob with feature vector v_i to match; for each blob v_j in a database image, find the distance between v_i and v_j, measure the similarity between b_i and b_j, and take the maximum value
- Compound query: the score is calculated using fuzzy-logic operators; the user may specify a weight for each atomic query; rank the images according to the overall score
- Indicating the blobs that provided the highest score helps the user refine the query: change the weighting of the blob features, or specify new blobs to match

Indexing in Blobworld
- Index the color feature vectors to speed up atomic queries using R*-trees; higher-dimensional data require larger data entries => lower fanout
- A low-dimensional approximation to the full color feature vectors is needed: use the SVD to find A_k, the best rank-k approximation to the weight matrix A; project the feature space onto the subspace spanned by the rows of A_k; index the low-dimensional vectors

Multimedia Cross-Modal Correlation
- Case study: automatic image captioning
- Each image has a set of extracted regions; some of the images have caption words

Feature Extraction
- Discover image regions, extracted by a standard segmentation algorithm
- Each region is mapped to a 30-dimensional feature vector: mean and standard deviation of the RGB values, average responses to various texture filters, position in the overall image layout, and a shape description
- How can cross-media correlations be captured?
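The quadratic histogram distance d^2_hist(x, y) = (x - y)^T A (x - y) used for region matching above can be sketched in a few lines; the 3-bin histograms and the similarity matrix A are invented for illustration:

```python
import numpy as np

def quadratic_hist_distance(x, y, A):
    """d^2_hist(x, y) = (x - y)^T A (x - y).

    A[i, j] in [0, 1] encodes the similarity between colour bins i and j;
    A = I recovers the squared Euclidean distance between the histograms.
    """
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(d @ A @ d)

# Toy 3-bin histograms; bins 0 and 1 are similar colours (A[0, 1] = 0.9),
# so shifting mass between them costs little.
A = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x = [0.6, 0.2, 0.2]
y = [0.2, 0.6, 0.2]  # same mass, moved between the two similar bins
print(quadratic_hist_distance(x, y, A))          # 0.032: cheap shift
print(quadratic_hist_distance(x, y, np.eye(3)))  # 0.32: plain Euclidean ignores A
```

This cross-bin weighting is exactly what makes the measure more perceptually meaningful than a bin-wise Minkowski distance, and also what makes it expensive and worth approximating with a low-rank A_k.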
Mixed Media Graph (MMG)

Graph Construction
- V(O) is the vertex of object O; V(a_i) is the vertex of the attribute value a_i
- Add an edge if and only if the two token values are close enough: for each feature vector, choose its k nearest neighbors and add the corresponding edges (note that the nearest-neighbor relationship is not symmetric)
- NN-links: nearest-neighbor links; OAV-links: object-attribute-value links

Correlation Discovery
- Correlation discovery by random walk, specifically random walk with restart (RWR)
- To compute the affinity of node B for node A: a random walker starts from node A and repeatedly chooses randomly among the available edges, each time with probability c going back to node A (restart)
- The steady-state probability that the random walker is found at node B is the affinity of B with respect to A

RWR Algorithm
- Given a query object O_q, do an RWR from node q = V(O_q) and compute the steady-state probability vector u_q = (u_q(1), ..., u_q(N))

Example
- Given an image I_3: estimate u_{I_3} for all nodes in the graph G_MMG, then report the top few caption words for the image
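Random walk with restart can be sketched with power iteration over a dense transition matrix; the toy graph and the restart probability c below are illustrative choices (a real MMG is large and sparse):

```python
import numpy as np

def rwr(adj, start, c=0.15, iters=200):
    """Random walk with restart: steady-state visiting probabilities.

    adj: symmetric 0/1 adjacency matrix of the graph;
    start: index of the query node; c: restart probability.
    Returns u with u[b] = affinity of node b with respect to the start node.
    """
    A = np.asarray(adj, float)
    W = A / A.sum(axis=0, keepdims=True)   # column-stochastic transition matrix
    e = np.zeros(len(A)); e[start] = 1.0   # restart distribution
    u = e.copy()
    for _ in range(iters):
        u = (1 - c) * W @ u + c * e        # one walk step, then restart
    return u

# Tiny path graph 0 - 1 - 2 - 3; the query node is 0.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
u = rwr(adj, start=0)
print(u.round(3))  # the direct neighbour (node 1) outranks the distant nodes 2 and 3
```

For captioning, one would run this from the vertex of the query image and read off the caption-word vertices with the highest steady-state probability.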