Multimedia Information Retrieval in Networked Digital Libraries

Norbert Fuhr
University of Duisburg-Essen, Germany

2

MIND Project

• Task: development of methods for accessing large numbers of multimedia digital libraries through a single system
• Funded by the EC under FP5 (2/01-7/03)
• Participants:
  – Carnegie-Mellon University
  – University of Duisburg-Essen
  – University of Florence
  – University of Sheffield
  – University of Strathclyde

3

MIND Architecture

[Architecture diagram: the User Interface sends queries to a Data fuser, which passes them to a Dispatcher; the Dispatcher routes each query through library-specific Proxies (Proxy1–Proxy4) and Wrappers (Wrapper1–Wrapper4) to the underlying digital libraries DL1–DL4, which hold fact, text, image and speech data; results flow back through the same chain to the Data fuser and the User Interface.]

4

Query Processing

• Query transformation: map between heterogeneous schemas (e.g. Dublin Core, MARC 21)
• Resource selection: determine the relevant libraries (more cost-effective than querying all libraries)
• Database query: run the query on each selected library (task of the "wrappers")
• Document transformation: map result documents back between the heterogeneous schemas (Dublin Core, MARC 21)
• Data fusion: fuse the library results into one single ranked list or several clusters
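A minimal, self-contained sketch of this pipeline in Python; the toy schemas, libraries and scores below are illustrative stand-ins, not the MIND implementation:

```python
# Toy sketch of the query-processing chain: query transformation, resource selection,
# database query via a wrapper, document transformation, and data fusion.

def transform_query(query, field_map):
    """Map query fields from the user's schema into a library's schema."""
    return {field_map.get(k, k): v for k, v in query.items()}

def run_pipeline(query, libraries, max_libs=2):
    # Resource selection: keep only the libraries judged most promising.
    selected = sorted(libraries, key=lambda lib: lib["score"], reverse=True)[:max_libs]
    merged = []
    for lib in selected:
        lib_query = transform_query(query, lib["field_map"])       # query transformation
        hits = lib["search"](lib_query)                            # database query (wrapper)
        for doc in hits:                                           # document transformation
            merged.append({"title": doc.get(lib["title_field"], ""), "score": doc["score"]})
    return sorted(merged, key=lambda d: d["score"], reverse=True)  # data fusion

# Two toy libraries with different metadata schemas (Dublin-Core-like vs. MARC-like field).
libs = [
    {"score": 0.9, "field_map": {"title": "dc:title"}, "title_field": "dc:title",
     "search": lambda q: [{"dc:title": "Doc A", "score": 0.8}]},
    {"score": 0.4, "field_map": {"title": "245$a"}, "title_field": "245$a",
     "search": lambda q: [{"245$a": "Doc B", "score": 0.6}]},
]
print(run_pipeline({"title": "multimedia retrieval"}, libs))
```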

5

Retrieval in Federated Digital Libraries

Tasks:
• Extract DL metadata ("resource description")
• Select relevant libraries ("resource selection")
• Communicate with libraries ("wrapper")
• Combine results ("data fusion")

6

Resource Description

7

Resource Description: Query-based Sampling

• For non-cooperative DLs
• Generate a sequence of queries
• Collect the answer documents (~300)
• Generate the resource description from the sampled documents (see the sketch below)
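A hedged sketch of query-based sampling, assuming the DL only exposes a plain keyword search; the target of roughly 300 documents follows the slide, while the helper names, stopping rule and tokenisation are illustrative:

```python
import random
from collections import Counter

def query_based_sampling(search, seed_terms, target_docs=300, results_per_query=10, max_queries=1000):
    """search(term, k) -> list of document texts; builds term statistics from sampled docs."""
    sampled, doc_freq = [], Counter()
    candidate_terms = list(seed_terms)
    queries_sent = 0
    while len(sampled) < target_docs and candidate_terms and queries_sent < max_queries:
        queries_sent += 1
        term = random.choice(candidate_terms)          # next probe query
        for doc in search(term, results_per_query):
            if doc in sampled:
                continue
            sampled.append(doc)
            tokens = doc.lower().split()
            doc_freq.update(set(tokens))               # document frequencies in the sample
            candidate_terms.extend(tokens)             # further query terms come from sampled docs
    return {"sample_size": len(sampled), "df": doc_freq}

# Example with a fake search interface over a tiny in-memory corpus.
corpus = ["multimedia retrieval in digital libraries", "speech retrieval with word errors",
          "image features and colour histograms"]
fake_search = lambda term, k: [d for d in corpus if term in d][:k]
print(query_based_sampling(fake_search, seed_terms=["retrieval"], target_docs=2))
```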

8

Resource description for images

• Feature vectors are clustered at different levels of granularity
• For each cluster we consider:
  – the cluster centre c_i
  – the cluster radius r_i
  – the cluster population P_i
• The smaller the cluster radii, the finer the approximation of the feature-vector density (a clustering sketch follows below).
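A sketch of building such a description, assuming k-means as the clustering algorithm and a few arbitrary granularity levels; only the (centre, radius, population) triples come from the slide:

```python
# Build an image resource description: cluster feature vectors at several granularities
# and keep (centre, radius, population) per cluster.
import numpy as np
from sklearn.cluster import KMeans

def image_resource_description(features, granularities=(4, 16, 64)):
    """features: (n_images, n_dims) array; returns one cluster list per granularity level."""
    description = {}
    for k in granularities:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
        clusters = []
        for i in range(k):
            members = features[km.labels_ == i]
            centre = km.cluster_centers_[i]
            radius = float(np.max(np.linalg.norm(members - centre, axis=1))) if len(members) else 0.0
            clusters.append({"centre": centre, "radius": radius, "population": len(members)})
        description[k] = clusters
    return description

# Example with random 32-dimensional colour-histogram-like vectors.
desc = image_resource_description(np.random.rand(500, 32))
print(len(desc[16]), "clusters at granularity 16")
```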

9

Demo: Resource descriptions for images

• Java application demonstrating the resource description process for images:
  – selection of the resource to be described
  – acquisition of individual document descriptors
  – processing of the document descriptors and extraction of resource descriptors at several granularity levels

10

Resource selection

11

Resource selection

• Querying all accessible DLs is too expensive
  – thus route the query only to the "best" libraries
• Two competing approaches:
  – Resource ranking:
    • compute the similarity of each DL to the query (heuristically)
    • select a fixed number of top-ranked DLs
  – Decision-theoretic framework (described in this talk):
    • assign costs to retrieval (money, time, retrieval quality)
    • compute the selection that minimises the costs
    • the number of selected DLs is not fixed

12

Multimedia retrieval model

• Query conditions c_i (attribute, predicate, comparison value) with condition weight Pr(q ← c_i)
  – e.g. a term, a year number, a colour histogram
• Probabilistic indexing weights Pr(c_i ← d)
  – e.g. BM25 for text, histogram similarity for images
• Linear retrieval function (sketched below):

  Pr(q ← d) = Σ_{c_i ∈ q} Pr(q ← c_i) · Pr(c_i ← d)

  i.e. the sum over all query conditions of condition weight times indexing weight.
• Mapping onto Pr(rel|q,d):
  – linear/logistic function
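A minimal sketch of this linear retrieval function; the conditions and weights below are made up for illustration:

```python
# Score of a document = sum over query conditions of condition weight * indexing weight.

def retrieval_score(condition_weights, indexing_weights):
    """condition_weights: {condition: Pr(q <- c)}, indexing_weights: {condition: Pr(c <- d)}."""
    return sum(w * indexing_weights.get(c, 0.0) for c, w in condition_weights.items())

query = {"term:jaguar": 0.6, "year=2002": 0.1, "colour-histogram~h1": 0.3}   # Pr(q <- c)
document = {"term:jaguar": 0.8, "colour-histogram~h1": 0.5}                  # Pr(c <- d), e.g. BM25 / histogram similarity
print(retrieval_score(query, document))                                      # 0.6*0.8 + 0.3*0.5 = 0.63
```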

13

Probability of Relevance

• Probability of relevance is computed from the probability of implication (the score):

  Pr(rel|q,d) = f(Pr(q ← d))

  – linear function with constant Pr(rel|q ← d)
  – new approach: logistic function (sketched below):

    f(x) := exp(b0 + b1·x) / (1 + exp(b0 + b1·x))

  – evaluation: better approximation quality than the linear function

[Figure: logistic mappings f(x) = Pr(rel|q,d) over x = Pr(q ← d) for parameter settings (b0, b1) = (-20, 50), (-10, 100) and (-4, 10).]
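A sketch of the logistic mapping with the parameter pairs from the figure; in practice b0 and b1 would be fitted to relevance judgements, the values here are only the plotted examples:

```python
# Map a retrieval score Pr(q <- d) onto a probability of relevance Pr(rel|q,d).
import math

def prob_relevance(score, b0, b1):
    """Logistic mapping f(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    z = b0 + b1 * score
    return math.exp(z) / (1.0 + math.exp(z))

for b0, b1 in [(-20, 50), (-10, 100), (-4, 10)]:
    print(b0, b1, round(prob_relevance(0.4, b0, b1), 3))
```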

14

Cost Sources

• Computation and communication time
  – affine linear function
• Charges for delivery
  – linear function
• Retrieval quality (most interesting for IR)
  – costs C+ for relevant and C- for non-relevant documents, with C+ < C-
  – estimating retrieval quality: for a result set of s_i documents, estimate the number r_i of relevant documents it contains (a selection sketch follows below)
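A hedged sketch of how such costs could drive selection: given, per DL, the expected number of relevant documents among its first s results, greedily request the next document wherever the marginal expected cost is lowest. The cost values and the greedy optimisation are illustrative assumptions, not the exact MIND formulation:

```python
# Allocate a total of n_docs result documents across DLs so that the expected cost
# (relevance costs plus a per-document charge) is kept low.

def select(expected_rel, n_docs, cost_rel=0.0, cost_nonrel=1.0, cost_per_doc=0.05):
    """expected_rel[lib][s] = expected number of relevant docs among that library's top s."""
    allocation = {lib: 0 for lib in expected_rel}

    def marginal_cost(lib, s):
        # Extra expected cost of asking this library for its (s+1)-th document.
        gain_rel = expected_rel[lib][s + 1] - expected_rel[lib][s]
        return gain_rel * cost_rel + (1 - gain_rel) * cost_nonrel + cost_per_doc

    for _ in range(n_docs):                       # greedily add the cheapest next document
        lib = min(allocation, key=lambda l: marginal_cost(l, allocation[l]))
        allocation[lib] += 1
    return allocation

# Example: DL "A" is front-loaded with relevant documents, DL "B" is not.
exp_rel = {"A": [0, 0.9, 1.7, 2.3, 2.7, 3.0, 3.2], "B": [0, 0.3, 0.6, 0.9, 1.2, 1.4, 1.6]}
print(select(exp_rel, n_docs=5))
```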

15

Estimating Retrieval Quality – Method 1:
Relevant Documents in DL

• Resource description:
  – expectation of the indexing weights of terms (for images and facts: vector clusters and the retrieval function)
• Resource selection:
  – estimate the number of relevant documents in DL_i
  – estimate the number r_i of relevant documents in the result set by applying a recall-precision function, approximated by a linear function defined by its starting point P(0) (sketched below)

[Figure: linear recall-precision functions with starting points P(0) = 0.9 and P(0) = 0.5.]
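A sketch of Method 1 under the simple assumption suggested by the figure, namely that precision falls linearly from P(0) at recall 0 to 0 at recall 1; the closed-form solution below is an illustration, not the exact MIND estimator:

```python
# Estimate r_i, the number of relevant documents in a result set of s_i documents,
# from an estimate of the relevant documents in the whole DL.

def expected_relevant_in_result(total_relevant, result_size, p0=0.9):
    """Return E[r_i] for the top `result_size` documents of a DL with `total_relevant` relevant docs."""
    if total_relevant == 0:
        return 0.0
    # Linear model: precision(recall R) = p0 * (1 - R).
    # With r relevant docs in the result set: precision = r / result_size, recall = r / total_relevant.
    # Solving r / s = p0 * (1 - r / N) for r gives:
    r = p0 * result_size / (1.0 + p0 * result_size / total_relevant)
    return min(r, float(total_relevant), float(result_size))

print(expected_relevant_in_result(total_relevant=40, result_size=10, p0=0.9))
```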

16

Estimating Retrieval Quality – Method 2:
Simulated Retrieval on Sample

• Resource description:
  – store the complete index of the sample
• Resource selection:
  – simulate retrieval on the indexed sample (works for all media types)
  – derive the distribution of the probabilities of relevance, assumed to be representative of the whole collection
  – estimate the number r_i of relevant documents in the result set (see the sketch below)

[Figure: density p(x) of the probabilities of relevance, extrapolated from the sample of size s to the full collection of size |DL|.]
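A sketch of Method 2: retrieve on the sample, map scores to probabilities of relevance (here with the logistic function from above), and extrapolate to the full collection under the representativeness assumption; the sizes and parameters are illustrative:

```python
# Simulate retrieval on the sample and scale the resulting relevance distribution up to the DL.
import math

def estimate_relevant(sample_scores, collection_size, result_size, b0=-4.0, b1=10.0):
    """sample_scores: retrieval scores Pr(q <- d) of the documents in the sample."""
    probs = sorted((math.exp(b0 + b1 * s) / (1 + math.exp(b0 + b1 * s)) for s in sample_scores),
                   reverse=True)
    scale = collection_size / len(sample_scores)          # each sampled doc stands for `scale` docs
    expected_rel, seen = 0.0, 0.0
    for p in probs:                                        # walk down the extrapolated ranking
        take = min(scale, result_size - seen)
        if take <= 0:
            break
        expected_rel += take * p
        seen += take
    return expected_rel

print(estimate_relevant([0.6, 0.5, 0.4, 0.2, 0.1], collection_size=5000, result_size=100))
```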

17

Estimating Retrieval Quality – Method 3:
Modelling Indexing Weights

• Resource description:
  – expectation and variance of the indexing weights
• Resource selection:
  – approximate the indexing weight distribution (new: by a normal distribution)
  – approximate the document score distribution (also by a normal distribution)
  – proceed as for Method 2 (see the sketch below)
  – to do: other media types

[Figures: histograms of indexing weights and of document scores for query 51, each with a fitted normal distribution.]
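A sketch of Method 3 under the normality assumption: if each term's indexing weights are normally distributed, the linear document score is normally distributed too, and the expected number of documents above a score threshold follows from the normal CDF. The statistics and threshold below are illustrative:

```python
# Estimate how many documents in a DL exceed a score threshold, given only the
# expectation and variance of the indexing weights per query term.
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_docs_above(term_stats, condition_weights, collection_size, threshold):
    """term_stats: {term: (mean, variance)} of indexing weights."""
    mu = sum(w * term_stats[t][0] for t, w in condition_weights.items() if t in term_stats)
    var = sum(w * w * term_stats[t][1] for t, w in condition_weights.items() if t in term_stats)
    sigma = math.sqrt(var) if var > 0 else 1e-9
    return collection_size * (1.0 - normal_cdf(threshold, mu, sigma))

stats = {"jaguar": (0.01, 0.0004), "car": (0.005, 0.0001)}
print(expected_docs_above(stats, {"jaguar": 0.7, "car": 0.3}, collection_size=10000, threshold=0.02))
```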

18

Estimating Retrieval Quality

• Methods:
  1. Estimate the number of relevant documents in the DL and apply a linear recall-precision function
  2. Simulate retrieval on the sample and extrapolate to the DL
  3. Normal distribution for the indexing weights, normal distribution for the retrieval scores
• Results:
  – use a logistic mapping from retrieval score to probability of relevance
  – Method 3 ≈ CORI > Method 1 > Method 2

19

Data fusion

20

Data fusion

[Figure: a merged result list of ten documents of mixed media types – Text (1), Speech (2), Image (3), Text (4), Speech (5), Text (6), Text (7), Image (8), Image (9), Image (10).]

Information available for fusion:
• Rank data
  – rank
  – score
  – surrogates
  – duplicates across ranks
• Full document
• Broad information
  – collection information
  – user preferences

21

Text data fusion

• Using a combination of evidence (see the sketch below):
  – the original rank position of the documents
  – re-ranking based on the similarity of the surrogate to the query (the surrogate could be a title or a text summary)
  – promotion of documents found to be similar to others in the rankings
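A sketch combining these three kinds of evidence with hand-picked weights; the similarity measure and the duplicate check are deliberately crude illustrations:

```python
# Fuse text results from several libraries using rank position, surrogate-to-query
# similarity, and a small boost for near-duplicates seen in other rankings.

def fuse_text_results(result_lists, query_terms, w_rank=0.5, w_sim=0.4, w_dup=0.1):
    """result_lists: list of ranked lists, each item a dict with 'id' and 'surrogate' text."""
    fused, seen_surrogates = {}, set()
    q = set(t.lower() for t in query_terms)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            tokens = frozenset(doc["surrogate"].lower().split())
            sim = len(q & tokens) / len(q) if q else 0.0           # surrogate-to-query similarity
            dup = 1.0 if tokens in seen_surrogates else 0.0        # promote apparent duplicates
            seen_surrogates.add(tokens)
            score = w_rank / rank + w_sim * sim + w_dup * dup      # combine the evidence
            fused[doc["id"]] = max(fused.get(doc["id"], 0.0), score)
    return sorted(fused, key=fused.get, reverse=True)

lists = [[{"id": "a1", "surrogate": "multimedia retrieval in digital libraries"}],
         [{"id": "b7", "surrogate": "digital libraries and multimedia retrieval"}]]
print(fuse_text_results(lists, ["multimedia", "retrieval"]))
```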

22

Speech data fusion

• SpeechBot has a 50% word error rate on average
  – 17.56% in the 300 top-ranked documents
  – no need to treat speech differently for fusion

Example (spoken vs. recognised text):
  "Speech is spoken and when recognised word errors occur"
  → "Peaches broken and when recognised word errors occur"

23

Image data fusion – score normalisation

[Diagram: the data fuser (DF) forwards the query to a DL search engine and normalises the returned image scores, e.g. s=0.8→σ=0.9, s=0.6→σ=0.8, s=0.5→σ=0.8, s=0.3→σ=0.5, s=0.2→σ=0.4 (s: un-normalised score, σ: normalised score). A sketch of one possible normalisation follows below.]
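A sketch of one plausible normalisation, simple min–max scaling into [0, 1]; the slide does not spell out the normalisation actually used:

```python
# Normalise per-library image scores so that scores from different engines become comparable.

def min_max_normalise(scores):
    """Map raw engine scores onto normalised scores in [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

raw = [0.8, 0.6, 0.5, 0.3, 0.2]               # un-normalised scores from one DL
print([round(x, 2) for x in min_max_normalise(raw)])
```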

24

Image data fusion

25

Presentation of retrieval results

26

2D Information Display Space (IDS)

• Presents multiple properties of documents simultaneously:
  – each document is represented by a visual object (VO)
  – visual features of each VO (position, size, shape, colour) encode relevant properties of the document
  – documents sharing the same value of one relevant property (selected by the user) are displayed with the same value of the corresponding visual feature (see the sketch below)
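A small sketch of the underlying mapping: documents sharing a value of the user-selected property receive the same value of the chosen visual feature; the property names and colour palette are illustrative:

```python
# Assign one visual-feature value (here: a colour) per distinct value of the selected property.

def assign_visual_feature(documents, property_name, feature_values):
    """Return {doc_id: feature_value}, one feature value per distinct property value."""
    mapping, assignment = {}, {}
    for doc in documents:
        value = doc.get(property_name, "unknown")
        if value not in mapping:
            mapping[value] = feature_values[len(mapping) % len(feature_values)]
        assignment[doc["id"]] = mapping[value]
    return assignment

docs = [{"id": 1, "media": "text"}, {"id": 2, "media": "image"}, {"id": 3, "media": "text"}]
print(assign_visual_feature(docs, "media", ["red", "blue", "green"]))   # colour encodes media type
```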

27

Sample query session

28

Sample query session

29

MMIR in Networked DLs – Open Issues

• Vague schema mappings for heterogeneous environments
• Cross-media searches
• Resource selection for non-textual media
• Decentralized (P2P) architectures

30

JXTA Search Architecture

Two-level architecture:

1. Search hubs

2. Provider peers

31

Conclusions and Outlook

• MIND provides methods for
  – resource description,
  – resource selection and
  – result fusion
  in federated multimedia DLs
• Open issues:
  – heterogeneous, cross-media environments
  – time-dependent media
  – decentralized (P2P) architectures
