Media Retrieval Information Retrieval Image Retrieval Video Retrieval Audio Retrieval Information...

Media Retrieval • Information Retrieval • Image Retrieval • Video Retrieval • Audio Retrieval Lesson 11


Page 1: Media Retrieval

• Information Retrieval

• Image Retrieval

• Video Retrieval

• Audio Retrieval

Lesson 11

Page 2: Information Retrieval

Retrieval = Query + Search

Information retrieval: get the required information from a database or the web

Text data retrieval
- via keyword searching in a text document or on the web
- via query expressions, as in a relational database

Multimedia retrieval
- Get similar images from an image database
- Find interesting video shots/clips in a video database
- Select news from Internet video/radio broadcasts
- Find a specific sound in an audio database
- Search for a piece of music

Challenges in multimedia retrieval
- Text-based query and search cannot be applied directly
- How to analyze/describe the content and semantics of image/video/audio?
- How to index image/video/audio contents?
- How to achieve fast retrieval processing and accurate retrieval results?

Page 3: Audio Visual Content/Feature

Content/features by segment type:

Video segments
• Color • Camera motion • Motion activity • Mosaic

Moving regions
• Color • Motion trajectory • Parametric motion • Spatio-temporal shape

Still regions
• Color • Shape • Position • Texture

Audio segments
• Spoken content • Spectral characterization • Music: timbre, melody, pitch

Page 4: Image Content – Image Features

• What are image features?

• Primitive features
– Mean color (RGB)
– Color histogram

• Semantic features
– Color distribution, texture, shape, relation, etc.

• Domain-specific features
– Face recognition, fingerprint matching, etc.

Page 5: Mean Color and Color Histogram

• Pixel color information: R, G, B

• Mean color (R, G or B) = (sum of that component over all pixels) / (number of pixels)

• Histogram: frequency count of each individual color

[Figure: histogram plotting pixel count against gray level]
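The two primitive features above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a plain list of (R, G, B) tuples with 8-bit components, and the gray level is taken as the simple average of the three components, which the slide does not specify. The function names are made up for this sketch.

```python
def mean_color(pixels):
    """Return the mean (R, G, B): sum of each component / number of pixels."""
    n = len(pixels)
    if n == 0:
        raise ValueError("image has no pixels")
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def gray_histogram(pixels, levels=256):
    """Count how many pixels fall on each gray level 0..levels-1."""
    hist = [0] * levels
    for r, g, b in pixels:
        gray = (r + g + b) // 3  # assumption: gray = average of R, G, B
        hist[gray] += 1
    return hist


# A tiny 4-pixel "image": red, green, blue and white.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
mean = mean_color(pixels)
hist = gray_histogram(pixels)
print("mean color:", mean)
print("pixels at gray level 85:", hist[85])
```

The histogram keeps one counter per gray level, so its entries always sum to the number of pixels.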

Page 6: Color Models and HSI

• Many color models: RGB, CMY, YIQ, YUV, YCrCb, HSV, HSI, …

• HSI (Hue, Saturation, Intensity): often used

[Figure: the HSI color space shown in external views, an equatorial section (hue H and saturation S, with warm, cold and neutral regions), and a longitudinal section (intensity I)]
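As an illustration of how RGB values map to HSI, here is a sketch of one common textbook conversion. The slides do not say which HSI variant is meant, so the formula and the function name `rgb_to_hsi` are assumptions; inputs are RGB components scaled to [0, 1], and hue is returned in degrees.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB in [0, 1] to (hue in degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation set to 0
    saturation = 1.0 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        hue = 0.0  # gray pixel: hue is undefined, 0 by convention
    else:
        # Clamp to [-1, 1] to guard against rounding before acos.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        hue = 360.0 - theta if b > g else theta
    return hue, saturation, intensity


for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(name, rgb_to_hsi(*rgb))
```

In this formulation pure red, green and blue land 120 degrees apart on the hue circle, which matches the circular layout of the equatorial section.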

Page 7: Similarity between Two Colors

The degree of similarity between two colors, i and j, is given by:

$$C(i,j) = W_h\,H(i,j) + W_s\,S(i,j) + W_i\,I(i,j)$$

where

$$H(i,j) = \min\left(|H_i - H_j|,\; 12 - |H_i - H_j|\right)$$

$$S(i,j) = |S_i - S_j|$$

$$I(i,j) = |I_i - I_j|$$

The similarity between two colors, i and j, is given by:

$$CS(i,j) = \begin{cases} 0 & \text{if } H(i,j) > H_{\max} \\ 1 - \dfrac{C(i,j)}{C_{\max}} & \text{otherwise} \end{cases}$$

[Figure: equatorial section of the HSI color space, showing hue H with warm, cold and neutral regions]
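A small sketch of a color similarity of this kind, assuming each color is an (H, S, I) triple with hue quantized into 12 circular levels (the constant 12 on the slide suggests this) and S, I scaled to [0, 1]. The weights, the hue threshold `h_max` and the normalizer `c_max` are illustrative values chosen for this example, not values taken from the lecture.

```python
def hue_distance(h_i, h_j, levels=12):
    """Circular distance between two quantized hue levels."""
    d = abs(h_i - h_j) % levels
    return min(d, levels - d)


def color_similarity(color_i, color_j, w_h=1.0, w_s=1.0, w_i=1.0, h_max=3):
    """Similarity in [0, 1]; 0 when the hues are further apart than h_max."""
    h_i, s_i, i_i = color_i
    h_j, s_j, i_j = color_j
    dh = hue_distance(h_i, h_j)
    if dh > h_max:
        return 0.0
    # Weighted distance over hue, saturation and intensity.
    c = w_h * dh + w_s * abs(s_i - s_j) + w_i * abs(i_i - i_j)
    # Assumption: largest distance reachable while passing the hue test.
    c_max = w_h * h_max + w_s * 1.0 + w_i * 1.0
    return 1.0 - c / c_max


print(color_similarity((0, 0.5, 0.5), (1, 0.5, 0.5)))  # neighboring hues
print(color_similarity((0, 0.5, 0.5), (6, 0.5, 0.5)))  # opposite hues
```

Taking the minimum of the direct and wrap-around differences makes hue levels 11 and 1 close to each other, as they are on the color circle.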

Page 8: Content Based Image Retrieval (CBIR)

CBIR: retrieval based on the similarity of image color, texture, and object shape/position

[Figure: images with similar color, dominated by blue and green]

Page 9: Color Based Image Retrieval

Images with similar colors and distribution/histogram
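One way to compare color distributions is histogram intersection, sketched below. The slide does not name a specific measure, so this choice is an assumption, as are the toy 4-bin histograms and the file names in the example database.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 means identical shape."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))


def rank_by_color(query_hist, database):
    """Return (score, name) pairs, most similar image first."""
    scored = [(histogram_intersection(query_hist, hist), name)
              for name, hist in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical database of 4-bin color histograms.
database = {
    "sea.jpg": [8, 1, 1, 0],
    "forest.jpg": [1, 8, 1, 0],
    "sunset.jpg": [0, 1, 1, 8],
}
ranking = rank_by_color([7, 2, 1, 0], database)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

Normalizing first makes the score independent of image size, so a thumbnail and a full-resolution copy of the same picture compare as equal.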

Page 10: Shape Based Image Retrieval

Images with similar shapes

Page 11: Spatial Relation Based Image Retrieval

Images with similar shapes and similar spatial relations among them

Page 12: Correctness and Accuracy in CBIR

CBIR accuracy is measured as the percentage of target (correct) images found among the top-n candidate images of the ranked list C1, C2, C3, …, Cn-1, Cn, Cn+1, …, CM

90%

Hybrid retrieval using color and texture plus shape can improve accuracy
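A minimal sketch of a top-n accuracy measure, under one reading of the definition: the percentage of known target images that show up among the first n ranked candidates. The function name and the candidate labels are made up for the example.

```python
def top_n_accuracy(ranked_candidates, targets, n):
    """Percentage of target images found among the first n candidates."""
    if not targets:
        raise ValueError("at least one target image is required")
    top = set(ranked_candidates[:n])
    hits = sum(1 for t in targets if t in top)
    return 100.0 * hits / len(targets)


# Ranked retrieval result C1..C10 and the images that were actually wanted.
ranked = [f"C{k}" for k in range(1, 11)]
targets = ["C1", "C3", "C4", "C9"]
print(top_n_accuracy(ranked, targets, n=5))   # three of four targets in top 5
print(top_n_accuracy(ranked, targets, n=10))  # all targets in top 10
```

Increasing n can only keep or raise this number, which is why accuracy figures are normally quoted together with the n they were measured at.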

Page 13: Hybrid Retrieval – Combined Similarity

The Similarity Measure of Color: CS

The Similarity Measure of Shape: SS

The Similarity Measure of Spatial Relation: SRS

Combined Similarity Score:

$$S = W_c \cdot CS + W_s \cdot SS + W_{sr} \cdot SRS$$

where CS, SS and SRS are the similarity scores of color, shape and spatial relations, and $W_c$, $W_s$ and $W_{sr}$ are the weights of color, shape and spatial relations
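The weighted combination of the three scores can be sketched as follows. The weight values and the per-image scores are invented for illustration; the slide only gives the weighted-sum form.

```python
def combined_similarity(cs, ss, srs, w_c=0.5, w_s=0.3, w_sr=0.2):
    """Weighted sum of color, shape and spatial-relation similarity scores."""
    return w_c * cs + w_s * ss + w_sr * srs


# Hypothetical (CS, SS, SRS) scores of two candidate images for one query.
candidates = {
    "image_a": (0.9, 0.5, 0.4),
    "image_b": (0.6, 0.9, 0.9),
}
ranked = sorted(candidates,
                key=lambda name: combined_similarity(*candidates[name]),
                reverse=True)
for name in ranked:
    print(name, round(combined_similarity(*candidates[name]), 2))
```

Here image_a wins on color alone, but image_b ranks first once shape and spatial relations are weighed in, which is the point of hybrid retrieval.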

Page 14: Query by Scratch in CBIR

Please try this kind of image search on the Hermitage Web site. It uses the QBIC engine to search archives of world-famous art.

Page 15: Query by Example in CBIR

Page 16: Query by Example in CBIR (cont.)

Page 17: Video Retrieval

Video retrieval:
- Find interesting video shots/segments in a movie, TV program, or video database
- It is hard because a video contains many images (>10fps) and changes over time

Methods of video retrieval
- Non-text-based: key frames via CBIR, color, object, background sound, etc.
- Text-based: extracted captions (i.e., overlaid text), speech recognition, etc.

[Figure: a user queries a video database whose video structure is described by text information (queried by keyword), image information (queried by images), motion information (queried by motion), and audio information (queried by audio)]

Page 18: Key Frame Extraction and Video Retrieval

[Figure: a video document is split by shot detection into a set of shots, followed by key frame extraction]

1. Decompose the video segment into shots
2. Compute a key/representative frame for each shot
3. Query by QBIC
4. Use the frame from the highest-scoring shot
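The four steps can be sketched on toy data. Everything specific here is an assumption made for illustration: each frame is reduced to a small color histogram, a shot boundary is declared where consecutive histograms differ by more than a threshold, the middle frame of a shot serves as its key frame, and a plain histogram difference stands in for the QBIC query.

```python
def hist_diff(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shots(frame_hists, threshold):
    """Step 1: return (start, end) frame ranges, end exclusive."""
    if not frame_hists:
        return []
    shots, start = [], 0
    for i in range(1, len(frame_hists)):
        if hist_diff(frame_hists[i - 1], frame_hists[i]) > threshold:
            shots.append((start, i))  # large change: a cut before frame i
            start = i
    shots.append((start, len(frame_hists)))
    return shots


def key_frames(shots):
    """Step 2: use the middle frame of each shot as its key frame."""
    return [(start + end - 1) // 2 for start, end in shots]


def best_shot(query_hist, frame_hists, shots):
    """Steps 3-4: score key frames against the query, return the best."""
    keys = key_frames(shots)
    best = min(range(len(shots)),
               key=lambda s: hist_diff(query_hist, frame_hists[keys[s]]))
    return shots[best], keys[best]


# Nine frames forming three visually distinct shots.
frames = [[10, 0, 0], [9, 1, 0], [10, 0, 0],
          [0, 10, 0], [0, 9, 1],
          [0, 0, 10], [1, 0, 9], [0, 0, 10], [0, 1, 9]]
shots = detect_shots(frames, threshold=10)
print("shots:", shots)
print("key frames:", key_frames(shots))
print("best match:", best_shot([0, 0, 10], frames, shots))
```

Comparing only one key frame per shot is what keeps the query cheap: the number of image comparisons grows with the number of shots, not the number of frames.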

Page 19: Various Clues/Contents in Video Retrieval

Page 20: Video Caption Extraction in Video Retrieval

Page 21: Transcript via Speech Recognition for Video Retrieval

• Generates transcript to enable text-based retrieval from spoken language documents

• Improves text synchronization to audio/video in presence of scripts

[Figure: text extraction from raw video and raw audio; the audio track is labeled SILENCE, MUSIC, and then the recognized words "electric cars are they are the jury every toy owner hopes to please"]

Page 22: Video Retrieval by Combining Different Features

[Figure: a query (text, image, audio) is passed to retrieval agents; the text score, image score, audio info, movie info and PRF score are combined into a final score]

Page 23: MPEG-7: Audiovisual Content Description

[Figure: feature extraction → MPEG-7 description (standardization) → search engine]

Feature Extraction:
- Content analysis (D, DS)
- Feature extraction (D, DS)
- Annotation tools (DS)
- Authoring (DS)

MPEG-7 Scope:
- Description Schemes (DSs)
- Descriptors (Ds)
- Language (DDL)
Ref: MPEG-7 Concepts

Search Engine:
- Searching & filtering
- Classification
- Manipulation
- Summarization
- Indexing

Page 24: Example of MPEG-7 Annotation Tool

Page 25: MPEG-7: Image Description Example

Page 26: Automatic Video Analysis and Index

[Figure: video frames along a timeline, split by scene cuts; one frame carries the overlaid caption "Yellowstone"]

Scene cuts   1st scene      2nd scene     3rd scene
Camera       Static         Static        Zoom
Objects      Adult Female   Animal        Two adults
Action       Head Motion    Left Motion   None
Captions     [None]         Yellowstone   [None]
Scenery      Indoor         Outdoor       Indoor

Page 27:

Segment Tree

Shot1 Shot2 Shot3

• Segment 1
– Sub-segment 1
– Sub-segment 2
– Sub-segment 3
– Sub-segment 4

• Segment 2

• Segment 3

• Segment 4

• Segment 5

• Segment 6

• Segment 7

Semantic DS (Events)

• Introduction

• Summary

• Program logo

• Studio

• Overview

• News Presenter

• News Items

• International

• Clinton Case

• Pope in Cuba

• National

• Twins

• Sports

• Closing

Time Axis

Page 28: Audio Retrieval

Audio retrieval:
- Find the required sound segment in an audio database or broadcast
- Find interesting music in a song/music database or on the web

Methods of audio retrieval

Physical features of the audio signal:
- Loudness, i.e., sound intensity (0~120 dB)
- Frequency range: low, middle or high (20 Hz~20 kHz)
- Change of acoustic features
- Speech, background sound, and noise
- Pitch

Semantic features of audio:
- Word or sentence via speech recognition
- Male/female, young/old
- Rhythm and melody
- Audio description/index
- Content Based Music Retrieval (CBMR)
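Two of the physical features, loudness and frequency, can be estimated from raw samples as sketched below. The assumptions are worth noting: samples are floats in [-1, 1]; the level is reported in dB relative to full scale rather than the absolute 0~120 dB sound-pressure scale; frequency is estimated by counting zero crossings, which only works for simple tones; and the low/middle/high cut-offs are invented for this example.

```python
import math


def rms_level_db(samples):
    """Signal level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return float("-inf") if rms == 0 else 20 * math.log10(rms)


def zero_crossing_frequency(samples, sample_rate):
    """Rough frequency estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


def frequency_band(freq_hz, low_max=300.0, high_min=3000.0):
    """Label a frequency as low, middle or high (cut-offs are assumptions)."""
    if freq_hz < low_max:
        return "low"
    return "high" if freq_hz >= high_min else "middle"


# One second of a 440 Hz sine tone at half amplitude, sampled at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / sample_rate)
        for i in range(sample_rate)]
level = rms_level_db(tone)
freq = zero_crossing_frequency(tone, sample_rate)
print(f"level: {level:.2f} dB, frequency: {freq:.0f} Hz, "
      f"band: {frequency_band(freq)}")
```

Such per-segment numbers can be stored as index entries, so a query like "loud, high-pitched sound" becomes a range search over features instead of a scan of the waveforms.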

Page 29: Music Retrieval by Singing/humming

[Figure: the melody of "Happy Birthday", with the points where each note starts and ends marked]

A note has two important attributes
– Pitch: tells people which tone to play
– Duration: tells people how long the note is played

Notes are represented by symbols

[Figure: a staff showing the note names Do Re Mi Fa So La Si Do and their note pitches]

Page 30: Music Retrieval by Singing/humming (Cont.)

[Figure: system diagram. Query side: humming ("La, …") → recorder → wave to symbols → approximate string match → retrieval result. Database side: music database (wave files, MP3 files, MIDI files) → various music formats to symbols → feature extraction → music database indexing, which feeds the approximate string match]
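The "wave to symbols" and "approximate string match" stages can be sketched on note pitches. This is a toy version under stated assumptions: the hummed query is already transcribed into MIDI note numbers, each melody is reduced to an up/down/repeat contour string, and plain edit distance stands in for whatever approximate matcher a real system uses. The two-song database is made up for the example.

```python
def contour(pitches):
    """Turn a pitch sequence into symbols: U (up), D (down), R (repeat)."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def search(query_pitches, database):
    """Return (distance, title) pairs, closest melody first."""
    q = contour(query_pitches)
    return sorted((edit_distance(q, contour(p)), title)
                  for title, p in database.items())


# Toy database: opening notes as MIDI numbers.
database = {
    "Happy Birthday": [67, 67, 69, 67, 72, 71, 67, 67, 69, 67, 74, 72],
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67,
                        65, 65, 64, 64, 62, 62, 60],
}
# Hummed query: same tune in a lower key, with the last note missing.
hummed = [62, 62, 64, 62, 67, 66, 62, 62, 64, 62, 69]
results = search(hummed, database)
print(results)
```

Because only the direction of each pitch step is kept, the match does not depend on the key the user hums in, and the edit distance tolerates a few wrong or missing notes.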

Page 31: Demos of Content-Based Image Retrieval