What Are the High-Level Concepts with Small Semantic Gaps? CS 4763 Multimedia System, Spring 2008.


Outline

Introduction

LCSS: a lexicon of high-level concepts with small semantic gaps

Framework of LCSS Construction

Methodologies

Experimental results

Conclusion

Introduction

Recent years have witnessed the rapid development of Multimedia Information Retrieval (MIR).

The semantic gap is still a fundamental barrier.
-- The difference between the expressive power of low-level features and that of high-level semantic concepts.

(250M Americans) * (5% own a digital camera) * (100 photos per year) = 1.25 billion photos per year.

A Google Image search on ".jpg" returns 11.8 million hits.

A Yahoo Image search on ".jpg" returns 137 million hits.

To reduce the semantic gap, concept-based multimedia search has been introduced:

-- Select a concept lexicon that is relatively easy for computers to understand.

-- Collect training data.

-- Model high-level semantic concepts.


Problem

In previous work, concept lexicon selection is usually simplified to manual selection or ignored entirely.

e.g., Caltech 101, Caltech 256, PASCAL
-- These implicitly favor relatively “easy” concepts.

MediaMill challenge concept data (101 terms) and the Large-Scale Concept Ontology for Multimedia (LSCOM) (1,000 concepts)
-- Manually annotated concept lexica on broadcast news video from the TRECVID benchmark.

All of these lexica ignore the differences in semantic gap among concepts.

No automatic selection is performed.

Concepts with smaller semantic gaps are likely to be better modeled and retrieved than concepts with larger ones.

Very little research exists on quantitative analysis of the semantic gap.

What are the well-defined semantic concepts for learning?

How can they be found automatically?


Objective

Automatically construct a lexicon of high-level concepts with small semantic gaps (LCSS).

Two properties of LCSS:

1) Concepts are commonly used.
-- The words have a high occurrence frequency within the descriptions of real-world images.

2) Concepts are expected to be visually and semantically consistent.
-- Images of these concepts have smaller semantic gaps (they are easier to model for retrieval and annotation).

Idea

Web images have rich textual features
-- filename, title, alt text, and surrounding text.

Input titles and comments are good semantic descriptions of the images.

These textual features are much closer to the semantics of the images than visual features are.

Framework of LCSS Construction

[Figure: framework diagram. Labeled components: web image crawling with visual and text indexing (images on the World Wide Web, database, visual index system, surrounding-text index system); confidence map; re-ranking based on confidence score; construction of the content and context sparse similarity matrix; Affinity Propagation clustering; text-based keyword extraction; ranked concepts lexicon.]

Data Collection

2.4 million web images from photo forums
-- Photo.net, Photosig.com, etc.

Photos have high quality and rich textual information.

64-dimensional global visual feature
-- color moments, color correlogram, and color-texture moments





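Nearest Neighbor Confidence Score (NNCS)

$$
\mathrm{NNCS}(I_x) = \frac{1}{K} \sum_{i=1}^{K} \mathrm{sim\_text}(I_x, I_{x_i})
$$

where $I_{x_i}$ is the $i$-th of the $K$ nearest neighbors of image $I_x$.

The higher the NNCS value, the smaller the semantic gap.

Calculate the NNCS for all 2.4 million images with K=500.

Select the 36231 images with the highest NNCS as candidates
-- a subset is used because of the relatively large data size and the memory requirements of the Affinity Propagation clustering algorithm.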
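The NNCS computation can be sketched in a few lines of Python: for each image, average the textual similarity to its K nearest neighbors. The sketch assumes the neighbors are found in visual feature space, which the score's name and the framework diagram suggest. The toy 2-D features, the brute-force Euclidean neighbor search, and the Jaccard word overlap used as textual similarity are all assumptions made for illustration; the slides do not specify them, and the names `jaccard` and `nncs` are hypothetical.

```python
import math

def jaccard(a, b):
    """Stand-in textual similarity: overlap of two word sets (an assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def nncs(x, visual, texts, k):
    """Average textual similarity between image x and its k visual nearest neighbors."""
    others = [i for i in range(len(visual)) if i != x]
    # Brute-force neighbor search; a real system would query a visual index.
    neighbors = sorted(others, key=lambda i: math.dist(visual[x], visual[i]))[:k]
    return sum(jaccard(texts[x], texts[i]) for i in neighbors) / k

# Toy data: three visually close images with consistent titles,
# and three visually close images with unrelated titles.
visual = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
texts = [["red", "rose"], ["rose", "garden"], ["red", "rose"],
         ["my", "trip"], ["untitled"], ["old", "friend"]]
scores = [nncs(x, visual, texts, k=2) for x in range(len(visual))]
```

In this toy setting the first group gets a clearly higher score than the second, which is the behavior the slide describes: visually similar images that also share descriptions indicate a small semantic gap.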





Clustering Using Affinity Propagation

Fast on large-scale data sets and requires no prior information (e.g., the number of clusters).

36231×36231 content-context similarity matrix (CCSM)
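To illustrate how Affinity Propagation picks exemplars from a similarity matrix alone, here is a minimal pure-Python sketch of the standard responsibility/availability message passing. It is not the implementation used for the 36231-image CCSM. The toy 1-D points, the negative squared distance used as similarity, the damping factor, and the preference value on the diagonal are all assumptions chosen for illustration.

```python
def affinity_propagation(S, damping=0.8, iterations=300):
    """S[i][k]: similarity of point i to candidate exemplar k; S[k][k]: preference."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival candidate.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                rival = second if k == best else vals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: accumulated positive support for k from the other points.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point is assigned to the candidate with the largest combined message.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # toy data, two obvious groups
S = [[-(a - b) ** 2 for b in points] for a in points]
for k in range(len(points)):
    S[k][k] = -10.0  # hypothetical preference; it controls how many exemplars emerge
labels = affinity_propagation(S)
```

The number of clusters is never passed in; it follows from the similarities and the preference, which is the property the slide highlights.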

[Figure: images I_i, I_j, I_k connected to nodes V1 to V5 and T1 to T5, shown next to a sparse similarity matrix with negative entries.]





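Text-based Keyword Extraction (TBKE)

The set of related keywords of cluster $C_i$ is denoted as $W_i$.

The relevance score of a keyword $k_j$ to cluster $C_i$ is denoted as:

$$
\mathrm{r\_Score}(k_j, C_i) =
\begin{cases}
0, & k_j \notin W_i \ \text{or} \ W_i = \emptyset \\
\dfrac{\mathrm{occurrence}(k_j, C_i)}{\ln(|W_i| + 1)}, & \text{otherwise}
\end{cases}
$$

For each keyword $k_j$, its relevance score to the whole cluster pool $C$ can be denoted as:

$$
\mathrm{Score}(k_j) = \sum_{C_i \in C} \mathrm{r\_Score}(k_j, C_i)
$$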
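As a rough illustration of the keyword scoring, the sketch below assumes that a keyword's relevance to a cluster is its occurrence count in that cluster divided by ln(|W_i| + 1), where W_i is the cluster's keyword set, and that the overall score sums this over all clusters. Representing each image by a list of title words, and the function name `tbke_scores`, are assumptions for the example.

```python
import math
from collections import Counter

def tbke_scores(clusters):
    """clusters: list of clusters; each cluster is a list of per-image word lists."""
    scores = Counter()
    for cluster in clusters:
        # occurrence(k_j, C_i): how often each keyword appears in this cluster.
        occurrence = Counter(w for image_words in cluster for w in image_words)
        if not occurrence:
            continue  # empty keyword set W_i contributes nothing
        norm = math.log(len(occurrence) + 1)  # ln(|W_i| + 1)
        for word, count in occurrence.items():
            scores[word] += count / norm  # keywords outside W_i implicitly add 0
    return scores

# Toy clusters (hypothetical data).
clusters = [
    [["sunset", "sky"], ["sunset", "beach"]],
    [["rose", "red"], ["rose"]],
]
scores = tbke_scores(clusters)
ranked = [word for word, _ in scores.most_common()]
```

Dividing by ln(|W_i| + 1) penalizes clusters whose text is spread over many different keywords, so a word that dominates a textually focused cluster ranks higher.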

LCSS

Table 1. Top 50 keywords in the LCSS lexicon

Category: Concepts

Scene/Landscape: sunset, sky, beach, garden, lake, sunflow, water, firework, cloud, moon, sunrise, mountain, city, river, snow, rain, home, island

Object: flower, rose, butterfly, tree, bee, candle, bridge, leaf, eye, tulip, orchid, house, peacock, window, glass, bird, rock

Color: blue, red, yellow, green, pink, purple, orange, dark, golden

Season: fall, spring, summer, autumn

Others: small, wild

Experiments

For each keyword w, 500 titled photos containing this keyword were randomly selected.

The average confidence value decreases as the keyword rank goes down.

This demonstrates that images labeled with the top-ranked words have higher confidence values.

Figure 1: Distribution of average confidence value. The x-axis shows the top 15 keywords in LCSS, from left to right: sunset, flower, blue, red, rose, yellow, green, sky, pink, butterfly, tree, beach, garden, water, cloud. The y-axis shows the confidence score (0 to 0.1).

Apply the lexicon to the University of Washington (UW) dataset
(1109 images, 5 labeled ground truth annotations, 350 unique words).

Refine the annotation results obtained by the search-based image annotation (SBIA) algorithm.

Figure 2: Image annotation refinement scenario. [Diagram components: input image; visual feature extraction; search engine with visual index system and surrounding-text index system; keyword search; retrieved results (e.g., "Red rose", "Blooming rose", "Last red rose", "One more rose"); annotation (e.g., Rose, Red, Flower); lexicon relevance mapping against the ranked lexicon; annotation refinement by annotation relevance reranking OR annotation pruning; final annotation.]
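The two refinement options named in the figure, pruning and reranking, can be sketched as follows. The slides do not spell out either procedure, so this is only one plausible reading: pruning drops candidate words that are not among the top s lexicon entries, and reranking reorders candidates by their lexicon rank, keeping out-of-lexicon words at the end in their original order. The function names and toy data are hypothetical.

```python
def prune(annotations, lexicon, s):
    """Keep only candidate words found in the top-s entries of the ranked lexicon."""
    top = set(lexicon[:s])
    return [w for w in annotations if w in top]

def rerank(annotations, lexicon, s):
    """Order candidates by lexicon rank; words outside the top s keep their order at the end."""
    rank = {w: r for r, w in enumerate(lexicon[:s])}
    return sorted(annotations, key=lambda w: rank.get(w, s))  # sorted() is stable

lexicon = ["sunset", "flower", "blue", "red", "rose"]       # ranked lexicon (toy)
candidates = ["last", "rose", "red", "blooming", "flower"]  # raw annotation output (toy)

pruned = prune(candidates, lexicon, s=5)
reranked = rerank(candidates, lexicon, s=5)
```

Under this reading, pruning with a lexicon that misses many correct words removes them outright, while reranking only changes their position.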

The refinement clearly improves the precision and recall of the original annotation as the lexicon size s becomes larger.

The performance of the refinement remains stable when s is equal to or larger than 100.

Figure 3: Annotation precision and recall for different lexicon sizes. [Two plots: precision (y-axis, 0.09 to 0.14) and recall (y-axis, 0.08 to 0.15) versus s (x-axis, 0 to 200); series: Phrase_rerank, Phrase_prune, Term_rerank, Term_prune.]

When m ranges from 3 to 7, the precision and recall of the refined annotation improve the most.

The precision and recall of annotation pruning remain the same, especially when m is larger than 7
-- most of the top 7 annotation words fall into the LCSS.

Figure 4: Annotation precision and recall for different values of m. [Two plots: precision (y-axis, 0.07 to 0.16) and recall (y-axis, 0.02 to 0.18) versus m (x-axis, 1 to 10); series: Original_phrase, Phrase_rerank, Phrase_prune, Original_term, Term_rerank, Term_prune.]
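The slides do not state how annotation precision and recall are computed. A common per-image definition, assumed here, takes the top m predicted words and counts how many appear in the ground truth. The function name and toy data are hypothetical.

```python
def precision_recall_at_m(predicted, truth, m):
    """Precision and recall of the top-m predicted words against a ground-truth set."""
    top = predicted[:m]
    correct = sum(1 for w in top if w in truth)
    precision = correct / len(top) if top else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

predicted = ["rose", "red", "sky", "flower"]  # ranked annotation output (toy)
truth = {"rose", "flower", "garden"}          # ground-truth annotations (toy)

p2, r2 = precision_recall_at_m(predicted, truth, m=2)
p4, r4 = precision_recall_at_m(predicted, truth, m=4)
```

With this definition, increasing m can raise recall while leaving precision flat or lower, which is why both curves are reported as functions of m.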

LSCOM: Large-Scale Concept Ontology for Multimedia.

WordNet: a very large lexical database of English.

(100,303)

LSCOM and WordNet do not improve the annotation precision.
-- Many correct annotations are not included in these two lexica and are therefore pruned.

Figure 5: Annotation precision and recall for different lexica. [Two plots: precision (y-axis, 0.04 to 0.16) and recall (y-axis, 0 to 0.18) versus m (x-axis, 1 to 10); series: Original_phrase, LCSS, LSCOM, WordNet.]

Conclusion

Quantitatively study and formulate the semantic gap problem.

Propose a novel framework to automatically select visually and semantically consistent concepts.

LCSS is the first lexicon of concepts with small semantic gaps.
-- A great help for data collection and concept modeling.
-- A candidate pool of semantic concepts for image annotation, annotation refinement, and rejection.
-- Potential applications in query optimization and MIR.

Future Work

Investigate different features to construct feature-based lexica
-- Texture feature
-- Shape feature
-- SIFT feature

Open questions
-- How many semantic concepts are necessary?
-- Which features are good for image retrieval with a specific concept?
-- ……

Reference

C. G. Snoek, M. Worring, J. C. van Gemert, J. M. Geusebroek, and A. W. Smeulders. “The challenge problem for automated detection of 101 semantic concepts in multimedia.” Proc. of ACM Multimedia, 2006.

C. M. Naphade, J. R. Smith, J. Tesic, S. F. Chang, et al. “Large-scale concept ontology for multimedia.” IEEE MultiMedia, 13(3):86–91, 2006.

B. J. Frey and D. Dueck. “Clustering by passing messages between data points.” Science, 315:972-976, 2007.

X. J. Wang, L. Zhang, F. Jing, and W. Y. Ma. “AnnoSearch: image auto-annotation by search.” Proc. of IEEE Conf. CVPR, New York, June 2006.

C. Wang, F. Jing, L. Zhang, and H. J. Zhang. “Scalable search-based image annotation of personal images.” Proc. of the 8th ACM International Workshop on Multimedia Information Retrieval, Santa Barbara, CA, USA, 2006.
