Cross-media Intelligent Searching in Digital Library


Yueting Zhuang

Zhejiang University, China

Nov. 18, 2006, Egypt

ICUDL06, YT Zhuang

Outline

1. CADAL: China digital library

2. Our vision for the next generation of digital library

3. From multimedia retrieval to cross-media retrieval

4. Retrieval of Chinese calligraphy characters: a cross-media practice

5. Building a personalized portal

6. Conclusion


3rd Workshop 2004, CMU, USA

ICUDL 2005, Zhejiang University, China

1. CADAL: China Digital Library

China-US One Million Book Digital Library Project

• A unique library resource for scholars, students, and citizens

• Contains over one million scanned books

• A big step towards the goal: create a universal, free-to-read digital library

• Make knowledge available on the web to anyone, anytime, anywhere

http://www.cadal.zju.edu.cn

As of today, CADAL has achieved:

1.023 million books have been digitized, including:
• Degree dissertations
• Modern Chinese books
• Traditional cultural resources
• English books

Supported multimedia resources:
• Images
• Audio
• Video
• 3D models
• Chinese calligraphy

About 200,000 clicks a day (http://www.cadal.zju.edu.cn)

Users spread over 70 countries and regions

16 scanning centers in China, occupying more than 2,000 square meters

Scanning books

Processing digitized books

[Map labels: Chengdu (成都), Changchun (长春), Xi'an (西安), Guangzhou (广州), Beijing (北京), Nanjing (南京), Shanghai (上海), Hangzhou (杭州), Wuhan (武汉)]

Users spread over 70 countries and regions

Service structure of CADAL:

CALIS Integration

Unified Authentication

Personal Portal

Personal Service

Unified Quick Search

Advanced Search

Knowledge Map

Sign Language

Movie Search

Calligraphy Search

Image Search

Cultural Relics

Illustration Search

Bilingual Translation

Help System

Full-Text Search

Metadata Harvesting

Resource Location

Access Control Policy

User Management Logging

Current services provided by CADAL:

(1) Metadata searching

Digital resources are classified into 8 classes according to publication time and type.

Both unified and advanced search are provided for all resources.

(2) Unified search

Choose the types of resources to search.

Search results contain each type of resource.

(3) Advanced search

Users can choose the search scope, combined results, and the result style.

A second search, full texts, and detailed information are available on the result page.

(4) Full-text search

Full-text search uses the text produced by OCR.
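The slides do not say how the OCR text is indexed. As a minimal sketch, assuming the OCR output is available as plain text per page and that words are separated by whitespace or punctuation (Chinese text would need a word segmenter instead), an inverted index with boolean AND queries could look like this. The names `build_index`, `search`, and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word of the query (boolean AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output keyed by (book id, page number).
ocr_pages = {
    ("book1", 1): "Chinese calligraphy is written with a brush",
    ("book1", 2): "The brush causes strokes of different thickness",
    ("book2", 1): "Digital library services and full text search",
}
index = build_index(ocr_pages)
hits = search(index, "brush calligraphy")
```

A real system would also need ranking and tolerance for OCR errors; this sketch only shows the lookup step.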


2. Our Vision for the Next Generation of Digital Library

Typical features of existing DLs:
• books are indexed by title, author, keywords…
• users query books by keyword input
• mostly only text information is returned
• multimodal data is not fully supported

What does the next generation of DL look like?
• supports multimodal sources
• enables cross-media retrieval

Extension of the concept of “Book”

The key to our vision of the next generation of digital library is the extension of the “book” concept.

• A book is regarded as not only the written symbols on paper, but also any type of multimedia “item”, such as a video clip, an audio clip, a painting, ……

So in the next generation of DL, a “book” can be multimodal: a scenery image, Chinese calligraphy, a video fragment, audio clips, ……

Feature analysis and knowledge mining lead to a general data representation for multimodal data.

We can find a general data structure to represent multimodal “books”.

Supporting multimodal data is an important trend in multimedia retrieval:

We get multimodal information from the real world. Can we also get multimodal data from the digital world, especially from a digital library?

[Figure: multimodal data (texts, image, audio, video, ……) in the real world and in the digital world]

Cross-media retrieval

After the extension of the “Book” concept, retrieval shall also be extended.

We call it “cross-media retrieval”.

Scenario: a simple example of cross-media retrieval:

A user can start a query from any type of media, and relevant multimedia data will be returned.

[Figure: each of the following can be the starting query, and the others are returned]
• “Giant Panda” image
• “Giant Panda” text: a textual description of the giant Panda: the Panda is a kind of cat which ……
• “Giant Panda” audio

Cross-media retrieval is a useful way to access multimodal data:

[Figure: texts, image, audio, video, …… are all available from one another]

Cross-media retrieval can be regarded as a simulation of the real world; it helps us get multimodal data in a more flexible and more informative way!

What does cross-media retrieval need to do?

• User query interface: submit a query example. It can be an image, audio, or keywords…
• Cross-media search engine
• Raw data: texts, image, audio, video
• Knowledge base
• Multimodal representation & index
• Query results: texts, images, audio…


3. From Multimedia Retrieval to Cross-media Retrieval

(1) Image retrieval: content-based

[Figure: searching images from a query example, with relevance feedback based on positive and negative examples]

(2) Image retrieval: text-based

[Figure: searching images from query text]

(3) Motion retrieval

Given a query example of motion data, we can find similar motion data in the database.

(4) Audio retrieval: content-based

System framework:
• The user submits an audio query example to the content-based audio search engine.
• The engine searches the audio depository and returns audio results.
• The user judges the returned results (relevance feedback).
• Based on the feedback, the feature weights and the query center are adjusted.
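The framework names two feedback actions, adjusting the query center and adjusting the feature weights, but does not give formulas. The sketch below assumes a Rocchio-style update for the query center and inverse-standard-deviation weighting for the features; both are common textbook choices and not necessarily what the CADAL system uses. All function names and parameter values are hypothetical.

```python
import numpy as np


def adjust_query_center(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: move toward the mean of the positive examples
    and away from the mean of the negative examples."""
    new_query = alpha * np.asarray(query, dtype=float)
    if len(positives):
        new_query = new_query + beta * np.mean(positives, axis=0)
    if len(negatives):
        new_query = new_query - gamma * np.mean(negatives, axis=0)
    return new_query


def adjust_feature_weights(positives, eps=1e-6):
    """Give more weight to features on which the positive examples agree
    (small standard deviation). The weights are normalized to sum to 1."""
    inv_std = 1.0 / (np.std(positives, axis=0) + eps)
    return inv_std / inv_std.sum()


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum(weights * (np.asarray(a) - np.asarray(b)) ** 2)))


# Toy 2-D auditory features marked by the user in one feedback round.
query = np.array([0.0, 0.0])
positives = np.array([[1.0, 0.0], [1.0, 2.0]])
negatives = np.array([[-1.0, 0.0]])

new_query = adjust_query_center(query, positives, negatives)
weights = adjust_feature_weights(positives)
dist = weighted_distance(new_query, positives[0], weights)
```

In the toy data the positive examples agree on the first feature and disagree on the second, so the first feature receives almost all of the weight in the next search round.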

Audio retrieval: key techniques

• extract auditory features from audio clips in the compressed domain
• fuzzy clustering of the auditory features
• represent audio clips by their cluster centers
• retrieve similar audio by cluster-center matching
• introduce relevance feedback techniques

Audio retrieval: an example

[Figure: query example, feature weights, relevance feedback, and weight adjusting]

(5) Video retrieval: overview

Unlike text resources, video is unstructured:
• rich in visual content;
• poor in semantic understanding.

The challenging issues:
• summarization & structuring;
• video mining.

(5) Video retrieval: key techniques

Video structuring:
• construct a video table of contents (VTOC)
• make the video physically structured

Video summarization:
• help the user quickly grasp the content of video clips
• support video browsing
• video encoding/compression

Video structuring

[Figure: building the table of contents from a video stream]
• Hierarchy: video, scene, group, shot, key frame
• Steps: shot boundary detection, key frame extraction, grouping, scene construction
• Cues: temporal features, spatial features, concept clustering
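The first two steps of video structuring, shot boundary detection and key frame extraction, can be illustrated with a small sketch. It assumes grayscale frames, detects hard cuts by thresholding the histogram difference between consecutive frames, and takes the middle frame of each shot as its key frame. These are simple stand-ins; the slides do not state which detector or key-frame rule the authors use, and the threshold value is hypothetical.

```python
import numpy as np


def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram of one frame (values in 0..255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()


def detect_shot_boundaries(frames, threshold=0.5):
    """Return every index i at which a new shot starts (hard cuts only)."""
    boundaries = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # The L1 difference of two normalized histograms lies in [0, 2].
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


def key_frames(frames, boundaries):
    """Pick the middle frame of each shot as its key frame."""
    starts = [0] + boundaries
    ends = boundaries + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


# Synthetic video: 5 dark frames followed by 4 bright frames.
frames = [np.full((8, 8), 20, dtype=np.uint8)] * 5 + [np.full((8, 8), 200, dtype=np.uint8)] * 4
boundaries = detect_shot_boundaries(frames)
keys = key_frames(frames, boundaries)
```

Grouping and scene construction would then cluster the shots by the similarity of their key frames, which is not shown here.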

Video summary: video content mining

Video content mining turns the original video (redundant) into a summarized video (concise and informative).

Find meaningful patterns to support efficient video browsing.

Video summary: an example

Two news videos are separated into 6 video shots (their key frames are shown on the slide). Their total length is 3 minutes.

After video summarization, the video is 3 seconds long and consists of 3 key frames.

Video shot clustering result

[Figure: video shots of the original video; similar video shots are clustered together]

Video retrieval

[Figure: video browsing by key frames and by summary]

(6) 3D model retrieval: overview

3D models are compared by shape similarity.

(6) 3D model retrieval: an example

[Figure: query example]

As shown above, multimedia retrieval is generally content-based X retrieval (CBXR).

Towards cross-media retrieval

Motivation

Through intelligent integration, the CBXR techniques (image retrieval, audio retrieval, video retrieval, motion retrieval, 3D model retrieval, ……) are combined into cross-media retrieval.

We can provide a more flexible and efficient way to access multimodal data.

We name it cross-media retrieval.

Support multimodal sources:
• smooth integration of multimodal data;
• query media objects by examples of different modalities.

Challenging issues:
• texts, images, audio, etc. are represented with different features
• different features are heterogeneous
• cross-media similarity can’t be measured by content features
• there is a semantic gap between low-level features and semantics

Our solution to cross-media retrieval

• build cross-indexing from multimodal data
• organize multimedia documents
• explore cross-media correlations
• ……

Cross-indexing-based retrieval: general idea

[Figure: system diagram]
• Media types: text, image, audio, video, graphics
• Per-media engines: text search engine, image search engine, audio search engine, video search engine, graphics search engine
• Preprocessing, SVM-based clustering, cross-index graph
• Cross-index multimodal search engine
• Retrieval interface: query, search results fusion, results, relevance feedback, ……

(1) Cross-index retrieval: interface

[Figure: an image query example with retrieved images, retrieved video, and retrieved audio]

The system now supports images, audio, and video. Users can submit any of these media objects, and the system returns relevant images, audio, and video.

Building multimedia document: general idea

Definition of a multimedia document:
• a logical representation of multimodal data;
• consists of semantically related media objects;
• formal structure:

Document := <ID, Title, URI, KeywordList, ElementSet, LinkSet>

ElementSet := { (Audio | Image | Text | Video)_i | i ∈ N }

Audio := <ID, ParentID, URI, Size, KeywordList, AudioFeature>

Image := <ID, ParentID, URI, Size, KeywordList, ImageFeature>

Text := <ID, ParentID, URI, KeywordList>

Video := <ID, ParentID, URI, Frames, KeywordList, VideoFeature>
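The formal structure maps naturally onto record types. The sketch below is one possible encoding with Python dataclasses. The field types are assumptions (features as lists of floats, Size and Frames as integers), and since the slides do not define what a LinkSet entry contains, links are modeled here as hypothetical (source ID, target ID) pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Text:
    id: str
    parent_id: str
    uri: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class Image:
    id: str
    parent_id: str
    uri: str
    size: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # ImageFeature


@dataclass
class Audio:
    id: str
    parent_id: str
    uri: str
    size: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # AudioFeature


@dataclass
class Video:
    id: str
    parent_id: str
    uri: str
    frames: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # VideoFeature


Element = Union[Audio, Image, Text, Video]


@dataclass
class Document:
    id: str
    title: str
    uri: str
    keywords: List[str] = field(default_factory=list)
    elements: List[Element] = field(default_factory=list)      # ElementSet
    links: List[Tuple[str, str]] = field(default_factory=list)  # LinkSet (assumed shape)


def elements_of_type(doc, kind):
    """Return the media objects of one modality inside a document."""
    return [e for e in doc.elements if isinstance(e, kind)]


# Hypothetical document that groups an image and a text about the same topic.
doc = Document(id="doc1", title="Giant Panda", uri="doc://doc1", keywords=["panda"])
doc.elements.append(Image(id="img1", parent_id=doc.id, uri="img://panda.jpg",
                          size=1024, keywords=["panda"], feature=[0.1, 0.2]))
doc.elements.append(Text(id="txt1", parent_id=doc.id, uri="txt://panda.txt",
                         keywords=["panda"]))
doc.links.append(("img1", "txt1"))
```

Each element carries the ID of its parent document in `parent_id`, so a hit on any single media object can lead back to the whole document and to its other modalities.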

Building multimedia document: framework

[Figure: system framework]
• Inputs: text, image, audio, video, graphics, keyword
• Preprocessing
• Storage subsystem: multimedia documents
• Semantic skeleton base
• Learning and relevance feedback subsystem
• Query processor (multimedia document + media objects)

Building multimedia document: retrieval interface

Besides keyword-based search, the user can perform a content-based search with a specific media object as the query example.

A multimedia document is visualized as its sketch, i.e. text, images, and key-frame lists for videos.

[Figure: image, video, text, and multimedia document results; the left panel shows the relevant media data retrieved by the query “water”.]

Exploring cross-media correlations: challenges

Challenges:
1. multimodal data reside in heterogeneous feature spaces
2. the semantic gap

Gap 1: Content gap (between the visual feature space and the auditory feature space)

Gap 2: Semantic gap (between the feature spaces and high-level semantics: war, dog, bird, car, tiger)

Exploring cross-media correlations: solutions

Images and audio represent high-level semantics from different perspectives. If we can find the correlation between these perspectives, we can use it as a bridge to enable cross-media retrieval.

[Figure: image and audio examples of bird, explosion, tiger, dog, and car, linked by correlation]

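Exploring cross-media correlations: mathematical realization

Canonical correlation analysis

Basic idea:

Input: image feature matrix $X_{n \times p}$ and audio feature matrix $Y_{n \times q}$

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\qquad
Y = \begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1q} \\
y_{21} & y_{22} & \cdots & y_{2q} \\
\vdots & \vdots & & \vdots \\
y_{n1} & y_{n2} & \cdots & y_{nq}
\end{pmatrix}
$$

X and Y are of different dimensions!

Output:

$$
X' = \begin{pmatrix}
x'_{11} & x'_{12} & \cdots & x'_{1m} \\
x'_{21} & x'_{22} & \cdots & x'_{2m} \\
\vdots & \vdots & & \vdots \\
x'_{n1} & x'_{n2} & \cdots & x'_{nm}
\end{pmatrix},
\qquad
Y' = \begin{pmatrix}
y'_{11} & y'_{12} & \cdots & y'_{1m} \\
y'_{21} & y'_{22} & \cdots & y'_{2m} \\
\vdots & \vdots & & \vdots \\
y'_{n1} & y'_{n2} & \cdots & y'_{nm}
\end{pmatrix}
$$

X’ and Y’ are of the same dimension!

At the same time, the correlation between X and Y maximally coincides with the correlation between X’ and Y’.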
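As a rough illustration of the idea, the following NumPy sketch computes a textbook canonical correlation analysis: both feature matrices are centered and whitened, and an SVD of the whitened cross-covariance gives the projection directions. It maps an n×p matrix X and an n×q matrix Y to two n×m matrices in a shared subspace. The small ridge term `reg`, the function name `cca`, and the synthetic data are assumptions made for this example; the slides do not describe the authors' actual implementation.

```python
import numpy as np


def cca(X, Y, m, reg=1e-6):
    """Project X (n x p) and Y (n x q) onto m canonical directions each.

    Returns the projected (centered) data X', Y' of shape (n, m) and the
    m largest canonical correlations. Requires m <= min(p, q).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whitening through Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T  # Lx^-1 Sxy Ly^-T

    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    A = np.linalg.solve(Lx.T, U[:, :m])     # directions for X
    B = np.linalg.solve(Ly.T, Vt.T[:, :m])  # directions for Y
    return X @ A, Y @ B, s[:m]


# Synthetic "image" (3-D) and "audio" (2-D) features sharing one latent factor z.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([-2.0 * z + 0.01 * rng.normal(size=n),
                     rng.normal(size=n)])

X_sub, Y_sub, corr = cca(X, Y, m=2)
```

Once both modalities live in the same m-dimensional subspace, an image row of `X_sub` can be compared with audio rows of `Y_sub` by an ordinary distance, which is what makes retrieval across media possible.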

Exploring cross-media correlations: subsequent challenges

1. How to measure both intra- and inter-media correlations?

2. How to introduce new media objects into the system?

[Figure: the correlation network in the subspace, with intra-media and cross-media links; testing data are located in the network]


4. Retrieval of Chinese Calligraphy Characters

Motivation: original calligraphy works are unique. They exist on paper and bamboo slips, and are easily destroyed.

How to search?

In our digital library, we digitize Chinese calligraphy works and design retrieval systems to make them sharable by everyone on the Internet.

The objectives:

1. To query similar characters

Similar characters can be found and returned to users. This is like traditional content-based image retrieval.

2. To find out where a character comes from

We aim to provide an intelligent way to find the surrounding characters and present them to users.

[Figure: the character “其” comes from this work]

System overview

[Figure: system pipeline]
• Ancient books are digitized with a scanner.
• The page images are segmented into individual characters.
• Feature extraction produces feature data, stored with the raw data in the database.
• The search engine works on the database.

(1) Segmentation:
• noise elimination
• page-image analysis
• smoothing

(2) Retrieval:
• feature extraction
• shape matching
• speed-up

(1) Segmentation

We segment each page into columns, and cut the columns into individual characters, each within its minimum bounding box.
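One simple way to realize this two-level cut is with projection profiles: blank vertical strips separate columns, and blank horizontal strips separate characters within a column. The sketch below assumes a clean binary page image (True means ink) in which columns and characters do not touch; real calligraphy pages would first need the noise elimination and smoothing mentioned above. The function names are hypothetical.

```python
import numpy as np


def runs(mask):
    """Return (start, end) pairs of consecutive True runs; end is exclusive."""
    padded = np.concatenate(([False], mask, [False]))
    diff = np.diff(padded.astype(int))
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def segment_page(page):
    """Cut a binary page into characters.

    Returns one minimum bounding box (top, bottom, left, right) per character,
    with exclusive bottom/right, ordered column by column from left to right.
    """
    boxes = []
    for left, right in runs(page.any(axis=0)):        # text columns
        column = page[:, left:right]
        for top, bottom in runs(column.any(axis=1)):  # characters in the column
            char = column[top:bottom]
            cols = np.flatnonzero(char.any(axis=0))   # tighten left/right
            boxes.append((top, bottom, left + int(cols[0]), left + int(cols[-1]) + 1))
    return boxes


# Toy page with two characters in the first column and one in the second.
page = np.zeros((10, 10), dtype=bool)
page[1:4, 1:3] = True
page[6:9, 2:3] = True
page[2:5, 6:9] = True
boxes = segment_page(page)
```

Traditional Chinese pages are usually read from the rightmost column to the left, so a caller may want to reverse the column order of the returned boxes.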

(2) Retrieval of Chinese calligraphy characters

Feature extraction:

A calligraphy character is written with a brush instead of a hard pen. The brush makes strokes vary in shape and thickness. Ancient calligraphy also shows considerable degradation caused by natural changes.

We use contour points to represent each calligraphy character, and keep the features of each individual character in the database.

Shape matching:

• Use polar coordinates to represent the characters: divide the directions equally into 8 bins, and divide each bin into 4 areas. Then count the contour points in every bin, as shown in the picture on the slide.
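The 8 × 4 binning can be sketched as follows. This simplified version builds a single histogram around the centroid of the contour points and splits the radius into 4 equal rings. Whether the authors use the centroid or per-point histograms, and how they space the rings, is not stated on the slide, so those choices are assumptions, as is the chi-square distance used to compare two histograms.

```python
import numpy as np


def polar_histogram(points, n_angle=8, n_radius=4):
    """Count contour points in n_angle direction bins x n_radius rings
    around the centroid of the character."""
    pts = np.asarray(points, dtype=float)
    d = pts - pts.mean(axis=0)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    r_max = r.max() if r.max() > 0 else 1.0
    a_bin = np.minimum((theta / (2 * np.pi) * n_angle).astype(int), n_angle - 1)
    r_bin = np.minimum((r / r_max * n_radius).astype(int), n_radius - 1)
    hist = np.zeros((n_angle, n_radius), dtype=int)
    np.add.at(hist, (a_bin, r_bin), 1)
    return hist


def histogram_distance(h1, h2):
    """Chi-square distance between two histograms after normalization."""
    p = h1 / h1.sum()
    q = h2 / h2.sum()
    denom = p + q
    mask = denom > 0
    return 0.5 * float(np.sum((p[mask] - q[mask]) ** 2 / denom[mask]))


# Four toy contour points, and the same shape shifted by (10, 10).
points = [(2, 1), (-1, 2), (-2, -1), (1, -2)]
shifted = [(12, 11), (9, 12), (8, 9), (11, 8)]
hist = polar_histogram(points)
hist_shifted = polar_histogram(shifted)
```

Because the bins are measured from the centroid and the radius is normalized by the largest distance, the descriptor does not change when the character is translated or uniformly scaled.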

Speed-up strategy:

• Coarse-to-fine strategy
• Improved shape matching algorithm
  • dynamic time warping (DTW) of projection histograms
  • extended DTW for 2D calligraphy contour warping
• High-dimensional indexing
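For reference, classic dynamic time warping on one-dimensional projection histograms can be written in a few lines. This is the standard textbook recurrence, not the authors' extended 2D variant, and the helper `column_projection` (ink count per image column) is a hypothetical name.

```python
def column_projection(image):
    """Projection histogram: number of ink pixels (1s) in each column."""
    return [sum(col) for col in zip(*image)]


def dtw_distance(a, b):
    """Classic DTW cost between two sequences with absolute difference
    as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two toy characters: the second is the first stretched by one column.
char_a = [[1, 0, 1],
          [1, 1, 1]]
char_b = [[1, 0, 0, 1],
          [1, 1, 1, 1]]
proj_a = column_projection(char_a)
proj_b = column_projection(char_b)
distance = dtw_distance(proj_a, proj_b)
```

Since DTW allows one column of a histogram to align with several columns of the other, a horizontally stretched character still matches at zero cost here, which a plain bin-by-bin comparison would not achieve. Its cost is quadratic in the histogram length, which is one reason a coarse-to-fine strategy helps.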

Visualization of Chinese calligraphy

Shape-based character retrieval

[Figure: submit an example character; retrieval result]


5. Building Personalized Portal

Personalized portal

Web personalization is a technique that helps users quickly locate interesting information, which features multimedia and cross-media.

• Service integration around the content

• Information filtering based recommendation

Show me the information that I really need!

Personalized portal

Personalization services provided by the portal:
• my bookshelf
• my bookmark
• my rules
• personal profile setting

[Figure: My bookshelf, My bookmark, and books recommended by rules]

Service integration around the content

• detailed information about the book
• translate metadata
• full-text search
• my bookshelf management
• ranking
• CALIS union catalog and inter-library loan

• “My bookshelf” management
• “My bookmark” management
• bilingual translation
• full-text search

Information filtering based recommendation

The classification of Web data:
• content data: texts, images……
• structure data: XML/HTML tags
• usage data: Web access logs
• user profile: preferences, demographic information

Implementing information filtering techniques:
• content-based filtering method
• collaborative filtering method
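To make the collaborative filtering method concrete, here is a minimal user-based sketch: users are compared by the cosine similarity of their ratings, and unread books are scored by the similarity-weighted ratings of other users. The rating data, user names, and book names are invented for illustration, and the slides do not say which variant of collaborative filtering the portal uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[b] * v[b] for b in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Rank books the user has not rated by similarity-weighted ratings
    of the other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for book, rating in other_ratings.items():
            if book not in ratings[user]:
                scores[book] = scores.get(book, 0.0) + sim * rating
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


# Hypothetical ratings, e.g. derived from bookshelves or the Web access log.
ratings = {
    "alice": {"calligraphy_a": 5, "poetry_b": 4},
    "bob": {"calligraphy_a": 5, "poetry_b": 5, "painting_c": 4},
    "carol": {"physics_d": 5, "poetry_b": 1, "math_e": 5},
}
suggestions = recommend(ratings, "alice")
```

A content-based filter would instead compare the keywords or features of unread books with those of books the user already liked, without looking at other users.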

6. Conclusion

• The next generation of digital library shall focus more on multimedia, and finally on cross-media retrieval.

• But more research issues remain to be faced……
  • Cross-media representation framework
  • Cross-media knowledge-based reasoning
  • Analysis and recognition
  • Complex retrieval

Thanks!