Cross-media Intelligent Searching in Digital Library


Yueting Zhuang

Zhejiang University, China

Nov. 18, 2006, Egypt

ICUDL06, YT Zhuang

Outline

1. CADAL: China digital library

2. Our vision of the next generation of digital library

3. From Multimedia Retrieval to Cross-media Retrieval

4. Retrieval of Chinese calligraphy characters: a cross-media practice

5. Building Personalized Portal

6. Conclusion

3rd Workshop 2004, CMU, USA

ICUDL 2005, Zhejiang University, China

1. CADAL: China Digital Library

China-US One Million Book Digital Library Project

a unique library resource for scholars, students, and citizens

contains over one million scanned books

A big step towards the goal: create a universal free-to-read digital library
• Make knowledge available on the web to anyone, anytime, anywhere

http://www.cadal.zju.edu.cn

As of today, CADAL has achieved:

1.023 million books have been digitized, including:
• Degree dissertations
• Modern Chinese books
• Traditional cultural resources
• English books

Supporting multimedia resources:
• Image
• Audio
• Video
• 3D model
• Chinese calligraphy

• about 200,000 clicks a day (http://www.cadal.zju.edu.cn)
• users spread over 70 countries and regions
• 16 scanning centers in China, occupying more than 2000 square meters

Scanning books

Processing digitized books

Map labels: Chengdu, Changchun, Xi'an, Guangzhou, Beijing, Nanjing, Shanghai, Hangzhou, Wuhan

Users spread over 70 countries and regions

Service structure of CADAL:

CALIS Integration

Unified Authentication

Personal Portal

Personal Service

Unified Quick Search

Advanced Search

Knowledge Map

Sign Language

Movie Search

Calligraphy Search

Image Search

Cultural Relics

Illustration Search

Bilingual Translation

Help System

Full-Text Search

Metadata Harvesting

Resource Location

Access Control Policy

User Management Logging

Digital resources are classified into 8 classes according to publication time and type.

Both unified and advanced search are provided for all resources.

Current services provided by CADAL:

(1) Metadata searching

(2) Unified search

Choose the types of resources to search.

Search results contain each type of resource.

(3) Advanced search

Users can choose the search scope, combined results, and result style.

Second search, full texts and detailed information are available on the result page.

(4) Full-text search

Full-text search uses the text produced by OCR.

Outline

2. Our Vision of the Next Generation of Digital Library

support multimodal sources

enable cross-media retrieval

What does the next generation of DL look like?

Typical features of existing DLs:
• books are indexed by title, author, keywords…
• users query books by keyword input
• mostly only text information is returned
• multimodal data is not fully supported

Extension of the concept of “Book”

The key to our vision of the next generation of digital library is the extension of the “book” concept.
• A book is regarded as not only the written symbols on paper, but also any type of multimedia “item”, such as:
  A video clip
  An audio clip
  A piece of painting
  ……

So in the next generation of DL, a “book” can be multimodal:

Scenery Image Chinese Calligraphy Video fragment Audio clips

……

a general data representation for multimodal data

feature analysis knowledge mining

We can find a general data structure to represent multimodal “books”

Supporting multimodal data is an important trend in multimedia retrieval:

We get multimodal information from the real world. Can we then get multimodal data from the digital world, especially from a digital library?

[Figure: real world vs. digital world (multimodal?)]

Cross-media retrieval

After the extension of the “Book” concept, retrieval shall also be extended.

We call it “cross-media retrieval”.

Scenario: a simple example of cross-media retrieval:

Starting Query

A user can start a query from any type of media, and relevant multimedia data will be returned.

Textual description of the giant panda: the Panda is a kind of cat which ……

“Giant Panda” Image

“Giant Panda” Text “Giant Panda” Audio

Cross-media retrieval is a useful way to access multimodal data:

Cross-media retrieval can be regarded as a simulation of the real world, and it helps us get multimodal data in a more flexible and more informative way!

[Figure: texts, image, audio, video, ……]

What does cross-media retrieval need to do?

user query interface

Submit a query example

It can be an image, audio or keywords…

cross-media search engine

texts image audio video

raw data

knowledge base

multimodal representation & index

query results:

texts, images, audios…

Outline

3. From Multimedia Retrieval to Cross-media Retrieval

(1) Image retrieval: content-based

negative example

query example

Searching images

relevance feedback

positive example

multimedia retrieval

(2) Image retrieval: text-based

Query text

(3) Motion retrieval

Given a query example of motion data, we can find similar motion data from database.

(4) Audio retrieval: Content-based

content-based audio search engine

audio depository

audio query example

submit

adjust feature weight

adjust query center

returned audio results

return

relevance feedback

user judge

System Framework

audio retrieval: key techniques

• extract auditory features in the compressed domain from audio clips
• cluster the auditory features with fuzzy clustering
• represent audio clips with the cluster centers
• retrieve similar audio by cluster center matching
• introduce relevance feedback techniques
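The framework labels "adjust query center" and "adjust feature weight" can be illustrated with a minimal sketch. The slides do not say which update rules are used, so this example swaps in a Rocchio-style center update and inverse-deviation weights as assumptions; the function names and the parameters `alpha`, `beta`, `gamma` are hypothetical.

```python
from statistics import pstdev
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def adjust_query_center(query: Vector, positives: List[Vector],
                        negatives: List[Vector], alpha: float = 1.0,
                        beta: float = 0.75, gamma: float = 0.25) -> Vector:
    """Rocchio-style update (assumed rule): move towards positive examples
    and away from negative ones."""
    zero = [0.0] * len(query)
    pos = mean_vector(positives) if positives else zero
    neg = mean_vector(negatives) if negatives else zero
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]


def adjust_feature_weights(positives: List[Vector], eps: float = 1e-6) -> Vector:
    """Assumed rule: a feature on which the positive examples agree
    (small deviation) gets a larger weight. Weights sum to 1."""
    raw = [1.0 / (pstdev(col) + eps) for col in zip(*positives)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(a: Vector, b: Vector, weights: Vector) -> float:
    """Weighted Euclidean distance used to re-rank the audio clips."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


# Two clips marked relevant by the user; they agree on feature 0 only.
positives = [[1.0, 0.0], [1.0, 2.0]]
center = adjust_query_center([0.0, 0.0], positives, negatives=[])
weights = adjust_feature_weights(positives)
```

After each feedback round the repository would be re-ranked by `weighted_distance` to the new center.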

query example

feature weight

relevance feedback

weight adjusting

audio retrieval: an example

(5) video retrieval: Overview

Unlike text resources, video is unstructured:
• rich in visual content;
• poor in semantic understanding;

The challenging issues:
• summarization & structuring;
• video mining

(5) video retrieval: key techniques

video structuring:
• construct a video table of contents (VTOC)
• make it physically structured

video summarization:
• help the user quickly grasp the content of video clips
• support video browsing
• video encoding/compression

key frame

concept clustering

video stream

temporal features

spatial features

table of contents

shot boundary detection

Key Frame Extraction

grouping

scene construction

video structuring
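The first two stages of the pipeline above (shot boundary detection, key frame extraction) can be sketched as follows. The slides do not name the detection method; this example assumes a simple gray-level histogram difference with a fixed threshold, and picks the middle frame of each shot as its key frame. Frames are modeled as flat lists of gray values, and all names are hypothetical.

```python
from typing import List

Frame = List[int]  # flat list of gray values in 0..255


def gray_histogram(frame: Frame, bins: int = 4, max_value: int = 256) -> List[float]:
    """Normalized gray-level histogram of one frame."""
    hist = [0] * bins
    for pixel in frame:
        hist[pixel * bins // max_value] += 1
    return [h / len(frame) for h in hist]


def shot_boundaries(frames: List[Frame], threshold: float = 0.5) -> List[int]:
    """Indices i where a new shot starts, i.e. where the histogram of
    frame i differs strongly from that of frame i - 1."""
    hists = [gray_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts


def key_frames(frames: List[Frame], threshold: float = 0.5) -> List[int]:
    """One key frame index per shot: the middle frame of the shot."""
    if not frames:
        return []
    cuts = shot_boundaries(frames, threshold)
    starts = [0] + cuts
    ends = cuts + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


# Two dark frames followed by three bright frames: one cut at index 2.
video = [[10] * 4, [12] * 4, [200] * 4, [210] * 4, [205] * 4]
cuts = shot_boundaries(video)
keys = key_frames(video)
```

Grouping and scene construction would then operate on the key frames rather than on every frame of the stream.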

video summary: video content mining

original video (redundant)

summarized video (concise and informative)

video content mining

Find meaningful patterns to support efficient video browsing

Two news videos are separated into 6 video shots (the following are the key frames), and their total length is 3 minutes.

video summary: an example

After video summarization, the video is 3 seconds long.

It consists of 3 key frames, as shown below.

video shot clustering result

video shot

original video

similar video shots are clustered together

Video Retrieval

video browse

key frames

video browse

summary

(6) 3D model retrieval: overview

measure 3D model with shape similarity

(6) 3D model retrieval: an example

query example

As shown above, multimedia retrieval is generally content-based X retrieval (CBXR).

towards cross-media Retrieval

Motivation

image retrieval

audio retrieval

video retrieval

motion retrieval

3D model retrieval

Cross-media retrieval……

intelligent integration

We can provide a more flexible and efficient way to access multimodal data.

We call it cross-media retrieval.

Support multimodal sources:
• smooth integration of multimodal data;
• query media objects by examples of different modalities;

Challenging issues:
• texts, images, audio, etc. are represented with different features
• different features are heterogeneous
• cross-media similarity can’t be measured by content features
• there is a semantic gap between low-level features and semantics

Our Solution to Cross-media retrieval

• build cross-indexing from multimodal data
• organize multimedia documents
• explore cross-media correlations
• ……

Cross-indexing-based retrieval: General idea

graphics

text search engine

image search engine

audio search engine

video search engine

graphics search engine

preprocessing

cross-index

cross-index multimodal

search engine

SVM based

clustering

Retrieval interface

query

search results fusion

results

relevance feedback

……

an image query example

retrieved images

retrieved video

retrieved audio

(1) Cross-index retrieval: interface

The system now supports images, audio and video. Users can submit any of these media objects, and the system returns relevant images, audio and video.

Building multimedia document: General idea

Definition of a multimedia document:
• a logical representation of multimodal data;
• consists of semantically related media objects;

Formal structure:

Document := <ID, Title, URI, KeywordList, ElementSet, LinkSet>
ElementSet := { (Audio | Image | Text | Video)_i | i ∈ N }
Audio := <ID, ParentID, URI, Size, KeywordList, AudioFeature>
Image := <ID, ParentID, URI, Size, KeywordList, ImageFeature>
Text := <ID, ParentID, URI, KeywordList>
Video := <ID, ParentID, URI, Frames, KeywordList, VideoFeature>
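The formal structure above maps naturally onto record types. The following is a minimal sketch in Python, not the system's actual schema: field types are guesses, features are modeled as plain float lists, and LinkSet (which the slide does not define) is assumed here to be pairs of element IDs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Text:
    id: str
    parent_id: str
    uri: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class Image:
    id: str
    parent_id: str
    uri: str
    size: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # ImageFeature


@dataclass
class Audio:
    id: str
    parent_id: str
    uri: str
    size: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # AudioFeature


@dataclass
class Video:
    id: str
    parent_id: str
    uri: str
    frames: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # VideoFeature


Element = Union[Audio, Image, Text, Video]


@dataclass
class Document:
    id: str
    title: str
    uri: str
    keywords: List[str] = field(default_factory=list)
    elements: List[Element] = field(default_factory=list)  # ElementSet
    # LinkSet: assumed to be (source element ID, target element ID) pairs.
    links: List[Tuple[str, str]] = field(default_factory=list)

    def elements_of(self, kind: type) -> List[Element]:
        """All media objects of one modality, e.g. elements_of(Image)."""
        return [e for e in self.elements if isinstance(e, kind)]


# Illustrative document with made-up IDs and URIs.
panda = Document(id="doc-1", title="Giant Panda", uri="doc://1", keywords=["panda"])
panda.elements.append(Text(id="t1", parent_id=panda.id, uri="doc://1/t1"))
panda.elements.append(Image(id="i1", parent_id=panda.id, uri="doc://1/i1",
                            size=2048, feature=[0.1, 0.7]))
panda.links.append(("t1", "i1"))
```

Keeping every element's `parent_id` pointing at its document is what lets a hit on one media object lead back to the semantically related objects of other modalities.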

Build multimedia document: framework

graphics

Semantic skeleton base

Storage Subsystem

Multimedia document

Preprocessing

Learning and Relevance feedback subsystem

Query Processor (multimedia document + media objects)

keyword

Besides keyword-based search, the user can perform a content-based search with a specific media object as the query example.

A multimedia document is visualized as its sketch, i.e. text, images and key-frame lists for videos.

image video text multimedia document

The left figure shows the relevant media data retrieved by the query “water”.

Building multimedia document: retrieval interface

Challenges:

visual feature space auditory feature space

high-level semantics： war, dog, bird, car, tiger

Gap 2: Semantic gap

1. multimodal data reside in heterogeneous feature spaces

2. the semantic gap

Gap 1: Content gap

Exploring cross-media correlations: challenges

Images and audios represent high-level semantics from different perspectives. If we can find the correlation between different perspectives, we can enable cross-media retrieval with the bridge of correlations.

bird explosion tiger dog car

correlation

Exploring Cross-media Correlations: Solutions

Canonical correlation analysis

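Basic idea:

Input: image feature matrix $X_{n \times p}$ and audio feature matrix $Y_{n \times q}$

$$
X_{n \times p} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\qquad
Y_{n \times q} =
\begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1q} \\
y_{21} & y_{22} & \cdots & y_{2q} \\
\vdots & \vdots &        & \vdots \\
y_{n1} & y_{n2} & \cdots & y_{nq}
\end{pmatrix}
$$

X and Y are of different dimension!

Output: $X'_{n \times m}$ and $Y'_{n \times m}$

$$
X'_{n \times m} =
\begin{pmatrix}
x'_{11} & x'_{12} & \cdots & x'_{1m} \\
x'_{21} & x'_{22} & \cdots & x'_{2m} \\
\vdots  & \vdots  &        & \vdots \\
x'_{n1} & x'_{n2} & \cdots & x'_{nm}
\end{pmatrix},
\qquad
Y'_{n \times m} =
\begin{pmatrix}
y'_{11} & y'_{12} & \cdots & y'_{1m} \\
y'_{21} & y'_{22} & \cdots & y'_{2m} \\
\vdots  & \vdots  &        & \vdots \\
y'_{n1} & y'_{n2} & \cdots & y'_{nm}
\end{pmatrix}
$$

X' and Y' are of the same dimension!

At the same time, the correlation between X and Y maximally coincides with the correlation between X' and Y'.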

Exploring cross-media correlations: mathematical realization
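For reference, the standard textbook formulation of canonical correlation analysis is sketched below. It is not taken from the slides: the projection vectors $w_x$, $w_y$, the covariance matrices $C_{xx}$, $C_{yy}$, $C_{xy}$ and the stacked projections $W_x$, $W_y$ are notation introduced here, and $X$ (size $n \times p$) and $Y$ (size $n \times q$) are assumed to be column-centered.

```latex
% Sample covariance matrices of the centered feature matrices
C_{xx} = \tfrac{1}{n-1} X^{\top} X, \qquad
C_{yy} = \tfrac{1}{n-1} Y^{\top} Y, \qquad
C_{xy} = \tfrac{1}{n-1} X^{\top} Y = C_{yx}^{\top}

% CCA looks for directions w_x in R^p and w_y in R^q whose projections
% X w_x and Y w_y are maximally correlated
\rho = \max_{w_x,\, w_y}
  \frac{w_x^{\top} C_{xy}\, w_y}
       {\sqrt{\left(w_x^{\top} C_{xx}\, w_x\right)\left(w_y^{\top} C_{yy}\, w_y\right)}}

% The maximizers solve an eigenvalue problem (C_xx, C_yy assumed invertible)
C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}\, w_x = \rho^{2}\, w_x, \qquad
w_y \propto C_{yy}^{-1} C_{yx}\, w_x

% Stacking the top m <= min(p, q) directions as columns of W_x (p x m) and
% W_y (q x m) maps both modalities into one m-dimensional subspace
X' = X\, W_x, \qquad Y' = Y\, W_y
```

In that shared subspace an image row of $X'$ and an audio row of $Y'$ have the same length $m$, so an ordinary distance or cosine measure can compare them directly.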

the correlation network in the subspace

locate

1. How to measure both intra- and inter-media correlations?

2. How to introduce new media objects into the system?

locate

testing data

Intra-media

cross-media

Exploring cross-media correlations: subsequent challenges

Outline

4. Retrieval of Chinese Calligraphy Character

Motivation: Original calligraphy works are unique. They exist on paper and bamboo slips, and are easily destroyed.

How to search?

In our digital library, we digitize Chinese calligraphy works and design retrieval systems to make them sharable by everyone on the Internet.

The objectives:

1. to query similar characters

Similar characters can be found and returned to users. This is like traditional content-based image retrieval.

2. to find out where a character comes from

We aim to provide an intelligent way to find the surrounding characters and present them to users.

Character “其” comes from this work

System Overview

segmentation

individual

characters

feature extraction

Database

feature data

raw data

scanner

Ancient Books

digitize

search engine

(1) segmentation:
• noise elimination
• page-image analysis
• smoothing

(2) retrieval:
• feature extraction
• shape matching
• speed up

(1) segmentation

We segment each page into columns, and cut the columns into individual characters, each within its minimum bounding box.

minimum-bounding box
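The column-then-character segmentation can be illustrated with a projection-profile sketch. The slides do not state how columns and characters are separated, so cutting at empty gaps in the ink profile is an assumption, as are the function names. The page is modeled as a binary matrix (1 = ink).

```python
from typing import List, Tuple

Page = List[List[int]]          # rows of 0/1 pixels, 1 = ink
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), ends exclusive


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """(start, end) spans of consecutive non-zero entries, end exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(page: Page) -> List[Box]:
    """Split the page into columns at empty vertical gaps, then split each
    column into characters at empty horizontal gaps, and tighten every
    character to its minimum bounding box."""
    width = len(page[0])
    column_profile = [sum(row[x] for row in page) for x in range(width)]
    boxes = []
    for left, right in runs(column_profile):
        row_profile = [sum(row[left:right]) for row in page]
        for top, bottom in runs(row_profile):
            xs = [x for y in range(top, bottom)
                  for x in range(left, right) if page[y][x]]
            boxes.append((top, min(xs), bottom, max(xs) + 1))
    return boxes


# Two text columns (x = 0..1 and x = 4), two characters in each.
page = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
boxes = segment_characters(page)
```

Real scans would need the noise elimination and smoothing steps first, since a single stray pixel in a gap would merge two characters in this simple scheme.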

(2) Retrieval of Chinese Calligraphy Characters

feature extraction:

We use contour points to represent each calligraphy character, and keep the features of each individual character in the database.

Calligraphy characters are written with a brush instead of a hard pen. The brush makes strokes vary in shape and thickness. Ancient calligraphy also shows considerable degradation caused by natural changes.

shape matching:

• use polar coordinates to represent the characters:

divide the directions into 8 equal bins, and divide each bin into 4 areas. Then count the points in every bin, as shown in the picture.
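The 8 x 4 polar binning can be sketched as below. Two details are not given in the slides and are assumed here: the pole is placed at the centroid of the contour points, and the 4 areas are equal-width rings out to the farthest point. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polar_histogram(points: List[Point], n_angles: int = 8,
                    n_rings: int = 4) -> List[List[int]]:
    """Count contour points in n_angles direction bins x n_rings areas.

    Assumptions: the pole is the centroid of the points, and rings are
    equally wide up to the largest radius."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    max_r = max(radii) or 1.0  # avoid division by zero for a single point
    hist = [[0] * n_rings for _ in range(n_angles)]
    for (x, y), r in zip(points, radii):
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        a = min(int(angle / (2 * math.pi) * n_angles), n_angles - 1)
        ring = min(int(r / max_r * n_rings), n_rings - 1)
        hist[a][ring] += 1
    return hist


def histogram_distance(h1: List[List[int]], h2: List[List[int]]) -> int:
    """L1 distance between two histograms (one simple matching cost)."""
    return sum(abs(a - b) for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2))


# Four contour points placed symmetrically around the origin.
contour = [(2, 1), (-1, 2), (-2, -1), (1, -2)]
hist = polar_histogram(contour)
```

Because the histogram only counts points per bin, it tolerates the small stroke deformations that brush writing produces.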

speed-up strategy:

• coarse-to-fine strategy

• improved shape matching algorithm
  • dynamic time warping (DTW) of projection histograms
  • extended DTW for 2D calligraphy contour warping

• high-dimensional indexing
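Dynamic time warping of projection histograms can be sketched with the classic DTW recurrence. This is the textbook 1D algorithm, not the extended 2D contour version mentioned above; the row-wise projection and the absolute-difference local cost are assumptions.

```python
from typing import List, Sequence


def projection_histogram(image: List[List[int]]) -> List[int]:
    """Count ink pixels in each row of a binary image (1 = ink)."""
    return [sum(row) for row in image]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping cost between two sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each step may
    advance in a, in b, or in both, so stretched strokes still align."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a
                                     cost[i][j - 1],      # advance in b
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# The same profile, once stretched: DTW absorbs the stretch completely.
short_profile = [0, 1, 2]
stretched_profile = [0, 1, 1, 2]
stretch_cost = dtw_distance(short_profile, stretched_profile)
```

A coarse-to-fine search could use this cheap 1D comparison to discard unlikely candidates before running the more expensive contour matching.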

Visualization of Chinese Calligraphy

Shape-based character retrieval

Retrieval result

Submit Example

Outline

Personalized portal

Web personalization is the technique that helps users quickly locate interesting information, which features multimedia and cross-media.

Service integration around the content

Information filtering based recommendation

Show me the information that I really need !

personalized portal

Personalization services provided by the portal:
• my bookshelf
• my bookmark
• my rules
• personal profile setting

My bookshelf

My bookmark

Books recommended by rules

• detailed information about a book
• translate metadata
• full-text search
• my bookshelf management
• ranking
• CALIS union catalog and inter-library loan

• “My bookshelf” management
• “my bookmark” management
• bilingual translation
• full-text search

service integration around the content

information filtering based recommendation

The classification of Web data:
• content data: texts, images……
• structure data: XML/HTML tags
• usage data: Web access logs
• user profile: preferences, demographic information

Implementing information filtering techniques:
• content-based filtering method
• collaborative filtering method
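The collaborative filtering method can be illustrated with a small user-based sketch over "my bookshelf" data. The slides do not describe the actual recommender, so the Jaccard similarity, the scoring rule, and all user and book names are assumptions made for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two bookshelves: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, shelves: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """User-based collaborative filtering: score each unseen book by the
    summed similarity of the other users who have it on their shelf."""
    mine = shelves[target]
    scores: Dict[str, float] = {}
    for user, shelf in shelves.items():
        if user == target:
            continue
        similarity = jaccard(mine, shelf)
        for book in shelf - mine:
            scores[book] = scores.get(book, 0.0) + similarity
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [book for book, score in ranked[:top_n] if score > 0]


# Hypothetical bookshelves of three portal users.
shelves = {
    "ann": {"A", "B"},
    "bob": {"A", "B", "C"},
    "cat": {"D"},
}
suggestions = recommend("ann", shelves)
```

Content-based filtering would instead compare a book's own keywords or features with the user profile, so the two methods can complement each other when usage data is sparse.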

6. Conclusion

• The next generation of digital library shall focus more on multimedia, and finally on cross-media retrieval.

• But more research issues remain to be faced……

• Cross-Media Representation Framework
• Cross-Media Knowledge-based Reasoning
• Analysis and Recognition
• Complex retrieval

Thanks!
