Visual Information Systems: Lessons for its Future

Prof. D. Petkovic, SFSU (Dpetkovic@cs.sfsu.edu)
Prof. R. Jain, UC Irvine

SPIE, January 2005, San Jose


Goals

• Look at the past and present of visual information systems from our standpoint as computer science researchers in content-based retrieval, CV, AI, and multimedia
• Analyze progress and status
• Identify future opportunities and challenges in making content-based retrieval and our work part of successful applications
• Discuss the role of CV, AI, and content-based retrieval researchers in future development

This talk is intended to be critical and self-critical, and to call for action and for changes in the way we do our work.

Assumption: ultimately, research has to influence real-world applications.


What are visual information systems?


[Diagram: image and video, with the information functions retrieval, query, browsing, and visualization]

Are visual information systems only this? (the content-based retrieval, CV- and AI-centric view)


[Diagram: image and video surrounded by metadata, related data, links to related info, WWW info, location, time, measurements, and audio, all joined by integration; information functions: retrieval, query, browsing, visualization, and delivery of all the information]

No, they are all this!


What do users do with visual information systems?

• Search or browse for images/video of current interest, and then review/play back/process the results?

• But most often: search or browse for information and knowledge of which image/video is only one aspect, in order to:
  – Entertain
  – Learn
  – Explore
  – Investigate/experiment/evaluate
  – Communicate
  – Teach, train
  – Manage personal data


Examples of commercial or near-commercial visual info systems

• Recording and sharing of personal visual data
  – MyLifeBits (Microsoft Bay Area Research)
    • http://www.research.microsoft.com/barc/MediaPresence/MyLifeBits.aspx
  – Internet photo albums
    • http://photos.yahoo.com/ph//my_photos

• Scientific research and education
  – Astronomy: SkyServer (Microsoft Bay Area Research)
    • http://cas.sdss.org/dr3/en/
  – Bioinformatics
    • http://hedgehog.sfsu.edu/home/index.aspx
    • Cell video


Examples (2)

• News
  – http://news.yahoo.com/

• Entertainment
  – http://movies.yahoo.com/

• Visual info sharing on the WWW
  – http://video.search.yahoo.com/

• Art
  – Getty Museum
    • http://www.getty.edu/art/
  – Hermitage
    • http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English


Examples (3)

• Remote sensing and surveillance
  – http://www.landsat.org/

• Training and education
  – http://www.employeeuniversity.com/corporatevideotraining/index.htm
  – http://coursestream.sfsu.edu/

• Biometrics
  – Face recognition and matching
  – Fingerprints
  – Iris


Search vs. browsing of manually prepared material

• It is not always search/query over indexed collections (whether manually or automatically indexed)

• Very often (entertainment, online learning) the primary function is browsing a limited list of well-organized, manually prepared material

• The “carefully prepare once – show/sell many times” paradigm justifies the investment in, and the need for, manual expert preparation
  – Movie trailers are works of art whose purpose is marketing and sales; a high level of expert manual preparation is required and will likely stay that way

• Currently, the market for “carefully prepare once – show/sell many times” is much larger than the market for search

• Search is most often driven by current (changing) interests


Some history and perspective…

• Early nineties (BI, “before Internet”): excitement of early discovery and (over)promises

• Mid to late nineties: explosion of the Internet, things are hot! Promise of ubiquitously available data. Furious work to achieve goals (research and startup community). WWW media emerging. MPEG7 started

• 2000 and beyond: “crash” of Internet R&D (i.e., the Internet became ubiquitous). Promises of content-based retrieval still unfulfilled. Visual information systems applications doing well (media is an integral part of WWW applications), but with little use of content-based retrieval techniques


Content-based Retrieval – the early dream

• It was immediately recognized that indexing (i.e., attaching searchable metadata to) images and video is a big problem

• Idea of content-based retrieval:
  – Process images and videos to automatically extract searchable indices, making heavy use of AI, CV, and pattern recognition (PR)
  – Use the indexes for search and retrieval by “similarity” (“show me images like this”)

• Content-based retrieval was ultimately supposed to make indexing and searching of vast image/video databases automated and economical, and to reduce or even eliminate the need for text metadata

• Great excitement among the research community and some potential customers, and many excellent pieces of work
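To make “show me images like this” concrete, here is a minimal sketch of query by example over global color histograms, the kind of basic feature these early systems relied on. It assumes images arrive as lists of (r, g, b) pixel tuples, quantizes each channel into a few bins, and ranks a collection by histogram intersection. The function names and the synthetic images are hypothetical, and this is not how QBIC or any particular product computed similarity.

```python
from __future__ import annotations

from collections import Counter


def color_histogram(pixels: list[tuple[int, int, int]], bins_per_channel: int = 4) -> dict:
    """Quantize each 0-255 RGB channel into bins (assumes bins_per_channel divides 256).

    Returns a normalized histogram: bin -> fraction of pixels.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1: dict, h2: dict) -> float:
    """Similarity in [0, 1] for normalized histograms: sum of the per-bin minima."""
    return sum(min(w, h2.get(bin_, 0.0)) for bin_, w in h1.items())


def query_by_example(query_pixels, collection, top_k=3):
    """Rank images in `collection` (name -> pixel list) by color similarity to the query."""
    q = color_histogram(query_pixels)
    scored = [(name, histogram_intersection(q, color_histogram(px)))
              for name, px in collection.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical synthetic "images", each a list of 100 RGB pixels.
sunset = [(250, 120, 30)] * 60 + [(40, 20, 90)] * 40
beach = [(240, 200, 150)] * 50 + [(30, 120, 220)] * 50
forest = [(20, 140, 40)] * 80 + [(90, 60, 30)] * 20
collection = {"sunset": sunset, "beach": beach, "forest": forest}

query = [(245, 125, 35)] * 70 + [(45, 25, 95)] * 30
for name, score in query_by_example(query, collection):
    print(f"{name}: {score:.2f}")
```

A global histogram like this ignores spatial layout entirely, which is one reason early systems also offered composition-based features.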


QBIC history

• Excitement among researchers at IBM Almaden Research. The first prototype was very exciting and generated a lot of related work in the research community. Many good papers and patents (QBIC and others)

• Excitement among early customers in art and stock imaging/video

• Marketing was at first skeptical but then started to oversell (e.g., “you do not need text metadata any more”); we needed to get involved to tone it down

• Transferred into IBM Digital Library and DB2

• Business (real $) did not happen:
  – It was hard to estimate QBIC's added value
  – QBIC search was limited
  – Too early (this was before or in the early Internet times)

• But QBIC did bring good marketing and attention to the multimedia features of IBM DB2 and DL. It was used successfully as a marketing tool
  – http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English

• The IBM QBIC group lost some credibility in IBM product divisions

• QBIC grew into CueVideo (video + audio indexing and search)


Status today: successful visual information systems applications


Characteristics of successful commercial systems today

• Content consists of images/video but also of a variety of critically important related data (text, audio, prices, links, measurements, etc.), arranged in an easy-to-use GUI

• Indexing and data organization are done predominantly manually, with predefined, simple metadata structures and ontologies

• Metadata schemas are defined by domain professionals, not computer scientists. Most are very simple. MPEG7 is not widely used

• Search is very simple: title, author, and sometimes a few keywords against manually entered data

• Browsing: by alphabet, time, or price, using video key frames and image thumbnails, often from manually prepared collections

• Content-based indexing and retrieval are not used


How is indexing of images/video done today?

• Manually entered metadata, usually from a fixed list/structure

• Defined metadata structures into which content providers can publish their content (many standards exist). The most used standards are relatively simple (e.g., Really Simple Syndication, RSS: http://blogs.law.harvard.edu/tech/rss)

• WWW: crawlers analyze the image “context”: where on the WWW page the image is, ALT tags, associated text linked to the image, etc.

• Use of manually generated closed captions for video indexing

• Only very rudimentary content-based analysis: image type, dimensions, whether the image is color or B&W, photographic or clip art, etc.

• Even basic content-based retrieval (color histograms, composition) is practically not used
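As an illustration of the “context” approach, the sketch below uses Python's standard `html.parser` to attach keywords to each image on a page from its ALT attribute and the text nearest to it. The class name, the nearest-text heuristic, and the sample page are assumptions for illustration. Production crawlers use richer, unpublished rules.

```python
import re
from html.parser import HTMLParser


class ImageContextParser(HTMLParser):
    """Collect every <img> with its ALT text and the nearest text before and after it."""

    def __init__(self):
        super().__init__()
        self.images = []      # one dict per image: src, alt, text_before, text_after
        self._last_text = ""  # most recent non-empty text chunk seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            self.images.append({
                "src": a.get("src") or "",
                "alt": a.get("alt") or "",
                "text_before": self._last_text,
                "text_after": "",
            })

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # The first non-empty text after an image is kept as its caption-like context.
        if self.images and not self.images[-1]["text_after"]:
            self.images[-1]["text_after"] = text
        self._last_text = text


def index_terms(image):
    """Lowercase keywords (longer than two characters) from ALT text and surrounding text."""
    text = " ".join([image["alt"], image["text_before"], image["text_after"]])
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}


# Hypothetical sample page.
page = """
<html><body>
  <h2>Our trip to San Francisco</h2>
  <img src="bridge.jpg" alt="Golden Gate Bridge at sunset">
  <p>The bridge seen from the Marin Headlands.</p>
</body></html>
"""

parser = ImageContextParser()
parser.feed(page)
for img in parser.images:
    print(img["src"], sorted(index_terms(img)))
```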


Content-based Retrieval – why it is not enough

• Assume content-based retrieval worked perfectly. What could it ultimately do?
  – Image color and spatial composition
  – Recognition/matching of some major objects (people, buildings)
  – Motion and action recognition
  – Full speech to text

• Even this ideal situation is not enough! We also need:
  – Other info about the image/video (when, who, where, what, related scientific measurements…)
  – Who, where, when, and why
  – Related data and links to related data, etc.
  – Integration and synchronization with other sources of data across the semantic, time, location, and cause/effect dimensions

…and much more, none of it recorded in pixels


What next?


Future opportunities and challenges – some ideas

• Improve the process of media annotation and indexing (automated and semi-automated)

• Define visual ontologies: application-specific first, then more general

• Leverage and improve speech recognition, both general and domain-specific

• Integrate a variety of data (media and related data) and provide unified multimedia modeling and handling

• Incorporate time and location search into the mainstream


Improve the process of media annotation and indexing

• Automate the metadata that lend themselves to automation. Leverage semi-automated means, but pay utmost attention to HCI

• Compute indexes based on all related data and clues (WWW links, tags, audio, GPS, etc.)

• Allow multimedia annotation to help: add text, outline image/video objects by pointing, add links…

• Use the power of the Internet community to enable economical media annotation
  – e.g., the ESP annotation game by CMU, www.espgame.org

• Improve usability to enable annotation at the most opportune time (during capture, in free/fun time, etc.) and make it very easy

• Leverage speech (audio tags and speech recognition)

• Pay attention to ease of use and GUI

• Use time and location

• Images and video can be the data, but also an index into libraries
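The ESP game collects labels by pairing two players who describe the same image independently and keeping only the words they agree on. Labels that have already been collected become “taboo”, so later pairs must contribute new ones. The sketch below is a simplified, offline approximation of that idea: agreement is computed from complete guess lists rather than in real time. The function names and input format are hypothetical.

```python
def esp_round(labels_a, labels_b, taboo):
    """Return the first word player A typed that player B also typed, ignoring taboo words."""
    typed_by_b = {w.lower() for w in labels_b}
    for word in labels_a:
        w = word.lower()
        if w in typed_by_b and w not in taboo:
            return w
    return None  # the pair never agreed on a new label


def collect_labels(rounds, taboo=()):
    """Play several rounds on one image; each agreed label becomes taboo for later pairs."""
    taboo = {t.lower() for t in taboo}
    agreed = []
    for labels_a, labels_b in rounds:
        label = esp_round(labels_a, labels_b, taboo)
        if label is not None:
            agreed.append(label)
            taboo.add(label)
    return agreed


# Hypothetical guesses from three player pairs shown the same image.
rounds = [
    (["Dog", "grass", "ball"], ["puppy", "dog", "park"]),
    (["dog", "park", "frisbee"], ["frisbee", "dog"]),
    (["animal"], ["pet"]),
]
print(collect_labels(rounds))  # -> ['dog', 'frisbee']
```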


Define visual ontologies, general or application-specific

• Define an ontology of visual media (structure, terms, etc.) as well as the related extraction procedures

• It is not clear whether a general ontology is practical: work on domain-specific ones first, then try to generalize

• Make it simple and work with domain experts

• Offer procedures for automated and semi-automatic instantiation of ontologies using all available info

• Much work has already been done outside of the CS community (e.g., domain-specific standards for data submissions)
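Here is a minimal sketch of what even a very simple, domain-specific ontology can buy. A hypothetical “is-a” hierarchy is used to expand a query, so that a search for a broad concept also finds media annotated with narrower terms. The terms, the annotation format, and the function names are illustrative assumptions and are not drawn from any of the standards listed in this talk.

```python
# Hypothetical, tiny domain-specific ontology: each term maps to its parent ("is-a") term.
IS_A = {
    "cat": "mammal",
    "dog": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "oak": "tree",
    "tree": "plant",
}


def descendants(term, is_a=IS_A):
    """All terms that are (directly or transitively) a kind of `term`, including itself."""
    children = {}
    for child, parent in is_a.items():
        children.setdefault(parent, []).append(child)
    result, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(children.get(t, []))
    return result


def search(annotations, concept):
    """Return media ids whose annotations mention `concept` or any narrower term."""
    wanted = descendants(concept)
    return sorted(mid for mid, terms in annotations.items() if wanted & set(terms))


# Hypothetical manual annotations.
annotations = {
    "img1.jpg": ["dog", "grass"],
    "img2.jpg": ["sparrow"],
    "img3.jpg": ["oak"],
}
print(search(annotations, "animal"))  # -> ['img1.jpg', 'img2.jpg']
print(search(annotations, "mammal"))  # -> ['img1.jpg']
```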


Links to some metadata standards (most are XML-based and developed outside of CS)

• Dublin Core
  – http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/

• MPEG7 ISO standard for video (class 8)
  – http://www.mpeg.org/MPEG/starting-points.html#mpeg7

• METS for digital libraries
  – http://www.loc.gov/standards/mets/

• AIIM standards (enterprise content management)
  – http://xml.coverpages.org/AIMM-Images200104.html
  – http://xml.coverpages.org/umnImages.html
  – http://digital.lib.umn.edu/elements.html

• Really Simple Syndication (RSS), to be used by Yahoo video search
  – http://blogs.law.harvard.edu/tech/rss
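As a small example of publishing into one of these schemas, the sketch below uses Python's standard `xml.etree.ElementTree` to build an XML record whose elements use the Dublin Core element names and namespace (`http://purl.org/dc/elements/1.1/`). The `metadata` wrapper element and the sample values are assumptions. The DC XML guidelines linked above describe the recommended encodings in more detail.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the conventional "dc:" prefix


def dublin_core_record(fields):
    """Build an XML record from (element_name, value) pairs using Dublin Core element names."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values describing one photograph.
record = dublin_core_record([
    ("title", "Golden Gate Bridge at sunset"),
    ("creator", "J. Smith"),
    ("date", "2004-11-20"),
    ("format", "image/jpeg"),
    ("subject", "bridges"),
    ("coverage", "San Francisco, California"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```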


Leverage and improve speech recognition, general and domain-specific

• Speech and audio carry a wealth of information: semantics and timing. Speech is easily available, natural, and effective for input

• Today's speech recognition engines are trained on general English, without specific names and domain-specific terms. Problem: the terms most often searched for (names, specific domain terms) are not in the speech engine's vocabulary; develop domain-specific speech engines

• Leverage speech and audio as an annotation medium

• Push speech and audio annotation into capture devices

• Synchronize and cross-index speech with related textual data for indexing and increased accuracy
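Speech carries timing as well as semantics, so even a plain inverted index over a time-stamped transcript lets a user jump to the moment a term was spoken. The sketch below assumes a hypothetical recognizer output of (word, start time in seconds) pairs. It also illustrates the vocabulary problem described above: a term outside the recognizer's vocabulary can never be found, however often it was said.

```python
from collections import defaultdict


def build_speech_index(transcripts):
    """Map each lowercase word to a list of (video_id, start_seconds) occurrences.

    `transcripts` maps video_id -> [(word, start_seconds), ...]; this is an assumed
    format for time-stamped speech recognizer output, not any specific engine's API.
    """
    index = defaultdict(list)
    for video_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((video_id, start))
    return index


def find_moments(index, query, vocabulary):
    """Return playback positions where `query` was recognized, or None if out of vocabulary."""
    term = query.lower()
    if term not in vocabulary:
        # The recognizer never emits a word it does not know, e.g. a proper name like "QBIC".
        return None
    return index.get(term, [])


# Hypothetical recognizer output and vocabulary.
transcripts = {
    "lecture1": [("today", 0.0), ("we", 0.4), ("discuss", 0.6),
                 ("histograms", 1.1), ("histograms", 8.2)],
    "lecture2": [("color", 3.0), ("histograms", 3.5)],
}
vocabulary = {"today", "we", "discuss", "histograms", "color"}

index = build_speech_index(transcripts)
print(find_moments(index, "Histograms", vocabulary))
print(find_moments(index, "QBIC", vocabulary))  # -> None
```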


Integrate a variety of data (media and related data) and provide unified multimedia modeling and handling

• Visual information is based on visual media AND related information (links, text, documents, measurements, slides, etc.). Enable integrated indexing, data organization, search, and browsing

• Leverage time and location in indexing and search

• Create unified multimedia data models with unified storage, indexing, and query, across semantics, time, and location

• Integrate across a variety of data types semantically, at the GUI level, and at the system level (e.g., cross-index video with slides and text info)

• From data to information: the old “chasm” still exists; work on it, first by solving some concrete applications
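One hedged illustration of treating time and location as first-class search dimensions: the sketch below keeps capture time and GPS coordinates next to keyword tags and filters on all three at once. It uses the standard haversine great-circle formula for distance. The record layout, sample items, and function names are hypothetical. A real system would use temporal and spatial indexes rather than a linear scan.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def search_media(items, start, end, center, radius_km, keyword=None):
    """Filter media records by capture-time window, distance from `center`, and optional tag."""
    lat0, lon0 = center
    hits = []
    for item in items:
        if not (start <= item["time"] <= end):
            continue
        if haversine_km(lat0, lon0, item["lat"], item["lon"]) > radius_km:
            continue
        if keyword and keyword not in item["tags"]:
            continue
        hits.append(item["id"])
    return hits


# Hypothetical media records with capture time, GPS position, and tags.
items = [
    {"id": "v1.mp4", "time": datetime(2005, 1, 17, 10, 5),
     "lat": 37.3337, "lon": -121.8907, "tags": {"talk", "slides"}},
    {"id": "p2.jpg", "time": datetime(2005, 1, 17, 18, 30),
     "lat": 37.7749, "lon": -122.4194, "tags": {"dinner"}},
    {"id": "p3.jpg", "time": datetime(2004, 6, 1, 12, 0),
     "lat": 37.3337, "lon": -121.8907, "tags": {"talk"}},
]
hits = search_media(items, datetime(2005, 1, 17), datetime(2005, 1, 18), (37.33, -121.89), 10.0)
print(hits)  # -> ['v1.mp4']
```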


But also…

• Work very closely with users and domain experts. Develop real and complete applications

• Take a broader usage, system, and application view (vs. looking for applications of AI, CV, and content-based retrieval)

• Collaborate with DB, HCI, and Internet systems researchers

• Leverage all sources of information, not only image and video

• Perform extensive experimental evaluations and participate in formal benchmarks (see, e.g., the NIST TREC competition rules)

• Contribute to and participate in standards activities

• Pay much more attention to GUI and HCI, and perform more formal and complete user evaluations

Not doing this risks making us irrelevant…
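Formal retrieval benchmarks typically score a system's ranked results against human relevance judgments. A minimal sketch of three common measures follows: precision and recall at a cutoff k, and non-interpolated average precision. The ranked list and judgments are hypothetical. Official benchmark scoring tools define further details (cutoffs, tie handling, averaging over topics) that are not reproduced here.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k results of a ranked list."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item is retrieved.

    Relevant items never retrieved contribute zero (non-interpolated AP).
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical system output and relevance judgments for one query.
ranked = ["img7", "img2", "img9", "img4", "img1"]
relevant = {"img2", "img4", "img5"}

p, r = precision_recall_at_k(ranked, relevant, 5)
ap = average_precision(ranked, relevant)
print(f"P@5={p:.2f} R@5={r:.2f} AP={ap:.3f}")  # -> P@5=0.40 R@5=0.67 AP=0.333
```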


Acknowledgement

• We thank J. Gray, J. Gemmell (Bay Area Microsoft Research), R. Singh (SFSU), B. Horowitz (Yahoo), and A. Amir (IBM Almaden Research) for comments and feedback


Some history and perspective…

• Late eighties and early nineties: excitement of early discovery and (over)promises
  – CPU, networks, and storage started to enable reasonably good manipulation, rendering, and processing of images
  – Multimedia appears as a field
  – DB people are being courted by CV/AI people to broaden their views and include multimedia data
  – First projects on content-based retrieval: e.g., QBIC (IBM), PhotoBook (MIT)…
  – Startup activity: Virage and others
  – Interest from CV and AI researchers
  – First joint conferences with the DB and CV communities
  – Many (over)promises that caught the eye of investors, marketers, etc.


Some history and perspective…

• Mid to late nineties: explosion of the Internet, things are hot! Furious work to achieve goals
  – The Internet enabled better communication and gradually made images, then video, feasible to manipulate, send, and view. Explosive growth
  – Advances in compression, networking, CPU, media formats, standards, and storage helped greatly. Cheap capture devices starting to happen
  – Multimedia moves and melds slowly into the Internet
  – DB vendors start to embrace multimedia types (blobs, extenders, blades, cartridges)
  – Content-based retrieval becomes a very popular research topic for CV and AI. Many conferences and workshops organized
  – MPEG7 activity started
  – Availability of research and venture funding continues
  – First trials, first products with content-based retrieval (Virage, IBM, Informix…)


Some history and perspective…

• 2000 and beyond: wake up, crash of the Internet (i.e., it became ubiquitous). Promises of content-based retrieval still unfulfilled
  – The Internet became a common thing (which is good) but lost its research appeal; it became a “vehicle”. It is still growing rapidly, with more and more visual data
  – Explosion of image and video on the Internet, as well as cheap capture devices (e.g., phones capturing audio+image+video+text+GPS)
  – Further advances in networking, CPU, and storage made image and video ubiquitously available and affordable
  – Startups based on content-based retrieval not doing well or folded
  – Strong research activity
  – Most applications resolved by researchers outside of CS
  – Minimal or no use of content-based retrieval in the commercial world