CADIP Research at UW-Milwaukee Ethan Munson and Yelena Tsymbalenko.

Page 1

CADIP Research at UW-Milwaukee

Ethan Munson

and

Yelena Tsymbalenko

Page 2

Research Foci

• Languages for implementing agents
  – MS Thesis by Preeti Seshadri
• Multimedia information retrieval
  – Exploiting metadata to improve MM IR
    • Yelena Tsymbalenko’s MS research
  – Models of media
• Usability of information visualization
  – future work

Page 3

Multimedia IR

Page 4

Using HTML Metadata to Retrieve Relevant Images from the Web

Yelena Tsymbalenko

University of Wisconsin-Milwaukee

Page 5

Why is image search important?

• The Web is a primary source of information.
• Images are one of the most valuable sources of information available on the Web.
• Few WWW image search engines currently exist.
• Using textual search engines to find images manually is laborious.

Page 6

A Requirement for Web Image Search

• We need an efficient method of discovering and indexing image content.
• Two main sources of information about image content:
  – image processing
  – associated text
    • text content
    • markup

Page 7

Related work

• WebSeek (J. Smith & S. Chang, Columbia University)

– performs a semi-automated classification of the images

– uses image file name for categorization

– searches by browsing or searching through the categories

– uses image features such as color content to find images of similar color

Page 8

Related work

• WebSeer (M. Swain et al., The University of Chicago)
  – uses associated text and markup to supplement information derived from analyzing image content
  – uses multiple kinds of metadata
  – decides which images are photographs

Page 9

Why look for new methods for image retrieval?

• The set of WWW documents is growing rapidly and changing constantly.

• Image processing is complex and computationally expensive.

• We need fast and efficient methods for finding images.

• Extensive image processing is not necessary.

Page 10

Our research

• Obtain information about image content from HTML source code:
  – Explicit: file and HREF names
  – Implicit: markup structure
• Determine which features of Web documents are the best clues to image content

Page 19

Search Strategy Examples

• Image file name
• Title of HTML document
• Alternate text (ALT attribute)
• Text of hyperlink
• Text of the same paragraph
• Header text
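One way to gather the clues listed above from a page's HTML can be sketched with Python's standard html.parser module. This is only an illustration, not the research prototype: the names ImageClueParser, extract_clues, and SAMPLE_PAGE are hypothetical, and the sketch assumes that <p> and <a> elements are explicitly closed.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageClueParser(HTMLParser):
    """Collect textual clues for every <img> in one HTML document."""

    HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.images = []    # one dict of clues per <img>
        self.title = ""
        self._header = ""   # text of the most recently closed header
        self._open = {}     # open tag -> collected text and first image index

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "a", "p") or tag in self.HEADERS:
            self._open[tag] = {"text": [], "start": len(self.images)}
        elif tag == "img":
            attributes = dict(attrs)
            src = attributes.get("src") or ""
            self.images.append({
                "file_name": basename(urlparse(src).path),
                "alt": attributes.get("alt") or "",
                "header": self._header,
                "link_text": "",
                "paragraph": "",
            })

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for context in self._open.values():
            context["text"].append(data)

    def handle_endtag(self, tag):
        context = self._open.pop(tag, None)
        if context is None:
            return
        text = " ".join("".join(context["text"]).split())
        if tag == "title":
            self.title = text
        elif tag in self.HEADERS:
            self._header = text
        else:
            # Link and paragraph text are known only once the element closes,
            # so fill them in for the images that appeared inside it.
            key = "link_text" if tag == "a" else "paragraph"
            for image in self.images[context["start"]:]:
                image[key] = text


def extract_clues(html):
    """Return one dict of clues (including the page title) per image."""
    parser = ImageClueParser()
    parser.feed(html)
    parser.close()
    return [dict(image, title=parser.title) for image in parser.images]


# Hypothetical page used only to exercise the sketch.
SAMPLE_PAGE = """<html><head><title>Silent film stars</title></head><body>
<h2>Charlie Chaplin</h2>
<p>Chaplin in costume <a href="big.html"><img src="/img/chaplin01.gif"
alt="Chaplin as the Tramp">full size</a></p>
</body></html>"""

for clues in extract_clues(SAMPLE_PAGE):
    print(clues)
```

Each clue is kept as a separate field so that the clues can later be compared individually or in combination.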

Page 32

Analysis Plans

• Will collect data about search results for a number of queries (several dozen)
  – Suggestions for queries are welcome!
• Will test which clues are most effective
  – Are some redundant?
  – Does a combination of clues produce better recall?
  – Are some clues more precise?
• Is search performance dependent on query type?
  – Proper names (Chaplin, Garvey)
  – Phenomena (riot, explosion)
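The comparison of clues could be scored with the standard precision and recall measures. The sketch below is a minimal illustration under assumed inputs: the function names and the image identifiers are hypothetical, and the numbers are made up for the example rather than taken from any experiment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one set of retrieved images."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate_clues(results_by_clue, relevant):
    """Score each clue on its own, then the union of all clues."""
    scores = {
        clue: precision_recall(retrieved, relevant)
        for clue, retrieved in results_by_clue.items()
    }
    combined = set().union(*results_by_clue.values())
    scores["all clues combined"] = precision_recall(combined, relevant)
    return scores


# Hypothetical judgments for a single query; identifiers are placeholders.
relevant_images = {"a", "b", "c", "d"}
results = {
    "alt text": {"a", "b"},
    "file name": {"a", "c", "x"},
}

for clue, (precision, recall) in evaluate_clues(results, relevant_images).items():
    print(f"{clue}: precision={precision:.2f} recall={recall:.2f}")
```

Scoring the union alongside the individual clues addresses the question of whether a combination improves recall, and two clues that retrieve nearly the same images would show up as redundant.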

Page 33

Using HTML Metadata to Retrieve Relevant Images from the Web

Yelena Tsymbalenko

Department of Computer Science

University of Wisconsin - Milwaukee

Page 34

Agent Implementation Languages

Page 35

Agent Implementation Languages

• Preeti Seshadri’s thesis has two parts
• Survey of languages
  – Pure scripting languages
    • Tcl, Perl
  – Scripting/general-purpose languages
    • Java, Python, Telescript
• Resource management service for Java
  – Interface design
  – Partial implementation

Page 36

Language Requirements

• Good language infrastructure
  – OO or other good modularity features
  – Automated memory management
• Decent performance
  – Byte-compilation is probably enough
• Portability
• Security
  – Mobile agents must either be trusted or controlled
  – Control is always better

Page 37

Language Survey

• Systems programming languages (C, C++)
  – high-performance, but non-portable and insecure
• Pure scripting languages (Tcl, Perl)
  – low-to-medium performance, portable
  – limited security and communication services
• Scripting/general-purpose languages (Java, Python, Telescript)
  – medium-performance, portable
  – more security and communication support

Page 38

Systems Programming Languages

• Native-code compilation yields very high performance
• Native code is not portable
  – compilation is too complex to perform at client site
• Language definitions are limited
  – no security or coordination infrastructure
  – little is guaranteed about higher-level services
  – even exception handling is limited

Page 39

Pure Scripting Languages

• Tcl is a bad choice
  – poorly suited for larger applications
    • low performance
    • poor language infrastructure
      – non-OO, no threads, no exceptions
• Perl is a bit better
  – performance is better, but not great
  – limited security
  – language complexity is high

Page 40

Telescript

• “Environment” for constructing agent societies
  – Proprietary (General Magic, Inc.)
  – Language, engine, communication protocols
• Claimed to be fast, easy-to-use, secure
• Core concepts
  – “places” are execution contexts and can be nested
  – No agent-to-agent communication
    • agents move to places and do things
  – Capability-based security (“permits”)

Page 41

Python

• An OO scripting language
  – Unusual dynamic type system
  – Many high-level data types
  – Socket-level networking support
• Typical byte-compiled characteristics
  – portability, dynamic linking
• Limited security support
  – “Restricted execution,” similar to sandboxing
    • appears poorly integrated with mobility

Page 42

Java

• General-purpose language widely used for scripting-style applications
  – Excellent language design
  – Medium performance
• Strong security features
  – customizable “sandboxes”
• Heavily and effectively hyped
  – portability is overrated
  – performance will probably never match C++

Page 43

Security Issues in Java

• Java is very secure, but problems remain
  – e.g. security managers are inflexible
• Agent portability is a problem
  – A newly arrived agent must be trusted
    • sandboxing addresses the obvious trust issues
  – Denial of service attacks are still possible
    • deliberate and accidental
    • Java lacks standard resource management services

Page 44

Resource Management Interface

• Supports both monitoring and control
  – CPU time
  – memory
  – threads
• Granularity is per-thread and per-threadgroup
• Designed to work on bytecode, not source
  – can monitor “outside” agents

Page 45

Class Structure

[Class diagram. Classes: ChiefMonitor, ThreadRegister, UsageException, ExceedUse, RunTimeException. Relationship labels: has, uses, uses interface, exception, is.]

Page 46

Interface Details

• Initialization
• Resource usage queries
  – consumption
  – limits
• Resource usage control
  – set usage bounds and policy
  – reset usage bounds
• Resource exceptions
  – interrupt-style control
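The four groups of operations above (initialization, queries, control, exceptions) can be sketched roughly as follows. The thesis targets Java; this Python sketch is only an analogy, and every name in it is an assumption: ResourceMonitor and its methods are hypothetical, and ExceedUse merely borrows a class name from the class structure slide without claiming to match its behavior. Usage is reported cooperatively through charge(), as in a prototype where the agent is built with monitoring support.

```python
import threading


class ExceedUse(Exception):
    """Raised when recorded usage passes the bound set for a thread."""


class ResourceMonitor:
    """Per-thread bookkeeping of resource consumption and limits."""

    def __init__(self):
        # Initialization: no usage recorded, no bounds set.
        self._lock = threading.Lock()
        self._used = {}    # (thread_id, resource) -> amount consumed so far
        self._limits = {}  # (thread_id, resource) -> upper bound

    # Resource usage queries
    def consumption(self, thread_id, resource):
        return self._used.get((thread_id, resource), 0)

    def limit(self, thread_id, resource):
        return self._limits.get((thread_id, resource))

    # Resource usage control
    def set_limit(self, thread_id, resource, bound):
        with self._lock:
            self._limits[(thread_id, resource)] = bound

    def reset_limit(self, thread_id, resource):
        with self._lock:
            self._limits.pop((thread_id, resource), None)

    # Resource exceptions: the caller is interrupted by ExceedUse.
    def charge(self, resource, amount, thread_id=None):
        """Record usage (default: for the calling thread) and enforce the bound."""
        if thread_id is None:
            thread_id = threading.get_ident()
        key = (thread_id, resource)
        with self._lock:
            total = self._used.get(key, 0) + amount
            self._used[key] = total
            bound = self._limits.get(key)
        if bound is not None and total > bound:
            raise ExceedUse(f"{resource}: used {total}, bound {bound}")


monitor = ResourceMonitor()
monitor.set_limit("agent-1", "memory_bytes", 100)
monitor.charge("memory_bytes", 60, thread_id="agent-1")
try:
    monitor.charge("memory_bytes", 50, thread_id="agent-1")
except ExceedUse as error:
    print("limit exceeded:", error)
```

Per-threadgroup granularity is left out of the sketch; it could be approximated by keying the same tables on a group identifier.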

Page 47

Implementation Plans

• Prototype requires that agent be built with internal monitoring support
  – agent’s implementor must cooperate
• We want to impose monitoring on arbitrary mobile agents
• Solution: bytecode rewriting
  – All interesting operations have a well-defined representation in Java bytecode
  – Will wrap relevant bytecodes in monitoring code
    • Similar to Purify/Quantify
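The idea of wrapping an operation in monitoring code, without the agent's author cooperating, can be illustrated at a much higher level than bytecode. The sketch below is not bytecode rewriting: it swaps in a simpler technique, rebinding a Python function to a monitored wrapper, purely to show the shape of the approach. All names (wrap_with_monitor, UsageLimitExceeded, start_worker) are hypothetical.

```python
import functools


class UsageLimitExceeded(Exception):
    """Hypothetical stand-in for a resource exception."""


def wrap_with_monitor(operation, resource, usage, limits):
    """Return `operation` wrapped so each call is counted against a limit."""
    @functools.wraps(operation)
    def monitored(*args, **kwargs):
        used = usage.get(resource, 0) + 1
        if used > limits.get(resource, float("inf")):
            raise UsageLimitExceeded(f"{resource}: limit {limits[resource]} reached")
        usage[resource] = used
        return operation(*args, **kwargs)
    return monitored


def start_worker(name):
    """Stands in for an 'interesting operation' such as thread creation."""
    return f"worker {name} started"


# The host, not the agent's author, installs the wrapper.
usage, limits = {}, {"threads": 2}
start_worker = wrap_with_monitor(start_worker, "threads", usage, limits)
```

Code that calls start_worker is unchanged; the third call raises UsageLimitExceeded because the hypothetical limit of two threads has been reached.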

Page 48

A Theory of Media for Multimedia Authoring and Browsing Systems

Page 49

The Original Problem

• Develop a multimedia document system that allows easy addition of new media modules
• Kernel/shell architecture
  – Shells support individual media
    • text, graphics, video
  – Kernel provides medium-independent services
    • document structure, scripting language, style sheet system

Page 50

Proteus Style Sheet System

• Portable style sheet system
  – PSL style language adapts to application (or media module)
    • medium supported by application is specified with MSPEC language
• Architecture designed for multiple, simultaneous presentations
• Used in Ensemble document environment and in MPMosaic WWW browser

Page 51

Adapting Proteus

• Medium supported by application is described in MSPEC language
• MSPEC is based on the TDO model of media
• TDO model is designed to meet needs of authoring and browsing systems
  – Different from needs of multimedia networking or intelligent systems

Page 52

The TDO Model

• Definition
  – A medium is a triple M = (T, D, O), where
    • T is a set of primitive data types
      – text, video clip, audio clip, graphic object
    • D is a set of dimensions
      – horizontal, vertical, depth, time
    • O is a set of formatting operations with typed parameters
      – line-breaking, page-breaking
      – fill, rotation
      – dissolve, fade
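The triple M = (T, D, O) maps naturally onto a small data structure. The sketch below is an illustrative assumption, not MSPEC: the class names, the parameter types, and the particular assignment of types, dimensions, and operations to the two example media are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """A formatting operation with typed parameters."""
    name: str
    parameters: tuple = ()  # (parameter name, type name) pairs


@dataclass(frozen=True)
class Medium:
    """A medium as the triple M = (T, D, O)."""
    name: str
    types: frozenset       # T: primitive data types
    dimensions: frozenset  # D: dimensions
    operations: frozenset  # O: formatting operations


# Assumed parameter types, chosen only for illustration.
fill = Operation("fill", (("color", "string"),))
rotation = Operation("rotation", (("angle", "float"),))
fade = Operation("fade", (("duration", "float"),))

graphics = Medium(
    name="graphics",
    types=frozenset({"graphic object"}),
    dimensions=frozenset({"horizontal", "vertical"}),
    operations=frozenset({fill, rotation}),
)

animation = Medium(
    name="animation",
    types=frozenset({"graphic object"}),
    dimensions=frozenset({"horizontal", "vertical", "time"}),
    operations=frozenset({fill, rotation, fade}),
)

# Same primitive data types, yet different media.
print(graphics.types == animation.types)
print(sorted(animation.dimensions - graphics.dimensions))
```

Under this representation two media can share every primitive data type and still differ, here by the extra time dimension and the fade operation.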

Page 53

The Data Type Model

• A medium is a data type
  – e.g. video, audio, text
• Held implicitly by the multimedia research community
  – suits networking research well
    • appears to be used in Continuous Media Toolkit
  – each data type has different bandwidth requirements

Page 54

Problems with Data Type Model

• Same document, different media
  – Spoken text vs. written text
  – Outline text view vs. outline graphics view
  – HTML document vs. HTML tree
  – a medium is a way of seeing a document
• Different media, same data types
  – graphics vs. animation
    • differing dimensions
  – LaTeX vs. “drawing” programs
    • different formatting operations

Page 55

One Outline — Two Media

• Farm Animals
  – Cow
  – Rooster
• Wild Animals
  – Hawk
  – Mountain Goat
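The same outline can be presented in two media from a single underlying document. The following sketch is only illustrative: the function names, the dictionary representation, and the artificial root node "Outline" are assumptions made for the example.

```python
# One document: the outline from the slide.
OUTLINE = {
    "Farm Animals": ["Cow", "Rooster"],
    "Wild Animals": ["Hawk", "Mountain Goat"],
}


def as_text(outline):
    """Text view: an indented, bulleted list."""
    lines = []
    for heading, items in outline.items():
        lines.append(f"• {heading}")
        lines.extend(f"  – {item}" for item in items)
    return "\n".join(lines)


def as_edges(outline, root="Outline"):
    """Graphics view: (parent, child) edges that could be drawn as a tree."""
    edges = []
    for heading, items in outline.items():
        edges.append((root, heading))
        edges.extend((heading, item) for item in items)
    return edges


print(as_text(OUTLINE))
print(as_edges(OUTLINE))
```

Both functions read the same data; only the way of seeing the document changes.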

Page 56

The AHV Model

• Arens, Hovy and Vossers
• Media choice by intelligent systems
• Focus on perceptual qualities of media
• Seven characteristics
  – carrier dimension, semantic dimension, temporal endurance, granularity, medium type, default detectability, baggage
• Channels

Page 57

New Research: TDOA model

• Adds A, the set of attributes used in formatting
• Makes explicit the input/output characteristics of operations
• Will be used in new version of MSPEC

Page 58

Key Points

• Each model suits its domain
  – Data type model: networking
  – TDO model: authoring and style sheets
  – AHV model: media allocation
• Need for unifying concepts is evident
  – Is a single theory possible?
  – Yes, but it will be very broad

Page 59

Key Points (cont.)

• Distinction between data types and media is important
  – Media that we think of as having one type (e.g. video) are actually composed from many
  – A medium is not just the data, but also what you do with the data and the space where you do it
• How many media are there?
  – A dozen or so?
  – Thousands?
  – If thousands, is there a hierarchical taxonomy?

Page 60

Ethan Munson
Yelena Tsymbalenko

Multimedia Software Laboratory
Dept. of EECS, UWM

www.cs.uwm.edu/~multimedia