Video Data Retrieval

Pratyusha Koduri
Anish Reddy Devireddy

Akaash Vankamamidi

Outline

Introduction
Problem Statement
Our Contributions
Related Work
Methodology
Comparison
Evaluation
Future Work
Conclusion
References

Introduction

Problem: rapidly growing amounts of video data, in the form of news video, film archives, surveillance footage, user-generated content, distance learning, video conferencing, medical applications, and sports.

Video data is dynamic. With the development of multimedia data types and available bandwidth, there is huge demand for video retrieval systems.

One could store the digital video information on tapes, CD-ROMs, DVDs, or any such device.

Goal: effective video retrieval.

Problem Statement

All the papers we studied are related to the retrieval of video data, and to how this can be done on compressed video data.

In content-based video retrieval systems, how do we choose features that reflect real human interest, and how does feature extraction affect video retrieval?

Our Contributions

First, we identify video retrieval approaches based on spatial and temporal analysis.

We focus on content-based video retrieval systems and on video retrieval over compressed data.

We classify the methods and summarize the future trends and open problems of video retrieval.

Related Work

Since we have large amounts of data, compress it; then retrieve data from the compressed stream without processing overhead.

To index and retrieve semantic data, we use semantic indexing of video data based on generalized n-ary operators.

Dominant regions are used in video indexing and retrieval, which serves all types of users.

The main objective is to provide concurrency control for virtual editing of video data among different users.

A framework for semantic retrieval over a video database: each frame of a video clip, characterized by its HSV (hue-saturation-value) color feature, is first projected onto the spatial principal components.

An efficient video retrieval method takes user feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vector (QFV) for improved video retrieval.
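
The projection step can be pictured with a small NumPy sketch: per-frame color histograms (random placeholders here) are centered and projected onto their leading principal components. This is a minimal illustration under assumed names and dimensions, not the exact procedure of the cited framework.

```python
import numpy as np

def pca_project(frame_features, n_components=8):
    """Project per-frame feature vectors onto their top principal components.

    frame_features: (num_frames, feature_dim) array, e.g. HSV color
    histograms computed for each frame of a clip.
    """
    # Center the data so the principal components capture variance only.
    mean = frame_features.mean(axis=0)
    centered = frame_features - mean

    # SVD of the centered data gives the principal directions directly.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]          # (n_components, feature_dim)

    # Low-dimensional representation of every frame.
    return centered @ components.T          # (num_frames, n_components)

# Example: 120 frames, each described by a 64-bin color histogram.
rng = np.random.default_rng(0)
features = rng.random((120, 64))
projected = pca_project(features, n_components=8)
print(projected.shape)  # (120, 8)
```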

Key Concepts

QFV reformulation is performed by an optimization method based on the Simultaneous Perturbation Stochastic Approximation (SPSA) technique.

Relevance feedback (RF) is a popular technique in the area of content-based image retrieval (CBIR).

Semantic objects are tracked in a video, and spatio-temporal events are then modeled from object trajectories and object interactions in order to mine spatio-temporal data.
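
The slides attribute QFV reformulation to SPSA-based optimization (Velusamy et al.). Below is a minimal, generic SPSA sketch in Python/NumPy, not the exact formulation from that paper; the toy feedback_loss objective, gain constants, and all names are assumptions made for illustration.

```python
import numpy as np

def spsa_minimize(loss, theta, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimal SPSA loop: the gradient is estimated from only two loss
    evaluations per step using a random +/-1 simultaneous perturbation."""
    rng = np.random.default_rng(seed)
    for k in range(1, iterations + 1):
        ak = a / k ** 0.602            # standard SPSA gain-sequence decay
        ck = c / k ** 0.101
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        g_hat = (loss(theta + ck * delta) - loss(theta - ck * delta)) / (2 * ck * delta)
        theta = theta - ak * g_hat
    return theta

def feedback_loss(weights, query, relevant, non_relevant):
    """Toy relevance-feedback objective: weighted distances from the query
    to clips marked relevant should be small relative to the others."""
    w = np.abs(weights)
    d_rel = np.mean(np.sum(w * (query - relevant) ** 2, axis=1))
    d_non = np.mean(np.sum(w * (query - non_relevant) ** 2, axis=1))
    return d_rel / (d_non + 1e-8)

rng = np.random.default_rng(1)
query = rng.random(16)                                  # query feature vector (QFV)
relevant = query + 0.05 * rng.standard_normal((5, 16))  # clips the user liked
non_relevant = rng.random((5, 16))                      # clips the user rejected
weights = spsa_minimize(
    lambda w: feedback_loss(w, query, relevant, non_relevant),
    theta=np.ones(16),
)
```

The appeal of SPSA here is that every iteration costs only two evaluations of the retrieval-quality objective, regardless of how many feature weights are being tuned.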

Video Retrieval

Useful in:
Historical archives
Forensic documents
Fingerprint & DNA matching
Security applications

Retrieval granularity is also important. How do users want to retrieve material? What is the purpose of retrieval? What is the user's expertise?

Content-Based Video Retrieval

Content-based video retrieval systems automatically index video material by segmenting it into clips and extracting features such as text, color, texture, and motion from each clip to support search.

As digital video collections become more widely available, content-based video retrieval tools will likely grow in importance for an even wider group of users.

A CBVR system aims at assisting a human operator (user) to retrieve a sequence (target) within a potentially large database.

The selection of extracted features plays an important role in content-based video retrieval.

Content-Based Video Indexing and Retrieval (CBVIR) is an extension of the image retrieval problem to video.

“Content-based” means that the search analyzes the actual content of the video. The term “content” in this context might refer to colors, shapes, or textures.

These systems aim at accessing video by its content, namely the spatial-temporal (video) information.

Methodology

The first step for video-content analysis, content-based video browsing, and retrieval is the partitioning of a video sequence into shots.

Once key frames are extracted, the next step is to extract features from them.

Breakdown: sequence -> scene -> shot -> frame -> object
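
As a rough illustration of shot partitioning and key-frame selection, the following Python/NumPy sketch detects hard cuts with a simple histogram-difference threshold and picks the middle frame of each shot as its key frame. The threshold value and helper names are assumptions, not the method of any specific cited paper.

```python
import numpy as np

def frame_histogram(frame, bins=32):
    """Grayscale intensity histogram, normalised so frames are comparable."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.4):
    """Declare a shot boundary wherever the histogram difference between
    consecutive frames exceeds a fixed threshold (hard-cut detection)."""
    boundaries = []
    prev = frame_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = frame_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

def key_frames(frames, boundaries):
    """Pick the middle frame of each shot as its key frame."""
    starts = [0] + boundaries
    ends = boundaries + [len(frames)]
    return [frames[(s + e) // 2] for s, e in zip(starts, ends)]

# Synthetic example: two 'shots' with different brightness levels.
rng = np.random.default_rng(0)
shot_a = rng.integers(0, 80, size=(30, 48, 64))
shot_b = rng.integers(150, 255, size=(30, 48, 64))
frames = np.concatenate([shot_a, shot_b])
cuts = detect_shot_boundaries(frames)          # expected: [30]
keys = key_frames(frames, cuts)
```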

Features

Features are of two types: low-level and high-level.

Low-level features such as object motion, color, shape, texture, loudness, power spectrum, bandwidth, and pitch are extracted directly from the video in the database.

High-level features are also called semantic features. Features such as timbre, rhythm, instruments, and events involve different degrees of semantics contained in the media.
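
The visual low-level features can be approximated with very small NumPy routines. The sketch below computes a quantized color histogram plus crude motion and texture cues for a clip; the audio features (loudness, pitch, etc.) are omitted, and all function names and bin counts are illustrative assumptions.

```python
import numpy as np

def color_histogram(frame_rgb, bins_per_channel=8):
    """Quantised color histogram: each RGB channel binned separately."""
    hist = []
    for c in range(3):
        h, _ = np.histogram(frame_rgb[..., c], bins=bins_per_channel, range=(0, 256))
        hist.append(h / h.sum())
    return np.concatenate(hist)

def motion_energy(prev_gray, cur_gray):
    """Crude motion cue: mean absolute difference between consecutive frames."""
    return float(np.mean(np.abs(cur_gray - prev_gray)))

def texture_energy(gray):
    """Crude texture cue: mean gradient magnitude."""
    gy, gx = np.gradient(gray)
    return float(np.mean(np.hypot(gx, gy)))

def clip_feature_vector(frames_rgb):
    """Concatenate averaged low-level features over a clip's frames."""
    grays = frames_rgb.mean(axis=-1)
    color = np.mean([color_histogram(f) for f in frames_rgb], axis=0)
    motion = np.mean([motion_energy(grays[i - 1], grays[i]) for i in range(1, len(grays))])
    texture = np.mean([texture_energy(g) for g in grays])
    return np.concatenate([color, [motion, texture]])

rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(10, 48, 64, 3))   # 10 RGB frames
fv = clip_feature_vector(clip)
print(fv.shape)   # (26,) = 24 color bins + motion + texture
```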

Issues

One of the key issues in CBVR is bridging the “semantic gap”, the gap between low-level features and the high-level semantic meaning of content.

Low-level features such as color and texture are easy to measure and compute.

But it is a challenge to connect low-level features to a semantic meaning, especially one involving the intellectual and emotional aspects of the human operator (user).

Another issue is how to efficiently access the rich content of video information; this involves video content and the spatial and temporal analysis of videos.

Generalized n-ary Relation

The principal component of video data is the spatial/temporal semantics associated with it.

Generalization in both the spatial and temporal domains simplifies the description of complex spatial or temporal events.

In the spatial domain the operands represent the physical locations of the objects; in the temporal case they represent the durations of temporal events.

N-ary

Spatial event: consider a player holding the ball in a basketball game. A frame containing the event “player holding the ball” is characterized by six of the n-ary relations (M, O, C, S, CO, E) in both the x and y coordinates.

Spatial events can serve as the low-level (fine-grained) indexing mechanism for video data.

A temporal event extends the spatial event “holding a ball” to “passing of a ball between two players”.

B is the “before” n-ary operator, and d(event) denotes the duration of the corresponding spatial event.
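
The exact operator set (M, O, C, S, CO, E) is defined in the cited spatio-temporal modeling work; the sketch below only illustrates the general idea with a few generic interval relations that apply equally to spatial extents (object projections on an axis) and temporal durations.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    """A 1-D extent: either the projection of an object on the x or y axis
    (spatial case) or the duration of an event (temporal case)."""
    start: float
    end: float

def before(a: Interval, b: Interval) -> bool:
    """'B'-style operator: a ends strictly before b starts."""
    return a.end < b.start

def meets(a: Interval, b: Interval) -> bool:
    return a.end == b.start

def overlaps(a: Interval, b: Interval) -> bool:
    return a.start < b.end and b.start < a.end

def contains(a: Interval, b: Interval) -> bool:
    return a.start <= b.start and b.end <= a.end

# Spatial reading: the ball's x-extent is contained in the player's x-extent
# while the player 'holds the ball'.
player_x, ball_x = Interval(10, 30), Interval(18, 22)
assert contains(player_x, ball_x)

# Temporal reading: player A's holding event happens before player B's,
# which is the basis of the higher-level 'pass' event.
hold_a, hold_b = Interval(0.0, 2.1), Interval(2.5, 4.0)
assert before(hold_a, hold_b)
```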

Architecture

The system is hierarchical in nature and provides a multi-level indexing and searching mechanism by modeling information at various levels of semantic granularity; content-based queries can therefore be processed without touching the raw image or video data.
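
A hierarchical, multi-level index of this kind can be pictured as a set of lookup tables keyed by semantic level, as in the toy Python sketch below; the level names and structure are assumptions for illustration, not the architecture of the cited system.

```python
from collections import defaultdict

# Toy multi-level index: each semantic level has its own lookup table, so a
# query at any granularity is answered from these tables alone, without
# touching raw frames.
event_index = defaultdict(set)    # event name  -> clip ids
object_index = defaultdict(set)   # object name -> clip ids
shot_index = defaultdict(set)     # clip id     -> shot ids

def annotate(clip_id, shot_id, objects, events):
    """Record one shot's annotations at every level of the index."""
    shot_index[clip_id].add(shot_id)
    for name in objects:
        object_index[name].add(clip_id)
    for name in events:
        event_index[name].add(clip_id)

def query_event(name):
    """Content-based query resolved purely against the index tables."""
    return event_index.get(name, set())

annotate("clip42", "shot3", objects={"player", "ball"}, events={"pass"})
print(query_event("pass"))        # {'clip42'}
```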

Retrieval in Compressed Data

To avoid the processing overhead of decompressing the video stream into individual frames, it is better to detect features directly from the compressed video data.

The spatio-temporal data extracted from compressed video can include dominant regions, color information, and motion.

Dominant regions are used in video indexing and retrieval; they are extracted from the intensity data.

Dominant-region extraction pipeline: DC image data -> quantization -> filtering -> simplified data -> flat regions -> watershed algorithm -> dominant regions.
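
The pipeline can be mimicked end to end with a short Python/SciPy sketch. Note that the cited work applies a watershed algorithm to the simplified data; to keep the example short, plain connected-component labelling of the quantized DC image stands in for that step, and all thresholds are arbitrary assumptions.

```python
import numpy as np
from scipy import ndimage

def dominant_regions(dc_image, levels=8, min_area=20, top_k=3):
    """Rough stand-in for the DC-image pipeline: quantise intensities,
    smooth, group equal-valued pixels into flat regions, and keep the
    largest ones as 'dominant'."""
    # Quantisation: reduce the DC image to a few intensity levels.
    quantised = np.floor(dc_image / 256 * levels).astype(int)

    # Filtering: small smoothing pass to remove isolated pixels.
    smoothed = ndimage.median_filter(quantised, size=3)

    # Flat regions: connected components of each quantisation level
    # (the cited work uses a watershed step here instead).
    regions = []
    for level in np.unique(smoothed):
        labelled, n = ndimage.label(smoothed == level)
        for lab in range(1, n + 1):
            mask = labelled == lab
            if mask.sum() >= min_area:
                regions.append((int(mask.sum()), int(level), mask))

    # Dominant regions: the largest flat regions.
    regions.sort(key=lambda r: r[0], reverse=True)
    return regions[:top_k]

rng = np.random.default_rng(0)
dc = rng.integers(0, 256, size=(36, 44))     # one DC value per macroblock
for area, level, _ in dominant_regions(dc):
    print(f"region of {area} px at intensity level {level}")
```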

Color information is computed from an HSV quantization table.

Camera motion is detected from the region-based segmented data.

Based on the above features we can extract semantic information about the video content.

This information is useful for content-based video indexing and retrieval.

Comparison Study Summary

Key issues we noticed in this study:

1) Bridging the Semantic Gap

To do annotation automatically or semi-automatically, we need to bridge the "semantic gap", i.e., find algorithms that infer high-level semantic concepts (sites, objects, events) from low-level image/video features that can be easily extracted from the data (color, texture, shape, structure, etc.).

One sub-problem is audio scene analysis. Researchers have worked on visual scene analysis (computer vision) for many years, but audio scene analysis is still in its infancy and remains under-explored.

2) Human Intelligence and Machine Intelligence

One advantage of information retrieval is that in most scenarios there is a human (or humans) in the loop. One prominent example of human-computer interaction is relevance feedback.
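
A common way to realize relevance feedback is a Rocchio-style query update, shown below as a NumPy sketch; this is a generic illustration of the human-in-the-loop idea, not the specific scheme of any cited system, and the mixing coefficients are conventional defaults.

```python
import numpy as np

def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio relevance-feedback update: move the query feature
    vector toward clips the user marked relevant and away from the rest."""
    q = alpha * query
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q -= gamma * np.mean(non_relevant, axis=0)
    return q

rng = np.random.default_rng(0)
query = rng.random(32)                # initial query feature vector
relevant = rng.random((4, 32))        # features of clips marked relevant
non_relevant = rng.random((6, 32))    # features of clips marked irrelevant
refined = rocchio_update(query, relevant, non_relevant)
```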

3) New Query Paradigms

For image/video retrieval, people have tried query by keywords, similarity, sketching an object, sketching a trajectory, painting a rough image, etc. Can we think of useful new paradigms?

4) Data Mining

Searching for interesting/unusual patterns and correlations in video has many important applications, including web search engines and intelligence data. Work to date on data mining has been mainly on text data.

5) Unlabeled Data

Can we use the large number of unlabeled samples in the database to help?

Another problem related to image/video data annotation is label propagation. Can we label a small set of data and let the labels propagate to the unlabeled samples?
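
A minimal NumPy sketch of the label-propagation idea: a handful of labelled clips spread their labels to unlabelled ones through a feature-similarity graph. The affinity construction, parameters, and synthetic data are illustrative assumptions.

```python
import numpy as np

def propagate_labels(features, labels, iterations=30, sigma=0.5):
    """Minimal label propagation: labelled samples spread their class
    scores to neighbours through a similarity (affinity) matrix.
    labels: class id for labelled samples, -1 for unlabelled ones."""
    n = len(features)
    classes = np.unique(labels[labels >= 0])

    # Gaussian affinity between every pair of samples, row-normalised.
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    p = w / w.sum(axis=1, keepdims=True)

    # One-hot scores for labelled samples, zeros for unlabelled ones.
    f = np.zeros((n, len(classes)))
    for i, c in enumerate(classes):
        f[labels == c, i] = 1.0
    clamped = f.copy()

    for _ in range(iterations):
        f = p @ f                               # diffuse scores along the graph
        f[labels >= 0] = clamped[labels >= 0]   # re-clamp known labels

    return classes[np.argmax(f, axis=1)]

rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.3, size=(20, 8))
cluster_b = rng.normal(3.0, 0.3, size=(20, 8))
x = np.vstack([cluster_a, cluster_b])
y = np.full(40, -1)
y[0], y[20] = 0, 1                     # only one labelled sample per cluster
print(propagate_labels(x, y))          # labels spread to the whole clusters
```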

6) Incremental Learning

In most applications we keep adding new data to the database. We should be able to update the parameters of the retrieval algorithms incrementally, without starting from scratch every time new data arrives.
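
Incremental updating can be as simple as maintaining running statistics, as in the sketch below, where a per-concept prototype is updated in constant time as new clips arrive; the class and method names are hypothetical.

```python
import numpy as np

class IncrementalPrototype:
    """Keeps a running mean feature vector per concept so new clips update
    the retrieval model in O(1) instead of recomputing over the database."""

    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.count = 0

    def add(self, feature_vector):
        self.count += 1
        # Running-mean update: mean_n = mean_{n-1} + (x - mean_{n-1}) / n
        self.mean += (feature_vector - self.mean) / self.count

    def score(self, query):
        """Smaller distance = better match for this concept."""
        return float(np.linalg.norm(query - self.mean))

rng = np.random.default_rng(0)
proto = IncrementalPrototype(dim=16)
for _ in range(100):                       # new clips arrive over time
    proto.add(rng.normal(1.0, 0.2, size=16))
print(proto.score(np.full(16, 1.0)))       # close to 0: query matches concept
```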

7) Using Virtual Reality Visualization to Help

Can we use 3D audio/visual visualization techniques to help a user navigate through the data space to browse and retrieve?

8) Structuring Very Large Databases

Researchers in audio/visual scene analysis and those in databases and information retrieval should collaborate closely to find good ways of structuring very large video databases for efficient retrieval and search.

9) Applications of Video Retrieval

Few real applications of video retrieval have been accepted by the general public so far. Will the web video search engine be the next killer application? It remains to be seen. With no clear answer to this question, it is still a challenge to do research that is appropriate for real applications.

Conclusion & Future Work

Despite the considerable progress of academic research in video retrieval, content-based video retrieval research has had relatively little impact on commercial applications, with some niche exceptions such as video segmentation.

Choosing features that reflect real human interest remains an open issue. One promising approach is to use meta-learning.

Low-to-high-level semantic gap: visual-feature-based techniques at the low level of abstraction, mostly contributed by the signal processing and computer vision communities, have been explored in the literature.

Current research efforts are more inclined towards high-level description and retrieval of visual content.

Techniques that bridge this semantic gap between pixels and predicates are a field of growing interest.

Intelligent systems are needed that take a low-level feature representation of the visual media and provide a model for the high-level object representation of the content.

References

http://research.microsoft.com/en-us/um/people/yongrui/ps/sigproc06.pdf

Day, Y.F.; Dagtas, S.; Iino, M.; Khokhar, A.; Ghafoor, A., "Spatio-temporal modeling of video data for on-line object-oriented query processing," Proceedings of the International Conference on Multimedia Computing and Systems, pp. 98-105, 15-18 May 1995.

Hang-Bong Kang, "Spatio-temporal feature extraction from compressed video data," TENCON 99: Proceedings of the IEEE Region 10 Conference, vol. 2, pp. 1339-1342, Dec. 1999.

Sze-Man Chan, S.; Li, Qing, "VideoMAP*: a Web-based architecture for a spatio-temporal video database management system," Proceedings of the First International Conference on Web Information Systems Engineering, vol. 1, pp. 393-400, 2000.

Xia, J.; Wang, Y., "A spatio-temporal video analysis system for object segmentation," Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA 2003), vol. 2, pp. 812-815, 18-20 Sept. 2003.

Bo Geng; Hong Lu; Xiangyang Xue, "Incremetal Spatio-Temporal Feature Extraction and Retrieval for Large Video Database," IEEE International Symposium on Circuits and Systems (ISCAS 2007), pp. 961-964, 27-30 May 2007.

Velusamy, S.; Bhatnagar, S.; Basavaraja, S.V.; Sridhar, V., "SPSA based feature relevance estimation for video retrieval," IEEE 10th Workshop on Multimedia Signal Processing, pp. 598-603, 8-10 Oct. 2008.

Xin Chen; Chengcui Zhang, "An Interactive Semantic Video Mining and Retrieval Platform--Application in Transportation Surveillance Video for Incident Detection," Sixth International Conference on Data Mining (ICDM '06), pp. 129-138, 18-22 Dec. 2006.

Mehmet Emin Dönderler; Özgür Ulusoy; Ugur Güdükbay, "Rule-based spatiotemporal query processing for video databases," The VLDB Journal, vol. 13, no. 1, pp. 86-103, Jan. 2004.

Fudong Sun; Minyong Shi; Weiguo Lin, "Feature Label Extraction of Online Video," 2012 International Conference on Computer Science and Electronics Engineering (ICCSEE), vol. 3, pp. 211-214, 23-25 March 2012.

Divakaran, A.; Vetro, A.; Asai, K.; Nishikawa, H., "Video browsing system based on compressed domain feature extraction," IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 637-644, Aug. 2000.

Al-Salih, A.A.M.; Ahson, S.I., "Object detection and features extraction in video frames using direct thresholding," International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT '09), pp. 221-224, 14-16 March 2009.

Sifei Lu; Li, R.M.; Tjhi, W.-C.; Kee Khoon Lee; Long Wang; Xiaorong Li; Di Ma, "A Framework for Cloud-Based Large-Scale Data Analytics and Visualization: Case Study on Multiscale Climate Data," IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp. 618-622, Nov. 29-Dec. 1, 2011.

Thank you