COMPUTER VISION: SOME CLASSICAL PROBLEMS
![Page 1: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/1.jpg)
ADWAY MITRA
MACHINE LEARNING LABORATORY
COMPUTER SCIENCE AND AUTOMATION
INDIAN INSTITUTE OF SCIENCE
June 24, 2013
![Page 2: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/2.jpg)
WHAT IS COMPUTER VISION and WHY IS IT DIFFICULT?
Computer vision, obviously, aims to build computers that can see! In other words, it deals with analyzing and understanding images and videos by computer.
The aim of analysis is either to find known patterns in images (Detection) or to match images with known patterns (Recognition).
To analyze an image, we first need a representation for it.
An image is stored in a computer as a 2- or 3-dimensional matrix, each element of which is a pixel.
A single pixel carries very little, if any, semantic information!
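To make the "image as a matrix" point concrete, here is a minimal sketch using NumPy arrays (an assumption; any 2-D/3-D array type would do). The pixel values are made up for illustration: a grayscale image is a 2-D matrix, a colour image adds a third axis for the channels, and a single entry is just one number with no meaning on its own.

```python
import numpy as np

# A tiny 4x5 grayscale image: one intensity value (0-255) per pixel.
gray = np.array([
    [ 12,  15,  14, 200, 210],
    [ 10,  13, 198, 205, 207],
    [ 11, 190, 201, 203, 199],
    [180, 195, 202, 204, 206],
], dtype=np.uint8)

# A colour image adds a third dimension: one value per channel (R, G, B).
color = np.zeros((4, 5, 3), dtype=np.uint8)
color[..., 0] = gray  # put the grayscale values into the red channel

print(gray.shape)   # (4, 5): a 2-D matrix
print(color.shape)  # (4, 5, 3): a 3-D matrix
print(gray[2, 1])   # one pixel is just one number: 190
```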
![Page 3: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/3.jpg)
Representation with Features
For most applications of machine learning, the first and foremost step is to find features.
Features are used to represent the data.
Features should be such that they lie in a metric space; usually they are vectors.
Very elaborate (high-dimensional) features need to be avoided for computational reasons.
[Figure: Representation → Feature Vector (difficult to process) → Dimensionality Reduction → Smaller Feature Vector]
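The slide names dimensionality reduction without fixing a method. A common choice is principal component analysis (PCA); the sketch below assumes PCA computed through NumPy's SVD, and the function name `pca_reduce` and the synthetic data are illustrative only. The data are 50-dimensional but really vary along 3 directions, so 3 components keep almost all of the variance.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples x n_features) onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre the data
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()  # fraction of variance kept
    return Xc @ components.T, components, mean, explained

rng = np.random.default_rng(0)
# 200 synthetic 50-dimensional feature vectors that vary along only 3 directions.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

Z, comps, mu, frac = pca_reduce(X, k=3)
print(Z.shape)  # (200, 3): the smaller feature vectors
print(frac)     # close to 1.0
```

The smaller vectors `Z` can then be compared with ordinary Euclidean distance, which keeps the metric-space requirement from the slide.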
![Page 4: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/4.jpg)
Features for Computer Vision
Pixel values can serve as features, but are often not very meaningful.
Groups of pixels can have more meaning, but how should such groups be formed?
Options that have been explored include groups of pixels (sub-images) at a large number of scales and positions, image gradients/edges, and the outputs of various filters.
Filter outputs are difficult to interpret semantically, but have been found to work well in certain applications.
Finding concise, semantically meaningful features is still a major open issue in Computer Vision.
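Image gradients, one of the feature types listed above, can be sketched with finite differences. This example assumes NumPy's `np.gradient` (central differences in the interior); `gradient_features` is a hypothetical helper name. On a vertical step edge the magnitude is large only next to the edge and the orientation points towards the bright side.

```python
import numpy as np

def gradient_features(img):
    """Per-pixel gradient magnitude and orientation (radians) of a grayscale image."""
    img = img.astype(float)
    gy, gx = np.gradient(img)          # derivative along rows (y) first, then columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)
    return magnitude, orientation

# A vertical step edge: dark on the left, bright on the right.
img = np.zeros((5, 6))
img[:, 3:] = 255
mag, ori = gradient_features(img)
print(mag[2])     # non-zero only in the two columns next to the edge
print(ori[2, 2])  # 0.0: the gradient points in the +x direction
```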
![Page 5: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/5.jpg)
SIFT Interest Points
A filter is an operator that processes a signal and removes some undesired components.
Difference-of-Gaussian (DoG) filters are a popular filter for images.
The positions of the local maxima of this filter's output (in SIFT, local minima as well) are the interest points.
Some interest points, such as those lying along edges, are discarded.
At each interest point, a feature vector is computed from the image gradients and their orientations inside small windows around the point.
This feature is invariant to the orientation and scale of the image.
SIFT: Scale-Invariant Feature Transform
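The DoG step can be sketched at a single scale with SciPy's `ndimage.gaussian_filter` and `maximum_filter`. This is a deliberate simplification and not SIFT itself: real SIFT builds a full scale-space pyramid, keeps maxima and minima across scales, rejects edge and low-contrast responses, and then computes the orientation histograms. The parameter values (`sigma`, `k`, `threshold`) are arbitrary choices for this toy image of two bright dots.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_keypoints(img, sigma=2.0, k=1.6, threshold=0.01):
    """Single-scale Difference-of-Gaussian response and its local maxima (simplified)."""
    img = img.astype(float)
    # Narrow blur minus wide blur: bright blobs give a positive peak.
    dog = gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)
    # A pixel is a peak if it equals the maximum of its 3x3 neighbourhood
    # and its response is strong enough.
    is_peak = (dog == maximum_filter(dog, size=3)) & (dog > threshold)
    return np.argwhere(is_peak), dog

img = np.zeros((40, 40))
img[10, 12] = 1.0   # two bright dots
img[28, 30] = 1.0
points, _ = dog_keypoints(img)
print(points)  # one interest point per dot: (10, 12) and (28, 30)
```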
![Page 6: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/6.jpg)
SIFT INTEREST POINTS
![Page 7: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/7.jpg)
FACE DETECTION-PROBLEM
Given an image, find the faces in it.
Used in many places, such as digital cameras and photo-sharing albums, including Facebook.
Given a rectangular region in an image, decide whether it is a face or not!
Repeat this process for every location and every size of the rectangular region.
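The "every location and every size" search can be sketched as a generator of candidate windows. The minimum size, scale step and stride below are illustrative assumptions, and `is_face` stands for a hypothetical window classifier supplied by the caller; the image is a plain list of rows to keep the sketch dependency-free.

```python
def sliding_windows(img_h, img_w, min_size=24, scale_step=1.25, stride_frac=0.25):
    """Yield square candidate windows (row, col, size) over all positions and scales.
    Window sizes grow geometrically; the stride grows with the window size."""
    size = min_size
    while size <= min(img_h, img_w):
        stride = max(1, int(size * stride_frac))
        for r in range(0, img_h - size + 1, stride):
            for c in range(0, img_w - size + 1, stride):
                yield r, c, size
        size = int(round(size * scale_step))

def detect(img, is_face, **kwargs):
    """Run a caller-supplied window classifier (hypothetical `is_face`) on every candidate."""
    h, w = len(img), len(img[0])
    return [(r, c, s) for r, c, s in sliding_windows(h, w, **kwargs)
            if is_face([row[c:c + s] for row in img[r:r + s]])]

windows = list(sliding_windows(48, 48))
print(len(windows))  # 39 candidate windows for a 48x48 image with these settings
```

Even this tiny image needs dozens of classifier calls, which is why detectors need a classifier that is cheap to evaluate per window.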
![Page 8: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/8.jpg)
FACE DETECTION-GENERAL APPROACH
![Page 9: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/9.jpg)
Basically a binary classification problem.
Requires building a model for faces.
Needs training samples, both positive and negative: positive samples are face images, negative samples are non-face images.
The learning algorithm finds the boundary between face and non-face images.
[Figure: FACE images and NON-FACE images]
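As one concrete instance of "a learning algorithm finds the boundary", here is a sketch of logistic regression trained by gradient descent in NumPy. The slides do not prescribe this classifier; the two Gaussian clouds are synthetic stand-ins for feature vectors of face (label 1) and non-face (label 0) windows, and the learned boundary is the hyperplane `X @ w + b = 0`.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit w, b so that sigmoid(X @ w + b) approximates P(face | x). y: 1 = face, 0 = non-face."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted face probability
        grad = p - y                             # gradient of the log-loss w.r.t. the logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)           # which side of the boundary each sample is on

rng = np.random.default_rng(1)
faces = rng.normal(loc=+2.0, size=(100, 10))      # stand-in features for face windows
non_faces = rng.normal(loc=-2.0, size=(100, 10))  # stand-in features for non-face windows
X = np.vstack([faces, non_faces])
y = np.concatenate([np.ones(100), np.zeros(100)])
w, b = train_logistic(X, y)
accuracy = (predict(X, w, b) == y).mean()
print(accuracy)  # training accuracy, close to 1.0 on this well-separated toy data
```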
![Page 10: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/10.jpg)
![Page 11: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/11.jpg)
FACE DETECTION- BENCHMARK and EVALUATION
Standard face-detection benchmark datasets are available, e.g. FDDB: a face detection dataset for unconstrained settings.
Performance is usually measured using Precision and Recall:
Precision: of the reported face detections, how many were actually faces?
Recall: of the faces actually present, how many were detected?
F-score: the harmonic mean of precision and recall.
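The three measures reduce to simple arithmetic once detections have been matched to true faces. The sketch assumes that matching is already done (benchmarks such as FDDB define their own overlap-based matching rules, not shown here), and the counts in the example are hypothetical.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    reported = true_positives + false_positives   # all detections the system output
    actual = true_positives + false_negatives     # all faces really present
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    # Harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical run: 80 correct detections, 20 false alarms, 40 missed faces.
p, r, f = precision_recall_f(80, 20, 40)
print(p, r, f)  # precision 0.8, recall about 0.667, F-score about 0.727
```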
![Page 12: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/12.jpg)
FACE RECOGNITION-PROBLEM
Consists of a training phase and a testing phase.
In the training phase we are given many face images, each labelled with the identity of the person.
In the testing phase, we are given a new face image belonging to one of these persons; the task is to find out the identity of the person.
In machine learning terms this is a simple classification problem, but first suitable features and representations have to be found.
![Page 13: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/13.jpg)
FACE RECOGNITION-PROBLEM
One approach is to build a model for each person, using the training images provided for that person.
A second approach is to compare the test image with each of the training images and find the closest match.
Not every part of a face image helps in recognition: certain things about faces are common to everyone.
A good strategy is to find the most distinctive features and represent images only by them.
Eigenfaces (1991) uses the last two strategies.
Recognition accuracy is the obvious evaluation criterion.
A good recognition algorithm should work well with a small number of training images.
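The two strategies Eigenfaces combines (keep only the most distinctive directions, then find the closest training image) can be sketched as PCA followed by a nearest-neighbour search. This is a minimal NumPy sketch, not the original 1991 implementation; the function names are hypothetical and the "faces" are synthetic 256-pixel vectors for three people.

```python
import numpy as np

def fit_eigenfaces(train_imgs, k):
    """train_imgs: (n, h*w) flattened face images. Returns the mean face and top-k eigenfaces."""
    mean = train_imgs.mean(axis=0)
    _, _, Vt = np.linalg.svd(train_imgs - mean, full_matrices=False)
    return mean, Vt[:k]                       # each row of Vt[:k] is one eigenface

def project(imgs, mean, eigenfaces):
    return (imgs - mean) @ eigenfaces.T       # k coefficients per image

def recognize(test_img, train_imgs, train_ids, mean, eigenfaces):
    """Nearest neighbour in eigenface space: identity of the closest training image."""
    train_coeffs = project(train_imgs, mean, eigenfaces)
    test_coeffs = project(test_img[None, :], mean, eigenfaces)
    dists = np.linalg.norm(train_coeffs - test_coeffs, axis=1)
    return train_ids[np.argmin(dists)]

rng = np.random.default_rng(2)
# Synthetic stand-in for a face dataset: 3 people, 5 noisy 16x16 images each.
bases = rng.normal(size=(3, 256))
train_imgs = np.vstack([bases[p] + 0.1 * rng.normal(size=256)
                        for p in range(3) for _ in range(5)])
train_ids = np.repeat(np.arange(3), 5)
mean, eig = fit_eigenfaces(train_imgs, k=5)
probe = bases[1] + 0.1 * rng.normal(size=256)   # a new image of person 1
print(recognize(probe, train_imgs, train_ids, mean, eig))  # expected identity: 1
```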
![Page 14: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/14.jpg)
FACE RECOGNITION-CURRENT STATUS
Face recognition has traditionally been done with well-cropped, focused face images, in a controlled environment.
This is considered a solved problem. Nowadays face recognition is being revisited for semi-controlled or uncontrolled environments.
LFW (Labeled Faces in the Wild): a dataset of face images taken in such settings, and a new benchmark.
![Page 15: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/15.jpg)
OBJECT RECOGNITION-PROBLEM
A classification task like face recognition, but practically much more complex.
A large number of images is given from many object categories; classify a test image into one of these categories.
The problem is made very difficult by intra-class variations.
![Page 16: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/16.jpg)
OBJECT RECOGNITION-GENERAL APPROACH
Once again the idea is to build models for different objects.
No single feature may be enough for classification: some objects may have a distinctive color, others a distinctive shape.
Multiple Kernel Learning, a sophisticated machine learning formulation, is generally considered the best approach for this problem.
Caltech-101: a dataset of 101 object categories; close to 80% accuracy has been obtained with Multiple Kernel Learning.
Caltech-256: a dataset of 256 object categories; an accuracy of 50% is considered good!
Intra-class variations continue to pose a significant challenge, and even scepticism: is it a valid problem at all?
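The core object in Multiple Kernel Learning is a weighted combination of kernels, one per feature type (for example colour and shape). The sketch below only builds that combination with fixed, hand-chosen weights; in real MKL the weights are learned jointly with the classifier, which is not shown. The feature arrays are random placeholders, and RBF kernels are an assumed choice.

```python
import numpy as np

def rbf_kernel(F, gamma):
    """Gaussian (RBF) kernel matrix for the feature rows of F."""
    sq = np.sum(F ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * F @ F.T   # squared pairwise distances
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, weights):
    """Convex combination sum_m beta_m * K_m, the kernel form used in MKL.
    Here the weights beta_m are fixed by hand instead of being learned."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(b * K for b, K in zip(weights, kernels))

rng = np.random.default_rng(3)
color_feats = rng.random((6, 16))   # placeholder colour histograms for 6 images
shape_feats = rng.random((6, 32))   # placeholder shape descriptors for the same images
K = combine_kernels([rbf_kernel(color_feats, 1.0), rbf_kernel(shape_feats, 0.5)], [0.7, 0.3])
print(np.allclose(K, K.T), np.allclose(np.diag(K), 1.0))  # symmetric, unit diagonal
print(np.linalg.eigvalsh(K).min() >= -1e-10)              # still positive semidefinite
```

Because a non-negative combination of valid kernels is again a valid kernel, `K` can be passed to any kernel classifier that accepts a precomputed kernel matrix.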
![Page 17: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/17.jpg)
OBJECT DETECTION
Given an image, find all the birds, trees and cars in it!
Requires building models for each of these objects.
Once again, search the entire image at multiple positions and scales.
Part-based models of objects are considered efficient: instead of modelling the whole object, model different parts separately.
This helps to handle occlusion and perhaps intra-class variations.
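A minimal sketch of how a part-based model can score one object hypothesis, in the spirit of deformable part models: each part is placed where its appearance score minus a quadratic penalty for moving away from its expected (anchor) position is highest, and the part scores are added to a whole-object (root) score. The score maps, anchors and `deform_weight` are hypothetical inputs; computing real appearance scores is out of scope here.

```python
import numpy as np

def best_part_score(score_map, anchor, deform_weight):
    """Best placement of one part: appearance score minus a quadratic deformation penalty."""
    rows, cols = np.indices(score_map.shape)
    penalty = deform_weight * ((rows - anchor[0]) ** 2 + (cols - anchor[1]) ** 2)
    total = score_map - penalty
    idx = tuple(int(i) for i in np.unravel_index(np.argmax(total), total.shape))
    return total[idx], idx

def object_score(root_score, part_maps, anchors, deform_weight=0.1):
    """Hypothesis score = root score + sum over parts of the best (score - deformation)."""
    score, placements = root_score, []
    for score_map, anchor in zip(part_maps, anchors):
        s, pos = best_part_score(score_map, anchor, deform_weight)
        score += s
        placements.append(pos)
    return score, placements

part1 = np.zeros((10, 10))
part1[3, 4] = 2.0            # strong response one pixel from its anchor (3, 3)
part2 = np.zeros((10, 10))   # occluded part: no evidence anywhere
score, where = object_score(1.0, [part1, part2], anchors=[(3, 3), (5, 5)])
print(score, where)  # about 2.9; part 1 at (3, 4), part 2 stays at its anchor (5, 5)
```

The occluded part contributes zero instead of vetoing the detection, which is one way part-based models cope with occlusion.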
![Page 18: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/18.jpg)
IMAGE SEGMENTATION
Given an image, divide it into segments such that each segment contains an object.
Basically a clustering problem.
Often done without elaborate features, purely with pixel values.
Has inspired advanced clustering techniques such as spectral clustering.
Graph-based method: model the image as a graph, with each pixel a node and adjacent pixels connected by edges.
Each edge is given a weight according to the similarity of the corresponding pixel values.
Requires the number of segments to be specified.
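The graph-based formulation can be sketched with spectral clustering for the simplest case of 2 segments. Assumptions: 4-connected neighbours, Gaussian edge weights with a hand-picked `sigma`, and a split by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian `L = D - W`). The dense matrices make this suitable only for tiny images.

```python
import numpy as np

def pixel_graph(img, sigma=0.5):
    """Weight matrix W: one node per pixel, edges between 4-connected neighbours,
    weight exp(-(difference in value)^2 / sigma^2), so similar pixels get weights near 1."""
    h, w = img.shape
    vals = img.astype(float).ravel()
    W = np.zeros((h * w, h * w))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for rr, cc in ((r, c + 1), (r + 1, c)):   # right and down neighbours
                if rr < h and cc < w:
                    j = rr * w + cc
                    W[i, j] = W[j, i] = np.exp(-((vals[i] - vals[j]) ** 2) / sigma ** 2)
    return W

def spectral_bipartition(img, sigma=0.5):
    """Split the image into 2 segments using the sign of the Fiedler vector of L = D - W."""
    W = pixel_graph(img, sigma)
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)      # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]             # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int).reshape(img.shape)

img = np.zeros((4, 6))
img[:, 3:] = 1.0                     # dark left half, bright right half
labels = spectral_bipartition(img)
print(labels)                        # left and right halves receive different labels
```

More than 2 segments is usually handled by taking several eigenvectors and clustering the pixels in that space (for example with k-means), which is where the required number of segments enters.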
![Page 19: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/19.jpg)
IMAGE SEGMENTATION
Segmentation is evaluated with respect to a gold-standard segmentation.
Every pair of pixels that lies in the same segment in the gold standard should also lie in the same segment in the computed segmentation (and similarly for every pair of pixels lying in different segments).
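The pairwise criterion above is the idea behind the Rand index: the fraction of pixel pairs on which the two segmentations agree. A direct sketch over flattened label lists follows; it is quadratic in the number of pixels, so for real images one would compute the same quantity from a contingency table instead.

```python
from itertools import combinations

def rand_index(gold, pred):
    """Fraction of pixel pairs on which two segmentations agree (same/same or different/different)."""
    assert len(gold) == len(pred)
    agree = total = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold = gold[i] == gold[j]
        same_pred = pred[i] == pred[j]
        agree += same_gold == same_pred
        total += 1
    return agree / total

print(rand_index([0, 0, 1, 1], [5, 5, 9, 9]))  # 1.0: only the grouping matters, not label names
print(rand_index([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.5
```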
![Page 20: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/20.jpg)
Video Problems
Videos are collections of images taken over an interval of time; successive images are quite similar.
Having to handle several images rather than one may make video problems tougher.
But the temporal continuity of videos provides a way out: joint modelling of multiple similar images can, in fact, give better performance than modelling a single image.
For video tasks, additional motion-based features such as optical flow can be used.
The concept of interest points for images is extended to Space-Time Interest Points for videos.
Face recognition, face detection, etc. can also be done in videos, often more effectively than in images.
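Optical flow can be illustrated with the Lucas-Kanade idea applied to one window: assume brightness constancy, so that `Ix*u + Iy*v + It = 0` at every pixel, and solve for a single motion vector `(u, v)` by least squares. The sketch treats the whole image as one window and uses a synthetic Gaussian blob moved half a pixel to the right; real implementations work on many small windows and image pyramids.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1):
    """One (u, v) flow vector for the whole window, by least squares on
    Ix*u + Iy*v + It = 0. Assumes brightness constancy and small motion."""
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    Iy, Ix = np.gradient(f0)             # spatial derivatives (axis 0 = rows = y)
    It = f1 - f0                         # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

ys, xs = np.mgrid[0:32, 0:32]

def blob(cx, cy, s=4.0):
    """Synthetic frame: a smooth Gaussian blob centred at (cx, cy)."""
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * s ** 2))

u, v = lucas_kanade_window(blob(16.0, 16.0), blob(16.5, 16.0))  # blob moves 0.5 px right
print(u, v)  # approximately 0.5 and 0.0
```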
![Page 21: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/21.jpg)
OBJECT TRACKING-PROBLEM
Given a video that shows a person/object moving, find it in each frame.
Naive approach: reduce it to an object detection problem in every frame.
If the object is at position (x, y) in frame t, it will be very close to that position in frame t + 1.
So if we know the position at time t, we need to search only around that same position; this reduces the search space greatly!
The main idea is to build an appearance model for the object.
The appearance may change over time due to variations in size, illumination, viewpoint, etc.
The appearance model must therefore be adaptive, and recomputed throughout the video.
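A minimal tracker that follows the two ideas above: search only near the previous position, and keep an adaptive appearance model. The specific choices are assumptions for illustration: the model is a template compared by sum of squared differences (SSD), and adaptation is a running average with rate `alpha`. `track` is a hypothetical helper, and the video is a synthetic moving square.

```python
import numpy as np

def track(frames, box, radius=3, alpha=0.2):
    """box = (row, col, height, width) in frames[0]. Each new frame is searched only
    within +-radius pixels of the previous position, using SSD to a template model."""
    r, c, h, w = box
    model = frames[0][r:r + h, c:c + w].astype(float)
    path = [(r, c)]
    for frame in frames[1:]:
        best_ssd, best_pos = np.inf, (r, c)
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if rr < 0 or cc < 0 or rr + h > frame.shape[0] or cc + w > frame.shape[1]:
                    continue                  # window would leave the image
                ssd = np.sum((frame[rr:rr + h, cc:cc + w] - model) ** 2)
                if ssd < best_ssd:
                    best_ssd, best_pos = ssd, (rr, cc)
        r, c = best_pos
        # Adaptive appearance model: blend in the newly found patch (running average).
        model = (1 - alpha) * model + alpha * frame[r:r + h, c:c + w]
        path.append((r, c))
    return path

# Synthetic video: a 5x5 bright square moving 2 px down and 1 px right per frame.
frames = []
for t in range(6):
    f = np.zeros((40, 40))
    f[10 + 2 * t:15 + 2 * t, 8 + t:13 + t] = 1.0
    frames.append(f)
print(track(frames, box=(10, 8, 5, 5)))  # [(10, 8), (12, 9), (14, 10), (16, 11), (18, 12), (20, 13)]
```

The `alpha` parameter makes the adapt-or-not trade-off explicit: a large value follows appearance changes quickly but also absorbs background after a miss.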
![Page 22: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/22.jpg)
OBJECT TRACKING- BENCHMARK and EVALUATION
Performance is measured with respect to a gold standard in which a bounding box is provided for each frame.
The measure is the proportion of overlap between the areas of the gold-standard and reported bounding boxes.
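One common way to turn "proportion of overlapping areas" into a number is intersection area divided by union area; the sketch assumes that definition and boxes given as `(x, y, width, height)`, and averages it over the frames of a video. The boxes in the example are made up.

```python
def overlap_ratio(box_a, box_b):
    """Intersection area / union area of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def mean_overlap(gold_boxes, reported_boxes):
    """Average per-frame overlap over a whole video (one box per frame)."""
    ratios = [overlap_ratio(g, p) for g, p in zip(gold_boxes, reported_boxes)]
    return sum(ratios) / len(ratios)

gold = [(0, 0, 10, 10), (2, 0, 10, 10)]
reported = [(5, 0, 10, 10), (2, 0, 10, 10)]
print(overlap_ratio(gold[0], reported[0]))  # 50 / 150, about 0.333
print(mean_overlap(gold, reported))         # (1/3 + 1) / 2, about 0.667
```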
![Page 23: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/23.jpg)
OBJECT TRACKING-CURRENT STATUS
Considered a solved problem under controlled illumination and background.
Current research aims to handle occlusion of the object, and sudden changes in background and illumination.
Tracking multiple objects at the same time is another important problem.
Tracking is a real-time application; efforts are on to process as many frames per second as possible.
To adapt or not to adapt remains the fundamental problem in vision: a single miss can make the whole tracking go wrong.
Detecting and correcting misses is an important problem to solve.
![Page 24: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/24.jpg)
ACTION RECOGNITION IN VIDEOS
Surveillance cameras are nowadays installed at many sensitive public locations.
The aim is to record the activities of people.
Requires the use of dynamic features, which make use of the motion in videos.
Some image-based features, such as interest points, can be extended to videos as space-time interest points; these can be used by viewing the video as a space-time volume.
The features can also take the form of time series.
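Both views in the slide (the video as a space-time volume, and features as time series) can be illustrated in a few lines. The "motion energy" below, the mean absolute difference between consecutive frames, is a deliberately crude stand-in chosen for illustration, not a published action-recognition feature; the clip is synthetic.

```python
import numpy as np

def motion_energy_series(frames):
    """Stack frames into a space-time volume (T, H, W) and return one number per
    frame transition: the mean absolute temporal difference (a crude motion time series)."""
    volume = np.stack([f.astype(float) for f in frames])  # shape (T, H, W)
    temporal_diff = np.abs(np.diff(volume, axis=0))       # shape (T-1, H, W)
    return temporal_diff.mean(axis=(1, 2))

# Synthetic clip: a bright 2x2 square that stays still for 3 frames, then moves right.
frames = []
for col in [0, 0, 0, 2, 4]:
    f = np.zeros((6, 8))
    f[2:4, col:col + 2] = 1.0
    frames.append(f)
series = motion_energy_series(frames)
print(series)  # zero while the square is still, positive while it moves
```

A sequence like `series` can then be fed to any time-series classifier; richer versions replace the frame difference with optical flow or space-time interest point descriptors.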
![Page 25: COMPUTER VISION: SOME CLASSICAL PROBLEMS](https://reader035.fdocuments.in/reader035/viewer/2022081421/568167af550346895ddcffbb/html5/thumbnails/25.jpg)
ACTION RECOGNITION IN VIDEOS
In the presence of a benign background, a static camera and a single actor, the problem is considered solved.
Current research aims to handle complex environments, such as crowded places, where people frequently get occluded.
Multi-person interaction recognition is another recent branch of the problem.