CAP 6412 Advanced Computer Vision - UCF Computer Science (bgong/CAP6412/lec28.pdf)
CAP 6412 Advanced Computer Vision
http://www.cs.ucf.edu/~bgong/CAP6412.html
Boqing Gong, April 21st, 2016
Today
• Administrivia
• Free parameters in an approach, model, or algorithm?
• Egocentric videos by Aisha
Project II due: next Wednesday (04/27, 5 PM)
• Final project presentation: 04/28, 1–3:50 PM
• Late submissions: https://docs.google.com/spreadsheets/d/1uNPfUsdnw5xfzIV-PrQTo9xTWKfv7s-OPyuV_zZw9Fc/edit?usp=sharing
Free parameters (hyper-parameters)
• In Project 2, when you train the CNNs:
– Learning rate, momentum, weight decay, dropout rate, early stopping, etc.
– Network architecture, nonlinear functions, strides, etc.
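• In linear regression
$$\min_{w} \sum_{m=1}^{M} \left(y_m - x_m^T w\right)^2 + \lambda \|w\|_2^2$$
• In SVM
$$\min_{w,\ \xi_m,\ m=1,\cdots,M} \ \sum_{m=1}^{M} \xi_m + \lambda \|w\|_2^2 \quad \text{s.t.}\ \ y_m\left(x_m^T w\right) \ge 1 - \xi_m \ \text{ and } \ \xi_m \ge 0, \ \ \forall m$$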
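For a fixed w, each slack variable in the SVM objective must be at least 0 and at least 1 − y_m x_m^T w, and minimizing the sum pushes it down to the larger of the two. Substituting that value back gives the standard hinge-loss form (a textbook identity, not shown on the slide), which has the same "data term + λ‖w‖²" structure as the regression objective, with λ again the free parameter:

$$
\xi_m^\star = \max\left(0,\ 1 - y_m\, x_m^T w\right)
\quad\Longrightarrow\quad
\min_{w} \sum_{m=1}^{M} \max\left(0,\ 1 - y_m\, x_m^T w\right) + \lambda \|w\|_2^2
$$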
• In K-means clustering: K, the number of clusters
• In K-nearest neighbors classifier: K, the number of neighbors
• In Canny edge detection
– Gaussian filter, thresholds
• In R-CNN
– Threshold of selective search
– # layers, filter size, stride, where to apply max pooling
– Padding or not, learning rate, momentum, weight decay, # iterations
– Trade-off parameter
– Feature selection for regression
– Batch size
• Free parameters vs. model parameters
• We often seek model parameters by optimization
– Gradient descent (GD), coordinate descent, Newton's method, stochastic GD, etc.
• How to choose the free parameters?
$$\min_{w} \sum_{m=1}^{M} \left(y_m - x_m^T w\right)^2 + \lambda \|w\|_2^2$$
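To make the distinction concrete, here is a minimal pure-Python sketch (not from the lecture) that finds the model parameter w of the regularized least-squares objective above by batch gradient descent, while λ (`lam`), the step size, and the iteration count are free parameters fixed by hand. Function names and default values are illustrative assumptions.

```python
def predict(x, w):
    """Linear model x^T w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def ridge_objective(X, y, w, lam):
    """sum_m (y_m - x_m^T w)^2 + lam * ||w||_2^2"""
    data_term = sum((ym - predict(xm, w)) ** 2 for xm, ym in zip(X, y))
    return data_term + lam * sum(wi * wi for wi in w)


def ridge_gradient(X, y, w, lam):
    """Gradient: -2 * sum_m (y_m - x_m^T w) x_m + 2 * lam * w"""
    grad = [2.0 * lam * wi for wi in w]
    for xm, ym in zip(X, y):
        residual = ym - predict(xm, w)
        for j, xj in enumerate(xm):
            grad[j] -= 2.0 * residual * xj
    return grad


def fit_ridge_gd(X, y, lam, step=0.01, n_iters=2000):
    """w is learned (model parameter); lam, step and n_iters are free parameters."""
    w = [0.0] * len(X[0])
    for _ in range(n_iters):
        g = ridge_gradient(X, y, w, lam)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w
```

On the toy data x = (1, 2, 3), y = (2, 4, 6), `lam=0.0` recovers w ≈ 2, while `lam=7.0` shrinks it to w ≈ 4/3 (the closed form for one weight is Σxy / (Σx² + λ) = 28 / 21). Gradient descent finds w for a given λ, but nothing in it tells us which λ to use.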
How to choose the free parameters
• Smallest error rate on
– Test set?
– Validation set?
• Smallest expected error rate on the entire population
– In practice, however, we have access to only a finite set of examples!
– Approximate the expected error rate
– Choose the free parameters that minimize the approximate error
• How to approximate the expected error?
Weak approximation of the expected error!
Rarely used in practice.
Popular for small data.
Popular for small data.
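One standard way to approximate the expected error when data are scarce is K-fold cross-validation: every example is held out exactly once and the held-out errors are averaged. Below is a minimal pure-Python sketch, assuming a K-nearest-neighbors classifier whose number of neighbors is the free parameter being chosen; the interleaved fold assignment and all function names are illustrative choices, not from the lecture.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_predict(train_X, train_y, x, k_neighbors):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_error(X, y, k_neighbors, n_folds=5):
    """Average held-out error; fold f holds out examples f, f + n_folds, ...
    (shuffle the data beforehand in practice)."""
    n = len(X)
    mistakes = 0
    for f in range(n_folds):
        held_out = set(range(f, n, n_folds))
        tr_X = [X[i] for i in range(n) if i not in held_out]
        tr_y = [y[i] for i in range(n) if i not in held_out]
        for i in held_out:
            if knn_predict(tr_X, tr_y, X[i], k_neighbors) != y[i]:
                mistakes += 1
    return mistakes / n


def choose_k(X, y, candidates, n_folds=5):
    """Pick the free parameter with the lowest cross-validated error (ties -> smaller k)."""
    return min(candidates, key=lambda k: (cross_val_error(X, y, k, n_folds), k))
```

Setting `n_folds` equal to the number of examples gives leave-one-out cross-validation, which is the most thorough but also the most expensive variant.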
Popular for big data.
1. Divide the data into training, validation, and test sets.
2. Select free parameters.
– E.g., network layers, # hidden states, nonlinear functions, etc.
3. Train the model using the training set.
4. Evaluate the model using the validation set.
5. Repeat steps 2–4 using different free parameters → different models.
6. Select the best model (and its associated free parameters).
7. Train the model (with the associated free parameters) using both the training and validation sets.
8. Assess this final model using the test set.
Skip step 7 for big data.
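The eight steps above can be sketched in a few lines. This is a toy illustration under stated assumptions: a one-weight ridge regression (closed form w = Σxy / (Σx² + λ)) stands in for the model, λ is the free parameter, mean squared error is the error measure, and `retrain_on_train_val=False` corresponds to skipping step 7 for big data. All names are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge solution for a single weight, no bias term."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, w):
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_and_assess(train, val, test, lambdas, retrain_on_train_val=True):
    """train/val/test are (xs, ys) pairs (step 1 is assumed already done)."""
    (tr_x, tr_y), (va_x, va_y), (te_x, te_y) = train, val, test
    # Steps 2-5: one model per candidate free parameter, scored on the validation set.
    val_err = {lam: mse(va_x, va_y, fit_ridge_1d(tr_x, tr_y, lam)) for lam in lambdas}
    # Step 6: keep the best free parameter (ties broken by the smaller lam).
    best_lam = min(lambdas, key=lambda lam: (val_err[lam], lam))
    # Step 7 (skipped for big data): retrain on training + validation data.
    if retrain_on_train_val:
        w = fit_ridge_1d(tr_x + va_x, tr_y + va_y, best_lam)
    else:
        w = fit_ridge_1d(tr_x, tr_y, best_lam)
    # Step 8: assess the final model once on the untouched test set.
    return best_lam, w, mse(te_x, te_y, w)
```

The test set is touched only in the last line, which is what keeps its error an honest estimate of performance on unseen data.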
Skip this step for big data.
Hand detection in Egocentric videos
Aisha Urooj
Course Instructor: Dr. Boqing Gong, Advanced Computer Vision
Motivation
• Emergence of new wearable technologies
– Action cameras
– Smart glasses, etc.
• These devices capture video from a first-person perspective.
• They record the user's experiences.
Image Source: [1]
An overview of First Person Vision
Image Credits: [1]
A hierarchical structure, from the raw video sequence (bottom) to the desired objectives (top).
Image Credits: [1]
Image Credits: [1]
Image Credits: [1]
Image Credits: [1]
Related Datasets [1]
Motivation
• Hands are very common in egocentric videos.
• Hand appearance and pose give important cues about a person's
– actions
– attention
– activities
– user–machine interactions, etc.
• Most egocentric computer vision problems, from object detection to activity recognition, require accurate hand detection.
Challenges in hand detection
• Hands are highly deformable objects.
• Occlusion
• Cluttered background
• Dynamic background
• Inconsistent lighting
• Poor imaging conditions
• Highly dynamic camera motion
• And so on.
Lending a Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions
Sven Bambach, Stefan Lee, David J. Crandall, Chen Yu
Indiana University
Outline
• Paper’s contribution
• Dataset details
• Approach
• Results
• Possible future directions
Paper’s Contributions
• Deep model for hand detection and classification in egocentric video, including fast domain-specific region proposals.
• A new technique for pixel-wise hand segmentation.
• A quantitative analysis of how hand location and pose can be useful for accurate activity recognition.
• A large dataset of egocentric interactions with fine-grained ground truth.
Overview
Image source: http://vision.soic.indiana.edu/projects/lending-a-hand/
Ground-truth hand segmentation masks on sample frames from the dataset.
A random subset of hands cropped according to the ground truth.
Dataset details
• 4 participants, 4 activities, 3 different locations (office, home, courtyard)
• 48 unique videos in total
• Recorded with Google Glass at 720x1280, 30 fps
• 2 people in each video, each wearing Google Glass (the video pairs were synchronized and cut to 90 seconds)
• Pixel-level ground truth for over 15,000 hand instances
• Manual annotation of 100 frames per video, i.e., 4,800 ground-truth frames
• Main split: 36 training, 4 validation, 8 test videos
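The slide gives the split sizes but not which videos go where. A minimal sketch of a video-level split, assuming a random assignment with a fixed seed (hypothetical; the paper's actual assignment is not stated here): splitting by video rather than by frame keeps near-duplicate frames from one video out of both the training and test sets.

```python
import random

FRAMES_PER_VIDEO = 100  # manually annotated frames per video (from the slide)


def split_videos(video_ids, n_train=36, n_val=4, n_test=8, seed=0):
    """Assign whole videos to train/val/test so no video spans two sets."""
    ids = list(video_ids)
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must add up to the number of videos")
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

With 100 annotated frames per video, 36 training videos give 36 × 100 = 3,600 annotated frames, consistent with the 900 frames per activity × 4 activities used later for activity recognition.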
Hand Detection: Approach
• Candidate window generation
• Window classification using CNNs
Window Proposal Generation
• Goal: estimate the probability that an object O appears in a region R of an image I.
• The proposed approach for candidate window generation combines spatial biases and appearance models.
Window Proposal Generation (Contd.)
• P(O): prior probability that object (hand type) O occurs
• P(R|O): probability of a region R (a bounding box) given a specific hand type O, i.e., where that hand tends to appear
• P(I|R, O): a pixel-level skin classifier
– Estimates the probability that the central pixel of R is skin.
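By the chain rule, the three terms above multiply to the joint probability, so, up to a normalizer that does not depend on O, they score how likely hand type O is for region R in image I. This is a sketch of how the terms combine, consistent with the definitions above:

$$
P(O, R, I) = P(O)\,P(R \mid O)\,P(I \mid R, O)
\quad\Longrightarrow\quad
P(O \mid R, I) \propto P(O)\,P(R \mid O)\,P(I \mid R, O)
$$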
Coverage Results for Different Proposal Methods
Window Classification
• A standard CNN classification framework is used.
• CaffeNet from the Caffe software package
– A slight variation of AlexNet
• Each training batch contains an equal number of samples from each class.
• Horizontal and vertical flipping of sample images in Caffe is disabled
– To differentiate between left and right hands.
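Equal class counts per batch can be sketched as follows. This is a minimal illustration, not Caffe code: `balanced_batch` and the class names are hypothetical, and drawing with replacement is an assumption so that a class with few samples can still fill its share. No flipping augmentation is applied, because mirroring a left-hand crop would make it look like a right hand.

```python
import random


def balanced_batch(samples_by_class, batch_size, rng):
    """Return a shuffled list of (sample, label) pairs with equal counts per class."""
    classes = sorted(samples_by_class)
    if batch_size % len(classes):
        raise ValueError("batch_size must be a multiple of the number of classes")
    per_class = batch_size // len(classes)
    batch = []
    for label in classes:
        # Sampling with replacement (assumption): rare classes can still fill their share.
        batch.extend((rng.choice(samples_by_class[label]), label) for _ in range(per_class))
    rng.shuffle(batch)  # mix classes within the batch; no flipping is applied
    return batch
```

For example, with a very large "background" class and a handful of crops per hand type, every batch still contains the same number of examples of each label.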
Window Classification (Contd.)
• The CNN weights are initialized from CaffeNet
– Except the final fully connected layer, which is initialized from a zero-mean Gaussian.
• Fine-tuning using SGD
– Learning rate = 0.001
– Momentum = 0.999
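For reference, the update rule documented for Caffe's SGD solver is V ← μV − α∇L(W), then W ← W + V, with learning rate α and momentum μ. A minimal pure-Python sketch with the slide's values as defaults (the function name is illustrative):

```python
def sgd_momentum_step(w, v, grad, lr=0.001, momentum=0.999):
    """One momentum-SGD step on lists of floats; returns (new_w, new_v)."""
    v = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]  # V <- mu*V - lr*grad
    w = [wi + vi for wi, vi in zip(w, v)]                     # W <- W + V
    return w, v
```

Under a constant gradient g, the velocity approaches −lr·g / (1 − μ), so lr = 0.001 with μ = 0.999 tends toward an effective step of about 1.0·g, roughly 100× larger than the 0.01·g reached with the more common μ = 0.9.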
Detection pipeline: Input → Generate spatially sampled window proposals → Classify window crops using the fine-tuned CNN → Perform non-maximum suppression for each test frame
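The final step of the pipeline can be sketched with standard greedy non-maximum suppression. Assumptions: boxes are (x1, y1, x2, y2) tuples, and the 0.5 overlap threshold is a placeholder; the slide does not state the value used.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, overlap_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop the
    remaining boxes that overlap it by more than overlap_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= overlap_thresh]
    return keep  # indices of surviving boxes, highest score first
```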
Hand Detection
• Two cases:
– Detect hands of any type
– Detect hands of a specific type (own left, your right, etc.)
• The PASCAL VOC criterion for scoring detections is used
– The Intersection over Union (IoU) between the ground-truth and detected bounding boxes must be > 0.5
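A simplified sketch of the PASCAL VOC scoring rule for one frame: detections are processed in order of decreasing score, and a detection counts as a true positive only if its best-overlapping ground-truth box has IoU > 0.5 and has not already been claimed by a higher-scoring detection. The full VOC protocol also handles "difficult" objects and summarizes the ranked list as average precision; the box format and function names here are assumptions.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def match_detections(dets, gts, thresh=0.5):
    """dets: list of (score, box); gts: list of boxes. Returns True/False per
    detection (in descending-score order): True = true positive."""
    matched, labels = set(), []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gts):
            o = iou(box, g)
            if o > best_iou:
                best_j, best_iou = j, o
        if best_j is not None and best_iou > thresh and best_j not in matched:
            matched.add(best_j)
            labels.append(True)
        else:
            labels.append(False)  # low overlap, or a duplicate of a matched box
    return labels


def precision_recall(labels, n_gt):
    tp = sum(labels)
    return tp / len(labels), tp / n_gt
```

Sweeping a score threshold over the ranked labels and recomputing precision and recall at each cut gives the points of a precision-recall curve.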
Precision-Recall curves for Hand detection
Qualitative Results for Hand Detection
Quantitative Results for Hand Detection
Hand Segmentation
• Pixel-wise hand segmentation is useful for:
– Hand pose recognition
– In-hand object detection, etc.
• Goal: label each pixel as either background or a specific hand class.
• Applied GrabCut, a semi-supervised segmentation algorithm.
• Given an approximate foreground mask, GrabCut iteratively refines the foreground and background pixels, relabeling them using a Markov Random Field.
Hand Segmentation (Contd.)
• For each detected hand bounding box, an initial foreground estimate is computed using the same skin color model.
• The estimate is thresholded: every pixel within the box is marked as foreground, except pixels with very low skin probability.
• GrabCut is run on the bounding box plus a padded region around it.
• The final segmentation is the union of the output masks for all detected bounding boxes.
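The mask initialization and the union step can be sketched in pure Python. Assumptions (not from the slides): skin probabilities are given as an H × W list, the low-probability threshold and padding width are placeholder values, pixels in the padding or with very low skin probability are marked "probable background", and the label values mirror OpenCV's GrabCut mask constants (GC_BGD = 0, GC_FGD = 1, GC_PR_BGD = 2, GC_PR_FGD = 3). The GrabCut optimization itself is not reimplemented here.

```python
# Label values mirroring OpenCV's GrabCut mask constants.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def init_grabcut_mask(skin_prob, box, low_thresh=0.05, pad=10):
    """skin_prob: H x W list of skin probabilities; box: (x1, y1, x2, y2),
    x2/y2 exclusive. Returns an H x W label mask for one detected hand."""
    h, w = len(skin_prob), len(skin_prob[0])
    x1, y1, x2, y2 = box
    px1, py1 = max(0, x1 - pad), max(0, y1 - pad)
    px2, py2 = min(w, x2 + pad), min(h, y2 + pad)
    mask = [[GC_BGD] * w for _ in range(h)]  # outside the padded box: background
    for y in range(py1, py2):
        for x in range(px1, px2):
            inside = x1 <= x < x2 and y1 <= y < y2
            if inside and skin_prob[y][x] >= low_thresh:
                mask[y][x] = GC_PR_FGD  # inside the box, not very low skin prob
            else:
                mask[y][x] = GC_PR_BGD  # padding, or very low skin probability
    return mask


def union_masks(masks):
    """Final segmentation: a pixel is hand if any per-box binary output marks it."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[any(m[y][x] for m in masks) for x in range(w)] for y in range(h)]
```

In an OpenCV-based pipeline, such a mask (converted to a uint8 NumPy array) could be passed to `cv2.grabCut` in `cv2.GC_INIT_WITH_MASK` mode, and the pixels labeled GC_FGD or GC_PR_FGD afterward taken as that box's hand mask before the union.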
Quantitative Results for Hand Segmentation
Two Modes of Possible Failure
• Failure to properly detect hand bounding boxes.
• Inaccuracy in distinguishing hand pixels from the background.
• Applying the segmentation algorithm to ground-truth bounding boxes raises the average to 0.73.
• Taking the output of the hand detector but using ground-truth segmentation masks raises the average further, to 0.76.
Qualitative Results for Hand Segmentation
Hand-based Activity Recognition
• Masked out all non-hand background information using ground-truth hand segmentations.
• Fine-tuned a CNN to classify whole frames as one of the four activities.
– Training: 900 frames per activity from 36 videos
– Validation: 100 frames per activity from 4 videos
– Classification accuracy: 66.4% per frame
Hand-based Activity Recognition (Contd.)
Incorporating temporal constraints:
• Simple voting based approach
• Classify each individual frame in the context of a fixed-size temporal window centered on the frame
• Scores are summed across the window
• Frame is labeled as the highest scoring class
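The voting scheme can be sketched as follows, assuming per-frame class scores from the CNN are already available. The window length is a free parameter; requiring an odd length and truncating the window at the start and end of the video are assumptions made here for simplicity.

```python
def temporal_vote(frame_scores, window):
    """frame_scores: one list of per-class scores per frame.
    Returns one class index per frame after summing scores over a
    centred window of `window` frames."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n, n_classes = len(frame_scores), len(frame_scores[0])
    labels = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)  # truncated at video ends
        totals = [sum(frame_scores[s][c] for s in range(lo, hi)) for c in range(n_classes)]
        labels.append(max(range(n_classes), key=lambda c: totals[c]))
    return labels
```

With a window of 1 this reduces to per-frame classification; larger windows smooth out isolated misclassified frames, such as frame 1 in the scores `[[0.9, 0.1], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]`, which a window of 3 relabels as class 0.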
Hand-based Activity Recognition
Sample hand poses not present in the dataset
Related work on Egocentric Hands Detection
Work by A. Betancourt, University of Genoa, Italy:
1) Hand Segmentation and Tracking in FPV
2) A Sequential Classifier for Hand Detection in the Framework of Egocentric Vision. CVPR 2014
3) The Evolution of First Person Vision Methods: A Survey
Observations:
– Misses other people's hands in many frames.
– Results show false positives in many frames.
– Hands shown in videos playing within the video are not detected.
– Segmentation is not efficient.
– At times both hands are detected as left, or both as right.
– The full arm is treated as part of the hand.
Possible Future Directions
• Improve segmentation technique
• Have an unbiased dataset
• Use an efficient tracking approach to incorporate temporal information
• Improve hand classifier
References
• [1] A. Betancourt, P. Morerio, C. S. Regazzoni, and M. Rauterberg. The Evolution of First Person Vision Methods: A Survey. IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 5.
THANK YOU!