Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of...
-
Upload
neil-daniel -
Category
Documents
-
view
228 -
download
0
Transcript of Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of...
![Page 1: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/1.jpg)
Database-Based Hand Pose Estimation
CSE 6367 – Computer VisionVassilis Athitsos
University of Texas at Arlington
![Page 2: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/2.jpg)
2
Static Gestures (Hand Poses)
Input image Articulated hand model
• Given a hand model, and a single image of a hand, estimate:– 3D hand shape (joint angles).– 3D hand orientation.
Joints
![Page 3: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/3.jpg)
3
Static Gestures
Input image Articulated hand model
• Given a hand model, and a single image of a hand, estimate:– 3D hand shape (joint angles).– 3D hand orientation.
![Page 4: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/4.jpg)
4
Goal: Hand Tracking Initialization
• Given the 3D hand pose in the previous frame, estimate it in the current frame.– Problem: no good way to automatically
initialize a tracker.
Rehg et al. (1995), Heap et al. (1996), Shimada et al. (2001),
Wu et al. (2001), Stenger et al. (2001), Lu et al. (2003), …
![Page 5: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/5.jpg)
5
Assumptions in Our Approach
• A few tens of distinct hand shapes.– All 3D orientations should be allowed.– Motivation: American Sign Language.
![Page 6: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/6.jpg)
6
Assumptions in Our Approach
• A few tens of distinct hand shapes.– All 3D orientations should be allowed.– Motivation: American Sign Language.
• Input: single image, bounding box of hand.
![Page 7: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/7.jpg)
7
Assumptions in Our Approach
• We do not assume precise segmentation!– No clean contour extracted.
input image
skin detection segmented
hand
![Page 8: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/8.jpg)
8
Approach: Database Search
• Over 100,000 computer-generated images.– Known hand pose.
input
![Page 9: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/9.jpg)
9
Why?
• We avoid direct estimation of 3D info.– With a database, we only match 2D to 2D.
• We can find all plausible estimates.– Hand pose is often ambiguous.
input
![Page 10: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/10.jpg)
10
Building the Database
26 hand shapes
![Page 11: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/11.jpg)
11
4128 images are generated for each hand shape.
Total: 107,328 images.
Building the Database
![Page 12: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/12.jpg)
12
Features: Edge Pixels
• We use edge images.– Easy to extract.– Stable under illumination
changes.
input edge image
![Page 13: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/13.jpg)
13
Chamfer Distance
input
model
Overlaying input and model
How far apart are they?
![Page 14: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/14.jpg)
14
• Input: two sets of points.– red, green.
• c(red, green):– Average distance from each
red point to nearest green point.
Directed Chamfer Distance
![Page 15: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/15.jpg)
15
• Input: two sets of points.– red, green.
• c(red, green):– Average distance from each
red point to nearest green point.
• c(green, red):– Average distance from each
red point to nearest green point.
Directed Chamfer Distance
![Page 16: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/16.jpg)
16
• Input: two sets of points.– red, green.
• c(red, green):– Average distance from each red
point to nearest green point.
• c(green, red):– Average distance from each red
point to nearest green point.
Chamfer Distance
Chamfer distance:C(red, green) = c(red, green) + c(green, red)
![Page 17: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/17.jpg)
17
Evaluating Retrieval Accuracy
• A database image is a correct match for the input if:– the hand shapes are the same,– 3D hand orientations differ by at most 30 degrees.
inputcorrect matches
incorrect matches
![Page 18: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/18.jpg)
18
Evaluating Retrieval Accuracy
• An input image has 25-35 correct matches among the 107,328 database images.– Ground truth for input images is estimated by
humans.
inputcorrect matches
incorrect matches
![Page 19: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/19.jpg)
19
Evaluating Retrieval Accuracy
• Retrieval accuracy measure: what is the rank of the highest ranking correct match?
inputcorrect matches
incorrect matches
![Page 20: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/20.jpg)
20
Evaluating Retrieval Accuracyinput
rank 1 rank 2 rank 3 rank 5 rank 6rank 4
…
…
highest rankingcorrect match
![Page 21: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/21.jpg)
21
Results on 703 Real Hand Images
Rank of highest ranking correct match
Percentage of test images
1 15%
1-10 40%
1-100 73%
![Page 22: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/22.jpg)
22
Results on 703 Real Hand Images
Rank of highest ranking correct match
Percentage of test images
1 15%
1-10 40%
1-100 73%
• Results are better on “nicer” images:– Dark background.– Frontal view.– For half the images, top match was correct.
![Page 23: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/23.jpg)
23
Examples
initial image
segmented hand
edge image
correct match
rank: 1
![Page 24: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/24.jpg)
24
Examples
initial image
segmented hand
edge image
correct match
rank: 644
![Page 25: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/25.jpg)
25
Examples
initial image
segmented hand
edge image
incorrect match
rank: 1
![Page 26: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/26.jpg)
26
Examples
initial image
segmented hand
edge image
correct match
rank: 1
![Page 27: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/27.jpg)
27
Examples
initial image
segmented hand
edge image
correct match
rank: 33
![Page 28: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/28.jpg)
28
Examples
initial image
segmented hand
edge image
incorrect match
rank: 1
![Page 29: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/29.jpg)
29
Examples
“hard” case
segmented hand
edge image
“easy” case
segmented hand
edge image
![Page 30: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/30.jpg)
30
Research Directions
• More accurate similarity measures.
• Better tolerance to segmentation errors.– Clutter.– Incorrect scale and
translation.
• Verifying top matches.
• Registration.
![Page 31: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/31.jpg)
31
Efficiency of the Chamfer Distance
• Computing chamfer distances is slow.– For images with d edge pixels, O(d log d) time.– Comparing input to entire database takes over 4 minutes.
• Must measure 107,328 distances.
input model
![Page 32: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/32.jpg)
32
The Nearest Neighbor Problem
database
![Page 33: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/33.jpg)
33
The Nearest Neighbor Problem
query
• Goal: – find the k nearest
neighbors of query q.database
![Page 34: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/34.jpg)
34
The Nearest Neighbor Problem
• Goal: – find the k nearest
neighbors of query q.
• Brute force time is linear to:– n (size of database).– time it takes to measure a
single distance.
query
database
![Page 35: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649cf85503460f949c8f6f/html5/thumbnails/35.jpg)
35
• Goal: – find the k nearest
neighbors of query q.
• Brute force time is linear to:– n (size of database).– time it takes to measure a
single distance.
The Nearest Neighbor Problem
query
database