Zhengyou Zhang, Microsoft Research
Digital Object Identifier: 10.1109/MMUL.2012.24, Publication Year: 2012, Page(s): 4-10
Professor: Yih-Ran Sheu, Student: Chien-Lin Wu (MA220301)

Abstract
Introduction
1-Kinect Sensor
2-Kinect Skeletal Tracking
3-Head-Pose and Facial-Expression Tracking
Conclusion
References

Kinect’s impact has extended far beyond the gaming industry. With its wide availability and low cost, many researchers and practitioners in computer science, electronic engineering, and robotics are leveraging the sensing technology to develop creative new ways to interact with machines and to perform other tasks, from helping children with autism to assisting doctors in operating rooms.

The Kinect sensor incorporates several pieces of advanced sensing hardware. Most notably, it contains a depth sensor, a color camera, and a four-microphone array that provide full-body 3D motion capture, facial recognition, and voice recognition capabilities.

Kinect sensor hardware (figure labels): infrared projector, RGB camera, infrared camera.

The infrared (IR) dots seen by the IR camera. The image on the left shows a close-up of the red boxed area.

Kinect sensor depth image. The sensor produced this depth image from the infrared (IR) dot image.
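The slide does not say how the dot image becomes a depth image. One common way to recover depth from a projected pattern is triangulation, where the horizontal shift (disparity) of a dot between the projector and the camera is inversely proportional to its distance. The Python sketch below illustrates only that relation. The function name, focal length, and baseline are illustrative assumptions, not Kinect specifications or the sensor's actual algorithm.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulate depth in meters from a dot's horizontal shift in pixels.

    Uses the standard pinhole relation Z = f * b / d. Returns None when
    the disparity is not positive, meaning the dot could not be matched.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


# Assumed, illustrative values (not taken from the source).
FOCAL_LENGTH_PX = 580.0  # IR camera focal length, in pixels
BASELINE_M = 0.075       # projector-to-camera distance, in meters

for disparity in (10.0, 20.0, 40.0):
    depth = depth_from_disparity(disparity, FOCAL_LENGTH_PX, BASELINE_M)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.3f} m")
```

Doubling the disparity halves the estimated depth, which is why depth resolution in such a scheme degrades with distance.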

Kinect calibration card. To recalibrate the Kinect sensor, the 3D coordinates of the feature points on the calibration card are determined in the RGB camera's coordinate system and are considered to be the true values.

In skeletal tracking, a human body is represented by a number of joints corresponding to body parts such as the head, neck, shoulders, and arms. Each joint is represented by its 3D coordinates. The goal is to determine all the 3D parameters of these joints in real time to allow fluent interactivity, using only the limited computational resources allocated on the Xbox 360 so as not to impact gaming performance. Rather than trying to determine the body pose directly in this high-dimensional space, Kinect uses per-pixel body-part recognition as an intermediate step.
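As a rough illustration of this representation, a skeleton can be stored as a set of named joints, each holding 3D coordinates. The Python sketch below is a minimal example under that assumption. The `Joint` class, the joint names, and the coordinate values are hypothetical and are not taken from the Kinect SDK.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Joint:
    """One skeleton joint with 3D coordinates in meters (hypothetical type)."""
    name: str
    x: float
    y: float
    z: float


def bone_length(a: Joint, b: Joint) -> float:
    """Euclidean distance between two joints."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Illustrative coordinates only: a person standing about 2 m from the sensor.
skeleton = {
    "head": Joint("head", 0.00, 0.70, 2.00),
    "neck": Joint("neck", 0.00, 0.50, 2.00),
    "left_shoulder": Joint("left_shoulder", -0.20, 0.45, 2.00),
    "right_shoulder": Joint("right_shoulder", 0.20, 0.45, 2.00),
}

print(f"{len(skeleton)} joints -> {3 * len(skeleton)} pose parameters")
print(f"head-neck distance: {bone_length(skeleton['head'], skeleton['neck']):.2f} m")
```

With three coordinates per joint, the number of pose parameters grows quickly with the number of joints, which is the high-dimensional space the text refers to.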

(a) A skeletal representation of various body parts. (b) Kinect uses per-pixel body-part recognition as an intermediate step to avoid a combinatorial search over the different body joints.
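To make the idea of an intermediate per-pixel step concrete, the Python sketch below assumes each pixel has already been labeled with a body part and has a known 3D position. It then proposes one joint per body part by averaging the positions of that part's pixels. The function name and the toy data are hypothetical, and the plain centroid is a simplification chosen for illustration. The slide does not specify how Kinect aggregates labeled pixels into joints.

```python
from collections import defaultdict


def propose_joints(labels, points):
    """Average the 3D points of each labeled body part.

    labels: 2D list of body-part names, with None marking background pixels.
    points: 2D list of (x, y, z) tuples with the same shape as labels.
    Returns a dict mapping each body-part name to its centroid (x, y, z).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for label_row, point_row in zip(labels, points):
        for label, (x, y, z) in zip(label_row, point_row):
            if label is None:
                continue  # skip background pixels
            acc = sums[label]
            acc[0] += x
            acc[1] += y
            acc[2] += z
            acc[3] += 1
    return {
        label: (sx / n, sy / n, sz / n)
        for label, (sx, sy, sz, n) in sums.items()
    }


# Toy 2x2 "image": two head pixels, one background pixel, one neck pixel.
labels = [["head", "head"], [None, "neck"]]
points = [[(0.0, 1.0, 2.0), (2.0, 1.0, 2.0)], [(9.0, 9.0, 9.0), (1.0, 0.0, 2.0)]]
print(propose_joints(labels, points))
```

Because each pixel is handled independently and each body part is aggregated separately, the cost grows with the number of pixels rather than with the number of possible joint combinations.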

The Kinect skeletal tracking pipeline.

Head-pose and facial-expression tracking has been an active research area in computer vision for several decades. It has many applications, including human-computer interaction, performance-driven facial animation, and face recognition. Most previous approaches focus on 2D images, so they must exploit appearance and shape models because there are few distinct facial features. Even so, they might still suffer from lighting and texture variations, occlusion in profile poses, and so forth.

An example of a human face captured by the Kinect sensor: (a) video frame (texture), (b) depth image, and (c) close-up of the facial surface.

Facial expression tracking. These sample images show the results of Kinect tracking 2D feature points in video frames using a projected face mesh overlay.

The Kinect sensor offers an unlimited number of opportunities for old and new applications. This article only gives a taste of what is possible. Thus far, additional research areas include hand-gesture recognition, human-activity recognition, body biometrics estimation (such as weight, gender, or height), 3D surface reconstruction, and healthcare applications.

1. Z. Ren, J. Yuan, and Z. Zhang, "Robust Hand Gesture Recognition Based on Finger-Earth Mover's Distance with a Commodity Depth Camera," Proc. 19th ACM Int’l Conf. Multimedia (ACM MM), ACM Press, 2011, pp. 1093-1096.

2. W. Li, Z. Zhang, and Z. Liu, "Action Recognition Based on A Bag of 3D Points," Proc. IEEE Int’l Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB), IEEE CS Press, 2010, pp. 9-14.

3. C. Velardo and J.-L. Dugelay, "Real Time Extraction of Body Soft Biometric from 3D Videos," Proc. ACM Int’l Conf. Multimedia (ACM MM), ACM Press, 2011, pp. 781-782.

4. S. Izadi et al., "KinectFusion: Real-Time Dynamic 3D Surface Reconstruction and Interaction," Proc. ACM SIGGRAPH, 2011.