EEC-693/793 Applied Computer Vision with Depth Cameras Lecture 8 Wenbing Zhao
EEC-693/793 Applied Computer Vision with Depth Cameras
Lecture 8
Wenbing Zhao ([email protected])
Outline
- Human skeleton tracking
Skeleton Tracking
- Real-Time Human Pose Recognition in Parts from Single Depth Images, by J. Shotton et al. at Microsoft Research Cambridge & Xbox incubation: http://research.microsoft.com/apps/pubs/default.aspx?id=145347
- Real-time human pose recognition is difficult and challenging because of the variation in body poses, sizes, clothing, heights, and so on
- Kinect uses a rendering pipeline that matches the incoming data (raw depth data from the Kinect) against machine-learned sample data
- The machine-learned data is collected from base characters with different types of poses, hair types, and clothing, in different rotations and views
- The machine-learned data is labeled with individual body parts and matched against the incoming depth data to identify which part of the body each pixel belongs to
- The rendering pipeline processes the data in several steps to track human body parts from depth data
The Rendering Pipeline Processes
- From the depth image, we can easily identify the human body object
- In the absence of any other logic, the sensor will not know whether this is a human body or something else
- To start recognizing a human body, we match each individual pixel of the incoming depth data against the data the machine has learned
- The machine-learned data is labeled and has associated values to match against the incoming data
- Matching is based on the probability that the incoming data matches the machine-learned data
The Rendering Pipeline Processes
- The next step is to label the body parts by creating segments
- Kinect uses a trained tree structure (known as a decision tree) to match the data for a specific type of human body
- Eventually, every single pixel's data passes through this tree to be matched with a body part (see the sketch below)
- Once the different body parts are identified, the sensor positions the joint points on the most probable matched data
- With the identified joint points and the movement of those joints, the sensor can track the movement of the complete body
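To make the idea concrete, here is a toy C# sketch of per-pixel classification with a single trained decision tree, loosely in the spirit of Shotton et al. It is illustrative only: the node layout, offsets, and thresholds are hypothetical placeholders, and the real pipeline uses depth-normalized offsets and a forest of many trees rather than one tree.

```csharp
// Toy sketch only: a hypothetical tree node, not the actual Kinect classifier.
class TreeNode
{
    public bool IsLeaf;
    public string BodyPart;        // body-part label stored at a leaf
    public int OffsetU, OffsetV;   // probe offsets for a depth-difference feature
    public float Threshold;
    public TreeNode Left, Right;
}

class PixelClassifier
{
    // Walk one depth pixel down the tree until a leaf labels it with a body part.
    // (Bounds checks are omitted for brevity.)
    static string ClassifyPixel(float[,] depth, int x, int y, TreeNode node)
    {
        while (!node.IsLeaf)
        {
            // Depth-difference feature: compare the depths at two offset probes.
            float feature = depth[y, x + node.OffsetU] - depth[y, x + node.OffsetV];
            node = feature < node.Threshold ? node.Left : node.Right;
        }
        return node.BodyPart; // e.g., "head" or "right hand"
    }
}
```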
The Rendering Pipeline Processes
- The joint positions are measured by three coordinates (x, y, z) (see the sketch below)
- X and Y define the position of the joint; Z represents the distance from the sensor
- To get the proper coordinates, the sensor calculates three views of the same image: the front, left, and top views => these define the 3D body proposal
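As a concrete illustration of these coordinates, the sketch below reads the (x, y, z) values of one joint. It assumes a Skeleton object (here called skeleton) that is already being tracked, as in the code later in this lecture.

```csharp
// Sketch: reading a joint's skeleton-space coordinates (in meters).
// Assumes `skeleton` is a Microsoft.Kinect Skeleton with TrackingState == Tracked.
Joint head = skeleton.Joints[JointType.Head];
SkeletonPoint p = head.Position;
float x = p.X; // horizontal position of the joint
float y = p.Y; // vertical position of the joint
float z = p.Z; // distance from the sensor
```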
Skeleton Tracking
- The Kinect for Windows SDK provides us with a set of APIs that allow easy access to the skeleton joints
- The SDK supports the tracking of up to 20 joint points
- Tracking states: Tracked, Not Tracked, or Position Only (see the sketch below)
- Tracking modes: default and seated
- Default mode: detects the user based on the distance of the subject from the background
- Seated mode: uses movement to detect the user and distinguish him or her from the background, such as a couch or chair
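The tracking state of each skeleton tells us how much data is available. A minimal sketch of reacting to it, using the SDK's SkeletonTrackingState values:

```csharp
// Sketch: branching on a skeleton's tracking state.
switch (skeleton.TrackingState)
{
    case SkeletonTrackingState.Tracked:
        // Full skeleton: all joint positions are available.
        break;
    case SkeletonTrackingState.PositionOnly:
        // Only skeleton.Position (overall location) is valid, not the joints.
        break;
    case SkeletonTrackingState.NotTracked:
        // Nothing useful in this slot.
        break;
}
```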
Skeleton Tracking
- Kinect can fully track up to two users
- It can detect up to six users (four of them with position only), as sketched below
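A minimal sketch of separating the two kinds of detections, assuming the populated Skeleton[] totalSkeleton array used later in this lecture:

```csharp
using System.Linq;

// Count fully tracked vs. position-only users in one skeleton frame.
int fullyTracked = totalSkeleton.Count(s =>
    s.TrackingState == SkeletonTrackingState.Tracked);      // at most 2
int positionOnly = totalSkeleton.Count(s =>
    s.TrackingState == SkeletonTrackingState.PositionOnly); // up to 4 more
```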
Skeleton Tracking
- Seated skeleton: up to 10 joints
- The seated pipeline provides a different segmentation mask than the default pipeline:
  - Continuity of the segmentation mask is not guaranteed outside of the arm, head, and shoulder areas
  - The seated segmentation mask doesn't correspond exactly to the player outline the way the standing (full-body) mask does
- The seated pipeline environment has less data, with more noise and variability, than the standing environment
- The seated mode uses more resources than the default pipeline and yields a lower throughput (in frames per second) on the same scene

```csharp
kinect.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
```
Capturing and Processing Skeleton Data
- Enable the skeleton stream channel with the type of depth image format
- Attach the event handler to the skeleton stream channel
- Process the incoming skeleton frames
- Render a joint on the UI

```csharp
this.sensor = KinectSensor.KinectSensors[0];
this.sensor.SkeletonStream.Enable();
this.sensor.SkeletonFrameReady += skeletonFrameReady;

void skeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e) { }
```
Processing Skeleton Data

```csharp
Skeleton[] totalSkeleton = new Skeleton[6];

void skeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame == null)
        {
            return;
        }
        // Copy the frame's skeleton data into the preallocated array.
        skeletonFrame.CopySkeletonDataTo(totalSkeleton);
        // Select the first fully tracked skeleton, if any.
        Skeleton firstSkeleton = (from trackskeleton in totalSkeleton
                                  where trackskeleton.TrackingState == SkeletonTrackingState.Tracked
                                  select trackskeleton).FirstOrDefault();
        if (firstSkeleton == null)
        {
            return;
        }
        // Only map the joint to the UI when the right hand is actually tracked.
        if (firstSkeleton.Joints[JointType.HandRight].TrackingState == JointTrackingState.Tracked)
        {
            this.MapJointsWithUIElement(firstSkeleton);
        }
    }
}
```
Render the Right-Hand Joint on UI
- We have to map the coordinates from skeleton space to regular image space
Render the Right-Hand Joint on UI
- depthPoint returns the X and Y points corresponding to the skeleton joint point

```csharp
private Point ScalePosition(SkeletonPoint skeletonPoint)
{
    // Map the 3D skeleton-space point into 2D depth-image space.
    DepthImagePoint depthPoint = this.sensor.CoordinateMapper.MapSkeletonPointToDepthPoint(
        skeletonPoint, DepthImageFormat.Resolution640x480Fps30);
    return new Point(depthPoint.X, depthPoint.Y);
}

private void MapJointsWithUIElement(Skeleton skeleton)
{
    // Position the righthand Ellipse at the mapped right-hand joint location.
    Point mappedPoint = ScalePosition(skeleton.Joints[JointType.HandRight].Position);
    Canvas.SetLeft(righthand, mappedPoint.X);
    Canvas.SetTop(righthand, mappedPoint.Y);
}
```
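Note that MapSkeletonPointToDepthPoint returns coordinates in the chosen 640x480 depth-image resolution; if the Canvas has a different size, the returned X and Y would need to be scaled to the Canvas dimensions before positioning the Ellipse.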
Build TrackingHand App
- Create a new C# WPF project named TrackingHand
- Add the Microsoft.Kinect reference
- Design the GUI
- Add the WindowLoaded() method in the xaml file
- Add the code
GUI Design
- Add a Canvas control, then add an Ellipse control in the Canvas
Adding Code
- Add member variables:

```csharp
KinectSensor sensor;
Skeleton[] totalSkeleton = new Skeleton[6];
```

- WindowLoaded() method (WindowClosing() is the same as before; see the sketch below):

```csharp
private void WindowLoaded(object sender, RoutedEventArgs e)
{
    this.sensor = KinectSensor.KinectSensors[0];
    this.sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
    this.sensor.SkeletonStream.Enable();
    this.sensor.SkeletonFrameReady += skeletonFrameReady;
    // Start the sensor.
    this.sensor.Start();
}
```
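For completeness, a minimal sketch of the "same as before" WindowClosing() handler, which simply stops the sensor:

```csharp
private void WindowClosing(object sender, System.ComponentModel.CancelEventArgs e)
{
    // Stop the sensor when the window closes.
    if (this.sensor != null && this.sensor.IsRunning)
    {
        this.sensor.Stop();
    }
}
```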
Adding Code
- Event handler for skeleton frames:

```csharp
void skeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame == null)
        {
            return;
        }
        skeletonFrame.CopySkeletonDataTo(totalSkeleton);
        Skeleton firstSkeleton = (from trackskeleton in totalSkeleton
                                  where trackskeleton.TrackingState == SkeletonTrackingState.Tracked
                                  select trackskeleton).FirstOrDefault();
        if (firstSkeleton == null)
        {
            return;
        }
        if (firstSkeleton.Joints[JointType.HandRight].TrackingState == JointTrackingState.Tracked)
        {
            this.MapJointsWithUIElement(firstSkeleton);
        }
    }
}
```
Adding Code
- For UI display:

```csharp
private void MapJointsWithUIElement(Skeleton skeleton)
{
    Point mappedPoint = ScalePosition(skeleton.Joints[JointType.HandRight].Position);
    Canvas.SetLeft(righthand, mappedPoint.X);
    Canvas.SetTop(righthand, mappedPoint.Y);
    //this.textBox1.Text = "x=" + mappedPoint.X + ", y=" + mappedPoint.Y;
}

private Point ScalePosition(SkeletonPoint skeletonPoint)
{
    DepthImagePoint depthPoint = this.sensor.CoordinateMapper.MapSkeletonPointToDepthPoint(
        skeletonPoint, DepthImageFormat.Resolution640x480Fps30);
    return new Point(depthPoint.X, depthPoint.Y);
}
```
Challenge Task
- For advanced students: modify the project to make it a drawing app (a sketch follows below)
- Show all traces of the hand movement
- Add a button to clear the traces and start a new drawing
- Add a small palette chooser to change the color of the drawing point (an Ellipse)
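One possible starting point, not a complete solution: instead of repositioning the single righthand Ellipse, add a new small Ellipse at every mapped point so the trace persists. The names drawingCanvas, currentColor, and ClearButton_Click below are hypothetical, not part of the original TrackingHand project.

```csharp
// Hypothetical sketch for the drawing app; drawingCanvas and currentColor
// are assumed names, not part of the original project.
private void DrawTrace(Point mappedPoint)
{
    var dot = new Ellipse
    {
        Width = 6,
        Height = 6,
        Fill = new SolidColorBrush(currentColor) // set by the palette chooser
    };
    Canvas.SetLeft(dot, mappedPoint.X);
    Canvas.SetTop(dot, mappedPoint.Y);
    drawingCanvas.Children.Add(dot); // leave the dot behind as part of the trace
}

// Clear-button handler: wipe all traces to start a new drawing.
private void ClearButton_Click(object sender, RoutedEventArgs e)
{
    drawingCanvas.Children.Clear();
}
```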