Kinect krishna kumar-itkan

Where you are the controller


Kinect presentation Krishna Kumar 8/11/11

Transcript of Kinect krishna kumar-itkan

Page 1: Kinect krishna kumar-itkan

Where you are the controller

Page 2: Kinect krishna kumar-itkan

Krishna Kumar, Sr. Developer Evangelist

Page 3: Kinect krishna kumar-itkan

Started as a $30,000 prototype

Vision: Shift the world from thinking “We need to understand technology” to “Technology needs to understand us”

Page 4: Kinect krishna kumar-itkan

Option A:

Why Kinect?

Page 5: Kinect krishna kumar-itkan

Why Kinect?

Option You:

Page 6: Kinect krishna kumar-itkan

What is Kinect?

Page 7: Kinect krishna kumar-itkan

What is Kinect?

An extraordinary new way to play, where you are the controller

Voice Recognition

Face Recognition

You Recognition

Gesture Recognition

Page 8: Kinect krishna kumar-itkan

“Xbox?!”

Kinect knows what to do!

“Let’s Play!”

Page 9: Kinect krishna kumar-itkan

“What are those things?”

③②

Page 10: Kinect krishna kumar-itkan

“What are those things?”

3D Depth Sensors (① and ③)

Page 11: Kinect krishna kumar-itkan

Projected Invisible IR pattern


Page 12: Kinect krishna kumar-itkan

Depth Computation

Page 13: Kinect krishna kumar-itkan

Depth Map

Page 14: Kinect krishna kumar-itkan

“What are those things?”

RGB Camera (②)

Page 15: Kinect krishna kumar-itkan

“What are those things?”

Multi-array Microphone

Page 16: Kinect krishna kumar-itkan

“What are those things?”

Motorized Tilt

Page 17: Kinect krishna kumar-itkan

Combination of RGB camera, depth sensor and multi-array microphone
• RGB camera delivers three basic color components
• Depth sensor “sees” the room in 3-D
• Microphone locates voices by sound and extracts ambient noise

Software makes all the magic possible
• Skeletal Tracking
• Face, Gesture Recognition
• Audio Echo cancellation
• Audio Beam Forming
• Speech Recognition

Page 18: Kinect krishna kumar-itkan
Page 19: Kinect krishna kumar-itkan

© 2010 Microsoft Corporation. All rights reserved.

Scope of Microsoft Research

• Significant Investment
– Investing > $9B in R&D (MSR & product dev)
• Staff of over 850 in 55 research areas
• International Research lab locations:
– Redmond, Washington (Sept, 1991)
– San Francisco, California (1995)
– Cambridge, United Kingdom (July, 1997)
– Beijing, People’s Republic of China (Nov, 1998)
– Mountain View, California (July, 2001)
– Bangalore, India (January, 2005)
– Cambridge, Massachusetts (February, 2008)

Turning ideas into reality.

research.microsoft.com

Page 20: Kinect krishna kumar-itkan


Scope of Microsoft Research: Research Areas


Page 21: Kinect krishna kumar-itkan

“Xbox?!” “Let’s Play!”

How does Kinect know what I do?

Page 22: Kinect krishna kumar-itkan

J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. European Conference on Computer Vision, 2006

Microsoft Research: Object Recognition

Page 23: Kinect krishna kumar-itkan

Microsoft Research: Human Body Tracking
• Wide range of motion
• But limited agility
• And not real-time
• Infinite number of movements

R. Navaratnam, A. Fitzgibbon, R. Cipolla, The Joint Manifold Model for Semi-supervised Multi-valued Regression. IEEE Intl Conf on Computer Vision, 2007

Page 24: Kinect krishna kumar-itkan

Xbox calls MSR: September 2008
“We need a body tracker with
• All body motions…
• All agilities…
• 10x Real-time…
• For multiple players…
• … and it has to be 3D”

MSR’s response?

Page 25: Kinect krishna kumar-itkan

Teach the Computer/Machine Learning
Step 1: Collect A LOT of Data

Teams visit households across the globe, filming real users

Hollywood motion capture studio generates billions of CG images

Page 26: Kinect krishna kumar-itkan

Training Data

Page 27: Kinect krishna kumar-itkan

Training

• Millions of training images -> millions of classifier parameters
• Very far from “embarrassingly parallel”
• New algorithm for distributed decision-tree training
• Major use of DryadLINQ
– available for download

Distributed Data-Parallel Computing Using a High-Level Programming Language. M. Isard, Y. Yu. International Conference on Management of Data (SIGMOD), July 2009

Page 28: Kinect krishna kumar-itkan

t=1 t=2 t=3

Recognize Joint Angles
• Classify each pixel’s probability of being each of 32 body parts
• Determine probabilistic cluster of body configurations consistent with those parts
• Present the most probable to the user
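The three steps above can be illustrated with a toy Python sketch: label each pixel with its most probable body part, then propose one position per part as the probability-weighted centroid of its pixels. This is only a sketch under stated assumptions. The slides say the real system learns the per-pixel probabilities with decision trees over 32 parts; the three part names, the `propose_joints` helper, and the centroid step here are hypothetical simplifications, not the shipped algorithm.

```python
from collections import defaultdict

# Hypothetical part names; the slides mention 32 body parts.
PARTS = ["head", "left_hand", "right_hand"]

def most_probable_part(probabilities):
    """Return (part_index, probability) for one pixel's distribution."""
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index, probabilities[index]

def propose_joints(pixels):
    """pixels: list of (x, y, probabilities). Returns {part_name: (x, y)}."""
    # Per part: [weighted x sum, weighted y sum, total weight]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for x, y, probabilities in pixels:
        part, weight = most_probable_part(probabilities)
        acc = sums[part]
        acc[0] += weight * x
        acc[1] += weight * y
        acc[2] += weight
    return {PARTS[p]: (sx / w, sy / w) for p, (sx, sy, w) in sums.items()}

# Three example pixels: two most likely "head", one most likely "left_hand".
pixels = [
    (10, 2, [0.8, 0.1, 0.1]),
    (12, 2, [0.8, 0.1, 0.1]),
    (3, 9, [0.1, 0.7, 0.2]),
]
joints = propose_joints(pixels)
```

With this input the head proposal lands midway between the two head pixels, and no proposal is produced for the right hand because no pixel voted for it.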

Page 29: Kinect krishna kumar-itkan

Programmer’s View

Page 30: Kinect krishna kumar-itkan

Programmer’s View

Page 31: Kinect krishna kumar-itkan

A Platform is Born

Page 32: Kinect krishna kumar-itkan

Consumer Technologies Push The Envelope


Price: $6000

Price: $150

Page 33: Kinect krishna kumar-itkan

Play Space

Field of View and Operational Area

• Play Space: Ideally need 12ft x 12ft of play space though you can make do with 10ft x 10ft

• Player Position: Ideally is 6-10 feet away from camera

Page 34: Kinect krishna kumar-itkan

Lighting and Environment

• Fluorescent or LED lighting is recommended
• No direct light on player
• No direct light into sensor lens
• In a stage environment, all lights need to be infrared-filtered
• To avoid lighting noise, do not intersect sensor lens fields of view
• Avoid playing in/next to reflective surfaces

Page 35: Kinect krishna kumar-itkan

Clothing Considerations

• Avoid anything that conceals your arms or legs

• Avoid wearing flowing clothing such as scarves or long dresses and skirts
– Long skirts hide the legs and scarves are often mistaken for arms
• Avoid baggy jackets or overly baggy clothing
• Generally, anything that hides the human form should be removed for optimal game play
• If players with long hair are having difficulty playing, encourage them to pull their hair back and try playing again

Page 36: Kinect krishna kumar-itkan

Kinect with more than just games
Use your voice or a wave of your hand to:
• Video Kinect with others*
• Manage your media gallery
• Music with Last.fm*
• HD movies with Zune
• Get in the game with ESPN*

* with Xbox LIVE Gold membership

Page 37: Kinect krishna kumar-itkan

XBOX LIVE
More Ways to Connect with Family and Friends

VIDEO KINECT
• Connect with family and far away friends, all from the comfort of your living room with Xbox LIVE Video Chat
• Experience the ease and convenience of chat on the big screen with Kinect-enabled auto camera zoom and pan.

FAMILY CENTER
• Family Center makes it easy to manage multiple user accounts and edit privacy settings from a single location
• Ensure safe, secure fun for the whole family

SOCIAL NETWORKS
• Connect with friends, share photos and updates through Facebook and Twitter

Page 38: Kinect krishna kumar-itkan
Page 39: Kinect krishna kumar-itkan
Page 40: Kinect krishna kumar-itkan
Page 41: Kinect krishna kumar-itkan
Page 42: Kinect krishna kumar-itkan
Page 43: Kinect krishna kumar-itkan

ESPN: Home-field advantage in your living room
• Access over 3,500 live global events from ESPN3.com, including out-of-market programming plus fresh video clips from ESPN.com
• Enjoy features like HD programming and on-demand viewing, participate in polls, predictions and trivia.
• See what the Xbox LIVE community is watching and declare what team you’re rooting for
• With Kinect™ control the action right from your couch with just your voice or the wave of your hand
• Featured Content: NCAA Football, NCAA Basketball, College Bowl Games, NBA, MLB, Soccer, Golf and Tennis majors

Page 44: Kinect krishna kumar-itkan
Page 45: Kinect krishna kumar-itkan
Page 46: Kinect krishna kumar-itkan
Page 47: Kinect krishna kumar-itkan
Page 48: Kinect krishna kumar-itkan

Where can Kinect go?

• Air Guitar Hero?
• Shopping in 3D?
• Remote Replacement?
• Dance Instructor?
• Education?
• Personal Trainer?
• Physical Therapy?

“Xbox?”

Page 49: Kinect krishna kumar-itkan
Page 50: Kinect krishna kumar-itkan
Page 51: Kinect krishna kumar-itkan

The Kinect SDK

• Provides both Unmanaged and Managed API
– Unmanaged API – Concepts work in C++
– Managed API – Concepts work in both VB/C#
• Samples & documentation to get you started
• Assumes some programming experience
• http://research.microsoft.com/kinectsdk/

Page 52: Kinect krishna kumar-itkan

The Kinect Sensor

A hybrid device containing the following input devices:
• A color (RGB) camera
• A depth sensor
• A microphone array
• A tilt sensor

Play space control is done through a tilt motor
• Pitch +/- 27 degrees
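Because the tilt motor only covers +/- 27 degrees of pitch, code that computes a target angle may want to clamp it before using it. A minimal Python sketch, assuming a hypothetical helper named `clamp_tilt` (this is not an SDK call):

```python
MAX_TILT_DEGREES = 27  # pitch limit quoted on the slide: +/- 27 degrees

def clamp_tilt(requested_degrees):
    """Clamp a requested pitch angle to the +/- 27 degree range."""
    return max(-MAX_TILT_DEGREES, min(MAX_TILT_DEGREES, requested_degrees))
```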

Page 53: Kinect krishna kumar-itkan

RGB CAMERA

MULTI-ARRAY MIC MOTORIZED TILT

3D DEPTH SENSORS

Page 54: Kinect krishna kumar-itkan

Kinect USB cable

Page 55: Kinect krishna kumar-itkan

The Innards


Page 56: Kinect krishna kumar-itkan

The Vision System

IR laser projector

IR camera

RGB camera

Page 57: Kinect krishna kumar-itkan

Kinect video output
30 Hz frame rate; 57-degree field of view

8-bit VGA RGB, 640 x 480

12-bit monochrome, 320 x 240


Page 58: Kinect krishna kumar-itkan

The Audio System

Page 59: Kinect krishna kumar-itkan

Input Stream (What the mic array hears)

Post-MEC (What APIs present)

MEC

Demo: Multichannel Echo Cancellation

Page 60: Kinect krishna kumar-itkan

The Kinect SDK

Provides access to:
• RGB feed
• Depth feed
• Skeletal Tracking capabilities
• Audio Beam data
• Speech Recognition

Page 61: Kinect krishna kumar-itkan

Data Streams
• Color stream at 640 x 480 resolution; 32BPP
• Depth stream at 320 x 240 resolution; 16BPP
• Skeletal Joint positions
• Frame #s, TimeStamps, Tilt sensor data
• Echo-canceled audio
• Higher level systems
– Speech recognition
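To get a feel for how much raw image data these streams carry, the arithmetic can be sketched in Python. The resolutions and bits per pixel come from the list above. Applying the 30 Hz frame rate quoted for the video output to both streams is an assumption, and the helper names are hypothetical.

```python
FRAME_RATE_HZ = 30  # assumed for both streams, from the video output slide

def bytes_per_frame(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8

def bytes_per_second(width, height, bits_per_pixel, frame_rate=FRAME_RATE_HZ):
    """Uncompressed data rate in bytes per second."""
    return bytes_per_frame(width, height, bits_per_pixel) * frame_rate

color_frame = bytes_per_frame(640, 480, 32)   # 1,228,800 bytes
depth_frame = bytes_per_frame(320, 240, 16)   # 153,600 bytes
color_rate = bytes_per_second(640, 480, 32)   # 36,864,000 bytes per second
depth_rate = bytes_per_second(320, 240, 16)   # 4,608,000 bytes per second
```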

Page 62: Kinect krishna kumar-itkan

RGB Camera Fundamentals

Page 63: Kinect krishna kumar-itkan

Camera Data

Page 64: Kinect krishna kumar-itkan

RGB Stream Format
• Up to 640 x 480 resolution
• Up to 32 bits per pixel
• Data contained in ImageFrame.Image.Bits
• Array of bytes: public byte[] Bits;
• Array
– Starts at top left of image
– Moves left to right, then top to bottom

Page 65: Kinect krishna kumar-itkan

Stride

Stride - # of bytes from one row of pixels in memory to the next
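Given the row-major layout and the stride definition, the byte offset of any pixel follows directly: skip `y` whole rows, then `x` pixels within the row. A small Python sketch of that indexing, assuming hypothetical helper names and a made-up 2 x 2 test image; it does not assume any particular channel order within a pixel.

```python
def pixel_offset(x, y, stride, bytes_per_pixel):
    """Byte offset of pixel (x, y) in a top-left-origin, row-major buffer."""
    return y * stride + x * bytes_per_pixel

def read_pixel(bits, x, y, stride, bytes_per_pixel=4):
    """Return the raw bytes of pixel (x, y) as a tuple."""
    start = pixel_offset(x, y, stride, bytes_per_pixel)
    return tuple(bits[start:start + bytes_per_pixel])

# A tiny 2 x 2 image at 4 bytes per pixel, with 8 bytes of row padding,
# so the stride (16) is larger than width * bytes_per_pixel (8).
width, height, bpp, stride = 2, 2, 4, 16
bits = bytearray(stride * height)
start = pixel_offset(1, 1, stride, bpp)
bits[start:start + bpp] = bytes([10, 20, 30, 255])
pixel = read_pixel(bits, 1, 1, stride)
```

Using the stride rather than `width * bytes_per_pixel` keeps the lookup correct when rows are padded.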

Page 66: Kinect krishna kumar-itkan

Demos::RGB Camera

Page 67: Kinect krishna kumar-itkan

Depth Camera Fundamentals

Page 68: Kinect krishna kumar-itkan

Camera Data

Page 69: Kinect krishna kumar-itkan

Depth Map Format
• 320 x 240 resolution
• 16 bits per pixel
– Upper 13 bits: depth in mm: 800 mm to 4000 mm range
– Lower 3 bits: segmentation mask
• Depth value 0 means unknown
– Shadows, low reflectivity, and high reflectivity among the few reasons
• Segmentation index
– 0 – no player
– 1 – skeleton 0
– 2 – skeleton 1
– …
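The 13-bit/3-bit packing described above can be sketched in Python as a shift and a mask on each 16-bit pixel value. The function names are hypothetical, and `pack_depth_pixel` exists only to build test values.

```python
PLAYER_BITS = 3
PLAYER_MASK = (1 << PLAYER_BITS) - 1  # 0b111, the lower 3 bits

def unpack_depth_pixel(value):
    """Split a 16-bit depth pixel into (depth_mm, segmentation_index)."""
    return value >> PLAYER_BITS, value & PLAYER_MASK

def pack_depth_pixel(depth_mm, segmentation_index):
    """Inverse of unpack_depth_pixel, used here to make sample data."""
    return (depth_mm << PLAYER_BITS) | (segmentation_index & PLAYER_MASK)

def describe(value):
    """Human-readable summary, treating depth 0 as unknown."""
    depth_mm, index = unpack_depth_pixel(value)
    who = "no player" if index == 0 else "skeleton %d" % (index - 1)
    if depth_mm == 0:
        return "unknown depth, " + who
    return "%d mm, %s" % (depth_mm, who)

sample = pack_depth_pixel(1500, 2)
```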

Page 70: Kinect krishna kumar-itkan

Depth Byte Buffer

• ImageFrame.Image.Bits
• Array of bytes: public byte[] Bits;
• Array
– Starts at top left of image
– Moves left to right, then top to bottom
– Represents distance for pixel

Page 71: Kinect krishna kumar-itkan

Calculating Distance
• 2 bytes per pixel (16 bits)
• Depth – Distance per pixel
– Bitshift second byte by 8
– Distance (0,0) = (int)(Bits[0] | Bits[1] << 8);
• DepthAndPlayerIndex – Includes Player index
– Bitshift by 3 first byte (player index), 5 second byte
– Distance (0,0) = (int)(Bits[0] >> 3 | Bits[1] << 5);
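The two formulas above read pixel (0,0). A Python sketch of the same byte arithmetic for an arbitrary pixel (x, y) might look as follows. It assumes rows are stored without padding (two bytes per pixel, `width` pixels per row) and that the low byte comes first, as the formulas imply. `depth_at` and `player_at` are hypothetical names.

```python
import struct

def depth_at(bits, x, y, width, has_player_index):
    """Distance in mm for pixel (x, y) from a 2-bytes-per-pixel buffer."""
    i = (y * width + x) * 2            # two bytes per pixel, row-major
    low, high = bits[i], bits[i + 1]   # low byte is stored first
    if has_player_index:
        # Drop the 3 player bits from the low byte, then append the high byte.
        return (low >> 3) | (high << 5)
    return low | (high << 8)

def player_at(bits, x, y, width):
    """Player index stored in the lowest 3 bits of the first byte."""
    return bits[(y * width + x) * 2] & 0b111

# A 2 x 1 frame: pixel (1, 0) is 1500 mm away and belongs to player index 2.
with_player = struct.pack("<HH", 0, (1500 << 3) | 2)
plain = struct.pack("<HH", 0, 1500)
```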

Page 72: Kinect krishna kumar-itkan

Demos::Depth Camera

Page 73: Kinect krishna kumar-itkan

Skeletal Tracking Fundamentals

Page 74: Kinect krishna kumar-itkan

Human Depth Sensing
Object pattern similarity determines disparity

Page 75: Kinect krishna kumar-itkan

Kinect Depth Sensing
IR pattern similarity determines disparity

IR Projector

IR Camera
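The idea that pattern similarity determines disparity can be sketched in one dimension: take a small window from the observed IR row and slide it along the known projected pattern until the two match best. The Python below uses sum of absolute differences (SAD) as the similarity measure, which is a swapped-in textbook choice. The slides do not describe the device's actual matching or how disparity is converted to depth, and all names and sample values here are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(p - q) for p, q in zip(a, b))

def find_disparity(reference, observed, x, window, max_shift):
    """Shift (in pixels) at which the observed window best matches the pattern."""
    patch = observed[x:x + window]
    best_shift, best_cost = 0, None
    for shift in range(max_shift + 1):
        start = x - shift
        if start < 0:
            break  # window would fall off the left edge of the pattern
        cost = sad(patch, reference[start:start + window])
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift

# Made-up pattern row, and the same row seen shifted right by 3 pixels.
reference = [0, 9, 1, 7, 3, 8, 2, 6, 0, 5, 4, 9]
shift = 3
observed = [0] * shift + reference[:-shift]
disparity = find_disparity(reference, observed, x=5, window=4, max_shift=5)
```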

Page 76: Kinect krishna kumar-itkan

Provided Data

Page 77: Kinect krishna kumar-itkan

Pipeline Architecture

Title Space

Page 78: Kinect krishna kumar-itkan

Skeleton API

Page 79: Kinect krishna kumar-itkan

Joints
• Maximum two players tracked at once
– Six player proposals
• Each player with set of <x, y, z> joints in meters
• Each joint has associated state
– Tracked, Not tracked, or Inferred
• Inferred - Occluded, clipped, or low confidence joints
• Not Tracked - Rare, but your code must check for this state
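The requirement to check each joint's state can be sketched in Python as a filter that drops untracked joints and optionally drops inferred ones. The `JointState` enum, the dictionary layout, and the joint names are hypothetical stand-ins rather than the SDK's own types.

```python
from enum import Enum

class JointState(Enum):
    TRACKED = "Tracked"
    INFERRED = "Inferred"
    NOT_TRACKED = "Not tracked"

def usable_joints(joints, allow_inferred=True):
    """joints: {name: (state, (x, y, z) in meters)}. Drops untracked joints."""
    accepted = {JointState.TRACKED}
    if allow_inferred:
        accepted.add(JointState.INFERRED)
    return {name: pos for name, (state, pos) in joints.items() if state in accepted}

skeleton = {
    "head": (JointState.TRACKED, (0.1, 0.6, 2.0)),
    "left_hand": (JointState.INFERRED, (-0.3, 0.1, 1.9)),
    "right_foot": (JointState.NOT_TRACKED, (0.0, 0.0, 0.0)),
}
```

Calling `usable_joints(skeleton, allow_inferred=False)` keeps only fully tracked joints, which may suit code that cannot tolerate low-confidence positions.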

Page 80: Kinect krishna kumar-itkan

Provided Data
Depth and segmentation map

Page 81: Kinect krishna kumar-itkan

Depth Map Format
• 320 x 240 resolution
• 16 bits per pixel
– Upper 13 bits: depth in mm: 800 mm to 4000 mm range
– Lower 3 bits: segmentation mask
• Depth value 0 means unknown
– Shadows, low reflectivity, and high reflectivity among the few reasons
• Segmentation index
– 0 – no player
– 1 – skeleton 0
– 2 – skeleton 1
– …

Page 82: Kinect krishna kumar-itkan

Demos::Skeletal Tracking

Page 83: Kinect krishna kumar-itkan

Audio Fundamentals

Page 84: Kinect krishna kumar-itkan

Going Inside the Kinect
• Four-microphone array with hardware-based audio processing
– Multichannel echo cancellation (MEC)
– Sound position tracking
– Other digital signal processing (noise suppression and reduction)

Page 85: Kinect krishna kumar-itkan

Audio Data

Page 86: Kinect krishna kumar-itkan

Speech Recognition

• Grammar – What we are listening for
– Code – GrammarBuilder, Choices
– Speech Recognition Grammar Specification (SRGS)
– C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars\

Note: Set AutomaticGainControl = false

Page 87: Kinect krishna kumar-itkan

Grammar

<!-- Confirmation_YesNo._value: string ["Yes", "No"] -->
<rule id="Confirmation_YesNo" scope="public">
  <example> yes </example>
  <example> no </example>
  <one-of>
    <item> <ruleref uri="#Confirmation_Yes" /> </item>
    <item> <ruleref uri="#Confirmation_No" /> </item>
  </one-of>
  <tag> out = rules.latest() </tag>
</rule>

<!-- Confirmation_Yes._value: string ["Yes"] -->
<rule id="Confirmation_Yes" scope="public">
  <example> yes </example>
  <example> yes please </example>
  <one-of>
    <item> yes </item>
    <item> yeah </item>
    <item> yep </item>
    <item> ok </item>
  </one-of>
  <item repeat="0-1"> please </item>
  <tag> out._value = "Yes";</tag>
</rule>
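To make the Confirmation_Yes rule concrete, here is a plain Python sketch of what it accepts: one of four affirmative words, optionally followed by "please", with "Yes" as the resulting value. It only illustrates the rule's logic and does not use the Speech Platform. The function name is hypothetical, and Confirmation_No is left out because its definition is not shown.

```python
YES_WORDS = {"yes", "yeah", "yep", "ok"}  # the <one-of> items in the rule

def match_confirmation_yes(utterance):
    """Return "Yes" if the utterance fits the Confirmation_Yes rule, else None."""
    words = utterance.lower().split()
    if words and words[-1] == "please":   # <item repeat="0-1"> please </item>
        words = words[:-1]
    if len(words) == 1 and words[0] in YES_WORDS:
        return "Yes"
    return None
```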

Page 88: Kinect krishna kumar-itkan

Demos::Audio