
Microsoft Kinect Development

Richard Isely III
Software Engineering

University of Wisconsin-Platteville
[email protected]

Abstract

This paper focuses on the creation of the Kinect and the process by which it works. The main idea for the Kinect was developed in 2005-2006 by the founders of the company PrimeSense, who found a cheap way to build a device for motion tracking and depth sensing. Their reference device went on to become the base design for the Kinect. Microsoft started the project in 2007, and it had many issues to overcome, including an unforeseen issue after the release. Developers around the world attempted to create their own drivers and libraries for running the Kinect on platforms other than the Xbox. The success of these developers, as well as the release of libraries by PrimeSense, led to the release of the Microsoft Kinect SDK. This release included all of the libraries that Microsoft layered over the hardware created by PrimeSense and eventually allowed commercial development with the Kinect to begin.

What is the Kinect?

The Kinect is a device designed by Microsoft to use a person’s body, movement, and voice as the controller for a video game. Microsoft’s original intent was for this device to be the future of video games. The device was designed to work with all Xbox 360 consoles, even consoles made before the Kinect’s release, both in video games and on the “dashboard,” or home screen, of the Xbox. Before launch, Microsoft released a commercial showing users controlling in-game avatars with their own actions, video chatting with friends, and controlling all of the Xbox’s media features by voice or with a hand. This was Microsoft’s attempt to challenge the Wii game console on the motion gaming front. The components used in the Kinect to accomplish this have made it a bigger hit than Microsoft could have imagined, though not in the form that was expected. [4]

History of the Kinect


The history of the Kinect starts with the release of the Nintendo Wii in 2006. After that release, Peter Moore, head of the Xbox division at the time, demanded work be done to compete against the Wii. The plan was to have two separate teams come up with an idea for what Moore called a “Wii killer.” One of the teams met with PrimeSense, the company that would eventually work on the Kinect’s depth sensor. However, the project lost some of its momentum when Moore left Microsoft in 2007 to work for EA Sports. [1]

Project Natal

The project got back on track in 2008 when it was given approval to work with the technology developed by PrimeSense. Alex Kipman was put in charge of the project, which was given the code name Project Natal. The goal of the project was to design a device that included motion tracking, voice recognition, facial recognition, and depth recognition. Microsoft has traditionally named large projects after cities, and Kipman chose this code name because Natal is his hometown in Brazil. Kipman and his team were tasked with building a prototype for the device, using a reference device from PrimeSense, to demo for the executives of Microsoft. After the demo on August 18, 2008, Kipman was given a launch date of Christmas 2010 for the device. [1]

Motion Tracking Issue

The project hit a few snags along the way, the biggest being the motion tracking solution. The problems encountered included requiring the user to stand in a T-pose in order for the device to discover them, the device losing the player during certain motions and requiring them to be re-discovered, and the device only working with certain body types (the body types the prototype had been tuned for, namely the executives’). With the help of the Microsoft Research (MSR) department, the team was able to find a workaround for these issues. The idea was to break the player’s depth image, which was already being generated, into distinguishable body parts. The MSR team developed an algorithm to break the single image into 31 distinguishable body parts. From these body parts, the final product is able to identify 48 joints to generate a skeletal image. To get around the initial T-pose for identifying the user, MSR proposed using machine learning to understand the human body. The team used video research of people performing everyday movements around the living room, as well as various active movements. The data was passed through a decision tree that allows the device to take any player depth pixel and assign it to one of the 31 body parts; a rough sketch of this per-pixel classification idea is given below. [1]
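The following C# sketch illustrates the general idea of pushing each depth pixel through a decision tree whose leaves name body parts. The node layout and the depth-difference feature test are assumptions made purely for illustration; the actual features and trained trees MSR used are not described in this paper’s sources.

using System;

// Hypothetical per-pixel body-part classifier (illustrative only, not MSR's code).
enum BodyPart { Head, Neck, LeftShoulder, RightShoulder, /* ... up to 31 labels ... */ Unknown }

class TreeNode
{
    public bool IsLeaf;
    public BodyPart Label;        // meaningful only at a leaf
    public int OffsetX, OffsetY;  // pixel offset used by this node's split test (assumed feature)
    public int Threshold;         // depth-difference threshold for the split, in millimeters
    public TreeNode Left, Right;
}

static class BodyPartClassifier
{
    // Walk one depth pixel down the tree: each node compares the depth at a nearby
    // offset against the depth at the pixel itself and branches on the difference.
    public static BodyPart ClassifyPixel(short[] depthMm, int width, int height,
                                         int x, int y, TreeNode root)
    {
        TreeNode node = root;
        while (!node.IsLeaf)
        {
            int nx = Math.Min(Math.Max(x + node.OffsetX, 0), width - 1);
            int ny = Math.Min(Math.Max(y + node.OffsetY, 0), height - 1);
            int diff = depthMm[ny * width + nx] - depthMm[y * width + x];
            node = diff > node.Threshold ? node.Left : node.Right;
        }
        return node.Label;
    }
}

In the real system a forest of such trees, trained on the motion-capture data described above, would vote on each pixel’s label; this sketch shows only a single tree.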

Microphone Array Issue


The other major problem was voice recognition with the microphone array. The challenge was filtering out background noise, since the device would most likely be closer to the TV’s speakers than to the user making the commands. Another MSR team was brought in to help develop echo cancellation and noise suppression methods that pushed the audio processing well beyond the standard at the time. The last step was to build an acoustical model, based on the variations of American accents and various acoustical properties, into the microphone array. The model was completed at the end of September 2010, and the Kinect was released on November 4, 2010. [1]

The Kinect

The Kinect is a motion sensing device that layers regular two-dimensional video with 3D imaging and depth sensing. The Kinect isn’t the only device of this kind on the market, but the cost difference between the Kinect and other motion sensing devices is significant. One can purchase a Kinect for around $130, where other motion sensing devices and software run upwards of $1,000. Shortly after the Kinect was released by Microsoft, people were working on ways to hack the device to use it with their computers rather than an Xbox. Once this was accomplished, people were able to write motion sensing programs at a much lower cost than ever before. This has led to the creation of many open source SDKs that allow users to create apps and programs utilizing the Kinect with their computer. [4]

Figure1: Internal components of the Kinect


Hardware

When the project idea was first announced at Microsoft, it was agreed that the project would work with the company PrimeSense. PrimeSense is an Israeli company, founded in 2005 by Aviad Maizels, Alexander Shpunt, Ophir Sharon, Tamir Berliner, and Dima Rais, that developed the depth sensing technology used in the Kinect. [6] PrimeSense developed a reference device for Microsoft in early 2008, which consisted of an RGB camera, an infrared sensor, and an infrared light source. Microsoft went on to license the device and the PS1080 chip included with it. The depth sensor PrimeSense developed for Microsoft was unlike any depth sensing device previously created. The main difference was in the way it calculated and identified the depth of objects. [1]

Video Camera

The video camera on the Kinect is an RGB VGA video camera. RGB stands for red, green, and blue, a color model that combines various amounts of red, green, and blue to generate or represent different colors. This kind of camera detects the amount of each color in the image it captures so that the image can be represented digitally. VGA stands for Video Graphics Array and captures images at a standard resolution of 640 x 480 pixels. The Kinect is capable of taking pictures in addition to video capture, and this is the quality of that image capture. The main function of the camera is to help with facial recognition. Microsoft put this ability into the device to allow the Xbox to associate a player with an account on the Xbox; when facial recognition is set up, the system will automatically sign in to the account associated with that user. The video camera captures at a frame rate of 30 fps. [2]
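As a concrete example, the sketch below enables the color stream and copies the raw pixel bytes out of each frame. It assumes the C# API of the Microsoft Kinect SDK v1 (names such as ColorImageFormat.RgbResolution640x480Fps30 come from that SDK), and error handling is omitted.

using System;
using System.Linq;
using Microsoft.Kinect;

class ColorStreamExample
{
    static void Main()
    {
        // Grab the first connected sensor; assumes the Kinect SDK v1 runtime is installed.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null) return;

        // Enable the RGB camera at its standard VGA resolution and 30 fps.
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);

        sensor.ColorFrameReady += (s, e) =>
        {
            using (ColorImageFrame frame = e.OpenColorImageFrame())
            {
                if (frame == null) return;
                byte[] pixels = new byte[frame.PixelDataLength];
                frame.CopyPixelDataTo(pixels);   // 4 bytes per pixel (blue, green, red, unused)
                // pixels[] can now be copied into a bitmap for display or saved as a picture.
            }
        };

        sensor.Start();
        Console.ReadLine();   // keep the process alive while frames arrive
        sensor.Stop();
    }
}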

Depth Sensor

The Kinect has an infrared projector and an infrared sensor. The projector casts a grid across the room in which the Kinect is located. The result of this projection is thousands, if not millions, of little dots spread out across the surfaces of the room. The naked eye is unable to see any of this as it occurs. Using an infrared camera to capture an image of a room in which a Kinect is running produces an image like the one below. [1]


Figure2: Infrared projection on a room

This image shows what the Kinect’s infrared camera is capturing. Each dot represents a beam of infrared light leaving the Kinect and reflecting off a surface. The IR sensor in this device uses a different method than earlier devices. Before PrimeSense created the reference design for Microsoft, depth sensing devices used the time-of-flight method to calculate depth: for each beam generated by the infrared light source, the device would calculate the distance of each object by how long it took for the beam to be reflected back. This is a very expensive form of depth calculation. PrimeSense instead created a device that uses a known pattern in the projected infrared light and measures how the dots shift. Microsoft initializes each Kinect as it is made by placing it exactly 10 feet from a wall; the Kinect stores this data as a baseline and scales later measurements against the initial data collected. [3]
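A heavily simplified sketch of this structured-light idea is shown below. PrimeSense’s actual algorithm is proprietary; the triangulation relation, the variable names, and the assumption of a single calibration plane are all simplifications made for illustration.

// Illustrative structured-light depth estimate (not PrimeSense's algorithm).
// A dot recorded at referenceX pixels in the stored calibration image (taken with the
// sensor a known distance from a flat wall) appears at observedX pixels when a closer or
// farther surface reflects it; the shift (disparity) maps to depth by triangulation.
static class StructuredLightDepth
{
    public static double EstimateDepthMm(double observedX, double referenceX,
                                         double referenceDepthMm,   // e.g. roughly 3048 mm (10 ft)
                                         double baselineMm,         // projector-to-camera distance
                                         double focalLengthPx)      // IR camera focal length in pixels
    {
        // Disparity of the dot relative to where it sat in the calibration image.
        double disparityPx = observedX - referenceX;

        // Plane-relative triangulation: 1/Z = 1/Z_ref + d / (f * b).
        double inverseDepth = 1.0 / referenceDepthMm + disparityPx / (focalLengthPx * baselineMm);
        return 1.0 / inverseDepth;
    }
}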

Data Collecting

Figure2 can also be used to see the range at which the Kinect is able to collect data. The areas of the image that fade from the white dots to darkness mark the edge of the Kinect’s capture range. The Kinect can capture images and data from as close as three feet to as far as eleven feet. PrimeSense has said that the sensor’s range is much larger, but Microsoft has declared this range optimal for gaming. [2]

Why Infrared?

Microsoft decided to use infrared light to handle motion detection and depth sensing for two major reasons. The first is that using infrared light removes any dependence on the amount of light in the room: the infrared camera is able to pick up and identify the infrared projections in both bright and dark conditions. The second is that an RGB image can cause difficulties if objects around the room are the same or similar in color to the user’s clothes. The infrared projector and camera measure distance using a preset pattern of infrared projections. [3] Because of this, every object in a room could be the same color and the depth sensor would still work normally, which takes care of the color issue an RGB camera source would have. Figure3 below shows a depth image generated by the Kinect. The software behind the Kinect is what actually generates these images; more detail on that software is given later in this paper. [1]

Figure3: Depth Image created by Kinect

Multi-array Microphone

The Kinect has four separately placed microphones that make up the multi-array microphone. Three microphones are placed on the bottom left side of the device, and the fourth is placed on the bottom right. The microphone array is capable of canceling out ambient noise, which aids voice commands, and of pinpointing the location of the person talking in the room. The array allows the device to replace the headset that was once used for in-game communication, and it can also detect multiple voices located close to the device. The main purpose of the microphones is to allow users to issue voice commands. [1]
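For reference, the sketch below shows how a program might tap into the array through the Kinect SDK v1’s KinectAudioSource, which exposes the noise suppression, echo cancellation, and sound-source localization described above. The property and event names reflect my understanding of that SDK and should be checked against its documentation.

using System;
using Microsoft.Kinect;

class AudioExample
{
    static void ListenForDirection(KinectSensor sensor)
    {
        // KinectAudioSource wraps the four-microphone array.
        KinectAudioSource audio = sensor.AudioSource;

        // DSP options exposed by the SDK: suppress ambient noise and cancel echo
        // coming back from the TV speakers.
        audio.NoiseSuppression = true;
        audio.EchoCancellationMode = EchoCancellationMode.CancellationOnly;

        // Report the estimated direction of the current speaker as it changes.
        audio.SoundSourceAngleChanged += (s, e) =>
            Console.WriteLine("Speaker at {0:F0} degrees (confidence {1:F2})",
                              e.Angle, e.ConfidenceLevel);

        // Start() returns a 16 kHz, 16-bit mono PCM stream that could feed a speech engine.
        System.IO.Stream audioStream = audio.Start();
    }
}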

Software

The hardware components of the Kinect are layered with software to produce uniform output and data for secondary devices. The microphone array, camera, and depth sensor process and pass on raw data. The software layered over this hardware is what drives the Kinect and its ability to perform so many tasks.

Depth Calculations

The depth image is generated by the IR camera and the PS1080 chip connected to it; the resulting data is then treated as input to the next component. [3] The depth data for a given pixel is passed as a 16-bit number. The lowest three bits hold the index of the player the pixel belongs to. The software can track up to six players in the image, so the player index is a value between one and six, or zero if the pixel doesn’t correspond to any player. Bits 3 through 15 hold the depth at that pixel. Using bit manipulation, the distance of the object at that pixel from the sensor can be calculated; if the depth can’t be determined, the value is zero. The image below shows a sample of the data for a given pixel, and a sketch of the unpacking follows the figure. [1]

Figure4: Sample 16-bit depth data
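As a concrete sketch of the bit manipulation, the following C# splits one raw 16-bit value into its player index and depth. It assumes the short-per-pixel depth format of the Kinect SDK v1 described above; the SDK also exposes named constants for the mask and shift, but literal values are used here for clarity.

// Unpack one 16-bit value from the Kinect depth stream (SDK v1 layout assumed).
// Bits 0-2: player index (1-6, or 0 for "no player"); bits 3-15: depth reading,
// which the SDK reports in millimeters.
static class DepthPixel
{
    public static void Unpack(short rawPixel, out int playerIndex, out int depthMm)
    {
        playerIndex = rawPixel & 0x07;            // lowest three bits
        depthMm     = (rawPixel & 0xFFFF) >> 3;   // remaining thirteen bits
    }
}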

Skeletal Viewing

Microsoft layered software over the depth sensing hardware to produce a skeletal view of the users. The process starts with the depth image produced by the IR camera. Using the depth image along with the large amounts of data gathered during its studies, the software breaks the single-blob depth image into 31 distinguishable body parts (shown in Appendix A, Figure1 and Figure2). From these body parts, the Natal team was able to identify a total of 48 joints. Figure5 shows the 20 joints that the Microsoft SDK exposes; a short example of reading these joints through the SDK follows the figure. During a demo of the Natal project, developer and team lead Kudo Tsunoda claimed that the Kinect is able to identify the joints and motions of the user’s fingers. [2]


Figure5: Skeletal view of Kinect data processing
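The sketch below shows one way to read those joints with the Kinect SDK v1’s C# API: enable the skeleton stream, copy the tracked skeletons out of each frame, and read a joint position. The API names reflect my understanding of SDK v1 and should be verified against its documentation.

using System;
using System.Linq;
using Microsoft.Kinect;

class SkeletonExample
{
    static void TrackHead(KinectSensor sensor)
    {
        // Turn on skeletal tracking; SDK v1 reports 20 joints per fully tracked skeleton.
        sensor.SkeletonStream.Enable();

        sensor.SkeletonFrameReady += (s, e) =>
        {
            using (SkeletonFrame frame = e.OpenSkeletonFrame())
            {
                if (frame == null) return;

                var skeletons = new Skeleton[frame.SkeletonArrayLength];
                frame.CopySkeletonDataTo(skeletons);

                // Up to six skeletons are reported; only "Tracked" ones carry joint data.
                Skeleton player = skeletons.FirstOrDefault(
                    sk => sk.TrackingState == SkeletonTrackingState.Tracked);
                if (player == null) return;

                SkeletonPoint head = player.Joints[JointType.Head].Position;
                Console.WriteLine("Head at ({0:F2}, {1:F2}, {2:F2}) meters",
                                  head.X, head.Y, head.Z);
            }
        };

        sensor.Start();
    }
}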

Kinect USB Drivers & Libraries

Shortly after the release of the Kinect for Xbox, many people were working on ways to utilize the Kinect for application development. The Kinect’s cable could plug directly into a USB port, but the drivers and libraries for it needed to be created. This was done by companies like Microsoft and PrimeSense as well as everyday developers. [3]

Hacking the Kinect

The initial purpose of the Kinect was solely to be a gaming device. Shortly after its release, developers saw the potential of using the Kinect as a development tool: as a rather cheap depth sensing and motion tracking device, it would let developers create motion tracking programs at a low price. Johnny Chung Lee, a developer on the Natal project, wanted there to be a public driver for the Kinect to work over a computer’s USB port. Upset with the lack of work by Microsoft, he contacted Adafruit and the two set up a prize for the first person to develop a driver to read the data being produced by the Kinect. The final bounty was raised to $3,000. [1]

OpenKinect


The solution was cracked seven days after the release of the Kinect. This led to OpenKinect, the first open source library to allow the Kinect to work with Windows, Linux, and Mac. Shortly after the creation of OpenKinect, PrimeSense released its own drivers and libraries for the Kinect and for processing its data. [7] This was a great step for developing with the Kinect; however, the developers ran into many of the same issues the Natal team had dealt with. It took Microsoft six months to release its own SDK. [5]

Microsoft Kinect SDK

The first release of the Microsoft Kinect SDK took place on June 17, 2011. It included the skeletal tracking and voice recognition solutions developed by the MSR teams. This release was under a non-commercial license, so developers were not able to sell any of the programs or solutions they developed using the Kinect and this SDK. The SDK allows developers to create programs across a variety of Microsoft platforms using the Kinect as input. Using Microsoft Visual Studio, one is able to set up the Kinect as an input device for a program and utilize the data being passed by the Kinect. The developer can choose one or all of the inputs the Kinect passes: the video camera, the microphone, or the depth sensor. The latest release of the Microsoft Kinect SDK, along with the Kinect for Windows device, allows developers to use it for commercial deployments. The presentation I will be giving will go more in-depth on how to use the Microsoft SDK to develop a program utilizing the Kinect. [4]

Developing with the Kinect SDK

The first thing to do to begin development with the Kinect SDK is to make sure the Kinect can be powered by the computer via USB. Older Kinects that were released with the Xbox 360 console require a power adapter; newer Kinects purchased directly for programming can be powered over USB. Next, make sure both Microsoft Visual Studio and the Kinect SDK are downloaded and installed. The next step is deciding what language to develop in; the SDK works with C++, C#, and VB. Once a project is created in one of these languages, the Kinect must be set up as a reference. This is done by adding a reference in the Solution Explorer; the component to reference is Microsoft.Kinect. The next step is to add an existing project: the example programs included with the Microsoft Kinect SDK. Installing those samples and referencing them from your project gives you access to them and to all of the Kinect libraries. Figure1 in Appendix B shows the Kinect sensor being listed in autocomplete form. The sensor can then be set up as an object in the program, and from there the object can be used to enable data collection from the sensor (shown in Figure2 of Appendix B); a minimal setup is sketched below. There are a lot of tools that can be used with the Kinect and a lot of ways to get at them. The best way to get familiar with them is to work through some examples and get used to working with the Kinect data. [8]
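Putting those steps together, the following C# sketch finds a connected sensor, enables the three input streams, and starts it. It assumes the Kinect SDK v1 API (KinectSensor, the stream Enable methods, and the format enums are that SDK’s names) and omits error handling.

using System;
using System.Linq;
using Microsoft.Kinect;   // added via Add Reference -> Microsoft.Kinect

class KinectSetup
{
    static void Main()
    {
        // Find the first Kinect that is plugged in, powered, and ready.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null)
        {
            Console.WriteLine("No Kinect connected.");
            return;
        }

        // Choose which inputs the program will consume: color, depth, and skeleton here.
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
        sensor.SkeletonStream.Enable();

        sensor.Start();   // frame-ready events begin firing after this call
        Console.WriteLine("Kinect running. Press Enter to stop.");
        Console.ReadLine();
        sensor.Stop();
    }
}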

Conclusion

The Kinect was a project put together by Microsoft to rival Nintendo’s Wii console. Microsoft wasn’t anticipating the use of the Kinect in application programming, as shown by it being the last to release a driver and libraries for the device. With the Kinect currently being the cheapest device that allows motion detection and depth sensing, along with the newly released commercial version of the device and SDK package, it is uncertain how far Kinect application development will go.

References

[1] Ashley, James and Jarrett Webb. Beginning Kinect Programming with the Microsoft Kinect SDK. Apress, 2012. [eBook].

[2] Hall, Jonathan, Sean Kean, and Phoenix Perry. Meet the Kinect: An Introduction to Programming Natural User Interfaces. Apress, 2011. [eBook]

[3] Borenstein, Greg. Making Things See: 3D vision with Kinect, Processing, Arduino, and MakerBot. Make, 2012. [eBook]

[4] “Kinect for Windows.” Microsoft Support. 10 Mar. 2012. <http://support.xbox.com/en-US/kinect-for-windows/kinect-for-windows-info>

[5] “OpenNI.” PrimeSense. 11 Mar. 2012. <http://75.98.78.94/default.aspx>

[6] “About PrimeSense.” PrimeSense. 11 Mar. 2012. <http://www.primesense.com/en/company-profile>

[7] “OpenKinect: About.” OpenKinect. 17 Mar. 2012. <http://openkinect.org/wiki/Main_Page>

[8] “Kinect for Windows Quickstart Series.” By Dan Fernandez. Channel9. 17 Mar. 2012. <http://channel9.msdn.com/Series/KinectQuickstart>


Appendix A

Figure1: Depth Image of user (Blob)

Figure2: Depth Image of user with associated body parts (31 total)


Appendix B

Figure1 – Kinect Sensor shown in the Visual Studio Library

Figure2 – Kinect Sensor Data Options


Figure3 – Code example for initializing the Kinect sensor for use