MATLAB Based Interactive Music Player using XBOX Kinect
EN.600.461 Final Project
Gowtham G. Piyush R. Ashish K.
(ggarime1, proutra1, akumar34)@jhu.edu
Johns Hopkins University, Baltimore, USA
1. Abstract
The launch of the XBOX Kinect opened exciting new avenues for 3D perception thanks to its
easy-to-use, out-of-the-box depth and color video, and its range of applications keeps widening
across software and hardware. Music players are part of everyday life, and although many ways
of operating them exist, easier ones are always desirable. In this project, we create a
gesture-based 3D user interface for playing audio from a MATLAB-based graphical user interface.
The environment is assumed to be a background containing multiple marker objects, and detecting
a hand over them triggers processing of the data. When a gesture is detected over a particular
area in the foreground, the corresponding functionality in the GUI is activated. The music
player responds promptly to the gestures on which the system was trained. The setup was tested
on a set of saved images as well as in real time on images from a Kinect sensor. The results
varied across operating systems, as discussed later, but were satisfactory.
2. Aims of the Project
The aims we identified before starting the project were:
2.1 Choice of environment for camera view. We assumed a top-down camera view so as to
emphasize the capabilities of the Kinect sensor over general cameras, which provide only a 2D
image of the objects. However, the Kinect has its own limitations and does not give usable
images at very close range. To be detected properly, an object has to be at least 0.5 m away
from the camera sensors [1].
2.2 Identification of marker objects in real world. We preferred to have physical objects
that correspond to buttons in the music player. Having predefined marker objects makes it
easier for users other than the programmer to operate the music player. These objects should
be detected when the setup starts.
2.3 Background subtraction and filtering of noise. One aim of the project is to identify
dynamic objects and remove the background, i.e. the static objects. This reduces the clutter in
the image and lets us focus on the objects of interest, such as hand gestures, which are
dynamic.
2.4 Gesture recognition. Detecting and differentiating between gestures would reduce the
need for more marker objects. It would also make fuller use of the Kinect sensor.
2.5 Music Player development. Because the project is inclined towards the computer vision
part, a music player that is less complex than the commercially available ones is better suited
for testing. However, we wanted the graphical user interface to be comprehensive enough to
offer the primary functionalities of ‘Play’, ‘Pause’, ‘Stop’, ‘Volume control’ and
‘Scroll track’.
3. Approaches to tackle the problems
3.1 Choice of environment – From prior research on the operation of the Kinect [1][7] and our
experience of working with it [10], we found that the Kinect sensor does not give usable
images for objects placed within 0.5 m of it. We therefore decided on an operation space of at
least 1.5 meters, so that the operator's palm can move freely. As discussed earlier, a top-down
view is assumed, i.e. the Kinect is placed so that it views a floor or table-top vertically
below it. Even though it should not affect the functionality of our code, a clear background
free from stray items other than the marker objects is preferred.
Figure 1. A screenshot of the image when the hand goes out of bounds of the Kinect sensor
3.2 Identification of marker objects – Marker objects are regions in the image view that
demarcate the various functionalities. Specific objects are associated with separate buttons on
the GUI. These marker objects can be pre-placed in the background or introduced dynamically
into the image frame. We first tried placing the marker objects (symbols) in a predefined order
and detecting their edges during pre-processing [3], then using the ‘regionprops’ command in
MATLAB to find the centroids of the marker objects. However, this approach was not scale or
rotation invariant. Another major disadvantage would have been giving up dynamic detection of
the marker objects. Hence we decided to extract SIFT features [2] and match the objects to
previously stored images of the object(s). To get more key-points, we designed the markers with
Roman letters in the ‘Algerian’ font. The SIFT matching technique is rotation and scale
invariant, so there were few outliers when matching. In almost all cases, the program showed
correct corresponding matches. After detection of the markers, template boundaries were
calculated so as to demarcate the functionalities of the objects.
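The matching and boundary steps can be illustrated with a small Python sketch. It is not the project's MATLAB code: it assumes that descriptors have already been extracted, uses toy 2-D vectors in place of 128-dimensional SIFT descriptors, and applies the nearest-neighbour ratio test described in [2]. The names `match_descriptors` and `bounding_box`, the 0.8 ratio and the keypoint positions are all hypothetical.

```python
import math

def match_descriptors(query, stored, ratio=0.8):
    """Return (query_index, stored_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        # Distance from this query descriptor to every stored descriptor.
        dists = sorted((math.dist(q, s), si) for si, s in enumerate(stored))
        if len(dists) < 2:
            continue
        (best, best_i), (second, _) = dists[0], dists[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_i))
    return matches

def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around matched keypoints."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

# Toy data: the third query descriptor is ambiguous and should be rejected.
stored = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
query = [(0.5, 0.2), (9.8, 0.1), (5.0, 5.0)]
keypoints = [(120, 80), (180, 85), (150, 140)]  # hypothetical pixel positions

matches = match_descriptors(query, stored)
box = bounding_box([keypoints[qi] for qi, _ in matches])
```

In this sketch the box around the accepted keypoints plays the role of the template boundary for one marker.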
Figures 2, 3, 4 and 5 show SIFT feature matching of the ‘Marker Objects’ with the corresponding
stored images; Figure 6 shows the respective bounding boxes.
3.3 Background subtraction and filtering of noise – To detect the hand we initially processed
the color images. For the first trial, we assumed a background image and updated it by taking
the mean of the previous five frames [3]. Subtracting it from the current frame gives the
position of the hand. While this worked properly when the hand was entering the template
frames, there was considerable delay when the hand was pulled out of the frame. Changing
lighting conditions also affected this method drastically: the Kinect updates its white balance
at certain time intervals, which changes the background data. The depth image, however, is
generally unaffected by changes in background light and was found to contain considerably less
noise, so we used it for processing. The depth image was processed by taking the difference
between the current frame and a reference frame chosen when no hand was over the marker object
templates. The noise in the resulting image was reduced further by Gaussian filtering followed
by a morphological opening, which removes most of the salt-and-pepper noise. An amusing error
occurred when a ring worn by one of the users reflected the IR rays of the Kinect sensor and
produced noise. Shadows of the hand in the color image also affected the results. Such errors
were reduced to some extent by the ‘imopen’ function.
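As an illustration of the depth-based approach, the following Python sketch differences a frame against a reference frame, thresholds the result and applies a 3x3 binary opening (erosion followed by dilation). It is a simplified stand-in for the MATLAB processing, not the project's code: the Gaussian filtering step is omitted, and the 50-unit threshold, the depth values and all function names are assumptions chosen for the example.

```python
def foreground_mask(frame, reference, threshold):
    """Mark pixels whose depth differs from the reference by more than threshold."""
    return [[1 if abs(f - r) > threshold else 0 for f, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, reference)]

def _morph(mask, combine):
    """Apply min (erosion) or max (dilation) over each 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # The window is clipped at the image border.
            window = [mask[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = combine(window)
    return out

def opening(mask):
    """Erosion followed by dilation; removes specks smaller than the 3x3 window."""
    return _morph(_morph(mask, min), max)

# Toy scene: flat background at depth 1000, a 3x3 "hand" at 800, one noisy pixel.
reference = [[1000] * 6 for _ in range(6)]
frame = [row[:] for row in reference]
for y in range(1, 4):
    for x in range(1, 4):
        frame[y][x] = 800
frame[5][5] = 900

mask = foreground_mask(frame, reference, threshold=50)
opened = opening(mask)
```

The isolated pixel survives thresholding but not the opening, while the 3x3 hand region is restored in full.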
3.4 Gesture Recognition – A number of methods for gesture recognition are reported in the
literature:
Template matching using expectation maximization [5]
Mean-shift or simple connected components
Machine learning
Machine learning allows for easier gesture recognition and provides a robust classifier at low
computational cost, which is particularly useful for real-time systems. This project uses a
simple logistic regression classifier to recognize the gestures. The classifier currently
recognizes three types of gestures (see Fig 7): NO HAND, HAND TYPE 1 and HAND TYPE 2. These
gestures are used to control the music player in various ways. The classifier takes in a
filtered and cropped region of interest (ROI) image after background subtraction. The image is
resized to reduce the number of features used for training, which lowers the risk of
over-fitting the training data. The resized image is unrolled into a 1 x 1600 feature vector,
each element of which is a pixel of the image.
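A minimal Python sketch of this feature extraction follows. It assumes the resized image is 40 x 40 pixels (the report only gives the vector length, 1600 = 40 x 40), uses nearest-neighbour resizing, and unrolls the image row by row; MATLAB would unroll column by column by default, which does not matter as long as training and prediction use the same order. The function names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def to_feature_vector(roi, side=40):
    """Resize the ROI to side x side and unroll it into one flat list."""
    small = resize_nearest(roi, side, side)
    return [pixel for row in small for pixel in row]

# Toy ROI of 120 rows by 80 columns.
roi = [[(x + y) % 2 for x in range(80)] for y in range(120)]
features = to_feature_vector(roi)
```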
3.4.1 Dataset
The dataset consisted of 573 labeled hand images, divided randomly into 473 training and 100
test images. The training set contains labeled images of each gesture at various rotations and
scales. After training, the classifier provides a 3 x 1600 parameter matrix. When applied to an
image, this gives, scaled to the range (-1, 1), the probability that the template belongs to
each of the hand types described above.
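Applying the parameter matrix can be sketched in Python as a one-vs-all prediction: each row of the matrix scores one hand type, and the highest score wins. The report does not say how the probabilities are scaled to (-1, 1); the sketch assumes the mapping 2*sigmoid(z) - 1, and it uses a toy 3 x 4 matrix in place of the 3 x 1600 one. The names `predict` and `theta` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, features):
    """Return (index of best class, list of per-class scores in (-1, 1))."""
    scores = []
    for row in theta:
        z = sum(w * x for w, x in zip(row, features))
        # Assumed rescaling of the logistic output from (0, 1) to (-1, 1).
        scores.append(2.0 * sigmoid(z) - 1.0)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy parameters: one row per hand type (0 = no hand, 1 and 2 = hand types).
theta = [[1.0, 0.0, 0.0, -1.0],
         [0.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, -1.0, 0.0]]
features = [0.5, 1.0, 1.0, 0.5]

label, scores = predict(theta, features)
```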
The test set is used to verify how well these parameters generalize. As the number of features
is much larger than the number of training images, the classifier tends to over-fit the
training data. This is tolerable, since the results on the test data show an acceptable
accuracy of 85%.
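The random split and the accuracy measure can be sketched in Python as follows. The sample indices stand in for the labeled images, and the seed and function names are assumptions made for reproducibility of the example, not details from the project.

```python
import random

def split_dataset(samples, n_test, seed=0):
    """Shuffle a copy of the samples and hold out n_test of them for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predicted, actual):
    """Fraction of predictions that equal the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

samples = list(range(573))          # stand-ins for the 573 labeled images
train, test = split_dataset(samples, n_test=100)
```

Calling `accuracy` on the classifier's predictions for `test` against their true labels gives the test accuracy.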
Fig 7. First row shows some data for training of hand type 0. Second and third rows show some
data for training of hand type 1 and final row shows hand type 2.
Figure 8. Random Test data for which 96% accuracy was noticed in Matlab.
Although the classifier is robust to noise in the region of interest (ROI), it is sensitive to
the size of the ROI. The linear classifier does not perform well if the region of interest
picked differs somewhat from the true region. Apart from region of interest, the classifier does
not have the capability to work
3.5 Music Player development – One of the challenges we faced during the course of this
project was to design a simple yet comprehensive music player in MATLAB. We achieved this using
the GUI editor of MATLAB.
3.5.1 Music Player Layout
We designed a simple MATLAB player with the following basic features:
Listbox: Contains the list of songs. The songs are loaded from a predetermined folder on the
system.
Text Box: Contains the name of the song currently playing. This box is cleared when a song is
stopped.
Slider: Governs the volume of the player. The maximum and minimum values of the slider are 1
and 0 respectively.
5 pushbuttons: Play, Pause, Stop, Next and Previous.
Play: Starts playing a song.
Pause: Halts a song; pressing Play after pausing resumes the song from the place where it was
paused.
Stop: Same as Pause, except that pressing Play after stopping starts the song again from the
beginning.
Next: Highlights the next song, but the song does not start playing until the Play button is
pressed. If the end of the playlist has been reached, nothing happens.
Previous: Highlights the previous song; as with Next, the song does not start playing until the
Play button is pressed. If the highlighted song is the first song of the playlist, nothing
happens.
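The button behaviour listed above amounts to a small state machine, sketched here in Python without any audio playback. The class and attribute names are hypothetical. One case the description leaves open is pressing Play after pausing and then highlighting a different song; the sketch assumes the newly highlighted song starts from the beginning.

```python
class Playlist:
    def __init__(self, songs):
        self.songs = list(songs)
        self.selected = 0        # highlighted entry in the listbox
        self.current = 0         # entry that was last played
        self.state = "stopped"   # "stopped", "playing" or "paused"
        self.position = 0.0      # playback position in seconds
        self.title = ""          # contents of the text box
        self.volume = 1.0        # slider value

    def play(self):
        # Resume only if the paused song is still the highlighted one.
        resume = self.state == "paused" and self.current == self.selected
        if not resume:
            self.position = 0.0
        self.current = self.selected
        self.state = "playing"
        self.title = self.songs[self.current]

    def pause(self):
        if self.state == "playing":
            self.state = "paused"

    def stop(self):
        self.state = "stopped"
        self.position = 0.0
        self.title = ""          # the text box is cleared on Stop

    def next(self):
        if self.selected < len(self.songs) - 1:
            self.selected += 1   # highlight only; does not start playback

    def previous(self):
        if self.selected > 0:
            self.selected -= 1

    def set_volume(self, value):
        self.volume = min(1.0, max(0.0, value))  # slider range is 0 to 1
```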
Figure 9. Matlab based music player GUI developed by us.
4. Integration of modules
After achieving the desired results, the modules were integrated to achieve the final aim of
the project. The interactive music player is governed by hand movements/gestures, depending on
where the hand is in the current Kinect frame. The Kinect continuously captures images of the
marker objects and of where the hand is relative to each of them.
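One simple way to decide which marker the hand is over, sketched here in Python, is to take the centroid of the foreground mask and test it against each marker's bounding box. The report does not state that the project locates the hand this way; the approach, the box coordinates and the function names are assumptions for illustration.

```python
def centroid(mask):
    """Mean (x, y) of all foreground pixels, or None if the mask is empty."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n

def marker_under_hand(mask, boxes):
    """Name of the marker whose box contains the hand centroid, else None."""
    c = centroid(mask)
    if c is None:
        return None
    cx, cy = c
    for name, (x0, y0, x1, y1) in boxes.items():
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return name
    return None

# Hypothetical marker boxes as (x_min, y_min, x_max, y_max).
boxes = {"play": (0, 0, 3, 3), "stop": (5, 0, 8, 3)}
mask = [[0] * 9 for _ in range(4)]
for y in (1, 2):
    for x in (6, 7):
        mask[y][x] = 1

hit = marker_under_hand(mask, boxes)
```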
In the main program, the function that governs the music player is ‘procctrl.m’. This function
takes three arguments, ctrl, vol and H, explained in detail below.
CTRL: This is a 1x5 vector which can have the following values:
[1 0 0 0 0] – If hand type 1 is on the Play marker object in the Kinect image, the function
‘ctrlgen.m’ gives this value of the CTRL vector, which is used to invoke the Play functionality
of the Music Player.
[0 1 0 0 0] – If hand type 2 is on the Play marker object in the Kinect image, the function
‘ctrlgen.m’ gives this value of the CTRL vector, which is used to invoke the Pause
functionality of the Music Player.
[0 0 1 0 0] – If any known hand type is on the Stop marker object in the Kinect image, the
function ‘ctrlgen.m’ gives this value of the CTRL vector, which is used to invoke the Stop
functionality of the Music Player.
[0 0 0 1 0] – If any known hand type is on the Volume marker object in the Kinect image, the
function ‘ctrlgen.m’ gives this value of the CTRL vector, which is used to invoke the Volume
functionality of the Music Player. Whenever the Volume functionality is invoked, the argument
‘vol’ is also passed; it gives the value (between 0 and 1) to which the volume of the Music
Player is set.
[0 0 0 0 1] – If hand type 2 is on the Scroll marker object in the Kinect image, the function
‘ctrlgen.m’ gives this value of the CTRL vector, which is used to invoke the ‘Previous’
functionality of the Music Player.
[0 0 0 0 2] – If hand type 1 is on the Scroll marker object in the Kinect image, the function
‘ctrlgen.m’ gives this value of the CTRL vector, which is used to invoke the ‘Next’
functionality of the Music Player.
VOL: The value (between 0 and 1) to which the volume of the Music Player is to be set,
depending on the depth value of the hand over the Volume marker object.
H: This is the handle of the GUI and is used internally in the program.
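The mapping from marker and hand type to the CTRL vector can be written compactly, as in the Python sketch below. It mirrors the table above but is not the contents of ‘ctrlgen.m’: the marker names are hypothetical strings, and the depth-to-volume mapping is an assumption (linear, with a hand closer to the sensor giving a higher volume, over a hypothetical working range), since the report only says that the volume depends on the depth of the hand.

```python
def ctrlgen(marker, hand_type):
    """Build the 1x5 CTRL vector for a hand of the given type over a marker."""
    ctrl = [0, 0, 0, 0, 0]
    if hand_type not in (1, 2):          # no hand: nothing is invoked
        return ctrl
    if marker == "play":
        ctrl[0 if hand_type == 1 else 1] = 1   # type 1 = Play, type 2 = Pause
    elif marker == "stop":
        ctrl[2] = 1
    elif marker == "volume":
        ctrl[3] = 1
    elif marker == "scroll":
        ctrl[4] = 2 if hand_type == 1 else 1   # 2 = Next, 1 = Previous
    return ctrl

def volume_from_depth(depth, near, far):
    """Assumed linear mapping: depth == near gives 1.0, depth == far gives 0.0."""
    level = (far - depth) / (far - near)
    return min(1.0, max(0.0, level))

ctrl = ctrlgen("volume", 1)
vol = volume_from_depth(900, near=600, far=1200)  # hypothetical range in mm
```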
Figure 10. Final integration of the project
Conclusion
Through this project, we have implemented an interactive music player controlled by hand
gestures in depth images taken from the XBOX Kinect. Using a machine learning algorithm made it
possible to detect the hand and its gestures even in the presence of noise, with invariance to
rotation and scale.
The code was run on a set of image data taken from the XBOX Kinect. A video showing its
execution can be found at http://youtu.be/JczfQOyJiiM. It shows the various functionalities of
the music player.
This project can be improved further by replacing the logistic regression algorithm with better
and more efficient algorithms so that more gestures can be detected reliably. A dynamic
background model can also be introduced in due course.
References
[1] M. T. Draelos, "The Kinect Up Close: Modifications for Short-Range Depth Imaging," North
Carolina State University, Raleigh, North Carolina, 2012.
[2] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International
Journal of Computer Vision, 2004.
[3] Y. Ivanov, A. Bobick, J. Liu, "Fast Lighting Independent Background Subtraction," MIT Media
Lab., 1999.
[4] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern
Analysis and Machine Intelligence, pp. 679-698, 1986.
[5] N. P. Galatsanos, M. N. Wernick, "Impulse restoration-based template-matching using the
expectation-maximization algorithm," Image Processing, Proceedings., International Conference,
1997.
[6] http://conanchen.com/Kinetris
[7] http://openkinect.org/wiki/Talk:Main_Page
[8] http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
[9] http://matlabbyexamples.blogspot.com/2011/03/making-matlab-media-player.html
[10] P. Routray, G. Bhutra, S. Rath, S. Mohanty, "Depth Image Processing and Operator Imitation
Using a Custom Made Semi Humanoid," IOSR Journals, vol. 1, no. 1, pp. 31-35, 2012.