
Doodle using Hand Gesture Recognition

Vandana Ravichandran
Electrical and Computer Engineering, University of Florida, Gainesville 32611

Image Processing and Computer Vision Project, Fall 2014

Abstract

The way humans interact with computers and devices is soon to change with the help of gesture recognition, one of the trending technologies expected to revolutionize human computer interaction. This project uses hand gestures captured from a camera to drive a live painting application called Doodle. The major parts of the project are detecting the human hand in the scene and estimating its motion so that a painting is produced as the hand moves. Important computer vision techniques are used, such as extraction of the region of interest using background subtraction, skin color segmentation, detection of the convex hull and convexity defects, and the CamShift algorithm for motion tracking. The user is provided with a panel containing different colors and an eraser, from which options can be chosen while drawing on the screen.

Keywords

Background subtraction, skin segmentation, CamShift, convex hull, convexity defects, motion detection

I. INTRODUCTION

Human Computer Interaction (HCI) has emerged as an area of intense research activity and has introduced a new dimension to the science of computing. One of the most widely known examples is the graphical user interface of Microsoft Windows 95. As computers have become ubiquitous, attention has shifted to techniques that enhance the end-user experience. This is one of the fundamental reasons for the growth in popularity, research activity and commercial products related to Human Computer Interaction.

HCI can be defined informally as the science of designing and studying the interaction between humans and computers, with the aim of improving the quality of that interaction. Analytical, cognitive and empirical techniques are used in the process of designing better systems. The study of HCI has evolved into several distinct areas, including constraint-based interface design, exploration of implementation methodologies, design of new interface systems, and evaluation and comparison of interfaces. In recent years we have seen dramatic changes to the user interfaces of electronic devices through touch, voice and gestures.

Many trending technologies fall under human computer interaction, including virtual and augmented reality, 3D visualization, natural language processing, speech processing, hand gesture recognition and touch sensing. Owing to advances in image acquisition and processing as well as computer vision techniques, hand gesture recognition has gained importance in HCI. Using a part of the human body to communicate can be termed a gesture. Human beings are naturally inclined to gesture while communicating, for example while giving a presentation or taking part in a conversation. Integrating gesture recognition with computer operation can therefore provide a more natural and comfortable experience. Many different types of tools can be used for image/video based gesture recognition. One approach uses gloves equipped with tracking capabilities, or with special features such as magnetic sensing for transferring signals; this is called the contact-based approach to modelling gestures. The restrictive experience and the requirement of additional hardware such as gloves have kept this method from becoming popular.

By contrast, vision-based techniques are cost-effective, non-intrusive and non-restrictive in nature. The quality of such a system depends directly on the algorithms used and on the computational capabilities of the platform. Various sensors are used in image/video analysis based techniques, namely stereo cameras, depth-aware cameras and simple 2D cameras. With stereo cameras, multiple (usually two) cameras capture different views of the same scene, and these views are merged with special processing to obtain a 3D representation of the scene. Depth-aware cameras can sense a depth map of the video feed within a certain range. Even the simple 2D cameras available in laptops, digital cameras and smartphones can be used for gesture recognition.

Vision-based methods can be further categorized into appearance-based modelling and 3D modelling. The techniques used also depend on whether the gesture recognition is static or dynamic. Static methods are related to template matching, machine learning classification and similar approaches. Dynamic systems introduce a temporal aspect, so gestures and postures can vary with time; more sophisticated processing methods are therefore needed, such as hidden Markov models, dynamic time warping and time-delay neural networks. In this project, appearance-based modelling with a standard 2D camera feed from a laptop is used as the input for gesture modelling.



The rest of the report is organized as follows. Section II discusses the fundamental approach to using computer vision techniques for hand gesture recognition. Section III gives a design description of the painting application, Doodle. Section IV describes the approach used to implement the features of the previous section; here, the various image processing techniques used in the project are discussed in depth. The algorithm used in the program is described in Section V. Section VI concludes by summarizing the practical experience and the challenges faced during implementation, and Section VII discusses the scope of the project and related future work.

II. APPROACH

Generally, all object or motion tracking systems and hand-interactive systems share the same goal: the object of interest must be tracked across successive frames of a video or camera feed as it moves. In hand-interactive systems, the object to be recognized is the hand. These systems can be divided into layers with distinct functions, namely detection, tracking and recognition.

a. Detection layer: the first layer, responsible for defining, correlating and extracting the features that correspond to the object of interest, i.e. detecting the presence of hands in the field of view of the camera feed.

b. Tracking layer: the middle layer, which uses the features extracted by the detection layer and associates data temporally between successive images in order to track the motion of the hands.

c. Recognition layer: the last layer, which deals with the classification problem. The spatiotemporal data extracted by the previous two layers are assigned to result groups whose labels correspond to specific classes of gestures.

Once these three steps are performed successfully, the hand can be detected and tracked, and the gesture can be classified. The identified gesture can then be associated with a specific output function, for example playing music when the hand moves to the right. These three functions must be computed iteratively for every frame in order to handle dynamic gesture recognition.

III. DESCRIPTION

As described earlier, gesture recognition is employed to fulfill one or more specific functions. In this project, the result of gesture recognition is used to create a painting application called Doodle. Briefly, the idea is to paint on the live webcam window by moving the hand. The image shown below was captured during a demonstration.

The application is equipped with the following features:

a. The user can select different colors from a panel present on the upper part of the window.

b. When the user's hand hovers over a particular color, that color is selected and a message indicating the selection is displayed at the bottom of the window.

c. The user can select the erase option to use the fingertip like an eraser and remove pixels that were drawn earlier on the screen.

d. The erase option can also be used to clear the screen completely, by selecting the white color in the top right corner and hovering over that area for a few seconds; the screen is cleared after a brief countdown.

e. The user can exit the application by pressing the Escape key. The program then breaks out of its indefinite while loop and releases any memory associated with image pointers.

f. When the user exits the application, the image drawn by the user is saved to a file named doodle.jpg in the project folder (a sketch of this exit-and-save behavior follows this list).
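A minimal sketch of the exit-and-save step in features (e) and (f), written with the OpenCV C++ API (the report's code uses C-style image pointers that must be released explicitly); the window name and canvas size are placeholder assumptions:

    #include <opencv2/opencv.hpp>

    int main()
    {
        cv::VideoCapture cap(0);
        cv::Mat frame;
        cv::Mat drawing(480, 640, CV_8UC3, cv::Scalar::all(255));   // white drawing canvas
        while (cap.read(frame)) {
            // ... per-frame hand tracking and drawing onto 'drawing' ...
            cv::imshow("Doodle", frame);
            if (cv::waitKey(30) == 27) {               // 27 = Escape key
                cv::imwrite("doodle.jpg", drawing);    // save the user's drawing on exit
                break;                                 // leave the indefinite while loop
            }
        }
        return 0;   // cv::Mat and cv::VideoCapture release their memory automatically
    }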


Other hidden objectives: the application also needs to fulfill certain requirements in the background in order to function properly, such as:

g. Before the application is tested, the trackbars provided for fine-tuning the YCrCb parameters for skin segmentation must be used for calibration.

h. The upper area of the webcam frame holds the color panel and the lower area is used for displaying messages. These are control areas and the user should not be able to draw on them; the area accessible for drawing is restricted to the space between these two sections.

i. Sudden illumination disturbances can trigger drawing even when there is no noticeable movement of the hand. To handle such disturbances, pixels are not drawn if the distance between two consecutive hand positions is greater than 100 pixels (see the sketch after this list).

j. The system should be capable of handling external noise signals to a certain extent.
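A small sketch of the jump-rejection rule in item (i): consecutive fingertip positions more than 100 pixels apart are treated as an illumination glitch and no stroke is drawn (the function name and stroke color are illustrative assumptions):

    #include <opencv2/opencv.hpp>
    #include <cmath>

    // Draw a stroke between two consecutive fingertip positions, unless the jump is implausibly large.
    void drawStroke(cv::Mat& drawing, cv::Point prev, cv::Point curr)
    {
        double dist = std::hypot(double(curr.x - prev.x), double(curr.y - prev.y));
        if (dist <= 100.0)                                             // reject sudden jumps
            cv::line(drawing, prev, curr, cv::Scalar(0, 0, 255), 3);   // default red brush
    }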

IV. IMPLEMENTATION

The program was written in C++ with the help of the OpenCV image processing and computer vision library, and the application was developed in Visual Studio 2012. Many important image processing techniques were used to build the project. The major techniques fundamental to this application are listed below:

1. Skin color segmentation
2. Morphological operations
3. Background subtraction
4. CamShift algorithm
5. Convex hull and convexity defects
6. Panel design and labelling

The implementation methodology and the usage of these techniques are discussed in detail below.

1. Skin Color Segmentation

The process of distinguishing and separating the area of interest from an image based on color is called color segmentation. Human skin has a distinct color tone, so color-based tracking can be used for skin segmentation. Here we try to differentiate the skin pixels from the rest of the image in order to recognize the human body parts in the frame. For skin segmentation to work effectively, the right color space and the range values of its channel parameters must be chosen. The black and white image shown in Fig 3 is a binary image obtained by thresholding the camera feed so that only skin-colored pixels are white and the rest are black; only the pixels that correspond to skin (the face and hands) are white, while the rest of the image is black.

As shown in Fig 4, trackbars were used in the program to fine-tune the Y, Cr and Cb parameters for the application.

Why was the YCrCb color space chosen? Although the RGB color space is commonly used to represent images and is widespread in computer graphics, it is an additive model with high correlation between channels. HSV or YCrCb is therefore usually preferred for skin segmentation, since separating the brightness information from the chrominance reduces the effect of uneven illumination. In this project I tried both color spaces, and YCrCb seemed to work a little more effectively, so the YCrCb color space was finally used.

As shown in the diagram, the Y, Cr and Cb values can each vary from 0 to 255, and every color in the RGB color space has a corresponding unique value in the YCrCb color space. The equations used to convert from RGB to YCrCb are shown in the blue box. The values of the three parameters that were most suitable for my skin tone and the ambient lighting in my room are shown below. Since these values change with the person and the location, the trackbars can be used to adjust them; a code sketch of this calibration step follows.
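A minimal sketch of the YCrCb thresholding and trackbar calibration described above, using the OpenCV C++ API; the Cr/Cb range values shown are placeholders, since the report's actual values were tuned per person and lighting with the trackbars:

    #include <opencv2/opencv.hpp>

    // Trackbar-controlled channel limits (placeholder defaults, tuned at run time).
    int yMin = 0,    yMax = 255;
    int crMin = 133, crMax = 173;
    int cbMin = 77,  cbMax = 127;

    // Convert a BGR frame to YCrCb and keep only pixels inside the calibrated skin range.
    cv::Mat segmentSkin(const cv::Mat& frameBGR)
    {
        cv::Mat ycrcb, mask;
        cv::cvtColor(frameBGR, ycrcb, cv::COLOR_BGR2YCrCb);
        cv::inRange(ycrcb,
                    cv::Scalar(yMin, crMin, cbMin),
                    cv::Scalar(yMax, crMax, cbMax),
                    mask);                       // binary image: skin = 255, rest = 0
        return mask;
    }

    // Trackbars for run-time calibration of the YCrCb parameters, as in item (g) above.
    void createCalibrationTrackbars()
    {
        cv::namedWindow("controls");
        cv::createTrackbar("Y min",  "controls", &yMin,  255);
        cv::createTrackbar("Y max",  "controls", &yMax,  255);
        cv::createTrackbar("Cr min", "controls", &crMin, 255);
        cv::createTrackbar("Cr max", "controls", &crMax, 255);
        cv::createTrackbar("Cb min", "controls", &cbMin, 255);
        cv::createTrackbar("Cb max", "controls", &cbMax, 255);
    }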


2. Morphological Operations

The result of segmentation usually contains noise, so we use the morphological operations erosion and dilation to remove it. Morphological operations are shape-based, non-linear operations applied for removal of noise and smoothing of edges. They depend only on the relative ordering of pixel values, not on their numerical values, and are therefore most suitable for processing binary images.

In this technique, the image is probed with a small shape called a structuring element. The structuring element is positioned at every possible location, and its pixel values are compared with the neighbourhood pixels at that position. If, for every pixel of the structuring element that is set to 1, the corresponding image pixel is also 1, the element is said to fit the image. If, for at least one pixel of the structuring element that is set to 1, the corresponding image pixel is also 1, the element is said to hit the image. The same test is applied iteratively at every pixel of the image, and based on whether the structuring element hits or fits the image we obtain the two fundamental operations, erosion and dilation.

During erosion a new binary image is produced such that the new pixel P(x,y) is 1 if the element fits at P(x,y), and 0 otherwise. Erosion is generally used for reducing the noise in the image, but it has the negative effect of shrinking the region of interest. Similarly, during dilation a new binary image is produced such that the new pixel P(x,y) is 1 if the element hits at P(x,y), and 0 otherwise. Dilation expands the region of interest and is used for filling small unwanted gaps in the image. The results of erosion and dilation are influenced by both the size and the shape of the structuring element.

In this project, after smoothing the image with a Gaussian blur, erosion followed by dilation is applied for two iterations using a 3x3 ellipse as the structuring element. Applying erosion followed by dilation with the same structuring element is termed opening in morphology. The result of applying the opening operation to a binary image from this project is shown in Fig 6, and a code sketch of the step follows.
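A minimal sketch of this clean-up step, assuming the skin mask from the previous section; the Gaussian kernel size is a placeholder, while the 3x3 elliptical element and the two iterations follow the description above:

    #include <opencv2/opencv.hpp>

    // Smooth the binary skin mask and apply morphological opening (erode, then dilate).
    cv::Mat cleanMask(const cv::Mat& skinMask)
    {
        cv::Mat blurred, opened;
        cv::GaussianBlur(skinMask, blurred, cv::Size(5, 5), 0);
        cv::Mat element = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
        cv::morphologyEx(blurred, opened, cv::MORPH_OPEN, element,
                         cv::Point(-1, -1), 2);   // opening = erosion followed by dilation, 2 iterations
        return opened;
    }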

3. Background Subtraction

Background subtraction is a widely used technique for obtaining the foreground mask of a scene; the extracted foreground is usually sent for further processing. Generally, an image's regions of interest are the objects in its foreground (humans, cars, and in our case the hand). Background subtraction is typically used for detecting moving objects in videos from static cameras, and it is mostly applied when the image in question is part of a video stream. This is an area of active research and many effective algorithms have been proposed. The choice of background subtraction method usually depends on the requirements of the application, such as the required sensitivity of the system, the accuracy with which the moving object has to be tracked, and any memory constraints.

The OpenCV library provides ready-made background subtraction techniques based on the Mixture of Gaussians method. This method is useful when multiple objects may be introduced into the frames or when the background changes permanently or significantly, because it is adaptive in nature. In our application the background changes very little and only one object (the hand) needs to be tracked, so a simpler approach can be used. Here we use the frame difference method: the moving object is extracted by subtracting the current frame from a static background image, usually known as the background model.

Using only a static frame as a background model sometimes does not suffice: it makes the system overly sensitive, so that very small illumination disturbances are wrongly captured as moving objects. To reduce the sensitivity, instead of directly using the frame difference we use the weighted (running) average method, described by the equation

    B(i+1) = alpha * F(i) + (1 - alpha) * B(i)

In this equation, F(i) represents the foreground (the current frame) at frame i and B(i) represents the background at frame i. The new background at time instance i+1 is calculated as a weighted average of F(i) and B(i). Alpha is called the learning rate and lies between 0 and 1. This parameter tunes the sensitivity of the system: by varying it, the influence of older objects in the scene on the new background can be controlled. After the background has been determined in this manner, frame differencing is used in the conventional way to extract the foreground objects of interest. The image shown in Fig 7 is an example of background subtraction used for extracting the hand from the binary image; a code sketch follows.
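A minimal sketch of the running-average background model and frame differencing, using cv::accumulateWeighted; the learning rate and the binarization threshold are placeholder values, not taken from the report:

    #include <opencv2/opencv.hpp>

    cv::Mat background;   // running-average background model, kept in floating point

    // Update the background model with the current grayscale frame and return the foreground mask.
    cv::Mat extractForeground(const cv::Mat& gray, double alpha = 0.05)
    {
        if (background.empty())
            gray.convertTo(background, CV_32F);               // initialize the model with the first frame
        cv::accumulateWeighted(gray, background, alpha);      // B <- (1 - alpha) * B + alpha * F
        cv::Mat bg8u, diff, fgMask;
        background.convertTo(bg8u, CV_8U);
        cv::absdiff(gray, bg8u, diff);                        // frame difference against the model
        cv::threshold(diff, fgMask, 25, 255, cv::THRESH_BINARY);   // keep only clearly moving pixels
        return fgMask;
    }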

4. Meanshift and CamShift Algorithms

Meanshift and CamShift are popular object tracking algorithms. For a given pixel distribution, the centroid (mean) is computed and a track window is used to demarcate the object of interest in the frame. As the object moves, the pixel distribution changes and the track window has to be shifted to the area of higher pixel density. We provide the result of skin segmentation and background subtraction as the input to the CamShift algorithm, so the area of higher pixel density corresponds to the tracked object, which in our case is the hand.

Here, the method used for moving the track window is of interest. To determine the probability distribution of the image pixels, we use the histogram back projection technique. Initially, the area of the hand is selected as the feature region from which the histogram model is calculated. This histogram is then used to find the area of each subsequent image that contains a matching feature; this is called back projection. Since this method relies on histograms composed of pixel frequencies, we again depend on the color distribution of the frame.

The image with the colored bars shown in Fig 8 is the histogram obtained for my hand. Colored rectangular boxes were used to plot the histogram, since no ready-made plotting function was available in OpenCV.

To obtain the weighted mean of the pixels in a neighbourhood we use a Gaussian kernel function, K(x) = exp(-||x||^2 / (2 * sigma^2)). The neighbourhood N(x) of a pixel x is defined as the set of pixels around x for which the kernel function gives a non-zero value.

The mean m(x) is then calculated as

    m(x) = [ sum over xi in N(x) of K(xi - x) * xi ] / [ sum over xi in N(x) of K(xi - x) ]

that is, as a weighted average of every pixel xi in the neighbourhood N(x) of x, weighted by the Gaussian kernel K. After the mean is calculated, the track window is shifted to the new centroid m(x). This process is repeated iteratively until convergence is achieved; this is how the object is tracked with the meanshift algorithm.

How does CamShift differ? The term CamShift stands for Continuously Adaptive Meanshift. CamShift follows the meanshift model with one difference: once convergence is achieved using meanshift, not only is the position of the track window updated based on the new mean, but the size of the track window is updated as well. Each dimension s of the track window is scaled in proportion to the probability mass M found inside the window (commonly written as s = 2 * sqrt(M / 256) for 8-bit back projections). The orientation of the track window is also recalculated if the tracked object has tilted from its initial position. CamShift then uses the rescaled track window to determine the next centroid, and the process continues iteratively. The images in Fig 9 and Fig 10 show how the size of the track window grows in proportion to the size of the hand when it is moved closer to the camera; this is exactly how CamShift improves on meanshift. A code sketch of the tracking step follows.
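A minimal sketch of histogram back projection and CamShift tracking with the OpenCV C++ API. The initial hand window, the use of the hue channel and the bin count are illustrative assumptions; in the report the back projection is driven by the histogram of the segmented hand region:

    #include <opencv2/opencv.hpp>

    cv::Rect trackWindow(200, 150, 100, 100);   // hypothetical initial hand region
    cv::Mat handHist;                           // hue histogram of the hand

    // Build the histogram model from the hand region (called once at start-up).
    void buildHandHistogram(const cv::Mat& frameBGR)
    {
        cv::Mat hsv;
        cv::cvtColor(frameBGR, hsv, cv::COLOR_BGR2HSV);
        cv::Mat roi = hsv(trackWindow);
        int channels[] = {0}, histSize[] = {30};
        float hueRange[] = {0, 180};
        const float* ranges[] = {hueRange};
        cv::calcHist(&roi, 1, channels, cv::Mat(), handHist, 1, histSize, ranges);
        cv::normalize(handHist, handHist, 0, 255, cv::NORM_MINMAX);
    }

    // Back-project the histogram onto the current frame and let CamShift move and rescale the window.
    cv::RotatedRect trackHand(const cv::Mat& frameBGR)
    {
        cv::Mat hsv, backproj;
        cv::cvtColor(frameBGR, hsv, cv::COLOR_BGR2HSV);
        int channels[] = {0};
        float hueRange[] = {0, 180};
        const float* ranges[] = {hueRange};
        cv::calcBackProject(&hsv, 1, channels, handHist, backproj, ranges);
        return cv::CamShift(backproj, trackWindow,
                            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
    }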


5. Convex Hull and Convexity Defects

The convex hull is a prominent object of interest in geometry. In simple words, the convex hull is the smallest convex curve that completely encloses a required set of points; it can be visualized as a rubber band stretched to exactly fit the points in the plane. Computing the convex hull is a classic problem of computational geometry. Mathematically, in an n-dimensional space the convex hull of a set of points S is the intersection of all convex sets containing S. Equivalently, the convex hull C of the points P1, P2, ..., PN is the set of all their convex combinations:

    C = { sum over i of (lambda_i * Pi) : lambda_i >= 0 for all i, and sum over i of lambda_i = 1 }

If we can compute the convex hull of the white blob that belongs to the hand in the binary image, then we can visually track the hand. First, the contours of the hand in the binary image are found; the contour of the hand can be determined with any edge-detection or contour-finding technique. Once the contour of the hand is obtained, its points are provided as the input for calculating the hull. The hull is computed with the ConvexHull2 function available in OpenCV, whose output is an array of points in the frame that form the convex hull of the hand. Line segments between selected points of this array are plotted on the frame to form a polygon around the hand.

The points in the enclosed area that deviate from the convex hull are called the defect points, or convexity defects. These deviations from the hull are the valley points between the fingers. Computing the convex hull together with the convexity defects therefore lets us track the border of the hand along with the fingers and the area between them, so the palm can be tracked in its entirety. In Fig 11, the hull is represented by the pink polygon, which is bounded by the blue rectangular bounding box; the fingertips and defect points are drawn as green dots. A code sketch of this step follows.
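A minimal sketch of the contour, convex hull and convexity defect computation. It uses the C++ functions findContours, convexHull and convexityDefects (the report calls the older C function ConvexHull2), and the drawing colors mirror Fig 11:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Return the largest contour in the binary mask, assumed to be the hand blob.
    std::vector<cv::Point> largestContour(const cv::Mat& mask)
    {
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        int best = -1;
        double bestArea = 0;
        for (size_t i = 0; i < contours.size(); ++i) {
            double area = cv::contourArea(contours[i]);
            if (area > bestArea) { bestArea = area; best = static_cast<int>(i); }
        }
        return best >= 0 ? contours[best] : std::vector<cv::Point>();
    }

    // Draw the convex hull (pink polygon), bounding box (blue) and convexity defect points (green dots).
    void drawHullAndDefects(cv::Mat& frame, const std::vector<cv::Point>& contour)
    {
        if (contour.size() < 5) return;
        std::vector<int> hullIdx;
        cv::convexHull(contour, hullIdx, false, false);        // hull as indices into the contour
        std::vector<cv::Vec4i> defects;
        cv::convexityDefects(contour, hullIdx, defects);       // valleys between the fingers
        std::vector<std::vector<cv::Point>> hullPts(1);
        cv::convexHull(contour, hullPts[0]);                   // hull as points, for drawing
        cv::drawContours(frame, hullPts, 0, cv::Scalar(255, 0, 255), 2);
        cv::rectangle(frame, cv::boundingRect(contour), cv::Scalar(255, 0, 0), 2);
        for (const cv::Vec4i& d : defects)
            cv::circle(frame, contour[d[2]], 4, cv::Scalar(0, 255, 0), -1);
    }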

6. Panel Design and Labelling

The drawing is produced by tracking the fingertip with the help of the convex hull, as described in the previous section. First an empty image matrix is initialized; it is white in color. As the user moves the hand, the fingertip (taken to be the highest point of the hull in the vertical direction) is tracked and the corresponding pixels are drawn on the blank image. The default drawing color is red, and the user can change it by selecting one of the colors shown in the panel image below. The drawing is also added to the webcam feed, so the user can see how the drawing is produced as the hand moves.

The panel was designed using MS Paint and was saved as an image with a .panel extension. This panel image is then overlaid on the webcam feed to present the color options to the user. When the white color is selected, the pixels drawn on the white image can be erased with the fingertip.


Whenever the user selects a color option or clears the screen, a corresponding message is displayed in the lower area of the webcam feed. This is achieved with a text buffer that holds the text in the selected color, font and font size; the text is overlaid on the webcam feed using the cvPutText function. A code sketch of the panel overlay and message display follows.
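A minimal sketch of the panel overlay and status message, using the C++ equivalents of the calls named above (cv::putText in place of cvPutText); the panel file name, strip heights and message position are placeholder assumptions:

    #include <opencv2/opencv.hpp>
    #include <string>

    cv::Mat panel = cv::imread("panel.png");   // hypothetical file name for the color panel image

    // Compose the final Doodle window: panel strip on top, drawing merged with the feed, message below.
    void composeDoodleWindow(cv::Mat& frame, const cv::Mat& drawing, const std::string& msg)
    {
        // Fit the panel to the upper control strip of the webcam frame.
        cv::Mat strip;
        cv::resize(panel, strip, cv::Size(frame.cols, panel.rows));
        strip.copyTo(frame(cv::Rect(0, 0, frame.cols, panel.rows)));

        // Merge the (white) drawing canvas with the live feed; both are assumed to have the same size and type.
        cv::bitwise_and(frame, drawing, frame);

        // Status message in the lower control strip.
        cv::putText(frame, msg, cv::Point(10, frame.rows - 15),
                    cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(0, 0, 255), 2);
    }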

V. ALGORITHM

The flowchart below summarizes the algorithm and the important steps followed while writing the program. The following process is carried out iteratively until the user quits the application.

Fig 13: Flowchart of the algorithm

The algorithm can be briefly explained as follows. First, the image is acquired from the camera and sent for pre-processing. Next, skin segmentation extracts the skin-colored pixels from the frame; these include the hand and possibly other body parts such as the face and neck. The frame is then subtracted from the background computed with the running (weighted) average method. At this stage, morphological opening is applied to reduce noise and expand the region of interest. CamShift is then used to track the hand based on histogram back projection, and the contours, convex hull and defect points are determined. The fingertip, taken to be the highest point of the hull, is tracked and its pixels are added to the drawing buffer, which starts out as an empty image. Finally, the drawing buffer, the color panel and the webcam feed are combined (a bitwise AND) to produce the final Doodle window. A compressed sketch of this per-frame loop follows.
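A compressed sketch of the per-frame pipeline from the flowchart, wiring together the helper functions sketched in the earlier sections (segmentSkin, cleanMask, extractForeground, createCalibrationTrackbars, buildHandHistogram, trackHand, largestContour and composeDoodleWindow are the hypothetical names introduced there, not the report's actual code):

    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <cmath>
    #include <vector>

    int main()
    {
        cv::VideoCapture cap(0);
        createCalibrationTrackbars();                // YCrCb trackbars from Section IV.1
        cv::Mat frame, drawing;
        cv::Point prevTip(-1, -1);
        bool histReady = false;

        while (cap.read(frame)) {
            if (drawing.empty())
                drawing = cv::Mat(frame.size(), frame.type(), cv::Scalar::all(255));   // white canvas

            cv::Mat mask = cleanMask(segmentSkin(frame));          // skin segmentation + opening
            cv::Mat gray;
            cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
            cv::bitwise_and(mask, extractForeground(gray), mask);  // keep moving skin pixels only

            if (!histReady) { buildHandHistogram(frame); histReady = true; }
            trackHand(frame);                                      // keep the CamShift window on the hand

            std::vector<cv::Point> contour = largestContour(mask);
            if (!contour.empty()) {
                // Fingertip = topmost point of the hand contour/hull.
                cv::Point tip = *std::min_element(contour.begin(), contour.end(),
                    [](const cv::Point& a, const cv::Point& b) { return a.y < b.y; });
                double dx = tip.x - prevTip.x, dy = tip.y - prevTip.y;
                if (prevTip.x >= 0 && std::sqrt(dx * dx + dy * dy) < 100.0)   // jump rejection (item i)
                    cv::line(drawing, prevTip, tip, cv::Scalar(0, 0, 255), 3);
                prevTip = tip;
            }

            composeDoodleWindow(frame, drawing, "drawing with red");
            cv::imshow("Doodle", frame);
            if (cv::waitKey(30) == 27) {                           // Escape: save and exit
                cv::imwrite("doodle.jpg", drawing);
                break;
            }
        }
        return 0;
    }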

VI. CONCLUSION

The application Doodle was developed using fundamental computer vision and image processing techniques. The project was essentially implemented in two steps: in the first step, only hand detection and tracking was developed; in the next step, a program for drawing with a colored glove was developed. Both parts were then integrated to get Doodle functioning. In terms of challenges faced during the implementation, fine-tuning the YCrCb parameters and making the system less sensitive to ambient lighting conditions proved difficult.

VII. RELATED FUTURE WORK

This project has good scope for future improvement: more sophisticated techniques such as classification using machine learning, and more advanced background subtraction and edge detection methods, could be used to increase the stability of the system. Many other interesting options for the user, such as drawing different shapes or changing the brush effect and its thickness, could also be included. Work related to such future developments of this project is currently in progress. This project has been a very good starting point for creating a painting application driven by hand gestures.

VIII. REFERENCES

[1] Amir Rosenfeld and Daphna Weinshall. Extracting Foreground Masks towards Object Recognition. In 13th IEEE International Conference on Computer Vision, Nov 2011.

[2] Ryosuke Araki and Takeshi Ikenaga. Real-time both hands tracking using CAMShift with motion mask and probability reduction by motion prediction. APSIPA ASC, 2012.

[3] Dr. Dapeng Wu, University of Florida, Lecture Notes.

[4] D. Lee and S. Lee. Vision-Based Finger Action Recognition by Angle Detection and Contour Analysis. ETRI Journal, 2011.

[5] Afef Salhi and Ameni Yengui Jammoussi. Object tracking system using Camshift, Meanshift and Kalman filter. In World Academy of Science, Engineering and Technology, Apr 2012.

[6] Gary Bradski and Adrian Kaehler. Learning OpenCV. O'Reilly, September 2008.


[7] A. R. J. François. Camshift Tracker Design Experiments. IMSC, no. 11, pp. 1-11, 2004.

[8] Massimo Piccardi. Background subtraction techniques: a review. University of Technology, Sydney.

[9] Son Lam Phung and A. Bouzerdoum. Skin segmentation using color pixel classification: analysis and comparison. University of Wollongong.

[10] Fan-Chieh Cheng. Advanced background subtraction approach using Laplacian distribution model. ICME 2010.

[11] M. Panwar and P. S. Mehra. Hand Gesture Recognition for Human Computer Interaction. Proceedings of the IEEE International Conference on Image Information Processing (ICIIP 2011), Waknaghat, India, November 2011.

[12] L. Howe, F. Wong and A. Chekima. Comparison of Hand Segmentation Methodologies for Hand Gesture Recognition. IEEE 978-4244-2328-6, 2008.

[13] P. Kakumanu, S. Makrogiannis and N. Bourbakis. A survey of skin-color modeling and detection methods. Department of Computer Science and Engineering, Wright State University.

[14] Ryosuke Araki, Seiichi Gohshi and Takeshi Ikenaga. Real-Time Both Hands Tracking Using CAMshift with Motion Mask and Probability Reduction by Motion Prediction. Kogakuin University.

[15] Web resources: http://docs.opencv.org, http://opencv-srf.blogspot.com, http://opencvpython.blogspot.com