A random forest approach to skin detection with R

Post on 26-Jan-2015


A RANDOM FOREST* APPROACH TO SKIN DETECTION WITH R

Auro Tripathy

auro@shatterline.com

*Random Forests are registered trademarks of Leo Breiman and Adele Cutler

Outline

Attributions, code and dataset location (1 minute)

Overview of the scheme (2 minutes)

Refresher on Random Forest and R Support (2 minutes)

Results and continuing work (1 minute)

Q&A (1 minute and later)

Attribution - Implementing an Existing Technique with R

ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5651638

R Code and Dataset

R code available here; my contribution http://www.shatterline.com/SkinDetection.html

Data set available here http://www.feeval.org/Data-sets/Skin_Colors.html

Permission to use may be required

Tips to Prepare the Dataset

All training sets are organized as a two-movie sequence:

1. A movie sequence of frames in color
2. A corresponding sequence of frames in binary black-and-white, the ground-truth

Extract individual frames in JPEG format using ffmpeg, a transcoding tool:

ffmpeg -i 14.avi -f image2 -ss 1.000 -vframes 1 14_500offset10s.jpeg

ffmpeg -i 14_gt_500frames.avi -f image2 -ss 1.000 -vframes 1 14_gt_500frames_offset10s.jpeg

Training Sample - Image and Corresponding Ground-truth

[Figure: ground-truth mask and the corresponding image]

The original authors used 8991 such image-pairs, the image along with its manually annotated pixel-level ground-truth.
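One way to turn an image/ground-truth pair into training rows is sketched below. It is a minimal illustration on synthetic arrays, not the author's code: the array layout (height x width x 3 channels), the names `image` and `truth`, and the convention that white ground-truth pixels mark skin are all assumptions made for this example.

```r
# Synthetic stand-ins for one decoded frame and its ground-truth mask
# (assumption: channels in [0, 1], mask values 0 = black, 1 = white).
set.seed(42)
height <- 4
width <- 5
image <- array(runif(height * width * 3), dim = c(height, width, 3))
truth <- matrix(sample(c(0, 1), height * width, replace = TRUE), nrow = height)

# One row per pixel, one column per color channel.
table.data <- data.frame(
  R = as.vector(image[, , 1]),
  G = as.vector(image[, , 2]),
  B = as.vector(image[, , 3])
)

# One label per pixel (assumption: white pixels in the mask mean skin).
table.truth <- factor(
  ifelse(as.vector(truth) > 0.5, "skin", "non-skin"),
  levels = c("non-skin", "skin")
)
```

Both objects are flattened in the same column-major order, so row i of `table.data` and element i of `table.truth` refer to the same pixel.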


Problem Statement & Summary of Solution

Skin-color classification/segmentation

Uses Improved Hue, Luminance, Saturation (IHLS) color-space

RGB values transformed to HLS

HLS used as feature-vectors

Original authors also experimented with Bayesian network, Multilayer Perceptron, SVM, AdaBoost (Adaptive Boosting), Naive Bayes, RBF network

“Random Forest shows the best performance in terms of accuracy, precision and recall”

Choice of IHLS Color-Space

The most important property of this [IHLS] space is a “well-behaved” saturation coordinate which, in contrast to commonly used ones, always has a small numerical value for near-achromatic colours, and is completely independent of the brightness function

A 3D-polar Coordinate Colour Representation Suitable for Image Analysis, Allan Hanbury and Jean Serra

MATLAB routines implementing the RGB-to-IHLS and IHLS-to-RGB are available at http://www.prip.tuwien.ac.at/~hanbury.

R routines implementing the RGB-to-IHLS and IHLS-to-RGB are available at http://www.shatterline.com/SkinDetection.html
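For orientation, the forward transform can be sketched in a few lines of R. This is not the routine from either site above; it follows the RGB-to-IHLS formulas as they are commonly stated for the Hanbury and Serra space (luminance as a weighted sum, saturation as max minus min, hue from an arccosine) and should be checked against the paper before use. The function name `rgb_to_ihls`, the [0, 1] input range, and setting hue to 0 for achromatic pixels are assumptions of this sketch.

```r
# Hypothetical helper: vectorised RGB -> IHLS for channels in [0, 1].
rgb_to_ihls <- function(r, g, b) {
  # Luminance: weighted sum of the channels.
  y <- 0.2126 * r + 0.7152 * g + 0.0722 * b
  # Saturation: max minus min, so near-achromatic colours get small values.
  s <- pmax(r, g, b) - pmin(r, g, b)
  # Hue: angle in degrees; guard against rounding below zero and
  # against the achromatic case where the denominator is zero.
  denom <- sqrt(pmax(0, r^2 + g^2 + b^2 - r * g - r * b - g * b))
  ratio <- ifelse(denom > 0, (r - g / 2 - b / 2) / denom, 1)
  ratio <- pmin(pmax(ratio, -1), 1)
  h <- acos(ratio) * 180 / pi
  h <- ifelse(b > g, 360 - h, h)
  h[denom == 0] <- 0  # hue is undefined for greys; 0 is a placeholder
  data.frame(H = h, L = y, S = s)
}

# Pure red, green, blue and a mid grey.
ihls <- rgb_to_ihls(c(1, 0, 0, 0.5), c(0, 1, 0, 0.5), c(0, 0, 1, 0.5))
```

The grey pixel comes out with zero saturation regardless of its brightness, which is the "well-behaved" property quoted above.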

R Packages

Package ‘ReadImages’: This package provides functions for reading JPEG and PNG files

Package ‘randomForest’: Breiman and Cutler’s Classification and regression based on a forest of trees using random inputs.

Package ‘foreach’: Support for the foreach looping construct

Stretch goal to use %dopar%

Pseudo Code

set.seed(371)
skin.rf <- foreach(i = c(1:nrow(training.frames.list)), .combine=combine, .packages='randomForest') %do% {
  # Read the image
  # Transform from RGB to IHLS
  # Read the corresponding ground-truth image
  # Data is ready, now apply random forest
  # (not using the formula interface)
  randomForest(table.data, y=table.truth, mtry = 2, importance = FALSE, proximity = FALSE, ntree=10, do.trace = 100)
}

table.pred.truth <- predict(skin.rf, test.table.data)
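The same pattern can be tried end to end on synthetic data. The sketch below assumes the foreach and randomForest packages are installed; the helper `make_frame`, the made-up rule that labels a pixel as skin, and the frame sizes are illustrative only and stand in for the image-reading and IHLS steps.

```r
library(foreach)
library(randomForest)

set.seed(371)

# Hypothetical stand-in for "read a frame, convert to IHLS, read its mask":
# returns random H, L, S features with labels from an arbitrary toy rule.
make_frame <- function(n) {
  h <- runif(n, 0, 360)
  l <- runif(n)
  s <- runif(n)
  label <- factor(ifelse(h < 50 & s > 0.2, "skin", "non-skin"),
                  levels = c("non-skin", "skin"))
  list(x = data.frame(H = h, L = l, S = s), y = label)
}

frames <- lapply(1:3, function(i) make_frame(500))

# Grow a small forest per frame, then merge them with randomForest::combine.
skin.rf <- foreach(f = frames, .combine = randomForest::combine,
                   .packages = "randomForest") %do% {
  randomForest(f$x, y = f$y, mtry = 2, ntree = 10)
}

# Predict on a held-out synthetic frame and tabulate against its labels.
test.frame <- make_frame(200)
table.pred.truth <- predict(skin.rf, test.frame$x)
confusion <- table(predicted = table.pred.truth, truth = test.frame$y)
```

Because each per-frame forest is independent, swapping %do% for %dopar% with a registered parallel backend is the natural next step; that is the stretch goal mentioned under R Packages.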


Basics - Random forest is an ensemble classifier

Have lots of decision-tree learners

Each learner’s training set is sampled independently, with replacement

Add more randomness: at each node of the tree, the splitting attribute is selected from a randomly chosen sample of attributes
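The two sources of randomness described above can be shown with base R alone. This is a toy sketch, not what the randomForest package does internally; the sizes and the feature names H, L, S are assumptions for illustration.

```r
set.seed(1)
n <- 10                         # number of training rows (toy size)
features <- c("H", "L", "S")    # assumed feature names
mtry <- 2                       # attributes considered at each split

# Bootstrap sample: n row indices drawn with replacement, so some rows
# repeat and others are left out ("out of bag") for this tree.
bootstrap_rows <- sample(n, size = n, replace = TRUE)
out_of_bag <- setdiff(seq_len(n), bootstrap_rows)

# At a given node, only a random subset of attributes is considered.
candidate_features <- sample(features, size = mtry)
```

Each tree gets its own bootstrap sample, and a fresh attribute subset is drawn at every node.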

Random Forest Classification Concept

Each decision tree votes for a classification

Forest chooses a classification with the most votes
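The voting step can be illustrated with a hand-written vote matrix. The three trees, three pixels, and the helper name `majority_vote` are made up for this sketch.

```r
# Rows are trees, columns are pixels; each cell is one tree's vote.
tree_votes <- rbind(
  tree1 = c("skin", "non-skin", "skin"),
  tree2 = c("skin", "non-skin", "non-skin"),
  tree3 = c("non-skin", "non-skin", "skin")
)
colnames(tree_votes) <- c("pixel1", "pixel2", "pixel3")

# Hypothetical helper: the class with the most votes wins.
majority_vote <- function(votes) names(which.max(table(votes)))

forest_prediction <- apply(tree_votes, 2, majority_vote)
```

With two classes and an odd number of trees there are no ties; otherwise `which.max` returns the first class in alphabetical order, which is a simplification of this sketch.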

Benefits

Quick training phase

Trees can grow in parallel

Trees have attractive computing properties

For example…

Computation cost of making a binary tree is low: O(N log N)

Cost of using a tree is even lower: O(log N)

N is the number of data points

Applies to balanced binary trees; decision trees are often not balanced
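To put those orders of growth in concrete terms, here is a rough worked example. It assumes N = 1,000,000 data points, base-2 logarithms, and a perfectly balanced tree, and it ignores constant factors, so the numbers are orders of magnitude rather than measured costs.

```r
N <- 1e6                      # assumed number of data points

# Using a balanced tree: roughly log2(N) comparisons per prediction.
depth <- ceiling(log2(N))     # about 20 levels

# Building it: on the order of N * log2(N) operations.
build_cost <- N * log2(N)

cat("levels per prediction:", depth, "\n")
cat("order of build work:", format(build_cost, big.mark = ","), "\n")
```

An unbalanced decision tree can be deeper than this, which is the caveat noted above.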


Authors’ Results Show Random Forest is Best-in-class!

My Results? OK, but incomplete due to very small training set. Need parallel computing cluster

ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5651638
