Efficient Histogram-Based Sliding Window


Yichen Wei
Microsoft Research
[email protected]

Litian Tao
Beihang University
[email protected]

    Abstract

Many computer vision problems rely on computing histogram-based objective functions with a sliding window. A main limiting factor is the high computational cost. Existing computational methods have a complexity linear in the histogram dimension. In this paper, we propose an efficient method that has a constant complexity in the histogram dimension and therefore scales well with high-dimensional histograms. This is achieved by harnessing the spatial coherence of natural images and computing the objective function in an incremental manner. We demonstrate the significant performance enhancement by our method through important vision tasks including object detection, object tracking and image saliency analysis. Compared with state-of-the-art techniques, our method typically achieves from tens to hundreds of times speedup for those tasks.

    1. Introduction

A histogram is a discretized distribution that measures the occurrence frequency of quantized visual features, e.g., pixel intensity/color [6], gradient orientation [9, 17], texture patterns [1] or visual words [14]. Such a representation is a reasonable trade-off between computational simplicity, discriminability and robustness to geometric deformations.

Computing a histogram-based objective function with a sliding window is common in solving computer vision problems such as object detection [1, 17, 5], object tracking [16], and image filtering [22]. The objective function is evaluated on a window sliding over the entire image, which is computationally very intensive. Although there exist an efficient local optimum search method [6] for tracking and a global optimum search method [5] for detection, such a dense evaluation is often inevitable. For example, in object tracking, one often needs to conduct a full-frame search for small, fast-moving objects [3, 16].

Computing a dense confidence map for object detection is of special importance. In spite of significant progress in object detection recently [2], accurate detection of general objects still remains a challenging task. It is generally believed that contextual information plays an important role in image understanding. Individual tasks such as object detection, scene segmentation, and geometric inference should be integrated and enhance each other to resolve the inherent ambiguity existing in each of them. Several computational frameworks [8, 12] have been developed towards this purpose, in which a dense confidence map for object detection is an indispensable step.

The main limitation of a histogram-based sliding window is its high computational cost. For an image of size n × n, a window of size r × r and a histogram of dimension B, a straightforward method scans n² windows, scans r² pixels per window to construct the histogram and scans B bins of the histogram to evaluate the objective function¹. The overall complexity O(n²(r² + B)) is prohibitive when either n, r or B is large. Several techniques have been proposed to remove the r² factor and reduce the complexity to O(n²B) [11, 22, 16]. When the histogram dimension B is large, such techniques do not scale well. Unfortunately, high-dimensional histograms are now commonplace for solving many vision problems, e.g., color histograms for object tracking [6] (B = 32³), LBP [1] for face recognition/detection (B can be hundreds) and bag of visual words for image retrieval [14] or object detection [5, 2] (B is typically hundreds or thousands).
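For concreteness, the straightforward O(n²(r² + B)) scan can be sketched as follows. This is an illustrative Python baseline, not code from the paper; the quantized feature-index map and the list of per-bin functions `f_bins` are assumed inputs.

```python
import numpy as np

def brute_force_confidence_map(index_map, r, f_bins):
    """Brute-force sliding window: for every window position, rebuild the
    histogram from scratch (r*r pixels) and evaluate the bin-additive
    objective over all B bins -- O(n^2 * (r^2 + B)) overall."""
    n = index_map.shape[0]
    B = len(f_bins)
    out = np.zeros((n - r + 1, n - r + 1))
    for y in range(n - r + 1):
        for x in range(n - r + 1):
            # histogram of the r x r window at (x, y)
            h = np.bincount(index_map[y:y + r, x:x + r].ravel(), minlength=B)
            # bin-additive objective F(h) = sum_b f_b(h_b)
            out[y, x] = sum(f_bins[b](h[b]) for b in range(B))
    return out
```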

To alleviate the computational cost for high-dimensional histograms, we propose an efficient method with a constant complexity in the histogram dimension. As demonstrated later in the paper, for the object detection task using a support vector machine with a non-linear kernel and a bag-of-visual-words model, which are leading techniques [5, 2], our method can achieve up to a hundred times speedup over existing techniques, reducing the computation time from minutes to seconds. In addition, it facilitates the subsequent contextual information fusion [8, 12] and makes such techniques much more practical.

Our method is also helpful for object tracking tasks, where real-time running performance is usually desired. To achieve good running performance, previous methods either

¹We assume a bin-additive objective function. Such functions form a large family and are commonly used in vision problems.

978-1-4244-6985-7/10/$26.00 ©2010 IEEE


use large-size features but perform local search [6, 18], or perform full-frame search but use small-size features [11, 3, 25, 16]. Our method can achieve real-time performance using both high-dimensional histogram features and full-frame search. To our best knowledge, it is the first time that such a high performance is achieved when both n and B are large.

Our method is inspired by the image smoothness prior, i.e., natural images usually possess smooth appearance. Consequently, the quantized visual features are also spatially coherent and the histogram of a sliding window typically changes slowly. As illustrated in Figure 1, when a window slides by one column, 2r pixels change [13] but the number of changed histogram bins, denoted as 2sr, is usually much smaller. Therefore only a few bins of the histogram need to be updated. Here s is a factor between 0 and 1. Its value depends on the smoothness of the image and indicates how sparsely the histogram changes. In our experiments, we observe that for most natural images this value lies in the range s ∈ [0.03, 0.3].

Based on the above observation, our method performs histogram construction and objective function evaluation in an incremental manner simultaneously, and the resulting computational complexity is O(n² min(B, 2sr)). It subsumes the complexity O(n²B) of previous methods and is much faster when B ≫ 2sr.

2. Previous Fast Computational Methods

Fast histogram computation. The integral histogram [11] approach computes a histogram over an arbitrary rectangle in O(B) time. Each bin is computed in constant time via a pre-computed integral image. When B ≪ r², this is very efficient. Because the pre-computation of integral images has O(n²B) complexity in both time and memory, its usage is limited for large images and high-dimensional histograms. For n = 512, B = 1000, 1 GB of memory is required and the pre-computation takes about one second, which sometimes cannot be simply ignored.

The distributive histogram approach [22, 16] utilizes the distributivity property that a histogram over two regions is the sum of the two histograms on those regions. As the window slides by one column, the histogram is updated by adding a new column histogram and removing an old one in O(B) time. When a row scan is finished and the window slides downwards, all the column histograms are updated by removing the top pixel and adding the bottom pixel in O(1) time.
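A sketch of this distributive update for one row scan, in Python with illustrative names (the real method also performs the O(1) per-column vertical updates between row scans):

```python
import numpy as np

def distributive_row_scan(index_map, r, B):
    """One row scan of the distributive approach: the window histogram is
    the sum of r column histograms; sliding one column right adds the
    entering column histogram and subtracts the leaving one (O(B))."""
    n = index_map.shape[1]
    # per-column histograms over the first r rows
    cols = [np.bincount(index_map[:r, x], minlength=B) for x in range(n)]
    h = sum(cols[:r])                    # histogram of the leftmost window
    hists = [h.copy()]
    for x in range(1, n - r + 1):
        h = h + cols[x + r - 1] - cols[x - 1]
        hists.append(h.copy())
    return hists
```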

Both of the above approaches have O(B) complexity in histogram construction and function evaluation. Although histogram construction in [22, 16] can be accelerated several times via SIMD instructions, function evaluation can be much more complex and dominate the computation time, e.g., for an SVM classifier with a non-linear kernel. It is crucial to reduce the O(B) factor for such functions.

Figure 1. As the window slides by one column, 16 pixels change but only 3 histogram bins change. The histogram index map is generated with quantized visual words as explained in section 4.1.

Branch-and-bound. Such methods [5, 19, 4] find the globally optimal window in sub-linear time by iterating between: 1) branching the parameter space into sub-regions; and 2) bounding the objective function over those regions. The algorithm always splits the region with the best bound until the region contains only one point and can no longer be split, implying that the global optimum is found. A tight bounding function is required for fast convergence. Such bounds have been derived for histogram-based functions such as the linear SVM classifier and χ² similarity [5, 4], and the technique has shown excellent running performance in object detection and image retrieval.

Still, there are a few limitations. Because the integral histogram is used to evaluate the bounding function and each bin needs two integral images (twice the memory), the memory problem is more serious when B is large. It efficiently finds the globally optimal window but does not evaluate other windows. This is appropriate for an accurate objective function with a clear and desirable peak, but problematic for a function that is flat (uncertain) or multi-modal, e.g., an image without objects or with multiple objects in object detection/tracking. In such cases its computational complexity increases and could be as bad as an exhaustive search in the worst case.²

Our method is complementary to the above techniques. It scales well with the histogram dimension in both memory and time. It performs dense function evaluation efficiently and can be applied to objective functions to which the branch-and-bound method is hard to apply due to the difficulty of obtaining a tight bound, e.g., an SVM with a non-linear kernel.

²It has been shown in [19] that the worst-case complexity can be improved for a specific objective function.



F(h)                      f_b(h_b)                          time
dot product / linear SVM  h_b · m_b                         3.2
L2 norm                   (h_b − m_b)²                      3.5
histogram intersection    min(h_b, m_b)                     3.9
L1 norm                   |h_b − m_b|                       4.1
χ² distance               (h_b − m_b)² / (h_b + m_b)        7.1
Bhattacharyya similarity  √(h_b · m_b)                      64.8
entropy                   −h_b log h_b                      93.9
SVM with χ² kernel        Σ_{k=1}^{sv} w_k χ²(h_b, m_b^k)   460

Table 1. Several functions of form (1) and their relative computational costs (sv = 50 for the SVM). The running-time numbers are measured by running these functions millions of times on a modern CPU. {m_b}_{b=1}^{B} is either a model histogram or a support vector to which the feature histogram h is compared. {w_k}_{k=1}^{sv} are the weights of the support vectors in the SVM.

3. Efficient Histogram-Based Sliding Window

Let h = {h_1, ..., h_B} denote the feature histogram of the window and F(h) denote the objective function. As the window slides, it is unnecessary to evaluate F across the entire histogram; it is sufficient to update only the affected part. This requires that F is incrementally computable, i.e., there exists a function ΔF simpler than F such that

F(h + Δh) = F(h) + ΔF(Δh, h).

Then incremental computation of ΔF(Δh, h) is more efficient than re-evaluating F(h + Δh).

In this paper, we study bin-additive functions F, i.e., those that can be expressed as a summation³ of functions defined on individual bins,

F(h) = Σ_{b=1}^{B} f_b(h_b).   (1)

Table 1 summarizes several functions in this family that are commonly used in vision problems, e.g., various bin-to-bin histogram distances or the so-called quasi-linear histogram kernels [2].

It is easy to see that F is incrementally computable:

F(h + Δh) = Σ_{b=1}^{B} f_b(h_b + Δh_b)
          = Σ_{b=1}^{B} f_b(h_b) + Σ_{b: Δh_b ≠ 0} [ f_b(h_b + Δh_b) − f_b(h_b) ]
          = F(h) + ΔF(Δh, h).

Let |Δh| be the number of non-zero entries in Δh. Evaluation of ΔF requires 2|Δh| evaluations of f(·). By storing and maintaining the values {d_b = f_b(h_b)}_{b=1}^{B}, only |Δh| evaluations of f are needed.

³Strictly speaking, F may be a function of the summation; we use the plain summation for simplicity.

 1: initialize each column histogram c_x with the first r pixels
 2: initialize the histogram h = Σ_{x=1}^{r} c_x
 3: initialize the function values {d_b = f_b(h_b)}_{b=1}^{B} and F = Σ_b d_b
 4: for y = 1 to n − r
 5:   for x = 1 to n − r
 6:     for each b ∈ Δh (Δh = c_{x+r−1} − c_{x−1}) such that Δh_b ≠ 0
 7:       h_b ← h_b + Δh_b
 8:       F ← F + f_b(h_b) − d_b
 9:       d_b ← f_b(h_b)
10:     end
11:     write F to the output image at (x, y)
12:   end
13:   update all column histograms c_x by adding the pixel at (x, y + r) and removing the pixel at (x, y)
14: end

Figure 2. Efficient histogram-based sliding window (EHSW-D).

EHSW-D. Our first algorithm is denoted EHSW-D (Efficient Histogram-based Sliding Window, Dense). As in [16], for each column x ∈ [1..n], a column histogram c_x over r pixels is maintained. When the window slides by one column, the increment Δh is computed as

Δh = c⁺ − c⁻,

where c⁺ and c⁻ are the new and old column histograms, respectively. The algorithm is summarized in Figure 2.
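A minimal Python sketch of EHSW-D restricted to a single row scan (illustrative names, not the paper's code; the cached values d_b make the per-slide cost proportional to the number of changed bins rather than to B):

```python
import numpy as np

def ehsw_d_row(index_map, r, f_bins):
    """EHSW-D for one row scan: maintain the window histogram h and the
    cached per-bin values d_b = f_b(h_b); when the window slides one
    column, only the bins touched by the entering/leaving columns are
    re-evaluated (at most 2r of the B bins)."""
    B = len(f_bins)
    n = index_map.shape[1]
    cols = [np.bincount(index_map[:r, x], minlength=B) for x in range(n)]
    h = sum(cols[:r])                                  # leftmost window
    d = np.array([f_bins[b](h[b]) for b in range(B)], dtype=float)
    F = d.sum()
    out = [F]
    for x in range(1, n - r + 1):
        delta = cols[x + r - 1] - cols[x - 1]          # histogram increment
        for b in np.nonzero(delta)[0]:                 # changed bins only
            h[b] += delta[b]
            new_db = f_bins[b](h[b])
            F += new_db - d[b]
            d[b] = new_db
        out.append(F)
    return out
```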

Let t_A be the computational cost of an arithmetic operation (addition/subtraction/comparison) and t_V the cost of evaluating f. The computational cost for each window is as follows: a column histogram has one pixel added and one pixel removed (2t_A) and h is updated by adding and removing a column histogram (2B t_A). In function evaluation, all bins are traversed (B t_A) but f is evaluated only for the non-zero bins, 2sr times (2sr t_V).

EHSW-S. When B ≫ r, most entries in Δc are zero and traversing the array to find the non-zero entries becomes very inefficient. It is better to use a sparse structure that retains only the non-zero entries of c.

The sparse structure should allow efficient insertion, removal and query of a bin entry (sorting is not needed). A natural choice is a hash table with the bin as key and its value as content. As insertion/removal involves expensive memory operations, we implement a specialized hash table that avoids such operations. A list of (bin, value) pairs is maintained for the non-zero bins. B buckets are allocated in advance and each bucket holds a pointer to the corresponding element in the list (the pointer for an empty bin is null). Bucket conflicts never happen and a bin query is done in O(1) time via its bucket pointer. As the list contains at most r pixels, we pre-allocate



Method        Construct h          Evaluate F
EHSW-D        (2 + 2B) t_A         B t_A + 2sr t_V
EHSW-S        (2 + (r + 6)c) t_A   2sr t_A + 2sr t_V
Integral      3B t_A               B t_V
Distributive  (2 + 2B) t_A         B t_V

Table 2. Computational complexity (per window) of our methods, the integral histogram [11] and the distributive histogram [16]. t_A and t_V are the computational costs of an arithmetic operation and an f(·) evaluation, respectively. For different f(·), t_V varies a lot and can be much more expensive than t_A (see Table 1). s and c are image smoothness measurements, s ≤ 1, c ≤ 1.

and retain a buffer that holds r (bin, value) pairs. Consequently, insertion/removal directly fetches/returns memory units from/to the buffer (r t_A / 2, the average time of linearly scanning r memory units) and updates the list pointers (2t_A) and the bucket pointer (t_A). Compared to a standard STL hash map, our implementation is much faster even with a large r.
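The specialized hash table described above might be sketched as follows. Python is used for clarity and all names are illustrative; a real implementation would use raw arrays and pointers, and a linked list of occupied slots instead of the set-based traversal shown here.

```python
class SparseColumnHist:
    """Sketch of the EHSW-S sparse column histogram: B buckets map a bin
    to its slot in a preallocated pool of r (bin, value) pairs, so
    insert/remove/query need no dynamic allocation, and bucket
    collisions cannot occur (one bucket per bin)."""
    def __init__(self, B, r):
        self.bucket = [None] * B          # bin -> slot index, or None
        self.bins = [0] * r               # slot -> bin
        self.vals = [0] * r               # slot -> count
        self.free = list(range(r))        # pool of unused slots

    def add(self, b):
        s = self.bucket[b]
        if s is None:                     # bin not present: take a slot
            s = self.free.pop()
            self.bucket[b] = s
            self.bins[s], self.vals[s] = b, 0
        self.vals[s] += 1

    def remove(self, b):
        s = self.bucket[b]
        self.vals[s] -= 1
        if self.vals[s] == 0:             # bin empty: return the slot
            self.bucket[b] = None
            self.free.append(s)

    def count(self, b):
        s = self.bucket[b]
        return 0 if s is None else self.vals[s]

    def nonzero(self):
        """Iterate only over occupied bins (the sparse traversal)."""
        used = set(range(len(self.bins))) - set(self.free)
        return [(self.bins[s], self.vals[s]) for s in used]
```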

The new algorithm using the sparse representation is denoted EHSW-S. It is slightly modified from EHSW-D in Figure 2 and its computational cost for each window is as follows. In the histogram update (line 13), the pixel is first queried in the hash table (2t_A for two pixels). Insertion or removal is invoked when the incremented bin does not exist or the decremented bin becomes empty, i.e., when the added or removed pixel is different from all the other pixels in the column. Let c ∈ [0, 1] (coherence) denote this probability; the cost of insertion and removal is then (r + 6)c t_A ((r/2 + 3)c t_A for one pixel). Note that c is different from s, but their values are usually similar, as observed in our experiments. In function evaluation, the two sparse column histograms are traversed separately (lines 6-10 are repeated for c_{x+r−1} and c_{x−1}). The update of h_b (line 7) and the evaluation of f are performed 2sr times (2sr t_A and 2sr t_V, respectively).

Computational complexity. Table 2 summarizes the computational complexity of several methods. We focus on the per-window cost as it is the dominant factor. The integral histogram approach [11] computes each bin of the histogram with three arithmetic operations on an integral image. The distributive approach [16] constructs the histogram in the same way as EHSW-D. Both of them need to evaluate the objective function over all the histogram bins.

To compare the running time of those methods, we perform synthetic experiments on a PC with a 2.83 GHz Pentium 4 CPU and 2 GB of memory. As the performance of our methods depends on s and c, we fix s = c = 0.25 in the experiments. This is a reasonable setting, as will be seen in the real experiments. The image size is fixed at 512 × 512. The computation time for different objective functions, histogram dimensions and window sizes is summarized in Figures 3 and 4.

From the computational complexity in Table 2 and the running times in Figures 3 and 4, we can draw several general conclusions: 1) the integral histogram is not advantageous for sliding window computation⁴; 2) EHSW becomes more advantageous as the histogram dimension increases, i.e., when B is very small, Dist. > EHSW-D > EHSW-S; when B is large, EHSW-S > EHSW-D > Dist.; 3) EHSW becomes more advantageous as the objective function complexity increases (L1 norm < entropy < SVM with χ² kernel).

Note that the sparse representation used in EHSW-S could also be used in the distributive approaches [22, 16]. However, it would not be as helpful as in our method, because those approaches do not exploit the sparseness of the histogram increment to update the objective function.

Memory complexity. The integral histogram stores an integral image for each bin and consumes n²B memory. The distributive histogram approach stores a histogram for each column, nB memory in total. Compared with the distributive approach, our method stores the additional B values {d_b}_{b=1}^{B}. Therefore EHSW-D consumes (n + 1)B memory. For EHSW-S, each column histogram stores B buckets and a list buffer of r entries, with each entry consisting of a pair of values and a pointer. Therefore, the total memory is (n + 1)B + 3nr.

    3.1. Extensions

More window shapes. For a non-square window of size r_w × r_h, the complexity factor r in Table 2 is reduced to min(r_w, r_h) by sliding the window along the longer side, i.e., horizontally when r_w > r_h and vertically when r_w < r_h.

Figure 3. Running time with varying histogram dimension B. The integral histogram is not shown for B > 1024 due to the physical memory limitation. EHSW-S is independent of B while the other approaches have a linear dependency.

[Figure 4 plots: time (seconds) vs. window radius, for B = 512 with the L1 norm, entropy and SVM χ² objectives; curves for EHSW-S, EHSW-D, Distributive, Integral and Integral pre-computation.]

Figure 4. Running time with varying window size r and different objective functions. The histogram dimension is fixed at B = 512, a typical size of an 8³ color histogram or a visual-word code book. Our methods are linear in r but still outperform the other approaches. The pre-computation time of the integral histogram is also shown for reference.

    4. Applications

    4.1. Object Detection

Computing a confidence map of object detection is important in computational models that combine multiple tasks and fuse contextual information [8, 12]. Hoiem et al.'s method [8] requires that each task outputs intrinsic images; in the object detection task, those are the confidence maps of individual objects. Heitz et al.'s approach [12] learns cascaded classification models: a high-level classifier computes features directly on the confidence map of a low-level object detector.

Figure 5 shows example results from our experiment. It is difficult to determine the accurate location/scale of individual objects from local appearance alone. Exploiting contextual relationships is helpful to resolve such ambiguities.

Support vector machine classification and the bag-of-visual-words model are leading techniques in object detection [5, 2]. A large code book and a complex kernel function typically give rise to good performance. Our method can compute a confidence map much more efficiently than other techniques in this setting.

Experiment. We use the challenging PASCAL VOC 2006 dataset [15] and test four object classes, i.e., person, motorbike, bicycle and horse, as their contextual relationships can often be observed, e.g., a person is above a motorbike or a horse.

In this experiment, we aim to verify the efficiency of our method, and our implementation uses standard, state-of-the-art techniques. The object is represented as image patches densely sampled at regular grid points [17, 2]. Each patch is described by its SIFT descriptor [9] and average RGB colors. The patches are quantized into a code book of K visual words created by k-means clustering on 250,000 randomly selected patches. A 2 × 2 grid [20] is used to incorporate the spatial relations of the object parts, and the object is represented as a concatenation of the histograms of visual words on the cells. An SVM classifier is trained using manually labeled object windows as positive examples and randomly sampled windows from negative images as negative examples.

Performance. Detection performance is measured by average precision (AP), the average of precisions at different levels of recall, as used in the PASCAL competition [15]. We tested different code book sizes and SVMs using a linear kernel and a non-linear χ² kernel [2]. Results are reported in Figure 6. We can draw two conclusions: 1) the χ² kernel significantly outperforms the linear kernel; 2) increasing the code book size improves the performance, as a small code book has too coarse a quantization and lacks discriminability. In our experiments, performance stops increasing after K reaches 1600.

Using K = 1600 and the χ² kernel, our method (EHSW-S) is significantly faster than the distributive approach [16]. The speedup factor depends on the image smoothness and ranges from tens to hundreds in our experiments. Statistics of the smoothness values and speedup factors over 2686 test images, as well as several example images, are illustrated in Figure 7.



Figure 5. Top: detected objects at different locations/scales. Bottom: confidence maps of different objects.

[Figure 6 plots: AP vs. code book size for the motorbike and horse classes, with non-linear-kernel and linear-kernel curves.]

Figure 6. Average precision of the motorbike (left) and horse (right) classes. The bicycle class has a performance similar to motorbike. For the person class, the best AP is 0.049 with K = 1600 and a non-linear kernel. Our implementation uses standard ingredients and achieves performance comparable to the PASCAL 2006 detection competition [15].

For an image size of about 300 × 300, a window size of 140 × 60 and an SVM with 1000 support vectors, our method typically takes a few seconds to generate a confidence map.

The optimal code book size is affected by the patch descriptor. With high-dimensional and more informative patch descriptors, a larger code book is required; e.g., the performance of the object detection experiments in [10] keeps increasing even when K reaches 10,000. Our method would be even more advantageous in such a setting.

The non-linear (χ²) kernel evaluation has a complexity linear in the number of support vectors. Recently, Maji et al. [21] developed an approximation technique that reduces this linear cost to a constant, so non-linear kernel classification becomes much faster at the cost of inexact evaluation. Their technique is complementary to our method and the two can easily be combined.

    4.2. Object Tracking

Local tracking methods using color histograms [6, 18] have demonstrated good performance when the target has small frame-to-frame motion. Nevertheless, they are prone to failure in cases of fast motion and distracting background clutter.

Example statistics: s = 0.102, speedup = 96; s = 0.229, speedup = 61; s = 0.327, speedup = 47.

Figure 7. Top left: distribution of the smoothness measurements s and c. Top right: distribution of the speedup factors of our method EHSW-S over the distributive approach [16]. Bottom: three test images, their visual word maps and statistics.

Full-frame tracking can solve these problems with an exhaustive search over the entire frame. However, previous such approaches [11, 3, 25, 16] need to use small-size features when high running performance is required.

Our method can achieve high running performance using both full-frame search and the more effective high-dimensional histogram features. This is illustrated by the challenging example shown in Figure 8. The three target skiers are similar, moving fast and occluding each other. The background also contains similar colors. There are many local optima that will fail local tracking methods. In full-frame tracking, we compare results using a 16³-bin RGB histogram [6] and a 16-bin intensity histogram.⁵ The likelihood maps are displayed in the middle and bottom rows of Figure 8, respectively, with the best 10 local optima (dashed rectangles) overlaid. To identify the effectiveness of the different features, each local optimum is labeled as correct (red rectangles) if its overlap with the ground truth is more than 50%, or wrong (green rectangles) otherwise.

⁵We have also tried the 16-bin histogram in the hue channel as used in [16], but the result is worse than that of using intensity.



Figure 8. A tracking example of 250 frames (frames #022, #068, #152, #198, #244 shown). Top: ground-truth tracking results of the right skier (yellow rectangles). Middle: likelihood maps using the 16³-bin RGB histogram. Bottom: likelihood maps using the 16-bin intensity histogram. On each likelihood map, the 10 best local optima are overlaid and labeled as correct (red dashed rectangles) or wrong (green dashed rectangles) based on their overlap with the ground truth.

As can be clearly observed, the intensity histogram poorly discriminates the target (the right skier) from the background and generates noisy likelihood maps with many false local optima. The color histogram generates cleaner likelihood maps with better local optima. The average numbers of correct local optima per frame using the color and intensity histograms are 4.0 and 1.8, respectively.

Performance. With the intensity histogram, both the distributive approach [16] and EHSW-S run at high frame rates, 60 and 50 fps (frames per second), respectively. With the color histogram, the distributive approach slows down to 0.3 fps while EHSW-S still runs in real time at 25 fps, an 83-times speedup.

Bhattacharyya similarity [6] is used as the likelihood. We also tested the L1 and L2 norms but found them worse than Bhattacharyya similarity. The image size is 320 × 240, the target size is 53 × 19, and the average sparseness values for the color histogram are s = 0.27, c = 0.29.

Discussion. It is worth noting that the global optima are wrong on some frames even when using the color histogram. This indicates the insufficiency of searching only for the global optimum on each frame. A solution is to impose a temporal smoothness constraint and find an optimal object path across frames via dynamic programming [3, 25], using the local optima on each frame as object candidates. With such a global optimization technique, better local optima are more likely to generate correct results. Using color histograms, there are 14 frames where the correct object is missed, i.e., none of the 10 best local optima is correct, mostly due to occlusion (e.g., frame 198 in Figure 8). With the intensity histogram, this number is 35.

    4.3. Feature Entropy for Image Saliency Analysis

The image saliency analysis problem is to find the visually informative parts of images. It is important in object detection, recognition and image understanding. Many methods have been proposed in the literature (see [23] for a review), while the task still remains very challenging.

Entropy measures the randomness of a distribution and can serve as an information measurement of visual features. It is directly used as image saliency in [7] and can be combined with other methods, e.g., the multi-cue learning framework [24].

Our method can compute an entropy map efficiently. We tested a subset of images from the public data set [24] using a 16²-bin ab color histogram and a 16-bin intensity histogram. We observed that the result using the color histogram is clearly more visually informative than that using the intensity histogram for most images.
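For reference, the entropy of one window's histogram can be computed as in the following sketch. It uses normalized probabilities; for a fixed window size this is an affine transform of the unnormalized bin-additive form −h_b log h_b in Table 1, so it is equally amenable to the incremental update.

```python
import math

def window_entropy(counts, total):
    """Entropy of a window's feature histogram, used as the saliency
    measure of section 4.3: H = -sum_b p_b log p_b with p_b = h_b / total."""
    H = 0.0
    for c in counts:
        if c > 0:                 # empty bins contribute nothing
            p = c / total
            H -= p * math.log(p)
    return H
```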

A few examples are shown in Figure 9. It is worth noting that using the color histogram is not only better but also faster with our method, as the image smoothness is stronger in the color space. Most of the test images have sparseness values s ∈ [0.03, 0.1] for the color histogram, and our method is much faster than the distributive approach [16].



image (size)     feature (B)     s      EHSW-S   Dist. [16]
horse (400×322)  color (16²)     0.093  32 ms    158 ms
                 intensity (16)  0.184  62 ms    54 ms
car (400×266)    color (16²)     0.044  15 ms    121 ms
                 intensity (16)  0.166  54 ms    43 ms
dog (400×300)    color (16²)     0.022  11 ms    133 ms
                 intensity (16)  0.060  20 ms    23 ms

Figure 9. Top: from left to right are the images, the entropy maps using the 16²-bin ab color histogram, and the entropy maps using the 16-bin intensity histogram. The sliding window is 16 × 16. Bottom: statistics. Note that EHSW-S with the color histogram is even faster than with the intensity histogram, because the image appearance is smoother in the ab color space than in the intensity space, i.e., the sparseness s is smaller.

    5. Conclusions

We presented an efficient method for computing a histogram-based objective function with a sliding window. The high efficiency benefits from a natural prior on the spatial coherence of image appearance. The efficiency of the proposed method has been demonstrated in several applications, where a significant speedup is achieved in comparison with state-of-the-art methods. Future work is to extend our method to more complex objective functions, such as the earth mover's distance, bilateral filters, etc.

References

[1] A. Hadid, M. Pietikäinen, and T. Ahonen. A discriminative feature space for detecting and recognizing faces. In CVPR, 2004.
[2] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.
[3] A. M. Buchanan and A. W. Fitzgibbon. Interactive feature tracking using k-d trees and dynamic programming. In CVPR, 2006.
[4] C. H. Lampert. Detecting objects in large image collections and videos by efficient subimage retrieval. In ICCV, 2009.
[5] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Beyond sliding windows: Object localization by efficient subwindow search. In CVPR, 2008.
[6] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In CVPR, 2000.
[7] C. Rother, L. Bordeaux, Y. Hamadi, and A. Blake. AutoCollage. In Proceedings of ACM SIGGRAPH, 2006.
[8] D. Hoiem, A. Efros, and M. Hebert. Closing the loop in scene interpretation. In CVPR, 2008.
[9] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110, 2004.
[10] F. Moosmann, B. Triggs, and F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. In NIPS, 2006.
[11] F. Porikli. Integral histogram: A fast way to extract histograms in Cartesian spaces. In CVPR, pages 829-836, 2005.
[12] G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded classification models: Combining models for holistic scene understanding. In NIPS, 2008.
[13] T. Huang, G. Yang, and G. Tang. A fast two-dimensional median filtering algorithm. IEEE Trans. Acoust., Speech, Signal Processing, 27(1):13-18, 1979.
[14] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
[15] M. Everingham, A. Zisserman, C. Williams, and L. Van Gool. The PASCAL visual object classes challenge 2006 results. Technical report, 2006.
[16] M. Sizintsev, K. G. Derpanis, and A. Hogue. Histogram-based search: A comparative study. In CVPR, 2008.
[17] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[18] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet. Color-based probabilistic tracking. In ECCV, 2002.
[19] S. An, P. Peursum, W. Liu, and S. Venkatesh. Efficient algorithms for subwindow search in object detection and localization. In CVPR, 2009.
[20] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[21] S. Maji, A. Berg, and J. Malik. Classification using intersection kernel support vector machines is efficient. In CVPR, 2008.
[22] S. Perreault and P. Hébert. Median filtering in constant time. IEEE Trans. Image Processing, 16(9):2389-2394, 2007.
[23] T. Huang, K. Cheng, and Y. Chuang. A collaborative benchmark for region of interest detection algorithms. In CVPR, 2009.
[24] T. Liu, J. Sun, N. Zheng, X. Tang, and H. Shum. Learning to detect a salient object. In CVPR, 2007.
[25] Y. Wei, J. Sun, X. Tang, and H. Shum. Interactive offline tracking for color objects. In ICCV, 2007.
