
6 Confidence measures for object detection

Portions of this chapter were previously presented at the 20th Australian Joint Conference on Artificial Intelligence (Horton et al., 2007).

As discussed in section 2.6 and fig. 2.4, a Haar Classifier Cascade tests whether an

image region represents an object as follows:

1. A cascade steps through several stages in turn; it stops and returns false if a stage

returns false. If all stages return true, the cascade returns true.

2. A stage returns true if the sum of its feature outputs exceeds a chosen threshold.

3. A feature returns the sum of its rectangle outputs.

4. A rectangle returns the sum of pixel values in the image region bounded by that

rectangle, multiplied by a chosen weight.

While the Haar Classification output is binary, its intermediate values are numeric and continuous. This suggests that the stages could derive a confidence measurement from the features, and that the cascade could combine the stage confidences to return an overall confidence measurement. This has similarities to the stacking experiments mentioned in section 2.3.3, which showed that stacking only combines classifier output effectively when that output contains class probability distributions. The confidence measurement also amounts to measuring the margin of a boosted classifier, as in section 2.3.4.1.

The method proposed and implemented here returns the confidence for a region by modifying the first two steps as described below. The modifications are illustrated in fig. 6.1; this may be compared to the illustration of the binary process in fig. 2.4. A brief code sketch follows the two steps.

1. A cascade steps through its stages, adding up their outputs; it stops if any stage output is negative. If all stage outputs are non-negative, the cascade returns the cumulative sum as its confidence; otherwise it returns false.

2. A stage returns the sum of its feature outputs minus its threshold.
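The following is a minimal sketch of this confidence computation, assuming the cascade has already been evaluated on one region so that each stage's feature outputs and threshold are available; the `Stage` container and function names are illustrative, not OpenCV API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Stage:
    threshold: float               # stage threshold chosen during training
    feature_outputs: List[float]   # this stage's feature outputs for one region

def stage_confidence(stage: Stage) -> float:
    # Modified step 2: sum of feature outputs minus the stage threshold.
    return sum(stage.feature_outputs) - stage.threshold

def cascade_confidence(stages: List[Stage]) -> Optional[float]:
    # Modified step 1: accumulate stage outputs; stop at the first negative stage.
    total = 0.0
    for stage in stages:
        out = stage_confidence(stage)
        if out < 0:
            return None    # region rejected, equivalent to returning false
        total += out
    return total           # confidence returned for this region
```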


Figure 6.1: Haar Classifier Cascade confidence measurement process

6.1 Uses

The confidence measurement described above derives more information from a cascade

than the binary classification does. This information may lead to more accurate object

detection. Two potential ways to use this information are proposed here: hill-climbing

to maximise the confidence of the regions found by binary detection and measuring

confidence across an entire image to build a ‘confidence map’.

6.1.1 Hill-climbing

Here, regular binary detection finds potential object regions in the image. The hill-climbing routine then measures the confidence of all regions with similar scales and positions. The highest-confidence region is selected and the process repeats until no nearby region has higher confidence than the currently chosen one. The number of hill-climbing steps affects running time, so the steps taken are counted in section 6.4.7.

The scale and position adjustments implemented here for a w×h pixel region cover the 9 scale adjustments from 1.05^-4 to 1.05^4 and the 81 position adjustments from (-w/4, -h/4) pixels to (+w/4, +h/4) pixels.
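A sketch of this neighbourhood search is given below. It assumes a hypothetical `region_confidence(x, y, w, h)` helper that returns a numeric confidence for a region (with rejected regions mapped to a very low value), and it assumes the 81 position offsets are spaced evenly at w/16 and h/16 intervals; neither detail is taken from the original implementation.

```python
import itertools

def neighbours(x, y, w, h):
    """Generate the 9 x 81 scale/position variations of a w x h region at (x, y)."""
    for s in range(-4, 5):                    # scale factors 1.05^-4 .. 1.05^4
        scale = 1.05 ** s
        nw, nh = w * scale, h * scale
        for dx, dy in itertools.product(range(-4, 5), repeat=2):
            # offsets from (-w/4, -h/4) to (+w/4, +h/4), assumed w/16 and h/16 apart
            yield (x + dx * w / 16, y + dy * h / 16, nw, nh)

def hill_climb(region, region_confidence):
    # region_confidence must return a number; map rejected regions to float('-inf').
    best, best_conf = region, region_confidence(*region)
    while True:
        cand = max(neighbours(*best), key=lambda r: region_confidence(*r))
        cand_conf = region_confidence(*cand)
        if cand_conf <= best_conf:
            return best, best_conf            # no nearby region is better: stop
        best, best_conf = cand, cand_conf
```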

6.1.2 Confidence mapping

If the cascade confidence is measured across the image for a variety of region sizes, it

is possible to build a ‘confidence map’ of the image. This will illustrate the confidence

that there is an object centred upon each point in the image, along with information

about sizes and (if necessary) orientations.
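In outline, a confidence map can be built by sliding windows of several sizes across the image and recording every confidence the cascade returns. The sketch below assumes a hypothetical `cascade_confidence_at(x, y, w, h)` helper, and the scale step and stride shown are illustrative defaults rather than the settings used in these experiments.

```python
def confidence_map(image_w, image_h, window, cascade_confidence_at,
                   scale_step=1.2, stride=2):
    """Collect (x, y, w, h, confidence) for every window the cascade accepts."""
    detections = []
    w, h = window
    while w <= image_w and h <= image_h:
        for y in range(0, image_h - int(h), stride):
            for x in range(0, image_w - int(w), stride):
                conf = cascade_confidence_at(x, y, int(w), int(h))
                if conf is not None:              # region passed every stage
                    detections.append((x, y, int(w), int(h), conf))
        w, h = w * scale_step, h * scale_step     # move on to the next region size
    return detections
```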


6.1.2.1 Merging confidences

As with binary detection, confidence-mapped detections form clusters around each object in the image and should be combined in a manner similar to the near-neighbour merging described in section 2.6.6. The chosen method finds similarly-placed confidence measurements and sums their confidences; the location and scale of the summed confidence is the location and scale of the local maximum of the original confidence measurements. The definition of ‘similarly-placed confidence’ is as follows: if some detection E is sufficiently close to a higher-confidence detection D that, had D been a positive annotation instead of a detection, E would be considered a successful detection of D, then E's confidence is added to D's confidence and E is discarded.

Under the OpenCV defaults described in section 2.7.2, this means a face or fish detection D will gain all lower-confidence detections centred within 3×D_size/10 pixels and sized between (2/3)×D_size and (3/2)×D_size. For seahorse segments the similarity measure in section 4.3.1 is used instead.

The results in section 6.4.1 show why this form of merging was used.
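The merging rule can be sketched as follows. The `is_similar(kept, candidate)` predicate stands in for the similarity test just described (the OpenCV-style position and size criterion, or the seahorse measure from section 4.3.1) and is assumed rather than shown; detections are represented as dictionaries with a 'confidence' entry.

```python
def merge_confidences(detections, is_similar):
    """Keep local maxima; each absorbs the confidence of similar, weaker detections."""
    # Sort so the strongest detections claim their neighbours first.
    remaining = sorted(detections, key=lambda d: d["confidence"], reverse=True)
    merged = []
    for det in remaining:
        for kept in merged:
            if is_similar(kept, det):        # det would count as a detection of kept
                kept["confidence"] += det["confidence"]
                break                        # det is discarded
        else:
            merged.append(dict(det))         # det becomes a new local maximum
    return merged
```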

If a binary classification is required, a confidence threshold can be chosen based on its trade-off between true positives and false positives. ROC curves may be created by varying this threshold from 0 to ∞. Figures in appendix C such as C.2(e), C.2(f), C.5(e) and C.5(f) show detection rectangles with their confidences; false positives among them may be excluded by selecting an appropriately high threshold.

6.1.2.2 Multiple cascades

Even if confidence maps improve the accuracy of single cascades, they may not effectively combine the output of multiple cascades. The confidence sum described above will combine the confidences returned by each of the cascades used, which assumes that the stage thresholds, and therefore the final confidence measurements, are comparable. The cascade training process does not guarantee this property, because binary cascades do not require it. If this becomes a problem, confidence mapping may be run on the training images to find the minimum, average or maximum confidences returned by each cascade. Confidence results at classification time may then be divided by those amounts as a crude form of normalisation.
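A crude normalisation of this kind might be implemented as in the sketch below, assuming the per-cascade reference statistics (for example the mean confidence on the training images) have already been collected; the data layout is illustrative.

```python
def normalise(detections_by_cascade, reference_confidence):
    """Divide each cascade's confidences by that cascade's reference value."""
    normalised = []
    for cascade_id, detections in detections_by_cascade.items():
        ref = reference_confidence[cascade_id]   # e.g. mean confidence on training images
        for x, y, w, h, conf in detections:
            normalised.append((x, y, w, h, conf / ref))
    return normalised
```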


6.2 Stage variations

Cascade confidence measurement offers some intriguing possibilities for modifying the cascade before running it, or for running multiple variations of the cascade and combining their results. Two such modifications are considered here: accepting some stage failures, and running virtual attribute subsetting to gain multiple classifications from a single cascade.

6.2.1 Stage failure tolerance

As implemented, the confidence measurement still halts if any stage returns false. It is possible, however, to accept one or more stage ‘failures’: add the stage result (which will be negative) to the cumulative confidence measurement and continue to the next stage of the cascade. This may yield a more robust result, as it stops any individual badly-trained stage from blocking positive detections.
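Extending the earlier confidence sketch, tolerating a fixed number of stage failures could look like the following; this is an illustrative sketch rather than the implementation used here.

```python
def cascade_confidence_tolerant(stage_outputs, max_failures=1):
    """Sum stage outputs, allowing up to max_failures negative stages."""
    total, failures = 0.0, 0
    for out in stage_outputs:        # out = feature sum minus stage threshold
        if out < 0:
            failures += 1
            if failures > max_failures:
                return None          # too many failed stages: reject the region
        total += out                 # negative outputs still count towards the sum
    return total
```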

6.2.2 Virtual attribute subsetting

With confidence mapping it is also possible to create multiple copies of the existing

cascade, remove a subset of stages from each copy, run all the copies and combine the

results. This is analogous to virtual attribute subsetting, as described in chapter 3.

Note that where ‘attribute’ is normally synonymous with ‘feature’, here each ‘attribute’

considered by the subset generator is one stage of the cascade. The process is still

‘virtual’ in the sense that it takes a single trained classifier and manipulates it at

classification time to obtain multiple classifications.

Each test of virtual attribute subsetting here uses 10 subsets, with selection by the ‘both balanced subsets’ algorithm as suggested by section 3.4. Although the best attribute (stage) proportion is expected to lie in the 0.8 to 0.9 range, 0.7 will also be tested. The local maximum and sum operation will combine the confidences from each stage-subset run, just as it combines confidences from multiple independent cascades in section 6.1.2.2.
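In outline, stage-level virtual attribute subsetting might look like the sketch below. The subset selection shown is plain random sampling rather than the ‘both balanced subsets’ algorithm actually used, and `confidence_map_with_stages`, `merge_confidences` and `is_similar` are hypothetical helpers.

```python
import random

def stage_subsets(n_stages, n_subsets=10, proportion=0.8, seed=0):
    """Choose which stages each cascade copy keeps."""
    rng = random.Random(seed)
    keep = max(1, round(proportion * n_stages))
    return [sorted(rng.sample(range(n_stages), keep)) for _ in range(n_subsets)]

def subsetted_confidence_map(image, cascade, confidence_map_with_stages,
                             merge_confidences, is_similar):
    detections = []
    for subset in stage_subsets(len(cascade.stages)):
        # Run the confidence map using only this subset of the cascade's stages.
        detections.extend(confidence_map_with_stages(image, cascade, subset))
    # Combine all runs just as confidences from independent cascades are combined.
    return merge_confidences(detections, is_similar)
```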


6.2.3 Computation costs

Both of these variations will increase running time. Accepting some stage failures increases computation time because testing continues on regions that earlier stages would previously have rejected. Virtual attribute subsetting has no effect on training time, but it multiplies classification time by the number of subsets.

6.3 Cascade selection

The confidence measurement may be made from any Haar Classifier Cascade; there is no need to train the cascade with confidence hill-climbing or confidence mapping in mind. Therefore, as explained in section 4.1.1, the existing OpenCV sample cascade haarcascade_frontalface_alt2 was used for face detection. It has 20 stages and its window size is 20 × 20 units.

For fish and seahorse detection, the cascades trained in chapter 5 were used again. Cascades used for confidence-based object detection might benefit from different properties from those used to make binary object classifications, so the cascades trained with different random angle ranges were tested and compared. The images or cascades were always rotated in 15° steps, in accordance with the results in section 5.4.5.

6.4 Results

ROC curves were once again constructed in pairs to show the best random angle range in each case, as was done for binary object detection; this is explained in section 5.4. This section contains only summary curves constructed using the best results from each test; the individual curves are shown in appendix B. Example images with detections are included in appendix C.

6.4.1 Confidence mapping merge comparison

The reasons for confidence mapping taking local maxima and adding nearby confidences

are shown for faces in fig. 6.2 and for fish in fig. 6.3. When all confidence regions were

returned, performance was poor. Returning only local maxima improved accuracy, and

adding neighbouring detection confidences improved it further.


Figure 6.2: ROC curves for face detection by confidence mapping, varying local maximum usage. (Axes: false positives vs. true positive rate; series: local maxima summing neighbours, local maxima only, all confidences.)

Figure 6.3: ROC curves for fish detection by confidence mapping, varying local maximum usage. Panels: (a) rotated images; (b) rotated cascades. (Axes: false positives vs. true positive rate; series: local maxima summing neighbours, local maxima only, all confidences.)


6.4.2 Normalising confidence maps from multiple cascades

There was some small variation between the confidences returned by the different fish

cascades; these are shown in table 6.1 and plotted in fig. 6.4. Normalising by these

values had an insignificant effect upon accuracy, as shown by the overlapping lines in

fig. 6.5. Normalisation was therefore not used in the experiments that follow.

Table 6.1: Confidence returned by rotated fish cascades on training images

Fig. 6.4 is a graph of these confidences.

Figure 6.4: Graph of rotated fish cascade confidences. (Axes: cascade angle from 45° to -45° vs. confidence; series: maximum, 90th percentile, mean, median, 10th percentile, minimum.)


Figure 6.5: ROC curves for fish detection by rotated cascade confidence mapping, varying normalisation. (Series: no normalisation; normalised by mean, minimum, 10th percentile, median, 90th percentile and maximum.)


6.4.3 Stage failure tolerance

Allowing one stage failure of the 20 in each cascade consistently improved the accuracy of confidence-mapped fish detection, both for rotated images (fig. 6.7(a)) and rotated cascades (fig. 6.7(b)). It was not as useful for face detection (fig. 6.6) or seahorse segment detection (figs. 6.8(b), 6.8(d)). Allowing more than one stage failure did not cause further improvements, and often reduced accuracy.

The confidence mapping used in section 6.4.5 therefore tolerated no stage failures for face and seahorse detection, and used a failure tolerance of 1 for fish detection.

Figure 6.6: ROC curves for face detection by confidence mapping, varying the number of stage failures permitted. (Series: standard confidence mapping; failure tolerance = 1, 2, 3.)


Figure 6.7: ROC curves for fish detection by confidence mapping, varying the number of stage failures permitted. Panels: (a) rotated images; (b) rotated cascades. (Series: standard confidence mapping; failure tolerance = 1, 2, 3.)


Figure 6.8: ROC curves for seahorse segment detection by confidence mapping, varying the number of stage failures permitted. Panels: (a) rotated images, seahorse heads; (b) rotated cascades, seahorse heads; (c) rotated images, seahorse bodies; (d) rotated cascades, seahorse bodies. (Series: standard confidence mapping; failure tolerance = 1.)


6.4.4 Virtual attribute subsetting

The attribute proportions tested had little effect on virtual attribute subsetting accuracy, as shown for face detection in fig. 6.9(a), fish detection on rotated images in fig. 6.10(a) and fish detection by rotated cascades in fig. 6.10(c). It improved upon simple confidence mapping for faces, where it nearly reached its maximum true positive count while creating fewer than 100 false positives, as shown in fig. 6.9(b). It was slightly inferior to simple confidence mapping for fish detection on rotated images (fig. 6.10(b)), but improved fish detection by rotated cascades (fig. 6.10(d)). This may be seen, along with comparisons to permitting one stage failure, in figs. 6.10(b) and 6.10(d).

Seahorse segment detection by virtual attribute subsetting was not tested due to

time constraints; seahorse detection with its two cascades and larger set of angles was

already much slower than fish detection, and virtual attribute subsetting has a large

time penalty, as predicted in section 6.2.3 and confirmed in section 6.4.8.

Figure 6.9: ROC curves for face detection using confidence mapping and virtual attribute subsetting. Panels: (a) varying attribute proportion (p = 0.9, 0.8, 0.7); (b) method comparison (failure tolerance = 1; virtual attribute subsetting, p = 0.8; standard confidence mapping).


Figure 6.10: ROC curves for fish detection using confidence mapping and virtual attribute subsetting. Panels: (a) rotated images, varying attribute proportion (p = 0.9, 0.8, 0.7); (b) rotated images, method comparison (failure tolerance = 1; virtual attribute subsetting, p = 0.9; standard confidence mapping); (c) rotated cascades, varying attribute proportion; (d) rotated cascades, method comparison (failure tolerance = 1; virtual attribute subsetting, p = 0.7; standard confidence mapping).


6.4.5 Method comparison

Both confidence-based detection methods were generally better than simple binary object detection. Confidence mapping was almost always the better of the two, although for face detection the two were equal at low false positive counts, as seen in fig. 6.11; table A.1 in appendix A contains the original true and false positive counts.

For rotated object detection, confidence mapping was also usually better than hill-climbing, which was in turn mostly better than binary detection. This is plotted for fish detection in fig. 6.12 and for seahorse segment detection in fig. 6.13. Despite the concerns raised in section 6.1.2.2, confidence mapping with multiple rotated cascades showed large improvements, as shown for fish detection in fig. 6.12(b), seahorse head detection in fig. 6.13(b) and seahorse body detection in fig. 6.13(d). However, fig. 6.12(b) also contains the only case found where confidence mapping is inferior: for very low (< 10) false positive counts, binary detection is more accurate.

Figure 6.11: ROC curves for face detection using binary detection, binary detection followed by hill-climbing, and confidence mapping.


Figure 6.12: ROC curves for fish detection using binary detection, binary detection followed by hill-climbing, and confidence mapping. Panels: (a) rotated images; (b) rotated cascades.


Figure 6.13: ROC curves for seahorse segment detection using binary detection, binary detection followed by hill-climbing, and confidence mapping. Panels: (a) rotated images, seahorse heads; (b) rotated cascades, seahorse heads; (c) rotated images, seahorse bodies; (d) rotated cascades, seahorse bodies.


6.4.6 Angle ranges

As mentioned above, the graphs plotting ROC curves for individual random angle ranges are given in appendix B.

The results for hill-climbing on each image set closely match the corresponding binary cascade results. The connections are listed in table 6.2.

The individual confidence mapping ROC curves are much closer together. This shows that confidence mapping is more robust than binary detection when the cascade random angle ranges are not optimal. The best angle ranges found and their relationships to the binary detection angle ranges are listed in table 6.3.

Table 6.2: ROC curve figure numbers for binary detections and their corresponding hill-climbing curves, including the selected ‘best’ angles

Image set | Rotated | Binary figures | Best angle | Hill-climbing figures | Best angle
Fish | images | 5.14(a)/5.14(b) | 30° | B.1(a)/B.1(b) | 30°
Fish | cascades | 5.15(a)/5.15(b) | 25° | B.2(a)/B.2(b) | 20°
Seahorse heads | images | 5.16(a)/5.16(b) | 5° | B.5(a)/B.5(b) | 5°
Seahorse bodies | images | 5.17(a)/5.17(b) | 15° | B.5(c)/B.5(d) | 10°
Seahorse heads | cascades | 5.18(a)/5.18(b) | 10° | B.7(a)/B.7(b) | 20°
Seahorse bodies | cascades | 5.19(a)/5.19(b) | 10° | B.7(c)/B.7(d) | 10°

Table 6.3: ROC curve figure numbers for binary detections and their corresponding confidence mapping curves, including the selected ‘best’ angles

Image set | Rotated | Binary figures | Best angle | Confidence-mapped figures | Best angle
Fish | images | 5.14(a)/5.14(b) | 30° | B.3(a)/B.3(b) | 30°
Fish | cascades | 5.15(a)/5.15(b) | 25° | B.4(a)/B.4(b) | 15°
Seahorse heads | images | 5.16(a)/5.16(b) | 5° | B.6(a)/B.6(b) | 20°
Seahorse bodies | images | 5.17(a)/5.17(b) | 15° | B.6(c)/B.6(d) | 15°
Seahorse heads | cascades | 5.18(a)/5.18(b) | 10° | B.8(a)/B.8(b) | 35°
Seahorse bodies | cascades | 5.19(a)/5.19(b) | 10° | B.8(c)/B.8(d) | 10°


6.4.7 Hill-climbing steps

Few hill-climbing steps were needed in most cases. Half of the detections needed only two hill-climbing steps to maximise confidence, and none needed more than 9. Fig. 6.14(a) shows that the object being detected had little effect upon the number of hill-climbing steps made. Increasing the random angle ranges in the positive training sets very slightly increased the number of steps for the fish (fig. 6.14(b)), seahorse heads (fig. 6.14(c)) and seahorse bodies (fig. 6.14(d)) detected. The mean number of steps made consistently fell between 2.4 and 2.8, as seen in table 6.4.

Figure 6.14: Frequency of hill-climbing steps made during object detection. Panels: (a) all objects, random angle range = 15°; (b) fish detection; (c) seahorse head detection; (d) seahorse body detection (panels (b) to (d) compare ranges of 0°, 15° and 45°). (Axes: hill-climbing steps, 1 to 10, vs. proportional frequency.) Proportional frequency is the proportion of all hill-climbs ending after each step.


Table 6.4: Summary of hill-climbing steps carried out during object detection

6.4.8 Classification time

To compare the time needed for classification, each of the methods considered in this

chapter was run 5 times on the 130 images in the face detection dataset. These times are

shown in table 6.5 and plotted in fig. 6.15. The ROC curves associated with these times

are in fig. 6.11 (standard methods), fig. 6.6 (confidence mapping with varying stage

failure tolerance) and fig. 6.9 (confidence mapping with virtual attribute subsetting).

These results show confidence mapping to be slightly faster than binary detection. However, profiling tests found that this was entirely due to idiosyncratic compiler optimisations, and would probably not reappear on other platforms. The advantage also only applied to standard confidence mapping, with no stage failure tolerance: any stage failure tolerance made confidence mapping slower than binary detection, while virtual attribute subsetting was slower still.


Table 6.5: Time in seconds taken to classify the face dataset with different methods

A graph of the mean times is plotted in fig. 6.15.

Figure 6.15: Graph of time in seconds taken to classify the face dataset with different methods. (Methods: the standard methods of binary detection, hill-climbing and confidence mapping; confidence mapping with failure tolerance 1, 2 and 3; confidence mapping with virtual attribute subsetting at p = 0.9, 0.8 and 0.7.)


6.5 Conclusions

This chapter introduced a confidence measurement to the formerly binary Haar Classifier Cascade object detection algorithm. It was used in two different ways, and tested on three image sets. The tests showed that the ‘confidence mapping’ form was almost always more accurate, and sometimes faster, than binary object detection. On the fish and seahorse segment detection problems it also showed itself to be capable of combining the confidences from multiple cascades, and to be more robust to suboptimal choices of cascade training angle range than binary detection using the same cascades. If additional accuracy is required, tests also showed that virtual attribute subsetting on the cascade stages can increase accuracy at the expense of classification time.

These confidence measurements may be taken from any Haar Classifier Cascade. The face detection used an existing cascade, while the fish and seahorse detections used the cascades trained and used for binary detection in chapter 5. The training process was not changed to optimise for confidence-based object detection.