
Machine Vision and Applications
DOI 10.1007/s00138-011-0359-3

ORIGINAL PAPER

Silhouette-based multi-sensor smoke detection
Coverage analysis of moving object silhouettes in thermal and visual registered images

Steven Verstockt · Chris Poppe · Sofie Van Hoecke · Charles Hollemeersch · Bart Merci · Bart Sette · Peter Lambert · Rik Van de Walle

Received: 22 December 2010 / Revised: 22 June 2011 / Accepted: 5 July 2011
© Springer-Verlag 2011

Abstract Fire is one of the leading hazards affecting everyday life around the world. The sooner the fire is detected, the better the chances are for survival. Today's fire alarm systems, such as video-based smoke detectors, however, still pose many problems. In order to accomplish more accurate video-based smoke detection and to reduce false alarms, this paper proposes a multi-sensor smoke detector which takes advantage of the different kinds of information represented by visual and thermal imaging sensors. The detector analyzes the silhouette coverage of moving objects in visual and long-wave infrared registered (∼aligned) images. The registration is performed using a contour mapping algorithm which detects the rotation, scale and translation between moving objects in the multi-spectral images. The geometric parameters found at this stage are then further used to coarsely map the silhouette images, and the coverage between them is calculated. Since smoke is invisible in long-wave infrared, its silhouette will, contrarily to ordinary moving objects, only be detected in visual images. As such, the coverage of thermal and visual silhouettes will start to decrease in case of smoke. Due to the dynamic character of the smoke, the visual silhouette will also show a high degree of disorder. By focusing on both silhouette behaviors, the system is able to accurately detect the smoke. Experiments on smoke and non-smoke multi-sensor sequences indicate that the automated smoke detection algorithm is able to coarsely map the multi-sensor images. Furthermore, using the low-cost silhouette analysis, a fast warning, with a low number of false alarms, can be given.

S. Verstockt (B) · C. Poppe · C. Hollemeersch · P. Lambert · R. Van de Walle
Department of Electronics and Information Systems, Multimedia Lab, Ghent University, IBBT, Gaston Crommenlaan 8, bus 201, Ledeberg, 9050 Ghent, Belgium
e-mail: [email protected]

S. Verstockt · S. Van Hoecke
ELIT Lab, University College West Flanders, Ghent University Association, Graaf Karel de Goedelaan 5, 8500 Kortrijk, Belgium

B. Merci
Department of Flow, Heat and Combustion Mechanics, Ghent University, Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium

B. Sette
Warringtonfiregent (WFRGent NV), Ottergemsesteenweg 711, 9000 Ghent, Belgium

Keywords Smoke detection · Multi-sensor · Multi-modal · Coverage analysis · Image registration

1 Introduction

Video smoke detection (VSD) has become a hot topic in computer vision over the last decade [1]. Current research, such as the work of Calderara et al. [2], shows that the video-based detection of smoke promises fast detection and can be a viable alternative or complement for the more traditional techniques, such as ionization and photoelectric fire detection. However, due to the variability of shape, motion, transparency, colors and patterns of smoke, existing VSD approaches are still vulnerable to missed detections and false alarms. The main cause of both problems is the fact that visual detection is often subject to constraints regarding the scene under investigation, e.g. changing environmental conditions, and the target characteristics. To avoid the disadvantages of using visual sensors alone, we argue that the use of other types of sensors, especially infrared (IR), can be of added value.

Thanks to the improvement of resolution, speed and sensitivity of IR imaging, this newer type of imagery has started to be explored as a way to improve object detection and tracking performance.


Fig. 1 Smoke transparency in visual, short wave infrared (SWIR) and long wave infrared (LWIR) range. (source: http://www.xenics.com)

IR imaging is already used successfully in many video surveillance applications, e.g. traffic safety, airport security and material inspection. Recently, IR video-based flame detection is also gaining importance [3–7]. As manufacturers ensure a decrease in IR imaging sensor cost, it is expected that the number of applications will also increase significantly in the near future [8,9].

When light conditions are bad or the target's color is similar to the background, IR vision is a fundamental aid. Even other visual-specific object detection problems, such as shadows, do not cause problems in IR [10]. Nevertheless, IR has its own specific limitations which do not occur in visual light, such as thermal reflections, IR blocking and thermal-distance problems. As visible imagery has a higher resolution and can supply more detailed information about the target, the combination of both sensors is considered a win–win [9,11]. The main benefit of using IR data, as well as visible information, is that parts extracted unreliably from one sensor might be reliably extracted from the other sensor. This provides an opportunity for improving detection performance by fusion of visual and IR sensors. Since misdetections in visual images can be corrected by IR detections and vice versa, fewer false alarms will occur when both are combined. As a logical consequence of these benefits, multi-sensor images have started to be actively used to improve the performance of object detection and recognition, especially in the field of surveillance [12], automatic target recognition [13], tracking [14] and medical image analysis [15].

Although different IR spectral ranges can be used for multi-sensor object detection, we expect that the added value of IR cameras in the long-wave IR range (LWIR, 8–12 µm) will be the highest for detecting smoke, as smoke becomes more and more transparent further in the infrared spectrum. As is illustrated in Fig. 1, a LWIR camera can even look through the smoke. By focusing on the visible-invisible character of smoke in visual-LWIR images, a multi-sensor detector can detect smoke very accurately. Since smoke, contrarily to ordinary moving objects, will only be detected in the visual images, the coverage of LWIR and visual silhouettes of moving objects will start to decrease in case of smoke. Due to the dynamic character of the smoke, this visual silhouette will

also show a high degree of disorder. By focusing on both silhouette behaviors, a multi-modal detector will more accurately detect the smoke.

As it is important to develop a detector which meets the needs expressed by partners in the field [16], the following list of requirements, to which the multi-sensor detector must adhere, is defined:

– Easy (re-)calibration: automatic registration (∼alignment) of multi-sensor (visual/LWIR) images.

– Low computational cost: the detector must be able to run in real-time.

– Low number of false alarms and no missed detections.

– Fast warning/alarming with different levels of detection: higher levels of detection should only be activated if the basic fire probability is high.

– Sequence/scene independent with low number of thresholds.

The remainder of this paper is organized as follows. Sect. 2 lists related work in the literature. Next, Sect. 3 gives a global description of the multi-sensor smoke detector. Based on the analysis of existing registration approaches, Sect. 4 proposes the silhouette-based registration of thermal and visual images. As we assume parallel sensors whose lines of sight are close to each other, the proposed registration consists of a rigid transformation, which can be decomposed into a 2-D rotation, scaling and translation. Subsequently, Sect. 5 discusses the silhouette coverage analysis, which focuses on the visible-invisible character of smoke in visual-LWIR images to distinguish between smoke and non-smoke moving objects. Next, in Sect. 6, we report the performance results of our primary experiments. Finally, Sect. 7 lists the conclusions.

2 Related work

Although, to the best of our knowledge, no work has been done on multi-sensor smoke detection in IR and visual images, the combination of detection in both spectral ranges is not new. In the last decade, the fusion of visible and infrared images has started to be explored as a way to improve the


Fig. 2 Comparison of corresponding LWIR/visual objects

detection performance [17]. For example, Chen et al. [12,18] use multi-sensor image fusion for the detection of weapons, which can be seen in infrared but not in visual. Another example is the optimized people detection and tracking proposed by Benezeth et al. [19]. The combination of both types of imagery yields information about the scene that is rich in color, motion and thermal detail. Once registered, such information can be used to successfully detect and analyze activity in the scene [14].

Contrary to smoke detection, multi-sensor flame detection has recently been studied by some authors. For example, in Arrue et al. [9] and Martinez-de Dios et al. [20], visual information is used to improve infrared detection of wildfires. Arrue et al. discriminate false alarms by analyzing the ratio between the alarm areas in visual and infrared images. Martinez-de Dios et al. present how potential fire alarms from both thermal and visual images can be fused to obtain more reliable fire detection characteristics and also compute the geographical position of the detected alarms. The authors also proposed a multi-sensor flame detector [21]; this feature-based detector analyzes hot and moving LWIR-visual objects using a set of flame features which focus on the distinctive geometric, temporal and spatial disorder characteristics of flame regions. By combining the probabilities of these quickly retrievable features, flames are accurately detected at an early stage.

In order to combine the information in a multi-sensor setup, e.g. for remote sensing, medical imaging and computer vision [22,23], the corresponding objects in the scene need to be aligned, i.e. registered. The goal of registration is to establish geometric correspondence between the multi-sensor images so that they may be transformed, compared, and analyzed in a common reference frame [24]. Since corresponding objects in visual and thermal images may have different sizes, shapes, features, positions and intensities, as is shown in Fig. 2, the fundamental question to address during registration is: what is a good image representation to work with, i.e. what representation will bring out the common information between the two multi-sensor images, while suppressing the non-common information [25]?


Fig. 3 Silhouette-based image registration of thermal and visual images (STAGE 1)

When choosing an appropriate registration method, a first distinction can be made between automatic and manual registration. In applications with manual registration, e.g. using a heated calibration checkerboard [14], a set of corresponding points is manually selected from the two images to compute the parameters of the transformation, and the registration performance is evaluated by subjectively comparing the registered images. This is repeated several times until the registration performance is satisfactory. If the background changes, e.g. due to camera movement, the entire procedure needs to be repeated. Because this manual process is labor intensive, automatic registration is more desirable. Therefore, we adopt the latter in our system.

A second distinction for an appropriate (automatic) registration method is between region, line and point feature-based methods [26]. It is necessary to use features that are stable with respect to the sensors, i.e. the same physical artifact produces features in both images. Compared to the correspondence of individual points and lines, region-based methods, such as silhouette mapping, provide more reliable correspondence between color and thermal image pairs [12,18]. For example, comparing the visual and LWIR images in Fig. 2, one can see that some information varies a lot, but what is most similar, i.e. the mutual information, are the silhouettes. Therefore, the proposed image registration method matches the transformed color silhouette of the calibration object, i.e. a moving person, to its thermal silhouette. The mutual information, i.e. the silhouette coverage, is assumed to reach its maximal value when both images are registered. However, knowing that the same silhouettes extracted from LWIR and visual images can still have different details (as shown in the experiments), a complete exact match is (quasi) impossible. It is also important to mention that, instead of using a person as the calibration object, other objects in the scene can also be used.

3 Global description of the methodology

Based on the existing multi-sensor flame detectors and on the visible-invisible characteristic of smoke in LWIR-visual images, we present a multi-sensor smoke detection system that can be split up into two consecutive stages: the multi-sensor image registration and the multi-sensor smoke detection. The purpose of the first stage is to coarsely register the images taken simultaneously from two different but parallel sensors whose lines of sight are close to each other. A silhouette contour based image registration algorithm based on human body silhouettes is developed at this stage. The geometric parameters found at this stage are used for the transformation of the images in the second stage, in which the coverage between the corresponding moving objects in the registered images is calculated. In case of smoke, this silhouette coverage will start to decrease. Due to the dynamic character of the smoke, its visual silhouette will also show a high disorder. When both smoke behaviors are detected, this will result in a fire alarm. In the following, we discuss this in more detail.

3.1 Thermal and visual image registration

The proposed image registration, which automatically finds the correspondence between the silhouettes extracted from synchronous color and thermal LWIR image sequences, is schematized in Fig. 3. The registration starts with a moving object silhouette extraction [12] in both the LWIR and visual image to separate the calibration objects, i.e. the moving foreground, from the background, which is assumed to be static. Key components of the moving object silhouette extraction are the dynamic background subtraction, automatic thresholding and morphological filtering with growing structuring elements, which grow iteratively until a resulting silhouette is suitable for thermal-visual silhouette matching.


Fig. 4 Silhouette coverage analysis (STAGE 2)

Each of these components, as well as the reasons for choosing them, is discussed in detail in Sect. 4.

After silhouette extraction, 1-D contour vectors are generated from the resulting IR and visual silhouettes using silhouette boundary extraction, Cartesian to polar transform and radial vector analysis. Next, in order to retrieve the rotation angle (∼contour alignment) and the scale factor between the LWIR and visual image, these contours are mapped onto each other using circular cross correlation [27] and contour scaling. Finally, the translation between the two images is calculated using maximization of binary correlation. The retrieved geometric parameters are used in the second stage of the multi-sensor detector to align the visual and thermal images for silhouette coverage analysis.

3.2 Multi-sensor smoke detection

The proposed multi-sensor smoke detection is based on coverage analysis of thermal-visual registered images. The silhouette coverage analysis, shown in Fig. 4, starts with the same moving object silhouette extraction as the one used for image registration. Then, it uses the registration information, i.e. rotation angle, scale factor and translation vector, to map the thermal and visual silhouette images onto each other. Finally, the coverage of the resulting thermal-visual silhouette map is computed and analyzed over time. In case of a silhouette coverage reduction with a high degree of disorder in the visual silhouette contours, a fire alarm is given. For more details, the reader is referred to Sect. 5.

4 Silhouette-based registration of thermal and visual images

In this section, a more detailed description is given of each of the steps in the silhouette-based registration of thermal and visual images. First, the extraction of visual and thermal silhouettes of the calibration object is discussed. Although in our work a moving person is chosen as the calibration object, other moving objects can also be used. The only constraint is that the object must have similar thermal and visual silhouettes.

Next, the contour mapping algorithm is presented for the analysis of the thermal and visual silhouettes to detect the rotation and scale between the multi-spectral images. Finally, maximization of binary correlation is described to detect the translation between the visual and thermal silhouette. At the end of the section, examples of the registration process in different real-case scenarios are given to illustrate the accuracy of the proposed technique.

4.1 Visual silhouette extraction

In order to extract the visual silhouette of the calibration person from the background, we propose the algorithm shown in Fig. 5, in which intensity, color and edge information of the moving part of the visual images are merged. Merging these three types of information is the only way we can guarantee the entire moving object silhouette is found under all circumstances (in our experiments). The algorithm uses the visual frame $F_n$, i.e. the input RGB video frame at time n, in which the calibration person is in the scene (Fig. 6b), and the visual background estimation $BG_n$, in which we assume that no moving objects occur (Fig. 6a). The algorithm starts with two image transformations to convert $F_n$ into the intensity image $I_n$ and the color image $C_n$. The color image $C_n$ equals the ratio of the input image $F_n$ to the intensity image $I_n$. In short, the pixel values of each of the RGB color bands of $F_n$ are divided by the intensity values of the corresponding pixels in the intensity/grayscale image $I_n$. This gives us the color values of the color image $C_n$. More detailed information on the creation of $C_n$ can be found in [28].

Next, a dynamic background subtraction [29] extracts the moving foreground (FG) out of $I_n$ and $C_n$ using the intensity and color image of the visual background estimation. By computing the absolute difference of $I_n$ and $C_n$ with everything in the scene that remains constant over time, i.e. $BG_n$, only the moving part of those images remains. The intensity BG estimation is updated dynamically after each segmentation using Eq. 1, in which $[x, y]$ are the pixel coordinates, $S_n$ is the final silhouette image and α is the update parameter. This parameter specifies how fast new information


Fig. 5 Visual silhouette extraction

Fig. 6 a Background (BG) estimation and b calibration input frame; c BG subtracted intensity and d BG subtracted color image

supplants old observations. Here, α (= 0.95) was chosen close to 1, according to Toreyin et al. [30]. The color BG estimation is updated analogously.

$$
BG_{n+1}[x, y] =
\begin{cases}
\alpha\,BG_n[x, y] + (1 - \alpha)\,I_n[x, y] & \text{if } S_n[x, y] \rightarrow BG \\
BG_n[x, y] & \text{if } S_n[x, y] \rightarrow FG
\end{cases}
\tag{1}
$$
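For illustration, a minimal NumPy sketch of this update rule (not from the paper; it assumes float images in [0, 1] and a boolean silhouette mask, and the function name is ours):

```python
import numpy as np

def update_background(bg, frame, silhouette, alpha=0.95):
    # Running-average background update of Eq. 1: background pixels
    # blend in the new frame, foreground pixels keep the old estimate.
    blended = alpha * bg + (1.0 - alpha) * frame
    return np.where(silhouette, bg, blended)
```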

After background subtraction, the resulting intensity and color foregrounds $I_{FG,n}$ and $C_{FG,n}$ are thresholded automatically using automatic gamma correction, (adaptive) k-means clustering and morphological filtering with growing structuring elements, which grow iteratively until the resulting silhouette is suitable for thermal-visual silhouette matching. It was found in our experiments that the combination of these three steps gave the best results, compared to other frequently used segmentation techniques such as contrast stretching.

Gamma correction changes the brightness distribution of images. Using an appropriate gamma, this correction results in a more useful input image for k-means clustering, making details in both light and dark portions of the image more visible. To automatically generate an appropriate value, the authors use an automatic gamma correction [31]. Based on the mean and standard deviation of the input image, for example $I_{FG,n}$, the gamma value γ for the automatic correction is calculated using Eq. 2. Similarly, the gamma value for $C_{FG,n}$ can be calculated. As the images in Fig. 7 show, the gamma correction improves the segmentation results a lot when light conditions are bad or the color difference between the human calibration object and the background is minimal. Similar results can be achieved with homomorphic filtering [32].

$$
\gamma =
\begin{cases}
1 + \dfrac{|0.5 - \overline{I}_{FG,n}|}{\sigma} & \text{if } \overline{I}_{FG,n} > 0.5 \\[2mm]
\left(1 + \dfrac{|0.5 - \overline{I}_{FG,n}|}{\sigma}\right)^{-1} & \text{otherwise}
\end{cases}
\tag{2}
$$
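A sketch of this correction, under our reading of Eq. 2 (gamma computed from the image mean and standard deviation; the epsilon guard is our addition for near-flat images):

```python
import numpy as np

def auto_gamma_correct(img):
    # img: grayscale float image in [0, 1]
    mu, sigma = img.mean(), img.std()
    d = abs(0.5 - mu) / max(sigma, 1e-6)
    gamma = 1.0 + d if mu > 0.5 else 1.0 / (1.0 + d)  # Eq. 2
    return np.clip(img, 0.0, 1.0) ** gamma            # darken or brighten
```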

For the extraction of the human silhouette in the gamma-corrected color and intensity foreground images, different thresholding techniques can be used. Among all of these techniques, automatic thresholding, like Otsu [33] and k-means clustering [34], is widely used because of its simple implementation and low computational cost.


Fig. 7 Results of color silhouette extraction with and without automatic gamma correction

These methods automatically select an optimal gray-level threshold value for separating objects of interest from the background, based on their gray-level distribution. However, since these standard methods only focus on Euclidean intensity distance, they are sometimes insufficient in forming the desired clusters in real-world image segmentation. The Otsu method, for example, fails if the histogram is unimodal or close to unimodal [35], as is the case in our experiments. Instead, a weighted distance measure, such as the spatially constrained k-means [36], the two-dimensional Otsu [37], the histogram valley emphasis [35], or the k-means adaptive clustering [38], performs much better by utilizing both local pixel/histogram information and pixel intensity. In our work the latter, k-means adaptive clustering with two clusters, is used. As the color silhouette extraction in Fig. 7 shows, this clustering achieves favorable results, even in low-light images. Similar results are retrieved for the intensity silhouette extraction.

In order to discard noisy objects and to improve the color and intensity silhouette quality, morphological filtering [39] is performed on the binary images after k-means clustering. First, small noisy FG objects are removed using a blob filter. Next, a morphological closing connects neighboring silhouette parts. Finally, a filling operator fills the remaining holes in the silhouette. The results of this morphological filtering are shown in Fig. 7. Combining gamma correction, k-means and morphological filters clearly results in appropriate silhouette extraction. In order to determine an optimal size for the structuring elements of the morphological filters, the structuring elements grow iteratively until the resulting silhouette is suitable for thermal-visual silhouette matching, i.e. until one FG silhouette object remains with adequate thermal-visual correspondence.
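The following sketch illustrates this clean-up with SciPy's morphology routines; the 0.5% blob-size threshold and the single-component stopping criterion are our simplifications of the paper's "adequate thermal-visual correspondence" test:

```python
import numpy as np
from scipy import ndimage

def clean_silhouette(mask, max_radius=15):
    # Blob filter: drop FG components smaller than 0.5% of the image.
    labels, n = ndimage.label(mask)
    if n > 0:
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        mask = np.isin(labels, 1 + np.flatnonzero(sizes >= 0.005 * mask.size))
    filled = mask
    # Close and fill with an iteratively growing structuring element
    # until a single FG silhouette object remains.
    for radius in range(1, max_radius + 1):
        se = np.ones((2 * radius + 1, 2 * radius + 1), dtype=bool)
        closed = ndimage.binary_closing(mask, structure=se)
        filled = ndimage.binary_fill_holes(closed)
        if ndimage.label(filled)[1] == 1:
            break
    return filled
```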

As it is not always possible to extract the full silhouette out of the color or intensity images, as is shown in Fig. 8, we finally merge the color and intensity silhouettes. In addition, the resulting silhouette is also merged with an edge silhouette, which is created using a standard Canny edge detection [39] on the FG intensity images. The main reason why the Canny edge detector is used on the intensity images, and not on the color images, is that the intensity images are much richer in edge information than the color images. This is logical, as in the construction of the color images a lot of edge information, i.e. intensity variations, is discarded by dividing the input frame by the intensity image. Experiments also confirm this hypothesis. By merging the three different types of information into the final silhouette image $S_n$, accurate visual silhouette extraction is achieved. The merging itself starts by adding the binary values of the intensity, color and edge images together.


Fig. 8 Silhouette merging

Next, the resulting image is thresholded. Non-zero regions which contain one or more values bigger than 1, i.e. pixels that are foreground in more than one of the silhouettes, are mapped to foreground. All other regions are mapped to background. As such, objects which have only been detected in one of the three silhouettes are discarded. So, the proposed algorithm is also able to cope with specific visual artifacts, such as (disconnected) shadows. Also for the visual smoke detection in our multi-sensor detector, which is discussed in more detail in Sect. 4, the combination of color, intensity and edges produces appropriate silhouettes for smoke coverage analysis.
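A compact sketch of this two-of-three voting merge (function name ours; inputs are boolean masks of the intensity, color and edge silhouettes):

```python
import numpy as np
from scipy import ndimage

def merge_silhouettes(s_int, s_col, s_edge):
    # Add the three binary silhouettes; a connected region is kept as
    # foreground only if it contains at least one pixel seen by two
    # or more of the silhouettes (Sect. 4.1).
    votes = (s_int.astype(np.uint8) + s_col.astype(np.uint8)
             + s_edge.astype(np.uint8))
    labels, n = ndimage.label(votes > 0)
    keep = [i for i in range(1, n + 1) if votes[labels == i].max() > 1]
    return np.isin(labels, keep)
```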

4.2 Thermal silhouette extraction

Unlike commonly used video cameras that record reflected light, a long-wave (8–12 µm) IR sensor records the electromagnetic radiation emitted by objects in the scene. As such, the LWIR (thermal) images represent temperature: the warmer an object is, the brighter it appears on the images. Because the temperature of the human body is often higher than that of its surroundings, people appear as bright objects in IR pictures and their silhouettes can generally be extracted from the background regardless of lighting conditions, background, and colors of the human clothing and skin. However, due to the insulating properties of some clothes, the body of a person can seldom be imaged as a whole warm object, as can be seen in Fig. 2. As such, problems can arise during silhouette extraction when some part of the human body or clothing has a similar (or lower) temperature as part of the background. The proposed silhouette extraction copes with those problems by focusing on the absolute intensity differences between the current frame and the thermal BG estimation, instead of focusing on the pure intensity values.

Fig. 9 LWIR silhouette extraction

The main steps of the thermal LWIR silhouette extraction algorithm are shown in Fig. 9. The algorithm uses the thermal frame $F^{LWIR}_n$, in which the calibration person is in the scene (Fig. 10a), and the thermal BG estimation $BG^{LWIR}_n$, in which no moving objects occur (Fig. 10b). The algorithm starts with the same dynamic background subtraction as the one used for visual extraction. The BG subtraction extracts the thermal foreground $FG^{LWIR}_n$ (Fig. 10c) out of $F^{LWIR}_n$ by calculating the absolute difference of $F^{LWIR}_n$ and the thermal background estimation $BG^{LWIR}_n$, which is also updated dynamically using the final thermal silhouette $S^{LWIR}_n$. Next, automatic thresholding extracts the candidate thermal silhouette out of $FG^{LWIR}_n$ using the same automatic gamma correction and k-means clustering as for the visual silhouette extraction. Finally, the thermal extraction also uses morphological filtering with iteratively growing structuring elements to discard remaining noisy objects and to improve the silhouette quality of the thermal silhouette $S^{LWIR}_n$. As shown in Fig. 10d, the combination of these steps also produces satisfactory results when applied to thermal images.


Fig. 10 Example of LWIR silhouette extraction: calibration person (a); thermal BG estimation (b); BG subtraction (c); thresholding and morphological filtering (d)

4.3 Visual and LWIR image registration

After the extraction of the visual and thermal body silhouettes from the color image and its synchronous thermal image, registration of both images is performed using a three-step registration algorithm. The goal is to determine the transformation parameters in order to align the LWIR image with the visual image.

Assuming that the distance between the cameras and the calibration person is large, the human surface from the camera view can be approximated as planar and the geometric transformation can be strictly represented by a projective transformation. Furthermore, assuming that the image planes of both visual and LWIR cameras are approximately parallel, the geometric transformation can be further simplified to a rigid transformation, which can be decomposed into a 2-D rotation, scaling and translation [10]. As such, a point $(X, Y)$ in the visual image plane is transformed into the point $(X', Y')$ in the thermal image plane as follows:

$$
\begin{pmatrix} X' \\ Y' \end{pmatrix}
= s \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \end{pmatrix}
+ \begin{pmatrix} \Delta X \\ \Delta Y \end{pmatrix}
\tag{3}
$$

where θ is the rotation angle, s is the scaling factor and $(\Delta X, \Delta Y)$ is the translation vector. A similar geometric transformation for image registration is also proposed by Liu et al. [40].
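In code, the mapping of Eq. 3 is a few lines (a sketch; `points` is an N × 2 array of (X, Y) coordinates and the function name is ours):

```python
import numpy as np

def rigid_transform(points, theta, s, delta):
    # Eq. 3: scale, rotate, then translate visual-plane points into
    # the thermal image plane.
    c, si = np.cos(theta), np.sin(theta)
    R = np.array([[c, si],
                  [-si, c]])
    return s * points @ R.T + np.asarray(delta)
```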

In order to estimate each of the three geometric parameters, i.e. rotation angle θ, scaling factor s and translation vector $(\Delta X, \Delta Y)$, the contours and the correlation of the visual and thermal silhouettes are analyzed, as is discussed in detail in the following subsections. First, the rotation is computed using silhouette contour extraction and circular cross correlation. Next, contour scaling is used to estimate the thermal-visual scale factor. Finally, the translation vector is estimated by maximization of binary correlation.

4.3.1 Rotation estimation

In order to estimate the rotation angle between the two silhouettes (∼rotation angle between the two camera views), we propose to analyze the translation of the 1-D contour centroid distance (CCD) of both silhouettes. As such, the 2-D silhouette matching problem is converted to a one-dimensional signal matching problem, i.e. the matching of silhouette contours.

The contour signal is generated from both the thermal and visual silhouette using a boundary extraction algorithm [41], and a one-dimensional signal is generated from the center of mass, i.e. the centroid $(x_c, y_c)$, to the boundary for each silhouette [27]. The contour centroid distance CCD(u) represents the distance between the boundary points $(x(u), y(u))$ and the centroid $(x_c, y_c)$ of the silhouette.


Fig. 11 One-dimensional visual and thermal CCD of the calibration silhouettes: boundary extraction (a, b); CCD (c, d); polar CCD (e, f); discretized polar CCD (g, h)

In Fig. 11, both the boundary extraction (Fig. 11a, b) and the one-dimensional visual and thermal CCD (Fig. 11c, d) of the calibration silhouettes are shown. Although visual inspection of the CCDs can already reveal a rough estimation of the rotation, automatic analysis on this 1-D signal is not straightforward due to the different size, i.e. number of boundary points, of both CCDs. For direct comparison of both CCDs, and in order to estimate the rotation and scale, they must have the same size. Therefore, we propose to convert the contour points from Cartesian to polar coordinates and compute the one-dimensional $CCD^{polar}$, which is obtained by computing the distance from the centroid of the silhouette to the silhouette boundary as a function of the turning angle $(-\pi \le \text{angle} < \pi)$.

The $CCD^{polar}$ of the thermal and visual calibration silhouettes are shown in Fig. 11e, f. Although the range of the CCDs is already equal ($[-\pi, \pi]$), the number of points in both signals is still different due to the fact that multiple boundary points can be detected under the same angle. To cope with this problem, we propose a novel CCD mapping technique, which discretizes the $CCD^{polar}$ signal over 64 equally spaced intervals. Within each interval, the maximum, $\max(CCD^{polar})$ in that interval, is chosen as the representative boundary value for the interval, since those points best match the outer part of the silhouette, and as such, only a limited amount of information is lost. The resulting $CCD^{polar}_{64}$ are shown in Fig. 11g, h. Alternatively, it is also possible to super-sample the smallest signal, as in [27]. However, by converting to polar coordinates and quantizing the signal, we reduce the 2D silhouette boundary to a 64-element vector and keep the computational cost low.
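A possible implementation of the 64-bin polar CCD on a boolean silhouette (the 4-neighbour boundary test is one common choice, not necessarily the boundary extraction of [41]):

```python
import numpy as np

def polar_ccd(silhouette, bins=64):
    ys, xs = np.nonzero(silhouette)
    yc, xc = ys.mean(), xs.mean()                  # silhouette centroid
    # Boundary pixels: FG pixels with at least one BG 4-neighbour.
    p = np.pad(silhouette, 1)
    interior = (p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:])
    by, bx = np.nonzero(silhouette & ~interior)
    angles = np.arctan2(by - yc, bx - xc)          # turning angle
    dists = np.hypot(by - yc, bx - xc)             # centroid distance
    # Keep the maximum distance in each of the 64 angular intervals.
    idx = ((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    ccd = np.zeros(bins)
    np.maximum.at(ccd, idx, dists)
    return ccd
```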

Using the $CCD^{polar}_{64}$ of the thermal and visual silhouette, the rotation of both camera views can easily be calculated by finding the translation which maximizes the thermal-visual $CCD^{polar}_{64}$ correlation. This is based on the fact that translating the 1-D signals in centroid contour distance space over k locations corresponds to rotating the associated silhouette image in 2D pixel space over $k/64 \times 360°$. The thermal-visual $CCD^{polar}_{64}$ translation is found by calculating the location k at which the circular cross-correlation CXC(k) reaches its maximum. The circular cross-correlation [27,42] is defined by:

$$
CXC(k) = \sum_{i=1}^{64} CCD^{polar}_{64,i}(S^{LWIR}_n) \times CCD^{polar}_{64,i \oplus k}(S^{VIS}_n)
\tag{4}
$$

with $k = 0 \ldots 63$ and $\oplus$ = addition modulo 64. For the CCDs shown in Fig. 11, it was found that CXC reaches its maximum for k = 0. As such, the rotation θ between the thermal and visual silhouette equals $0/64 \times 360° = 0°$, as could be expected based on the rough visual estimation.
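The exhaustive search over the 64 circular shifts is cheap; a sketch (function name ours):

```python
import numpy as np

def rotation_from_ccd(ccd_lwir, ccd_vis, bins=64):
    # Eq. 4: CXC(k) = sum_i CCD_lwir[i] * CCD_vis[(i + k) mod 64];
    # np.roll(ccd_vis, -k) realizes the circular index shift i + k.
    cxc = [np.dot(ccd_lwir, np.roll(ccd_vis, -k)) for k in range(bins)]
    k_best = int(np.argmax(cxc))
    return k_best, k_best / bins * 360.0   # shift and angle in degrees
```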

4.3.2 Scale factor estimation

After rotating, i.e. aligning, the thermal and the visual CCD, the scale factor between both views is estimated [12,18] by analyzing the ratio ($CCD^{ratio}$) of the thermal and visual aligned CCDs. The ratios for the calibration example are shown in Fig. 12. As can be seen in the image, the ratios are not constant and show some disorder. The reason for this behavior is twofold. First of all, the horizontal and vertical dimensions of both sensor images do not relate equally, which implies some deformation and influences the vertical–horizontal scale ratios. Second, the edge transitions in visual and thermal images are not always identical and, as such, the thermal and visual boundaries can differ. Furthermore, visual and thermal artifacts, such as (connected) shadows and reflections, can also increase the ratio disorder. To cope with these scale-related problems, we propose to use the median ratio as the scale factor. The main reason for choosing the median ratio instead of, for example, the mean ratio is that the median ratio is not influenced by outliers, while for the mean ratio this cannot be guaranteed.


Fig. 12 Scale factor estimation based on $CCD^{ratio}$ analysis

The calculation of s is shown in Eq. 5.

Instead of using one scale factor s for both the horizontal and vertical direction, it is also possible to use different scale factors $s_x$ and $s_y$ for each direction. For example, in case of reasonable vertical–horizontal deformation, due to non-parallel sensor placement or highly non-related sensor dimensions, different scale factors can be necessary to coarsely map the silhouettes. To estimate both the vertical and horizontal scale factor, we then propose to use the median ratio in $[-3\pi/4, -\pi/4]$ and $[\pi/4, 3\pi/4]$ for $s_y$, and the median ratio in the other ranges for $s_x$ (Eq. 5). This is also illustrated in Fig. 12.

$$
CCD^{ratio}_{i} = \frac{CCD^{polar}_{64,i}(S^{LWIR}_n)}{CCD^{polar}_{64,i \oplus k}(S^{Visual}_n)}
$$

$$
s = \operatorname{median}\left(CCD^{ratio}\right)
$$

$$
s_x = \operatorname{median}\left(CCD^{ratio}\left[-\tfrac{\pi}{4} : \tfrac{\pi}{4},\ \tfrac{3\pi}{4} : -\tfrac{3\pi}{4}\right]\right)
$$

$$
s_y = \operatorname{median}\left(CCD^{ratio}\left[-\tfrac{3\pi}{4} : -\tfrac{\pi}{4},\ \tfrac{\pi}{4} : \tfrac{3\pi}{4}\right]\right)
\tag{5}
$$

4.3.3 Translation estimation

The last transformation parameter estimated by the registration algorithm is the translation vector $(\Delta X, \Delta Y)$. Translation can occur due to the placement of the cameras, but also due to the different sensor resolutions, i.e. the image of one sensor can be a cropped version of the other. To correct for this translation and to be able to perfectly align the thermal and visual image, the binary correlation technique proposed by Chen et al. [12] is used to determine the x- and y-displacements.

After rotating and scaling up the LWIR image using the estimated rotation angle θ and the scaling factor s, the translation vector $(\Delta X, \Delta Y)$ is computed by binary correlation, i.e. template matching, in the frequency domain. The correlation between the thermal image and the visual image is computed by rotating the thermal image 180° and then using the Fast Fourier transform (FFT)-based convolution technique. This can be done since convolution is equivalent to correlation when rotating the kernel by 180°. Similar to [12], we represent the two levels of the silhouette images by −1 (∼BG) and 1 (∼FG), so that by maximizing the correlation function both parts are matched as much as possible. The 2D/3D result of correlating the thermal silhouette with the visual silhouette is shown in Fig. 13. The point $(trans_x, trans_y)$, at which the correlation reaches its maximum, is used to calculate $(\Delta X, \Delta Y)$ as follows:

$$
\begin{pmatrix} \Delta X \\ \Delta Y \end{pmatrix}
= \begin{pmatrix} trans_x \\ trans_y \end{pmatrix}
- \begin{pmatrix} size_x(S^{LWIR}_n) \\ size_y(S^{LWIR}_n) \end{pmatrix}
\tag{6}
$$
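A sketch of the FFT-based correlation step (using SciPy's `fftconvolve`; the off-by-one against Eq. 6 comes from the usual discrete-correlation indexing):

```python
import numpy as np
from scipy.signal import fftconvolve

def translation_from_silhouettes(s_lwir, s_vis):
    # Code the two silhouette levels as -1 (BG) and 1 (FG), then
    # correlate: convolving with the 180-degree rotated thermal
    # silhouette is equivalent to correlation.
    a = np.where(s_vis, 1.0, -1.0)
    b = np.where(s_lwir, 1.0, -1.0)
    corr = fftconvolve(a, b[::-1, ::-1], mode='full')
    ty, tx = np.unravel_index(np.argmax(corr), corr.shape)
    # Eq. 6: subtract the thermal silhouette size from the peak location.
    return tx - (s_lwir.shape[1] - 1), ty - (s_lwir.shape[0] - 1)
```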

The estimation of the translation vector finishes the proposed three-step registration algorithm, and using the retrieved transformation parameters θ, s and $(\Delta X, \Delta Y)$, registration between LWIR and visual images can be performed. As the registration result in Fig. 13 shows, the visual and thermal silhouettes of the calibration object map coarsely. The overlapping part of the silhouettes is shown in white and the non-overlapping part is shown in gray.

4.4 Experimental results of LWIR-visual registration

Visual evaluation of the experimental results in Fig. 14 already indicates that the proposed registration algorithm is able to coarsely align the thermal and visual calibration images. However, in order to evaluate the registration more objectively, we propose a coverage metric COV (Eq. 7), which measures the percentage overlap between the thermal $S^{LWIR}_n$ and visual $S^{Visual}_n$ registered silhouettes:

$$
COV(S^{LWIR}_n, S^{Visual}_n) = \frac{|S^{LWIR}_n \cap S^{Visual}_n|}{|S^{LWIR}_n \cup S^{Visual}_n|}
\tag{7}
$$
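On boolean silhouette masks, this intersection-over-union metric is a one-liner (sketch; function name ours):

```python
import numpy as np

def coverage(s_lwir, s_vis):
    # Eq. 7: ratio of overlapping to combined silhouette pixels.
    inter = np.logical_and(s_lwir, s_vis).sum()
    union = np.logical_or(s_lwir, s_vis).sum()
    return inter / union if union else 0.0
```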


Fig. 13 2D/3D correlation-based translation estimation and thermal–visual registration result

Since COV depends on the performance of the silhouette extraction methods, one can also use the registration precision proposed in [10]. The registration precision is defined as $P(A, B) = |A \cap B| / |A \cup B|$, where A and B are manually labeled human silhouette pixel sets from the original visual image and the transformed thermal image, respectively. However, this method needs manual ground truth creation, and is as such not suitable for automatic (re)calibration.

Besides the visual registration result, Fig. 14 also contains parameter and coverage information. The rotation, scale, x- and y-displacement, and coverage result found for each thermal-visual calibration pair are indicated underneath each thermal image. The coverage results are around 80% and higher. As already pointed out earlier, an exact match is quasi impossible because the visual and thermal boundaries of corresponding objects do not always overlap.

The silhouette maps in Fig. 14 show that the proposed approach achieves good performance for image registration between color and thermal image sequences. The visual and IR silhouettes of the person are coarsely mapped onto each other. However, due to the individual sensor limitations, such as shadows in visual images, thermal reflections, and soft thermal boundaries in LWIR, small artifacts at the boundary of the merged silhouettes can still be noticed.

Fig. 14 Experimental results of LWIR–visual registration

Also, if the cameras are not aligned perfectly, i.e. the assumption of parallel image planes is not satisfied, or if the vertical and horizontal dimensions of both sensors do not relate proportionally, deformation can arise between the detected objects and coverage can be low. To cope with these problems, the proposed approach can be extended using more complex moving object detectors and transformation models, such as, for example, is done in the work of Benezeth et al. [19], which is based on epipolar geometry. Further improvement can (possibly) also be achieved by averaging the results over multiple frames instead of using only one frame, or by refining the registration results, for example by maximization of mutual information using the techniques described by Maes et al. [15] and Liu [40].


However, for our application, the average calibration coverage above 80% is sufficient. Also, compared to the results of related work, e.g. the registration method in [10], the proposed method achieves similar results.

5 Multi-sensor smoke detection

In this section, a more detailed description is given of the multi-sensor smoke detector. For the general scheme of the detector, we refer to Fig. 4. The proposed algorithm starts with the same moving object silhouette extraction as the one used for image registration. Then, it uses the registration information, i.e. rotation angle θ, scale factor s and translation vector $(\Delta X, \Delta Y)$, to map the thermal and visual silhouette images onto each other. As soon as this mapping is finished, the LWIR-visual silhouette map is analyzed over time using a two-phase decision algorithm. A detailed scheme of both phases is given in Fig. 15.

The first phase focuses on the silhouette coverage of the thermal-visual registered images and gives a kind of first smoke warning when a decrease in silhouette coverage occurs. In the second phase, the smoke warning is further investigated by analyzing the disorder characteristics of the visual silhouette $S^{Visual}_n$. If this silhouette shows a high degree of disorder, the smoke hypothesis is confirmed and a fire alarm is raised.

5.1 Phase 1: silhouette coverage analysis

The silhouette coverage analysis (SCA) starts with the calculation of the LWIR-visual coverage of the registered visual silhouette $S^{Visual}_n$ and thermal silhouette $S^{LWIR}_n$. Contrary to the COV coverage metric introduced for registration in Eq. 7, the SCA uses a slightly different metric, $COV^{SCA}$, since we are only interested in the percentage of the visual silhouette that is also detected by the thermal silhouette:

$$
COV^{SCA}(S^{LWIR}_n, S^{Visual}_n) = \frac{|S^{LWIR}_n \cap S^{Visual}_n|}{|S^{Visual}_n|}
\tag{8}
$$

Under normal conditions, if there is no smoke, the $COV^{SCA}$ does not change much over time. This is also shown by the silhouette coverage graph of the moving person sequence in Fig. 16a, where the $COV^{SCA}$ stays within the 0.8–1 coverage range. Contrarily, in the case of smoke (Fig. 16b), the $COV^{SCA}$ strongly decreases well below 0.8. Even when no moving objects are present in the scene ($COV^{SCA} = 1$), a similar decrease is noticeable when smoke occurs. For the detection of this decrease, we propose a sequence/scene independent technique based on slope analysis of the linear fit, i.e. trend line, over the ten most recent silhouette coverage values. If the slope of this trend line is negative and decreases continuously, a smoke warning is given.

Since it is the global trend of a sequence of adjacent points which is analyzed, occasional noisy coverage values do not cause any problems. Furthermore, the delay due to analyzing the set of adjacent points is negligible, since the algorithm runs in real-time (at 25 fps).

Fig. 15 2-phase multi-sensor smoke detector


Fig. 16 Silhouette coverage analysis (SCA). a SCA of moving person sequence, b SCA of smoke (straw fire) sequence

The trend line, i.e. linear fit, is found by linear regression [43]. Suppose there are n data points $[x_i, COV^{SCA}_i]$, where $i = 1, 2, \ldots, n$ and $x_i = i$. The goal is to find the equation of the straight line $COV^{SCA} = \alpha + \beta x$ which would provide the best fit for the data points, i.e. the line which minimizes the sum of squared residuals of the linear regression model. Using the least squares method, the problem can be formulated as follows:

$$
\text{Find } \min_{\alpha, \beta} Q(\alpha, \beta), \quad
\text{where } Q(\alpha, \beta) = \sum_{i=1}^{n} \left(COV^{SCA}_i - \alpha - \beta x_i\right)^2
\tag{9}
$$

It can be shown [43] that the values of α and β that minimize Q are:

$$
\beta = \operatorname{corr}\left(x, COV^{SCA}\right) \frac{\sigma\left(COV^{SCA}\right)}{\sigma(x)}, \qquad
\alpha = \overline{COV^{SCA}} - \beta \bar{x}
\tag{10}
$$

where corr() is the correlation coefficient, σ() is the standard deviation, and $\bar{x}$ and $\overline{COV^{SCA}}$ are respectively the means of x and $COV^{SCA}$. Substituting the values of α and β in $COV^{SCA} = \alpha + \beta x$ provides the equation of the trend line.

A positive slope of the trend line indicates that the line increases, whereas a negative slope indicates a decrease. As such, in order to detect a continuous decrease in silhouette coverage, it is sufficient to analyze the slope over time. If more than two consecutive slope values are negative and grow in the negative direction, a smoke warning is given. Since the silhouette coverage of ordinary objects can also have a small negative slope over time, due to the thermal-visual differences, small negative slopes (m > −0.1) are not taken into account in the slope analysis. An example of the slope analysis for four consecutive frames from the moving person sequence and the smoke sequence is shown in Fig. 17. The slope for the moving person is very small and does not change much over time. Contrarily, in the smoke sequence, the slope becomes negative and grows in the negative direction as soon as smoke occurs. After more than two consecutive negative slope decreases, the smoke warning is given.
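A sketch of this phase-1 decision logic (the sliding-window bookkeeping and function names are ours; the window length, the −0.1 tolerance and the more-than-two-decreases rule follow the text):

```python
import numpy as np

def trend_slope(values):
    # Slope beta of the least-squares trend line (Eq. 10).
    x = np.arange(len(values), dtype=float)
    return np.polyfit(x, values, 1)[0]

def smoke_warning(cov_history, window=10, tolerance=-0.1):
    # Warn when more than two consecutive trend-line slopes are
    # negative beyond the tolerance and keep growing more negative.
    slopes = [trend_slope(cov_history[i - window:i])
              for i in range(window, len(cov_history) + 1)]
    run = 0
    for prev, cur in zip(slopes, slopes[1:]):
        run = run + 1 if (cur < tolerance and cur < prev) else 0
        if run > 2:
            return True
    return False
```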

5.2 Phase 2: disorder analysis of visual silhouette

The second phase of the multi-sensor smoke detection is only executed if a smoke warning is given in the first phase. If a warning is given, foreground (FG) objects in the visual silhouette are further investigated by temporal disorder analysis in order to distinguish true detections from false alarms, such as shadows. Due to the dynamic character of smoke, the perimeter and the area of FG smoke objects in the visual silhouette $S^{Visual}_n$ show a high degree of disorder. By temporal analysis of the boundary-area roughness [46], which focuses on both the area A and perimeter P of the FG object, this disorder can be detected.

The boundary-area roughness R of a FG object in $S^{Visual}_n$ is given by:

$$
R = \frac{P}{2\sqrt{\pi A}}
\tag{11}
$$

As Fig. 18 shows, the boundary-area roughness of both smoke objects shows a high temporal disorder, while the disorder for the person remains quasi constant. For each object, its degree of disorder can automatically be detected by low-cost extrema analysis [21] using the roughness variance metric $R_{var}$ (Eq. 12). $R_{var}$ is related to the number of extrema $|\mathrm{extrema}(R)|$, i.e., local maxima and minima, in the set of N consecutive R data points.


Fig. 17 Slope analysis for a moving person and b smoke sequence (∼Fig. 16). Graphs show frame number versus visual/thermal coverage $COV^{SCA}$ for four consecutive frames. If more than two consecutive negative slope decreases occur, smoke warning is given

By smoothing these data points using a moving average filter, small differences between consecutive points are filtered out and are not taken into account in the extrema calculation, which increases the strength of the disorder feature. Smoke, with a high roughness disorder, will have an $R_{var}$ close to 1, while for more static objects it will be close to 0.

$$
R_{var} = \frac{|\mathrm{extrema}(R)|}{N/2}
\tag{12}
$$
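A sketch of both disorder measures (function names ours; the window length of the moving average is an assumption, and extrema are counted as sign changes of the smoothed signal's slope):

```python
import numpy as np

def boundary_area_roughness(perimeter, area):
    # Eq. 11: equals 1.0 for a perfect circle, grows with raggedness.
    return perimeter / (2.0 * np.sqrt(np.pi * area))

def roughness_variance(R, smooth=3):
    # Eq. 12: fraction of possible extrema in N consecutive R values,
    # after moving-average smoothing to suppress small fluctuations.
    r = np.convolve(R, np.ones(smooth) / smooth, mode='valid')
    d = np.diff(r)
    extrema = int(np.sum(d[:-1] * d[1:] < 0))
    return extrema / (len(R) / 2.0)
```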

If for one (or more) FG object(s) $R_{var}$ is high, i.e. close to 1, the fire alarm is raised. If necessary, further analysis of the visual silhouette using other low-cost smoke features can be performed. However, by only focusing on both proposed silhouette behaviors, the multi-sensor smoke detector is already able to accurately detect the smoke, as shown by the experimental results in the next section.

Fig. 18 Boundary-area roughness for moving person and smoke objects

6 Experimental set-up and results

In order to verify the proposed multi-sensor smoke detector, we performed several real-life fire and non-fire experiments in a closed car park at WarringtonFireGent [16]. An example of these real case scenarios is shown in Fig. 19a, b, where the left-most images are the visual and LWIR camera views of the 'moving people' and 'car fire' test sequences. Other examples are shown in Figs. 14 and 16.

The multi-modal sequences were acquired by a Xenics Gobi-384 LWIR camera and a Canon MD110 camera, which work in the 8–14 µm spectral range and the visible spectrum, respectively. The Gobi thermal imager has a resolution of 384 × 288 pixels and a frame rate of 28–30 fps. The Canon's resolution is 576 × 720 and its frame rate is 25 fps. In order to cope with the different frame rates and resolutions, and also with the differences in the field of view of the cameras, the LWIR-visual frames are spatio-temporally registered using temporal frame alignment and the silhouette-based registration proposed in this paper.


Fig. 19 Experimental results of the multi-sensor smoke detector for a moving person and b smoke sequence

As the results in Fig. 19 show, the moving people sequence has a quasi constant silhouette coverage, and as such, no smoke warning is given and phase 2, i.e. the visual disorder analysis, is not performed. Contrarily, the silhouette coverage of the smoke sequence shows a strong decrease after 45 frames, which activates the smoke warning. As a reaction to this warning, phase 2 is activated and analyzes the boundary-area roughness variance $R_{var}$ of the visual silhouette objects. Since the $R_{var}$ for the largest object is high, the fire alarm is given.

In order to objectively evaluate the proposed method, we performed five different test setups: car fire, straw fire, moving people, moving car, and paper fire. For each of these fire and non-fire test setups, we generated several video sequences. In total, the test set contains 18 multi-modal fire videos and 13 non-fire video sequences with varying environment characteristics. For each of these sequences, we also generated a manual ground truth (GT). The performance results in Table 1, which are averaged over the entire test set, summarize the experimental results of different configurations of the proposed algorithm, as is further explained.

6.1 Evaluation of different algorithm configurations

During the tests, four different configurations of the algorithm were tested to evaluate and justify the steps taken in the proposed approach: the proposed setup without gamma correction; without the merge of visual color, edge and intensity information (only intensity was used); without adaptive k-means clustering (as an alternative, we used the Otsu method); and the proposed approach.

As the results indicate, the proposed configuration yields the best smoke warning / fire alarm rate. For the fire tests, the overall fire alarm rate is 98%. This means that almost each of the tested fires, which were manually annotated during ground truth (GT) creation, is detected. Without gamma correction or visual merge, these results are significantly lower. It can also be seen that the influence of the adaptive k-means is not so big, but since this kind of automatic thresholding results in an extra seven percent gain, its use in the proposed approach is justified. By comparing the smoke warning and the fire alarm rate of the non-fire tests, the influence of the visual disorder analysis, i.e. the second phase in the smoke detector, also becomes visible.


Table 1 Fire and non-fire test results of different configurations of the proposed algorithm

                                                      Fire tests                 Non-fire tests
Configuration                                         Smoke         Fire         Smoke         False
                                                      warning (%)   alarm (%)    warning (%)   alarm (%)
Proposed approach                                     98            98           5             2
Without gamma correction                              74            71           7             5
Without visual merge                                  67            63           8             4
Without adaptive k-means                              93            91           7             2
Silhouette-based FG extraction alternatives
  Frame differencing and dilation                     69            66           11            7
  Dynamic background subtraction and dilation         76            75           6             4
Silhouette disorder analysis alternatives
  Randomness of area size disorder                    98            95           5             3
  Similarity disorder of distance transformations     98            97           5             2

As most of the non-fire test sequences which falsely generate a smoke warning are corrected by the visual disorder analysis, the fire alarm percentage for the proposed approach is close to zero. As such, the number of false alarms is very low, one of the main requirements mentioned in the introduction.

Since other aerosols such as fog and dust can possess similar visual-LWIR silhouette behavior, further visual silhouette object investigation, for example by energy (∼visual obscuration) analysis [30,44] and dynamic texture analysis [45], can be necessary to eliminate those phenomena. However, this is out of the scope of this paper and is part of future work.

6.2 Evaluation of silhouette FG extraction alternatives

In order to evaluate the effectiveness of the proposed foreground extraction, the results of the silhouette-based approach are compared to a simple frame differencing and dilation algorithm. Furthermore, a comparison is also made with a popular running average based dynamic BG subtraction algorithm [29]. The results in Table 1 show that the detection results of the proposed method outperform the results of the dynamic background subtraction, which in turn achieves better results than the simpler frame difference approach. Especially when light conditions are bad, like in the car park experiments, the frame differencing and the dynamic BG subtraction have a lot of FG detection problems, which do not occur when using the proposed method. As such, the use of the proposed silhouette-based FG extraction is objectively found to be more effective.

6.3 Evaluation of disorder analysis alternatives

Over the last decade, many solutions for detecting object shape changes have been proposed in the literature, e.g., randomness of the area size, boundary (area) roughness, and the similarity disorder of distance transformations. In previous work [1], the authors already discussed some of these state-of-the-art disorder detection metrics and found their performance to be quasi-identical. Recent experiments, the results of which are shown in Table 1, confirm this: although these shape-change detection techniques differ in definition, their outcomes are almost identical. Furthermore, the experiments revealed that the disorder analysis of the boundary (area) roughness and the randomness of the area size are computationally more efficient than the distance transformation technique. As such, one of these former techniques is chosen.
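A minimal sketch of the randomness-of-area-size metric, under the assumption that it is computed as the normalized frame-to-frame variation of the visual silhouette area over a sliding window; the exact formulation and the disorder threshold are assumptions here:

import numpy as np

def area_disorder(areas, eps=1e-6):
    # areas: silhouette pixel counts for the last N frames.
    areas = np.asarray(areas, dtype=np.float64)
    deltas = np.abs(np.diff(areas))
    # High frame-to-frame area variation relative to the mean area
    # indicates a disordered, smoke-like silhouette.
    return deltas.mean() / (areas.mean() + eps)

# Usage: flag disorder when the metric exceeds an empirical threshold,
# e.g. if area_disorder(last_areas) > 0.15: ...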

6.4 Comparison with SOTA alternatives

As can be seen in Table 2, the proposed two-phase multi-sensor detector yields good detection results, which outperform the investigated state-of-the-art techniques [44,46,47]. Especially when lighting conditions are poor, as in the car park experiments, the proposed algorithm detects the smoke more accurately.

Table 2 Comparison of proposed algorithm to state-of-the-art smoke detectors

Configuration                                          Fire tests                Non-fire tests
                                                       Smoke        Fire         Smoke        False
                                                       warning (%)  alarm (%)    warning (%)  alarm (%)
Proposed approach                                      98           98           5            2
Xiong [46]: BG subtraction, flicker/disorder analysis  71           68           9            7
Calderara [44]: DWT energy analysis, color blending    85           83           6            4
Toreyin [47]: block-based spatial wavelet analysis     88           86           7            4

7 Conclusions

The proposed multi-sensor smoke detector takes advantage of the different kinds of information represented by thermal and visual images in order to accurately detect smoke. By fusing both modalities and using the strengths of each medium, smoke detection can be performed more accurately and with fewer false detections. Merging information from visual and LWIR sensors has, as such, proven to be a win–win.

To detect the presence of smoke, the multi-sensor smoke detector analyzes the silhouette coverage of moving objects in visual and LWIR registered images. In order to register the multi-sensor images, the proposed algorithm analyzes the contours and the correlation of visual and thermal FG silhouettes. First, the rotation is computed using silhouette contour extraction and circular cross-correlation. Next, contour scaling is used to estimate the thermal–visual scale factor. Finally, the translation vector is estimated by maximization of the binary correlation.
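An illustrative sketch of the rotation step: each silhouette contour is reduced to a 1-D angular signature (maximum centroid-to-contour distance per angle), and the rotation is read off the peak of the circular cross-correlation of the two signatures, computed here via the FFT. The signature representation and the number of angular bins are assumptions, not the authors' exact parameters.

import numpy as np

def angular_signature(contour, bins=360):
    # contour: (N, 2) array of (x, y) points of one FG silhouette contour.
    c = contour - contour.mean(axis=0)          # center on the centroid
    angles = np.arctan2(c[:, 1], c[:, 0])       # angle of each contour point
    radii = np.hypot(c[:, 0], c[:, 1])          # distance to the centroid
    idx = ((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    sig = np.zeros(bins)
    np.maximum.at(sig, idx, radii)              # max radius per angle bin
    return sig

def rotation_estimate(sig_vis, sig_lwir):
    # Circular cross-correlation via the FFT [27]; for 360 bins, the argmax
    # is the rotation (in degrees) that best aligns the two signatures.
    corr = np.fft.ifft(np.fft.fft(sig_vis) * np.conj(np.fft.fft(sig_lwir)))
    return int(np.argmax(corr.real))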




The geometric parameters found during this registration phase are further used by the detector to coarsely map the silhouette images, and the coverage between them is calculated. Since smoke is invisible in LWIR, its silhouette will, contrary to ordinary moving objects, only be detected in the visual images. As such, the coverage of thermal and visual silhouettes starts to decrease in the presence of smoke. Due to the dynamic character of the smoke, the visual silhouette will also show a high degree of disorder. By focusing on both silhouette behaviors, the system is able to accurately detect the smoke. Experiments on fire and non-fire sequences yielded a fire alarm rate of 98% and a false alarm rate of 2%.
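A minimal sketch of the coverage analysis, assuming the LWIR silhouette has already been warped into visual coordinates and taking coverage as the overlap of both binary masks relative to the visual silhouette; the exact definition and the warning threshold are assumptions.

import numpy as np

def silhouette_coverage(vis_mask, lwir_mask_registered, eps=1e-6):
    # Both masks are binary (0/1) arrays of equal, registered size.
    overlap = np.logical_and(vis_mask, lwir_mask_registered).sum()
    return overlap / (vis_mask.sum() + eps)

# Smoke is transparent in LWIR, so its visual silhouette has no thermal
# counterpart: coverage drops and a smoke warning can be raised, e.g.
# if silhouette_coverage(vis, lwir_reg) < 0.5: ...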

In short, the multi-sensor detector adheres to all the relevant requirements: object-based automatic calibration/registration, a low number of false alarms, no missed detections, and fast warning/alarming with different levels of detection. Due to the low cost of the SCA and of the visual silhouette disorder analysis, which is only performed when a smoke warning is given, the algorithm is also less computationally expensive than many of the existing individual detectors.

Acknowledgments The research activities as described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), University College West Flanders, Warrington Fire Ghent, the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific Research-Flanders (FWO-Flanders G.0060.09), the Belgian Federal Science Policy Office (BFSPO), XENICS and the EU.

References

1. Verstockt, S., Merci, B., Sette, B., Lambert, P., Van de Walle, R.: State of the art in vision-based fire and smoke detection. In: 14th International Conference on Automatic Fire Detection, vol. 2, pp. 285–292 (2009)
2. Calderara, S., Piccinini, P., Cucchiara, R.: Vision based smoke detection system using image energy and color information. Mach. Vis. Appl. 22(4), 705–719 (2011)
3. Toreyin, B.U., Cinbis, R.G., Dedeoglu, Y., Cetin, A.E.: Fire detection in infrared video using wavelet analysis. SPIE Opt. Eng. 46, 067204:1–9 (2007)
4. Owrutsky, J.C., Steinhurst, D.A., Minor, C.P., Rose-Pehrsson, S.L., Williams, F.W., Gottuk, D.T.: Long wavelength video detection of fire in ship compartments. Fire Saf. J. 41, 315–320 (2006)
5. Bosch, I., Gomez, S., Molina, R., Miralles, R.: Object discrimination by infrared image processing. In: International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC), pp. 30–40 (2009)
6. Gunay, O., Tasdemir, K., Toreyin, B.U., Cetin, A.E.: Video based wildfire detection at night. Fire Saf. J. 44, 860–868 (2009)
7. Verstockt, S., Dekeerschieter, R., Vanoosthuyse, A., Merci, B., Sette, B., Lambert, P., Van de Walle, R.: Video fire detection using non-visible light. In: 6th International Seminar on Fire and Explosion Hazards (2010)
8. Vandersmissen, R.: Night-vision camera combines thermal and low-light-level images. Photonik Int. 2, 2–4 (2008)
9. Arrue, B.C., Ollero, A., Martinez-de Dios, J.R.: An intelligent system for false alarm reduction in infrared forest-fire detection. IEEE Intell. Syst. 15(3), 64–73 (2000)
10. Han, J., Bhanu, B.: Fusion of color and infrared video for moving human detection. Pattern Recognit. 40, 1771–1784 (2007)
11. Foresti, G.L., Regazzoni, C.S., Varshney, P.K.: Multisensor Surveillance Systems: The Fusion Perspective. Kluwer, Massachusetts (2003)
12. Chen, H.-M., Varshney, P.K.: Automatic two-stage IR and MMW image registration algorithm for concealed weapons detection. In: 14th International Conference on Automatic Fire Detection, vol. 2, pp. 285–292 (2009)
13. Perez-Jacome, J., Madisetti, V.: Target detection from coregistered visual-thermal-range images. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2741–2744 (1997)
14. Krotosky, S.J., Trivedi, M.M.: Mutual information based registration of multimodal stereo videos for person tracking. Comput. Vis. Image Underst. 106, 270–287 (2007)
15. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16(2), 187–198 (1997)
16. Merci, B.: Fire safety and explosion safety in car parks. Fire Saf. Eng. (2010). http://www.carparkfiresafety.be/
17. Pieri, G., Moroni, D.: Active video surveillance based on stereo and infrared imaging. EURASIP J. Adv. Signal Process., pp. 1–8 (2008)
18. Chen, H.-M., Lee, S., Rao, R.M., Slamani, M.-A., Varshney, P.K.: Imaging for concealed weapon detection. IEEE Signal Process. Mag., pp. 52–61 (2005)
19. Benezeth, Y., Jodoin, P.M., Emile, B., Laurent, H., Rosenberger, C.: Human detection with a multi-sensor stereovision system. In: Image and Signal Processing (ICISP), pp. 228–235 (2010)
20. Martinez-de Dios, J.R., Merino, L., Ollero, A.: Fire detection using autonomous aerial vehicles with infrared and visual cameras. In: 16th IFAC World Congress (2005)
21. Verstockt, S., Vanoosthuyse, A., Van Hoecke, S., Lambert, P., Van de Walle, R.: Multi-sensor fire detection by fusing visual and non-visual flame features. In: International Conference on Image and Signal Processing (ICISP), pp. 333–341 (2010)
22. Wolberg, G., Zokai, S.: Robust image registration using log-polar transform. In: IEEE International Conference on Image Processing, pp. 493–496. IEEE Press, New York (2000)
23. Li, H., Manjunath, B.S., Mitra, S.K.: A contour-based approach to multisensor image registration. IEEE Trans. Image Process. 4(3), 320–334 (1995)
24. Shah, M., Kumar, R.: Video Registration. Kluwer Academic Publishers, Dordrecht (2003)
25. Irani, M., Anandan, P.: Robust multi-sensor image alignment. In: IEEE International Conference on Computer Vision, pp. 959–966 (1998)
26. Zitova, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21, 977–1000 (2003)
27. Hamici, Z.: Real-time pattern recognition using circular cross-correlation: a robot vision system. Int. J. Robot. Autom. 21(3), 174–183 (2006)
28. Yamasaki, A., Takauji, H., Kaneko, S., Kanade, T., Ohki, H.: Denighting: enhancement of nighttime images for a surveillance camera. In: 19th International Conference on Pattern Recognition (ICPR), pp. 1–4 (2008)
29. Collins, R.T., Lipton, A.J., Kanade, T.: A system for video surveillance and monitoring. In: Proceedings of the American Nuclear Society (ANS) Eighth International Topical Meeting on Robotics and Remote Systems (1999)
30. Toreyin, B.U., Dedeoglu, Y., Gudukbay, U., Cetin, A.E.: Computer vision based method for real-time fire and flame detection. Pattern Recognit. Lett. 27, 49–58 (2006)
31. Verstockt, S., Lambert, P., Van de Walle, R.: Feature extraction for localized CBIR: what you click is what you get. In: 4th International Conference on Computer Vision Theory and Applications, pp. 373–376 (2009)
32. Ponomarev, V.I., Pogrebniak, A.B.: Image enhancement by homomorphic filters. Appl. Digit. Image Process. XVIII 2564, 153–159 (1995)
33. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979)
34. Lucchese, L., Mitra, S.K.: Colour image segmentation: a state-of-the-art survey. Proc. Indian Natl. Sci. Acad. 67, 207–221 (2001)
35. Ng, H.F.: Automatic thresholding for defect detection. Pattern Recognit. Lett. 27, 1644–1649 (2006)
36. Luo, M., Ma, Y.-F., Zhang, H.-J.: A spatial constrained K-means approach to image segmentation. In: Joint Conference of International Conference on Information, Communications and Signal Processing, and Pacific Rim Conference on Multimedia, vol. 2, pp. 738–742 (2003)
37. Zhang, J., Hu, J.: Image segmentation based on 2D Otsu method with histogram analysis. In: Proceedings of the International Conference on Computer Science and Software Engineering, pp. 105–108 (2008)
38. Pappas, T.N.: An adaptive clustering algorithm for image segmentation. IEEE Trans. Signal Process. 40(4), 901–914 (1992)
39. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 3rd edn. Prentice-Hall, New Jersey (2002)
40. Liu, Z.: Investigations on Multi-Sensor Image System and Its Surveillance Applications. University of Ottawa, Ontario (2007)
41. O'Gorman, L., Sammon, M.J., Seul, M.: Contour-based detection. In: Practical Algorithms for Image Analysis, 2nd edn. Cambridge University Press, New York (2008)
42. Carpin, S.: Fast and accurate map merging for multi-robot systems. Auton. Robots 25(3), 305–316 (2008)
43. Weisberg, S.: Applied Linear Regression, 3rd edn. Wiley, New Jersey (2005)
44. Calderara, S., Piccinini, P., Cucchiara, R.: Smoke detection in video surveillance: a MoG model in the wavelet domain. In: 6th International Conference on Computer Vision Systems (ICVS), pp. 119–128 (2008)
45. Fazekas, S., Chetverikov, D.: Analysis and performance evaluation of optical flow features for dynamic texture recognition. Signal Process. Image Commun. 22, 680–691 (2007)
46. Xiong, Z., Caballero, R., Wang, H., Finn, A.M., Lelic, M.A., Peng, P.-Y.: Video-based smoke detection: possibilities, techniques, and challenges. In: Suppression and Detection Research and Applications (2007)
47. Toreyin, B.U., Dedeoglu, Y., Cetin, A.E.: Contour based smoke detection in video using wavelets. In: European Signal Processing Conference (2006)

Author Biographies

Steven Verstockt received his Master degree in Informatics from Ghent University in 2003. At the end of 2007 he joined the ELIT Lab of the University College West Flanders as a researcher. Since 2008, he has been a PhD student at the Multimedia Lab of the Department of Electronics and Information Systems of Ghent University - IBBT (Belgium). His research interests include video surveillance, computer vision and multi-sensor data fusion.

Chris Poppe received his Master degree in Industrial Sciences from KaHo Sint-Lieven, Belgium, in 2002 and his Master degree in Computer Science from Ghent University, Belgium, in 2004. He joined the Multimedia Lab, Department of Electronics and Information Systems (ELIS), Interdisciplinary Institute for Broadband Technology (IBBT), where he obtained his PhD degree in 2009. His research interests include video coding technologies, video analysis, and multimedia metadata extraction, processing and representation, with a strong focus on standardization processes.


Sofie Van Hoecke received her Master degree in Computer Science from Ghent University in 2003. Following up on her studies, she obtained a PhD in computer science engineering at the Department of Information Technology at the same university. Currently, she is a lecturer in ICT and research coordinator at the University College West Flanders. Her research concentrates on the design and performance modeling of distributed architectures, QoS-brokering of novel services, innovative ICT solutions for care, and multi-sensor surveillance.

Charles Hollemeersch received his B.Sc. and M.Sc. degrees from Ghent University, Belgium, in 2004. From 2004 to 2006 he worked as a graphics programmer in the computer game industry at Splash Damage, UK. In 2007 he returned to Belgium and joined BARCO n.v., where he worked on safety-critical avionics software. At the end of 2007 he joined the Multimedia Lab of the Department of Electronics and Information Systems of Ghent University - IBBT (Belgium) as a researcher. His research interests include GPU parallelization and optimization and computer graphics.

Bart Merci received his Master degree in Electro-Mechanical Engineering from Ghent University, Belgium, in 1997. He obtained his PhD degree for his research on numerical simulations of turbulent combustion at the Faculty of Engineering at Ghent University in 2000. He became Professor in 2004 and is now the head of the research unit 'Combustion, Fire and Fire Safety' at this Faculty. His research interests include fire safety engineering, as well as turbulence-chemistry interaction in turbulent flames.

Bart Sette received his Master degree in Electro-Mechanical Engineering at Ghent University, Belgium, in 1994. He obtained his PhD degree in 2005 for his work on the evaluation of uncertainty and the improvement of oxygen-depletion-based test methods. He joined Warringtonfiregent in 2008 as technical director. His research interests include numerical modelling of materials and their behaviour in fire, fire scene investigations and the evaluation of uncertainty of fire testing equipment.

Peter Lambert is a Technology Developer at the Multimedia Lab of Ghent University (Belgium). He received his Master's degrees in science (mathematics) and in applied informatics from Ghent University in 2001 and 2002, respectively, and he obtained his PhD degree in computer science in 2007 at the same university. His research interests include multimedia applications, (scalable) video coding technologies, multimedia content adaptation, and error robustness of digital video.

Rik Van de Walle received his M.Sc. and PhD degrees in Engineering from Ghent University, Belgium, in 1994 and 1998, respectively. After a visiting scholarship at the University of Arizona (Tucson, USA), he returned to Ghent University, where he became professor of multimedia systems and applications, and head of the Multimedia Lab. His current research interests include multimedia content delivery, presentation and archiving, coding and description of multimedia data, content adaptation, and interactive (mobile) multimedia applications.
