SEPARATING ILLUMINATION FROM REFLECTANCE IN COLOUR IMAGERY
Weihua Xiong M.Sc. Peking University, 1996
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
In the School of
Computing Science
© Weihua Xiong 2007
SIMON FRASER UNIVERSITY
Spring 2007
All rights reserved. This work may not be reproduced in whole or in part, by photocopy
or other means, without permission of the author.
APPROVAL
Name:
Degree:
Title of Thesis:
Examining Committee:
Chair:
Date Defended/Approved:
Weihua Xiong
Doctor of Philosophy
Separating Illumination from Reflectance in Colour Imagery
Dr. Greg Mori
Assistant Professor
Dr. Brian Funt Senior Supervisor Professor
Dr. Ghassan Hamarneh Supervisor Assistant Professor
Dr. Tim Lee SFU Examiner Adjunct Professor
Dr. Paul Hubel External Examiner Chief Image Scientist, Foveon Inc.
SIMON FRASER UNIVERSITY LIBRARY
DECLARATION OF PARTIAL COPYRIGHT LICENCE
The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.
The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the "Institutional Repository" link of the SFU Library website <www.lib.sfu.ca> at: <http://ir.lib.sfu.ca/handle/1892/112>) and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.
The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.
It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.
Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.
The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.
Simon Fraser University Library Burnaby, BC, Canada
Revised Spring 2007
ABSTRACT
As more people choose the convenience of colour imaging over
traditional greyscale imaging, colour has become an important and useful
feature in the computer vision community. However, an object's apparent
colour shifts when the illuminant colour changes, because any colour imaging
device's response to light from an imaged scene depends on three factors: the
nature of the illumination incident on the objects, the underlying physical
properties of the objects, and the sensor sensitivities of the imaging system
itself. Therefore, as the demands of emerging applications and the quality
expectations for existing applications continue to grow, accurate reproduction
of object colour becomes an increasingly critical issue.
This dissertation mainly addresses the problem of separating the
illumination from the reflectance and extracting the accurate colour of objects.
We explore three colour constancy solutions whose final goal is to estimate the
illumination colour from the image and recover the original colours of the
objects, assuming the scene is lit by one uniform illuminant. In particular, a
simple non-statistical estimation method is proposed that identifies grey
surfaces in a new colour coordinate system.
For scenes under multiple illuminants, we address the colour
constancy problem by extending the standard Retinex with spatial edges
detected using a stereo vision technique. The basic idea of stereo vision is to
infer the 3D structure and arrangement of a scene from two or more images
captured simultaneously from different viewpoints; requiring such stereo pairs
is often impractical, however, so we then present a novel hybrid colour
constancy solution for a single image under multiple illuminants.
Colour spectra are an efficient and accurate way of representing colour.
To reduce storage requirements and processing time, a finite-dimensional
model is applied to find the basis vectors and the corresponding coefficients.
In addition to principal component analysis (PCA) and independent component
analysis (ICA), two nonnegative techniques, Nonnegative Matrix Factorization
(NNMF) and Nonnegative ICA (NNICA), are also evaluated. We further propose
that the pseudo-inverse of the basis derived from these two nonnegative
techniques can be used as physically realizable camera sensors.
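The finite-dimensional idea can be illustrated with a minimal NNMF sketch using Lee and Seung's multiplicative updates (NumPy only). The toy spectra, matrix shapes, and the `nnmf` helper below are illustrative assumptions, not the thesis's actual data or implementation:

```python
import numpy as np

def nnmf(X, k, iters=500, seed=0):
    """Factor a nonnegative matrix X (spectra as rows) into W @ H,
    where W holds coefficients (n x k) and H holds k nonnegative
    basis vectors (k x bands), via Lee & Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-6
    H = rng.random((k, m)) + 1e-6
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy "spectra": 50 nonnegative mixtures of 3 smooth curves over 31 bands.
lam = np.linspace(0, 1, 31)
true_basis = np.stack([np.exp(-((lam - c) ** 2) / 0.02) for c in (0.2, 0.5, 0.8)])
coeffs = np.random.default_rng(1).random((50, 3))
X = coeffs @ true_basis

W, H = nnmf(X, k=3)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.4f}")
```

Because both factors stay elementwise nonnegative throughout the updates, the recovered basis vectors are candidates for physically realizable quantities, which is the property the nonnegative methods above are chosen for.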
ACKNOWLEDGEMENT
First, I would like to express my sincere thanks and appreciation to my
senior advisor, Dr. Brian Funt, for guidance, for providing me with excellent
facilities to pursue my goal, and for giving me help in my daily life throughout my
studies. I have learned a lot from him and enjoyed doing research with him.
I would like to express my gratitude to my supervisor, Prof. Ghassan
Hamarneh, for his insightful discussion and valuable knowledge.
I also express my gratitude to my colleagues, particularly to Mr. Lilong Shi
and Mr. Behnam Bastani, in my lab for the excellent ambiance that exists in our
laboratory, making it a very pleasant place to work. They provided support and
were always open to discuss technical or not-so-technical topics.
I am grateful to all my friends and their families, Yong Wang and Fang
Nan, Zengjian Hu and Rong Ge, Zhongmin Shi and Yingzi Wang, from Simon
Fraser University, for their continued moral support, care, and the happiness they
give to me.
All of my research has been funded by the School of Computing Science,
Simon Fraser University, the Natural Sciences and Engineering Research
Council of Canada, and the Samsung Advanced Institute of Technology. Their
support is gratefully acknowledged.
Special acknowledgement goes to my parents and my parents-in-law for
their unselfish support, which has accompanied me to this point. I also thank
my older sister and brother-in-law for their support of my studies over the
years.
Finally, and most importantly, I want to thank my son, for the joy he gives
me, and my wife, for her great contribution to my family, with all my heart. Their
support, encouragement, and companionship have turned my journey during
graduate life into a pleasure. For all that, and for being everything I am not, they
have my everlasting love.
TABLE OF CONTENTS
Approval ............................................................................................................. ii
Abstract ............................................................................................................. iii
Dedication ......................................................................................................... v
Acknowledgement ............................................................................................ vi
Table of Contents ............................................................................................. viii
List of Figures ................................................................................................... x
List of Tables .................................................................................................... xv
Chapter 1: Thesis Overview ............................................................................. 1
Chapter 2: Basics of Colour Vision and Colour Constancy .............................. 7
Chapter 3: Survey of Computational Colour Constancy Models ..................... 13
  3.1 Finite-Dimensional Linear Model for Colour Constancy ........................... 15
  3.2 Object Image Recovery ............................................................................ 17
    3.2.1 Retinex ............................................................................................... 18
    3.2.2 Gamut Mapping .................................................................................. 19
  3.3 Illumination Estimation for Colour Constancy .......................................... 20
    3.3.1 Unsupervised Illumination Estimation ................................................ 21
    3.3.2 Supervised Illumination Estimation .................................................... 25
  3.4 Multiplicative Cues to Illumination ........................................................... 28
Chapter 4: Colour Constancy under Uniform Illumination ............................... 32
  4.1 Introduction .............................................................................................. 32
  4.2 Illumination Chromaticity Estimation by Support Vector Regression ....... 33
    4.2.1 Support Vector Regression Introduction ............................................ 34
    4.2.2 SVR for Illumination Chromaticity Estimation .................................... 37
    4.2.3 Histogram Construction ..................................................................... 39
    4.2.4 K-Fold Cross Validation for SVR Parameters ................................... 40
  4.3 Illumination Colour Estimation Using Thin Plate Splines ......................... 42
    4.3.1 Thin Plate Spline Method Introduction ............................................... 43
  4.4 Illumination Colour Estimation by Gray Surface Identification ................. 45
    4.4.1 LIS Colour Coordinates ..................................................................... 46
    4.4.2 GSI Implementation ........................................................................... 49
  4.5 Experiments ............................................................................................. 52
    4.5.1 Error Measures .................................................................................. 53
    4.5.2 Synthetic Data Training, Real-Data Testing ...................................... 54
    4.5.3 Real Image Data Training, Real-Data Testing .................................. 57
  4.6 Discussion ............................................................................................... 66
Chapter 5: Stereo Retinex ............................................................................... 68
  5.1 Introduction .............................................................................................. 69
  5.2 Background .............................................................................................. 71
  5.3 Stereo Retinex Basics .............................................................................. 73
  5.4 Stereo Retinex in LIS Colour Coordinates ............................................... 75
  5.5 Implementation Details ............................................................................ 76
  5.6 Experiments ............................................................................................. 78
    5.6.1 Tests Using Synthetic Images ........................................................... 80
    5.6.2 Tests Using Real Images ................................................................... 83
  5.7 Retinex's Iteration Parameter .................................................................. 89
  5.8 Discussion ............................................................................................... 90
Chapter 6: Colour Constancy for Multiple-Illuminant Scenes Using Retinex and SVR ............................................................................................. 92
  6.1 Introduction .............................................................................................. 93
  6.2 Implementation Details ............................................................................ 94
    6.2.1 Synthetic Image Experiments ............................................................ 96
    6.2.2 Real Image Experiments ................................................................... 99
  6.3 Retinex Iteration Time ............................................................................. 105
  6.4 Discussion ............................................................................................... 106
Chapter 7: Independent Component Analysis and Nonnegative Linear Model Analysis of Illuminant and Reflectance Spectra ................................... 108
  7.1 Introduction .............................................................................................. 109
  7.2 Method ..................................................................................................... 110
  7.3 Results ..................................................................................................... 111
  7.4 Discussion ............................................................................................... 115
Chapter 8: Conclusion ..................................................................................... 117
References ...................................................................................................... 120
LIST OF FIGURES
Figure 1 Normalized Human Cones Response Curves (Data are from Simon Fraser University Colour Vision Lab) ......................................... 8
Figure 2 Receptor chromatic adaptation changes relative to cone sensitivity curves by shift from CIE D65 (Solid Line) to CIE A illuminant (Dashed Line) .................................................................... 10
Figure 3 The input data are shown in red. The linear Support Vector Regression function fitting the input data is shown as a black line. All data inside the insensitivity region are ignored. The support vectors are marked by red '+'. ........................................................... 36
Figure 4 Geometrical interpretation of SVR (after Figures 1 and 2 of Bi and Bennett [54]). The left panel shows the input data (squares) as a function of the multi-dimensional feature vector v, the corresponding output is a single value y. The regression line is found by making 2 copies of the data and shifting them equal amounts up and down relative to the original data. The regression (dotted) line is found as the bisector of the line (arrow) between the two closest points on the convex hulls of the shifted data sets. The right panel shows the regression line from the middle panel superimposed on the original data. .......................... 37
Figure 5 (Log R, Log G, Log B) obtained from three different surface reflectances under 102 illuminations at 15 various intensities. Each surface is plotted with a different colour. Each set lies close to a plane and the planes corresponding to the different surfaces are parallel. The three coloured lines indicate the new coordinate system in the log domain ..................................................................... 49
Figure 6 (a) Input image; (b) pixels identified as gray are shown in white corresponding to (a); (c) Corrected image based on GSI illumination estimation; (d) Corrected image based on GW illumination estimation ......................................................................... 52
Figure 7 Median angular error in illumination chromaticity as a function of increasing training set size ............................................................ 56
Figure 8 (a) The original data set contains 11346 images, but the illumination chromaticities cluster around gray (0.33, 0.33). (b) The reduced data set contains 7661 images with a more uniform distribution of illumination chromaticity. ............................................... 63
Figure 9 (a) Original image containing the gray ball from which the colour of the scene illumination is determined. (b) Cropped image to be used for algorithm testing with gray ball removed ...........................
Figure 10 (a) A synthetic scene composed of two patches. The blue one is lit by tungsten light from the left; the red one is lit by D65 from the right. (b) The image (monocular version) input to stereo Retinex. The red line is the spatial edge between them, inserted manually in this case. (c) Both patches appear gray after stereo Retinex because they are isolated surfaces. .................................
Figure 11 Rewrite rules used in propagating edge information to the next lower resolution. An edge running through the middle of a 2-by-2 region is randomly assigned to one side or the other. Vertical edges are shown here. Horizontal edges are treated analogously. ...................................................................................
Figure 12 (a) From the center pixel, the three shaded pixels in the upper right cannot be reached without crossing an edge. (b) The two pixels that cannot be reached are shaded. ....................................
Figure 13 Results for synthetic images containing only a single edge down the middle of the image. The illumination on the left half is tungsten, and on the right D65. The black line indicates the manually defined edge. (a) Input image; (b) The benchmark image; (c) Standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 result applied using the new LIS colour channels; (f) Stereo Retinex applied using the new LIS colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels. ..........................................................................
Figure 14 Irregular boundary between the two regions. The edge separating the regions is defined manually. (a) Input image; (b) the benchmark image; (c) standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 result applied using the LIS channels; (f) Stereo Retinex applied using the LIS channels ..........................................
Figure 15 Comparison of standard Retinex to stereo Retinex both in log RGB and in LIS coordinates operating on the image of a simple scene lit with bluish light from the left and reddish light from the right. (a) Input image of a two-illuminant scene; (b) The white-point adjusted benchmark image; (c) Standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 result applied to the new LIS colour channels; (f) Stereo Retinex applied in the new LIS colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels. ................................................
Figure 16 Edge map and recovered illumination: (a) Edges representing abrupt changes in surface orientation extracted from the stereo image pair are marked in white; (b) Chromaticity of illumination as estimated by stereo Retinex in LIS colour channels correctly shows a sharp change in illumination where the surface orientation changes; (c) Illumination field recovered by McCann99 shows a much less distinct change in illumination ............. 85
Figure 17 Real image performance comparison. (a) Input image of a two-illuminant scene of toys with uniform background illuminated with reddish light from the left and bluish from the right; (b) White-point adjusted benchmark image; (c) Edge map in which the arrow indicates where edges completely isolate the toy's green tongue from all other regions; (d) Standard McCann99 applied in log RGB space; (e) Stereo Retinex applied using log RGB space, the isolated small patch turns gray; (f) McCann99 result applied to the channels of the new colour coordinate system; (g) Stereo Retinex applied in the new colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels, the isolated small patch is close to the green it should be as in (b). (h)-(k) Error maps corresponding to the results from (d)-(g) in which large errors are shown as dark and zero error as white ............................................................................ 86
Figure 18 Real image performance comparison. (a) Input image of a two-illuminant scene of a toy against a colourful background lit by red light from the left-hand side and blue light from the right; (b) the white-point adjusted benchmark image; (c) standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 result applied to LIS colour channels; (f) Stereo Retinex applied in LIS colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels. (g)-(j) Error maps corresponding to the results from (c)-(f) in which large errors are shown as dark and zero error as white .....................................................
Figure 19 Real-image performance comparison. (a) Input image of a single-illuminant scene of books illuminated solely by reddish light from the right; (b) The white-point adjusted benchmark image; (c) standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 applied in LIS colour channels; (f) Stereo Retinex applied in LIS space with 3D edge information inhibiting propagation only within the illumination and intensity channels. Note how the colours of the orange and yellow patches on the ball are recovered better in this case. Also the pink illumination cast is removed more completely. (g)-(j) Error maps corresponding to the results from
(c)-(f) in which large errors are shown as dark, and zero error as white. ................................................................................................ 88
Figure 20 Median angular error as a function of the number of Retinex's iterations parameter. The number of iterations affects the distance with which lightness information propagates across the image. Results here are for processing Figure 10, but the trend is the same for the other scenes as well ......................................................... 90
Figure 21 Synthetic image results. Top left: input image with a white line superimposed to indicate the illumination boundary. Top right: ground-truth image under equal energy white light. Middle left: Retinex result. Middle right: Retinex illumination map; Bottom left: SVR result. Bottom right: Retinex+SVR result. ............................. 98
Figure 22 Two-illuminant books scene: (a) input image with reddish light coming from the left and bluish from the right; (b) ground-truth image captured under white light matching the camera's white point; (c) Retinex result (d) SVR result (e) Retinex+SVR result ........ 101
Figure 23 Window scene: (a) input image with bluish outdoor illumination and red-orange indoor illumination; (b) ground-truth image captured under white light that matches the camera's white point; (c) Retinex result; (d) SVR result; (e) Retinex+SVR result ............................................................ 103
Figure 24 Typical natural image with two illuminations: (a) input image; (b) Retinex result; (c) SVR result; (d) Retinex+SVR result ...................... 105
Figure 25 Median angular error as a function of the number of iterations Retinex used at each resolution. This plot is for the two- illuminant window scene; however, for other scenes the results are qualitatively similar. ................................................................... 106
Figure 26 First 3 basis vectors for surface reflectance, illumination and colour signal spectra as obtained by ICA, PCA, NNMF and NNICA. The horizontal axis is wavelength. The vertical axis is in terms of normalized power (illumination and colour signal) or fractional reflectance. ...................................................................
Figure 27 The pseudo-inverse of the surface reflectance, illumination and colour signal basis vectors. The horizontal axis is wavelength; the horizontal line marks zero. Physically realizable sensors approximating these pseudo-inverses can be based on the portion of each curve on or above the zero line. Clearly, the approximation is likely to be best in the case of NNICA and NNMF. ..........................................................................................
Figure 28 Mean RMS error in spectral approximation (MRMS error) for surface reflectances, illuminations, and colour signals in the test set for each of the four methods as a function of the number of basis vectors used. .......................................................................
Figure 29 (a) A comparison of the mean RMS error in reconstructing the colour signal spectra with the actual and truncated pseudo-inverse vectors for the case of NNMF and PCA. Without truncation the NNMF and PCA results overlap (lowest curve); however, with truncation the PCA error increases substantially (with the exception of dimension 7) while the NNMF error increases marginally. (b) A comparison of the mean RMS error approximating colour signal spectra for all four methods using the truncated pseudo-inverse. ........................................................ 115
LIST OF TABLES
Table 1 Admissible Kernel Functions .............................................................. 40
Table 2 Results of k-fold kernel and parameter selection as a function of the histogram type and the number of training set images in SVR solutions ............................................................................................. 56
Table 3 Comparison of competing illumination estimation methods. All methods are trained on synthetic images constructed from the same reflectance and illuminant spectra and then tested on the same SONY DXC930 [55] camera images with identical pre-processing. Data marked by '*' are extracted from [29] (Table II page 992) while the data marked by '**' are extracted from [67] (Table 2 page 79). .............................................................................. 57
Table 4 Comparison of TPS, GSI, 2D and 3D SVR performance to SoG, Max RGB, Grayworld performance. The results involve real-data training and testing on the 321 SONY images. Errors are based on leave-one-out cross validation evaluation and are reported in terms of both the RMS angular chromaticity and distance error measures ............................................................................................. 59
Table 5 Comparison of the different algorithms via the Wilcoxon signed-rank test. A '+' means the algorithm listed in the corresponding row is better than the one in the corresponding column; a '-' indicates the opposite; an '=' indicates that the performance of the respective algorithms is statistically equivalent .............................. 59
Table 6 Comparison of TPS, GSI, 2D and 3D SVR performance to SoG, Max RGB, Grayworld performance. The results involve real-data training and testing on the 900 uncalibrated images. The tests are based on leave-one-out cross-validation on a database of 900 uncalibrated images. The entries for C-by-C and the NN are from [4] (Table 7 page 2385). ........................................................... 61
Table 7 Comparison of the performance based on the Wilcoxon signed-rank test. Labeling '+', '-', '=' as for Table 5 ........................................... 61
Table 8 TPS, GSI and 3D SVR illumination estimation errors for different training and test sets with comparisons to the SoG with norm power 6, Max RGB, and Grayworld methods. ..................................... 64
Table 9 Comparison of the algorithms based on the Wilcoxon signed-rank test on angular error. SVR training set is Subset B. Test set for all methods is Subset A. Labeling '+', '-', '=' as for Table 5 .................. 65
Table 10 Comparison of the algorithms based on the Wilcoxon signed-rank test on angular error. SVR training set is Subset B. Test set for all methods is Subset A. Labeling '+', '-', '=' as for Table 5 .............. 65
Table 11 Comparison of TPS,GSI and 3D SVR to SoG, Max RGB, and Grayworld. The results involve real-data training and testing on disjoint sets of 7,661 images from the Ciurea data set. ....................... 65
Table 12 Comparison of the algorithms based on the Wilcoxon signed-rank test on angular error. The results involve real-data training and testing on disjoint sets of 7,661 images from the Ciurea data set. Test set for all methods is Subset A. Labeling '+', '-', '=' as for Table 5. ....................................................................................... 66
Table 13 Performance comparison of the synthetic image cases from Figure 5 with straight edge boundary, and Figure 6 with an irregular edge boundary, of SR LIS (stereo Retinex processed using LIS colour channels); SR (stereo Retinex processed using log RGB space); M99 LIS (McCann99 Retinex processed in the new LIS colour channels); and M99 (McCann99 Retinex processed using log RGB space) .......................................................................... 82
Table 14 Two-illuminant real image performance comparison of SR LIS (stereo Retinex processed using LIS colour channels); SR (stereo Retinex processed in log RGB space); M99 LIS (McCann99 Retinex processed in the new LIS colour channels); and M99 (McCann99 Retinex processed in log RGB space). ......................... 84
Table 15 Two-illuminant image of a toy with gray background: performance comparison between SR LIS (stereo Retinex processed using the new LIS colour channels); SR (stereo Retinex processed in log RGB space); M99 LIS (McCann99 Retinex processed in LIS colour channels); and M99 (McCann99 Retinex processed in log RGB space). ..................................................................................
Table 16 Two-illuminant image of a toy against a colourful background: performance comparison between SR LIS (stereo Retinex processed using the new LIS colour channels); SR (stereo Retinex processed in log RGB space); M99 LIS (McCann99 Retinex processed in LIS colour channels); and M99 (McCann99 Retinex processed in log RGB space). .......................................................
Table 17 Single-illuminant real image books scene performance comparison between SR LIS (stereo Retinex processed using the new LIS colour channels); SR (stereo Retinex processed in log RGB space); M99 LIS (McCann99 Retinex processed in LIS space); and M99 (McCann99 Retinex processed in log RGB space) .................................................................................................. 89
Table 18 Comparison of MMax (see text for definition), RMS and median error on a per-pixel basis between the ground-truth image values and the processed image values for processing by Retinex+SVR, McCann99 Retinex alone, and SVR alone ................... 98
Table 19 Comparison of the different methods via the Wilcoxon signed- rank test with 0.01 as the threshold applied to the angular errors. A "+" means the algorithm listed in the corresponding row is better than the one in the corresponding column. A "-" indicates the opposite. ................................................................................. 99
Table 20 Comparison for the two-illuminant books scene of MMax (see text for definition), RMS and median errors measured on a pixel- by-pixel basis between the ground-truth image values and the processed image values for processing by Retinex+SVR, Retinex alone, and SVR alone ........................................................... 101
Table 21 Comparison of the different methods via the Wilcoxon signed-rank test for the two-illuminant books scene. A "+" means the method listed in the corresponding row is better than the one in the corresponding column; a "-" indicates the opposite; and a "=" indicates they are indistinguishable. ................................................ 102
Table 22 Comparison of MMax (see text for definition), RMS and median errors measured on a pixel-by-pixel basis between the ground-truth image values and the processed image values for processing by Retinex+SVR, Retinex alone, and SVR alone .... 103
Table 23 Comparison of the different methods via the Wilcoxon signed-rank test for the window scene. A "+" means the method listed in the corresponding row is better than the one in the corresponding column. A "-" indicates the opposite. .... 103
CHAPTER 1: THESIS OVERVIEW
In machine vision and image processing applications, colour is often used
as an efficient means of segmenting, identifying, and tracking a specific object.
Although colour in and of itself is often insufficient to perform such a task reliably,
colour can be used robustly in conjunction with other features. Therefore, colour must be a convenient and reliable tool for extracting information from images.
The colour responses captured by any digital imaging device result from
the interactions of the properties of the original object's surface reflectance, the
properties of the illuminant incident on the object surface, and the properties of
the camera sensors. Thus, colour information is stable when all of the images are
captured with a single camera and under uniform illumination conditions.
However, problems arise when the capturing conditions change. For example, a white surface appears reddish when captured under tungsten illumination and takes on a greenish tone under fluorescent lighting.
As a consequence, any imaging system that uses colour to extract knowledge from images must recover the surface colour and reduce the colour variation that appears across different views of the same scene or object. Such processes are classically termed colour constancy.
This thesis is devoted to recovering accurate surface colour: it proposes colour constancy algorithms that compensate for the colour variation caused by changes in the illumination conditions, and it analyzes basis functions for modelling colour spectra.
Chapter 2 introduces the basic concepts and issues of colour vision and
colour constancy. The colour perception process starts with a source (or
sources) of light which has a specific distribution of energy over the wavelengths
of the visible spectrum. The light is reflected off the objects in the surrounding
environment, and each object reflects a fixed percentage of the energy at each
wavelength (the surface spectral reflectance or reflectance). Some of it enters
the eye of the observer where it is (selectively) absorbed by the cone pigments.
The cone output results from the response of the three human cone types to the
colour signal and is subject to further processing in both the retina and various
cortical areas. Therefore, the information about the characteristics of objects in the scene carried by the colour signal varies with the illuminant. However, human colour perception includes a chromatic adaptation mechanism that can recover approximately illumination-invariant surface colour descriptors. This is the basis of colour constancy.
Chapter 3 surveys the wide field of colour constancy, trying to encompass the most interesting and important algorithms of which our research is aware. Some researchers are interested in finding a transformation between image colours that makes them resemble, as closely as possible, those under a reference light condition; Retinex and Gamut Mapping fall into this group. Others restrict colour constancy to the estimation of the scene illumination: Grayworld, Shades of Gray, Neural Network, and Colour by Correlation [1-5] belong to this category. These two types of approaches are fully interchangeable once a model of colour formation and variation is specified.
Chapter 4 is completely devoted to the introduction of the proposed three
colour constancy algorithms. Their main goal is to estimate the illumination colour
so that the images can be recovered as they would be seen under a canonical
illumination.
Considering that there exists a connection between image colours and illumination colours, we use two techniques, Support Vector Regression (SVR) and Thin Plate Spline (TPS) interpolation, to find a continuous function between the colour information of any image and its illumination chromaticity values. As will be seen in section 4.2, SVR has a number of similarities to previous
colour constancy solutions. The basic idea is inherited from the Neural Network
approach, namely, extracting the relationship between the illumination
chromaticity value and the binary histogram of image colours. Nevertheless, SVR is simpler and better because it reaches a globally optimal solution without knowing the data distribution. The thin-plate spline interpolation technique is then
introduced to interpolate the colour of the incident scene illumination from an
image of the scene. TPS is a smooth function that interpolates a curve fixed at
the landmark points. It was originally developed for 2D image registration. Here
we extend it into high-dimensions to interpolate over a non-uniformly sampled
input space, which in this case is a set of training images and associated
illumination chromaticities. Compared with SVR, TPS is independent of any predefined parameters.
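The binary colour histogram feature that the regression operates on can be sketched as follows. This is a minimal NumPy illustration: the bin count, the (r, g) chromaticity space, and the random test image are assumptions made for the sketch, not the exact configuration used in the thesis.

```python
import numpy as np

def binary_chromaticity_histogram(rgb_image, bins=16):
    """Binarized chromaticity histogram used as the regression input:
    each (r, g) chromaticity bin is 1 if any image pixel falls in it, else 0."""
    pixels = rgb_image.reshape(-1, 3).astype(float)
    total = pixels.sum(axis=1)
    valid = total > 0                       # ignore pure-black pixels
    r = pixels[valid, 0] / total[valid]
    g = pixels[valid, 1] / total[valid]
    hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0, 1], [0, 1]])
    return (hist > 0).astype(float).ravel()   # binary occupancy, flattened

# Illustrative use: a regressor such as SVR would be trained to map this
# feature vector to the known illumination chromaticity of each training image.
rng = np.random.default_rng(0)
image = rng.integers(1, 255, size=(8, 8, 3))
feature = binary_chromaticity_histogram(image)
```

The binary (rather than count-based) histogram records only which colours occur, making the feature insensitive to how much of each colour appears in the scene.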
The Gray-World colour constancy solution assumes that the average of
the colorimetric values from the image is the illumination colour. The Gray-World
algorithm also implies that all image colours contribute equally to the illumination estimate. A more reasonable assumption is the following: the closer a surface is to gray, the more its colour should contribute to the illumination estimate. We try to identify those gray surfaces by
deriving a new colour channel coordinate system that can separate the surface
information from illumination and intensity as independently as possible. This method proves to be fast and easy to implement, without requiring large amounts of training data.
Experiments on synthetic and real images show that all three methods perform well and are comparable to other well-known algorithms.
Chapter 5 proposes some novel mathematical models for the colour constancy problem under multiple illuminants. Colour constancy solutions usually assume that the chromaticity of the scene illumination is constant across the image, and that any change in colour value is due to the surface reflectance rather than the illumination. However, this does not hold in real three-dimensional scenes, where an abrupt change in surface orientation may lead to an illumination change.
In this chapter, to improve surface colour estimation that may be lit under
different illuminations, we integrate Retinex with spatial-edge information
extracted from stereo images. The basic idea is that the local comparison of neighbouring pixels on which Retinex relies is prohibited across spatial edges.
Meanwhile, these spatial edges can also lead to isolated patches that tend to be
gray after Retinex. Therefore, we further apply stereo Retinex in the new colour
channels described in the previous chapter, in which the ratio comparison is
allowed along the axis representing the surface change while the comparison is
still prohibited along the other two axes. The experiments show that stereo
Retinex outperforms standard Retinex in estimating accurate surface colour.
Chapter 6 continues the research on multi-illumination colour constancy
and tries to find a solution that can solve the problems from 'stereo Retinex'
introduced in the last chapter. There are two major disadvantages of stereo
Retinex. First, it requires stereo images, i.e., two or more images of the same scene captured at the same time, which is impractical. Second, different illuminants are not always separated by spatial edges or changes in surface orientation, so we cannot easily identify where the illumination changes occur. To avoid these problems, this chapter gives a more
efficient solution on a single image by merging the benefits of two colour
constancy solutions, Retinex and SVR. For any scene under two or more
Retinex can mitigate the illumination differences, making the illumination more uniform, because it is based on the local comparison of neighbouring pixels. It is then followed by SVR, which cancels out the illumination's effect globally. The
experiments with synthetic and real scenes indicate that this kind of hybrid
solution is very promising.
Chapter 7 presents the research on finite dimensional models for colour
spectra. An optimal imaging sensor sensitivity is also discussed. It is well known that colour can always be described by tri-component vectors whose values come from the projection of the colour spectrum onto the imaging device's three spectral response curves. Multispectral imaging, which provides the colour
spectra of a scene at multiple wavelengths can also generate accurate colour
information at each image pixel. However, using colour spectra requires storing
and processing lots of information. Therefore, it is necessary to represent the
spectra as a linear combination of a few principal spectra. Principal Component
Analysis (PCA) and Independent Component Analysis (ICA) have been studied
by many researchers. In this chapter, we introduce and analyze two other
nonnegative techniques, Nonnegative Matrix Factorization and Nonnegative ICA,
in finding basis vectors for finite-dimensional models of colour spectra. Another interesting aspect of these two nonnegative techniques is that the pseudo-inverse of the basis vectors includes only negligible negative values. When we truncate these negative values, the resulting vectors can serve as physically realizable camera sensors that capture maximal colour spectral information.
CHAPTER 2: BASICS OF COLOUR VISION AND COLOUR CONSTANCY
Colour perception is a sensation created in response to excitation of our
visual system by the visible region of the electromagnetic spectrum. James Clerk
Maxwell [6] showed that light is essentially a form of electromagnetic radiation, which also includes radio waves and X-rays. All of these radiations can be arranged along the electromagnetic spectrum, with radio waves at one end and gamma rays at the other. The range of visible wavelengths differs among species. For humans, the visible spectrum occupies a very small portion of the electromagnetic spectrum, ranging from approximately 400 nm to 700 nm.
The human visual system consists of two important functional parts: the
eyes and part of the brain. The eyes detect light and convert it to electrical
signals via photoreceptors in the retina, while the brain does all of the complex image processing. The human retina has two types of photoreceptor cells: cones and rods. We can distinguish colours because we have three distinct types of cones that separately sense three different portions of the spectrum; their peak sensitivities are identified as red (580 nm), green (540 nm), and blue (480 nm). Rods are not sensitive to colour and are responsible for dark-adapted vision.
Figure 1 Normalized Human Cones Response Curves (Data are from Simon Fraser University Colour Vision Lab)
The colour we assign an object depends on three components: the light
source that initiates the sensory process of vision, the object that absorbs and
reflects the light source at the selected wavelengths determined by physical and
chemical properties, and the human photoreceptors or imaging devices.
One consequence of this fact is that any change in the illuminant will result
in a corresponding change in the colour generated by each object in the scene.
Since natural and artificial lights vary substantially in both intensity and spectral
characteristics, the colour from a surface with a fixed reflectance can change
from one lighting condition to another. However, the same is not true of our colour perception. When we bring a red apple under different light conditions (sunlight, the light of a fire, or a harsh electric light), although the colour values recorded by an imaging device differ, our visual system indicates that the object has the same colour: the apple appears red. This shows that, despite the resulting
differences existing in the light reflected from a given object under distinct
illumination conditions, the colour that our visual system assigns to the object is
illuminant-independent. This kind of ability that can adjust to widely varying
colours of illumination in order to approximately preserve the appearance of
object colours is called chromatic adaptation [7].
Chromatic adaptation was defined by Wyszecki and Stiles in 1982 [8].
They proposed that the change in the visual response to a colour stimulus is
caused by (a) previous exposure to a conditioning stimulus (such as a luminous
coloured light or intensely coloured surface) or (b) simultaneous presentation of
the colour stimulus against a surround or background of a different colour. During
chromatic adaptation, a significant part appears to take place in the
photoreceptors and cortex, either as a change in the individual sensitivity curves
or in the response of the retinal secondary cells to human cones' outputs.
Figure 2 shows an example of shifts that occur as the illuminant changes
from daylight D65 to incandescent A. The increase in long wavelength light
bleaches proportionally more red information while lowering the green and blue
responses. On the other hand, the decrease in short wavelength light allows
more cones that are sensitive to short wavelength range to regenerate,
increasing the probability of blue responses.
Figure 2 Receptor chromatic adaptation changes relative to cone sensitivity curves by shift from CIE D65 (Solid Line) to CIE A illuminant (Dashed Line)
The first mathematical model of chromatic adaptation was proposed by Johannes von Kries in 1902 [9] and is widely known as the 'von Kries transformation model.' Many modified chromatic adaptation transformation models have also been described in the literature [10-11]. To distinguish these quantities from the [R G B] colour values of a captured image, here we use [R_L G_M B_S] to denote the human cone responses to the long-, middle-, and short-wavelength light from a coloured surface; maximum stimulation is produced by a bright white surface.
Thus, the basic adaptation function is the ratio:

[R_L / R_L-white,  G_M / G_M-white,  B_S / B_S-white]    (1)

where [R_L G_M B_S] are the cone responses under a particular illumination condition, and [R_L-white G_M-white B_S-white] are the responses to the same scene under white illumination. Because the chromaticity values of the white patch represent the illumination's colour information, the basic adaptation can also be applied to predict colour matches across changes in viewing illumination conditions. For example, if the illumination shifts from daylight D65 to illuminant A:

R_L-A / R_L-wp(A) = R_L-D65 / R_L-wp(D65)
G_M-A / G_M-wp(A) = G_M-D65 / G_M-wp(D65)    (2)
B_S-A / B_S-wp(A) = B_S-D65 / B_S-wp(D65)

where [R_L-A G_M-A B_S-A] and [R_L-D65 G_M-D65 B_S-D65] are the cone responses to a surface of any colour under illuminants A and D65, respectively, and [R_L-wp(A) G_M-wp(A) B_S-wp(A)] and [R_L-wp(D65) G_M-wp(D65) B_S-wp(D65)] are the responses to a white patch under illuminants A and D65, respectively.
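The von Kries ratio rule can be sketched numerically. The cone responses and white-patch values below are illustrative numbers, not measured data; the function simply applies the per-channel ratio rule of equation (2), rearranged to predict responses under the destination illuminant.

```python
import numpy as np

def von_kries_adapt(responses, white_src, white_dst):
    """Map sensor/cone responses recorded under a source illuminant to the
    responses predicted under a destination illuminant, by scaling each
    channel with the ratio of the two white-patch responses."""
    responses = np.asarray(responses, dtype=float)
    gains = np.asarray(white_dst, dtype=float) / np.asarray(white_src, dtype=float)
    return responses * gains

# Illustrative values only: a surface seen under illuminant A, plus the white
# patch responses under A (reddish tungsten) and D65 (near-neutral daylight).
surface_under_A = np.array([0.80, 0.55, 0.30])
white_under_A = np.array([1.00, 0.85, 0.55])
white_under_D65 = np.array([0.95, 1.00, 1.08])

predicted_under_D65 = von_kries_adapt(surface_under_A, white_under_A, white_under_D65)
```

The per-channel gains leave the ratios of equation (2) invariant, which is exactly what the adaptation model requires.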
Computational colour constancy is an important research field that builds
mathematical models related to human chromatic adaptation. The final objective
of the model is to take full advantage of the colour information available in the typical tri-chromatic scene colours and reproduce an accurate colour-constant
estimation of the object. There are many definitions and explanations of colour constancy. According to the definition of Foster et al. [12]: "Colour constancy is the constancy of the perceived colours of surfaces under changes in the intensity and spectral composition of the illumination." To date, many
theories of colour constancy explain how the visual system manages to extract
information about the reflectance of the objects in a scene from the colour
signals. Since this involves separating the contribution of the reflectance and the
illuminant in the colour signal, these theories are often characterized as
"discounting the illuminant." Perfect colour constancy in these terms would
involve accurate recovery of reflectance for any scene under any lighting
conditions. The measured colour of objects would be perfectly correlated with
their reflection characteristics and would not vary at all with changes in the
illuminant or the composition and arrangement of objects in view. However, this type of perfect colour constancy is impossible, since the problem is under-constrained.
CHAPTER 3: SURVEY OF COMPUTATIONAL COLOUR CONSTANCY MODELS
The first stage in analyzing the imaging system is to build a mathematical model embodying the predominant phenomena occurring in the formation of colour images. Therefore, the light source, the object, and the optical system should all be quantified. The light source is represented by its spectral power distribution E(λ), and coloured materials are quantified through the spectral distribution S_R(λ) of the energy they reflect or transmit. The optical system is specified by the spectral sensitivity functions O_k(λ). A general visual system can be seen as an array of k sensors. Since the human retina has (and most imaging devices have) three types of cones (or sensors) that respond to colour radiation with different spectral response curves, k is usually set to 3, and colour is specified by a tri-component vector.
Basically, two major processes are involved in colour formation: the colour
signal reaction on the object's surfaces and the camera measurement of the
colour signal coming from the reaction. For the first process, it is necessary to
describe the mutual interaction between light and object, called the colour signal. The colour signal is the product of the light's spectral power distribution and the surface reflectance at each wavelength, written as C(λ) = E(λ)·S_R(λ). For the second process, the way a sensor integrates the colour signal falling onto the image plane over all visible wavelengths must be established, as defined in equation (3).
P_k = ∫_[λ1, λ2] C(λ) O_k(λ) dλ    (3)

where [λ1, λ2] is the interval over which these sensors operate.
After sampling the wavelength at fixed interval, the whole equation can
also be rewritten in matrix format as:
P = SR^T * diag(E) * O    (4)

where P is the tri-vector formed by the R, G, and B values, * is matrix multiplication, and T denotes the vector transpose. Let SN be the number of wavelength samples; C, SR and E are the vectorized colour signal, surface reflectance function, and illumination spectrum, respectively, each of size SN. diag(E) turns the vector E into an SN-by-SN matrix by placing the elements of E along the diagonal and setting all off-diagonal elements to 0. The matrix O is formed by the three vectors O_k, each of length SN, arranged column by column.
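A minimal numerical sketch of the matrix formulation in equation (4); the Gaussian sensor curves and the smooth reflectance below are assumptions standing in for real spectra, chosen only for illustration.

```python
import numpy as np

SN = 31                                    # wavelength samples, 400-700 nm
wavelengths = np.linspace(400, 700, SN)

E = np.ones(SN)                            # equal-energy illuminant spectrum
SR = 0.2 + 0.6 * np.exp(-((wavelengths - 600) / 60.0) ** 2)  # reddish surface

# Three Gaussian sensor response curves O_k, stacked column by column.
O = np.stack([np.exp(-((wavelengths - peak) / 40.0) ** 2)
              for peak in (600, 540, 470)], axis=1)

# Equation (4): P = SR^T * diag(E) * O gives the three sensor responses.
P = SR @ np.diag(E) @ O

# Sanity check: each response is the sampled integral of C(lambda)*O_k(lambda),
# with the colour signal C = E * SR.
C = E * SR
P_check = np.array([np.sum(C * O[:, k]) for k in range(3)])
```

The matrix form and the per-sensor sampled integral agree term by term, since diag(E) simply multiplies each wavelength sample of SR by the illuminant before projecting onto the sensor curves.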
Colour formation determines that colour constancy is an ill-posed problem
because the surface colour and illumination colour are not uniquely separable.
So, depending on different assumptions and techniques, colour constancy
algorithms can be divided into the following three categories: (a) estimate the
illumination and surface reflectance distribution functions based on the
assumption about the dimensionality of spectral basis functions; (b) find the
image under canonical illumination, i.e., the object image, directly based on the
variation of surface and illumination colours and gamut; and (c) estimate the
illumination colour based on the assumption about scene colour distribution.
Barnard et al. compare the performance of various colour constancy solutions [14,15].
3.1 Finite-Dimensional Linear Model for Colour Constancy
This kind of colour constancy algorithm supposes that illumination and surface reflectance can be accurately modelled with low-dimensional spectral basis functions [15]. One of the most important works was that of Maloney and Wandell [16,17].
Following the notation introduced in equation (3), the ambient light spectral power distribution and surface reflectance functions can be written as linear models over basis functions. Assuming the light can be represented by D(E) basis functions E_i(λ) (i = 1, 2, ..., D(E)) with corresponding weights w_E-i, the light can be approximated by

E(λ) ≈ Σ_{i=1..D(E)} w_E-i * E_i(λ)    (5)

Similarly, the surface reflectance can be defined as

S_R(λ) ≈ Σ_{i=1..D(SR)} w_SR-i * SR_i(λ)    (6)

where SR_i(λ) (i = 1, 2, ..., D(SR)) are surface basis functions of dimensionality D(SR), and w_SR-i are the corresponding weights.

Substituting the surface linear model into equation (4) (matrix format), the sensor responses become:

P = A_E * w_SR    (7)

The ik-entry of the matrix A_E is SR_i^T * diag(E) * O_k, in which SR_i is the sampled vector form of the i-th surface basis function, w_SR is formed by the weights w_SR-i, and '*' denotes matrix multiplication. So for any given illumination E, there is a linear relationship between the sensor responses and the surface basis function coefficients. Similarly, for any fixed surface reflectance, the sensor responses are determined linearly by the illumination basis function coefficients:

P = A_SR * w_E    (8)

where the ik-entry of A_SR is SR^T * diag(E_i) * O_k, and w_E is formed by the corresponding illumination basis function weights w_E-i.
To avoid the solution being under-determined and obtain the unique
surface recovery, two limitations are imposed on the variations of lights and
surfaces: (a) the ambient light is relatively constant, while the spatial variation in
the sensor response is entirely due to the surface reflectance; and (b) the
illumination dimensionality is k and the surface dimensionality is k-1 when there are k sensors. Otherwise, assuming there are q surfaces in the scene, we would have at least k×q + k unknowns but only k×q equations; the number of unknowns would always exceed the number of equations, so a unique solution would be impossible.
Maloney's colour constancy proceeds in three major steps: (a) identify the subspace that contains the set of sensor vectors, since the sensor responses can be viewed as the illumination coefficient weights projected onto it (from equation 8); (b) recover the light vector from the vector perpendicular to the sensor data; and (c) once the light is known, solve for the surface reflectance coefficients as w_SR = pinv(A_E) * P, using the conventional pseudoinverse computation for the matrix A_E. Further details about the implementation of the algorithm can be found in [16,17].
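Step (c) can be sketched numerically. The basis functions and sensor curves below are random stand-ins, not real spectra; the point is only that, once the light (and hence the matrix A_E) is known, the surface weights follow from a pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(1)
SN, k = 31, 3                  # wavelength samples, number of sensors
D_SR = k - 1                   # surface dimensionality k-1, per the constraint above

# Random stand-ins for the sampled basis functions and sensor curves.
SR_basis = rng.random((SN, D_SR))   # columns: surface basis vectors SR_i
E = rng.random(SN)                  # the (recovered) illuminant spectrum
O = rng.random((SN, k))             # columns: sensor curves O_k

# A_E from equation (7): its ik-entry is SR_i^T * diag(E) * O_k.
A_E = O.T @ np.diag(E) @ SR_basis   # shape k x D(SR)

# Forward model: responses of a surface with known weights w_SR.
w_true = np.array([0.7, 0.3])
P = A_E @ w_true

# Step (c): recover the surface weights with the pseudoinverse of A_E.
w_recovered = np.linalg.pinv(A_E) @ P
```

With k = 3 sensors and D(SR) = 2 surface dimensions, A_E is a tall full-rank matrix, so the pseudoinverse recovers the weights exactly whenever the responses were generated by the model.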
Although this solution shows good performance on the Munsell chip
database, it is not practical for the real scenes for two reasons. First, the
illumination and surface should be represented by 2 and 3 basis functions,
respectively, which have been shown to be inaccurate by many researchers.
Cohen [18] found that Munsell colours depend on 3 or more components. Maloney [19] concluded that 5 to 7 basis functions are appropriate. Parkkinen [20] analyzed 1257 reflectance spectra and suggested that 8 basis functions can lead to accurate reproduction. Laamanen analyzed two illumination and surface reflectance datasets and demonstrated that at least 10 basis functions are needed [21]. Second, the variation in surface colour in a three-dimensional colour
space would follow a plane, but these assumptions can only be true under
specifically controlled illumination.
3.2 Object Image Recovery
The object colour image can be viewed as the object under certain
canonical illumination, normally the white illumination with equal energy at all
wavelengths. So given any colour image under unknown illumination,
compensation for the illumination effect on images and recovery of the original
object image is another type of colour constancy solution.
3.2.1 Retinex
Retinex is one of the most famous colour constancy algorithms. It
originated from Land's landmark research work on human vision [22]. Land
proposed that the absolute values of photo-pigment absorption in the eye do not
explain colour appearance. Rather, colour appearance depends on relative
absorption of light by the cones and their spatial pattern in the eye, making vision
independent of the illumination at various locations and dependent instead on the
path followed by the light reaching the eye. He named it 'Retinex' because he believed that this mechanism arises from the combination of the 'retina' and the 'cortex.'
Given a colour image, the basic idea of Retinex is to separate the illumination from the reflectance image by processing the three channel images I_k (k = R, G, B) independently. If the sensor sensitivity function O is sharp enough, i.e., close to a Dirac delta function, the intensity value I_k(x,y) at location (x,y) can be decomposed into two factors, the illumination image E_k(x,y) and the reflectance image SR_k(x,y), so that I_k(x,y) = E_k(x,y) * SR_k(x,y). Retinex assumes spatial smoothness of the illumination field, i.e., the illumination changes smoothly across the scene, while the reflectance image accounts for the sharp changes in the image.
Retinex computation is always implemented in the log domain so that the
multiplications can be replaced by the additions. If i = log I, e = log E, and sr = log
SR, we have i = e + sr. All modern Retinex algorithms share the same processing framework and differ only in the actual illumination estimation step.
The original Retinex algorithm was also proposed by Land. In his solution,
any pixel is selected as the starting pixel. Several paths from the pixel can be
formed by randomly selecting neighbouring pixels. Along each path, an accumulator of the differences between successive neighbouring pixels is updated at each pixel, and the number of pixels along a path is its 'path length'. The final recovered object image is obtained by dividing the accumulator value at each pixel by the total number of paths passing through it. Therefore, parameters such as the path length, the number of paths, and how a path is generated are very important in Retinex. A discussion of their tuning can be found in [22-24].
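A minimal sketch of the path-based computation for a single channel, under one possible reading of the path mechanics described above. The reset step (clipping the accumulated log-ratio at zero, so the brightest pixel encountered acts as a white reference) and the default parameter values are assumptions made for this sketch.

```python
import numpy as np

def path_retinex_channel(channel, n_paths=200, path_length=50, seed=0):
    """One channel of path-based Retinex: random walks accumulate log-domain
    differences between successive pixels; each visited pixel's accumulator is
    finally divided by the number of paths that passed through it."""
    rng = np.random.default_rng(seed)
    log_im = np.log(channel.astype(float) + 1e-6)
    h, w = log_im.shape
    acc = np.zeros((h, w))
    visits = np.zeros((h, w))
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(n_paths):
        y, x = int(rng.integers(0, h)), int(rng.integers(0, w))
        total = 0.0
        for _ in range(path_length):
            dy, dx = steps[int(rng.integers(0, 4))]
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue                      # stay inside the image
            total += log_im[ny, nx] - log_im[y, x]
            total = min(total, 0.0)           # reset: clip at the brightest pixel seen
            acc[ny, nx] += total
            visits[ny, nx] += 1
            y, x = ny, nx
    # Unvisited pixels keep accumulator 0, i.e. are treated as white.
    out = np.where(visits > 0, acc / np.maximum(visits, 1), 0.0)
    return np.exp(out)    # back from the log domain: reflectance estimate in (0, 1]
```

This makes the role of the parameters concrete: longer and more numerous paths give each pixel more comparisons against distant (and possibly brighter) references, at a proportional computational cost.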
Since Land proposed this algorithm, many variants of Retinex have been
proposed. Stockham [26] and Faugeras [27] showed that the illumination and surface correspond, respectively, to the low-pass and high-pass components obtained by applying a homomorphic filter to the input image in the logarithmic domain. Horn [28] formalized Retinex in terms of differentiation, thresholding, and re-integration in the logarithmic domain. Multi-resolution versions of Retinex were introduced for efficiency [29]. Kimmel [30] adopted a Bayesian viewpoint of the estimation problem and proposed a variational model for Retinex. This model formulates the illumination estimation as a quadratic programming problem and unifies previous Retinex solutions. Two versions of Retinex have been given standardized definitions in terms of Matlab code [23].
3.2.2 Gamut Mapping
Another well-known algorithm is gamut mapping, originally introduced by
Forsyth and extended by Finlayson [31]. The gamut of any illumination is the set
of all possible observed colours under it. If all of these colours are drawn in a
chromaticity space, the gamut is closed, convex, and bounded. Based on the
linear model theory, the goal of gamut mapping is to find a transformation that maps the gamut under the unknown illumination to the gamut under a canonical illumination, so that the image colours under the canonical illumination can be derived. Forsyth founded his work on the assumption that scenes consist only of flat, matte surfaces and that the illumination is spatially constant. The RGB values under any illuminant then form a convex hull, and the gamuts under different illuminants are related by a diagonal matrix. He developed an algorithm, named CRULE, to find
a transformation family. Although CRULE performs very well provided that the
assumed world restrictions are satisfied, it fails when the scenes contain specular
highlights, spatially varying illumination, and surface orientation information. To
address this problem, Finlayson ignored the image intensity information by
mapping the 3D (R,G,B) space into a 2D chromaticity space. The same CRULE algorithm was then run directly on the 2D chromaticities to produce all possible transformations.
Since CRULE can only create a set of feasible maps, the final step is to choose
one to represent the unknown illuminant. One way of doing this is to find the map
that takes all image colours into the canonical gamut such that image colours are
made as colorful as possible, which can be achieved by finding the maximum
area feasible map.
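The diagonal-map idea can be illustrated with a deliberately simplified sketch in which axis-aligned bounding boxes stand in for the true convex hulls (real CRULE intersects whole sets of hull-to-hull maps; the gamut data below are toy numbers).

```python
import numpy as np

# Toy canonical gamut: colours of a few surfaces under the canonical light.
canonical = np.array([[0.9, 0.8, 0.7],
                      [0.2, 0.5, 0.1],
                      [0.4, 0.1, 0.6]])

# The same surfaces under an unknown light; under the diagonal model this is
# just a per-channel scaling of the canonical colours.
observed = canonical * np.array([0.5, 1.0, 1.5])

# With boxes [0, max] standing in for the hulls, the largest per-channel gain
# that keeps every observed colour inside the canonical box is max / max --
# loosely mirroring the "make colours as colourful as possible" selection.
gains = canonical.max(axis=0) / observed.max(axis=0)

# Applying the chosen diagonal map returns the observed gamut to the canonical one.
recovered = observed * gains
```

Picking the largest feasible gains is a crude analogue of the maximum-area heuristic: among the maps that keep the image inside the canonical gamut, it chooses the one that leaves the colours most saturated.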
3.3 Illumination Estimation for Colour Constancy

Another category of colour constancy is to estimate the illumination
values, either two chromaticity parameters (x,y or r,g) or 3 descriptors (X,Y,Z or
R,G,B). All of these algorithms can be further divided into two groups:
unsupervised estimation and supervised estimation. Unsupervised algorithms
predict the illumination information directly from a single image based on some
assumptions about the general nature of the colour components of images while
supervised ones always include two steps: the first one is to build a statistical
model between the input images and the output known illuminations by learning
training data sets, and the second one is to predict the illumination of any given
image based on the model.
3.3.1 Unsupervised Illumination Estimation
MAXRGB
The MAXRGB algorithm assumes that there is always a white surface in
the scene. The maximal RGB values corresponding to the responses from this
white surface represent the illumination estimations [32]. The MAXRGB solution
can be viewed as a special case of Retinex. Obviously, this method will succeed provided that the scene contains either a single surface that is maximally reflective throughout the sensitivity range of the imaging device (i.e., a white surface) or a number of surfaces that are maximally reflective throughout the range of each of the three imaging sensors individually [33].
In spite of its simplicity, MAXRGB does not give a reasonable
performance for a real world scene because the algorithm's assumption cannot
be easily met.
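A minimal sketch of the MAXRGB estimate; the toy scene below is constructed so that it contains a white patch, satisfying the algorithm's assumption.

```python
import numpy as np

def max_rgb_estimate(image):
    """MAXRGB illumination estimate: the per-channel maxima, assumed to come
    from a maximally reflective (white) surface in the scene."""
    return image.reshape(-1, 3).max(axis=0)

# Toy scene lit by a yellowish illuminant [200, 200, 100]; the second pixel is
# a white patch, so the channel maxima recover the illuminant directly.
scene = np.array([[[120, 80, 40], [200, 200, 100]],
                  [[60, 150, 90], [30, 20, 10]]])
estimate = max_rgb_estimate(scene)   # -> [200, 200, 100]
```

Note that the three maxima need not come from the same pixel, which is exactly the weaker condition stated in [33].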
GRAY WORLD
The Gray-World algorithm assumes that, given an image with a sufficient
number of surface colour variations, or with a uniformly gray surface, the average value of the surface colours tends to be gray. Any departure of the average from gray is attributed to the illumination. Therefore, the average RGB of the image is taken as the estimate of the illumination colour [1].
This assumption is generally valid since in any given real world scene, we
often have lots of different colour variations. As the surface colour variations are
random and independent, it would be safe to say that given a large enough
number of samples, the average should converge to the mean value, which is
gray. For instance, if an image were shot with a camera under yellow lighting, the
camera output image would have a yellow cast over the entire image. The effect
of this yellow cast disturbs the Gray-World Assumption of the original image. By
enforcing the assumption on the camera output image, we would be able to
remove the yellow cast and re-acquire the colours of our original scene, fairly
accurately.
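A minimal sketch of the Gray-World estimate and the resulting cast removal; the toy image is illustrative, and the brightness-preserving scaling in the correction step is one common convention rather than part of the algorithm itself.

```python
import numpy as np

def gray_world_estimate(image):
    """Gray-World estimate: the mean RGB of the image, whose departure from
    gray is attributed entirely to the illuminant."""
    return image.reshape(-1, 3).mean(axis=0)

def gray_world_correct(image):
    """Divide out the estimate, rescaled to preserve overall brightness,
    so the corrected image averages to gray."""
    est = gray_world_estimate(image)
    return image * (est.mean() / est)

# Toy image with a warm (yellowish) cast over every pixel.
scene = np.array([[[180., 140., 60.], [90., 70., 30.]],
                  [[200., 160., 80.], [120., 100., 40.]]])
corrected = gray_world_correct(scene)
```

After correction the per-channel means are equal, i.e. the yellow cast of the example has been removed and the Gray-World assumption is enforced exactly.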
Shades of Gray
The Max-RGB and Gray-World algorithms work well only if the average scene colour is gray or there is a white patch in the scene. Finlayson et al. [2] proposed a more general light colour estimation method based on the Minkowski norm, assuming that the image scene appears gray after a nonlinear invertible transformation (here the p-th power function) is applied to every pixel in each channel.
Without loss of generality, consider the red component of a colour image. All of the red responses can be written as a vector R = [R_1, R_2, ..., R_N], where N is the image size. The corresponding values in the p-th-power raised image are R^p = [R_1^p, R_2^p, ..., R_N^p]. If the scene of the raised image tends to be gray, the red component of the illumination in the raised image can be estimated as

R_E-raised = (1/N) Σ_{i=1..N} R_i^p

and the illumination value for the original image is

R_E = ( (1/N) Σ_{i=1..N} R_i^p )^(1/p)

This is the Minkowski norm definition for channel R. The same method gives the illumination estimates for channels G and B.
Obviously, Max-RGB and Gray-World are two instantiations of the Minkowski norm, obtained by setting p = ∞ and p = 1, respectively. In [2], the authors report that the algorithm performs best when the norm order is set to 6.
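The Minkowski-norm formula above is a one-liner per channel; this sketch (with an illustrative scene, not the paper's data) shows how p = 1 recovers Gray-World and a large p approaches Max-RGB:

```python
import numpy as np

def shades_of_gray(image, p=6):
    """Minkowski-norm illuminant estimate: ((1/N) * sum(rho^p))^(1/p) per channel.
    p=1 reduces to Gray-World; large p approaches Max-RGB."""
    flat = image.reshape(-1, 3).astype(float)
    return np.mean(flat ** p, axis=0) ** (1.0 / p)

scene = np.array([[[220.0, 210.0, 100.0], [180.0, 170.0, 60.0]]])
```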
Colour Constancy based on Gray-Edge Hypothesis
J. van de Weijer and Th. Gevers [34] proposed a colour illumination estimation algorithm that assumes the average of the reflectance differences in a scene is achromatic.
The authors interpret this as skewing the distribution of colour derivatives so that its average corresponds to the white-light direction in opponent colour space. Assuming the colour image value at location (x,y) is [R(x,y), G(x,y), B(x,y)] and its derivative is [R', G', B'], the transformation of the derivatives into opponent colour space is

    OC1 = (R' - G') / √2
    OC2 = (R' + G' - 2B') / √6
    OC3 = (R' + G' + B') / √3
OC3 represents the white-light direction. Under a white illuminant, the long axis of the distribution of colour derivative values coincides with this third component; otherwise, the distribution departs from the white-light axis. The average of the colour derivative values over the whole image therefore gives the illumination estimate.
To improve the overall performance, the authors further point out that this hypothesis can be extended by incorporating the n-th Minkowski norm, just as the 'Shades of Gray' algorithm extends 'Gray-World'.
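A minimal Gray-Edge sketch follows. It uses simple finite-difference gradients in place of the Gaussian derivative filters of the original method, and returns only the illuminant direction; the ramp scene is purely illustrative:

```python
import numpy as np

def gray_edge(image, p=1):
    """Gray-Edge: average the gradient magnitude of each channel (Minkowski
    order p) and take the result as the illuminant colour direction."""
    est = np.empty(3)
    for c in range(3):
        gy, gx = np.gradient(image[:, :, c].astype(float))
        mag = np.sqrt(gx ** 2 + gy ** 2)
        est[c] = np.mean(mag ** p) ** (1.0 / p)
    return est / np.linalg.norm(est)  # only the direction (colour) matters

# Synthetic ramp whose red channel varies twice as fast as green and blue:
x = np.arange(16.0)
scene = np.dstack([np.tile(2 * x, (16, 1)),
                   np.tile(x, (16, 1)),
                   np.tile(x, (16, 1))])
est = gray_edge(scene)  # proportional to (2, 1, 1)
```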
3.3.2 Supervised Illumination Estimation
Colour by Correlation
Finlayson et al. [5] proposed a method, called 'Colour by Correlation,' that builds a correlation matrix relating the probability of image colours to each possible illuminant. This matrix is built from a large set of colour images with corresponding known illuminations. To cancel out the effects of intensity, geometry, and shading, the image colours are converted to chromaticities and then mapped to histogram bins. The rows of the matrix correspond to the predefined chromaticities, the columns to the known illuminants from the training data set, and each entry is the likelihood of an image chromaticity under a given light. During the test stage, the image colours are converted into a binary vector in which '1' or '0' indicates the presence or absence of the corresponding chromaticity in the image. This vector is dotted with each column of the correlation matrix, and the illuminant with the maximal value is taken as the estimate. The other contribution of [5] is a proof that this framework is general and can describe many existing algorithms.
Barnard et al. [35] improved the promising 'Colour by Correlation' method by extending it to the 3D colour space; in addition to chromaticity, pixel brightness is used.
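The test-stage scoring step reduces to a dot product per illuminant; the tiny matrix below is purely illustrative (invented numbers, not trained likelihoods):

```python
import numpy as np

def correlate_illuminant(binary_hist, corr_matrix, labels):
    """Dot the image's binary chromaticity vector with each illuminant column
    of the correlation matrix and return the best-scoring illuminant."""
    scores = binary_hist @ corr_matrix  # one likelihood sum per illuminant
    return labels[int(np.argmax(scores))]

# 3 chromaticity bins x 2 candidate illuminants (illustrative likelihoods):
corr = np.array([[0.9, 0.1],
                 [0.1, 0.8],
                 [0.7, 0.6]])
hist = np.array([1.0, 0.0, 1.0])  # chromaticities 0 and 2 present in the image
best = correlate_illuminant(hist, corr, ["bluish", "reddish"])  # "bluish"
```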
Neural Network
A multi-layer neural network was established to learn the relationship
between illumination chromaticity and colour distribution in the image, and then
to predict the unknown illumination from an image [3,4]. The training input is the
image chromaticity binary histogram. The (r,g) chromaticity space is divided into cells 0.02 units wide, giving 2500 bins as input-layer nodes. A '1' or '0' in each bin represents the presence or absence of a certain chromaticity. The neural net has
two hidden layers: one has 400 nodes and the other has 30 nodes. Two output
nodes with real value are the corresponding illumination chromaticities. It is
trained with the backpropagation algorithm with a sigmoid activation function.
Colour Constancy by KL-Divergence
Many colour constancy algorithms use a statistical model to estimate the maximum-likelihood illumination. Rosenberg et al. use maximum likelihood and KL-divergence as the solution [36].
Assume [R(x,y), G(x,y), B(x,y)] and [Rc(x,y), Gc(x,y), Bc(x,y)] are the responses of the same pixel under an unknown illumination and under some canonical illumination, respectively. The von Kries diagonal transformation relates them:

    Rc(x,y) = αR(x,y),  Gc(x,y) = βG(x,y),  Bc(x,y) = γB(x,y)

If we ignore the absolute intensity of the illumination, the colour constancy problem can be solved by estimating only α and β while restricting γ to be 1.
Considered in log-chromaticity space, an illumination change becomes a shift of the colour values. The parameters (α, β) can then be estimated by a maximum-likelihood technique:

    (α*, β*) = argmax Pr(α, β | F_log_r, F_log_g)

Pr(α, β) can be assumed to be uniform, so maximizing the posterior is equivalent to maximizing the likelihood Pr(F_log_r, F_log_g | α, β). If the logarithm operation is applied, the expression simplifies to

    log Pr(α, β | F_log_r, F_log_g) = Σ(i=1..N) log Pr(F_log_r(x_i, y_i), F_log_g(x_i, y_i) | α, β)    (14)
Another solution uses the KL divergence, a measure of the closeness of two distributions: the closer the two distributions are to each other, the smaller the KL-divergence. A two-dimensional histogram with m1×m2 bins can be built from a large training data set to estimate Pr(F_log_r(x,y), F_log_g(x,y) | α, β). This probability can be viewed as the percentage of observed colour values given a set of illumination-related parameters (α, β).
These two formulations look very similar, but the authors point out two major differences between them: first, how each defines the score for a possible match between the canonical colour distribution and the given image's colour distribution; and second, the conditions under which the best match is reached.
3.4 Multiplicative Cues to Illumination
Each computational colour constancy algorithm can be seen as exploiting one potential cue to the illumination present in a scene. However, multiple cues providing valuable information about colour perception are usually available simultaneously. The human visual system may combine them, ignore some in favour of others, or treat some as dominant. These cues come not only from the colour information in the image but also from the scene background, the spatial arrangement of objects, surface orientation, binocular disparity, and other factors.
Research in this area goes back to Gilchrist's work in 1977 [37]. He performed a series of experiments investigating the effect of spatial arrangement on human lightness constancy and proposed that retinal ratios alone cannot tell the whole story. His simulated scenes included trapezoids whose perceived orientation and shape varied with viewing conditions; each trapezoid was arranged to be coplanar with one or the other of two background planes, which were perpendicular to each other. The psychophysical experiments supported the 'coplanar ratio hypothesis': the perceived lightness of an object is controlled only by the luminance relationships among coplanar regions at the same depth; non-coplanar regions are substantially irrelevant, even when they are retinally adjacent.
Yamauchi and Uchikawa [38] investigated the effects of depth information
on perception by measuring the stimuli's upper-limit luminance in a three-
dimensional environment. In their experiments, the stimulus was presented in
one room, and the observers sat in another room. They were required to adjust
the luminance of a test colour and set the level perceived to be the limit of
surface-colour mode. The test stimulus and the surrounding stimuli, composed of 10 colours, were placed at different spatial locations. The results strongly support the importance of coplanarity for the mode of colour perception.
People naturally view scenes binocularly, so binocular disparity also provides a great deal of information, especially spatial depth, which should be useful in colour perception. Yang and Shevell [39] found that binocular disparity improves colour constancy. For their research, they built special equipment that generated two images displayed on two monitors controlled by two CPUs. The subject's left and right eyes focused on separate video displays reflected by two mirrors, positioned so that the viewer saw a fused image. A keyboard was provided to set the matching
chromaticity value of the test patch under different conditions. The experiments
show that the binocular disparity is indeed an important factor in colour
perception.
Another important cue is the orientation of any object's surface. Its
influence on the lightness was examined by Boyaci, Maloney, and Hersh [40]. In
this project, a test patch with seven orientations was used. The scene was lit
under a mixture of diffuse and point light sources. Six observers participated in
the experiment. In each trial, the observer used a mouse to orient a monocular stick-and-circle gradient probe superimposed on the middle of the test patch, estimated the orientation of the patch, and then matched the lightness of the test patch by choosing one of the reference chips. The experiments showed that human perception of orientation was nearly veridical.
In addition to depth and binocular disparity, there are other valuable
environmental factors affecting the colour perception of a surface. Yang and Maloney [41] evaluated whether the human visual system takes advantage of three illumination cues: specular light, full surface specularity, and uniform background. Specular, coloured spheres were placed on a uniform
plane perpendicular to the experiment participant's sight line. The viewer sat at
the open side, positioned in a chin rest, and gazed at a large, high-resolution
stereoscopic display. Two standard illuminations, D65 and A, lit the scene. The
viewers were required to adjust a small coloured patch until it appeared to be
gray. The achromatic settings from different candidate cue configurations were
evaluated in CIE u'v' space. The experiments showed that colour perception is
affected by several factors to different degrees. The surface specular cue is
significant for illumination, and the other two have trivial influence.
Maloney also proposed a plausible framework for analyzing human surface colour perception based on a weighted average of illumination cues [42]. The weights corresponding to different cues vary from location to location within a scene, reflecting the importance of the illumination information available from each type of cue. For example, in a scene with a uniform background, little weight should be given to the background cue. His experiments show that cue promotion and dynamic reweighting combine to assign the weights.
CHAPTER 4: COLOUR CONSTANCY UNDER UNIFORM ILLUMINATION
Illumination estimation is fundamental to white balancing digital colour images and to understanding human colour constancy. In this chapter, we present three new illumination colour estimation solutions: Support Vector Regression¹, Thin Plate Spline, and Gray Surface Identification². All three proposed solutions are compared with other published methods, including neural network colour constancy, colour by correlation, and shades of gray. Experiments on synthetic and real images show that their performance is comparable to the other colour constancy solutions.
4.1 Introduction
Accurate estimation of the spectral properties of the light illuminating an
imaged scene by automatic means is an important problem. It could help explain
human colour constancy and it would be useful for automatic white balancing in
digital cameras. Here we will focus on machine-based colour constancy. A colour
imaging system will be considered to be colour constant to the degree to which it
is able to account for changes in the colour of the scene illumination and thereby
maintain a stable representation of object colours.
1 The work on this method also appears as a published paper: Weihua Xiong and Brian Funt, "Estimating Illumination Chromaticity via Support Vector Regression", Journal of Imaging Science and Technology, Vol. 50, No. 4, pp. 341-348, July/August 2006
2 These two methods have been submitted for USA and Korean Patents by Samsung Corporation
More precisely we can formulate colour constancy as: Given a digital
image acquired under unknown illumination conditions, predict what the image
would have been if the same scene had been illuminated instead by some
chosen known 'canonical' illuminant. For example, the canonical illuminant might
be specified as equal-energy white. Colour constancy can be divided into two
sub-problems: (a) estimate the colour of the illumination, (b) adjust the image
colours based on the difference between the estimated and canonical
illuminants. The second problem is often addressed by the von Kries coefficient
rule or an equivalent diagonal transformation model [43]. Because it is very under
constrained, the first problem, illumination estimation, is the more difficult of the
two. Here we will introduce three new solutions: Support Vector R.egression,
Thin Plate Splim, and Gray Surface Identification. The first two cstiinate the
illuini~lation clironlaticity values while tlic third one gives out, the illumination
coloriirletric values.
4.2 Illumination Chromaticity Estimation by Support Vector Regression
Illumination chromaticity estimation by support vector regression is similar to previous work by Funt et al. [4,44] and Finlayson et al. [5] in that it
aims to recover the chromaticity of the scene illumination based on the statistical
properties of binarized colour or chromaticity histograms; however, the proposed
method replaces the neural networks and Bayesian statistics of these previous
methods with support vector machine regression.
Vapnik's [45,46] Support Vector Machine theory has been applied successfully to a wide variety of classification problems [47-50]. Support vector machines have also been extended to regression problems, including financial market forecasting, travel time prediction, power consumption estimation, and highway traffic flow prediction [51-53].
Depending on the problem domain, support vector machine based
regression (SVR) can be superior to traditional statistical methods in many ways.
SVR enables inclusion of a minimization criterion into the regression, training can
be easier, and it achieves a global rather than local optimum. It also facilitates
explicit control of the tradeoffs between regression complexity and error.
4.2.1 Support Vector Regression Introduction
SVR estimates a continuous-valued function that encodes the
fundamental interrelation between a given input and its corresponding output in
the training data. This function then can be used to predict outputs for given
inputs that were not included in the training set. This is similar to a neural
network. However, a neural network's solution is based on empirical risk
minimization. In contrast, SVR introduces structural risk minimization into the
regression and thereby achieves a global optimization, while a neural network
achieves only a local minimum [54].
Most classical regression algorithms require knowledge of the expected
probability distribution of the data. Unfortunately, in many cases, this distribution
is not known accurately. Furthermore, many problems involve uncertainties such
that it is insufficient to base a decision on the event probability alone.
Consequently, it is important to take into account the potential cost of errors in
the approximation. SVR minimizes the risk without prior knowledge of the
probabilities.
Smola and Scholkopf [45] provide an introduction to SVR. Some simple
intuition about it can be gained by comparison to least-squares regression in
fitting a line in 2-dimensions. Least squares regression minimizes the sum of
squares distance between the data points and the line. SVR maximizes the
space containing the data points subject to the minimization of the distance of the
points to the resulting line. The width of the space is called the 'margin'. Points
within an 'insensitivity' region are ignored. The technique represents the region
defined by the margin by a subset of the initial data points. These data points are
called the support vectors. A linear SVR example for a set of data points is
shown in figure 3.
Figure 3 The input data are shown in red. The linear support vector regression function fitting the input data is shown as a black line. All data inside the insensitivity region are ignored. The support vectors are marked by red '+'.
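The ε-insensitive behaviour sketched above can be illustrated numerically. This fragment (with invented residual values) contrasts SVR's loss with the squared loss of least squares:

```python
import numpy as np

def eps_insensitive(residuals, eps=0.1):
    """SVR's loss: zero inside the insensitivity region, linear outside it.
    Points with |residual| <= eps are ignored."""
    return np.maximum(np.abs(residuals) - eps, 0.0)

r = np.array([-0.3, -0.05, 0.0, 0.08, 0.4])  # residuals from a fitted line
loss = eps_insensitive(r)        # only the two outermost points contribute
squared = r ** 2                 # least squares penalizes every point
```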
SVR is extended to the fitting of a non-linear function by employing the
kernel trick [45], which allows the original non-linear problem to be reformulated
in terms of a kernel function. The reformulated problem is linear and can be
solved using linear SVR. The advantage of this approach is that we can disregard the actual mapping function and concentrate only on the kernel function. However, kernel functions are constrained, so not every arbitrary function is suitable for use in SVR: each kernel function must satisfy the Mercer conditions [45]. We used the Chang and Lin [55] SVR implementation.
An intuitive geometric interpretation of SVR in terms of distances between
the convex hulls of the training sets is provided by Bi and Bennett [56]. Figure 4
shows the basic idea for the simplest case of a linear fit with hard margins.
Copies of the original data are made and shifted vertically, one up, one down,
along the y direction by equal amounts. The two sets of data are then considered
to be two groups to be classified. The regression line is determined as the line
that best separates the two groups into two classes. The best separation is found
by considering the convex hulls of the two sets and the locations where the hulls
come closest to one another. The perpendicular bisector of the line between the
two closest points provides the optimum separation between the classes, and
also is the regression line to the original data.
Figure 4 Geometrical interpretation of SVR (after Figures 1 and 2 of Bi and Bennett [56]). The left panel shows the input data (squares) as a function of the multi-dimensional feature vector v, the corresponding output is a single value y. The regression line is found by making 2 copies of the data and shifting them equal amounts up and down relative to the original data. The regression (dotted) line is found as the bisector of the line (arrow) between the two closest points on the convex hulls of the shifted data sets. The right panel shows the regression line from the middle panel superimposed on the original data.
4.2.2 SVR for Illumination Chromaticity Estimation
In this section, we discuss how the SVR technique can be applied to analyze the relationship between the image of a scene and the chromaticity of the illumination incident upon it. As introduced in the neural network
method [4], we will first use binarized 2D chromaticity space histograms to
represent the input image data. Later, we extend these histograms to 3D to
include intensity as well as chromaticity. Chromaticity histograms have the
potential advantage that they discard intensity shading, which varies with the
surface geometry and viewing direction, but is most likely unrelated to the
illumination's spectral properties.
The training set consists of histograms of many images along with the
measured rg chromaticities (r=R/(R+G+B) and g=G/(R+G+B)) of the
corresponding scene illuminants. Each image's binarized chromaticity histogram
forms an SVR binary input vector in which each component corresponds to a histogram bin. A '1' or '0' indicates the presence or absence of the corresponding chromaticity in the input image. Partitioning the chromaticity space equally along each component into N equal parts yields N x N bins, so the resulting SVR binary input vector is of size N². We experimented with various alternative choices for N and eventually settled on N=50. Generally speaking, for N<50 the bins are too large and the colour space is quantized too coarsely, with the result that the illumination estimation error increases. For N>50, the training time increases without a corresponding improvement in overall performance. All the results reported below are based on N=50, so the chromaticity step size is 0.02. Since r, g >= 0 and r + g = 1 - b <= 1, only half these bins can ever be filled, so a sparse matrix representation was used. Support vector regression then finds the function mapping from image histograms to illuminant chromaticities.
Since some other illumination estimation methods [14,15,57] (gamut
mapping and colour by correlation) benefit from the inclusion of intensity data, it
is natural to consider it in the SVR case as well. The neural network method has
thus far not been applied to 3D data (chromaticity plus intensity) because the
number of input nodes becomes too large and the space too sparse for
successful training, given the relatively small size of the available training sets.
Support vector regression handles sparse data reasonably well, so we
experimented with 3D binarized histograms in the training set. Intensity, defined
as L = R + G + B, becomes the third histogram dimension along with the r and g
chromaticity. We quantized L into 25 equal steps, so the 3D histograms consist
of 62,500 (25x50x50) bins.
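The binarized 2D chromaticity histogram described above can be sketched as follows. The preprocessing of the next subsection (region-growing segmentation) is omitted, so this operates directly on pixels:

```python
import numpy as np

def binary_chromaticity_histogram(image, n=50):
    """Binarized rg-chromaticity histogram: a flat 0/1 vector of n*n bins,
    where a 1 marks a chromaticity present in the image (SVR input format)."""
    flat = image.reshape(-1, 3).astype(float)
    flat = flat[flat.sum(axis=1) > 0]        # skip black pixels
    s = flat.sum(axis=1)
    r, g = flat[:, 0] / s, flat[:, 1] / s
    hist = np.zeros((n, n))
    ri = np.minimum((r * n).astype(int), n - 1)
    gi = np.minimum((g * n).astype(int), n - 1)
    hist[ri, gi] = 1.0
    return hist.ravel()                      # length n*n, step size 1/n

# A uniform image occupies exactly one chromaticity bin (r = g = 1/3):
uniform = np.full((4, 4, 3), 120.0)
vec = binary_chromaticity_histogram(uniform)
```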
4.2.3 Histogram Construction
To increase the reliability of the histograms, the images are pre-processed
to reduce the effects of noise and pixels straddling colour boundaries. We have
chosen to follow the region-growing segmentation approach described by
Barnard et al. [14,15], in which each pixel that does not belong to any created
region will be used as seed. Each segmented region will correspond to one
histogram bin, and those regions with very few pixels will be ignored, since they
will be considered noisy. This also facilitates comparison of the SVR method to
the other colour constancy methods Barnard et. al. tested. The region-growing
method is good because the borders it finds are perfectly thin and connected.
Membership in a region is based on chromaticity and intensity. A region is only
considered to be meaningful if it has a significant area. For the sake of easy comparison, we used the same thresholds as [14]; namely, to be in the same region, the r and g chromaticities at a pixel must not differ from their respective averages for the region containing the pixel by more than 0.5%, nor its intensity by more than 10%. Also, regions with an area of fewer than 5 pixels are discarded.
The RGB's of all pixels within each separate region are then averaged, converted
to L, r, g and histogrammed.
4.2.4 K-Fold Cross Validation for SVR Parameters
The performance of SVR is known to depend on its insensitivity parameter ε, its regularization parameter C, and the choice of kernel function and associated parameters. Different kernel functions work better on some problem domains than others. Four commonly used kernel functions are listed in Table 1. From a practical and empirical standpoint, the bigger the insensitivity parameter ε, the fewer the support vectors, and the higher the error in estimating the illumination. After much experimentation with different ε values, we fixed it at 0.0001.
Table 1 Admissible Kernel Functions

Name                          Definition                             Parameters
Linear                        K(xi, xj) = xi'xj                      ---
Polynomial                    K(xi, xj) = [xi'xj + 1]^d              d
Radial Basis Function (RBF)   K(xi, xj) = exp(-γ ||xi - xj||²)       γ
Sigmoid*                      K(xi, xj) = tanh[xi'xj + r]            r

(*: For some r values, the kernel function is invalid)

In the case of SVR for illumination estimation, the best choice of kernel function and its parameters may depend on the training set. We eliminated the Sigmoid kernel function from further consideration, since it is known to be invalid for some values of the parameter r [46], and focus instead on the RBF and polynomial kernel functions. This leaves the choice of either the RBF or
polynomial kernel functions and, for each of these kernels, their parameters: penalty C and width γ for the RBF kernel, or penalty C and polynomial degree d for the polynomial kernel. The parameters γ and d control the corresponding kernel function's shape, while C determines the penalty cost of estimation errors. The kernel choice and parameter settings are made during the training phase by k-fold cross validation, which involves running the training with several different parameter choices and then selecting the choice that works best for that particular training set. This is described in more detail below.

For the RBF kernel function, we allow the penalty parameter to be chosen from 4 values C ∈ {0.01, 0.1, 1, 10} and the width from γ ∈ {0.025, 0.05, 0.1, 0.2}. For the polynomial kernel function, we used the same 4 penalty candidates and selected the best degree d from the set {2, 3, 4, 5}. Thus, for each training data set, 32 cases (2 kernel choices with 16 pairs of parameter settings each) are tested to find the best choice.
Among the algorithms generally used to find the best parameters for
support vector regression, we chose k-fold cross validation because it does not
depend on a priori knowledge or user expertise and it handles the possibility of
outliers in the training data. The disadvantage of the k-fold method is that it is
computationally intensive.
In k-fold cross validation, the whole training set is divided evenly into k distinct subsets. Every kernel function together with each of its related parameter values forms a candidate parameter setting. For each candidate setting, we run the same process k times: (k-1) of the subsets form a training set and the remaining subset is used as the test set. The RMS chromaticity distance errors from the k trials are averaged to give the error for that candidate setting. The setting with the minimum error is then chosen, and the final SVR training is done on the entire training set using the chosen setting.
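The k-fold scoring of one candidate parameter setting can be sketched generically. Here `train_fn` is a placeholder for the SVR training run with one kernel/parameter choice (the trivial mean predictor below is only for illustration):

```python
import numpy as np

def k_fold_error(train_fn, X, y, k=5):
    """Average held-out RMS error for one candidate parameter setting.
    `train_fn(X, y)` returns a predictor callable."""
    folds = np.array_split(np.arange(len(X)), k)
    errors = []
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        pred = model(X[folds[i]])
        errors.append(np.sqrt(np.mean((pred - y[folds[i]]) ** 2)))
    return float(np.mean(errors))  # score for this parameter setting

# The setting with the lowest k-fold error would be kept for final training.
mean_predictor = lambda X, y: (lambda Xt: np.full(len(Xt), y.mean()))
err = k_fold_error(mean_predictor, np.arange(20.0), np.ones(20), k=5)
```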
4.3 Illumination Colour Estimation Using Thin Plate Splines
In this section we develop a new approach to illumination estimation for colour constancy and automatic white balancing based on thin-plate spline interpolation. We treat illumination estimation as a problem of interpolation over a set of training images.
Interpolation is a common problem and there are many well-established interpolation methods [58]. The majority of these methods, such as bilinear or bi-cubic interpolation, interpolate over training data sampled on a uniform grid. Here, however, we map image information to illumination r-chromaticity and g-chromaticity values, using the image's r and g chromaticity values as inputs. We cannot sample the space of images uniformly, because the difference between any two images' chromaticity values is not fixed, so interpolation over a non-uniformly sampled space is required. Thin-plate spline interpolation is an
effective interpolation method under these conditions, and has been widely used
in the context of deforming one image into registration with another.
4.3.1 Thin Plate Spline Method Introduction
As is typical of interpolation methods in general, thin-plate spline (TPS) interpolation constructs a function that matches a given set of data values y_i, corresponding to a given set of data vectors x_i = [x_i,1, x_i,2, ..., x_i,n], in the sense that y_i = f(x_i).
TPS interpolation was originally designed for 2-dimensional image registration [59-62]. Previously, we extended TPS to 3 dimensions and successfully applied it to the problem of camera and colour display calibration
[63]. Compared with other methods, TPS has been found to be quite stable and
accurate in terms of finding a unique solution without having to tune a lot of
parameters. Here, we extend TPS to n-dimensions and apply it to the problem of
estimating the chromaticity of a scene's overall incident illumination from an
image of that scene. Unlike the SVR of the last section and other previous methods [4,5], which use a colour histogram as the input data, TPS uses image thumbnails as input. The thumbnails are 8x8 pixel images created by
averaging the underlying pixels in the original input image. These thumbnails in
chromaticity coordinates become input vectors of size 8 x 8 x 2 = 128.
TPS for illumination estimation also requires a "training" set of N images' r and g chromaticity values along with their corresponding illumination chromaticity values {(I_i,1, I_i,2, ..., I_i,128), (r_i, g_i)}. TPS determines the parameters w_i and a_j controlling the two mapping functions f_r and f_g such that

    r_i = f_r(I_i,1, ..., I_i,128) and g_i = f_g(I_i,1, ..., I_i,128)

The mapping function f_r is defined as

    f_r(I) = Σ(i=1..N) w_i U(||I - I_i||) + a_0 + Σ(j=1..128) a_j I_j

where U(r) = r² log r. The function f_g is defined similarly. The weights w_i control a non-linear term, and the a_j (j = 0, ..., 128) control an additional linear term.

Each training set pair (an image plus its illumination chromaticity) provides 2 equations, one for f_r and one for f_g. In addition, a smoothness constraint is imposed by minimizing the bending energy J(f_r), described in terms of the curvature of f_r. In the original TPS formulation [59] the bending energy was defined in 2D; here we generalize it to higher dimensions. Following [60-62], the energy is minimized when

    Σ(i=1..N) w_i = 0 and Σ(i=1..N) w_i I_i,j = 0 for j = 1, ..., 128

For each of f_r and f_g we thus have (N+129) linear equations in (N+129) unknowns, so the TPS parameters can be uniquely determined using matrix operations. Collecting the kernel values, weights, and data into matrices in the usual TPS fashion, the (N+129) equations can be written K = LW, and the solution obtained as W = L⁻¹K.
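The matrix solve described above can be sketched in a small-dimensional analogue (2-D inputs rather than 128-D thumbnails; the data points are invented for illustration). With no regularization, TPS interpolates the training values exactly:

```python
import numpy as np

def tps_kernel(r):
    """U(r) = r^2 log r, with U(0) = 0 by convention."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.nan_to_num(r ** 2 * np.log(r))

def tps_fit(X, y, reg=0.0):
    """Solve the (N + d + 1) linear TPS system for weights w and affine
    terms a -- a small-dimensional analogue of the W = L^-1 K solve."""
    n, d = X.shape
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    K = tps_kernel(r) + reg * np.eye(n)
    Q = np.hstack([np.ones((n, 1)), X])      # affine part: 1, x1, ..., xd
    L = np.block([[K, Q], [Q.T, np.zeros((d + 1, d + 1))]])
    sol = np.linalg.solve(L, np.concatenate([y, np.zeros(d + 1)]))
    return sol[:n], sol[n:]                  # w, a

def tps_eval(w, a, centres, Xnew):
    U = tps_kernel(np.linalg.norm(Xnew[:, None, :] - centres[None, :, :], axis=2))
    return U @ w + a[0] + Xnew @ a[1:]

# Five 2-D training points; TPS interpolates the data values exactly.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.3]])
y = np.array([0.0, 1.0, 2.0, 3.0, 1.5])
w, a = tps_fit(X, y)
```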
4.4 Illumination Colour Estimation by Gray Surface Identification
Although not the most accurate, one of the simplest and most widely used methods is the gray world algorithm [1]. Borrowing some of the strengths and
simplicity of the gray world algorithm, we introduce a modification of it that
significantly improves on its performance while adding little to its complexity.
The standard gray world algorithm is based on the assumption that the
average colour in a scene is gray so that when an image's colours are averaged,
any departure from gray reflects the colour of the scene illumination. The
proposed extension first identifies colours that are likely to be from truly gray
surfaces and then averages only those colours. The trick is in the identification of
gray surfaces. Note that we must make a distinction between the colour of the
surface as it would appear under a white light and the image colour of that same
surface under the unknown scene illumination. We cannot simply average gray
image colours since that would tell us nothing other than gray colours are gray.
To find the surfaces that are gray, but do not necessarily appear gray in the
image because of the effect of the illumination, we use a new colour coordinate
system which encodes illumination and surface reflectance along different axes.
4.4.1 LIS Colour Coordinates
The goal of the new colour coordinate system is to represent the 3
components of a colour in terms of the underlying physical components that
generated the colour, in particular, luminance/intensity, incident illumination colour, and surface reflectance colour [64]. Of course, this goal cannot
actually be met without additional information, but it can be approximated to a
useful extent. Since the coordinates represent intensity, illumination and surface
reflectance as separate dimensions, we refer to them as LIS coordinates.
Experiments with these new LIS channels showed that points in the new
coordinate system with the reflectance coordinate of zero were gray. They are
not just gray in RGB image space, but they represent gray surface colours
because they are in the reflectance space. To the extent that the new reflectance
coordinate truly is independent of the illumination, this means that we can identify
gray surfaces in an image independent of whether or not they have R=G=B. The
strategy for the new method, therefore, is to use the new LIS coordinates to
identify gray surfaces in the image, and then use these grays to estimate the
illuminant colour. For this final step, we convert back to the original colour space
of the image and average the chromaticities of the grays. We call this method
GSI (gray surface identification).
The derivation of the LIS coordinates exploits the assumption that
illuminants form a 1-parameter family of functions. We model the RGB sensor
response in the standard way as equation (3) introduced in chapter 2.
If the sensor sensitivities are narrow band, they can be modelled as
Dirac delta functions and (3) reduces to

ρ_k = E(λ_k) S(λ_k) R(λ_k)    (22)
Following [64], let us further suppose that the illumination can be
approximated as a blackbody radiator described by Planck's law,

E(λ) = I c_1 λ^-5 (e^(c_2/(Tλ)) - 1)^-1    (23)

where I is the power of the illumination, T is the blackbody radiator temperature, and
the constants c_1 and c_2 are 3.74183×10^-16 W·m² and 1.4388×10^-2 m·K, respectively.
Using Wien's approximation, which is valid for typical illuminant temperatures,
equation (23) becomes

ρ_k = I c_1 λ_k^-5 e^(-c_2/(Tλ_k)) S(λ_k) R(λ_k)    (24)

Taking logarithms, we have

log(ρ_k) = log I + log(S(λ_k)R(λ_k)) - c_2/(Tλ_k) + log(c_1 λ_k^-5)    (25)
Equation (25) imposes a constraint such that the logs of the camera responses
[log(R), log(G), log(B)] for a given surface reflectance are confined to a plane. To
see this, let n_k = c_2/λ_k and m_k = log(c_1 λ_k^-5) + log(S(λ_k)R(λ_k)) and substitute into the
equations for log(ρ_k). After combining the equations and eliminating the terms log
I and 1/T, we obtain

log ρ_R - [(n_R - n_B)/(n_G - n_B)] log ρ_G + [(n_R - n_G)/(n_G - n_B)] log ρ_B
= m_R - [(n_R - n_B)/(n_G - n_B)] m_G + [(n_R - n_G)/(n_G - n_B)] m_B    (26)
Since the nk are fixed by the choice of camera sensitivity, and mk are fixed
by the choice of camera and surface reflectance, for any given surface
reflectance, varying the illumination's colour temperature, T, or its intensity, I,
causes [log(R), log(G), log(B)] to move within a plane. For a different surface
reflectance, a parallel plane is generated. The axis perpendicular to these
parallel planes becomes the reflectance axis of the LIS coordinates.
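The planarity argument can be checked numerically. The sketch below assumes hypothetical narrow-band wavelengths and surface-times-sensor factors (the lam and S values are illustrative, not from the thesis); it builds log responses from the Wien-approximated form of equation (25) and verifies that, as I and T vary, they stay on the plane whose normal is [n_G − n_B, n_B − n_R, n_R − n_G].

```python
import math

# Hypothetical narrow-band sensor wavelengths, in metres
lam = {'R': 600e-9, 'G': 550e-9, 'B': 450e-9}
c1, c2 = 3.74183e-16, 1.4388e-2
S = {'R': 0.6, 'G': 0.4, 'B': 0.5}   # assumed surface-times-sensor factors

def log_rgb(I, T):
    # log(rho_k) = log I + log(S_k) + log(c1 * lam_k^-5) - c2/(T*lam_k)
    return [math.log(I) + math.log(S[k]) + math.log(c1 * lam[k] ** -5)
            - c2 / (T * lam[k]) for k in 'RGB']

n = {k: c2 / lam[k] for k in 'RGB'}
# Normal of the plane obtained by eliminating log I and 1/T:
# it annihilates both the constant vector and (n_R, n_G, n_B)
normal = [n['G'] - n['B'], n['B'] - n['R'], n['R'] - n['G']]
p0 = log_rgb(1.0, 2500.0)

def offset(p):
    # Signed distance of a log-RGB point from the plane through p0
    d = [a - b for a, b in zip(p, p0)]
    return sum(u * v for u, v in zip(normal, d)) / math.sqrt(sum(u * u for u in normal))
```

Changing the surface factors S shifts the plane but leaves its normal unchanged, which is exactly the "parallel planes" structure the reflectance axis is perpendicular to.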
Figure 5 shows the planes formed by 3 sample surfaces under the 102
illuminant spectra from the Simon Fraser University [65] database at 15 different
intensities each. Although these 102 illuminants are not specifically blackbody
radiators, but rather common light sources found around a university campus, the
planar model works well. PCA (principal component analysis) determines the
plane and establishes that the first 2 dimensions explain 99.1% of the
variance. The PCA axes define the new colour coordinate system; the third axis
corresponds to surface reflectance.
Figure 5 (Log R, Log G, Log B) obtained from three different surface reflectances under 102 illuminations at 15 various intensities. Each surface is plotted with a different colour. Each set lies close to a plane and the planes corresponding to the different surfaces are parallel. The three coloured lines indicate the new coordinate system in the log domain
4.4.2 GSI Implementation
The first issue in implementing the GSI colour constancy
algorithm is that the LIS system is camera dependent and must be determined
for the camera being used. There are two methods to do this depending on
whether or not the camera's spectral sensitivity response functions are known. If
they are known, then they can be used to calculate camera responses for
spectra synthesized as the product of illuminant and reflectance spectra chosen
from a database of spectra. If the camera's spectral sensitivity curves are not
known, then real values can be obtained by using the camera to take images of a
gray card under several different illuminants. PCA is then applied to the
logarithms of the RGBs from the gray card. The vector corresponding to the maximal
eigenvalue forms the intensity axis, the next vector forms the illumination axis,
and the vector corresponding to the least eigenvalue is the surface reflectance
axis. We have conducted an experiment with Barnard's [65] 321 images
captured using a calibrated SONY DXC-930 camera. These images are from 33
different scenes under 11 different lights that represent a cross-section of
common lights. Since the spectral sensitivity functions of the camera are known
and the calibration images are available on the Internet [65], this data set
provides a means of comparing LIS coordinates extracted based on synthetic
versus on real data. For the synthetic case, we synthesize RGB values from the
measured spectral reflectances of the 24 Macbeth Colour Checker patches
and the spectral power distributions of 102 illuminants at 15 different
intensities. Applying PCA to this data, we find the LIS axes as row vectors:
[0.5994 0.5871 0.5441], [0.6421 0.0482 -0.7651], [0.4729 -0.8132 0.3358].
To compute the LIS coordinates from real data, we have the RGB values
from the gray card under the 11 different illuminants. The LIS axes are:
Clearly, the two methods produce very similar results. The advantage of
the real data method is that it is much easier to collect images of a gray card
under a dozen or so different illuminants than it is to determine a camera's
spectral sensitivity functions.
To estimate the illumination for an image of N pixels [Ri, Gi, Bi], each pixel
is first classified as to whether or not it belongs to the class of gray pixels. To
classify a pixel, the logarithm of each channel is taken, producing
[log Ri, log Gi, log Bi], which is then projected onto the S axis of the LIS coordinate
system via a vector inner product. If the resulting value is less than a specified
threshold, the pixel is classified as gray.
The GSI method estimates the colour [Re, Ge, Be] of an image's illumination
according to

[Re, Ge, Be] = Σ_i w_i [Ri, Gi, Bi] / Σ_i w_i,  where w_i = 1 if isgray([Ri, Gi, Bi]) and w_i = 0 otherwise    (27)

where 'isgray' is the test that classifies pixels as gray or not.
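A minimal sketch of the GSI estimator follows, assuming the synthetic-data S axis reported in Section 4.4.2 and a hypothetical threshold of 0.1 (in practice the threshold is tuned on training data):

```python
import math

# S (reflectance) axis of the LIS system from the synthetic-data PCA;
# the 0.1 threshold is an assumed value, not the thesis's tuned one
S_AXIS = (0.4729, -0.8132, 0.3358)
THRESHOLD = 0.1

def isgray(rgb, s_axis=S_AXIS, thresh=THRESHOLD):
    # Project the log RGB onto the reflectance axis; a projection near
    # zero indicates a gray *surface*, whatever the illuminant colour
    logp = [math.log(c) for c in rgb]
    return abs(sum(a * b for a, b in zip(s_axis, logp))) < thresh

def gsi_estimate(pixels):
    # Average the RGBs of pixels classified as gray (the w_i of eq. (27))
    grays = [p for p in pixels if isgray(p)]
    if not grays:
        return None
    n = len(grays)
    return tuple(sum(p[c] for p in grays) / n for c in range(3))
```

Note that a neutral pixel such as (0.5, 0.5, 0.5) projects to nearly zero on the S axis (the axis components almost sum to zero), while a strongly chromatic pixel does not.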
An example of the GSI method is shown in Figure 6. The isgray test
identifies as gray those pixels from Figure 6(a) that are shown in white in Figure
6(b). The true scene illumination as measured from a gray card is [0.2476,
0.2910, 0.4614]. The standard gray world method averages the RGBs of all
pixels so that the estimated illumination is found to be [0.4748, 0.2348, 0.2903].
Figure 6(c) shows the result of colour balancing the colours using this gray world
estimate. The GSI method, however, averages only the RGBs of pixels that pass
the isgray test, with the result that the illumination is estimated to be [0.2810,
0.3290, 0.3899]. Clearly, this latter estimate is much closer to the true value.
Figure 6(d) shows the result of colour balancing based on the GSI illumination
estimate. Although this example shows the potential of the GSI method, rigorous
tests are presented in the next section. Those tests also show that, although the
LIS coordinate system assumes the use of sharp sensors, GSI also works well
for images obtained from a camera with somewhat broadband sensors.
Figure 6 (a) Input image; (b) pixels identified as gray are shown in white corresponding to (a); (c) Corrected image based on GW illumination estimation; (d) Corrected image based on GSI illumination estimation
4.5 Experiments
We tested the three proposed illumination estimation methods on both
synthetic and real images. The implementation of SVR is based on the
implementation by Chang and Lin [55]. To this we added a Matlab interface that
reads data files representing the image histograms and associated illumination
chromaticities. Each row in the training data file represents one training image
and consists of two parts: the true illumination chromaticity followed by the bin
number for each non-zero histogram bin. The TPS and GSI methods were
implemented in MATLAB 7.0.1 [66].
Barnard et al. [14,15] reported tests of several illumination estimation
methods, including a neural network and colour by correlation based on binary
histograms. For SVR, we have tried to follow their experimental procedure as
closely as possible and used the same image data so that it can be compared
fairly to these other methods. For TPS, each original image is averaged down to
an 8×8 thumbnail and converted to chromaticity space as input. In addition,
we compare all of these methods to the "shades of gray" (SoG) technique [2]
based on the Minkowski family of norms, Max RGB, and Grayworld.
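The thumbnail preprocessing for TPS can be sketched as follows. The exact block-averaging layout and (r, g) encoding are assumptions, but the output is the 128-dimensional (8×8 blocks × 2 chromaticities) vector described above.

```python
def thumbnail_chromaticity(img, k=8):
    # img: H x W image of (R, G, B) tuples (nested lists).
    # Average into a k x k thumbnail and convert each cell to
    # (r, g) chromaticity: r = R/(R+G+B), g = G/(R+G+B).
    h, w = len(img), len(img[0])
    feat = []
    for by in range(k):
        for bx in range(k):
            sums = [0.0, 0.0, 0.0]
            for y in range(by * h // k, (by + 1) * h // k):
                for x in range(bx * w // k, (bx + 1) * w // k):
                    for c in range(3):
                        sums[c] += img[y][x][c]
            tot = sum(sums) or 1.0          # guard all-black blocks
            feat.extend([sums[0] / tot, sums[1] / tot])
    return feat   # 128-dimensional input vector for the TPS
```

Dropping intensity by moving to chromaticity is what lets the interpolation concentrate on illuminant colour rather than scene brightness.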
4.5.1 Error Measures
Several different error measures are used to evaluate performance. The
first is the distance between the actual chromaticity (r_a, g_a) and the estimated
chromaticity (r_e, g_e) of the illuminant [14,15]:

E_dist = sqrt((r_a - r_e)² + (g_a - g_e)²)
For the distance error, we also compute the root mean square (RMS),
mean, and median errors over a set of N test images. It has been argued that the
median is the most appropriate metric for evaluating colour constancy [67]. The
standard RMS is defined as:

RMS = sqrt((1/N) Σ_{i=1..N} E_i²)
The second error measure is the angular error between the chromaticity 3-
vectors when the b-chromaticity component is included. Given r and g, b = 1 - r -
g. Thus, we can view the real illumination and estimated illumination as two
<r,g,b> vectors in 3D chromaticity space and calculate the angle between them.
The angular error, expressed in degrees, is:

E_angle = cos⁻¹((v_a · v_e) / (||v_a|| ||v_e||)) × 180/π
We also compute the RMS, mean, and median angular error over a set of
images.
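The two error measures, together with the RMS and median summaries, follow directly from the definitions above (a pure-Python sketch):

```python
import math

def dist_error(ra, ga, re, ge):
    # Euclidean distance in (r, g) chromaticity space
    return math.hypot(ra - re, ga - ge)

def angular_error(ra, ga, re, ge):
    # Recover b = 1 - r - g, then measure the angle between the
    # actual and estimated <r,g,b> vectors, in degrees
    va = (ra, ga, 1.0 - ra - ga)
    ve = (re, ge, 1.0 - re - ge)
    dot = sum(a * e for a, e in zip(va, ve))
    na = math.sqrt(sum(a * a for a in va))
    ne = math.sqrt(sum(e * e for e in ve))
    return math.degrees(math.acos(min(1.0, dot / (na * ne))))

def rms(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def median(errors):
    s = sorted(errors)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
```

The clamp to 1.0 before acos simply guards against floating-point round-off when the two vectors are nearly identical.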
Even if the median angular error for one method is less than for another,
the difference may not be statistically significant. To evaluate whether a
difference is significant, we use the Wilcoxon signed-rank test based on the
angular errors [67]. In the following experiments, the significance level for
accepting or rejecting the null hypothesis is always set to 0.01.
4.5.2 Synthetic Data Training, Real-Data Testing
The first tests are based on training with synthesized image data
constructed using the 102 illuminant spectra and 1995 reflectances described by
Barnard [15] along with the sensor sensitivity functions of the calibrated SONY
DXC-930 CCD [68]. Testing is based on Barnard's [14] 321 real images taken with
the SONY DXC-930 of 30 scenes under 11 different light sources. These images
are linear (a gamma of 1.0) with respect to scene intensity. This data is available
on-line from the Simon Fraser University Computational Vision Laboratory colour
image database [69].
The number of distinct synthesized training 'scenes' was varied from 8 to
1024 in order to study the effect of training size on performance. Each synthetic
scene was 'lit' by each of the 102 illuminants in turn to create 102 images of each
scene.
For SVR, all of these synthesized camera RGB values, their
corresponding chromaticities, and the illuminant chromaticity are mapped to 2D
and 3D binary vectors for input to SVR. Table 2 shows that the parameters vary
with the training set as expected. Although the basis function type was allowed to
vary during the cross-validation, the RBF was eventually selected in all cases.
[Table 2: for each training-set size (8, 16, 32, 64, 128, 256, 512, 1024) and each histogram dimension (2D, 3D), the selected kernel was the RBF; the individual C and gamma entries are not recoverable from the transcript.]

Table 2 Results of k-fold kernel and parameter selection as a function of the histogram type and the number of training set images in SVR solutions

Figure 7 shows how the performance of these three methods initially improves
as the size of the synthetic training set increases.

Figure 7 Median angular error in illumination chromaticity as a function of increasing training set size

Illumination estimations by SVR, TPS and GSI compare favourably to the
methods Barnard tested [57] and Finlayson reported [67], as shown below in
Table 3. The RMS and median errors for Colour by Correlation with Binary
Histogram (CC01), Colour by Correlation with Maximum Likelihood (CCMAP),
Colour by Correlation with Mean Likelihood (CCMMSE), Colour by Correlation
(CCMLM), the Neural Network (NN), Database Grayworld (DB), and Gamut
Mapping (GM) are from Table II, page 992 of [15] and Table 2 of [56], respectively.
[Table 3: RMS distance, RMS angle, and median angle entries for GSI, TPS, 2D SVR, 3D SVR, CC01, CCMAP, CCMMSE, CCMLM, and the Neural Network; the row-to-value alignment is not recoverable from the transcript.]

Table 3 Comparison of competing illumination estimation methods. All methods are trained on synthetic images constructed from the same reflectance and illuminant spectra and then tested on the same SONY DXC-930 [15] camera images with identical pre-processing. Data marked by '*' are extracted from [15] (Table II, page 992) while the data marked by '**' are extracted from [67] (Table 2, page 79).

4.5.3 Real Image Data Training, Real-Data Testing

Training on synthetic image data is convenient because large training sets
can be calculated from existing databases of illuminant and reflectance spectra.
The disadvantage of synthetic data is that it requires an accurate model of the
camera and imaging process. On the other hand, creating a training set of real
images is difficult because for each image the scene illumination must be
measured.
Our testing with real data is based on three image data sets. To begin, we
train and test on Barnard's [14] set of 321 SONY images and find that training
with real data is in fact better than training with synthetic data. We continue with
tests on Cardei's [4] set of 900 images from assorted cameras. Finally, we train
using the 11,346 image set that Ciurea et al. [70] built using a digital video
camera. This very large, real-data training set improves overall performance.
Experiments with 321 SONY Real images
When SVR is applied, the kernel and parameters were selected based on
the '1024' row of Table 2; namely, for 3-D, the radial basis function kernel with
width parameter 0.2 and penalty value C = 1, while in 2-D these two
parameters are set to 0.05 and 0.01, respectively.
Since it would be biased to train and test on the same set of images, we
evaluate the illumination error using a leave-one-out cross-validation
procedure [71]. In the leave-one-out procedure, one image is selected for testing
and the remaining 320 images are used for training to find the support vectors
and weight parameters in SVR and TPS, respectively. In the case of GSI, the
training consists of choosing the optimal threshold minimizing the median angular
error over the training set. This is repeated 321 times, leaving a different image
out of the training set each time, and the RMS and median of the 321 resulting
illumination estimation errors are calculated. The errors are significantly lower
than those obtained with synthetic training data. The results and their comparison
to Shades of Gray (SoG) [2], Max RGB (MAX) [32], and Grayworld (GW) [1] are
listed in Table 4. Table 5 shows that 3D SVR has the best performance.
Table 4 Comparison of TPS, GSI, 2D and 3D SVR performance to SoG, Max RGB, Grayworld performance. The results involve real-data training and testing on the 321 SONY images. Errors are based on leave-one-out cross validation evaluation and are reported in terms of both the RMS angular chromaticity and distance error measures.
[Table 4: median, RMS, and maximum angular errors (degrees) and distance errors (x10^2) for 3D SVR, TPS, GSI, 2D SVR, SoG, Max RGB, and Grayworld on the 321 SONY images; the column alignment is not fully recoverable from the transcript.]

Table 5 Comparison of the different algorithms via the Wilcoxon signed-rank test based on angular errors. A '+' means the algorithm listed in the corresponding row is better than the one in the corresponding column; a '-' indicates the opposite; an '=' indicates that the performance of the respective algorithms is not statistically different.

[Table 5: entries not recoverable from the transcript.]

Experiments with Uncalibrated 900 Real Images

We next consider Cardei's [4] set of 900 uncalibrated images taken using a
variety of different digital cameras from Kodak, Olympus, HP, Fuji, Polaroid, PDC,
Canon, Ricoh and Toshiba. A gray card was placed in each scene and its RGB
value is used as the measure of the scene illumination. The SVR was based on a
polynomial kernel function of degree 3 and 0.1 penalty. Its performance is also
compared to the performance reported by Cardei [4] for Colour by Correlation, the
Neural Network, Shades of Gray, Max RGB, and Grayworld.
Because the prerequisite of GSI is that all images should be from the
same camera, we conduct leave-one-out tests only for SVR and TPS, as before.
Since SVR uses the binary histogram as input, to increase the training
data set size we can use the histogram resampling strategy proposed by
Cardei [4] in the context of neural network training. Cardei et al. observed that
each histogram in the original training set can be used to generate many new
training histograms by randomly sampling its non-zero bins. Each sampling
yields a new histogram of an 'image' with the same illuminant chromaticity as the
original. The number of possible sub-samplings is large, thereby making it
possible to build a large training set based on real data extracted from a smaller
number of images. Hence, in this additional leave-one-out test for SVR, when
we select an image for testing, we create a training set of 10,788 histograms
from the remaining 899 real images and then measure the error in the SVR
illuminant estimate for that one image. This process is repeated 900 times.
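Cardei's resampling strategy can be sketched as follows. The fraction of bins kept and the number of copies per image are assumptions, though 12 copies per image does reproduce the 10,788-histogram count mentioned above (899 × 12 = 10,788):

```python
import random

def resample_histogram(nonzero_bins, frac=0.8, rng=random):
    # Keep a random subset of a binary histogram's non-zero bins.
    # The resampled 'image' retains the original illuminant chromaticity.
    k = max(1, int(frac * len(nonzero_bins)))
    return sorted(rng.sample(nonzero_bins, k))

def expand_training_set(examples, copies=12, seed=0):
    # examples: list of (illuminant_chromaticity, nonzero_bin_list) pairs.
    # Each original histogram contributes itself plus (copies - 1)
    # random sub-samplings, all labelled with the same chromaticity.
    rng = random.Random(seed)
    out = []
    for chroma, bins in examples:
        out.append((chroma, bins))
        for _ in range(copies - 1):
            out.append((chroma, resample_histogram(bins, rng=rng)))
    return out
```

Because every resampled histogram keeps its source's illuminant label, the regression target is unchanged while the input distribution is broadened.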
The RMS and median of the 900 errors are tabulated in Table 6. Table 7
summarizes the Wilcoxon test among several of the algorithms. It also shows
that on this training and test set, resampling the training set does not significantly
change 3D SVR's performance.
[Table 6: median, RMS, and maximum angular errors (degrees) and distance errors (x10^2) for TPS(rg), 2D SVR, 3D SVR (with and without resampling), C-by-C, NN, SoG, and Max RGB; most entries are not recoverable from the transcript. The one fully recoverable row, 3D SVR without resampling, reads: median angle 2.02, RMS angle 3.94, max angle 17.46, median distance 1.40, mean 2.09, RMS 2.94, max 15.42.]

Table 6 Comparison of TPS, GSI, 2D and 3D SVR performance to SoG, Max RGB, Grayworld performance. The results involve real-data training and testing on the 900 uncalibrated images. The tests are based on leave-one-out cross-validation on a database of 900 uncalibrated images. The entries for C-by-C and the NN are from [4] (Table 7, page 2385).

Table 7 Comparison of the performance via the Wilcoxon signed-rank test based on angular errors. Labelling '+', '-', '=' as for Table 5.

[Table 7: entries not recoverable from the transcript.]
Experiments with Large Real Image Set
Our final test with real data is based on the 11,346 real images extracted
from over 2 hours of digital video acquired with a SONY VX-2000. Ciurea et
al. [70] built the database by partially automating the measurement of the
illumination's RGB. Their setup consisted of a matte gray ball connected by a rod
attached to the camera. In this way, the gray ball was made to appear at a fixed
location at the edge of each video frame. The ball's pixels were thus easy to
locate in each frame. Because the automatic white balancing function of the
digital camcorder was turned off, the chromaticity of the dominant illumination
hitting the ball was easily measured as the average chromaticity of the pixels
located in the ball's brightest region. The images include a wide variety of indoor
and outdoor scenes, including many with people in them.
In terms of SVR, based on some initial experimentation, the RBF kernel
function was chosen with 0.1 as the penalty parameter and 0.025 as the width
parameter. All subsequent tests of SVR on the Ciurea database are based on
these settings. Regarding GSI, since the camera was uncalibrated, we used
the real-data method to calculate the LIS coordinates for it based on the RGB
values from the gray ball.
The original image database includes 11,346 images. However, many of
these images have very good colour balance (i.e., the RGB values of the gray ball
are gray), which could bias the testing of the illumination estimation methods.
Therefore, we eliminated from the data set the majority of the correctly balanced
images so that the overall distribution of the illumination colour is more uniform,
as can be seen in Figure 8. The resulting data set contains 7,661 images.
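One way to perform the rebalancing described above is to bin the gray-ball chromaticities on a grid and cap the number of images kept per cell; the cell size and cap used here are hypothetical, not the thesis's actual procedure:

```python
from collections import defaultdict

def rebalance(chromas, cell=0.02, cap=40):
    # chromas: list of (r, g) illuminant chromaticities, one per image.
    # Bin them on a grid of side `cell` and keep at most `cap` images
    # per bin, which trims the dense cluster around gray (0.33, 0.33).
    cells = defaultdict(list)
    for idx, (r, g) in enumerate(chromas):
        cells[(int(r / cell), int(g / cell))].append(idx)
    keep = []
    for members in cells.values():
        keep.extend(members[:cap])
    return sorted(keep)   # indices of the images to retain
```

A cap per chromaticity cell flattens the illuminant distribution without discarding any of the rarer illumination colours.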
Figure 8 (a) The original data set contains 11346 images, but the illumination chromaticities cluster around gray (0.33, 0.33). (b) The reduced data set contains 7661 images with a more uniform distribution of illumination chromaticity.
As shown in Figure 9, the images are cropped to remove the gray ball,
which is located at a fixed location in the lower right quadrant. The resulting
image size is 240 × 240.
Figure 9 (a) Original image containing the gray ball from which the colour of the scene illumination is determined. (b) Cropped image to be used for algorithm testing with gray ball removed.
The size of the database means that leave-one-out validation is not
feasible, although leave-N-out for a reasonable choice of N would be possible. In
any case, it would not necessarily be a fair test because of the inherent
regularities in the database. Since the database was constructed from a 3-frame-
per-second sampling of video clips, neighbouring images in the database tend to
be related. Hence, to ensure that the training and testing sets would be truly
distinct, we partitioned the database into two sets.
The partitioning is based on geographical location. Subset A includes
3581 images, and subset B includes 4080. Subset A contains images from
Apache Trail, Burnaby Mountain, Camelback Mountain, CIC 2002 and Deer
Lake. Subset B contains images from different locations: False Creek, Granville
Island Market, Marine, Metrotown shopping center, Scottsdale, Simon Fraser
University and Whiteclyff Park. We then used A for training and B for testing, and
vice versa. The results are again listed in Table 8. Tables 9 and 10 give the
Wilcoxon signed-rank test results for this case. The combined errors and
corresponding Wilcoxon signed-rank test results from both tests are shown in
Tables 11 and 12.
Table 8 TPS, GSI and 3D SVR illumination estimation errors for different training and test sets with comparisons to the SoG with norm power 6, Max RGB, and Grayworld methods.

[Table 8: median, RMS, and maximum angular errors (degrees) and distance errors (x10^2) for TPS, GSI, 3D SVR, SoG (norm = 6), Max RGB, and Grayworld, for Train: Subset A / Test: Subset B and Train: Subset B / Test: Subset A; the row-to-value alignment is not fully recoverable from the transcript.]
Table 9 Comparison of the algorithms via the Wilcoxon signed-rank test on angular error. SVR training set is Subset A. Test set for all methods is Subset B. Labelling '+', '-', '=' as for Table 5.

Table 10 Comparison of the algorithms via the Wilcoxon signed-rank test on angular error. SVR training set is Subset B. Test set for all methods is Subset A. Labelling '+', '-', '=' as for Table 5.
Table 11 Comparison of TPS, GSI and 3D SVR to SoG, Max RGB, and Grayworld. The results involve real-data training and testing on disjoint sets of 7,661 images from the Ciurea data set.

[Table 11: median, RMS, and maximum angular errors (degrees) and distance errors (x10^2) combined over both train/test splits; the value alignment is not reliably recoverable from the transcript.]
Table 12 Comparison of the algorithms via the Wilcoxon signed-rank test on angular error. The results involve real-data training and testing on disjoint sets of 7,661 images from the Ciurea data set. Labelling '+', '-', '=' as for Table 5.
4.6 Discussion

Three colour constancy methods are proposed in this chapter. Many
previous methods of estimating the chromaticity of the scene illumination have
been based in one way or another on statistics of the RGB colours arising in an
image, independent of their spatial location or frequency of occurrence in the
image. Support vector regression is a relatively new tool developed primarily for
machine learning that can be applied in a similar way. We have applied it here,
with good results, to the problem of learning the association between colour
histograms and illumination chromaticity.
The problem of estimating the chromaticity of the overall scene
illumination is also formulated in terms of interpolation over a non-uniformly
sampled data set. The chromaticity is viewed as a function of the image and the
set of training images is non-uniformly spaced. Thin-plate spline interpolation is
an excellent interpolation technique for these conditions and has been shown to
work well for illumination estimation in particular. TPS calculates its result based
on a weighted combination of the entire set of training data. Hence, for efficiency
it is important to keep that set as small as possible. Another non-statistical
solution, GSI, is finally proposed that is based on detecting pixels corresponding
to gray surface reflectance--- which is not necessarily the same as gray image
colour--- and using their average image colour as an indicator of the colour of the
overall scene illumination. The gray surfaces are found by first transforming the
image RGB values to a new LIS coordinate system with axes that roughly
correspond to luminance, illumination and reflectance. In LIS coordinates, values
of S near zero tend to be gray. The major advantages of GSI are that it is much
faster, does not require training, and is substantially simpler to implement.
Under almost the same experimental conditions as those used by
Barnard [14,15] to test the shades-of-gray, neural-network, colour-by-
correlation, Max RGB, and Grayworld methods, the experiments show that the
performance of the three proposed methods is generally comparable to or better
than that of these other methods.
All of the colour constancy solutions, including the three new ones introduced
in the previous chapter, assume that there is only one illuminant
incident upon the scene. However, this is not true for most typical scenes. The
goal of this chapter is to conduct research on colour constancy under two or
more illuminations. To achieve the goal, the Retinex algorithm for lightness and
colour constancy is extended to include 3-dimensional spatial information
reconstructed from a stereo image. A key aspect of traditional Retinex is that,
within each colour channel, it makes local spatial comparisons of intensity. In
particular, intensity ratios are computed between neighbouring spatial locations.
Retinex assumes that a large ratio indicates a change in surface reflectance, not
a change in incident illumination; however, this assumption is often violated in 3-
dimensional scenes, where an abrupt change in surface orientation can lead to a
significant change in illumination. In this chapter, Retinex is modified to use the
3-dimensional edge information derived from stereo images. The edge map is
used so that spatial comparisons are only made between locations lying on
approximately the same plane in 3-dimensions. Experiments on real images
show that this method works well; however, they also reveal that it can lead to
isolated regions which, as a result of being isolated, are incorrectly determined
to be grey.

1 The chapter also appears as two published papers: Weihua Xiong and Brian Funt, "Stereo Retinex", Third Canadian Conference on Computer and Robot Vision (Best Vision Paper Award), Quebec, June 2006; Brian Funt and Weihua Xiong, "Colour Space for Stereo Retinex", Third International Conference on Colour in Graphics, Imaging, and Vision, Leeds, June 2006

To overcome this problem, stereo Retinex is extended to allow
information that is orthogonal to the space of possible illuminants to propagate
across changes in surface orientation. This is accomplished by transforming the
original RGB image data into the LIS colour space introduced in the last chapter.
This coordinate system allows stereo Retinex to propagate reflectance
information across changes in surface orientation, while at the same time
inhibiting the propagation of potentially invalid illumination information. The
stereo Retinex algorithm builds upon the multi-resolution implementation of
Retinex known as McCann99. Experiments on synthetic and real images show
that stereo Retinex performs significantly better than unmodified McCann99
Retinex when evaluated in terms of the accuracy with which correct surface
object colours are estimated.
5.1 Introduction
Although it is well established that for human subjects, a surface's
perceived spatial location affects the perception of its lightness and colour
[37,41], many machine colour constancy models [1-5,28-36] make no use of 3-
dimensional spatial information. In fact, many of the methods are based on
binarized colour histograms, which discard all of the image's spatial structure, and
rely instead on statistical properties of the colour distribution in order to
determine the colour of the scene illuminant. Although these methods work quite
well [14,15], they all assume implicitly that there is a single scene illuminant or a
single adapted illuminant in the visual system. However, multiple illuminants are
common in typical scenes. Outdoors, for example, shadowed areas are not only
darker but also much bluer than those in the sun, because the sky's light is bluer
than the sun's.
In this chapter, we extend Retinex to take advantage of 3-dimensional
distance information extracted from stereo imagery. In particular, since an abrupt
change in surface orientation may lead to an abrupt change in the incident
illumination as, for example, occurs due to self-shadowing, Retinex is modified so
that its computation does not cross edges in the depth map. In this way, it can
provide lightness/colour estimates for different parts of the scene that may be
illuminated differently.
Although this modification of Retinex does ameliorate many of the problems
that arise in multi-illuminant scenes, the processing has a tendency to result in
isolated grey areas. This problem arises especially for surfaces of uniform colour
that are completely isolated from other surfaces by a change in surface
orientation. Retinex normalizes to white, so any completely isolated single colour
will always be made white (or grey after subsequent intensity adjustment). To
overcome this problem, a new colour coordinate system is derived with axes
representing variation in illumination colour, intensity, and object reflectance.
Retinex is applied separately to each of these new colour channels and the result
is then transformed back to the original colour coordinates. The new coordinate
system allows stereo Retinex to propagate reflectance information across
changes in surface orientation, while at the same time inhibiting the propagation
of potentially invalid illumination information.
Tests on synthetic and real images show that the modified, depth-aware
stereo Retinex method outperforms the original Retinex method in terms of the
accuracy with which the true scene surface colours are estimated. Accurate
estimation of scene colours under uncontrolled illumination conditions is
important in many computer vision applications.
5.2 Background
Retinex has a long history, beginning with an early paper by Land [22], and
there are many variations on the original Retinex algorithm; these were
introduced in Chapter 3. The basic principles of Retinex are: (i) colour is obtained from
3 'lightnesses' computed separately for each of the colour channels; (ii) the ratios
of intensities from neighbouring locations are assumed to be illumination
invariant; (iii) lightness in a given channel is computed over large regions based
on combining evidence from local ratios; (iv) the location with the highest
lightness in each channel is assumed to have 100% reflectance within that
channel's band. Lightness refers to the perceived (in the case of human
perception), or estimated (in the case of computational methods) surface albedo
(reflectance averaged over the channel's band).
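These principles can be illustrated with a toy one-dimensional sketch. The values below are illustrative, not from the thesis: under a slowly varying illuminant, chaining the intensity ratios of neighbouring pixels and then normalizing the maximum to white approximately recovers the relative reflectances.

```python
import numpy as np

# Principle (ii): under a slowly varying illuminant, the ratio of intensities
# at neighbouring pixels approximates the ratio of the underlying reflectances.
# Principle (iv): the largest lightness is normalized to 100% reflectance.
# All values here are made up for illustration.

reflectance = np.array([0.2, 0.4, 0.8, 0.6])     # true albedos on a scanline
illumination = np.array([1.0, 1.05, 1.1, 1.15])  # slow illumination gradient
intensity = reflectance * illumination           # what the sensor measures

# Chain local ratios to get lightness relative to the first pixel ...
ratios = intensity[1:] / intensity[:-1]
lightness = np.concatenate(([1.0], np.cumprod(ratios)))

# ... then normalize so the maximum is treated as 100% reflectance.
lightness /= lightness.max()
```

Because the illuminant varies slowly, `lightness` lands close to `reflectance / reflectance.max()` even though the absolute illumination is unknown.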
All of the Retinex variants treat the input image as a spatial arrangement
of colours and make no use of the 3-dimensional structure of the underlying
scene. However, a number of psychophysical experiments indicate
that human lightness and colour perception are influenced by information
from several sources, including 3-dimensional scene geometry. In particular,
Gilchrist's early experiments [37] showed that, in black-and-white scenes,
changing a surface's apparent 3-dimensional context affected the perception of
its lightness. Gilchrist writes, "The central conclusion of this research is that
perceived surface lightness depends on ratios between regions perceived to lie
next to one another in the same plane" [37]. The extension to Retinex proposed
here uses ratios between regions lying next to one another and, furthermore,
specifically excludes ratios from neighbouring regions lying in different planes. In
experiments using computer-graphics-rendered 3-dimensional scenes, Boyaci
et al. [40] provided further evidence for the relationship between perceived
orientation and the perceived lightness of matte surfaces. Yamauchi and
Uchikawa [36] used stereoscopic stimuli to support the notion that surface colour
perception is strongly influenced by depth information. Bloj et al. [72] illustrated
the effect of spatial shape on chromatic recognition. Yang and Shevell [39] show
that binocular disparity can improve colour constancy. Adelson [73] argues that
statistical properties and spatial arrangement in three dimensions are combined
for lightness perception.
Since there is plenty of psychophysical evidence indicating a connection
between a surface's spatial properties in 3-dimensions and its perceived
lightness and colour properties, the question is how to include the spatial
information into a colour constancy model? We investigate how it can be
incorporated into the Retinex model in particular, and show that spatial
information does improve its colour constancy performance significantly.
5.3 Stereo Retinex Basics
Since we begin with the multi-resolution version of the Retinex algorithm,
known as McCann99 [23], and extend it to include 3D spatial information, we
briefly describe the original algorithm. McCann99 is a multi-resolution technique
which involves the standard pyramid of decreasing resolution. The computation
starts at the top of the pyramid with a ratio-product-reset-average process that
involves local comparisons between each pixel and its immediate neighbors. The
procedure is iterative so that a pixel's lightness estimate is updated based on its
current lightness estimate in conjunction with its intensity ratios with respect to its
neighbors. After a fixed, but user-selectable, number of iterations, the lightness
estimates are propagated down a layer where the computation is continued, then
propagated further.
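A minimal single-level sketch of this update, in log space, might look as follows. This is an illustrative reconstruction, not the McCann99 code itself; in particular, the wrap-around border handling of `np.roll`, the 4-neighbour comparison, and a clip-at-white reset threshold of 0 (white normalized to log value 0) are simplifying assumptions of this sketch.

```python
import numpy as np

def ratio_product_reset_average(log_img, L, iterations=1):
    """One pyramid level of a ratio-product-reset-average update (sketch).

    log_img: log intensities for one colour channel.
    L: current lightness estimates in log space (white = 0).
    """
    for _ in range(iterations):
        acc = np.zeros_like(L)
        for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            neigh_L = np.roll(L, (dy, dx), axis=(0, 1))
            neigh_I = np.roll(log_img, (dy, dx), axis=(0, 1))
            # ratio (log difference) and product (log sum), then reset:
            # nothing may be estimated brighter than white (log value 0).
            acc += np.minimum(neigh_L + (log_img - neigh_I), 0.0)
        L = 0.5 * (L + acc / 4.0)   # average with the current estimate
    return L
```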
We use a stereo image to calculate a depth map registered with the image
data. Details of the camera setup, calibration and stereo-correspondence
algorithm will be described in the Experiments section; however, any standard
stereo-reconstruction algorithm could be used. Edges in the depth map are then
detected using a modified version of the method proposed by Gelautz et al. [74].
These edges represent sharp changes in surface orientation, or depth
discontinuities such as those created by occlusion.
The depth edges are the key factor in controlling the spatial comparisons
made during the Retinex computation. Traditional Retinex compares a pixel to all
its neighbors. In this case, the implicit assumption is that a large change in
intensity between pixels arises from a change in surface reflectance, while a
small change arises from a gradual change in illumination. However, in 3-
dimensions an abrupt change in surface orientation can also mean that there is
an abrupt change in the incident illumination, since the differently oriented parts
of the surface may be pointed towards different light sources. Similarly, depth
discontinuities imply that there are two separate surfaces, which may, of course,
be illuminated differently. As Gilchrist [37] pointed out, only spatial
comparisons between neighboring locations on the same locally planar surface
should be used. With the additional information about the location of depth edges
derived from stereo, the proposed stereo Retinex method only makes
comparisons between pixels that are not separated by a depth edge. Although this is
conceptually simple, the computation requires some organization, especially to
accommodate the multi-resolution aspect of McCann99 Retinex.
Since McCann99 Retinex compares values at neighboring pixels and
averages lightness estimates from them as well, what is required is an efficient
way to stop it making comparisons across depth edges. This is accomplished by
first constructing separate maps for vertical and horizontal edge elements. This
division makes it easier to propagate the edges up to the lower-resolution levels of
the multi-resolution pyramid. Once the edge information is propagated through
the pyramid, a bit-mask is used to encode the subset of a pixel's 8 immediate
neighbors that lie on the same side of any edges. As McCann99 iterates, it
simply uses the bit-mask encoding to determine which neighbors to visit. Details
are given below in the "Implementation Details" section.
5.4 Stereo Retinex in LIS Colour Coordinates
Figure 10 demonstrates a problem that can arise with stereo Retinex
when spatial edges isolate regions from one another. If all spatial comparison
across the edge is inhibited then the colour information will not propagate at all.
In this case, some areas will tend to become grey. This problem becomes
especially acute for surfaces of uniform colour that are completely isolated by
spatial edges. Because Retinex normalizes to white, any completely isolated
single colour will always become grey. The final result is grey, not white, because
in all figures below, a pixel's output intensity is made to match its input intensity.
The synthetic scene in Figure 10 is composed of two patches meeting at a sharp
angle. Tungsten illumination lights the blue patch from the left, while
D65 lights the red patch from the right. For stereo Retinex, the spatial
edge between them isolates them from one another, so both turn grey.
Figure 10 (a) A synthetic scene composed of two patches. The blue one is lit by tungsten light from the left; the red one is lit by D65 from the right. (b) The image (monocular version) input to stereo Retinex. The red line is the spatial edge between them, inserted manually in this case. (c) Both patches appear gray after stereo Retinex because they are isolated surfaces.
To mitigate this graying problem, we take advantage of the LIS
colour coordinate system introduced in the previous chapter, which allows
Retinex to pass information about surface reflectance across 3D orientation
changes while still inhibiting the exchange of possibly incorrect illumination
information. Since LIS represents illumination change, reflectance and
luminance as components that are as independent as possible, the basic stereo
Retinex method described above is modified so that, at a 3D surface edge,
information is allowed to propagate within the channel representing surface
reflectance, while it continues to be inhibited within the illumination and intensity
channels.
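This channel-selective inhibition can be sketched as follows; the channel ordering (L, I, S) and the masked log-ratio formulation are assumptions of this sketch, not the thesis implementation.

```python
import numpy as np

# Within a surface, all three LIS channels may be compared between neighbours;
# across a depth edge, only the reflectance channel S is allowed to propagate,
# while the illumination (I) and intensity (L) channels are inhibited.
ALLOW_WITHIN_SURFACE = np.array([True, True, True])       # L, I, S
ALLOW_ACROSS_DEPTH_EDGE = np.array([False, False, True])  # only S

def channel_ratio(pixel_lis, neighbour_lis, crosses_edge):
    """Per-channel log-ratio between neighbours, zeroed where inhibited."""
    allow = ALLOW_ACROSS_DEPTH_EDGE if crosses_edge else ALLOW_WITHIN_SURFACE
    return np.where(allow, pixel_lis - neighbour_lis, 0.0)
```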
5.5 Implementation Details
The main difficulty in implementing stereo Retinex as a modification of the
McCann99 algorithm is in transmitting the spatial edge information from one level
of the multi-resolution pyramid to the next. For convenience, the edges found
from the stereo depth map are assumed to lie in between image pixels. To
propagate the edge information to the next lower resolution level in the pyramid,
the rewrite rules shown in Figure 11 are used. For a 2-by-2 group of pixels, if
they are all to one side of an edge then the edge is easily propagated to the next
level. For the case where a vertical edge runs through the group, it is randomly
assigned to pass on one side of the group or the other; or above or below it in the
case of a horizontal edge.
Figure 11 Rewrite rules used in propagating edge information to the next lower resolution level. An edge running through the middle of a 2-by-2 region is randomly assigned to one side or the other. Vertical edges are shown here; horizontal edges are treated analogously.
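The rewrite rule for a single vertical edge element can be sketched as follows; the 0/1/2 boundary encoding is an assumption of this sketch, and horizontal edge elements would be handled the same way.

```python
import random

def propagate_vertical_edge(col, rng=random):
    """Map a vertical edge element at fine-level boundary `col` of a
    2-pixel-wide group (0 = left side, 1 = middle, 2 = right side) to the
    coarse-level boundary (0 = left, 1 = right)."""
    if col == 0:
        return 0                   # already on the group's left boundary
    if col == 2:
        return 1                   # already on the group's right boundary
    return rng.choice([0, 1])      # middle: randomly push to one side
```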
If there are any edges between a pixel and its neighbors, then it should
only make comparisons with a subset of those neighbors. This subset is
compactly represented by the 'on' bits in an 8-bit mask using 1 bit for each of a
pixel's 8 immediate neighbors. This strategy is useful for reducing the memory
requirements. Deciding whether or not an edge must be crossed to reach a
neighbor to the east, south, west, or north is straightforward because the edges
are either above or to the side of a pixel. For a diagonal neighbor, the one to the
northeast for example, an edge must be crossed if there are edges both to the
north and to the east. Together they surround the pixel's northeast corner
forming an edge as shown in Figure 12a. Similarly, an edge must be crossed to
reach either of the 2 shaded pixels in Figure 12b.
Figure 12 (a) From the center pixel, the three shaded pixels in the upper right cannot be reached without crossing an edge. (b) The two pixels that cannot be reached are shaded.
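Putting the axial and corner-sealing diagonal rules together, the 8-bit neighbour mask can be sketched as below; the bit ordering is an arbitrary choice for this sketch, since the text does not specify one.

```python
def neighbour_mask(edge_n, edge_s, edge_w, edge_e):
    """Build the 8-bit neighbour mask from the four edge-element flags
    around a pixel. A set bit means the neighbour is reachable without
    crossing an edge. Bit order (an assumption): N, NE, E, SE, S, SW, W, NW.
    """
    reachable = [
        not edge_n,                  # N:  blocked by the north edge alone
        not (edge_n and edge_e),     # NE: blocked when N and E seal the corner
        not edge_e,                  # E
        not (edge_s and edge_e),     # SE
        not edge_s,                  # S
        not (edge_s and edge_w),     # SW
        not edge_w,                  # W
        not (edge_n and edge_w),     # NW
    ]
    mask = 0
    for bit, ok in enumerate(reachable):
        if ok:
            mask |= 1 << bit
    return mask
```

The number of set bits, e.g. `bin(mask).count('1')`, then gives the divisor for the averaging step.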
At each iteration, McCann99 compares each pixel to its neighbors and
averages the local lightness estimates. The algorithm is modified to use the 8-bit
neighbor mask to indicate what subset of the neighbors to use. The number of
'on' bits also indicates the number to divide by in the averaging step.
For stereo matching, we use fast cross-correlation, rectangular sub-
regioning, and 3D maximum-surface techniques in a coarse-to-fine scheme [75].
However, noise in the image, as well as errors in calibration and rectification,
can lead to false matches and hence to errors in the depth map. To
improve the accuracy of detected spatial edges, we use the 'edge combination'
technique developed by Gelautz et al. [74]. We used their original method with the
exception of using Laplacian of Gaussian edge detection in place of Canny
detection, since for our purposes it seemed to give slightly better results.
5.6 Experiments
We implemented stereo Retinex in Matlab 7.0 by downloading and
modifying the McCann99 Matlab code available from the Simon Fraser University
Computational Vision Laboratory [65]. We then tested it on both synthetic and
real images. Retinex's performance is evaluated in terms of the accuracy with
which it estimates the chromaticity of surface colours as they would occur under
a canonical 'white' illumination.
Images were captured using a Kodak DCS460 single-lens reflex digital
camera. A "LOREO 3D lens in a cap" is attached in place of the standard lens so
that the camera records a stereo pair within a single image frame [76]. Camera
geometry calibration, image rectification and stereo matching were conducted
using standard procedures [69,71]. We use the stereo image to calculate a 3D
depth map and then detect edges in the depth map using a modified version of
the method proposed by Gelautz et al. [74].
We evaluate performance in terms of the distance between colours in rg-
chromaticity (r = R/(R+G+B), g = G/(R+G+B)) space, and in terms of the angle
between colours viewed as vectors in RGB space. These are given by the
following formulas, where subscript 'e' indicates the result of Retinex processing,
and 'w' indicates the 'benchmark' colour under white light:
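From the definitions above, the two measures can be written as follows (a reconstruction from the text, with c denoting the (R, G, B) vector):

```latex
\mathrm{dist}(e, w) = \sqrt{(r_e - r_w)^2 + (g_e - g_w)^2}\,, \qquad
\mathrm{ang}(e, w) = \cos^{-1}\!\left(
  \frac{\mathbf{c}_e \cdot \mathbf{c}_w}
       {\lVert\mathbf{c}_e\rVert\,\lVert\mathbf{c}_w\rVert}\right).
```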
We report four basic statistical measures of the error distributions: mean,
median, RMS (root mean square) and Mmax. Mmax is the average value of the
largest p percent of the errors, and is more stable than the simple maximum with
respect to the presence of an isolated extreme value. Here, p is set to 0.5.
Hordley et al. [67] indicate that the median angular error is often the most
appropriate measure to use when evaluating colour constancy. The RMS of the
errors from N pixels is given by the standard formula:
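That is, RMS = sqrt((1/N) * sum of squared errors). A short sketch of all four statistics follows; the rounding used for the p-percent cutoff, and taking at least one sample, are assumptions of this sketch.

```python
import numpy as np

def error_stats(errors, p=0.5):
    """Mean, median, RMS and Mmax of a set of per-pixel errors.

    Mmax is the mean of the largest p percent of the errors (p = 0.5 here),
    which is more stable than the plain maximum against a single outlier.
    """
    e = np.sort(np.asarray(errors, dtype=float))
    k = max(1, int(round(len(e) * p / 100.0)))      # at least one sample
    return {
        "mean": float(e.mean()),
        "median": float(np.median(e)),
        "rms": float(np.sqrt(np.mean(e ** 2))),     # sqrt((1/N) * sum(e_i^2))
        "mmax": float(e[-k:].mean()),               # mean of the k largest
    }
```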
5.6.1 Tests using synthetic images
Since stereo reconstruction and edge detection will be imperfect, one goal
of the synthetic-image tests is to determine how much undetected edges will
affect accuracy. It is also useful to compare the performance of stereo Retinex to
McCann99 Retinex in a controlled, noise-free environment, with ground-truth
data for the colours of the objects in the scene.
The synthetic images are constructed with a variable number of patches of
different reflectances selected from the 1995 available in the database described
by Barnard [65]. The illumination spectrum and sensor sensitivity functions [68] of
a SONY DXC-930 3-CCD camera are used to derive the RGB for each patch.
First, a benchmark image is generated using equal-energy white illumination.
Second, using the same patch reflectances, the same synthetic scene is divided
into two parts. RGBs for one part are synthesized using the spectrum of
tungsten light, and for the other using D65 daylight. All the reflectance and
illuminant data were downloaded from the Simon Fraser University colour
database [65]. For the synthetic case, we do not synthesize stereo images, but
instead create the depth-edge map manually so that the number and extent of
leaks between the two differently illuminated parts of the image can be
controlled.
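The synthesis step described above (illuminant spectrum times surface reflectance, integrated against the camera's sensitivity functions) can be sketched as follows. The five-sample spectra below are made-up placeholders, not the SFU database data or the SONY DXC-930 sensitivities.

```python
import numpy as np

def synthesize_rgb(illuminant, reflectance, sensors):
    """illuminant: (W,) spectral power; reflectance: (W,) surface reflectance;
    sensors: (3, W) RGB sensitivity functions. Returns a length-3 RGB."""
    return sensors @ (illuminant * reflectance)

# Hypothetical 5-sample spectra, just to exercise the function.
E_white = np.ones(5)                    # equal-energy white (benchmark light)
rho = np.linspace(0.2, 0.8, 5)          # a made-up patch reflectance
S = np.eye(3, 5)                        # toy sensor sensitivities
rgb_benchmark = synthesize_rgb(E_white, rho, S)
```

A two-illuminant test image is then obtained by calling `synthesize_rgb` with a tungsten spectrum for the patches in one half of the scene and a D65 spectrum for the other, keeping the reflectances fixed.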
For the first experiment, we divided the image down the middle. We apply
stereo Retinex to the image, providing it with a perfect edge map. The results are
shown visually in Figure 13 and tabulated numerically in Table 13.
Figure 13 Results for synthetic images containing only a single edge down the middle of the image. The illumination on the left half is tungsten, and on the right D65. The black line indicates the manually defined edge. (a) Input image; (b) The benchmark image; (c) Standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 applied using the new LIS colour channels; (f) Stereo Retinex applied using the new LIS colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels.
For the second experiment, the image is separated into 2 parts via an
irregular border. The irregular border tests the effectiveness of the propagation of
the edge information through the multi-resolution pyramid. The results are shown
in Figure 14 and Table 13.
Figure 14 Irregular boundary between the two regions. The edge separating the regions is defined manually. (a) Input image; (b) The benchmark image; (c) Standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 applied using the LIS channels; (f) Stereo Retinex applied using the LIS channels.
Table 13 Performance comparison for the synthetic image cases from Figure 13, with a straight edge boundary, and Figure 14, with an irregular edge boundary: SR LIS (stereo Retinex processed using LIS colour channels); SR (stereo Retinex processed using log RGB space); M99 LIS (McCann99 Retinex processed in LIS colour channels); and M99 (McCann99 Retinex processed using log RGB space).
5.6.2 Tests using Real images
We conducted two sets of experiments with real images. In the first, the
only objects in the scene were Macbeth Colour Checkers [78]. In the second,
other more typical objects were included. Although scenes such as a room with
tungsten light from a lamp along with daylight from a window are common, we
arranged a controlled 2-illuminant environment. Two tungsten lamps were used
with filters attached. One, with a blue filter, lit the scene from the left; the other,
with a red filter, lit the scene from the right.
The first scene consisted of two Macbeth Colour Checkers meeting at an
angle, as shown in Figure 15. The scene was then photographed in stereo. To
obtain a benchmark image, a white reflectance standard was introduced at the
side of the scene and then an additional image was taken using white light. The
RGB channels were then scaled in order to make the reflectance standard
perfectly white (i.e., R=G=B=255). Results are shown in Figure 15 and Table 14.
Figure 15 Comparison of standard Retinex to stereo Retinex, both in log RGB and in LIS coordinates, operating on the image of a simple scene lit with bluish light from the left and reddish light from the right. (a) Input image of a two-illuminant scene; (b) The white-point adjusted benchmark image; (c) Standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 applied in LIS colour channels; (f) Stereo Retinex applied in LIS colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels.
Table 14 Two-illuminant real image performance comparison of SR LIS (stereo Retinex processed using LIS colour channels), SR (stereo Retinex processed in log RGB space), M99 LIS (McCann99 Retinex processed in LIS colour channels), and M99 (McCann99 Retinex processed in log RGB space).
          Angular                          Distance (×10²)
          MMax   RMS    Mean   Med        MMax   RMS    Mean   Med
SR LIS    8.88   3.80   3.63   2.99       8.51   2.98   2.77   2.68
Figure 16 Edge map and recovered illumination: (a) Edges representing abrupt changes in surface orientation extracted from the stereo image pair are marked in white; (b) Chromaticity of illumination as estimated by stereo Retinex in LIS colour channels correctly shows a sharp change in illumination where the surface orientation changes; (c) Illumination field recovered by McCann99 shows a much less distinct change in illumination.
The surface orientation edge in the previous scene is very distinct and
easily identified. To test how well stereo Retinex works in a less controlled
environment, we use the more complex scenes shown in Figures 17 and 18.
Again, Figure 17 has blue light from the right and red light from the left. As can
be seen from the white bust in the upper right, as well as the white button in the
lower left, stereo Retinex in log RGB (Fig. 17 (e)) is more successful at
eliminating the illumination variation than McCann99 (Fig. 17 (d)). Both methods
push the colours towards grey because Retinex normalizes colours relative to the
whitest surface within a local region. This leads to desaturation of the colours
when there is no nearby white surface. In the case of stereo Retinex, this
problem is exacerbated by the fact that depth edges (correctly) limit the distance
within which a white surface needs to be found. Using the new LIS colour space,
more surface colour information propagates across the edges and this leads to
the more colorful result Fig. 17 (g).
Figure 17 Real image performance comparison. (a) Input image of a two-illuminant scene of toys with uniform background, illuminated with reddish light from the left and bluish from the right; (b) White-point adjusted benchmark image; (c) Edge map in which the arrow indicates where edges completely isolate the toy's green tongue from all other regions; (d) Standard McCann99 applied in log RGB space; (e) Stereo Retinex applied using log RGB space; the isolated small patch turns gray; (f) McCann99 applied in the new LIS colour channels; (g) Stereo Retinex applied in the new colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels; the isolated small patch is close to the green it should be, as in (b). (h)-(k) Error maps corresponding to the results from (d)-(g), in which large errors are shown as dark and zero error as white.
Table 15 Two-illuminant toy image with gray background: performance comparison between SR LIS (stereo Retinex processed using LIS colour channels); SR (stereo Retinex processed in log RGB space); M99 LIS (McCann99 Retinex processed in LIS colour channels); and M99 (McCann99 Retinex processed in log RGB space).
Now we replace the uniform gray background from the previous
experiment with a colourful one that is also lit by two light sources. The
performance is compared in Figure 18 and Table 16.
Figure 18 Real image performance comparison. (a) Input image of a two-illuminant scene of toys with a colourful background, lit by red light from the left-hand side and blue light from the right; (b) The white-point adjusted benchmark image; (c) Standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 applied in LIS colour channels; (f) Stereo Retinex applied in LIS colour channels with 3D edge information inhibiting propagation only within the illumination and intensity channels. (g)-(j) Error maps corresponding to the results from (c)-(f), in which large errors are shown as dark and zero error as white.
Table 16 Two-illuminant toy image against a colourful background: performance comparison between SR LIS (stereo Retinex processed using LIS colour channels); SR (stereo Retinex processed in log RGB space); M99 LIS (McCann99 Retinex processed in LIS colour channels); and M99 (McCann99 Retinex processed in log RGB space).

          Distance (×10²)                  Angular
          MMax   RMS    Mean   Med        MMax   RMS    Mean   Med
SR LIS    17.52  3.80   2.88   2.06       14.87  3.79   3.12   2.38
SR        14.14  4.17   3.25   2.51       14.42  4.56   3.73   3.03
M99 LIS   21.95  5.58   4.33   3.59       20.88  5.97   4.89   4.29
M99       23.33  7.50   5.88   4.54       23.53  8.11   6.66   5.48
Both of these toy scenes have two distinct illuminants, but even in a
single-illuminant scene the illumination can vary locally due to light interreflecting
off coloured surfaces. Figure 19 shows an example of a single-illuminant scene.
One example of the advantage of stereo Retinex over McCann99 can be seen by
comparing the left-facing part of the horizontal book, which is in shadow so that it
is only being illuminated indirectly. In the McCann99 result, the book cover
has a region with a pink cast as well as one with a pale green cast, whereas
stereo Retinex in LIS space correctly removes the original red cast. Overall
performance results are tabulated in Table 17.
Figure 19 Real-image performance comparison. (a) Input image of a single-illuminant scene of books illuminated solely by reddish light from the right; (b) The white-point adjusted benchmark image; (c) Standard McCann99 applied in log RGB space; (d) Stereo Retinex applied using log RGB space; (e) McCann99 applied in LIS colour channels; (f) Stereo Retinex applied in LIS space with 3D edge information inhibiting propagation only within the illumination and intensity channels. Note how the colours of the orange and yellow patches on the ball are recovered better in this case; also, the pink illumination cast is removed more completely. (g)-(j) Error maps corresponding to the results from (c)-(f), in which large errors are shown as dark and zero error as white.
Table 17 Single-illuminant real image books scene: performance comparison between SR LIS (stereo Retinex processed using LIS colour channels); SR (stereo Retinex processed in log RGB space); M99 LIS (McCann99 Retinex processed in LIS space); and M99 (McCann99 Retinex processed in log RGB space).
5.7 Retinex's iteration parameter
One of the key parameter choices to make when running McCann99
Retinex is the number of iterations to be conducted at each pyramid level. The
larger the number of iterations, the greater the distance over which pixels affect
one another. Figure 20 plots the median chromaticity angular error as a function
of the number of iterations for the scene from Figure 18. The plots for all the
other scenes showed a similar trend. From this plot, it appears that 1 iteration is
the best choice, so that is what was used to obtain all the results reported above.
Figure 20 Median angular error as a function of the number of Retinex iterations. The number of iterations affects the distance over which lightness information propagates across the image. Results here are for processing the scene of Figure 18, but the trend is the same for the other scenes as well.
5.8 Discussion
The McCann99 Retinex method was modified to include information about
the 3-dimensional structure of the imaged scene. The additional 3-dimensional
information is obtained from stereo imagery. Fundamental to Retinex is that it
ratios intensities from neighboring image locations. Stereo Retinex specifically
stops Retinex from using ratios that occur across abrupt changes in 3-
dimensional surface orientation, or across abrupt changes in depth. It thereby
prevents abrupt changes in the incident illumination from having a deleterious
effect upon its calculations. This strategy is in line with Gilchrist's experiments
[37] that showed how spatial context affects human lightness perception and his
conclusion that the important ratios are the ones relating to locations lying on the
same 3-space plane. Although stereo imagery was used here to determine the 3-
dimensional structure, any other method (e.g. from shading in a monocular
image) of identifying when neighbouring image pixels correspond to scene points
lying on a locally planar surface would work just as well.
Although a significant improvement over traditional Retinex, stereo
Retinex also highlights the problem that limiting the propagation of lightness
information across the image increases the likelihood that it will normalize
colours relative to a colour which is not a true white, with the result that some
colours are estimated as being more desaturated than they should be. To solve
this problem, the new colour coordinate system, LIS, was again introduced for
use in Retinex processing. The new coordinate system defines channels that
relate to changes in illumination, intensity and reflectance. Both Retinex and
stereo Retinex applied to these channels perform modestly better than when
either is applied to the standard log RGB channels. By at least partially
separating changes in surface reflectance from changes in illumination and
intensity, the LIS colour space makes it possible to express the fact that, across
an abrupt change in 3D surface orientation, the comparison of surface reflectance
information across the edge remains valid even though the illumination may have
changed in unpredictable ways.
Stereo Retinex outperforms McCann99 Retinex in its ability to estimate
the chromaticity of surface colours as they would appear under ideal white light.
For the case of Retinex at least, this demonstrates that knowledge of a scene's 3-
dimensional spatial structure can be useful for colour constancy.
CHAPTER 6: COLOUR CONSTANCY FOR MULTIPLE-ILLUMINANT SCENES USING RETINEX AND SVR¹
Scenes lit by multiple colours of illumination pose a problem for colour
constancy and automatic white balancing algorithms. Many of these algorithms
estimate a single illuminant colour, but when there are multiple illuminants there
is in fact no single correct answer for recovering the surface reflectances.
For automatic white balancing and colour-cast removal in digital images, multiple
illuminants mean that a single, image-wide adjustment of colours may not yield a
good result, since the adjustment that makes one image area look better may
simultaneously make another look worse. Retinex is one method that adjusts
colours on a pixel-by-pixel basis, and so inherently addresses the multiple-
illumination problem, but it does not always produce a perfect overall colour
balance. In the previous chapter, we tried to solve the problem by extending Retinex
with 3D spatial information. However, stereo Retinex requires two or more
images captured simultaneously from different viewpoints, which is not always practical.
On the other hand, illumination estimation by Support Vector Regression (SVR),
produces quite good overall colour balance for single-illuminant scenes, but does
not adjust the colours locally. Therefore, in this chapter, we combine Retinex and
SVR into a hybrid Retinex+SVR method to overcome some of these problems.
Experiments with both synthetic and real images show promising results.
1 This chapter also appears as a published paper: Weihua Xiong and Brian Funt, "Colour Constancy for Multiple-Illuminant Scenes using Retinex and SVR," in Proc. of Imaging Science and Technology Fourteenth Colour Imaging Conference, pp. 304-308, Nov. 2006.
6.1 Introduction
Many scenes involve multiple sources of illumination. One very common
example occurs when one is indoors and looks across the room and through a
window to the outdoors. The indoor illumination will generally be reddish in
comparison to the bluish illumination provided by the sky. These situations can
lead to very strange output. If the camera is correctly balanced for the indoor
illumination, the window will often look far too blue. The problem is that a single
colour balance setting is insufficient. The colours must, to some extent at least,
be adjusted locally to account for the local variation in scene illumination.
The majority of the illumination-estimation methods [1-5,28-36] that have
been developed for automatically colour balancing images make a single
estimate of the scene illumination. They are therefore susceptible to
situations such as the too-blue window. Retinex is an exception in that it makes
a separate illumination estimate for each pixel. Although this is a strength of
Retinex, it can also be a weakness in that the illumination estimate is strongly
influenced by the colours in each pixel's neighbourhood. For pixels that have the
same initial colour values but are at very different locations, Retinex's output may
be different.
Our goal is to gain the benefits of both the local and global approaches by
merging them into a single process. In particular, we use Retinex to make local
colour adjustments and then apply the Support Vector Regression (SVR) method
to the Retinex-processed image to adjust the overall colour balance. In scenes
with strong differences of illumination, our hypothesis is that, because it makes
local adjustments, Retinex will attenuate the differences in illumination, and as a
result SVR will be able to make a better global adjustment for the illumination.
The experiments described below show that this hybrid method works better than
either SVR or Retinex alone.
6.2 Implementation Details
For the Retinex implementation, we use the Matlab version of McCann99
Retinex [66]. For Support Vector Regression we use the "3D" method described
in Chapter 4, which is based on binarized histograms of the image pixels' (L, r, g)
values, where L = R + G + B, r = R/L and g = G/L. We quantize L into 25 equal
steps, and r and g into 50 steps each, so the 3D histograms consist of 62,500
(25x50x50) bins. After training, SVR provides an estimate of the rg-chromaticity
of the overall scene illumination based on the binarized image histogram
submitted to it.
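The histogram construction described above can be sketched as follows. The 25x50x50 binning follows the text; the normalization of L by the image's maximum intensity is an assumption of this sketch, since the transcript does not specify the intensity range used.

```python
import numpy as np

def binarized_lrg_histogram(rgb, l_bins=25, rg_bins=50):
    """Binarized 3D histogram over (L, r, g), where L = R+G+B,
    r = R/L and g = G/L, giving 25 x 50 x 50 = 62,500 bins as in the
    text.  Clipping L to the observed maximum is an assumption."""
    rgb = np.asarray(rgb, dtype=float).reshape(-1, 3)
    L = rgb.sum(axis=1)
    valid = L > 0
    rgb, L = rgb[valid], L[valid]
    r = rgb[:, 0] / L
    g = rgb[:, 1] / L
    # Normalize L to [0, 1) by the maximum observed intensity (assumption).
    Ln = L / (L.max() + 1e-12)
    iL = np.minimum((Ln * l_bins).astype(int), l_bins - 1)
    ir = np.minimum((r * rg_bins).astype(int), rg_bins - 1)
    ig = np.minimum((g * rg_bins).astype(int), rg_bins - 1)
    hist = np.zeros((l_bins, rg_bins, rg_bins), dtype=np.uint8)
    hist[iL, ir, ig] = 1          # binarized: record only bin occupancy
    return hist.ravel()           # 62,500-dimensional feature vector
```

The ravelled binary vector is what would be submitted to the trained SVR.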
SVR requires a training set. We created a training set of 56,730
histograms by random subsampling of colours from images contained in the
11,346 "grayball" image database [70].
Each image is processed first with McCann99 Retinex. The binarized Lrg
colour histogram of the resulting image is then passed to SVR, which returns an
estimate of the illumination chromaticity. The SVR estimate is not actually an
estimate of the true illumination, but rather an estimate of the illumination relative
to the post-Retinex-processed image. The SVR illumination estimate is used in a
diagonal von Kries transformation to correct the post-Retinex image in order to
adjust it to have the colours it would have had if the original scene had been
imaged under the canonical illumination. This Retinex+SVR image is then
compared with the ground-truth image of the same scene imaged under the
canonical white illumination.
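The correction step can be sketched as follows, treating the Retinex and SVR stages as given black boxes; only the diagonal von Kries scaling from the estimated chromaticity to the canonical white is shown, and the function name is hypothetical.

```python
import numpy as np

def von_kries_correct(image, est_rg, canonical_rg=(1/3, 1/3)):
    """Diagonal von Kries correction: scale each channel so the
    estimated illuminant chromaticity (r, g) maps to the canonical
    chromaticity.  `image` is an HxWx3 array; `est_rg` is the SVR
    estimate obtained from the (Retinex-processed) image."""
    r, g = est_rg
    b = 1.0 - r - g
    rc, gc = canonical_rg
    bc = 1.0 - rc - gc
    gains = np.array([rc / r, gc / g, bc / b])
    return image * gains  # broadcast over the channel axis
```

Applying this to the Retinex output with SVR's estimate yields the Retinex+SVR image that is compared against the ground truth.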
We evaluate Retinex+SVR performance at each pixel in terms of the
distance measured in rg-chromaticity (r = R/(R+G+B), g = G/(R+G+B))
space and in terms of the angle in degrees between colours in RGB space.
These errors are defined by the following formulas, where subscript 'p' indicates
the result after Retinex+SVR and 'g' indicates the ground-truth image:

E_dist = sqrt( (r_p - r_g)^2 + (g_p - g_g)^2 )

E_angle = cos^-1( (RGB_p . RGB_g) / (||RGB_p|| ||RGB_g||) ) x 180/pi

We also compute three statistics on the distribution of errors across all the
pixels in an image: the median, the RMS (root mean square), and the mean of
the top 1/2 percentile of the largest errors, denoted MMax. In contrast to a single
maximum error, MMax is a more representative measure of a method's failures.
The RMS of the errors e_i from N pixels is given by the standard formula:

RMS = sqrt( (1/N) * sum_{i=1}^{N} e_i^2 )
The Wilcoxon signed-rank test, based on angular error with a 0.01
threshold for accepting or rejecting the null hypothesis, is also used to evaluate
the difference between error distributions [67].
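The per-pixel error measures and summary statistics can be sketched as follows; reading "top 1/2 percentile" as the largest 0.5% of the errors is an interpretation of the garbled transcript, and the function names are hypothetical.

```python
import numpy as np

def pixel_errors(pred, truth):
    """Per-pixel rg-chromaticity distance and RGB angular error (degrees)
    between a processed image and the ground-truth image, both given
    as N x 3 arrays of RGB values."""
    pred = np.asarray(pred, float).reshape(-1, 3)
    truth = np.asarray(truth, float).reshape(-1, 3)
    rg = lambda x: x[:, :2] / x.sum(axis=1, keepdims=True)
    dist = np.linalg.norm(rg(pred) - rg(truth), axis=1)
    cos = (pred * truth).sum(1) / (
        np.linalg.norm(pred, axis=1) * np.linalg.norm(truth, axis=1))
    ang = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return dist, ang

def summary_stats(err, top_frac=0.005):
    """Median, RMS, and MMax (mean of the largest `top_frac` of the
    errors; 0.5% here, one reading of the text's 'top 1/2 percentile')."""
    err = np.sort(np.asarray(err, float))
    k = max(1, int(round(top_frac * err.size)))
    return {"median": float(np.median(err)),
            "rms": float(np.sqrt(np.mean(err ** 2))),
            "mmax": float(err[-k:].mean())}
```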
6.2.1 Synthetic Image Experiments
Our first experiments are based on synthetic images that model a scene
with two quite distinct illuminants lighting different parts of the scene. We
generate synthetic scenes composed of patches of different reflectance by
randomly selecting reflectances from the 1995 available in the database
described by Barnard [65]. The patches are divided into two sections by an
irregular boundary representing where the illumination changes. RGB values for
the patches are calculated using two illumination spectra, CIE A on the left and
CIE D65 daylight on the right, and the sensor sensitivity functions of the SONY
DXC-930 camera colour balanced for equal-energy white. The ground-truth
image is generated using equal-energy white illumination over the whole scene.
The sensitivity functions were normalized for this white light. All of the spectra
and sensitivity functions were downloaded from the Simon Fraser University
colour database [69].
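The synthesis step amounts to projecting each patch's colour signal (illuminant spectrum times reflectance) onto the sensor functions. A minimal sketch, in which the array shapes and the per-channel normalization to equal-energy white are assumptions:

```python
import numpy as np

def patch_rgbs(reflectances, illuminant, sensors):
    """Synthesize camera RGBs for a set of patches.  Each colour signal
    is the product of the illuminant spectrum and a surface reflectance,
    projected onto the camera sensitivity functions.  All spectra are
    assumed sampled at the same d wavelengths.
    reflectances: n x d, illuminant: d, sensors: d x 3."""
    white = np.ones_like(illuminant)           # equal-energy white
    norm = white @ sensors                     # per-channel white response
    signals = reflectances * illuminant        # n x d colour signals
    return (signals @ sensors) / norm          # n x 3, balanced to white
```

Under equal-energy white, a perfectly reflecting patch comes out as (1, 1, 1), which is the normalization the text describes.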
Figure 21 shows the results of SVR, Retinex and Retinex+SVR
processing. The top left Mondrian is the input image with a white line
superimposed demarcating the boundary between the two illuminations to make
it easier to see. The line is not part of the actual input image. SVR applied to the
input Mondrian estimates the illumination's rgb-chromaticity as [0.375, 0.298,
0.308], in other words, as quite reddish in comparison to white [0.333, 0.333,
0.333]. This successfully removes some of the reddish cast from the left side of
the image, but introduces more blue to the right side (Figure 21, bottom row on
the left). On the other hand, when SVR is applied to the Retinex-processed
image (Figure 21, middle row on the left), it estimates the "illumination" as a
bluish [0.296, 0.31 5, 0.3891. In this second case, there was no actual illumination;
rather it is SVR's estimate of what the illumination would be if the Retinex output
were actually an unprocessed input image. Since the Retinex result is too blue in
comparison to the ground-truth Mondrian (top right), correcting the colours based
on SVR's estimate improves the image so that now the bottom right
(Retinex+SVR) and top right (ground truth) images are very similar. Numerical
results are tabulated in Tables 18 and 19. The Wilcoxon signed-rank test applied
to the angular error indicates that for this image the performance difference is
significant and that Retinex+SVR outperforms Retinex, and Retinex outperforms
SVR.
Figure 21 Synthetic image results. Top left: input image with a white line superimposed to indicate the illumination boundary. Top right: ground-truth image under equal energy white light. Middle left: Retinex result. Middle right: Retinex illumination map; Bottom left: SVR result. Bottom right: Retinex+SVR result.
Method        Distance (x10^-2)              Angular (degrees)
              MMax     RMS     Med           MMax     RMS     Med
Retinex       19.59    7.58    3.28          18.41    7.86    4.43
SVR           30.44    10.97   4.66          33.15    13.08   5.96
Table 18 Comparison of MMax (see text for definition), RMS and median error on a per- pixel basis between the ground-truth image values and the processed image values for processing by Retinex+SVR, McCann99 Retinex alone, and SVR alone.
              Retinex+SVR    Retinex    SVR
Retinex+SVR                     +         +
Retinex            -                      +
SVR                -            -
Table 19 Comparison of the different methods via the Wilcoxon signed-rank test based on angular error with 0.01 as the threshold applied to the angular errors. A "+" means the algorithm listed in the corresponding row is better than the one in the corresponding column. A "-" indicates the opposite.
6.2.2 Real Image Experiments
The first set of real-image experiments is based on some real scenes we
constructed in the lab containing two distinct illuminants similar to those found
indoors and outdoors. A bluish illuminant was created by placing a light blue filter
in front of a tungsten lamp. The reddish illuminant was a Solux 4100K tungsten
bulb connected to a dimmer; adjusting the dimmer lowers the colour temperature
of the light significantly. These scenes were photographed using a Sony
DSC V1 camera. To obtain the ground-truth image, a white reflectance standard
was introduced at the side of the scene, and an additional image was taken
under unfiltered tungsten light. The RGB channels were then scaled in order to
make the reflectance standard perfectly white (i.e., R=G=B=255).
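The ground-truth scaling described above can be sketched as follows; the function name and the use of the standard's mean RGB are assumptions.

```python
import numpy as np

def scale_to_white(image, standard_rgb):
    """Scale the RGB channels so the imaged white reflectance standard
    becomes perfectly white (R = G = B = 255), as done for the
    ground-truth captures.  `standard_rgb` is the RGB measured on the
    standard in the unscaled image."""
    gains = 255.0 / np.asarray(standard_rgb, float)
    return np.clip(image * gains, 0, 255)
```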
The first test scene is shown in Figure 22a. It contains some books,
boxes, and a Mini Macbeth Colour Checker and is lit with reddish light from the
left and bluish light from the right. Figure 22b shows the same scene imaged
under white light. In addition to using white light, the resulting image was further
white balanced by scaling the RGB channels so that the image of a calibrated
white reflectance results in R=G=B.
Figure 22c shows the Retinex result with the intensity at each pixel
adjusted to match the input image in Figure 22a. Although Retinex processing
affects the luminance as well as the chromaticity of each pixel, here we are
interested only in its effect on chromaticity and are restoring the luminance
(R+G+B) to match that of the input image. The SVR result, which is also adjusted
to preserve pixel luminance, is shown in Figure 22d. Since SVR makes the same
colour adjustment across the whole image, anything it does must inevitably be a
compromise. In this case, SVR has removed some of the blue cast from the input
image, but this introduces some orange cast in other parts of the image. On the
other hand, the Retinex+SVR result shown in Figure 22e contains neither a blue
nor an orange cast. SVR determined the single value for the illumination in rgb-
chromaticity as a slightly bluish [0.306, 0.308, 0.385] in comparison to white
[0.333, 0.333, 0.333]. When applied to the Retinex-processed image, SVR's
estimate is [0.324, 0.341, 0.327].
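The per-pixel luminance restoration described above (keeping each method's chromaticity change while matching the input image's R+G+B at every pixel) can be sketched as:

```python
import numpy as np

def match_luminance(processed, reference):
    """Rescale each pixel of `processed` so its luminance (R+G+B)
    matches the corresponding pixel of `reference`, preserving only the
    chromaticity change introduced by the processing."""
    lp = processed.sum(axis=-1, keepdims=True)
    lr = reference.sum(axis=-1, keepdims=True)
    return processed * np.where(lp > 0, lr / np.maximum(lp, 1e-12), 0.0)
```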
The numerical results presented in Tables 20 and 21 show that Retinex
and SVR perform with similar accuracy for this image, while the Retinex+SVR
hybrid outperforms each of the others taken individually.
Figure 22 Two-illuminant books scene: (a) input image with reddish light coming from the left and bluish from the right; (b) ground-truth image captured under white light matching the camera's white point; (c) Retinex result (d) SVR result (e) Retinex+SVR result.
Table 20 Comparison for the two-illuminant books scene of MMax (see text for definition), RMS and median errors measured on a pixel-by-pixel basis between the ground-truth image values and the processed image values for processing by Retinex+SVR, Retinex alone, and SVR alone
              Retinex+SVR    Retinex    SVR
Retinex+SVR                     +         +
Retinex            -                      =
SVR                -            =
Table 21 Comparison of the different methods via the Wilcoxon signed-rank test based on angular error for the two-illuminant books scene. A "+" means the method listed in the corresponding row is better than the one in the corresponding column; a "-"' indicates the opposite; and a "=" indicates they are indistinguishable.
We designed a second scene in the lab intended in this case to model the
situation of being indoors in a room with a window to the outdoors. The scene
shown in Figure 23a consists of a toy human figure 'outdoors' seen through a
window. The mountain scene on the left is a picture on the wall 'indoors'. The
coloured ball is also indoors. The outdoor objects are gray surfaces lit with sky-
blue light, while the indoor ones are lit by reddish-orange light. Figure 23b is the
ground truth image with pixel intensities adjusted to match those of the input
image. The Retinex result in Figure 23c shows that Retinex reduces the
magnitude of the difference between the two illuminants, but the overall colour
balance is too yellow. SVR determines the single value for the illumination in rgb-
chromaticity as a slightly reddish [0.343, 0.335, 0.322]. On the other hand,
when SVR is applied to the Retinex-processed image, SVR's estimate is [0.346,
0.358, 0.297]. SVR provides better overall colour balance in Figure 23d, but the
outdoor part becomes even bluer. The Retinex+SVR result, Figure 23e, has the
indoor section reasonably well balanced and has reduced, but not eliminated the
outdoor blue. Numerical results are presented in Tables 22 and 23.
Figure 23 Window scene: (a) input image with bluish outdoor illumination and red-orange indoor illumination; (b) ground-truth image captured under white light that matches the camera's white point; (c) Retinex result; (d) SVR result; (e) Retinex+SVR result
Table 22 Comparison of MMax (see text for definition), RMS and median errors measured on a pixel-by-pixel basis between the ground-truth image values and the processed image values for processing by Retinex+SVR, Retinex alone, and SVR alone.
Table 23 Comparison of the different methods via the Wilcoxon signed-rank test based on angular error for the window scene. A "+" means the method listed in the corresponding row is better than the one in the corresponding column. A "-"' indicates the opposite.
In addition to laboratory scenes, we processed images of other typical
scenes. The advantage of the laboratory scenes is that it is possible to obtain a
ground truth image with which to evaluate the error in illumination estimation.
Outside the laboratory, it is difficult to make enough measurements of the
illumination distribution to obtain the ground-truth image. We also applied
Retinex, SVR, and Retinex+SVR to a large set of images and asked reviewers
to select the best result on an LCD monitor. During this subjective evaluation of
several hundred images, we found that in many cases there is little difference in
overall image quality between Retinex, SVR and Retinex+SVR. This is in part
because the majority of scenes do not contain dramatic differences in incident
illumination. However, in the cases where the scene clearly contains quite
different illuminants, Retinex+SVR gives the best result. An example of one
such scene and the results of the three methods is shown in Figure 24. In this
example, Retinex has again reduced the difference in illumination, but has left
the image with a slight blue cast that Retinex+SVR removes.
Figure 24 Typical natural image with two illuminations: (a) input image; (b) Retinex result; (c) SVR result; (d) Retinex+SVR result
6.3 Retinex Iteration Time
McCann99 Retinex is a multi-resolution algorithm, and one of its key
parameters [23,24] is the number of iterations it performs at each resolution. We
determined the optimal setting for Retinex+SVR by plotting the error as a function
of the number of iterations. Figure 25 shows the plot for the case of the two-
illuminant window scene. The plots for other scenes were similar with the
minimum error found at 4 iterations. All our experiments were thus based on 4
iterations.
Figure 25 Median angular error as a function of the number of iterations Retinex used at each resolution. This plot is for the two-illuminant window scene; however, for other scenes the results are qualitatively similar.
6.4 Discussion
Many images are of scenes with at least two distinct illuminants. These
images present a challenge for automatic white balancing algorithms because
there is no single right answer. Retinex is one method that makes local
adjustments for the illumination, but it does not always get the overall colour
balance correct. We proposed a hybrid Retinex+SVR method and have shown, at least
for the limited set of images it is possible to create in the laboratory, that it works
better than either SVR or Retinex working separately.
SVR is not the only illumination-estimation method that could be
hybridized with Retinex. Since there are several learning-based illumination
methods [1-5,32-36] of similar accuracy, it is reasonable to suppose that any one
of them could be substituted for SVR in this context with similar results.
Our goal was to remove the colour effects of the illumination; however, as
Hubel [79] has argued, it may sometimes be better, in terms of creating an
interesting image, to preserve the illumination effects.
CHAPTER 7: INDEPENDENT COMPONENT ANALYSIS AND NONNEGATIVE LINEAR MODEL ANALYSIS OF
ILLUMINANT AND REFLECTANCE SPECTRA
An imaging device's colour outputs can be viewed as the projection of
colour spectra onto its sensitivity functions. Another, more accurate, way to
represent any location's colour is to provide the spectrum value at each
wavelength. To reduce storage and processing requirements, Principal
Component Analysis (PCA), Independent Component Analysis (ICA),
Non-Negative Matrix Factorization (NNMF) and Non-Negative Independent
Component Analysis (NNICA) are all techniques that can be used to compute
basis vectors for finite-dimensional models of spectra. The two non-negative
techniques turn out to be especially interesting because the pseudo-inverse of
their basis vectors is also close to being non-negative. This means that after
truncating any negative components of the pseudo-inverse vectors to zero, the
resulting vectors become physically realizable sensor functions whose outputs
map directly to the appropriate finite-dimensional weighting coefficients in
terms of the associated (NNMF or NNICA) basis.
basis. Experiments show that truncating the negative values incurs only a very
slight performance penalty in terms of the accuracy with which the input
spectrum can be approximated using a finite-dimensional model.
1 This chapter also appears as a published paper: Weihua Xiong and Brian Funt, "Independent Component Analysis and Nonnegative Linear Model Analysis of Illuminant and Reflectance Spectra", Proc. 10th Congress of the International Colour Association, Granada, May 2005
7.1 Introduction
Finite-dimensional models of spectra based on PCA have been widely
used since Judd's model of daylight and Cohen's analysis of Munsell chips.
Previous studies have applied ICA to surface reflectance [80] and daylight
spectra [81]. In this chapter, we extend this analysis to a larger set of illuminants
and to colour signal spectra. The colour signal is defined as the product of
surface reflectance and spectral power distribution of the illuminant incident on it.
We compare the PCA and ICA bases to the entirely non-negative bases obtained
via NNICA and NNMF in terms of the accuracy with which full spectra can be
modelled using the various bases.
For any finite-dimensional model, a spectrum is modelled by projecting it
onto the pseudo-inverse of a set of basis vectors. This projection yields the
weighting coefficients of the model as described in more detail below. The output
of an optical sensor can also be described as the result of a projection of the
incoming spectrum on the sensor's spectral sensitivity functions. This leads to
the question: Is there a good basis for modelling spectra that also has the
property that the pseudo-inverse of the basis might be used as physically
realizable sensors?
PCA is a standard technique for calculating a good orthogonal basis from
a training set of spectra. However, being orthogonal, the PCA basis vectors
contain significant negative components. The pseudo-inverse of this basis is also
orthogonal and similarly contains significant negative components. As a result,
the PCA basis is unlikely to yield physically realizable sensors, since such
sensors clearly cannot have negative sensitivity; however, we hypothesize that
NNICA and NNMF, which yield non-negative basis vectors, might. Such a sensor
would directly output the weighting coefficients of a finite-dimensional model of
the incident light's spectrum.
7.2 Method
It is convenient to express a finite-dimensional linear model of spectra as
X = AB, where X is an m-by-d matrix of m spectra, each uniformly sampled at d
wavelengths; B is an n-by-d matrix of n basis vectors; and A is the m-by-n
mixing matrix of weighting coefficients. Since the intent of the dimensionality
reduction techniques is to identify a basis of reduced dimension that
approximates the original data well, n is generally less than m. Each of the four
dimensionality reduction techniques finds a basis B minimizing (possibly subject
to additional constraints):

min_{A,B} || X - AB ||^2     (37)
PCA finds basis vectors that are uncorrelated and orthogonal. ICA finds
basis vectors that are uncorrelated and, in addition, independent but not
orthogonal. There are many different ICA algorithms [82]; here we used the
JADE [83] (Joint Approximate Diagonalization of Eigenmatrices) implementation.
NNICA [84] carries out ICA subject to the additional constraint of non-negativity
in the resulting basis vectors. Non-negative Matrix Factorization solves (37)
subject to all entries in both A and B being non-negative. An iterative algorithm
[85] to do this is based on the following pair of multiplicative update equations,
where the multiplications and divisions are element-wise:

A <- A * (X B^T) / (A B B^T),    B <- B * (A^T X) / (A^T A B)
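A runnable sketch of such multiplicative updates (a Lee-Seung-style scheme; the exact variant in [85] may differ, and the initialization and iteration count here are assumptions):

```python
import numpy as np

def nnmf(X, n, iters=1000, seed=0, eps=1e-9):
    """Non-negative matrix factorization X ~ A B by multiplicative
    updates.  X: m x d non-negative spectra; A: m x n weighting
    coefficients; B: n x d non-negative basis vectors."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    A = rng.random((m, n)) + eps
    B = rng.random((n, d)) + eps
    for _ in range(iters):
        # Element-wise updates keep A and B non-negative throughout.
        A *= (X @ B.T) / (A @ B @ B.T + eps)
        B *= (A.T @ X) / (A.T @ A @ B + eps)
    return A, B
```

Because both factors stay non-negative, the rows of B can serve directly as a non-negative basis for spectra.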
7.3 Results
We used the 1781 surface reflectances and 102 illuminant sources
described in [65]. The wavelength range is from 380nm to 780nm at a 4nm
sampling interval. The reflectance and illumination data sets are each
broken into two random subsets for training and testing. Colour signal training
and test sets are constructed from the respective training and test reflectance
and illumination datasets. The first 3 basis vectors obtained by each of PCA, ICA,
NNMF and NNICA for training sets of surface reflectances, illuminations, and
colour signals are shown in Figure 26.
Figure 26 First 3 basis vectors for surface reflectance, illumination and colour signal as obtained by ICA, PCA, NNMF and NNICA. The horizontal axis is wavelength. The vertical axis is in terms of normalized power (illumination and colour signal) or fractional reflectance.
Given a set of basis vectors, a spectrum written as a column vector, x, can
be represented by its weighting coefficients, which form a row
vector w = x^T B^-1. The PCA basis vectors are orthogonal, so B^-1 = B^T.
However, for the other methods the basis vectors are not orthogonal, so the
pseudo-inverse of the matrix B, denoted B+, is used to obtain the coefficients
instead. Figure 27 shows the corresponding pseudo-inverse for each of the sets
of basis vectors from Figure 26.
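The projection onto the pseudo-inverse and the reconstruction from the resulting coefficients can be sketched as follows (the function name is hypothetical):

```python
import numpy as np

def model_coefficients(x, B):
    """Weighting coefficients of spectrum x (length d) in the basis B
    (n x d): w = x^T B+, using the Moore-Penrose pseudo-inverse since
    the basis need not be orthogonal.  Reconstruction is x ~ w B."""
    w = x @ np.linalg.pinv(B)   # n coefficients
    return w, w @ B             # coefficients and approximation
```

For a spectrum lying in the span of the basis vectors, the reconstruction is exact; otherwise it is the least-squares approximation.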
Figure 27 The pseudo-inverse of the surface reflectance, illumination and colour signal basis vectors. The horizontal axis is wavelength. The horizontal line is at zero. Physically realizable sensors approximating these pseudo-inverses can be based on the portion of each curve on or above the zero line. Clearly, the approximation is likely to be best in the case of NNICA and NNMF.
Setting all the negative values in the pseudo-inverse vectors to zero
results in a new set of vectors we will call the truncated pseudo-inverse, B_t+.
The weighting coefficients are then obtained as w = x^T B_t+. An approximation,
x_a, to the original spectrum is reconstructed from the weights and basis as
x_a = w B. The root mean square (RMS) distance is one measure of the accuracy
of the approximation of x_a to x:

RMS = sqrt( (1/d) * sum_{i=1}^{d} ( x_a(lambda_i) - x(lambda_i) )^2 )
We found that the L1 norm yielded qualitatively similar results to the RMS
error and therefore report only the RMS error. For N spectra, the mean RMS
error is simply the mean of the individual RMS errors:

MRMS = (1/N) * sum_{j=1}^{N} RMS_j
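The truncation experiment, reconstructing each spectrum through the truncated pseudo-inverse and averaging the per-spectrum RMS errors, can be sketched as:

```python
import numpy as np

def truncated_reconstruction_error(X, B):
    """Approximate each spectrum in X (N x d) using the truncated
    pseudo-inverse B_t+ (negative entries of B+ set to zero), then
    report the mean per-spectrum RMS error."""
    Bt = np.maximum(np.linalg.pinv(B), 0.0)   # d x n, truncated
    Xa = (X @ Bt) @ B                         # reconstructions
    rms = np.sqrt(np.mean((Xa - X) ** 2, axis=1))
    return rms.mean()
```

When the pseudo-inverse has only small negative components, as for NNMF and NNICA, this error is close to that of the un-truncated projection.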
When the true pseudo-inverse of basis vectors is used, ICA always results
in the least error. Figure 28 shows the mean approximation error as a function of
the number of basis vectors used. Plots of the median RMS error are qualitatively
similar.
Surface Reflectance Illumination Colour Sianal
I . 2 4 e Basls Dlmiiension Bask DI~nensIon Basis Dlnienslo~i
Figure 28 Mean RMS error in spectral approximation (MRMS error) for surface reflectances, illuminations, and colour signals in the test set for each of the four methods as a function of the number of basis vectors used.
When the actual pseudo-inverse vectors are replaced with the truncated
pseudo-inverse vectors, the approximation error necessarily increases. Figure 29
compares the mean RMS errors for colour signal reconstruction based on
truncated versus un-truncated pseudo-inverse vectors. Since the negative
components were small, truncation has little effect on the NNMF and NNlCA
results.
Figure 29 (a) A comparison of the mean RMS error in reconstructing the colour signal spectra with the actual and truncated pseudo-inverse vectors for the case of NNMF and PCA. Without truncation the NNMF and PCA results overlap (lowest curve); however, with truncation the PCA error increases substantially (with the exception of dimension 7) while the NNMF error increases marginally. (b) A comparison of the mean RMS error approximating colour signal spectra for all four methods using the truncated pseudo- inverse.
7.4 Discussion
Whether for reflectances, illuminants or colour signals, ICA consistently
yielded the lowest mean RMS error in spectral approximation followed by PCA,
NNMF and NNICA. The errors, however, for ICA, PCA and NNMF were all
qualitatively very similar. NNMF and NNICA have the advantage that the basis
vector components are all nonnegative. Although for NNMF and NNICA, the fact
that the basis vectors are all nonnegative does not mean that and their pseudo-
inverses necessarily will also be nonnegative, we found that in practice the
negative components are relatively small. Setting the negative components to
zero results in functions that could be realized by actual optical sensors. Such
sensors would have the advantage that their output would correspond directly to
the weighting coefficients of a finite-dimensional model of the incident spectra. As
such, they could be considered optimal (ignoring the influence of noise) in terms
of the information they capture about the incident spectra.
CHAPTER 8: CONCLUSION
With the development of multimedia and virtual reality technology, colour
information is being utilized widely and comprehensively in computer vision and
pattern recognition applications, such as image segmentation and object
recognition. But colour is one of the most complex phenomena of visual
perception. Colour perception is derived from the interactions between the visual
system, physical surfaces, illumination, and the visual environment. It is strongly
dependent on lighting geometry (direction and intensity of light sources) and
illuminant colour (spectral power distribution), which changes with the position
and atmospheric conditions. A fundamental and challenging issue is to separate
the illumination information from the image and recover the original surface
colour. This desired invariance of colour representation to general changes in
illumination is called colour constancy.
My dissertation has described my research work, including new models of
colour constancy. The first contribution is two new statistical illumination-colour
estimation methods, one based on a regression technique and the other on an
interpolation technique. It is well known that
there is some relationship between the colour distribution of an image and its
illumination, and some research has been conducted to investigate this
phenomenon, for example, Colour by Correlation by Finlayson et al. and the
neural network method of Funt et al. However, a neural network may suffer from
local optima. To overcome this disadvantage, I implemented Support Vector
Regression, a simpler method with a better global solution, for predicting illumination
chromaticity values. The other solution is to extend a typical non-uniform
interpolation technique, Thin Plate Spline, into higher dimensions and extract a
continuous function representing the relationship between image colours and the
corresponding scene illumination. Compared with SVR, TPS has a major
advantage in that its outputs are always parameter-independent and unique.
The second contribution in my thesis is to design a fast and efficient colour
constancy method by designing a new colour coordinate system, named LIS.
This system can separate the illumination, intensity and surface information from
colour images as independently as possible. The surface axis is used to identify
those gray surfaces in the scene. Obviously the average of all RGB values of
these gray surfaces can be used to estimate the illumination chromaticities.
The third important contribution is to recover the original surface image by
integrating a spatially dependent method with a colour constancy solution for
those scenes under two or more sources of illumination. Until now, almost all
colour constancy algorithms assume either that there is only one uniform
illumination or that the illumination change is very smooth. These cases are not
true for most scenes. My work supposes that the surface orientation may cause
an illumination discontinuity. The basic idea is therefore to find the surfaces lying
on the same plane by stereo imaging techniques, and to apply a spatially
dependent method on each plane separately. To avoid the problem that the Retinex method
tends to push isolated patches to gray, LIS is applied again here. The results
indicate the spatial arrangement can improve the colour constancy calculation.
However, this method requires additional images of the same scene captured
simultaneously to detect the objects' depth information and surface orientation.
Such a limitation makes the method less practical, so the question of how to
solve this problem is a potential research direction in the next stage.
The fourth contribution is to design a hybrid colour constancy solution for
any single image under multiple sources of illumination. Retinex can be used to
mitigate the difference between the illuminants and to create an intermediate
image assumed to be lit by a single unknown light; then the SVR global
illumination-estimation method and the von Kries diagonal transformation are
applied to remove the illumination effect. Although this method has been tested
on several sets of images, including synthetic and real images, its performance
on natural images is not yet effective enough for practical use. Improving it is
another possible direction for future research.
The fifth contribution is research on finite-dimensional models of colour
spectra. PCA and ICA are commonly used to reduce the storage and processing
requirements for illumination and surface reflectance spectra. I introduced two
non-negative component analysis tools, NNMF and NNICA. They are shown not
only to be accurate but also to offer an efficient way to find optimal sensor
sensitivities for an imaging device in terms of spectral estimation or recovery.
REFERENCES
[ I ] B-Buchsbaum. "A spatial processor model for object color perception", Journal
of the Franklin Institute, 1980, Vol. 31, ppl -26
[2] G.D. Finlayson, and E. Trezzi, "Shades of Gray and Colour Constancy",
Proceedings of lz th Color Imaging Conference, 2004, pp. 37-41
[3] B. Funt and V.C. Cardei, "Bootstrapping colour constancy", Proc. Of SPIE,
Vol. 3644 1999, pp. 421 -428
[4] V. Cardei, B. Funt, and K. Barnard, "Estimating the Scene Illumination
Chromaticity Using a Neural Network", Journal of the Optical Society of America
A, Vol. 19, No. 12, Dec 2002, pp2374-2386.
[5] G. D. Finlayson, S. Hordley, and P. M. Hubel, "Color by Correction: A Simple,
Unifying Framework for Color Constancy", IEEE Transactions On Pattern
Analysis And Machine Intelligence Vo1.23, No. 1 1 Nov. 2001, pp.1209-1221
[6] T.K. Sarkar, "History of Wireless", Wiley-Interscience, 2006, pp20-66
[7] Mark D. Fairchild "Color Appearance Models", Wiley & Sons. Ltd 2005 pp.
146-1 50
[8] G. Wyszecki & W.S. Stiles, "Color Science: Concepts and Methods,
Quantitative Data and Formulas", 2nd edition, John Wiley & Sons, New York,
1 982, ~ ~ 7 4 - 1 0 3
[9] V. Kries. "Influence of adaptation on the effects produced by luminous stimuli"
Sources of Color Science, The MIT Press, Cambridge MA, 1970, pp. 109-1 19
[ l o ] M.R. Luo and R. W. G. Hunt. "A chromatic adaptation transform and a colour
inconstancy index". Color Res. Appl., Vol. 23, 1 998, pp. 1 54-1 58.
[ I I ] K. M. Lam. "Metamersim and Colour Constancy". PhD thesis, University of
Bradford, 1985.
[I21 D.H. Foster, S. M. C. Nascimento, "Four issues concerning colour constancy
and relational colour constancy1'. Vision Research, Vol. 37, lssue 10,
1997,pp.341-350.
1131 J.L. Simonds, "Application of characteristic vector analysis to photographic
and optical response data", Journal of the Optical Society of America A, Vol. 53,
Issue. 8, 196, pp. 968-974
[I41 K. Barnard, L. Martin, A. Coath, and B. Funt, "A Comparison of
Computational Colour Constancy Algorithms. Part Two: Experiments on lmage
Data", IEEE Transactions on lmage Processing, No. 11, 2002, pp.985-996 .
[I51 K. Barnard, V. Cardei, and B. Funt, "A Comparison of Computational Colour
Constancy Algorithms. Part One: Methodology and Experiments with
Synthesized Data", IEEE Transactions on lmage Processing, No. 1 1,2002, pp.
972-984 .
[16] L.T. Maloney and B.A. Wandell, "Color Constancy: A Method for Recovering Surface Spectral Reflectance", Journal of the Optical Society of America A, Vol. 3, Issue 1, 1986, pp. 29-33.
[17] B.A. Wandell, "The Synthesis and Analysis of Color Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, No. 1, January 1987, pp. 2-13.
[18] J. Cohen, "Dependency of the spectral reflectance curves of the Munsell colour chips", Psychonomic Science, Vol. 1, 1964, pp. 369-370.
[19] L.T. Maloney, "Evaluation of linear models of surface spectral reflectance with small numbers of parameters", Journal of the Optical Society of America A, Vol. 3, 1986, pp. 1673-1683.
[20] J.P.S. Parkkinen, J. Hallikainen, and T. Jaaskelainen, "Characteristic spectra of Munsell colors", Journal of the Optical Society of America A, Vol. 6, 1989, pp. 725-730.
[21] H. Laamanen, T. Jaaskelainen, J.P.S. Parkkinen, and J. Hallikainen, "Comparison of PCA and ICA in color recognition", Proceedings of SPIE, Vol. 4197, 2000.
[22] E. Land, J. McCann, "Lightness and Retinex Theory", Journal of the Optical Society of America, Vol. 61, January 1971, pp. 1-11.
[23] B. Funt, F. Ciurea, J. McCann, "Retinex in Matlab", Journal of Electronic Imaging, Jan. 2004, pp. 48-57.
[24] B. Funt and F. Ciurea, "Parameters for Retinex", Proc. 9th Congress of the International Color Association, Rochester, June 2001.
[25] B. Funt, F. Ciurea, and J. McCann, "Tuning Retinex Parameters", Journal of Electronic Imaging, Jan. 2004, pp. 48-57.
[26] T.G. Stockham, Jr., "Image processing in the context of a visual model", Proc. of the IEEE, Vol. 60, No. 7, 1972, pp. 828-842.
[27] O.D. Faugeras, "Digital image color processing within the framework of a human visual system", IEEE Transactions on ASSP, Vol. 27, 1979, pp. 380-393.
[28] B.K.P. Horn, "Determining lightness from an image", Computer Graphics and Image Processing, Vol. 3, 1974, pp. 277-299.
[29] J. Frankle and J. McCann, "Method and Apparatus for Lightness Imaging", US Patent #4,384,336, May 17, 1983.
[30] R. Kimmel, M. Elad, D. Shaked, R. Keshet, and I. Sobel, "A Variational Framework for Retinex", International Journal of Computer Vision, Vol. 52, Issue 1, 2003, pp. 7-23.
[31] G. D. Finlayson and S. Hordley, "Selection for Gamut Mapping Color Constancy", British Machine Vision Conference, Sept. 1997, pp. 630-639.
[32] G.D. Finlayson, "Retinex viewed as a gamut mapping theory of color constancy", Proc. AIC International Colour Association 97, Vol. 2, 1997, pp. 527-530.
[33] S.D. Hordley, "Scene Illumination Estimation: Past, Present, and Future", Color Research and Application, Vol. 31, No. 4, 2006, pp. 303-314.
[34] J. van de Weijer and Th. Gevers, "Color Constancy based on the Grey-Edge Hypothesis", Proceedings of the IEEE International Conference on Image Processing, 2005, pp. 722-725.
[35] K. Barnard, L. Martin, and B. Funt, "Colour by correlation in a three dimensional colour space", 6th European Conference on Computer Vision, 2000, pp. 375-389.
[36] C. Rosenberg, M. Hebert and S. Thrun, "Color constancy using KL-divergence", Proc. 8th ICCV, Vol. 1, 2001, pp. 239-246.
[37] A.L. Gilchrist, "Perceived lightness depends on perceived spatial arrangement", Science, Vol. 195, 1977, pp. 185-187.
[38] Y. Yamauchi, K. Uchikawa, "Depth Information Affects Judgment of the
Surface-Color Mode Appearance", Journal of Vision, Vol. 5, 2005, pp.515-523
[39] J. N. Yang, S.K. Shevell, "Stereo Disparity Improves Color Constancy,"
Vision Research, Vol. 42, 2002, pp. 1979-1989
[40] H. Boyaci, L.T. Maloney, S. Hersh, "The Effect of Perceived Surface
Orientation on Perceived Surface Albedo in Binocularly Viewed Scenes", Journal
of Vision, Vol. 3, 2003, pp. 541-553
[41] J.N. Yang, L.T. Maloney, "Illuminant cues in surface color perception: Tests of three candidate cues", Vision Research, Vol. 41, 2001, pp. 2581-2600.
[42] L.T. Maloney, M. S., "Illumination Estimation as Cue Combination", Journal of Vision, Vol. 2, 2002, pp. 493-504.
[43] G. Finlayson, M. Drew, and B. Funt, "Color constancy: generalized diagonal transforms suffice", Journal of the Optical Society of America A, Vol. 11, 1994, pp. 3011-3020.
[44] B. Funt, V. Cardei and K. Barnard, "Learning Color Constancy", Proc.
IS&T/SID Fourth Color Imaging Conference: Color Science, Systems and
Applications, 1996, pp. 58-60
[45] A. Smola and B. Scholkopf, "A tutorial on support vector regression",
Statistics and Computing, 2003
[46] V. Kecman, Learning and Soft Computing, MIT, Cambridge, 2001, pp. 121-
193
[47] A. Chodorowski, T. Gustavsson and U. Mattson, "Support Vector Machine for Oral Lesion Classification", Proceedings of 2002 IEEE International Symposium on Biomedical Imaging, July 2002, pp. 173-176.
[48] C.W. Hsu and C.J. Lin, "A Comparison of Methods for Multiclass Support Vector Machines", IEEE Transactions on Neural Networks, Vol. 13, No. 2, 2002, pp. 415-425.
[49] Y. Lee and C. Lee, "Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data", Bioinformatics, Vol. 19, No. 9, 2003, pp. 1132-1139.
[50] Y. Lee, Y. Lin and G. Wahba, "Multicategory Support Vector Machines", Proceedings of the 33rd Symposium on the Interface, 2001.
[51] H. Yang, L. Chan, and I. King, "Support Vector Machine Regression for Volatile Stock Market Prediction", Intelligent Data Engineering and Automated Learning 2002, LNCS 2412, 2002, pp. 391-396.
[52] C.H. Wu, C. C. Wei, M.H. Chang, D.C. Su and J.M. Ho, "Travel Time Prediction with Support Vector Regression", Proc. of IEEE Intelligent Transportation Conference, October 2003, pp. 1438-1442.
[53] D. X. Zhao and L. Jiao, "Traffic Flow Time Series Prediction Based On Statistics Learning Theory", Proceedings of IEEE 5th International Conference on Intelligent Transportation Systems, 2002, pp. 727-730.
[54] H. V. Khuu, H.K. Lee and J.L. Tsai, "Machine Learning with Neural Networks and Support Vector Machines", Online Technical Report, available at: http://www.cs.wisc.edu/~hiep/Sources/Articles/, accessed April 2004.
[55] C.C. Chang and C.J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, accessed April 2004.
[56] J. Bi and K. P. Bennett, "A Geometric Approach to Support Vector Regression", Neurocomputing, Vol. 55, Issues 1-2, 2003, pp. 79-108.
[57] K. Barnard, L. Martin, and B. Funt, "Colour by correlation in a three
dimensional colour space", 6th European Conference on Computer Vision,
Springer, 2000, pp. 375-389.
[58] G.D. Knott, Interpolating Cubic Splines, Birkhauser Inc, 2000
[59] F. L. Bookstein, "Principal warps: thin-plate splines and the decomposition of deformations", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 11, Issue 6, 1989, pp. 567-585.
[60] M. H. Davis, A. Khotanzad, D. Flamig, and S. Harms, "A physics-based coordinate transformation for 3-d image matching", IEEE Trans. Medical Imaging, Vol. 16, Issue 3, 1997, pp. 317-328.
[61] N. Arad and D. Reisfeld, "Image warping using few anchor points and radial functions", Computer Graphics Forum, Vol. 14, 1995, pp. 35-46.
[62] Asker M. Bazen and Sabih H. Gerez, "Elastic minutiae matching by means
of thin-plate spline models", International Conference on Pattern Recognition,
Aug 2002
[63] W. Xiong and B. Funt, "Nonlinear RGB-to-XYZ Mapping for Device Calibration", Proceedings of the 13th Color Imaging Conference, 2005, pp. 200-204.
[64] G.D. Finlayson and S.D. Hordley, "Color Constancy at a Pixel", Journal of the Optical Society of America A, Vol. 18, Issue 2, 2001, pp. 253-264.
[65] K. Barnard, L. Martin, B. Funt, A. Coath, "A Data Set for Color Research", Colour Research and Application, Vol. 27, No. 3, 2002, pp. 140-147. (Data from: www.cs.sfu.ca/~colour)
[66] http://www.mathworks.com/, accessed January 2006.
[67] S.D. Hordley, G.D. Finlayson, "Reevaluation of color constancy algorithm performance", Journal of the Optical Society of America A, Vol. 23, Issue 5, 2006, pp. 1008-1020.
[68] K. Barnard and B. Funt, "Camera Characterization for Color Research", Colour Research and Application, Vol. 27, No. 3, 2002, pp. 153-164.
[69] www.cs.sfu.ca/~colour, accessed January 2006.
[70] F. Ciurea and B. Funt, "A Large Image Database for Color Constancy Research", Proc. IS&T/SID Eleventh Color Imaging Conference, Society for Imaging Science and Technology, 2003, pp. 160-163.
[71] R.L. Eubank, Spline Smoothing and Nonparametric Regression, Marcel
Dekker, New York, 1988
[72] M.G. Bloj, D. Kersten, A.C. Hurlbert, "Perception of Three-Dimensional Shape Influences Colour Perception through Mutual Illumination", Nature, Vol. 402, 1999, pp. 877-879.
[73] E.H. Adelson, "Lightness Perception and Lightness Illusions", The New Cognitive Neurosciences, 2nd ed., MIT Press, 2000, pp. 339-351.
[74] M. Gelautz, D. Markovic, "Recognition of Object Contours from Stereo Images: an Edge Combination Approach", Proc. of 2nd International Symposium on 3D Data Processing, Visualization and Transmission, 2004, pp. 774-780.
[75] Ch. Sun, "Fast Stereo Matching Using Rectangular Subregioning and 3D Maximum-Surface Techniques", International Journal of Computer Vision, Vol. 47, 2002, pp. 99-117.
[76] http://www.loreo.com, accessed September 2005.
[77] http://www.vision.caltech.edu/bouguetj/calib_doc/, accessed September 2005.
[78] www.gretagmacbeth.com, accessed September 2005.
[79] P. M. Hubel, "The Perception of Colour at Dawn and Dusk", Proc. 7th Color Imaging Conference, 1999, pp. 48-51.
[80] H. Laamanen, T. Jaaskelainen, and J.P.S. Parkkinen, "Comparison of PCA and ICA in color recognition", Proceedings of Intelligent Robots and Computer Vision, SPIE Vol. 4197, 2000, pp. 367-377.
[81] Eva M. Valero, Juan L. Nieves, Javier Hernandez-Andres, and Javier Romero, "Independent Component Analysis with Different Daylight Illuminants", Proc. Second European Conference on Colour in Graphics, Imaging and Vision, 2004, pp. 193-196.
[82] A. Hyvarinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley & Sons Inc, Part II, 2001, pp. 147-193.
[83] J. F. Cardoso and A. Souloumiac, "Blind Beamforming for Non Gaussian Signals", IEEE Transactions on Signal Processing, Vol. 46, Issue 7, 1998, pp. 1878-1885.
[84] M.D. Plumbley, "A Nonnegative PCA Algorithm for Independent Component Analysis", IEEE Transactions on Neural Networks, Vol. 15, No. 1, 2004, pp. 66-76.