Post on 19-Apr-2020
Specular Highlight Removal in Facial Images
Chen Li1 Stephen Lin2 Kun Zhou1 Katsushi Ikeuchi2
1State Key Lab of CAD&CG, Zhejiang University 2Microsoft Research
Abstract
We present a method for removing specular highlight
reflections in facial images that may contain varying illu-
mination colors. This is accurately achieved through the
use of physical and statistical properties of human skin and
faces. We employ a melanin and hemoglobin based model
to represent the diffuse color variations in facial skin, and
utilize this model to constrain the highlight removal solu-
tion in a manner that is effective even for partially satu-
rated pixels. The removal of highlights is further facili-
tated through estimation of directionally variant illumina-
tion colors over the face, which is done while taking ad-
vantage of a statistically-based approximation of facial ge-
ometry. An important practical feature of the proposed
method is that the skin color model is utilized in a way that
does not require color calibration of the camera. More-
over, this approach does not require assumptions commonly
needed in previous highlight removal techniques, such as
uniform illumination color or piecewise-constant surface
colors. We validate this technique through comparisons to
existing methods for removing specular highlights.
1. Introduction
A human face usually exhibits specular highlights
caused by sharp reflections of light off its oily skin surface.
Removing or reducing these highlights in photographs is
often desirable for the purpose of aesthetic enhancement or
to facilitate computer vision tasks, such as face recognition
which may be hindered by these illumination-dependent ap-
pearance variations. Extraction of a specular highlight layer
can moreover provide useful information for inferring scene
properties such as surface normals and lighting directions.
Specular highlight removal is a challenging task because
for each pixel there are twice as many quantities to be es-
timated (specular color and diffuse color) as there are to
observe (image color). To manage this problem, previous
methods typically require simplifying assumptions on the
imaging conditions, such as white illumination [33, 34, 41,
40], piecewise-uniform surface colors [13, 1], repeated sur-
face textures [31], or a dark channel prior [11]. These con-
ditions, however, generally do not exist in facial images,
which are normally captured in natural lighting environ-
ments and do not exhibit the assumed surface properties.
To address this problem, we present a highlight removal
method that takes advantage of physical and statistical prop-
erties of human skin and faces, and also jointly estimates an
approximate model of the lighting environment. Accurate
estimation of illumination and its colors is essential for re-
moving highlights, since highlight reflections have the color
of the lighting. Most previous highlight separation tech-
niques simply assume the illumination color to be uniform
and/or known, but this is typically not the case for real-
world photographs. We instead solve for it together with
highlight removal while utilizing priors on human faces. In
this way, our method not only is able to account for high-
light information and facial priors in estimating the illumi-
nation, but it can also estimate an environment map with di-
rectionally variant colors often present in everyday scenes,
like an office with both fluorescent ceiling lights and sun-
light from windows.
Besides the environment map, better prior knowledge
about the diffuse colors of an object is also important for
effectively separating specular highlights. In this work, we
utilize a physically-based model of human skin color to bet-
ter constrain the highlight removal solution. Human skin is
a turbid medium containing melanin and hemoglobin as its
two dominant pigments. Spatial variations in the amount
of melanin, which is contained in the epidermal layer of
the skin, result in skin features such as freckles or moles.
Hemoglobin, which is a protein in blood, flows within the
dermal layer and forms the appearance of blood circulation.
The variation of skin color is mainly caused by different
densities of these two pigments. Darker skin is a result of
denser concentrations of melanin, while pinkish cheeks in-
dicate a high density of hemoglobin. We use a skin color
model based on these two pigments as a constraint on esti-
mated diffuse colors as well as to effectively deal with high-
lights that cause partial saturation of measured color values,
a problem neglected in previous techniques.
A well-known ambiguity that exists in separating specu-
lar highlights from diffuse reflection occurs when the illu-
mination chromaticity is similar to the diffuse chromaticity
43213107
of the object. In such cases, it is difficult to distinguish the
two reflection components. To deal with this issue for faces,
we additionally make use of a statistically-based approxi-
mation of the facial geometry to help infer the magnitude
of diffuse reflections and thus reduce the aforementioned
ambiguity.
With this approach, our method obtains results that com-
pare favorably to state-of-the-art highlight removal tech-
niques, especially for scenes that contain different types of
light sources. A noteworthy feature of this work is that the
model of skin color is employed in a way that does not
require color calibration of the camera. We evaluate our
method on laboratory captured images that allow for quan-
titative comparisons, and on real images taken under natural
imaging conditions.
2. Related Work
In this section, we briefly review previous work on
single-image highlight removal and illumination estima-
tion.
Highlight removal from a single input image is a prob-
lem that has been studied for decades. Early approaches
aim to recover diffuse and specular colors through an analy-
sis of color histogram distributions, under the assumption of
piecewise-constant surface colors [13, 1]. This color-space
approach was later extended to also account for image-
space configurations, which has enabled handling of surface
textures that can be inpainted [32] or that have a repetitive
structure [31]. A recent approach is to first derive a pseudo
diffuse image that exhibits the same geometric profile as the
diffuse component of the input image [33, 34]. Highlights
are then removed by iteratively propagating the maximum
chromaticity of the diffuse component to neighboring pix-
els. Variants of this approach have employed a dark chan-
nel prior in generating the pseudo diffuse image [11]. A
real-time implementation has been presented based on bilat-
eral filtering [41]. In contrast to these previous techniques,
our work derives and utilizes additional constraints based
on prior knowledge for a particular object of great interest,
namely human faces. These physical and statistical con-
straints allow our method to avoid previous restrictions on
surface textures, and enable handling of partially saturated
highlight pixels, varying illumination color, and ambiguities
caused by similar illumination chromaticity and diffuse re-
flection chromaticity, which these previous methods do not
address.
Illumination estimation also has a long history in com-
puter vision. Research on this problem has primarily fo-
cused on estimating either the directional distribution of
light or the illumination color, but not both. By contrast,
both color and direction are needed in our work to take
advantage of the physical and statistical priors on human
faces.
Methods for recovering the directional distribution of
light have analyzed shading [42, 44, 38], cast shadows [25,
26, 27, 28, 22, 12], and specular reflections [21, 17] on sur-
faces with known geometry. Our method also utilizes sur-
face shape in recovering the lighting distribution, but esti-
mates unknown geometry as well as lighting color with the
help of statistical data on human faces. Reflections from
human eyes have also been used to estimate the lighting en-
vironment [20, 37], but require close-up views of an eye and
the estimates can be significantly degraded by iris textures.
For estimation of illumination color, there have been two
main approaches. One is to employ color constancy based
on prior models for surface colors [6, 8, 19, 10, 4, 2]. Most
closely related to our work are methods that utilize a com-
prehensive set of measured skin tones [4, 2]. Color con-
stancy methods, however, are unsuitable for our work be-
cause they neither recover a directional distribution nor dis-
tinguish between the color of light for diffuse reflection (i.e.
from the upper hemisphere of a surface point) and specular
highlights (i.e. from a mirror reflection angle), which is es-
sential for highlight removal.
The other approach is to estimate illumination color from
specular reflections [35, 5, 16, 30, 9] based on the dichro-
matic reflection model [29]. Under this model, the pixels
of a monochromatic surface are restricted to a dichromatic
plane in the RGB color space. To determine the point on
this plane that corresponds to the illuminant color, some of
these methods find the intersection of the dichromatic plane
with the Planckian locus [5, 16], which models the emit-
ted light color of an incandescent blackbody radiator as a
function of its temperature. While the Planckian locus is
a powerful physical constraint for light color estimation, it
requires color calibration of the camera, which is avoided in
our work to make the method more widely applicable. Our
method also employs the dichromatic model, but uses it in
conjunction with face and skin attributes that constrain the
color estimates. Unlike these techniques, our method also
recovers the directional distribution of light.
3. Reflection Model
As mentioned in the previous section, the dichromatic
reflection model [29] has been commonly used for specular
highlight separation. According to this model, the image of
an inhomogeneous dielectric object consists of two reflec-
tion components, namely diffuse and specular:
I(p) = Id(p) + Is(p), (1)
where p denotes the pixel index, and Id and Is represent
diffuse and specular reflection, respectively.
Given a distant lighting environment L that is incident on
the face, part of it is directly reflected by the skin surface,
43223108
producing specular reflection Is:
Is(p) =
∫
L
fs(p,np, ωo, ωi)L(ωi) dωi, (2)
where fs is the bidirectional reflectance distribution func-
tion (BRDF) for specular reflection, np is the surface nor-
mal of pixel p, and ωi is the incident direction of the light.
We orient the coordinate system such that the viewing di-
rection, ωo, is in direction (0, 0, 1)T and will not need to
be referenced in the remainder of the paper. The rest of
the light enters the human skin volume and exits as diffuse
reflection Id:
Id(p) = D(p)A(p), (3)
where A(p) is the diffuse albedo of skin at pixel p and D(p)is the geometry-dependent diffuse shading:
D(p) =
∫
L
fd(p,np, ωi)L(ωi) dωi (4)
which represents the interaction between lighting L and the
skin volume according to the diffuse BRDF fd.
Substituting Eq. (2), Eq. (3) and Eq. (4) into Eq. (1)
yields the following reflection model:
I(p) = A(p)
∫
L
fd(·)L(ωi) dωi +
∫
L
fs(·)L(ωi) dωi. (5)
3.1. Illumination Modeling
A typical assumption in previous specular separation
methods is that the illumination color is uniform. However,
this assumption often does not hold for real-world scenes,
since many lighting environments contain different types of
illuminants.
To handle varying illumination colors, we model the
lighting environment using spherical harmonics, which are
the analogue on the sphere to the Fourier basis on the line
or circle:
L(ωi) =∑
l,m
LlmYlm(ωi)
= LTlmYlm(ωi), (6)
where Ylm denote spherical harmonics (SH) and Llm are
the SH coefficients, with l ≥ 0 and −l ≤ m ≤ l. The SH
coefficients Llm are estimated for the R,G,B color chan-
nels separately as Llm = {Llm,R, Llm,G, Llm,B} in order
to model varying illumination colors. Uniform illumination
chromaticity is thus a special case where the SH coefficients
differ by only a scalar factor among the three color chan-
nels.
Diffuse and specular reflections are represented as inte-
grations over the environment map L in Eq. (5). To facilitate
optimization, we avoid evaluating the integrals by solving
Epidermis
Dermis
Air
Hemoglobin
Melanin
(a) (b) (c) (d)
Figure 1. An illustration of the melanin-hemoglobin based skin
model. (a) Two layered skin model. (b) A diffuse skin patch. (c)
Color and density ρm of the melanin component. (d) Color and
density ρh of the hemoglobin component.
directly for diffuse shading D(p) at each pixel p. The spec-
ular component Is(p) can be approximated by only consid-
ering its mirror reflection as:
Is(p) =
∫
L
fs(p,np, ωo, ωi)L(ωi) dωi
≈ L(ωp)ms(p), (7)
and
ωp =2np − ωo
‖2np − ωo‖, (8)
where ms is the specular coefficient, and the surface nor-
mal np of pixel p is the half-angle direction of ωp and ωo.
A single-image 3D face reconstruction algorithm [39] based
on morphable models is used to recover an approximate nor-
mal direction map N̂, from which we obtain the values of
np.
3.2. Skin color model
Some previous works [34, 41, 11] utilize a pseudo
specular-free image to estimate the diffuse chromaticity of
A(p), but not estimate A(p) itself. As we focus on specu-
lar highlight removal specifically for facial images, we can
instead deal with A(p) directly instead of its chromaticity,
with the help of prior knowledge on human skin. This dif-
ference will allow our method to accurately handle partially
saturated pixels.
Illustrated in Fig. 1, skin reflectance can be physically
represented as a two layered model consisting of an epider-
mis layer containing melanin and a dermis layer containing
hemoglobin. This model is used in [36] for skin texture syn-
thesis to achieve certain visual effects such as the appear-
ance of alcohol consumption or tanning. According to the
modified Lambert-Beer law [7] which models subsurface
scattering in layered surfaces in terms of one-dimensional
linear transport theory, the diffuse reflection of skin is:
R(p, λ) = exp{ρm(p)σ′
m(λ)lm(λ)+ρh(p)σ′
h(λ)lh(λ)}R(p, λ),(9)
where λ denotes wavelength, and R and R are the incidentspectral irradiance and reflected spectral radiance. ρm(p),
43233109
ρh(p), σ′m, σ′
h are the pigment densities and spectral cross-sections of melanin and hemoglobin, respectively. lm andlh are the mean path lengths of photons in the epidermis anddermis layers. Following the simplification used in [36], weprocess the wavelength-dependent melanin and hemoglobinscattering terms σ′
m, σ′h, lm, lh at the resolution of RGB
channels:
σ′
m(λ)lm(λ) = {σ̄′m,R l̄m,R, σ̄′
m,G l̄m,G, σ̄′m,B l̄m,B},(10)
σ′
h(λ)lh(λ) = {σ̄′h,R l̄h,R, σ̄′
h,G l̄h,G, σ̄′h,B l̄h,B}. (11)
We define the relative absorbance vectors σm, σh ofmelanin and hemoglobin as:
σm = exp{σ̄′m,R l̄m,R, σ̄′
m,G l̄m,G, σ̄′m,B l̄m,B}, (12)
σh = exp{σ̄′h,R l̄h,R, σ̄′
h,G l̄h,G, σ̄′h,B l̄h,B}. (13)
By combining Eq. (9), Eq. (12), Eq. (13) and R = AR,
skin albedo A(p) can be expressed as
A(p) = σρm(p)m σ
ρh(p)h . (14)
As reported in [36] and empirically verified in the sup-
plemental material, the relative absorbance vectors σm,
σh vary within a restricted range for typical human faces.
Based on this observation, we make the assumption that
σm, σh are the same among people, and variations in skin
color are attributed to differences in pigment densities ρmand ρh. The computation of σm and σh is performed by
independent components analysis (ICA) on a set of facial
images captured under neutral illumination. Further details
on this ICA are given in the supplemental material.
The effects of camera color filters can in general be
mixed with estimates of surface albedos and illumination
colors. In our case, the surface albedos are circumscribed
by σm and σh, which are both known and illumination-
independent properties. The effects of camera color filters
will thus be intertwined with the estimated lighting colors,
for which we make no assumptions. As a result, our method
does not require color calibration of the camera, unlike tech-
niques that use the Planckian locus as a constraint on illu-
mination [5, 16, 18].
4. Facial Specular Highlight Removal
Our objective function for removing specular highlights
under varying illumination colors is
argminLlm,ρ,D,ms
EO + λSES + λHEH + λGEG (15)
subject to ρm(p) ≥ 0, ρh(p) ≥ 0,
where we define the SH coefficients Llm of the lighting en-
vironment map L in Eq. (6), the pigment densities ρ ={ρm, ρh} in Eq. (14), the diffuse shading D in Eq. (3), and
the specular coefficient ms in Eq. (7). λS , λH , λG are regu-
larization weights for balancing the data term EO, isotropic
smoothness term ES , anisotropic smoothness term EH , and
global shading term EG. Each of these terms is presented
in the following subsections.
4.1. Data Term
The data term EO measures the difference between the
skin reflectance model of Eq. (5) and the observed input
image I:
EO(Llm,ρ,D,ms) =∑
p∈I
ems(p)‖A(p)D(p)+L(ωp)ms(p)−I(p)‖2,
(16)
where the albedo A(p) is computed according to Eq. (14)
and ωp is defined in Eq. (8). Since our method is focused on
removing specular highlights, we place greater emphasis on
pixels that contain greater specular reflection, through the
adaptive weight ems(p).
4.2. Isotropic Smoothness Term
The isotropic smoothness term ES constrains the gra-
dient of the diffuse shading D and specular coefficient
ms to be locally smooth. Similar constraints have been
used in prior work to increase the stability of highlight re-
moval [11, 32]. We define this prior as the isotropic TV-l2regularizer:
ES(D,ms) =∑
p∈I
(‖ ▽D(p)‖2 + ‖ ▽ms(p)‖2), (17)
where ▽ is the gradient operator.
4.3. Anisotropic Smoothness Term
We also regularize the pigment densities ρ to be locally
smooth while accounting for skin textures. In previous
work [41, 11], guided bilateral filtering or a TV-l1 term was
used to obtain a smooth but edge preserving estimate of dif-
fuse chromaticity. Since in our case we are solving for pig-
ment densities ρ, which are physical quantities that give rise
to diffuse chromaticity, we enforce anisotropic smoothness
on them instead:
EH(ρ) =∑
p∈I
(e−▽ρm(p)‖▽ρm(p)‖2+e−▽ρh(p)‖▽ρh(p)‖2).
(18)
4.4. Global Shading Term
An ambiguity in specular highlight separation occurs
when the illumination chromaticity is similar to the diffuse
reflection chromaticity. In such cases, which can happen
for faces, it is difficult to separate the contributions of spec-
ular and diffuse reflections. To resolve this ambiguity, we
take advantage of additional prior information about human
faces, in the form of a statistically-based approximation of
43243110
the facial geometry. This prior is used to constrain the esti-
mates of diffuse shading, which in turn determines the mag-
nitude of specular highlights.
Although the illumination distribution can be arbitrary,
the appearance of diffuse shading can be described by a
low-dimensional model. The Lambertian reflectance func-
tion acts as a low-pass filter on the lighting environment and
can be modeled as a quadratic polynomial of the surface
normal direction [23]:
D(p) = nTp Mnp, (19)
where np = (x, y, z, 1)T is the surface normal of pixel p
and M is a symmetric 4 × 4 matrix encoding the illumina-
tion distribution. According to [23], M is determined by
the first nine coefficients of Llm as
M =
⎛
⎜
⎝
c1L22 c1L2−2 c1L21 c2L11
c1L2−2 −c1L22 c1L2−1 c2L1−1
c1L21 c1L2−1 c3L20 c2L10
c2L11 c2L1−1 c2L10 c4L00 − c5L20
⎞
⎟
⎠,
(20)
c1 = 0.429043, c2 = 0.511664, c3 = 0.743125,
c4 = 0.886227, c5 = 0.247708.
Based on this, we define our global shading term as
EG(D) =∑
p∈I
‖D(p)− nTp Mnp‖2, (21)
where np is obtained using the statistically-based single-
image 3D face reconstruction algorithm of [39]. Though the
diffuse reflectance of human skin does not exactly adhere to
the Lambertian model, we nevertheless found this global
shading term to be helpful in providing good approximate
solutions, especially in cases of the aforementioned ambi-
guity.
We note that related shading constraints have been em-
ployed in various low-level vision problems, including 3D
reconstruction [43, 15] and intrinsic image decomposi-
tion [16, 14]. In [14], the space of surface normals1 is di-
vided into small bins, and pixels are assigned to them ac-
cording to their known surface orientations reconstructed
from a Kinect camera. A non-local smoothness constraint
is applied to the diffuse coefficients of pixels within the
same bin based on the assumption that similar normal di-
rections indicate similar diffuse coefficients. Our work like-
wise places non-local constraints on diffuse reflection, but
leverages statistical models of face geometry instead of re-
lying on depth sensors.
4.5. Optimization
We minimize the objective function of Eq. (15) using
an alternating optimization scheme, where the parameters
1Polar and azimuth angles (θ, φ), θ ∈ [0, 2π], φ ∈ [0, π/2].
{Llm,ρ,D,ms} are estimated sequentially while the val-
ues of the other unknowns are fixed. The optimization is
iterated until the change in the objective energy falls below
a threshold. We set the value of the regularization weights
{λS , λH , λG} to {0.1, 0.1, 0.001} in our experiments.
We initialize Llm as uniform white illumination2 and ms
using the results of [41]. D and ρ are initialized by project-
ing the input image I to the σm-σh plane in log-RGB space
(described further in the supplemental material):
⎛
⎝
logD(p)ρm(p)ρh(p)
⎞
⎠ =(
1, logσm, logσh
)−1log I(p), (22)
where 1 = (1, 1, 1)T is the shading direction in log-RGB
space.
Update ρ and D: We first estimate ρ and D while fixing
Llm and ms. To satisfy the bounds for ρ:
ρm(p) ≥ 0, ρh(p) ≥ 0, (23)
we simply express the estimation of ρ as the estimation of
ρ′m = log ρm, ρ′h = log ρh and optimize the objective func-
tion in Eq. (15) by the Gauss-Newton method.
To correctly handle pixels with saturated image values,
we omit the saturated color channels when computing the
data term EO. For the case of partially saturated pixels
with two non-saturated channels, although the observation
I is incomplete, the correct skin albedo A can still be ac-
curately estimated from the non-saturated channels based
on their intersection with the skin albedo model in Eq. (14)
(which forms a plane in log-RGB space as described in the
supplemental material) in addition to the smoothness term
EH , which propagates estimated pigment densities ρ from
non-saturated regions.
Update Llm: We then fix ρ, D, ms to update the SH ap-
proximation Llm by optimizing EO + EG. EO can be rep-
resented as a quadratic energy
EO =∑
p∈I
ems(p)‖ms(p)LTlmYlm(ωp)+A(p)D(p)−I(p)‖2,
(24)
where A(p), D(p), ms(p) and Ylm(ωp) have been fixed.
Similar to EO, the global shading term EG in Eq. (21) can
also be represented as a quadratic energy with fixed D. As
a result, this optimization problem can be solved in closed
form as
‖ALlm − I‖2 + ‖BLlm −D‖2 = 0, (25)
2Initialized spherical harmonics coefficients are zero for all values ex-
cept for L00 = 3.544881.
43253111
where A and B are two P × K matrices with P denoting
the number of face pixels in the image and K representing
the number of SH coefficients Llm.
Update ms: Thirdly, we update ms with fixed Llm, ρ and
D by solving
argminms
EO + λSES . (26)
This energy function is also optimized by the Gauss-
Newton method.
5. Results
In this section, we evaluate the proposed method. The
face region is automatically segmented by applying Grab-
cut [24] on the facial landmarks detected using an Active
Appearance Model [3]. Facial features which do not fol-
low the proposed skin model, such as teeth and eyes, are
excluded from the segmented region using the landmarks.
The results of specular highlight removal for facial im-
ages are evaluated using both laboratory captured images
for quantitative comparison and real natural images for
qualitative comparison. Results for additional images are
provided in the supplemental material, which also includes
experimentation on illumination chromaticity estimation, a
byproduct of our method.
We compare the specular highlight removal results of
our method to four existing techniques [34, 41, 11, 16].
ILD [34] uses the pseudo diffuse image to generate a
specular-free image, and then removes specular reflections
through comparisons between the specular-free image and
the input image. MDCBF [41] takes the pseudo diffuse im-
age as a guide for applying edge-preserving filters to es-
timate the maximum diffuse chromaticity. We use the re-
sults of MDCBF [41] to initialize the value of ms in our
work. DarkP [11] utilizes a dark channel prior to initial-
ize the pseudo diffuse image. FacePL [16] considers the
Planckian locus as a soft constraint together with a statisti-
cal property of skin albedo to remove specular highlights.
ILD [34], MDCBF [41], and DarkP [11] assume a cali-
brated uniform illumination color, and FacePL [16] works
under an unknown uniform illumination color.
5.1. Laboratory Images
Figure 2 presents a quantitative evaluation of specular
highlight removal where the ground truth results in Fig. 2(h)
are obtained through cross-polarization. Three different il-
lumination configurations are considered among the input
images in Fig. 2(a), namely calibrated uniform white illu-
mination, three 3000K color temperature lamps, and one
3000K lamp + two 6000K lamps.
Under the uniform white illumination, ILD [34] and
DarkP [11] do not perform well because their pseudo-
diffuse assumptions, i.e. that the diffuse chromaticity is lo-
cally uniform [34] and the dark channel of the diffuse com-
ponent is zero [11], do not hold for facial images. The re-
sults of MDCBF [41] and FacePL [16] are better but spec-
ular reflections still exist, especially in the region around
the nose; the results in (f) show significant improvement
over previous methods by considering the diffuse skin color
model, and our results in (g) exhibit further improvement
by incorporating the global shading term EG.
For the case of uncalibrated but uniform illumination in
the second input image, the results for most of the other
methods degrade in quality due to unknown illumination
chromaticity. Our method does not perform as well with-
out the global shading term because the chromaticity of a
3000K lamp is close to that of the skin, which increases the
ambiguity between the specular and diffuse components.
After considering the global shading term, our method bet-
ter estimates the illumination color, which leads to an im-
proved specular highlight separation.
For the third lighting configuration, which includes
different-color light sources, the 6000K illumination com-
ing from the front and right leads to a grayish color for
skin pixels with specular reflection. As a result, the meth-
ods that assume calibrated white illumination overestimate
the specular component because they cannot distinguish
between specular and diffuse reflections when they have
similar chromaticity. Because of our models for melanin-
hemoglobin skin color and directional illumination, the re-
sults of (f) exhibit improvement except for the highlight
from the 3000K lamp, which still exists. This highlight is
removed in (g) by utilizing the global shading term which
takes advantage of diffuse shading estimates of other face
pixels. The estimated illumination environments for the last
two lighting configurations and the corresponding ground
truth environment maps captured using a mirrored sphere
(and expressed as 4th-order SH) are presented for a qualita-
tive comparison in the upper-right corners of corresponding
specular highlight components.
5.2. Natural Images
Figure 3 displays several qualitative comparisons to
[34, 41, 11, 16] on facial images captured under different
lighting environments. We note that it is generally imprac-
tical to capture accurate ground truth by cross-polarization
in natural scenes due to the number and spatial extent of
light sources (e.g., outdoor light from windows). Results
for additional images are provided in the supplemental ma-
terial. The input images in the first and second rows are
captured under a uniform calibrated illumination color; the
images in the third and fourth rows are captured with a uni-
form but uncalibrated illumination color; the images in the
fifth and sixth rows are examples with directionally varying
illumination color.
43263112
(a) Input (b) ILD [34] (c) MDCBF [41] (d) DarkP [11] (e) FacePL [16] (f) Ours w/o EG (g) Ours (h) Ground truth
5.59 3.82 6.81 3.45 2.54 2.00
7.96 4.85 2.82 4.07 3.93 1.98
9.38 3.87 10.4 6.02 2.96 2.31Figure 2. Quantitative evaluation of specular highlight removal. (a) Input images. (b-h) Separated diffuse and specular components by (b)
ILD [34], (c) MDCBF [41], (d) DarkP [11], (e) FacePL [16], (f) our method without global shading term EG, (g) our method, and (h)
ground truth from cross-polarization. The illumination for the three input images are respectively: calibrated uniform white; three lamps
with color temperature of 3000K; one 3000K lamp and two 6000K lamps. In the upper-right corners of the separated specular components,
we show for qualitative comparison the corresponding illumination estimates (in terms of 4th order SH). The RMSE of the separated
specular components (shown rescaled by 100 for better viewing) is given in the lower-right corners.
One of the advantages of our approach is that it can
more accurately handle partially saturated pixels, such as
those around the nose and forehead in the first image, com-
pared to the other methods. Aside from the partially satu-
rated region, our approach also outperforms the others even
under calibrated illumination as shown for the second im-
age. Although facial hair is not modeled by our skin model,
our method still performs adequately in those regions be-
cause the facial hair contains little specular reflection and
has low pixel intensity. Due to the lack of illumination cal-
ibration, the diffuse results of the previous methods on the
third and fourth images contain much specular reflection.
Note that because of the utilization of global shading term
EG, our method obtains a good result even though the il-
lumination chromaticity of the third image is similar to the
skin color. Since the dark channel prior in DarkP [11] does
not hold in most facial images, the method generally over-
estimates specular reflections except for the fourth image,
whose deeply reddish illumination leads to the blue chan-
nel becoming nearly zero, thus satisfying the dark channel
prior. The fifth and sixth images are captured under varying
illumination colors, which is problematic for previous tech-
niques as shown in the results. Note that our method suc-
cessfully removes the specular highlights on the right side
of the forehead in the sixth image, which is caused by illu-
mination with chromaticity similar to the skin, thanks to the
global shading constraints.
43273113
(a) Input (b) ILD [34] (c) MDCBF [41] (d) DarkP [11] (e) FacePL [16] (f) Ours
Figure 3. Specular highlight removal results for natural lighting environments. (a) Input images. (b/c/d/e/f) Diffuse images estimated by
(b) ILD [34], (c) MDCBF [41], (d) DarkP [11], (e) FacePL [16], and (f) our method.
6. Conclusion
We presented a method for removing specular highlight
reflections in facial images that may contain varying illumi-
nation colors. By jointly separating the specular highlights
and estimating the illumination environment while utiliz-
ing physical and statistical priors on human faces, our ap-
proach demonstrates appreciable improvements over previ-
ous state-of-art methods, especially for handling partially
saturated pixels, varying illumination colors, and ambigu-
ities caused by the similarity between illumination chro-
maticity and diffuse chromaticity.
Acknowledgements
This work was done while Chen Li was an intern at Mi-
crosoft Research. The authors thank all of the models for
appearing in our test images. Kun Zhou is partially sup-
ported by the National Key Research & Development Plan
of China (No. 2016YFB1001403) and NSFC (U1609215).
43283114
References
[1] R. Bajcsy, S. Lee, and A. Leonardis. Detection of dif-
fuse and specular interface reflections and inter-reflections
by color image segmentation. Int. Journal of Computer Vi-
sion, 17(3):241–272, 1996.
[2] S. Bianco and R. Schettini. Adaptive color constancy using
faces. IEEE Trans. Patt. Anal. and Mach. Intel., 36(8):1505–
1518, 2014.
[3] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active ap-
pearance models. IEEE Trans. Pattern Anal. Mach. Intell.,
23(6):681–685, June 2001.
[4] S. Crichton, J. Pichat, M. Mackiewicz, G. Y. Tian, and A. C.
Hurlbert. Skin chromaticity gamuts for illumination recov-
ery. In 6th European Conf. on Colour in Graph., Imaging,
and Vision, CGIV 2012, Amsterdam, the Netherlands, May
6-9, 2012, pages 266–271, 2012.
[5] G. D. Finlayson and G. Schaefer. Solving for colour con-
stancy using a constrained dichromatic reflection model. Int.
Journal of Computer Vision, 42:127–144, 2001.
[6] D. Forsyth. A novel algorithm for colour constancy. Int.
Journal of Computer Vision, 45(1):5–36, 1990.
[7] P. Hanrahan and W. Krueger. Reflection from layered sur-
faces due to subsurface scattering. ACM Trans. Graph.,
1993.
[8] T. Hansen, M. Olkkonen, S. Walter, and K. R. Gegenfurtner.
Memory modulates color appearance. Nature Neuroscience,
9(11):1367–1368, 2006.
[9] Y. Imai, Y. Kato, H. Kadoi, T. Horiuchi, and S. Tomi-
naga. Estimation of multiple illuminants based on specu-
lar highlight detection. In R. Schettini, S. Tominaga, and
A. Trameau, editors, Computational Color Imaging, volume
6626, pages 85–98. 2011.
[10] H. R. V. Joze and M. S. Drew. Exemplar-based color con-
stancy and multiple illumination. IEEE Trans. Patt. Anal.
and Mach. Intel., 36(5):860–873, May 2014.
[11] H. Kim, H. Jin, S. Hadap, and I. Kweon. Specular reflection
separation using dark channel prior. In CVPR, pages 1460–
1467, 2013.
[12] T. Kim and K. Hong. A practical single image based ap-
proach for estimating illumination distribution from shad-
ows. In Proc. Intl. Conf. on Computer Vision, pages I:266–
271, 2005.
[13] G. Klinker, S. Shafer, and T. Kanade. The measurement of
highlights in color images. Int. Journal of Computer Vision,
2(1):7–32, 1988.
[14] K. J. Lee, Q. Zhao, X. Tong, M. Gong, S. Izadi, S. U.
Lee, P. Tan, and S. Lin. Estimation of intrinsic image se-
quences from image+depth video. In Proceedings of
the 12th European Conference on Computer Vision - Vol-
ume Part VI, ECCV’12, pages 327–340, Berlin, Heidelberg,
2012. Springer-Verlag.
[15] C. Li, S. Su, Y. Matsushita, K. Zhou, and S. Lin. Bayesian
depth-from-defocus with shading constraints. Image Pro-
cessing, IEEE Transactions on, 25(2):589–600, Feb 2016.
[16] C. Li, K. Zhou, and S. Lin. Intrinsic face image decompo-
sition with human face priors. In ECCV (5)’14, pages 218–
233, 2014.
[17] Y. Li, S. Lin, H. Lu, and H.-Y. Shum. Multiple-cue illumi-
nation estimation in textured scenes. In Proc. Intl. Conf. on
Computer Vision, pages 1366–1373, 2003.
[18] B. Mazin, J. Delon, and Y. Gousseau. Estimation of illumi-
nants from projections on the planckian locus. IEEE Trans.
Image Process., 24(6):1944–1955, June 2015.
[19] A. Moreno, B. Fernando, B. Kani, S. Saha, and S. Karaoglu.
Color correction: A novel weighted von kries model based
on memory colors. In Computational Color Imaging, vol-
ume 6626 of Lecture Notes in Computer Science, pages 165–
175. 2011.
[20] K. Nishino, P. Belhumeur, and S. Nayar. Using eye re-
flections for face recognition under varying illumination.
In Proc. Intl. Conf. on Computer Vision, pages I:519–526,
2005.
[21] K. Nishino, Z. Zhang, and K. Ikeuchi. Determining
reflectance parameters and illumination distribution from
sparse set of images for viewdependent image synthesis. In
Proc. Intl. Conf. on Computer Vision, pages 599–606, 2001.
[22] T. Okabe, I. Sato, and Y. Sato. Spherical harmonics vs. haar
wavelets: Basis for recovering illumination from cast shad-
ows. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition, pages I:50–57, 2004.
[23] R. Ramamoorthi and P. Hanrahan. An efficient represen-
tation for irradiance environment maps. In Proceedings of
the 28th Annual Conference on Computer Graphics and In-
teractive Techniques, SIGGRAPH ’01, pages 497–500, New
York, NY, USA, 2001. ACM.
[24] C. Rother, V. Kolmogorov, and A. Blake. ”grabcut”: Inter-
active foreground extraction using iterated graph cuts. ACM
Trans. Graph., 23(3):309–314, Aug. 2004.
[25] I. Sato, Y. Sato, and K. Ikeuchi. Illumination distribution
from brightness in shadows: Adaptive estimation of illumi-
nation distribution with unknown reflectance properties in
shadow regions. In Proc. Intl. Conf. on Computer Vision,
pages 875–883, 1999.
[26] I. Sato, Y. Sato, and K. Ikeuchi. Illumination distribution
from shadows. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition, pages 306–312, 1999.
[27] I. Sato, Y. Sato, and K. Ikeuchi. Stability issues in recover-
ing illumination distribution from brightness in shadows. In
Proc. IEEE Conf. on Computer Vision and Pattern Recogni-
tion, pages II:400–407, 2001.
[28] I. Sato, Y. Sato, and K. Ikeuchi. Illumination from shadows.
IEEE Trans. on Pattern Analysis and Machine Intelligence,
25:290–300, 2003.
[29] S. A. Shafer. Using color to separate reflection components.
Color Res. Appl, 10:210–218, 1985.
[30] M. Storring, E. Granum, and H. J. Andersen. Estimation of
the illuminant colour using highlights from human skin. In In
Int. Conference on Color in Graphics and Image Processing,
pages 45–50, 2000.
[31] P. Tan, S. Lin, and L. Quan. Separation of highlight reflec-
tions on textured surfaces. In CVPR, volume 2, pages 1855–
1860, 2006.
[32] P. Tan, S. Lin, L. Quan, and H.-Y. Shum. Highlight removal
by illumination-constrained inpainting. In ICCV, 2003.
43293115
[33] R. Tan and K. Ikeuchi. Reflection components decompo-
sition of textured surfaces using linear basis functions. In
CVPR, volume 1, pages 125–131 vol. 1, 2005.
[34] R. Tan and K. Ikeuchi. Separating reflection components of
textured surfaces using a single image. IEEE Trans. Patt.
Anal. and Mach. Intel., 27(2):178–193, 2005.
[35] R. T. Tan, K. Nishino, and K. Ikeuchi. Color constancy
through inverse-intensity chromaticity space. J. Opt. Soc.
Am. A, 21(3):321–334, Mar 2004.
[36] N. Tsumura, N. Ojima, K. Sato, M. Shiraishi, H. Shimizu,
H. Nabeshima, S. Akazaki, K. Hori, and Y. Miyake. Image-
based skin color and texture analysis/synthesis by extract-
ing hemoglobin and melanin information in the skin. ACM
Trans. Graph., 22(3):770–779, July 2003.
[37] H. Wang, S. Lin, X. Liu, and S. B. Kang. Separating re-
flections in human iris images for illumination estimation.
In Proc. Intl. Conf. on Computer Vision, pages 1691–1698,
2005.
[38] Y. Wang and D. Samaras. Estimation of multiple illuminants
from a single image of arbitrary known geometry. In Proc.
European Conf. on Computer Vision, LNCS 2352, pages
272–288, 2002.
[39] F. Yang, J. Wang, E. Shechtman, L. Bourdev, and
D. Metaxas. Expression flow for 3d-aware face component
transfer. ACM Trans. Graph., 30(4):60:1–60:10, July 2011.
[40] J. Yang, L. Liu, and S. Z. Li. Separating specular and diffuse
reflection components in the hsi color space. In Computer
Vision Workshops (ICCVW), 2013 IEEE International Con-
ference on, pages 891–898, Dec 2013.
[41] Q. Yang, S. Wang, and N. Ahuja. Real-time specular high-
light removal using bilateral filtering. In ECCV, volume
6314, pages 87–100. 2010.
[42] Y. Yang and A. L. Yuille. Sources from shading. In Proc.
IEEE Conf. on Computer Vision and Pattern Recognition,
pages 534–539, 1991.
[43] L.-F. Yu, S.-K. Yeung, Y.-W. Tai, and S. Lin. Shading-based
shape refinement of rgb-d images. In IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2013.
[44] Y. Zhang and Y.-H. Yang. Multiple illuminant direction de-
tection with application to image synthesis. IEEE Trans.
on Pattern Analysis and Machine Intelligence, 23:915–920,
2001.
43303116