
arXiv:1411.0326v2 [cs.CV] 3 Jun 2015

High Dynamic Range Imaging by Perceptual Logarithmic Exposure Merging

Corneliu Florea CORNELIU.FLOREA@UPB.RO

Image Processing and Analysis Laboratory, University ”Politehnica” of Bucharest, Romania, Splaiul Independenței 313

Constantin Vertan CONSTANTIN.VERTAN@UPB.RO

Image Processing and Analysis Laboratory, University ”Politehnica” of Bucharest, Romania, Splaiul Independenței 313

Laura Florea LAURA.FLOREA@UPB.RO

Image Processing and Analysis Laboratory, University ”Politehnica” of Bucharest, Romania, Splaiul Independenței 313

Abstract

In this paper we emphasize a similarity between the Logarithmic-Type Image Processing (LTIP) model and the Naka-Rushton model of the Human Visual System (HVS). LTIP is a derivation of Logarithmic Image Processing (LIP), which replaces the logarithmic function with a ratio of polynomial functions. Based on this similarity, we show that it is possible to present a unifying framework for the High Dynamic Range (HDR) imaging problem, namely that performing exposure merging under the LTIP model is equivalent to standard irradiance map fusion. The resulting HDR algorithm is shown to provide high quality in both subjective and objective evaluations.

1. Introduction

Motivated by the limitation of digital cameras in capturing real scenes with a large lightness dynamic range, a category of image acquisition and processing techniques, collectively named High Dynamic Range (HDR) Imaging, has gained popularity. To acquire HDR scenes, consecutive frames with different exposures are typically acquired and combined into an HDR image that is viewable on regular displays and printers.

In parallel, Logarithmic Image Processing (LIP) models were introduced as an alternative to image processing with real-based operations. While initially modelled from the cascade of two transmitting filters (Jourlin and Pinoli, 1987), it was later shown that LIP models can be generated by homomorphic theory and have a cone space structure (Deng et al., 1995). The initial model was shown to be compatible with the Weber-Fechner perception law (Pinoli and Debayle, 2007), which is not unanimously accepted (Stevens, 1961). Currently, most global Human Visual System (HVS) models are extracted from the Naka-Rushton equation of photoreceptor absorption of incident energy and are followed by further modelling of the local adaptation. We will show in this paper that the LIP extension model introduced in (Vertan et al., 2008) is consistent with the global human perception as described by the Naka-Rushton model. The model no longer uses a logarithmic generative function but only a logarithmic-like function, hence it is named the Logarithmic-Type Image Processing (LTIP) model. In such a case, the generative function of the LTIP model transfers the radiometric energy domain into a human-eye-compatible image domain; thus it mimics, by itself and by its inverse, both the camera response function and the human eye lightness perception.

The current paper claims three contributions. Firstly, we show that the previously introduced LTIP model is compatible with the Naka-Rushton/Michaelis-Menten model of the eye's global perception. Secondly, based on this finding, we show that it is possible to treat two contrasting HDR approaches in a unified manner if the LTIP model framework is assumed. Thirdly, the reinterpretation of the exposure merging algorithm (Mertens et al., 2007) under the LTIP model produces a new algorithm that leads to high-quality results.


The paper is constructed as follows: in Section 2 we present a short overview of the existing HDR trends and we emphasize their correspondence with human perception. In Section 3, state of the art results in the LIP framework and the usage of the newly introduced LTIP framework for the generation of models compatible with human perception are discussed. In Section 4 we derive and motivate the proposed HDR imaging algorithm, and in Section 5 we discuss implementation details and achieved results, ending the paper with discussion and conclusions.

2. Related work

The typical acquisition of a High Dynamic Range image relies on the “Wyckoff Principle”: differently exposed images of the same scene capture different information due to the differences in exposure (Mann and Picard, 1995). Bracketing techniques are used in practice to acquire pictures of the same subject but with consecutive exposure values. These pictures are then fused to create the HDR image.

For the fusion step two directions are envisaged. The first direction, named irradiance fusion, acknowledges that the camera-recorded frames are non-linearly related to the scene reflectance and thus relies on retrieving the irradiance maps from the acquired frames, by inverting the camera response function (CRF), followed by fusion in the irradiance domain. The fused irradiance map is compressed via a tone mapping operator (TMO) into a displayable low dynamic range (LDR) image. The second direction, called exposure fusion, aims at simplicity and directly combines the acquired frames into the final image. A short comparison between the two is presented in Table 1 and detailed in further paragraphs.

2.1. Irradiance fusion

Originating in the work of Debevec and Malik (1997), the schematic of irradiance fusion may be followed in Fig. 1 (a). Many approaches were devised for determining the CRF (Grossberg and Nayar, 2004). We note that the dominant shape is that of a gamma function (Mann and Mann, 2001), a trait required by compatibility with the HVS.

After inverting the CRF, the irradiance maps are combined, typically by a convex combination (Debevec and Malik, 1997), (Robertson et al., 1999). For proper displaying, a tone mapping operator (TMO) is then applied on the HDR irradiance map to ensure that in the compression process all image details are preserved. For this last step, following Ward's proposal (Ward et al., 1997), typical approaches adopt an HVS-inspired function for domain compression, followed by local contrast enhancement. For a survey of TMOs we refer to the paper of Ferradans et al. (2012) and to the book by Banterle et al. (2011).

Among other TMO attempts, a notable one was proposed by Reinhard et al. (2002) which, inspired by Ansel Adams' Zone System, first applied a logarithmic scaling to mimic the exposure setting of the camera, followed by dodging-and-burning (selectively and artificially increasing and decreasing image values for better contrast) for the actual compression. Durand and Dorsey (2002) separated, by means of a bilateral filter, the HDR irradiance map into a base layer that encoded large scale variations (thus needing range compression) and a detail-preserving layer, to form an approximation of the image pyramid. Fattal et al. (2002) attenuated the magnitude of large gradients based on a Poisson equation. Drago et al. (2003) implemented a logarithmic compression of luminance values that matches the HVS. Krawczyk et al. (2005) implemented the Gestalt-based anchoring theory of Gilchrist et al. (1999) to divide the image into frameworks and performed range compression by ensuring that the frameworks are well preserved. Banterle et al. (2012) segmented the image into luminance components and independently applied the TMOs introduced in (Drago et al., 2003) and (Reinhard et al., 2005) for further adaptive fusion based on the previously found areas. Ferradans et al. (2012) proposed an elaborate model of the global HVS response and pursued local adaptation with an iterative variational algorithm.

Yet, as the irradiance maps are altered with respect to reality by the camera optical systems, additional constraints are required for a perfect match with the HVS. Hence, this category of methods, while being theoretically closer to the pure perceptual approach, requires supplementary and costly constraints and significant computational resources for the CRF estimation and for the TMO implementation.

2.2. Exposure merging

Noting the high computational cost of irradiance map fusion, Mertens et al. (2007) proposed to implement the fusion directly in the image domain; this approach is described in Fig. 1 (b). The method was further improved for robustness to ghosting artifacts and details preservation in HDR composition by Pece and Kautz (2010).

Other developments addressed the method of computing local contrast to preserve edges and local high dynamic range. Another expansion was introduced by Zhang and Cham (2012), who used the direction of the gradient in a partial-derivatives type of framework and two local quality measures to achieve local optimality in the fusion process. Bruce (2014) replaced the contrast computed by Mertens et al. (2007) on a Laplacian pyramid with the entropy calculated in a flat circular neighborhood for deducing weights that maximize the local contrast.


Table 1. Comparison of the two main approaches to the HDR problem.

Method                                      | CRF recovery | Fused Components | Fusion Method               | Perceptual
Irradiance fusion (Debevec and Malik, 1997) | Yes          | Irradiance Maps  | Weighted convex combination | Yes (via TMO)
Exposure fusion (Mertens et al., 2007)      | No           | Acquired Frames  | Weighted convex combination | No

Figure 1. HDR imaging techniques: (a) irradiance map fusion as described in Debevec and Malik (1997) and (b) exposure fusion as described in Mertens et al. (2007). Irradiance map fusion relies on inverting the Camera Response Function (CRF) in order to return to the irradiance domain, while exposure fusion works directly in the image domain, thus avoiding the CRF reversal.

The exposure fusion method is the inspiration source for many commercial applications. Yet, in such cases, exposure fusion is followed by further processing that increases the visual impact of the final image. The post-processing includes contrast enhancement, dodging-and-burning and edge sharpening, all merged and tuned in order to produce a surreal/fantasy-like aspect of the final image.

While being considerably faster, exposure fusion is neither physically motivated nor perceptually inspired. However, while the academic world tends to favor perceptual approaches, as they lead to images that are correct from a perceptual point of view, the consumer world naturally tends to favor images that are photographically more spectacular, and the exposure merging solution pursues this direction.

3. Logarithmic Type Image Processing

Typically, image processing operations are performed using real-based algebra, which proves its limitations under specific circumstances, like upper range overflow. To deal with such situations, non-linear techniques have been developed (Markovic and Jukic, 2013). Such examples are the LIP models. The first LIP model was constructed by Jourlin and Pinoli (1987), starting from the equation of light passing through transmitting filters.

The LIP model was further developed into a robust mathematical structure, namely a cone/vector space. Subsequently, many practical applications have been presented, and an extensive review of advances and applications for the classical LIP model is given in (Pinoli and Debayle, 2007). In parallel, other logarithmic and logarithmic-like models were reported. In this particular work we are mainly interested in the logarithmic-like model introduced by Vertan et al. (2008), which has a cone space structure and is named the LTIP model. A summary of existing models may be found in (Navarro et al., 2013). Recently, parametric extensions of the LTIP model were also introduced (Panetta et al., 2011), (Florea and Florea, 2013). The LTIP models are summarized in Table 2.

3.1. Relation between LIP models and HVS

From its introduction in the 80s, the original LIP model had a strong argument in its similarity to the Weber-Fechner law of contrast perception. This similarity was thoroughly discussed in (Pinoli and Debayle, 2007), where it was shown that logarithmic subtraction models the increment of sensation caused by incrementing the light with the quantity appearing in the subtraction. Yet the logarithmic model of the globally perceived luminance contrast assumed by the Weber-Fechner law was vigorously challenged (Stevens, 1961), with arguments hinting at power-law rules (Stevens and Stevens, 1963). Thus, we note that the Stevens model is more in line with the LTIP model. On the other hand, Stevens' experiments were also questioned (Macmillan and Creelman, 2005), so there does not seem to be a definitive answer in this regard.



Table 2. The classical LIP model introduced by Jourlin and Pinoli, the logarithmic-type (LTIP) model with the basic operations, and the parametric extension of the LTIP model. D is the upper bound of the image definition set (typically D = 255 for unsigned int representation or D = 1 for float image representation).

Model           | Domain        | Isomorphism             | Addition u ⊕ v                          | Scalar multiplication α ⊗ u
LIP             | Dφ = (−∞; D]  | Φ(x) = −D log((D−x)/D)  | u + v − uv/D                            | D − D(1 − u/D)^α
LTIP            | Dφ = [0; 1)   | Φ(x) = x/(1−x)          | 1 − (1−u)(1−v)/(1−uv)                   | αu/(1 + (α−1)u)
Parametric LTIP | Dφ = [0; 1)   | Φm(x) = x^m/(1−x^m)     | [1 − (1−u^m)(1−v^m)/(1−u^m v^m)]^(1/m)  | [αu^m/(1 + (α−1)u^m)]^(1/m)
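Because every LTIP operation in Table 2 is generated by the isomorphism Φ, each can be computed by mapping the operands through Φ, applying ordinary arithmetic, and mapping back through the inverse. The following minimal Python sketch (the paper's own implementation is in Matlab; the function names here are ours, for illustration only) checks that this route reproduces the closed forms listed in the table:

```python
import numpy as np

def phi(x):
    """LTIP generative isomorphism Phi(x) = x / (1 - x), for x in [0, 1)."""
    return x / (1.0 - x)

def phi_inv(y):
    """Inverse isomorphism: maps [0, +inf) back to [0, 1)."""
    return y / (y + 1.0)

def ltip_add(u, v):
    """LTIP addition: u (+) v = Phi^{-1}(Phi(u) + Phi(v))."""
    return phi_inv(phi(u) + phi(v))

def ltip_smul(alpha, u):
    """LTIP scalar multiplication: alpha (x) u = Phi^{-1}(alpha * Phi(u))."""
    return phi_inv(alpha * phi(u))

# Sanity checks against the closed forms of Table 2.
u, v, alpha = 0.3, 0.6, 2.5
assert np.isclose(ltip_add(u, v), 1 - (1 - u) * (1 - v) / (1 - u * v))
assert np.isclose(ltip_smul(alpha, u), alpha * u / (1 + (alpha - 1) * u))
```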

Still, lately, the evidence seems to favor the Naka-Rushton/Michaelis-Menten model of retinal adaptation (Ferradans et al., 2012), with an important class of TMO techniques following this model for the global adaptation step. The Naka-Rushton equation is a particular case of the Michaelis-Menten model that expresses the hyperbolic relationship between the initial velocity and the substrate concentration in a number of enzyme-catalyzed reactions. Such a process is the change of the electric potential of a photoreceptor (e.g. the eye cones) membrane, r(I), due to the absorption of light of intensity I. The generic form, called the Michaelis-Menten equation (Valeton and van Norren, 1983), is:

r(I) = ΔV(I)/ΔV_max = I^n / (I^n + I_S^n)    (1)

where ΔV_max is the maximum difference of potential that can be generated, I_S is the light level at which the photoreceptor response is half maximal (the semisaturation level) and n is a constant. Valeton and van Norren (1983) determined that n = 0.74 for the rhesus monkey. If n = 1, the Naka-Rushton equation (Naka and Rushton, 1966) is retrieved as a particular case of the Michaelis-Menten model:

r(I) = ΔV(I)/ΔV_max = I / (I + I_S)    (2)

For the TMO application, it is considered that the electric voltage on the right-hand side is a good approximation of the perceived brightness (Ferradans et al., 2012). Also, it is not uncommon (Meylan et al., 2007) to depart from the initial meaning of semisaturation for I_S (the average light reaching the light field) and to replace it with a conveniently chosen constant. TMOs that aimed to mimic the Naka-Rushton model (Reinhard et al., 2002), (Tamburino et al., 2008) assumed that the HDR map input was I and obtained the output as r(I).
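As a concrete illustration of this usage, the following Python sketch (ours, not from the paper) applies Eqs. (1)-(2) as a global TMO, replacing the semisaturation level with a convenient constant in the spirit of (Meylan et al., 2007):

```python
import numpy as np

def naka_rushton_tmo(E, n=1.0):
    """Global tone mapping in the spirit of Eqs. (1)-(2).

    E : HDR irradiance map (positive values); it plays the role of I.
    n : Michaelis-Menten exponent (n = 1 gives the Naka-Rushton case).
    The semisaturation level I_S is replaced by a convenient constant,
    here the mean irradiance of the map.
    """
    I_S = E.mean()
    return E ** n / (E ** n + I_S ** n)
```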

On the other hand, the generative function of the LIP model maps the image domain onto the real number set. The inverse function acts as a homomorphism between the real number set and the closed space that defines the domain of LIP. For the LTIP model, the generative function is Φ_V(x) = x/(1−x), while the inverse is:

Φ_V^{-1}(y) = y / (y + 1)    (3)

The inverse function (Eq. (3)) mimics the Naka-Rushton model, Eq. (2), with the difference that instead of the semisaturation I_S, as in the original model, it uses full saturation. Given this observation, we interpret the logarithmic-like model as the mapping of irradiance intensities (which are defined over the real number set) onto photoreceptor-acquired intensities, i.e. human-observable chromatic intensities. While the logarithmic-like model is only similar to, and not identical with, the Naka-Rushton model of the human eye, it has the strong advantage of creating a rigorous mathematical framework of a cone space.
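This correspondence is immediate to verify numerically: on a normalized axis, Φ_V^{-1} coincides with the Naka-Rushton response once the semisaturation constant is set to full saturation (I_S = 1). A minimal Python check (ours, for illustration only):

```python
import numpy as np

def ltip_inverse(y):
    """Inverse LTIP generative function, Eq. (3)."""
    return y / (y + 1.0)

def naka_rushton(I, I_S=1.0):
    """Naka-Rushton photoreceptor response, Eq. (2)."""
    return I / (I + I_S)

# With I_S = 1 (full saturation) the two curves are identical.
I = np.linspace(0.0, 10.0, 101)
assert np.allclose(ltip_inverse(I), naka_rushton(I, I_S=1.0))
```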

3.2. Relation between the LIP models and CRF

The dominant non-linear transformation in the camera pipeline is the gamma adjustment necessary to adapt the image to the non-linearity of the display and, respectively, of the human eye. The entire pipeline is described by the Camera Response Function (CRF) which, typically, has a gamma shape (Grossberg and Nayar, 2004).

The similarity between the LTIP generative function and the CRF was previously pointed out in (Florea and Florea, 2013). To show the actual relation between the LTIP generative function and the CRF, we considered the Database of Response Functions (DoRF) (Grossberg and Nayar, 2004), which consists of 200 recorded response functions of digital still cameras and analogue photographic films. These functions are shown in Fig. 2 (a); to emphasize the relation, in subplot (b) of the same figure we represent only the LTIP generative function and the average CRF. As one may see, while the LTIP generative function is not identical to the average CRF, there do exist cameras and films that have a response function identical to the LTIP generative function.


Figure 2. The relation between the LTIP generative function Φ_V and the camera response functions recorded in the DoRF database: (a) full database and (b) average CRF (black) with respect to the LTIP function (green). Both panels plot intensity versus irradiance on normalized [0, 1] axes.

To improve the contrast and the overall appearance of the image, some camera models add an S-shaped tone mapping that no longer follows the Naka-Rushton model. In such a case, a symmetrical LTIP model, such as the one described in (Navarro et al., 2013), has greater potential to lead to better results.

4. HDR by Perceptual Exposure Merging

Once the camera response function, g, has been found, the acquired images f_i are turned into irradiance maps E_i by (Debevec and Malik, 1997):

E_i(k, l) = g^{-1}(f_i(k, l)) / Δt    (4)

where Δt is the exposure time and (k, l) is the pixel location. Further, the HDR irradiance map is calculated as the weighted sum of the acquired irradiance maps (Debevec and Malik, 1997), (Robertson et al., 1999):

E_HDR(k, l) = [ Σ_{i=1}^{N} w(f_i(k, l)) · E_i(k, l) ] / [ Σ_{i=1}^{N} w(f_i(k, l)) ]    (5)

where w(f_i(k, l)) are weights depending on the chosen algorithm and N is the number of frames.
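For reference, a minimal Python sketch of this irradiance-domain pipeline, Eqs. (4)-(5) (the function is ours and assumes a known inverse CRF g_inv and a pixel-value weighting function w):

```python
import numpy as np

def fuse_irradiance(frames, exposure_times, g_inv, w):
    """Weighted irradiance-map fusion, Eqs. (4)-(5).

    frames         : list of N aligned LDR frames with values in [0, 1]
    exposure_times : list of N exposure times (Delta t)
    g_inv          : inverse camera response function, applied per pixel
    w              : weighting function of the recorded pixel values
    """
    num = np.zeros_like(frames[0], dtype=float)
    den = np.zeros_like(frames[0], dtype=float)
    for f, dt in zip(frames, exposure_times):
        E = g_inv(f) / dt          # per-frame irradiance map, Eq. (4)
        num += w(f) * E            # weighted accumulation, Eq. (5)
        den += w(f)
    return num / np.maximum(den, 1e-12)
```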

However, we stress that the weights are scalars with respect to the image values. This means that their sum is also a scalar, which we denote by:

η = Σ_{i=1}^{N} w(f_i(k, l))    (6)

Taking into account that the CRF may be approximated by the LTIP generative function g and, also, that the final image is achieved by a tone mapping operator applied to the HDR irradiance map, we may write:

f_HDR(k, l) = g(E_HDR(k, l))    (7)

Expanding the HDR irradiance map using Eq. (5), one obtains:

f_HDR(k, l) = g( (1/η) Σ_{i=1}^{N} w(f_i(k, l)) · E_i(k, l) )
            = (1/η) ⊗ g( Σ_{i=1}^{N} w(f_i(k, l)) · E_i(k, l) )
            = (1/η) ⊗ ( ⊕Σ_{i=1}^{N} g( w(f_i(k, l)) · E_i(k, l) ) )
            = (1/η) ⊗ ( ⊕Σ_{i=1}^{N} w(f_i(k, l)) ⊗ g(E_i(k, l)) )
            = (1/η) ⊗ ( ⊕Σ_{i=1}^{N} w(f_i(k, l)) ⊗ f_i(k, l) )    (8)

where ⊗ and ⊕ are the LTIP operations shown in Table 2, while ⊕Σ_{i=1}^{N} u_i stands for:

⊕Σ_{i=1}^{N} u_i = u_1 ⊕ u_2 ⊕ · · · ⊕ u_N

Eq. (8) shows that one may avoid the conversion of the input images to irradiance maps, as the HDR image may be computed simply using additions and scalar multiplications in the logarithmic domain. Furthermore, we emphasize that Eq. (8), if written with real-based operations, matches the exposure fusion introduced in (Mertens et al., 2007); yet we started our calculus from irradiance map fusion. Thus, the use of LTIP operations creates a unifying framework for both approaches. In parallel, it adds partial motivation, through compatibility with the HVS, for the exposure fusion variant. The motivation is only partial, as the LTIP model follows only the global HVS transfer function and not the local adaptation.
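Since ⊕ and ⊗ are the images of ordinary addition and multiplication under the isomorphism Φ, the entire right-hand side of Eq. (8) collapses to one pass through Φ, a real-valued weighted average, and one pass back. A minimal Python sketch (ours; it keeps per-pixel weight maps and omits the multi-resolution blending used by practical exposure fusion):

```python
import numpy as np

def ltip_merge(frames, weights, eps=1e-6):
    """Perceptual exposure merging, Eq. (8).

    frames  : list of N aligned single-channel frames, values in [0, 1)
    weights : list of N per-pixel weight maps
    Computes Phi^{-1}( (1/eta) * sum_i w_i * Phi(f_i) ), i.e. the LTIP
    combination (1/eta) (x) ( (+)Sum_i w_i (x) f_i ).
    """
    eta = np.maximum(sum(weights), eps)      # eta of Eq. (6), per pixel
    acc = np.zeros_like(frames[0], dtype=float)
    for f, w in zip(frames, weights):
        f = np.clip(f, 0.0, 1.0 - eps)       # keep Phi(f) finite
        acc += w * (f / (1.0 - f))           # w_i * Phi(f_i)
    acc /= eta                               # the (1/eta) scaling
    return acc / (acc + 1.0)                 # back through Phi^{-1}
```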

The weights w(f(k, l)) should complement the global tone mapping by performing local adaptation. In (Mann and Mann, 2001) these weights are determined by differentiating the CRF, while in (Mertens et al., 2007) they are extracted so as to properly encode contrast, saturation and well-exposedness. More precisely:

• contrast w_C is determined by considering the response of Laplacian operators; this is a measure of the local contrast, which exists in human perception as the center-surround organization of ganglion receptive fields.

• saturation w_S is computed as the standard deviation of the R, G and B values at each pixel location. This component favors photographic effects, since normal consumers are more attracted to vivid images, and has no straightforward correspondence to human perception.

• well-exposedness w_E is computed by giving large weights to values in the mid-range and small weights to outliers, favoring the glistening aspect of consumer approaches. More precisely, one assumes that a perfect image is modelled by a Gaussian histogram with mean µ and variance σ², and the weight of each pixel is the back-projected probability of its intensity given the named Gaussian.

We will assume the same procedure for computing the weights, with some small adjustments: while in (Mertens et al., 2007) both outliers were weighted symmetrically for well-exposedness, we favor darker tones to compensate the tendency of LIP models to favor bright tones, caused by their closing property. A sketch of this weight computation is given below; details about the precise implementation parameter values are provided in Section 6.1.
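The following Python sketch (ours; the paper's implementation is in Matlab) computes the three weight maps, with the well-exposedness term centred below mid-range at µ = 0.37, σ² = 0.2, the values motivated in Section 6.1:

```python
import numpy as np

def fusion_weights(frame, mu=0.37, sigma2=0.2, eps=1e-12):
    """Per-pixel weights in the spirit of Mertens et al. (2007).

    frame : H x W x 3 RGB image with values in [0, 1].
    """
    gray = frame.mean(axis=2)
    # Contrast: magnitude of a discrete Laplacian (wrap-around borders).
    w_c = np.abs(4 * gray
                 - np.roll(gray, 1, axis=0) - np.roll(gray, -1, axis=0)
                 - np.roll(gray, 1, axis=1) - np.roll(gray, -1, axis=1))
    # Saturation: per-pixel standard deviation of the R, G, B values.
    w_s = frame.std(axis=2)
    # Well-exposedness: Gaussian back-projection centred at mu = 0.37,
    # i.e. below mid-range, to favor darker tones as discussed above.
    w_e = np.exp(-(gray - mu) ** 2 / (2.0 * sigma2))
    return w_c * w_s * w_e + eps
```

The maps returned here are the per-frame w_i consumed by the ltip_merge sketch above (broadcast with w[..., None] for color frames).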

5. Implementation and evaluation procedure

Implementation We implemented in Matlab the HDR algorithm described mainly by Eq. (8) within the LTIP model, with weights following a procedure similar to (Mertens et al., 2007). The actual values of the weights are standard for contrast, w_C = 1, and saturation, w_S = 1, but differ for well-exposedness, where the mid-range parameters (of the Gaussian distribution modelling it) are µ = 0.37 and σ² = 0.2. The choices are based on maximizing the objective metrics and will be further explained in Sections 6.1 and 6.3.

An example of the achieved extended dynamic range image may be seen in Fig. 3.

Figure 3. Example of HDR imaging: initial, differently exposed frames (a-e) and the HDR image obtained using the proposed algorithm (f).

The common practice is to evaluate HDR methods using the few publicly available images. We adopted the same principle, using more extensive public imagery data, such as the sets from (Cadık et al., 2008), the OpenCV examples library and (Drago et al., 2003). We evaluated the proposed algorithm on a database containing 22 sets of HDR frames acquired from various Internet sources, being constrained by the fact that the proposed method requires the original frames and not only the HDR image. We made the full results and the code to obtain them public¹, so as to encourage further testing.

Evaluation The problem of evaluating HDR images is still open, as HDR techniques fall into two categories: irradiance map fusion, which aims at correctness, and exposure fusion, which aims at pleasantness. As mentioned in Section 2.2, the irradiance map is physically grounded, and typical evaluation is performed with objective metrics that are inspired from human perception. Thus the evaluation with such objective metrics will show how realistic a method is (i.e. how close the produced image is to the human perception of the scene).

On the other hand, the exposure fusion methods inspired by (Mertens et al., 2007) are much simpler and produce results without physical motivation, but which are visually pleasant for the average user; consumer applications further process these images to enhance the surreal effect, which is desired, albeit fake. Thus, subjective evaluation and no-reference objective metrics that assess the overall appearance will positively appreciate such images, although they are not a realistic reproduction of the scene.

¹The code and supplementary results are available at http://imag.pub.ro/common/staff/cflorea/LIP

Thus, to have a complete understanding of a method's performance, we will evaluate the achieved results with two categories of methods: subjective evaluation and evaluation based on objective metrics.

5.1. Objective Evaluation

While not unanimously accepted, several metrics were created for the evaluation of TMOs in particular and HDR images in general. Here, we will refer to the metric introduced in (Aydin et al., 2008) and, respectively, the more recent one from (Yeganeh and Wang, 2013).

The metric from (Aydin et al., 2008), entitled “Dynamic Range (In)dependent Image Metrics” (DRIM), uses a specific model of the HVS to construct a virtual low dynamic range (LDR) image from the HDR reference and compares the contrast of the subject LDR image to the virtual one. In fact, the HVS model and the comparison can be merged together, so that the matching is between the subject LDR image and the HDR reference, skipping the virtual LDR image. The comparison takes into consideration three categories: artificial amplification of contrast, artificial loss of contrast and reversal of contrast. The metric points to pixels that are different from their standard perception according to the authors' HVS modelling and a typical monitor setting (γ = 2.2, 30 pixels per degree and a viewing distance of 0.5 m). For each test image we normalized the error image by the original image size (as described in (Ferradans et al., 2012)). The metric only assigns one type of error (the predominant one) and has two shortcomings: it heavily penalizes global amplification error (which is not so disturbing from a subjective point of view) and it merely penalizes artifacts (such as areas with completely wrong luminance), which, for a normal viewer, are extremely annoying. Thus the metric, in fact, assigns a degree of perceptualness (in the sense of how close that method is to the human contrast transfer function) to a certain HDR method.

A more robust procedure for evaluation was proposed in (Yeganeh and Wang, 2013), where, in fact, three scores are introduced:

• Structural fidelity, S, which uses the structural similarity image metric (SSIM) (Wang et al., 2004) to establish differences between the LDR image and the original HDR one, together with a bank of non-linear filters based on the human contrast sensitivity function (CSF) (Barten, 1999). The metric points to structural artifacts of the LDR image with respect to the HDR image and has a 0.7912 correlation with subjective evaluation according to Yeganeh and Wang (2013).

• Statistical naturalness, N, which scores the closeness of the image histogram to a normal distribution, which was found to match average human opinion.

• Overall quality, Q, which integrates the structural fidelity, S, and the statistical naturalness, N, by:

Q = a·S^α + (1 − a)·N^β    (9)

where a = 0.8012, α = 0.3046 and β = 0.7088, as found in (Yeganeh and Wang, 2013). The metric has a 0.818 correlation with human opinion.

The structural fidelity appreciates how close a TMO is to the CSF function, and thus is a theoretically oriented measure, while the statistical naturalness is a subjective measure and shows how close a TMO is to consumer preferences.
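Eq. (9) is straightforward to compute once S and N are available; a short Python helper (ours) with the constants above:

```python
def overall_quality(S, N, a=0.8012, alpha=0.3046, beta=0.7088):
    """Overall quality Q of Eq. (9) (Yeganeh and Wang, 2013).

    S, N : structural fidelity and statistical naturalness, on [0, 1].
    """
    return a * S ** alpha + (1 - a) * N ** beta

# Example: S = 0.82, N = 0.64 yields Q ~ 0.90.
```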

5.2. Subjective evaluation

Regarding the subjective evaluation of HDR images, Cadık et al. (2008) indicated the following criteria as being relevant: luminosity, contrast, color and detail reproduction, and the lack of image artifacts. The evaluation was performed in two steps. First we analyzed comparatively, by means of examples, the overall appearance and the existence of artifacts with respect to the five named criteria in the tested methods; next, we followed with a subjective evaluation where an external group of persons graded the images.

To perform the external evaluation, we instructed 18 students in the 20-24 year age range to examine and rank the images on their personal displays, taking into account the five named criteria. We note that the students follow computer science or engineering programmes, but are not closely involved with image processing. Thus, the subjective evaluation could be biased towards groups with technical expertise.

The testing was partially blind: the subjects were aware of the theme (i.e. HDR), but not of the source of each image. While we introduced them to a method for monitor calibration and discussed aspects of viewing angle and distance to the monitor, we did not impose these as strict requirements, since the average consumer does not follow rigorous criteria for visualization. Thus, the subjective evaluation was more related to how appealing an image is.

6. Results

6.1. Algorithm parameters

To determine the best parameters of the proposed methodwe resorted to empirical validation.

Logarithmic Model. The first choice of the proposed method is related to the specific Logarithmic Image Processing model used. While we discuss this aspect by means of the example shown in Fig. 4, top row (b-d), we have to stress that all the results follow the same trend. The LTIP model provides the best contrast, while the classical LIP model (Jourlin and Pinoli, 1987) leads to very similar results, with marginal differences like a slightly flatter sky and less contrast on the forest. The symmetrical model (Patrascu and Buzuloiu, 2001) produces over-saturated images. Given the choice between our solution and the one based on (Jourlin and Pinoli, 1987), as the differences are rather small, the choice relies solely on the perceptual motivation detailed in Section 4.

Next, given the parametric extension of the LTIP model from (Florea and Florea, 2013), we asked which is the best value for the parameter m. As shown in Fig. 4, bottom row (e-h), the best visual results are obtained for m = 1, which corresponds to the original LTIP model. Choices different from m = 1 use direct or inverse transformations that are too concave or, respectively, too convex, thus distorting the final results. Also, the formulas become increasingly complex and precise computation more expensive. Concluding, the best results are achieved with models that are closer to human perception.

Figure 4. Normally exposed image frame (a); results obtained with the standard LTIP model, m = 1 (b), the classical LIP model (c) and the symmetrical model introduced by Patrascu (d); results obtained with the parametric extension of the LTIP model for m = 0.5 (e), m = 0.75 (f), m = 1.33 (g) and m = 2 (h).

Algorithm weights. In Section 4 we nominated three categories of weights (contrast w_C, saturation w_S and well-exposedness w_E) that enter the algorithm. For the first two categories, values different from the standard ones (w_C = 1 and w_S = 1) have little impact.

The well-exposedness weight, described mainly by the central value µ of the mid-range, has a significant impact. As one can see in Fig. 5, the best result is achieved for µ = 0.37, while for larger values (µ > 0.37) the image is too bright and, respectively, for smaller ones (µ < 0.37) too dark. While µ = 0.4 produced similar results, the objective metrics reach their optimum at µ = 0.37. Changing the variance has little impact. These findings were confirmed by objective testing, as further shown in Section 6.3.

6.2. Comparison with state of the art

To test against various state of the art methods, we used the HDR irradiance map (stored as an .hdr file), which was either delivered with the images (and typically produced using the method from (Robertson et al., 1999)) or produced with some online available code².

For comparative results we considered the exposure fusion in the variants modified according to (Mertens et al., 2007) and (Zhang and Cham, 2012), using the author-released code, and the TMOs applied on the .hdr images described in (Ward et al., 1997), (Fattal et al., 2002), (Durand and Dorsey, 2002), (Drago et al., 2003), (Reinhard et al., 2005), (Krawczyk et al., 2005) and (Banterle et al., 2012), as they are the foremost such methods. The code for the TMOs is taken from the Matlab HDR Toolbox (Banterle et al., 2011) and is available online³. The implemented algorithms were optimized by the Toolbox creators to match the results reported in the original articles and for better performance; hence, we used the default values for the algorithm parameters. We note that the envisaged TMO solutions include both global operators and local adaptation. A set of examples with the results produced by all the methods is presented in Fig. 6.

²The HDR creator package is available at http://cybertron.cg.tu-berlin.de/pdci09/hdr_tonemapping/download.html
³The HDR toolbox may be retrieved from http://www.banterle.com/hdrbook/downloads/HDR_Toolbox_current.zip

Table 3. HDR image evaluation by the average values for structural fidelity (S), statistical naturalness (N) and overall quality (Q), as detailed in Section 5.1. With bold letters we marked the best result according to each category, while with italic the second one.

Method                  | S[%] | N[%]  | Q[%]
Ward et al. (1997)      | 66.9 | 14.38 | 72.7
Fattal et al. (2002)    | 59.9 | 6.4   | 61.0
Durand et al. (2002)    | 81.7 | 41.0  | 85.4
Drago et al. (2003)     | 82.3 | 50.2  | 87.0
Reinhard et al. (2005)  | 83.1 | 50.5  | 87.5
Krawczyk et al. (2005)  | 71.7 | 36.8  | 76.6
Banterle et al. (2012)  | 83.7 | 52.1  | 87.8
Mertens et al. (2007)   | 81.7 | 64.2  | 89.4
Zhang et al. (2012)     | 77.6 | 59.7  | 83.4
Proposed, µ = 0.5       | 81.0 | 39.2  | 84.5
Proposed, µ = 0.4       | 81.6 | 52.1  | 87.3
Proposed, µ = 0.37      | 81.5 | 57.4  | 88.0
Proposed, µ = 0.32      | 81.4 | 53.7  | 87.4


6.3. Objective metrics

Structure and Naturalness. We started the evaluation using the set of three objective metrics from (Yeganeh and Wang, 2013). The results obtained are presented in Table 3. The best performing version of the proposed method was for µ = 0.37.


Figure 5. Output images when various values of the mid-range (µ) in the well-exposedness weight are used: (a) µ = 0.32, (b) µ = 0.37, (c) µ = 0.4, (d) µ = 0.5. The preferred choice is µ = 0.37.

Figure 6. The resulting images obtained with state of the art HDR imaging techniques (irradiance map fusion followed by TMO, and exposure fusion): (a) proposed, (b) Pece and Kautz (2010), (c) Banterle et al. (2012), (d) Krawczyk et al. (2005), (e) Reinhard et al. (2005), (f) Drago et al. (2003), (g) Durand et al. (2002), (h) Fattal et al. (2002).


The proposed method, when compared with the various TMOs, ranked first according to overall quality and statistical naturalness, and fifth according to structural fidelity (after (Banterle et al., 2012), (Reinhard et al., 2005), (Drago et al., 2003) and (Durand and Dorsey, 2002)). Our method was penalized when compared to other TMOs because their global adaptation is closer to the standard contrast sensitivity function (CSF) (Barten, 1999). Yet we stress that some TMOs, e.g. (Krawczyk et al., 2005) or (Banterle et al., 2012), work only for calibrated images in a specific scene luminance domain.

When compared with the other exposure fusion methods, (Mertens et al., 2007) and (Zhang and Cham, 2012), it ranked second for overall quality, after (Mertens et al., 2007). This is an expected result, as the standard exposure fusion was built to match subjective opinion scores, just as the envisaged metrics were. Yet the proposed method outperformed the overall performance of the exposure fusion introduced in (Zhang and Cham, 2012). Furthermore, a more recent algorithm, namely ExpoBlend (Bruce, 2014), reports the overall quality on two images that we also used (”Memorial” and ”Lamp”). On these images, the proposed method outperformed ExpoBlend: on ”Memorial” we reach 95.5% compared to 93.2%, while on ”Lamp” we reach 90.1% compared to the 89.4% reported in (Bruce, 2014).

Furthermore, the proposed method is the closest to the standard exposure fusion result (Mertens et al., 2007), which is currently the state of the art method for consumer applications when building HDR images. This aspect is shown in Table 4, where we computed the natural logarithm of the root mean square error with respect to the image resulting from the standard exposure fusion and, respectively, the structural similarity when compared to the same image.

Perceptualness. One claim of the current paper is that the proposed method adds perceptualness to the exposure fusion. To test this, we compared our method against the standard exposure fusion (Mertens et al., 2007) using the perceptual DRIM metric from (Aydin et al., 2008). Over the considered database, the proposed method produced an average total error (the sum of the three categories) 2% smaller than the standard exposure fusion (64.5% compared to 66.8%). On individual categories, the proposed method produced a smaller amount of contrast amplification, with comparable results on loss and reversal of contrast. Thus, overall, the results confirm the claim.


Table 4. HDR image evaluation by taking the log of the root mean square error with respect to the standard exposure fusion. Best values are marked with bold letters.

Method                  | logRMSE [dB] | SSIM
Ward et al. (1997)      | 149.9        | 83.5
Fattal et al. (2002)    | 281.1        | 38.1
Durand et al. (2002)    | 160.5        | 74.4
Drago et al. (2003)     | 148.4        | 74.7
Reinhard et al. (2005)  | 148.4        | 74.8
Krawczyk et al. (2005)  | 136.9        | 66.4
Banterle et al. (2012)  | 147.9        | 75.2
Proposed                | 72.1         | 93.8

Figure 7. Examples of artifacts produced by state of the art methods ((a) Durand et al. (2002)) compared to the robustness of the proposed method (b). Close-ups point to artifact areas.

6.4. Artifacts

The HDR-specific objective metrics have the disadvantage of not properly weighting the artifacts that appear in images, while human observers are very disturbed by them. This fact was also pointed out by (Cadık et al., 2008), and to compensate we performed visual inspection to identify disturbing artifacts. The proposed method never produced any artifacts on the tested image sets. Examples of artifacts produced by state of the art methods may be seen in Fig. 7.

In direct visual inspection, when compared against the standard exposure fusion method (Mertens et al., 2007), our algorithm shows details in bright areas, while normal, real-based operations do not. This improvement is due to the closing property of the logarithmic addition and, respectively, scalar amplification. This aspect is also visible when comparing with the most robust TMO-based method, namely (Banterle et al., 2012). Examples that illustrate these facts are presented in Fig. 8.

6.5. Subjective ranking

The non-experts ranked the images produced with the proposed method, the standard exposure fusion, and the methods from (Banterle et al., 2012), (Pece and Kautz, 2010), (Drago et al., 2003) and (Reinhard et al., 2005). Regarding the results, the proposed method was selected as the best one by 10 users and the exposure fusion (Mertens et al., 2007) by 7, while the rest won a single case. Also, the second place was monopolized by the ”glossier” exposure fusion based method.

When compared to the direct exposure fusion proposed by Mertens et al. (2007), due to the perceptual nature of the proposed method, a higher percentage of the scene dynamic range is in the visible domain; direct fusion loses information in the dark-tone domain and, respectively, in the very bright parts; this is in fact the explanation for the narrow margin of our method's advantage.

7. Discussion and Conclusions

In this paper we showed that the LTIP model is compatible with the Naka-Rushton equation modelling the light absorption in the human eye and similar to the CRF of digital cameras. Upon these findings, we asserted that it is possible to treat different approaches to HDR imaging in a unified manner. The implementation as a weighted sum of input frames is characteristic of both irradiance map fusion and exposure fusion. If implemented using LTIP operations, perceptualness is added to the more popular exposure fusion. Finally, we introduced a new HDR imaging technique that adapts the standard exposure fusion to the logarithmic-type operations, leading to an algorithm which is consistent in both theoretical and practical aspects. The closing property of the LTIP operations ensures that details are visible even in areas with high luminosity, as previously shown.

The method maintains the simplicity of implementation typical of exposure fusion, since the principal difference is the redefinition of the standard operations and different parameter values. The supplemental calculus associated with the non-linearity of the LTIP operations could easily be trimmed by the use of look-up tables, as shown in (Florea and Florea, 2013).

The evaluation results re-affirmed that, in an objective assessment aiming at the naturalness and pleasantness of the image, the proposed method outperforms irradiance map fusion followed by TMOs, as the latter mimic a theoretical model which is not perfect and does not match how the normal user expects HDR images to look. The same conclusion was emphasized by the subjective evaluation, where methods developed in the image domain are preferred, as the resulting images are more ”appealing”. The method outperformed, even if by a small margin, the standard exposure fusion when evaluated with the DRIM metric, showing that it is more HVS oriented. The proposed method, having an HVS-inspired global adaptation and a ”glossy”-tuned local adaptation, ranks best, by a narrow margin, in the subjective evaluation.


Figure 8. Examples of loss of details produced by state of the art methods ((a) Banterle et al. (2012), (c) Mertens et al. (2007)) compared to the robustness of the proposed method ((b), (d)).

Acknowledgments

The authors wish to thank Martin Cadik for providing the frames for testing. We would also like to thank Tudor Iorga for running tests and providing valuable ideas. The work has been partially funded by the Sectoral Operational Programme Human Resources Development 2007-2013 of the Ministry of European Funds through the Financial Agreement POSDRU/159/1.5/S/134398.

References

Aydin, T., Mantiuk, R., Myszkowski, K. and Seidel, H. (2008). Dynamic range independent image quality assessment, ACM Transactions on Graphics (TOG), Vol. 27(3), pp. 1-10.

Banterle, F., Artusi, A., Debattista, K. and Chalmers, A. (2011). Advanced High Dynamic Range Imaging: Theory and Practice, AK Peters (CRC Press), Natick, MA, USA.

Banterle, F., Artusi, A., Sikudova, E., Edward, T., Bashford-Rogers, W., Ledda, P., Bloj, M. and Chalmers, A. (2012). Dynamic range compression by differential zone mapping based on psychophysical experiments, ACM Symposium on Applied Perception (SAP), pp. 39-46.

Barten, P. G. J. (1999). Contrast Sensitivity of the Human Eye and Its Effects on Image Quality, SPIE, Washington, DC.

Bruce, N. D. (2014). ExpoBlend: Information preserving exposure blending based on normalized log-domain entropy, Computers & Graphics 39: 12-23.

Cadık, M., Wimmer, M., Neumann, L. and Artusi, A. (2008). Evaluation of HDR tone mapping methods using essential perceptual attributes, Computers & Graphics 32(3): 330-349.

Debevec, P. and Malik, J. (1997). Recovering high dynamic range radiance maps from photographs, ACM SIGGRAPH, pp. 369-378.

Deng, G., Cahill, L. W. and Tobin, G. R. (1995). A study of logarithmic image processing model and its application to image enhancement, IEEE Transactions on Image Processing 4(4): 506-512.

Drago, F., Myszkowski, K., Annen, T. and Chiba, N. (2003). Adaptive logarithmic mapping for displaying high contrast scenes, Computer Graphics Forum 22(3): 419-426.

Durand, F. and Dorsey, J. (2002). Fast bilateral filtering for the display of high-dynamic-range images, ACM Transactions on Graphics 21(3): 257-266.

Fattal, R., Lischinski, D. and Werman, M. (2002). Gradient domain high dynamic range compression, ACM Transactions on Graphics (TOG) 21(3): 249-256.

Ferradans, S., Bertalmio, M., Provenzi, E. and Caselles, V. (2012). An analysis of visual adaptation and contrast perception for tone mapping, IEEE Transactions on Pattern Analysis and Machine Intelligence 33(10): 2002-2012.

Florea, C. and Florea, L. (2013). Parametric logarithmic type image processing for contrast based auto-focus in extreme lighting conditions, International Journal of Applied Mathematics and Computer Science 23(3): 637-648.

Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T., Cataliotti, J., Li, X., Spehar, B., Annan, V. and Economou, E. (1999). An anchoring theory of lightness perception, Psychological Review 106(4): 795-834.

Grossberg, M. D. and Nayar, S. K. (2004). Modeling the space of camera response functions, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(10): 1272-1282.

Jourlin, M. and Pinoli, J. C. (1987). Logarithmic image processing, Acta Stereologica 6: 651-656.

Krawczyk, G., Myszkowski, K. and Seidel, H.-P. (2005). Lightness perception in tone reproduction for high dynamic range images, Computer Graphics Forum, Vol. 24, pp. 635-645.

Macmillan, N. and Creelman, C. (eds) (2005). Detection Theory: A User's Guide, Lawrence Erlbaum.

Mann, S. and Mann, R. (2001). Quantigraphic imaging: Estimating the camera response and exposures from differently exposed images, Proc. of IEEE Computer Vision and Pattern Recognition, Vol. 1, pp. 842-849.

Mann, S. and Picard, R. (1995). Being 'undigital' with digital cameras: Extending dynamic range by combining differently exposed pictures, Proceedings of IS&T's 48th Annual Conference, Vol. 1, pp. 422-428.

Markovic, D. and Jukic, D. (2013). On parameter estimation in the bass model by nonlinear least squares fitting the adoption curve, International Journal of Applied Mathematics and Computer Science 23(1): 145-155.

Mertens, T., Kautz, J. and Reeth, F. V. (2007). Exposure fusion, Proceedings of Pacific Graphics, pp. 382-390.

Meylan, L., Alleysson, D. and Susstrunk, S. (2007). Model of retinal local adaptation for the tone mapping of color filter array images, Journal of the Optical Society of America A 24: 2807-2816.

Naka, K.-I. and Rushton, W. A. H. (1966). S-potentials from luminosity units in the retina of fish (Cyprinidae), The Journal of Physiology 185: 587-599.

Navarro, L., Courbebaisse, G. and Deng, G. (2013). The symmetric logarithmic image processing model, Digital Signal Processing 23(5): 1337-1343.

Panetta, K., Zhou, Y., Agaian, S. and Wharton, E. (2011). Parameterized logarithmic framework for image enhancement, IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 41(2): 460-472.

Patrascu, V. and Buzuloiu, V. (2001). Color image enhancement in the framework of logarithmic models, Proceedings of the 8th IEEE International Conference on Telecommunications, Vol. 1, Bucharest, Romania, pp. 199-204.

Pece, F. and Kautz, J. (2010). Bitmap movement detection: HDR for dynamic scenes, Proceedings of the Conference on Visual Media Production, pp. 1-8.

Pinoli, J. C. and Debayle, J. (2007). Logarithmic adaptive neighborhood image processing (LANIP): Introduction, connections to human brightness perception, and application issues, EURASIP Journal on Advances in Signal Processing 1: 114-114.

Reinhard, E., Stark, M., Shirley, P. and Ferwerda, J. (2002). Photographic tone reproduction for digital images, ACM Transactions on Graphics 21: 267-276.

Reinhard, E., Ward, G., Pattanaik, S. and Debevec, P. (2005). High Dynamic Range Imaging: Acquisition, Display and Image-Based Lighting, Morgan Kaufmann Publishers, San Francisco, California, USA.

Robertson, M., Borman, S. and Stevenson, R. (1999). Dynamic range improvement through multiple exposures, Proceedings of the International Conference on Image Processing, pp. 159-163.

Stevens, J. and Stevens, S. (1963). Brightness functions: Effects of adaptation, Journal of the Optical Society of America A 53: 375-385.

Stevens, S. (1961). To honor Fechner and repeal his law, Science 133: 80-86.

Tamburino, D., Alleysson, D., Meylan, L. and Susstrunk, S. (2008). Digital camera workflow for high dynamic range images using a model of retinal processing, Proceedings of SPIE, Vol. 6817.

Valeton, J. and van Norren, D. (1983). Light adaptation of primate cones: an analysis based on extracellular data, Vision Research 23(12): 1539-1547.

Vertan, C., Oprea, A., Florea, C. and Florea, L. (2008). A pseudo-logarithmic framework for edge detection, Advanced Concepts for Intelligent Vision Systems, Vol. 5259, pp. 637-644.

Wang, Z., Bovik, A. C., Sheikh, H. R. and Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity, IEEE Transactions on Image Processing 13(4): 600-612.

Ward, G., Rushmeier, H. and Piatko, C. (1997). A visibility matching tone reproduction operator for high dynamic range scenes, IEEE Transactions on Visualization and Computer Graphics 3: 291-306.

Yeganeh, H. and Wang, Z. (2013). Objective quality assessment of tone mapped images, IEEE Transactions on Image Processing 22(2): 657-667.

Zhang, W. and Cham, W.-K. (2012). Gradient-directed multi-exposure composition, IEEE Transactions on Image Processing 21(4): 2318-2323.

8. Biographies

Corneliu Florea, born in 1980, got his master's degree from University “Politehnica” of Bucharest in 2004 and the PhD from the same university in 2009. There, he lectures on statistical signal and image processing, and gives introductory courses in computational photography and computer vision. His research interests include non-linear image processing algorithms for digital still cameras and computer vision methods for portrait understanding. He is the author of more than 30 peer-reviewed papers and of 20 US patents and patent applications.

Constantin Vertan holds an image processing and analysis tenure at the Image Processing and Analysis Laboratory (LAPI) at the Politehnica University of Bucharest. For his contributions to image processing he was awarded the UPB ”In tempore opportuno” award (2002) and the Romanian Research Council ”In hoc signo vinces” award (2004), and was promoted to IEEE Senior Member (2008). His research interests are: general image processing and analysis, content-based image retrieval, and fuzzy and medical image processing applications. He has authored more than 50 peer-reviewed papers. He is the secretary of the Romanian IEEE Signal Processing Chapter and an associate editor at the EURASIP Journal on Image and Video Processing.

Laura Florea received her PhD in 2009 and her M.Sc. in 2004 from University “Politehnica” of Bucharest. Since 2004 she has taught classes at the same university, where she is currently a Lecturer. Her interests include image processing algorithms for digital still cameras, automatic understanding of human behavior by analysis of portrait images, medical image processing and statistical signal processing theory. Previously she worked on computer aided diagnosis for hip joint replacement. She is the author of more than 30 peer-reviewed journal and conference papers.
