Laurenz Peleman
Development of an Adequate Modelling Methodology in Sustainable Energy Production: Biodiesel Synthesis and Thin Film Solar Cells Performance
Supervisors: Prof. dr. ir. Joris Thybaut, Prof. dr. Johan Lauwaert
Counsellor: Kenneth Toch
Master's dissertation submitted in order to obtain the academic degree of
Master of Science in Chemical Engineering
Department of Chemical Engineering and Technical Chemistry
Chairman: Prof. dr. ir. Guy Marin
Department of Industrial Technology and Construction
Chairman: Prof. Marc Vanhaelst
Faculty of Engineering and Architecture
Academic year 2014-2015
Preface
As May quietly passes by and this master thesis enters the stage of completion, it is getting
hard to deny that my five-year journey as a student at Ghent University is coming to an end.
During this period, I have been able to further explore the interesting fields of science and
technology while experiencing joyful moments and encountering opportunities which I
could not possibly have imagined at the moment it all started, on that sunny late afternoon
in August 2010 when I had finally bitten the bullet and decided to enroll at the engineering
faculty. Therefore, I would like to reserve this part to acknowledge all those who have,
whether or not directly, assisted and supported me throughout this thesis year and without
whom the past five years would not have been the same.
First of all, I would like to thank prof. dr. ir. Guy Marin and all staff members of the
Laboratory for Chemical Technology for the high-quality education and the
professional environment in which I have been trained as a chemical engineer. Thanks to my
promotors, prof. dr. ir. Joris Thybaut and prof. dr. Johan Lauwaert, for this challenging and
many-sided master thesis subject, for the interesting discussion sessions and the highly
appreciated willingness to clarify whatever was unclear. Special thanks go to my coach,
Kenneth, for his thorough support throughout the entire year, his valuable advice where
needed and his enthusiasm to see ‘challenges, not problems’ when results turned out once
more to be not that satisfactory. More personally, I wish him and his young family the very
best with the nearing welcoming of their third child.
I am very grateful to dr. Samira Khelifi of the Electronics and Information Systems
department for her help and time during the solar cell experiments, and to dr. ing. Evelien
Van de Steene of the Polymer Chemistry and Biomaterials Group, for her ever willing aid
and suggestions to tackle the variety of issues that were encountered during the
measurements for the chemical case study.
Writing this master thesis would not have been as straightforward without a pleasant
ambience, for which I want to thank all fellow-students. A special mention goes to
Alexandra, Brigitte and Tine, with whom I had the honor to share the control room for one
year. The small talk was a pleasant break from the thesis efforts, as well as from the
charming sounds of tube replacements, alarm signals and pressure washers arising from
the room next-door, while the dinner event earlier this year clearly demonstrated that you all
have outstanding culinary talents as well. Furthermore, I cannot get around the
unforgettable moments I have shared with Jenoff, Jens, Jeroen, Joeri, Lysander, Thomas and
Yoshi. Although, at the moment, it is hard to estimate how strongly the outcome of our no-
one-ever-died-because-of-a-healthy-dose-of-surrealism talks during regular, and additional, coffee
breaks will truly contribute to a better world, well, at least we made an attempt.
Finally, I want to thank my friends and family for their heartwarming and continual support
and belief in whatever I try to do. More specifically, I owe nothing but gratitude to my
parents, for having given me the chance to start this study and for creating, time after time,
the optimal conditions for me to accomplish it.
Laurenz Peleman
21 May 2015
FACULTY OF ENGINEERING AND ARCHITECTURE
Department of Chemical Engineering and Technical Chemistry
Laboratory for Chemical Technology
Director: Prof. Dr. Ir. Guy B. Marin
Laboratory for Chemical Technology • Technologiepark 914, B-9052 Gent • www.lct.ugent.be
Secretariat : T +32 (0)9 33 11 756 • F +32 (0)9 33 11 759 • [email protected]
Laboratory for Chemical Technology
Declaration concerning the accessibility of the master thesis
Undersigned, Laurenz Peleman, graduated from Ghent University, academic year 2014-2015, and is author of the master thesis with title: Development of an Adequate Modelling Methodology in Sustainable Energy Production: Biodiesel Synthesis and Thin Film Solar Cells Performance.
The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use.
22 May 2015
Development of an Adequate Modelling Methodology
in Sustainable Energy Production:
Biodiesel Synthesis and Thin Film Solar Cells Performance
Laurenz Peleman
Promotors: prof. dr. ir. Joris Thybaut
prof. dr. Johan Lauwaert
Counsellor: dr. ir. Kenneth Toch
Master’s dissertation submitted in order to obtain the academic degree of Master of Science in
Chemical Engineering
Department of Chemical Engineering and Technical Chemistry
Chairman: prof. dr. ir. Guy Marin
Faculty of Engineering and Architecture
Academic year 2014-2015
Summary
A crucial step in the construction of theoretical models to describe physicochemical
phenomena is the accurate estimation of unknown model parameters. Typically, a statistical
analysis based on ordinary least squares regression is applied for this purpose, as it
has already yielded promising results in the kinetic modeling of chemical reactions.
Nevertheless, the theoretical framework of this technique requires a specific regularity of the
experimental errors which is not necessarily fulfilled for practical measurements. As a
consequence, the accuracy of the estimates is not assured. The robustness and overall
applicability of this methodology are therefore evaluated on two case studies that are rooted in
sustainable energy production: the estimation of kinetic parameters for a transesterification
reaction, relevant for biodiesel production, and the modelling of the electrical characteristics of
thin-layer solar cells. Additionally, alternative parameter estimation procedures were
investigated, for which Matlab routines were written. Their potential was benchmarked by
application to a simple model.
Keywords
Least squares regression; weighted regression; correlated errors; Bayesian estimation; MCMC
sampling; transesterification; kinetic modeling; biodiesel; CIGS solar cells; sustainable energy
Development of an Adequate Modeling
Methodology in Sustainable Energy Production:
Biodiesel Synthesis and
Thin Film Solar Cells Performance
Laurenz Peleman
Supervisors: dr. ir. K. Toch, prof. dr. ir. J.W. Thybaut, prof. dr. J. Lauwaert
Abstract: A crucial step in the construction of theoretical
models to describe physicochemical phenomena is the accurate
estimation of unknown model parameters. Typically, a statistical
analysis based on ordinary least squares regression is
applied for this purpose, as it has already yielded promising
results in the kinetic modeling of chemical reactions.
Nevertheless, the theoretical framework of this technique
requires a specific regularity of the experimental errors which is
not necessarily fulfilled for practical measurements. As a
consequence, the accuracy of the estimates is not assured. The
robustness and overall applicability of this methodology are
therefore evaluated on two case studies that are rooted in
sustainable energy production: the estimation of kinetic
parameters for a transesterification reaction, relevant for
biodiesel production, and the modelling of the electrical
characteristics of thin-layer solar cells. Additionally, alternative
parameter estimation procedures were investigated, for which
Matlab routines were written. Their potential was benchmarked
by application to a simple model.
Keywords: Least squares regression; weighted regression;
correlated errors; Bayesian estimation; MCMC sampling;
transesterification; kinetic modeling; biodiesel; CIGS solar cells;
sustainable energy
I. INTRODUCTION
Classical regression schemes are a powerful tool to obtain
optimal point values for unknown model parameters by fitting
a limited set of experimental data and allow for a complete
statistical assessment of the validity of these results [1, 2]. The
underlying theory relies on regularity conditions on the
experimental error of the data. The noise is assumed to follow a
normal distribution, centered at 0, having a constant variance
and no mutual correlation. Unfortunately, the fulfillment of
these theoretical requirements is not guaranteed in practice,
and as a consequence, the accuracy and performance of the
regression routine is in general not ensured as well.
In this work, this methodology was critically reviewed. The
evaluation is two-fold. At first, the performance of the current
methodology was assessed, by testing 1) its robustness
towards the experimentally used reactor setup for a
transesterification reaction, relevant for biodiesel production
processes, and 2) its applicability in scientific fields beyond
chemical kinetics, by the estimation of electrical properties of
thin-layer solar cells (TLSC’s) from current-voltage (I-V)
measurements. Secondly, three alternative parameter
estimation procedures are benchmarked on a simple linear
model. The first two are extensions of classical regression,
correcting it for violation of the theoretical restrictions. The
third method, the Bayesian approach, starts from a
fundamentally different view on parameter estimation and
allows theoretically for attractive features like optimized
weighing of the data and the inclusion of prior knowledge on
the parameter values.
II. CASE STUDY 1: KINETIC MODELING OF CONTINUOUS-FLOW
DATA ON TRANSESTERIFICATION
A. Assessing the robustness of the methodology
A kinetic model for the transesterification of methanol and
ethyl acetate on the ion-exchanging catalyst Lewatit K2629
has been developed recently [3]. The estimation of the kinetic
parameters was performed on data from a batch reactor setup.
Repeating the estimation based on continuous-flow data was
considered as a valuable way to evaluate its robustness.
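For reference, the model reaction in this case study is the
transesterification of ethyl acetate with methanol on the ion-
exchange resin, which yields methyl acetate and ethanol:
\( \mathrm{CH_3COOC_2H_5 + CH_3OH \rightleftharpoons CH_3COOCH_3 + C_2H_5OH} \)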
B. Procedures
The reactor setup consisted of a double-layer glass tube,
filled up with Raschig rings, with a plug of 2.0 g of freeze-
dried catalyst placed halfway. Thermal oil flows through the
outer shell of the tube, which allows for stable temperature
control of the reaction environment. The feed mixture,
consisting of methanol (MeOH), ethyl acetate (EtOAc) and n-
decane as internal standard, was sent through the setup at 5, 8
and 15 ml/min. The temperature was set at 30, 45 and 60 °C,
while feeds were made up with molar MeOH to
EtOAc ratios of 1:1, 5:1 and 10:1. Measurements at each of
these conditions hence yielded a set of 27 samples. These
were then analyzed by gas chromatography (GC) to determine their exact
composition.
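The conversion of the measured peak surfaces into concentrations
relies on the internal standard and the relative sensitivities
determined during calibration; one common form of this relation,
given here only as an illustration (the exact calibration used in
this work may differ in detail), is
\( C_i = \dfrac{1}{f_i}\,\dfrac{A_i}{A_{IS}}\,C_{IS} \)
with \( A_i \) the peak surface of component \( i \), \( A_{IS} \) and
\( C_{IS} \) the peak surface and known concentration of the internal
standard (n-decane), and \( f_i \) the relative sensitivity of
component \( i \).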
C. Results
The calculated concentrations of the different reactants and
products qualitatively showed the expected trends as a function of
the varied process variables. Conversion of the reactants
increased for increasing temperature and for a higher excess
of MeOH, while lower space time had a negative impact on
the conversion, see Figure 1. Unfortunately, the quantitative
results were more doubtful, as the calculated concentrations
were not consistent with the stoichiometry of the reaction, for
an unidentified reason.
Nevertheless, simulations were run on the collected data in
the simulation software Athena Visual Studio (AVS), using an
adapted version of the original kinetic model that suited the
continuous reactor setup.
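In essence, and assuming ideal plug-flow behavior over the
catalyst bed, such an adaptation replaces the batch balances by
steady-state balances over the catalyst mass of the generic form
\( \dfrac{\mathrm{d}F_i}{\mathrm{d}W} = R_{w,i}, \qquad F_i(W{=}0) = F_{i,0} \)
with \( F_i \) the molar flow rate of component \( i \), \( W \) the
catalyst mass and \( R_{w,i} \) the net specific rate of formation of
component \( i \); the exact formulation used in Chapter 2 may
include additional terms.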
In a first step, the kinetic parameters were set at the values as
reported in the original research. Comparison of the calculated
model predictions and the observations revealed a
considerable misfit. Subsequently, an attempt was made to
estimate new values for the kinetic parameters, i.e., those
which fit the continuous-flow data optimally. Unfortunately,
the routine did not yield significant estimates and hence, a
solid comparison with the reported values is not possible.
Figure 1 Conversion of EtOAc (%) for varying temperature,
molar feed composition and flow rates (♦ 5, ■ 8 and ▲ 15 ml/min)
III. CASE STUDY 2: MODELING OF CURRENT-VOLTAGE
CHARACTERISTICS OF THIN-LAYER SOLAR CELLS
A. Assessing the overall applicability
The proper description of the dark I-V characteristics of
TLSC’s was chosen to evaluate the ability of the current
methodology to model physical phenomena as well.
The application of photovoltaic cells as thin films on both
rigid and flexible carriers is considered as a promising route to
extend the production of solar energy. CIGS-based cells,
made of a copper-indium-gallium-selenium semiconductor,
yield attractive efficiencies, but limited knowledge about the
power loss mechanisms hinders effective scale-up of this
technology. These parasitic pathways causing current leakage
inside the cell are attributed to imperfections in the cell
structure and are therefore often highly localized.
Equivalent electronic networks are available to model this
behavior. The most comprehensive of these contains 8 parameters [4].
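To illustrate the structure of such equivalent-circuit models, the
sketch below evaluates a dark J-V curve for a simplified circuit
with a main junction diode, series resistance, an ohmic shunt and a
space-charge limited current (SCLC) term; it is not the 8-parameter
model of [4], and all parameter values are arbitrary placeholders.

% Illustrative equivalent-circuit sketch; parameter values are placeholders.
J0  = 1e-8;      % saturation current density [A/cm2]
A   = 20;        % exponent of the diode equation [1/V]
Rs  = 1.0;       % series resistance
Rsh = 1e4;       % shunt resistance
k   = 1e-6;      % SCLC prefactor [A/cm2]
m   = 2;         % SCLC exponent [-]
V   = -1:0.01:1;                 % applied voltage sweep [V]
J   = zeros(size(V));
for i = 1:numel(V)
    % the junction voltage V - J*Rs makes the current balance implicit in J
    bal = @(Jv) J0*(exp(A*(V(i) - Jv*Rs)) - 1) ...
              + (V(i) - Jv*Rs)/Rsh ...
              + k*sign(V(i) - Jv*Rs)*abs(V(i) - Jv*Rs)^m ...
              - Jv;
    J(i) = fzero(bal, 0);        % current density at this applied voltage
end
semilogy(V, abs(J)); xlabel('Voltage [V]'); ylabel('|J| [A/cm^2]');

The full model applied in this work additionally accounts for a
shunt tunneling pathway (cf. Figure 2), bringing the total
parameter count to eight.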
B. Procedures
Two pieces of 0.5 cm² surface area were cut out of the same
CIGS mother panel and fixed in the measuring device. The
voltage over and the current through the cell were measured
by two separate wires to minimize the bias of the results due
to the potential drop over the wires. The device was cooled with
liquid nitrogen to ‘freeze in’ the leakage mechanisms, which
allowed for measurements at temperatures ranging from 300
down to 110 K in steps of 10 K. The voltages over the cells
were varied between -1 and 1 V in 0.01 V increments.
C. Results
Parameter estimates were calculated for both cells at all
temperatures while both their significance and physical
meaning were ensured. This allowed for a current pathway
analysis as shown in Figure 2, giving the contributions of
each model term at each temperature. It was observed that not
all leakage effects were contributing equally in both cells,
which demonstrated the small-scale differences in parasitic
effects.
Figure 2 Contribution of the current pathways (main junction, shunt resistance, SCLC and shunt tunneling) for both cells (%)
IV. BENCHMARK ANALYSIS OF ALTERNATIVE PARAMETER
ESTIMATION PROCEDURES
A. Adapted regression schemes and Bayesian estimation
Three alternative parameter estimation techniques were
retained from a literature survey. The first performs a data-
based weighing of the residuals to correct for experimental
error with non-constant variance [5]. The weighing
factors \( w_i \), \( i = 1, \dots, n \), with \( n \) the number of experiments, are
calculated according to (1) after the introduction of the
transformation parameter \( \phi \), which has to be estimated from
the data:
\( w_i \propto |\hat{y}_i|^{2\phi - 2} \)   (1)
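A minimal sketch of such a weighting scheme, applied to the
linear test model used in the benchmark of Section IV, is given
below; the grid search over \( \phi \), the profiled log-likelihood
criterion and the synthetic data are illustrative choices of this
sketch rather than the exact routine used in this work.

% Illustrative data-based weighted regression for y = A*x + B (weights as in (1)).
rng(1);
x = linspace(1, 10, 20)';
ytrue = 5*x + 1;
y = ytrue + 0.1*ytrue.*randn(size(x));     % error variance grows with the response
X = [x, ones(size(x))];
phigrid = -1:0.1:2;  bestLL = -Inf;
for phi = phigrid
    beta = X\y;                            % ordinary least squares starting values
    for it = 1:25                          % iteratively reweighted least squares
        w = abs(X*beta).^(2*phi - 2);      % weighing factors according to (1)
        beta = lscov(X, y, w);             % weighted fit for this value of phi
    end
    r  = y - X*beta;
    LL = -0.5*numel(y)*log(sum(w.*r.^2)) + 0.5*sum(log(w));   % profiled log-likelihood
    if LL > bestLL, bestLL = LL; bestbeta = beta; bestphi = phi; end
end
fprintf('phi = %.2f, A = %.3f, B = %.3f\n', bestphi, bestbeta(1), bestbeta(2));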
Secondly, the violation of the assumption on zero-
correlation was taken into account, which is particularly
relevant for time series experiments [6]. The elected method
captures the error in a first-order autoregressive model, i.e.:
\( \varepsilon(t) = \rho\,\varepsilon(t - 1) + u(t) \)   (2)
which requires the calculation of the autocorrelation factor 𝜌.
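A compact sketch of a two-stage iterative correction of this type
(estimate \( \rho \) from the residuals, quasi-difference the data,
refit, and repeat) is shown below; the data generation and the
number of iterations are assumptions of this illustration, not
necessarily those of the routine benchmarked later.

% Illustrative two-stage iterative correction for AR(1) errors, eq. (2).
rng(2);
n = 50;  x = (1:n)';  X = [x, ones(n,1)];
e = zeros(n,1);  u = randn(n,1);  rho_true = 0.8;
for t = 2:n, e(t) = rho_true*e(t-1) + u(t); end    % AR(1) error series
y = 5*x + 1 + e;
beta = X\y;                                        % stage 1: ordinary least squares
for it = 1:10
    r   = y - X*beta;
    rho = sum(r(2:end).*r(1:end-1)) / sum(r(1:end-1).^2);   % autocorrelation estimate
    ys  = y(2:end) - rho*y(1:end-1);               % quasi-differenced observations
    Xs  = X(2:end,:) - rho*X(1:end-1,:);
    beta = Xs\ys;                                  % stage 2: refit and iterate
end
fprintf('rho = %.2f, A = %.3f, B = %.3f\n', rho, beta(1), beta(2));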
Lastly, Bayesian estimation was considered. For this
approach, all statistical inference on the parameters 𝜷 of the
model 𝑦𝑖𝑗 = 𝑓𝑗(𝒙𝒊, 𝜷) is extracted from the posterior density
function 𝑝(𝜷|𝒚):
\( p(\boldsymbol{\beta} \mid \boldsymbol{y}) \propto \prod_{i=1}^{n} \left| \boldsymbol{v}^{(i)}(\boldsymbol{\beta}) \right|^{-(m+2)/2} \)   (3)
where \( \boldsymbol{v}^{(i)}(\boldsymbol{\beta}) = \big\{ [y_{ij} - f_j(\boldsymbol{x}_i, \boldsymbol{\beta})]\,[y_{il} - f_l(\boldsymbol{x}_i, \boldsymbol{\beta})] \big\}_{j,l = 1, \dots, m} \)
and \( m \) the number of responses for each experiment. Affine
invariant MCMC sampling was used to evaluate (3) efficiently throughout parameter space [7, 8].
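As an illustration of this sampler, the sketch below implements
the basic stretch move of [8] for an ensemble of walkers on a toy
two-parameter Gaussian log-posterior; the ensemble size, the
stretch parameter a = 2 and the target itself are arbitrary
choices of this sketch.

% Illustrative affine invariant ensemble sampler (stretch moves only).
logp = @(b) -0.5*((b(1) - 5)^2/0.1 + (b(2) - 1)^2/0.05);   % toy log-posterior
d = 2;  K = 20;  nsteps = 2000;  a = 2;
walkers = repmat([5; 1], 1, K) + 0.5*randn(d, K);          % initial ensemble
chain = zeros(d, K, nsteps);
for s = 1:nsteps
    for k = 1:K
        j = randi(K-1);  if j >= k, j = j + 1; end         % pick a different walker
        z = ((a - 1)*rand + 1)^2 / a;                      % z ~ g(z) proportional to 1/sqrt(z)
        Y = walkers(:,j) + z*(walkers(:,k) - walkers(:,j));    % stretch move
        if log(rand) < (d - 1)*log(z) + logp(Y) - logp(walkers(:,k))
            walkers(:,k) = Y;                              % accept the proposal
        end
    end
    chain(:,:,s) = walkers;
end
samples = reshape(chain(:,:,501:end), d, []);              % discard burn-in
fprintf('posterior means: A = %.3f, B = %.3f\n', mean(samples(1,:)), mean(samples(2,:)));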
B. Procedures
Routines were encoded in Matlab for all techniques. To
compare their performance with that of classical regression,
parameter estimation was performed on the single-response
linear model 𝑦 = 5𝑥 + 1. An experimental data set was
simulated by adding Gaussian noise 𝜀~𝒩(0, 𝑉) to the exact
response value, with the covariance matrix 𝑉 to be specified.
This way, a specific correlation structure can be imposed
on the ‘experimental’ data in order to test, or deliberately violate,
the theoretical limits of ordinary regression. For each of the
techniques, 10 subsequent runs were performed to evaluate the
consistency of their outcome.
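A minimal sketch of such a data generator is given below; the
AR(1)-type covariance matrix is an illustrative choice, and any
valid covariance matrix V can be imposed in the same way.

% Simulating 'experimental' data for y = 5x + 1 with a prescribed error covariance V.
n = 20;  x = linspace(0, 10, n)';
sigma = 0.5;  rho = 0.7;
V = sigma^2 * rho.^abs((1:n)' - (1:n));      % covariance with serial correlation
noise = chol(V, 'lower')*randn(n, 1);        % correlated Gaussian noise ~ N(0, V)
y = 5*x + 1 + noise;                         % simulated observations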
C. Results
1) Data-based weighing
Each experimental data set was simulated by explicitly
assuming that the error variance scales with the square of the
corresponding response. In principle, optimal weighing
factors correspond to 𝜙 = 0 in equation (1) for this situation.
A comparison of the point estimates resulting from
both the weighted and the classical regression is shown in Figure
3. It follows that, for all runs, the optimal parameter values are
more accurately estimated by the weighing technique.
Nevertheless, the calculated confidence intervals on the
estimates, which are not depicted for clarity, turned out to be
almost equally broad for both methods. Hence, the weighted
estimation is in fact only as informative as the classical
regression. Moreover, the weighing routine was found to
perform considerably worse for a lower quality of the data, as
the optimal fit to highly scattered observations was poor.
Figure 3 Estimates for the model parameters A (left) and B (right)
for 10 separate runs from the weighted regression (filled markers)
and ordinary least squares (open markers). The dotted line
corresponds to the true parameter values.
2) Correction for serial correlation
To evaluate the benefits of accounting for correlation
between observations, the experimental data set was designed
to obey a specified mutual dependence. A comparison
between the results from adapted and ordinary regression is
depicted in Figure 4. Although clearly demonstrating the
considerable impact of correlation on the performance of
ordinary regression, it shows the rather limited gain in
accuracy of the adapted procedure as well. Moreover, when
simulating experimental data with slightly or highly positive
mutual correlation, the outcomes of the corrected and classical
regression were almost identical.
Figure 4 Estimates for the model parameters A (left) and B (right)
for 10 separate runs from explicit correction for serial correlation
(filled markers) and ordinary least squares (open markers). The
dotted line corresponds to the true parameter values.
3) Bayesian estimation with MCMC sampling
In contrast to regression routines, Bayesian estimation
offers statistical inference on unknown model parameters in
the form of a so-called posterior density distribution over
parameter space, rather than point estimates and
corresponding confidence intervals. The marginalized
versions of this distribution are shown in Figure 5. Clearly,
both functions attain their global maximum in the close
vicinity of the actual model parameter values. Nevertheless,
the calculated 95% probability density interval was found to
be broader than its analogue from ordinary least squares
regression.
Figure 5 Marginalized posterior density function
for the model parameters
V. CONCLUSIONS
In this thesis, the currently applied methodology to obtain
statistical inference on unknown model parameters is critically
reviewed. The quality of the data from continuous-flow
transesterification experiments, performed with the aim of
evaluating the robustness of the estimation procedure, did not
allow for a quantitative assessment. Nevertheless, the results
did qualitatively show the expected trends. On the
other hand, the application of the methodology to the
modeling of thin-layer solar cell performance proved to be
successful, opening perspectives for future research in this
field. Lastly, the benchmark analysis of the adapted
regression techniques on the rudimentary model demonstrated
that the offered gains in performance were not always
consistent. Bayesian estimation, in turn, yielded accurate, yet
less precise results than classical regression.
REFERENCES
1. Thybaut, J.W., Kinetic Modeling and Simulation - University Course. 2014, Ghent University.
2. Toch, K., An intrinsic kinetics based methodology for multi-scale modeling of chemical reactions. 2014.
3. Van de Steene, E., Kinetic study of the (trans)esterification catalyzed by gel and macroporous resins. 2014.
4. Williams, B.L., et al., Identifying parasitic current pathways in CIGS solar cells by modelling dark J–V response. Progress in Photovoltaics: Research and Applications, 2015.
5. Pritchard, D.J., J. Downie, and D.W. Bacon, Further Consideration of Heteroscedasticity in Fitting Kinetic Models. Technometrics, 1977. 19(3): p. 227-236.
6. Seber, G.A.F. and C.J. Wild, Nonlinear Regression. 2003: Wiley.
7. Stewart, W.E. and M. Caracotsios, Computer-Aided Modeling of Reactive Systems. 2008: Wiley.
8. Goodman, J. and J. Weare, Ensemble samplers with affine invariance. Communications in Applied Mathematics and Computational Science, 2010. 5(1): p. 65-80.
Table of Contents
Table of Contents i
List of Figures iii
List of Tables vi
List of Symbols vii
Chapter 1 The noble art of model building 1
1.1 Aim of the thesis 3
1.2 Outline of this work 4
1.3 References 4
Chapter 2 Case study: kinetic modeling of continuous-flow transesterification 5
2.1 Biofuels: a promising alternative to fossil feedstocks? 5
2.1.1 Catalytic pathways to biodiesel production 8
2.1.2 Outline of this chapter 11
2.2 Reactor-scale kinetic modeling of the transesterification reaction on macroporous resins 11
2.2.1 Reaction network for catalytic transesterification of methanol and ethyl acetate 11
2.2.2 Model for continuous reactor configuration 13
2.3 Experimental setup and procedures 15
2.3.1 Reactants and catalyst 15
2.3.2 Tubular reactor setup 16
2.4 Discussion of the experimental data 20
2.5 Parameter estimation and statistical analysis 23
2.6 References 26
Chapter 3 Case study: modeling of current-voltage characteristics of thin layer solar cells 27
3.1 Solar energy: towards a bright, sustainable future? 27
3.1.1 General working principle of a photovoltaic device 30
3.1.2 Thin film solar cell technology 33
3.1.3 Outline of this chapter 35
3.2 Model for the dark current-voltage characteristic of CIGS solar cells 36
3.2.1 Ideal versus non-ideal electric behavior of solar cells 36
3.2.2 Modelling parasitic current pathways and non-idealities in a CIGS heterojunction solar
device 37
3.3 Experimental setup and procedures 41
3.3.1 Overview of the statistical analysis 42
3.4 Analysis of the results 44
3.4.1 Results of the statistical assessment 44
3.4.2 Physical interpretation of the results 50
3.5 References 56
Chapter 4 Literature review on alternative parameter estimation techniques 58
4.1 Tackling the heteroscedasticity issue: towards a proper handling of heterogeneous variance
of the experimental error 58
4.1.1 Data-based weighing of the residuals 58
4.1.2 Robust estimation and outlier detection 63
4.2 Accounting for serial correlation of the error 66
4.2.1 Explicit modelling of the serial correlation of the error term 67
4.2.2 Second-order statistical regression 70
4.3 Bayesian statistical assessment 72
4.3.1 A Bayesian view on parameter estimation 72
4.3.2 Bayesian parameter estimation 74
4.3.3 Posterior density distribution for relevant scenarios in kinetic parameter estimation 77
4.3.4 Posterior inference on model parameters 82
4.3.5 Including insights by an informative prior function 87
4.4 References 89
Chapter 5 Benchmark analysis of alternative parameter estimation techniques 91
5.1 Data-based weighted regression 92
5.2 Explicit modelling of serial correlation in the data 96
5.3 Bayesian estimation by MCMC posterior sampling 100
5.4 References 108
Chapter 6 Conclusions and future work 108
Appendix 110
A.1 Matlab routines for alternative techniques to estimate model parameters 110
A.1.1 Data-based weighing 110
A.1.2 Correcting for serial correlation 111
A.1.3 Bayesian estimation with affine invariant MCMC 113
A.2 Lab journal: table of contents 116
List of Figures
Figure 1-1 Reaction network for the catalytic hydrogenation of benzene to
cyclohexane, demonstrating the high degree of complexity of kinetic model building
for even a limited number of reactants and products [1] .................................................. 2
Figure 2-1 EU28 greenhouse gas emissions by sector in 2012 ......................................... 6
Figure 2-2 Evolution of the biodiesel (dark columns) and total biofuel (light columns)
consumption and the share of biodiesel (line) in the EU transport sector [12, 13] ........ 8
Figure 2-3 General scheme for the transesterification of natural fat materials to
produce methyl esters, the actual biodiesel components, with R1,2,3 the carbon chain
of the fatty acid ........................................................................................................................ 9
Figure 2-4 Microscopic structure of gel-type (left) and macroporous resins (right) [22]
.................................................................................................................................................. 10
Figure 2-5 Reaction mechanism for the transesterification of ethyl acetate and
methanol on an ion-exchange resin .................................................................................... 12
Figure 2-6 Original catalyst (left) and the material after dialyzing (right) ................... 16
Figure 2-7 Tubular reactor setup as used during the continuous-flow experiments .. 17
Figure 2-8 Experimentally obtained conversions of ethyl acetate for varying flow
rates, temperatures and initial molar feed composition .................................................. 22
Figure 2-9 Comparison of the observed (markers) and the simulated (line) conversion
of ethyl acetate concentration based on the reported kinetic parameter values for the
studied temperature, feed composition and volumetric flow (♦ 5, ■ 8 and ▲ 15
ml/min) .................................................................................................................................... 24
Figure 3-1 Evolution of the global energy demand by fuel over the last 50 years and
outlooks for the near future [1] ........................................................................................... 28
Figure 3-2 Recent evolution of the total photovoltaic production capacity worldwide
.................................................................................................................................................. 29
Figure 3-3 Transfer of electrons (black) and holes (white) across a p-n
(hetero)junction by means of diffusion ( ) and drift ( ). Recombination
induces the formation of a neutral depletion zone around the interface ...................... 31
Figure 3-4 Current-voltage characteristic of a typical CIGS solar cell in the dark (
) and under illumination ( ) .............................................................................. 32
Figure 3-5 Schematic overview of the multilayer structure of the CIGS thin layer
solar cell .................................................................................................................................. 35
Figure 3-6 Ideal (dashed line) vs. non-ideal (solid) current-voltage profiles, showing
the different non-idealities to be explained ....................................................................... 36
Figure 3-7 Typical parallel current pathways proposed for explaining the non-ideal
behavior of real solar cells and the equivalent electric circuit [22] ................................ 38
Figure 3-8 Experimental setup in real life and schematically, showing the two-wire
configuration .......................................................................................................................... 42
Figure 3-9 Observed current-voltage characteristics of the first cell ............................. 44
Figure 3-10 Best fitting curve when considering resistance of non-ideal contacts only
.................................................................................................................................................. 45
Figure 3-11 Best fitting curve when considering non-ideal contacts and shunt
resistance for the first cell at 290 K ...................................................................................... 46
Figure 3-12 Best fitting curve when considering non-ideal contacts, shunt resistance
and space charge limited current leakage for the first cell at 290 K............................... 46
Figure 3-13 Parity diagram for the first cell at 290 K ....................................................... 48
Figure 3-14 Residual plot for the first cell at 290 K ........................................................... 49
Figure 3-15 Parameter estimates and corresponding 95% individual confidence
intervals for the first cell ....................................................................................................... 51
Figure 3-16 Parameter estimates and corresponding 95% individual confidence
intervals for the second cell ................................................................................................ 52
Figure 3-17 Contribution of the suggested leakage pathways (main junction, shunt
resistance, SCLC and shunt tunneling) for both solar cells, for different temperatures
(first cell: left, second cell: right) ....................................................................... 55
Figure 4-1 Illustrative example of residual plots for a case with constant variance
(left) and strong heteroscedastic experimental errors (right) ......................................... 59
Figure 4-2 Typical lag-plots of the residuals of an uncorrelated (left) and positively,
first-order correlated (right) data set .................................................................................. 68
Figure 4-3 Typical likelihood distribution for the parameters in a linear (left) and
non-linear model (right) with two parameters ................................................................. 76
Figure 4-4 Illustration of the convergence of a MCMC sampling routine towards an
unknown probability distribution (full line) [32] ............................................................. 86
Figure 4-5 MCMC sampling from a one-dimensional target distribution for different
variances of the normal proposal distribution: 0.05 (left), 1 (middle) and 100 (right)
Top: actual distribution (red line) and sample histogram (blocks) Bottom: trace-plot
of the sampled parameter value as function of the iteration number ........................... 87
Figure 5-1 Estimates for the model parameters A (left) and B (right) for different
values of the scaling factor σ2 for 10 subsequent runs. Filled symbols denote the
results of the weighted regression, open markers correspond to classical least squares
estimation. The dotted line corresponds to the true parameter values. Remark the
varying scaling of the y axis................................................................................................. 93
Figure 5-2 Optimal values for the transformation parameter for all three considered
scaling factors and for all runs ............................................................................................ 95
Figure 5-3 Weighted residuals for the ordinary (open symbols) and weighted least
squares estimation (filled markers) .................................................................................... 96
Figure 5-4 Estimates for the model parameters A (left) and B (right) for different
values of the autocorrelation ρ as function of the run number. Filled symbols denote
the results of the Two Stage Iterative regression, open markers correspond to
classical least squares estimation. The dotted line corresponds to the true parameter
values. Remark the varying scaling of the y axis. ............................................................. 98
Figure 5-5 Point estimates for the autocorrelation values for all runs as tinted markers
with the corresponding, true values given by the dotted lines ...................................... 99
Figure 5-6 Residual (left) and lag plot (right) for ordinary (open symbols) and
corrected regression (filled markers) for a run with ρ having a pre-specified value of -
0.99. The solid line in the lag plot is given by e_corr,i = −0.99 e_corr,i−1 .................. 100
Figure 5-7 Results of the Metropolis-Hastings procedure for c = 10, giving the
marginal probabilities from the sampled posterior (above) and the sampled values
throughout the iteration for the model parameters A (left) and B (right) .................. 102
Figure 5-8 Results of the Metropolis-Hastings procedure for c = 1, giving the
marginal probabilities from the sampled posterior (above) and the sampled values
throughout the iteration for the model parameters A (left) and B (right) .................. 102
Figure 5-9 Results from the Adaptive Metropolis algorithm, giving the marginal
probabilities from the sampled posterior (above) and the sampled values throughout
the iteration for the model parameters A (left) and B (right) ........................................ 104
Figure 5-10 Stretch move of walker Xk along the line through walker Xj, yielding
candidate sample Y. All other walkers (grey dots) do not participate. ....................... 105
Figure 5-11 Results from the affine invariant MCMC algorithm, giving the marginal
probabilities from the sampled posterior (above) and the sampled values throughout
the iteration for the model parameters A (left) and B (right) ........................................ 106
List of Tables
Table 2-1 Mass of reactants and internal standard used to make up each of the
studied feed compositions, in gram ................................................................................... 15
Table 2-2 Settings of the FID ................................................................................................ 18
Table 2-3 Retention time for the reactants, products and internal standard ................ 18
Table 2-4 Relative sensitivities of the reactants, products and internal standard........ 20
Table 2-5 Reported kinetic parameters for the transesterification of ethyl acetate and
methanol on Lewatit K2629, as originally estimated from batch-reactor data [22] ..... 23
Table 3-1 Binary correlation matrix of the model parameters for the first cell at 290 K
.................................................................................................................................................. 49
List of Symbols
Roman symbols
𝐴 exponent of the diode equation [1/V]
𝐴𝑖 peak surface from GC analysis for component 𝑖 [cm²]
𝑎𝑖 activity of component 𝑖 [mol/l]
𝐶𝑖 concentration of component 𝑖 [mol/l]
𝑑 Durbin-Watson test criterion
𝐸𝑎 activation energy [J/mol]
𝑒𝒊 residual on the 𝒊’th observation
𝐹𝑖 molar flow rate of component 𝑖 [mol/s]
𝐼 current [A]
𝐽 current density [A/cm²]
𝐽0 saturation current density [A/cm²]
𝐾𝑒𝑞 equivalent equilibrium constant of the catalyzed transesterification
𝐾𝐸𝑡𝑂𝐴𝑐 equilibrium constant for adsorption of ethyl acetate on the catalyst
𝐾𝑀𝑒𝑂𝐴𝑐 equilibrium constant for adsorption of methyl acetate on the catalyst
𝐾𝑆𝑅 equilibrium constant for the surface reaction on the catalyst
𝑘 prefactor of SCLC current density [A/cm²]
𝑘𝐵 Boltzmann constant [J/K]
𝐿(𝜷|𝒚) likelihood function of the model parameters
𝑚 number of responses
- Chapter 3: SCLC exponent [-]
𝑛 number of experiments
- Chapter 3: ideality factor of a diode [-]
𝑝 number of model parameters
𝑝(𝜷|𝒚) posterior density function of the model parameters
𝑝(𝜷) prior density function of the model parameters
𝑝(𝜷, 𝜮) prior density function of the model parameter values and the covariance
matrix of the experimental error
𝑞(𝜷) proposal distribution
𝑅 Universal gas constant [J/mol/K]
𝑅𝑠 series resistance [Ω/cm²]
𝑅𝑠ℎ shunt resistance [Ω/cm²]
𝑅𝑤,𝑖 net specific reaction rate of formation of component 𝑖 [mol/kg/s]
𝑟𝑤 specific reaction rate [mol/s/kgcat]
𝑇 absolute temperature [K]
𝑢 white noise
𝑽 covariance matrix of the experimental errors
𝑉 voltage [V]
V̇ volumetric flow rate [l/s]
𝑾 weighing matrix
𝑊 catalyst weight [g]
𝑤𝑖 weighing factor for the i’th observation
𝑋𝑖 conversion of component 𝑖 [-]
𝑥𝑖 𝑖’th process condition
𝑦𝑖𝑗 j’th observed response for the i’th experiment
ŷi model prediction for responses yi
Greek symbols
𝛽 model parameter
β̂ estimated parameter value
𝜀 experimental error
𝛾𝑖 activity coefficient of component 𝑖 [-]
𝜙 Box-Cox transformation parameter
𝜌 autocorrelation function
𝜮𝒊 covariance matrix of the responses of 𝑖’th experiment
𝜎 standard deviation
List of abbreviations
AR(1) first-order autoregressive model
CIGS copper-indium-gallium-selenium
FID flame ionization detector
MCMC Markov-Chain Monte-Carlo
PTBS Power-Transform-Both-Sides
SCLC space-charge limited current
TLSC thin-layer solar cells
Chapter 1
The noble art of model building
Ever since man’s ability to comprehend the phenomena that occur in his surroundings
reached a sufficient level of sophistication, he has striven for a thorough understanding of
the fundamental mechanisms which underlie them. This intrinsic drive for profound insights
into the complex biological, physical or chemical principles that control the environment has
triggered scientific research and, in turn, enabled the development of revolutionary
technologies. Indeed, the talent to derive quantitative cause-and-consequence relationships between
a particular outcome of the process and the conditions at which it took place, allows for
precise predictions on the behavior of the phenomenon in a broader range of situations. In
particular, the circumstances can then be manipulated, or tuned, in such a way that the
process functions exactly, at least to some extent, as it was stipulated beforehand.
At this point, the need emerges for an exact and accurate expression that fully describes the
dynamics of the process under study. This role is played by the theoretical model, which
tries to capture the effect of all relevant process conditions on its outcome, in a
mathematically closed way. Starting from fundamental principles that are believed to be
applicable to the observed system, a number of parameters will be introduced during the
construction of this relation to reflect the specific characteristics of the studied process.
Although exact values for these parameters are required to allow for an exact model-based
prediction of the process outcome for a specific set of conditions, their direct measurement
may be non-trivial or even practically infeasible, in particular when these parameters
correspond to properties of the system at a very small, micro- to even nanometer scale.
The modeling of chemical reactions is a prime example of a field of study which is confronted
almost continuously with a host of unknown and not directly measurable characteristics
during model building. Being an inevitable part of assessing the performance of a
chemical reactor, the accurate description of the chemical kinetics, i.e. the net rate at which
reactants are consumed and products are formed, is crucial both for the design of well-
behaved and stable reactor setups and for the search for innovative, high-performance
catalysts or improvements in reactor technology that allow the reactions to be operated
more safely, at milder conditions or at a lower cost. Moreover, a more efficient
operation of chemical reactors will result in a lower environmental impact of the production
process and is hence highly attractive from an ecological and sustainable point of view as
well.
Unfortunately, as chemical reactions, and especially catalytic systems, tend to occur step-
wise, i.e. by producing a series of intermediate components, which can undergo multiple
side reactions, a complex reaction network arises, see Figure 1-1. The combination of a
potentially very high number of elementary steps and their strong mutual interplay results
in a global reaction rate which is a highly complex function of both process variables and the
kinetic parameters of the different steps in the network.
Figure 1-1 Reaction network for the catalytic hydrogenation of benzene to cyclohexane,
demonstrating the high degree of complexity of kinetic model building for even a limited
number of reactants and products [1]
Moreover, certainly in catalytic systems where adsorption of some reactants on the catalyst
surface has to occur before the actual reactions take place, the strength of the interaction
between the catalyst and the chemicals and the (un)stabilizing effect of the adsorption on
certain elementary steps, have to be accounted for as well. Since a direct monitoring of these
parameters during the reaction is not possible, different techniques have to be applied to
unravel their actual value indirectly.
An often applied methodology to obtain quantitative values for the unknown kinetic
parameters in a reaction mechanism relies on the regression of the model to a finite set of
experimental data. This way, the observed values for a certain response variable, e.g. the
concentration of a component in the reactor outlet flow, are fitted optimally to the model-
based predictions for that response. Those values for the unknown kinetic parameters that
minimize the objective function of the regression, typically the sum of the squared deviations
between the predicted and measured values, are assumed to be the best possible parameter
estimates. Although intuitively attractive, the validity of this procedure is only assured
under some stringent conditions on the structure of the error on the measured response,
arising from random, unintentional irregularities during the experiment or analysis. First,
assuming that the actual value for the response at conditions 𝒙 is given by the model
function 𝑓(𝒙), then it follows for the observed response value 𝑦:
𝑦 = 𝑓(𝒙) + 𝜀 (1-1)
which introduces the experimental error 𝜀 as an additive term. Moreover, the following
requirements have to be met concerning the mutual dependence of the errors of different
experiments:
1. The random experimental error follows a normal distribution;
2. All experimental errors have an expected value of 0;
3. All experimental errors have the same variance or, stated differently, are
homoscedastic;
4. The experimental errors are uncorrelated.
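In compact notation, and purely as a restatement of the above, the classical least squares
estimate and the accompanying error assumptions can be written as
\( \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \big( y_i - f(\boldsymbol{x}_i, \boldsymbol{\beta}) \big)^2, \qquad \varepsilon_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n. \)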
When these conditions are met, a solid mathematical framework allows for the assessment
of the uncertainty on the values of the best fitting parameter estimates, together with
inference on the significance of the estimated parameters individually, i.e. whether they
differ substantially from 0 and hence contribute significantly to the model, and on the
significance, adequacy and quality of the regression as a whole [2]. This way, besides a set of
best performing values for the unknown parameters in the kinetic model, this regression
methodology allows for a full statistical analysis and evaluation of the outcome as well.
Nevertheless, the need for a set of stringent assumptions to ensure the correctness of the
regression procedure forms a considerable pitfall, as there is no a priori certainty that these
conditions will be fulfilled for a certain set of experimental data. Hence, applying the
methodology unwittingly to observations that violate the requirements will potentially yield
inaccurate results. Additionally, any statistical analysis that relies on it or any quantitative
conclusions that are drawn from it may lack any mathematical foundation as well, even though
the regression output would wrongly suggest otherwise. Logically, this introduces a considerable
degree of uncertainty on the validity of the outcome of the statistical assessment. Therefore,
the reliability of the estimation procedure becomes more a matter of sound judgement by the
researcher, or of his experience with regression techniques. Despite these disadvantages, the
classical approach towards parameter estimation as described above, finds its application in
almost all in-house kinetic modeling efforts.
1.1 Aim of the thesis
In this work, the performance of the currently used methodology towards kinetic parameter
estimation will be critically evaluated and reviewed where necessary.
The evaluation will be two-fold. First, the potential for its overall applicability in
scientific fields beside chemical kinetics will be assessed by implementing the methodology
in a different scientific field. Following the remark made
above on the significant contribution of model building to gains in sustainability of chemical
technology, the modeling of the electrical characteristics of thin-layer solar cells, a truly
physical process which is a promising candidate for the generation of green energy, has been
selected as a suitable case. Strong technological advances in this segment of photovoltaic
technology have too often been based on trial-and-error in the past. Recent breakthroughs in
the fundamental modeling of the physics of the system have resulted in an increased interest
for the theoretical approach as a helpful tool to guide further research. By applying the
modeling methodology that has already proved to be successful in the modeling of complex
chemical processes, it is hoped to contribute to this important evolution.
On the other hand, the robustness of the modeling methodology will be quantified by
comparing the estimated model parameters for a type reaction in both a batch and
continuous-flow configuration. Up to this moment, it is common practice to base kinetic
modeling on experimental data from continuous reactor setups, primarily because it is
believed that following a certain batch reaction in time is detrimental to the mutual independence
of the different data points. By testing the impact of the reactor type on the estimated
model parameters for a transesterification reaction, which is a key step in the production
process of biodiesels, the importance of this concern will be determined.
Besides this twofold evaluation of the currently used methodology, the potential of three
alternative estimation procedures will be benchmarked. By offering small adaptations to the
classical regression framework, these candidate routines try to guard the performance of the
parameter estimation against violation of the theoretical conditions on the experimental
error, as stated above. After implementation of the routines in Matlab, their added value will
be evaluated on a simple linear model, and suggestions will be made about their
incorporation into the current routines.
1.2 Outline of this work
The evaluation of the current statistical methodology by means of the two case studies is
covered in the upcoming two chapters. The outcome of the transesterification experiments
and of the subsequent assessment of the robustness of the estimation procedures will be
discussed in chapter 2. Subsequently, the overall applicability of the methodology will be
evaluated on the modeling of the thin-layer solar cells, which is presented in chapter 3. The
benchmark analysis of alternative estimation techniques is covered in the second part of the
thesis. In chapter 4 the results of the literature survey on alternative techniques towards
model parameter estimation will be presented. Special attention will be paid to the
underlying mathematical theory and the corresponding algorithms, given that current in-
house knowledge about the suggested methods is limited and in view of their final encoding
in Matlab. The results of the actual benchmark analysis from their application on a
rudimentary, linear test model are presented in Chapter 5. Finally, the overall conclusions
and outlooks for further research in this field will be the subject of chapter 6.
1.3 References
1. Tapan, B., Hydrogenation of aromatics: single-event microkinetic (SEMK) methodology and scale-up. 2012.
2. Thybaut, J.W., Kinetic Modeling and Simulation - University Course. 2014, Ghent University.
Chapter 2
Case study: kinetic modeling of
continuous-flow
transesterification
2.1 Biofuels: a promising alternative to fossil feedstocks?
A growing demand for personal transport and a gradual transition towards a globalized
economy over the last decades have increased the needs for transporting people and
goods. A vast majority of current engines is powered through the combustion of
conventional, carbon-rich fuels, i.e. gasoline, diesel and kerosene. Because of these
increasing transport requirements there is a continuous search for new sources of crude
oil. For decades, both the discovery of new oilfields and the expansion of crude
exploitation rates at existing facilities have been able to meet these growing demands.
Although the interest in technology relying on alternative power sources, with electricity
driven engines as the most promising candidate, has grown and resulted in the onset of
commercialization, the strong dependence of the transport sector on fossil feedstocks is
expected to persist for at least the upcoming years.
Over the last decade, global awareness has grown about the downside of this situation.
As for any carbon-based energy source, the combustion of fossil fuels for transport
purposes generates significant emissions of CO2 into the atmosphere, a greenhouse gas
known to contribute strongly to global climate change [1]. In 2012, transport represented
24.3 % of the total greenhouse gas emissions in the European Union, thereby being the
second biggest contributor, preceded only by energy production facilities, as
shown in Figure 2-1 [2]. In contrast to other sectors, e.g. industry, households and
agriculture, which showed an average decrease of 15% in greenhouse gas emissions from
1990 to 2012, the emissions associated with transport have seen a strong growth of 36 %,
despite improved vehicle efficiencies. Although road traffic holds the highest share
in transport emissions, the aviation and maritime sectors show the fastest growth of all.
Figure 2-1 EU28 greenhouse gas emissions by sector in 2012
The supplies of crude oil are known to be finite. Although predictions about the exact
moment of depletion remain ambiguous, even today the gradual diminishing of
high-quality and easily accessible crude oil reserves has already resulted in an increased
use of heavier and less pure feedstocks. This requires more complex cracking and
refining operations and hence higher production costs to obtain the same product quality
[3]. Logically, the future exploitation of less accessible oilfields will require higher
investment and operating costs, which will induce an additional challenge for the
profitability of the process as a whole. Finally, the distinctly unequal spread of crude oil
production facilities over the world has rendered its abundant availability highly
vulnerable to local conflicts and geopolitical motives. Multiple crises in the past have
already demonstrated that long-term stability of crude supply is never assured [4].
A growing concern about these potential issues of a continued dependence of economic
and societal needs on crude oil has driven the search for attractive alternative energy
sources, which preferably show both sustainability and long-term secured supply.
Moreover, the ratification of international protocols to counter climate change by
reducing global emissions of greenhouse gases, e.g., the Kyoto protocol (1997),
which stipulated a decrease of at least 5% by 2012 compared to the situation in 1990 [5],
and the even more stringent engagement of the European Union member states to reduce
their emissions by at least 20% by 2020 [6], has resulted in an active energy policy to
stimulate the research and development of sustainable energy carriers [7].
One of the most promising outcomes of this quest was found in the field of liquid
biofuels, and more specifically in the counterparts of currently used fuels but originating
from biomass instead of fossil feedstocks. Theoretically, such technology combines a
number of advantages that make it very attractive as a competitor to conventional fuels.
Indeed, since biofuel compounds are chemically identical to the components found in
conventional fuels, present-day engines require little or no adaptation, depending on
whether the biofuels are fed in pure state or blended with a conventional fossil fuel.
When produced from materials of vegetable origin, biofuels are intrinsically CO2
neutral, as all carbon dioxide which is released upon combustion of these compounds
was captured out of the atmosphere by photosynthesis during the growth of the original
plants. Hence, when the cycle is completed, no net CO2 has been released into the
environment. In this regard, it is important to remark that multiple steps in the
production process of biofuels are energy-consuming, so that the environmental benefit
of the biofuel pathways is, at least partly, cancelled. The high production costs are often
reported as the major impediment for this technology to expand to a fully mature
alternative to fossil fuels that is capable of being cost-effective and profitable without the
need for governmental support through favourable taxation measures and subsidy
programs [8].
A second advantage of bio-based alternatives is their relatively abundant availability: as
meeting growing demands only requires additional cultivation, biofuels offer the
prospect of a potentially inexhaustible energy source. This picture has proven to be
too optimistic, as the main feedstocks for biofuel production nowadays are found in
high-quality crops, especially sugar-rich or oily vegetables which need high-quality
agricultural land for proper growth. This way, the production of sufficient amounts of
feedstocks will potentially compete with food supply. The need for sufficient agricultural
areas may moreover be met at the expense of forested territories. As a consequence, the
unambiguous transition to such biofuels, belonging to the so-called first generation as they
are produced out of specific crops, raises important ethical questions, which makes their
feasibility as a stand-alone alternative to fossil fuels at least questionable [9].
Nevertheless, over the last decade, the consumption of conventional biofuels has seen a
distinct increase, primarily booming in the early and mid-2000’s and somewhat
stabilizing more recently, as is concluded from Figure 2-2, which shows the recent trends
for the European Union. Due to the variety of fuel products that need to be replaced,
ranging from short-chain gasoline components through kerosene to heavier diesel fractions,
different biomass conversion processes have been developed, each starting from a
specific type of feedstock. Gasoline fractions have been successfully replaced by bio-
ethanol, being produced by the direct fermentation of sugar-rich biomass, primarily
sugarcane, sugar beet and starch [10]. The production of bio-ethanol is a well-established
technology and is still the subject of extensive research efforts to further improve energy
requirements and product yields. Active governmental policies have stimulated the
market expansion over the last years with Brazil, the United States and the European
Union as the largest customers [11].
Figure 2-2 Evolution of the biodiesel (dark columns) and total biofuel (light columns)
consumption and the share of biodiesel (line) in the EU transport sector [12, 13]
However, the largest share of today's biofuel consumption in the European Union is taken by biodiesel, the actual subject of this work. Because diesel fractions consist of much longer hydrocarbon chains than gasoline, different feedstocks and production processes are required. The most relevant feedstocks mentioned in literature are fatty materials, including vegetable oils and animal greases, which are transformed into useful hydrocarbon compounds by a transesterification process, discussed more extensively in the next section. The importance of the biodiesel sector in the field of renewable fuels is depicted in Figure 2-2 as well, which shows its vast and steady dominance in the total biofuel consumption of the transport sector.
2.1.1 Catalytic pathways to biodiesel production
The most attractive biomaterial feedstocks that offer the organic building blocks to produce the relatively long hydrocarbon chains that characterize diesel fractions are fatty substances. Indeed, vegetable oils and greases of animal origin are in fact tri-esters of a glycerol molecule and three fatty acids. Depending on the type of fatty acids that are built in, a different triglyceride, or fat type, arises. Fatty acids are carboxylic acids with an aliphatic carbon chain, typically ranging between 12 and 24 carbon atoms for natural sources. Most vegetable oils consist of unsaturated fatty acid chains, so that at least one double bond is present in the chain. Animal fats, on the other hand, are typically far more saturated, which explains why the latter are found in solid form at room temperature.
The direct use of raw vegetable oils as fuels in ordinary diesel engines is hindered by an
excessive viscosity, an insufficient volatility and a limited stability of the oil, the latter
primarily due to the unsaturation of the chain [14]. Therefore, reforming of the fatty
feedstock into actual diesel-like components prior to its application for fueling is
required.
[Figure 2-2 data: yearly consumption in Mtoe (left axis) and share in energy consumption in % (right axis), 2005-2012]
The preferred industrial pathway towards biodiesel production comprises a transesterification of the fatty compounds with methanol, to obtain glycerol and a methyl ester blend, the latter being the final biodiesel product, see Figure 2-3. As for all transesterification reactions, this process requires base or acid catalysis. Common practice makes use of a homogeneous catalyst, with NaOH and KOH as the most eligible candidates. Unfortunately, the choice for a homogeneous base catalyst strongly suffers from the requirement for subsequent purification. To ensure sufficient biodiesel quality, the catalyst has to be washed out of the product blend, which produces large amounts of wastewater to be treated while simultaneously hindering complete recovery of the catalyst. It goes without saying that any separation step is strongly reflected in higher operating costs as well [15].
Moreover, raw biodiesel feedstock tends to contain a relatively high concentration – more
than 5% – of free fatty acids. These compounds will react with strong base catalysts to
form soaps, which are detrimental to the product quality and consequently require a specific separation step. Trace amounts of water in the feedstock are known to favor the
saponification as well [16]. An esterification of the fatty acids with methanol under acid
catalysis prior to the actual transesterification is an attractive pathway to overcome this
issue.
Figure 2-3 General scheme for the transesterification of natural fat materials to produce
methyl esters, the actual biodiesel components, with 𝑅1,2,3 the carbon chain of the fatty acid
Because of these strong disadvantages associated with homogeneous base catalysis, interest has grown in heterogeneous alternatives that do not show the abovementioned issues. One potential candidate that has received above-average research interest is the use of inert heterogeneous carriers, typically metallic oxides, with base or acid functionalities, which allow the need for purification and separation to be circumvented [17, 18]. It has to be noted that this reduction of the process complexity comes at the expense of a loss in reaction rate [19].
A relatively novel approach towards the heterogeneous catalysis of biodiesel production processes has emerged from the field of ion-exchange resins [20, 21]. Given some major advantages compared to homogeneous catalysis, including a straightforward separation from the product stream, only mild corrosion and a favorable long-term stability allowing for re-use, catalysis by acid cation-exchange resins has been identified as a promising alternative. Additionally, the undesired soap formation does not occur under acid catalysis.
Suitable materials consist of an inert matrix carrying acid functions which are sufficiently loosely bound to allow for interchange with ions in the surrounding solution. Although this technique shows some similarities to sorption of solutes on a solid, ion exchange is intrinsically stoichiometric with respect to the electrolyte concentration of the solution: for any ion that is attached to the material, another ion with the same charge is released. Two types of acid ion-exchange materials are distinguished: gel-type and macroporous resins. The inert carrier is formed by a network of styrene polymers and divinylbenzene crosslinkers, while acid sulfonic functionalities serve as the actual active sites. When submerged in polar solvents, part of the solvent molecules interact with the acid functions and become protonated. Due to solvation effects, more solvent molecules are attracted into the material and swelling takes place. The resulting volume expansion is crucial for the effective functioning of the catalyst as it increases the accessibility of the active sites. In contrast to gel-type materials, macroporous resins have an intrinsically open structure, see Figure 2-4, allowing for an enhanced accessibility of the active sites, even in non-polar solvents. Consequently, the catalytic performance of macroporous resins is also less dependent on the swelling degree of the material.
Figure 2-4 Microscopic structure of gel-type (left) and macroporous resins (right) [22]
The potential application of ion-exchange resins for the catalysis of transesterification
reactions has triggered the expansion of research efforts on this alternative approach, and
a proper understanding of the chemical phenomena taking place on the micro-scale has
been the object of extensive modeling efforts. Recently, a complete kinetic model has
been developed, combining the chemical interplay between the reactants and the resin,
the effect of the morphology of the catalyst and the impact of the swelling degree [22]. In
Section 2.2, the most relevant aspects of this model will be briefly discussed.
2.1.2 Outline of this chapter
The assessment of the robustness of the current in-house modeling methodology will be carried out on the basis of the kinetic model introduced below. The reported values of the corresponding kinetic parameters have been determined from experimental data for the catalytic transesterification of methanol and ethyl acetate in a batch-type reactor setup. For the actual estimation procedure, a classical nonlinear regression has been performed in the statistical software package Athena Visual Studio. By repeating these experiments in a tubular reactor setup under the same process conditions, continuous-flow data have been collected and in turn subjected to a regression procedure. By comparing the outcome of this estimation with the original values from the batch experiments, this case study will provide a quantitative assessment of both the accuracy and the robustness of the modeling methodology towards a change in reactor configuration.
In what follows, the kinetic model under study will first be discussed in the relevant detail. After a thorough description of the experimental setup, the results of the kinetic parameter estimation and the subsequent analysis of the performance of the regression methodology will be given.
2.2 Reactor-scale kinetic modeling of the
transesterification reaction on macroporous resins
2.2.1 Reaction network for catalytic transesterification of methanol and ethyl
acetate
The use of ion-exchange resins rather than the more common, “inert-carrier based” catalysts introduces some additional degrees of complexity in the kinetic modeling effort. Indeed, the coupling of a reactive component to the resin involves an exchange process instead of the purely sorptive adherence encountered in the majority of heterogeneous catalysis. Additionally, the impact of the resin’s swelling on its effectiveness to accelerate the transesterification reaction has to be properly described and translated mathematically, in order to allow for a model that combines accuracy and completeness. The kinetic model was originally developed for the transesterification of ethyl acetate with an excess of methanol, so that the latter acts as the polar solvent required for swelling of the ion-exchange resin.
Basically, the proposed kinetic model is an extension of the typical adsorption-based
behavior which has already been repeatedly described for heterogeneous catalysis.
Because of the strong interaction of the resin with the polar methanol, it is assumed that
the latter will occupy all available active sites of the catalyst upon contact with the
reaction mixture. By reacting with the acid functions, the adsorbed methanol becomes
protonated, and the resin starts to swell. The uptake of solvent will stop once
thermodynamic equilibrium is reached. The reaction mechanism is graphically
represented in Figure 2-5. It is important to remark that all steps in this mechanism, both the actual surface reaction and the different exchange reactions, are explicitly treated as reversible.
The first step in the actual transesterification reaction is the adsorption of the ester, in this
case ethyl acetate, on the active sites of the material. Since all sites are occupied by
solvent molecules, this boils down to an exchange between the protonated methanol and
the ester in the liquid phase inside the resin, depicted as step 1 in the figure. The protonated, and hence activated, ester molecule is then readily available for an Eley-Rideal type surface reaction with a free, i.e. unbound, solvent molecule from the liquid
phase, step 2. The actual transesterification step occurs in the adsorbed species, depicted
as step 3, and was found to be rate-determining. After the release of an ethanol molecule into the solvent, the desired, yet still adsorbed, methyl acetate end-product is formed, see step 4. Eventually, the transesterification is completed by a second exchange step, shown as step 5, in which the protonated methyl acetate is again substituted by a methanol molecule from the liquid phase.
Figure 2-5 Reaction mechanism for the transesterification of ethyl acetate and methanol on an ion-exchange resin
2.2.2 Model for continuous reactor configuration
Since a tubular reactor will be used to perform the continuous-flow experiments,
discussed in detail in Section 2.3.2, the setup is modelled as ideal plug flow. Therefore,
the following mass balance holds for each component at every point along the catalyst
bed in the tube:
$$\frac{dF_i}{dW} = R_{w,i} \qquad (2\text{-}1)$$
where 𝐹𝑖 is the molar flow rate of component 𝑖 through the reactor, 𝑊 is the catalyst mass and 𝑅𝑤,𝑖 is the net specific rate of formation of component 𝑖. Except for the internal standard,
which does not participate in the transesterification reaction, material balances like (2-1)
are applied to the reactants methanol and ethyl acetate, as well as for the products
ethanol and methyl acetate. Assuming a constant volumetric flow, i.e. neglecting any
density changes of the mixture due to reaction, the material balance is rewritten in terms
of the concentration of the components, as:
$$\frac{dC_i}{dW} = \frac{1}{\dot{V}}\, R_{w,i} \qquad (2\text{-}2)$$
where $\dot{V}$ is the total volumetric flow rate through the tube. When adopting these reactor
model equations, it is implicitly assumed that the operation occurs in the intrinsic kinetic
regime, in accordance with the findings on negligible transport limitations as stated in
the original research.
Since the net specific rate of formation 𝑅𝑤,𝑖 strongly depends on the reaction mechanism
underlying the reaction under study, it is in general a complex function of both the
concentrations of the reactive components and of the reaction temperature. The
derivation of a useable analytical expression for this term requires some assumptions on
the chemical kinetics of the system. As was mentioned in Section 2.2.1, the surface reaction denoted as step 3 was identified as the rate-determining step. Consequently, the rate of the global reaction equals that of the surface reaction, as the rates of all other steps in the mechanism depicted in Figure 2-5 instantaneously adjust to it according to what equilibrium requires. Additionally, the frequently applied quasi-steady state approximation is used for all adsorbed species on the catalyst surface. Combining both
assumptions results in a closed relation between the rate 𝑟𝑆𝑅 of the surface reaction and
the activities 𝑎𝑖 of the involved species:
$$r_w \equiv r_{SR,w} = \frac{k_{SR} K_{EtOAc}\left(a_{EtOAc} - \dfrac{1}{K_{eq}}\,\dfrac{a_{EtOH}\, a_{MeOAc}}{a_{MeOH}}\right)}{1 + K_{EtOAc}\,\dfrac{a_{EtOAc}}{a_{MeOH}} + K_{MeOAc}\,\dfrac{a_{MeOAc}}{a_{MeOH}}} \qquad (2\text{-}3)$$
where the rate coefficients and equilibrium constants are in accordance to Figure 2-5. The
equivalent equilibrium constant 𝐾𝑒𝑞 of the global reaction obeys:
$$K_{eq} = \frac{K_{EtOAc}\, K_{SR}}{K_{MeOAc}} \qquad (2\text{-}4)$$
The link between the activities 𝑎𝑖 of the components and their concentrations 𝐶𝑖 is given
by:
$$a_i = \gamma_i \cdot C_i \qquad (2\text{-}5)$$
where the proportionality constant 𝛾𝑖 is the activity coefficient of that particular component. Given the strongly differing polarities of the species in the reaction mixture, the original kinetic model explicitly accounted for the non-ideal thermodynamic behavior of the liquid phase by applying the UNIFAC method, which relies on group contribution theory. This allows the thermodynamic properties of a specific component in a complex mixture to be estimated by splitting up the molecule into its functional groups. Subsequently, by means of theoretical correlations available in literature, the interactions of the different groups are modelled, yielding values for the activity coefficients of the different components after summation. The specific net rate of formation of each
component is then given by:
$$R_{w,i} = \nu_i\, r_w \qquad (2\text{-}6)$$
where 𝜈𝑖 equals −1 for the reactants methanol and ethyl acetate and 1 for the ethanol and
methyl acetate products. Combining equations (2-2) to (2-6) yields a differential equation
for the composition of the mixture along the reactor. This way, for each of the reactants
and the product components, the concentration at the end of the catalyst bed can be
calculated once values for the set of unknown rate coefficients 𝑘𝑆𝑅, 𝐾𝐸𝑡𝑂𝐴𝑐 and 𝐾𝑀𝑒𝑂𝐴𝑐 are
available. The temperature dependence of the rate coefficient of the surface reaction is
given by an Arrhenius relation:
$$k_{SR} = A_{SR} \exp\!\left(-\frac{E_{a,SR}}{R\,T}\right) \qquad (2\text{-}7)$$
which is typically reparametrized for parameter estimation as:
$$k_{SR} = k_{SR,T_m} \exp\!\left(-\frac{E_{a,SR}}{R}\left(\frac{1}{T} - \frac{1}{T_m}\right)\right) \qquad (2\text{-}8)$$
with 𝑇𝑚 the average experimental temperature and 𝑘𝑆𝑅,𝑇𝑚 the rate coefficient at that temperature. For the equilibrium constants, the temperature dependence is neglected, as suggested for the original model. This way, a set of 4 unknown parameters remains, i.e. 𝑘𝑆𝑅,𝑇𝑚, 𝐸𝑎,𝑆𝑅, 𝐾𝐸𝑡𝑂𝐴𝑐 and 𝐾𝑀𝑒𝑂𝐴𝑐, which have to be estimated by an optimal fitting of the collected experimental data.
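To make the foregoing concrete, the sketch below integrates the plug-flow balance (2-2) with the rate expression (2-3) and the reparametrized Arrhenius law (2-8) in Python. It is a minimal illustration rather than the actual AVS implementation: the activity coefficients are set to unity instead of being computed with UNIFAC, the global equilibrium constant 𝐾𝑒𝑞 and the feed concentrations are illustrative values, 𝑇𝑚 is assumed to be 45 °C, and the kinetic parameters are the batch estimates listed later in Table 2-5.

# Minimal sketch of the plug-flow reactor model, equations (2-2) to (2-8).
# Assumptions: ideal liquid phase (activity coefficients of 1 instead of
# UNIFAC), an illustrative K_eq, illustrative feed concentrations and
# T_m = 45 degC; kinetic parameters taken from Table 2-5.
import numpy as np
from scipy.integrate import solve_ivp

R = 8.314        # J mol-1 K-1
Tm = 318.15      # K, assumed average experimental temperature

# (k_SR_Tm [mol kg-1 s-1], Ea_SR [J mol-1], K_EtOAc [-], K_MeOAc [-])
theta = (9.2e-3, 49.7e3, 1.15, 9.04)

def rate(C, T, theta, K_eq=1.0):
    """Net transesterification rate r_w, eq. (2-3), with a_i ~ C_i."""
    k_SR_Tm, Ea, K_EtOAc, K_MeOAc = theta
    C_MeOH, C_EtOAc, C_EtOH, C_MeOAc = C
    k_SR = k_SR_Tm * np.exp(-Ea / R * (1.0 / T - 1.0 / Tm))   # eq. (2-8)
    num = k_SR * K_EtOAc * (C_EtOAc - C_EtOH * C_MeOAc / (K_eq * C_MeOH))
    den = 1.0 + K_EtOAc * C_EtOAc / C_MeOH + K_MeOAc * C_MeOAc / C_MeOH
    return num / den

def pfr_rhs(W, C, T, V_dot, theta):
    """dC_i/dW = nu_i * r_w / V_dot, combining eqs. (2-2) and (2-6)."""
    nu = np.array([-1.0, -1.0, 1.0, 1.0])   # MeOH, EtOAc, EtOH, MeOAc
    return nu * rate(C, T, theta) / V_dot

# Example: illustrative 5:1 MeOH:EtOAc feed at 45 degC and 8 ml/min
# over a bed of 2 g of catalyst (W expressed in kg).
C_feed = np.array([20.0, 4.0, 0.0, 0.0])               # mol/l, illustrative
sol = solve_ivp(pfr_rhs, [0.0, 2.0e-3], C_feed,
                args=(318.15, 8.0e-3 / 60.0, theta))   # V_dot in l/s
X_EtOAc = (C_feed[1] - sol.y[1, -1]) / C_feed[1]
print(f"Simulated ethyl acetate conversion: {X_EtOAc:.3f}")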
2.3 Experimental setup and procedures
2.3.1 Reactants and catalyst
The transesterification reaction was run with ethyl acetate (99+% purity, Chemlab) and methanol (99.8+% purity, Chemlab) as reactants. n-Decane (99+% purity, Acros Organics) was used as internal standard. Experiments have been performed for feed mixtures with an initial methanol to ethyl acetate molar ratio of 1:1, 5:1 and 10:1. Table 2-1 lists the weights of reactants and internal standard that were added for each of these compositions. The amount of n-decane was chosen such that a mass fraction of 5% in the initial mixture was obtained.
Table 2-1 Mass of reactants and internal standard used to make up
each of the studied feed compositions, in gram
1:1 5:1 10:1
methanol 1474 2563 3557
ethyl acetate 4053 1410 978
n-decane 291 209 239
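As a quick consistency check, the masses in Table 2-1 follow directly from the target molar ratio and the 5 wt% n-decane constraint; the short snippet below verifies this for the 1:1 mixture, using standard molar masses and quoting the Table 2-1 masses as given.

# Verify the 1:1 feed of Table 2-1: a 1:1 molar ratio of methanol to
# ethyl acetate combined with a 5 wt% fraction of n-decane.
MW_MeOH, MW_EtOAc = 32.04, 88.11                 # g/mol
m_MeOH, m_EtOAc, m_dec = 1474.0, 4053.0, 291.0   # values from Table 2-1

molar_ratio = (m_MeOH / MW_MeOH) / (m_EtOAc / MW_EtOAc)
w_decane = m_dec / (m_MeOH + m_EtOAc + m_dec)

print(f"MeOH : EtOAc molar ratio = {molar_ratio:.2f}")   # ~1.00
print(f"n-decane mass fraction   = {w_decane:.3f}")      # ~0.050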
The macroporous ion-exchange resin Lewatit K2629, produced by Lanxess, was used to catalyze the transesterification reaction. It consists of small beads with a diameter of about 1 mm, composed of polystyrene with divinylbenzene crosslinkers. The active acid sites are formed by sulfonic functional groups. The resin is thermostable up to 398 K; at higher temperatures a loss in catalytic activity is observed [22].
Because of the strong interaction of the acid sites with polar components, water tends to adsorb instantaneously upon exposure of the catalyst to the air. Therefore, to ensure an optimal activity of the resin, all adsorbates have to be removed prior to any experiments by freeze-drying the beads under vacuum conditions. However, some impurities other than water may be adsorbed on the catalyst surface as well. Because of the sensitivity of the freeze-drier to acid components in particular, any polluting substance on the resin has to be removed in advance. To this end, the catalyst was first put in a dialysis setup for two days. A semipermeable membrane of 15 kDa was used as a container for the resin and suspended in a vessel of water, which was refreshed every day. The visual difference between the clean catalyst and the initial material was remarkable. As is readily seen from Figure 2-6, the color of the initial material was heterogeneous, with some of the beads being deep brown, while after the dialysis all of the catalyst had the same beige tint.
Subsequently, the catalyst was put in the freeze-drier, where the material was cooled down to -73 °C under a pressure of 0.370 bar. Under these conditions, sublimation of the adsorbed water is favored, freeing the active sites. Any water vapor formed is then captured by deposition on a cold disk at the bottom of the installation.
Figure 2-6 Original catalyst (left) and the material after dialyzing (right)
After two days, the dry catalyst was taken out of the vacuum chamber and immediately sealed from the air with parafilm, awaiting its addition to the reactor setup.
2.3.2 Tubular reactor setup
The entire reactor configuration is depicted in Figure 2-7. The reactor core consists of a double-walled glass tube, placed vertically and connected to the feed reservoir at the bottom and to a drain at the top. The inner tube is half filled with Raschig rings, with a tuft of glass wool on top which acts as the support for the catalyst plug. By means of an adjustable volumetric pump, which delivers flow rates ranging from 1 to 20 ml per minute, the reactor is then filled with methanol up to a few centimeters above the glass wool. Subsequently, the parafilm seal of the catalyst holder is broken and 2.00 grams of the resin are weighed and added into the methanol layer. After waiting for a quarter of an hour to allow for swelling of the resin, another layer of glass wool is applied on top of the catalyst plug. The tube is then further filled with Raschig rings and meticulously made leak-proof. Afterwards, the reactor is flushed with methanol.
Figure 2-7 Tubular reactor setup as used during the continuous-flow experiments
The outer jacket of the tube is connected to a heater that uses thermal oil as heating liquid. This way, the temperature of the catalyst bed can be changed as required and kept almost constant during the reaction for each experiment. Experiments were performed at 30, 45 and 60°C, well below the critical temperature for the stability of the catalyst, thereby limiting its deactivation.
Besides temperature, the feed flow rate and the feed composition are also easily manipulated process conditions. Feed mixtures with a molar ratio ranging from 1:1 over 5:1 to 10:1 have been prepared and sent through the reactor at 5, 8 and 15 ml/min. Together with the varying reactor temperature, varying these parameters creates a collection of 27 different experimental conditions, all of which have been investigated. It was observed that, after addition of the polar reactants and the apolar internal standard to the feed reservoir, a thin layer of n-decane floated on the bulk. To break this phase separation and ensure the homogeneity of the feed composition, the feed container was stirred prior to its implementation in the setup. Experiments have been performed successively to minimize time loss. To avoid contamination of subsequent measurements, the reaction had to be run at the new process conditions for a sufficiently long time before a new sample was taken from the reactor effluent. The optimal run time between two samples was determined by means of a stability test of the reactor, i.e. the reactor was fed with a mixture of methanol, ethanol and internal standard of known composition. Samples of the reactor effluent were taken every 30 minutes and injected in a GC analyzer, see below for the technical details and the settings applied during this analysis. This way, the composition of the reactor effluent was retrieved. Comparison of the results over time showed that the concentrations of reaction products stabilized after 60 minutes and therefore the time
between two samples was set at this value. Moreover, as running the reaction for more than 4 hours did not result in any change in the effluent composition, it was concluded that the catalyst stability sufficed for the duration of the experiments.
To determine the concentrations of the different components in the reactor effluent,
which will be used as the response values for the kinetic parameter estimation, the
samples were subjected to a GC analysis. The gas chromatograph is a 6850 Network GC
from Agilent Technologies. The capillary column, 1.6 m long and with an internal
diameter of 0.25 mm, is covered with a 25 µm layer of polydimethylsiloxane. Helium is
used as carrier gas with an initial flow rate of 1.00 ml/min. The actual detection takes
place in a flame ionization detector, with settings as listed in Table 2-2.
Table 2-2 Settings of the FID
Temperature 300 °C
Hydrogen flow 60 ml/min
Air flow 450 ml/min
An autosampler was used to inject 1.2 µl of the samples into the GC analyzer. Between two subsequent injections, the syringe was washed 6 times with hexane and thereafter 6 times with the reaction mixture itself. To limit the loading of the column during the analysis, the split ratio was set at 20:1. By applying a tuned temperature program for the GC analysis, it was ensured that all component peaks were well separated in time, allowing for an accurate integration afterwards. The temperature is first held at 40°C for 10 minutes, followed by a constant increase of 10°C/min for 11 minutes. The
integration of the peaks of the chromatogram was carried out by the EZChrom Elite
client/server software. The retention time for the different components in the reaction
mixture is given in Table 2-3.
Table 2-3 Retention time for the reactants, products and internal standard
Component Retention time [min]
Methanol 5.0
Ethanol 5.3
Methyl acetate 5.9
Ethyl acetate 7.3
n-decane 20.7
To calculate the actual concentrations of the different components from their integrated peak areas, calibration factors are required. Prior to the actual experiments, multiple attempts have been made to determine these values experimentally. To this end, sets of five samples with a well-known composition of all reactants, products and the internal standard were made up and subjected to a GC analysis. Normally, a linear relation should emerge when plotting the known concentrations against the collected peak areas, for each of the components. Unfortunately, for a not yet identified reason no regular plots were obtained, even after repeating this procedure a second and a third time. Therefore, the calibration method of Dietz (1967) for hydrogen flame ionization detectors was used to process the GC results from the actual experiments [23]. Following this procedure, the weight fraction 𝑤𝑖 of the 𝑖’th component in the reaction mixture is linked to the corresponding peak area 𝐴𝑖 by:
$$w_i = \frac{A_i / RS_i}{\sum_j A_j / RS_j} \qquad (2\text{-}9)$$
where 𝑅𝑆𝑖 is the relative sensitivity of the component, for which tabulated values are available for the most common chemical compounds, and where the summation in the denominator runs over all components in the mixture. The relative sensitivities for the components in the reactor effluent stream are given in Table 2-4. Once weight fractions were obtained for all components, the corresponding molar fractions 𝑥𝑖 were calculated by:
$$x_i = \frac{w_i / MW_i}{\sum_j w_j / MW_j} \qquad (2\text{-}10)$$
with 𝑀𝑊𝑖 the molar mass. From these molar fractions, the concentration of each component follows straightforwardly as:
$$C_i = \frac{x_i \, M_{tot}}{V_{tot}} = x_i \cdot C_{tot} \qquad (2\text{-}11)$$
with 𝑀𝑡𝑜𝑡 and 𝑉𝑡𝑜𝑡 the total molar amount and the total volume of the mixture, and where the total molar concentration of all components 𝐶𝑡𝑜𝑡 is yet to be determined. However, inspection of equations (2-2), (2-3) and (2-5) shows that this factor cancels out on both sides of the reactor equation and therefore does not need to be determined to allow for estimation of the kinetic parameters.
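As an illustration of the procedure of equations (2-9) and (2-10), the following sketch converts a set of integrated peak areas into mole fractions, using the relative sensitivities of Table 2-4; the peak areas themselves are purely hypothetical.

# Peak areas -> weight fractions, eq. (2-9) -> mole fractions, eq. (2-10).
MW = {"methanol": 32.04, "ethanol": 46.07, "methyl acetate": 74.08,
      "ethyl acetate": 88.11, "n-decane": 142.28}           # g/mol
RS = {"methanol": 0.23, "ethanol": 0.46, "methyl acetate": 0.20,
      "ethyl acetate": 0.38, "n-decane": 1.00}               # Table 2-4

def mole_fractions(areas):
    """Apply equations (2-9) and (2-10) to integrated GC peak areas."""
    w_raw = {i: areas[i] / RS[i] for i in areas}             # A_i / RS_i
    w = {i: v / sum(w_raw.values()) for i, v in w_raw.items()}
    x_raw = {i: w[i] / MW[i] for i in w}                     # w_i / MW_i
    return {i: v / sum(x_raw.values()) for i, v in x_raw.items()}

areas = {"methanol": 120.0, "ethanol": 15.0, "methyl acetate": 8.0,
         "ethyl acetate": 60.0, "n-decane": 90.0}            # hypothetical
print(mole_fractions(areas))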
Table 2-4 Relative sensitivities of the reactants, products and internal standard
Component Relative sensitivity [-]
Methanol 0.23
Ethanol 0.46
Methyl acetate 0.20
Ethyl acetate 0.38
n-decane 1.00
2.4 Discussion of the experimental data
Once the molar composition of the samples for all 27 experiments was unraveled from
the from the GC data, the quality of the results was assessed by inspection of the mass
balance. Indeed, as was mentioned in the procedures section, care was taken that each
feed contained 5% of n-decane. Since this compound is an inert for the transesterification
process, it is expected that, in normal operation without leakage, the reactor effluent
contains exactly the same amount of internal standard. Moreover, as the reaction is
equimolar, the calculated molar fractions for both products have to be both equal to the
reductions of the initially present reactants. Hence, two measures are at disposal to
evaluate the quality of the data.
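In practice these two checks reduce to simple comparisons of the calculated fractions, as sketched below with hypothetical values for a single sample.

# Two consistency checks on an analyzed effluent sample (hypothetical values):
# (1) the n-decane mass fraction should match the 5 wt% present in the feed,
# (2) the ethanol and methyl acetate mole fractions should be equal.
w_decane_out = 0.043                       # hypothetical GC result
x_EtOH_out, x_MeOAc_out = 0.030, 0.046     # hypothetical GC results

decane_mismatch = abs(w_decane_out - 0.05) / 0.05
product_mismatch = abs(x_EtOH_out - x_MeOAc_out) / max(x_EtOH_out, x_MeOAc_out)
print(f"n-decane mass balance mismatch : {decane_mismatch:.1%}")
print(f"product stoichiometry mismatch : {product_mismatch:.1%}")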
Surprisingly, this closure was never achieved for both measures simultaneously. The relative mismatch of the calculated mass fraction of n-decane in the effluent ranged from 0.4 to 49.6%, while the percentage deviation between the calculated molar fractions of ethanol and methyl acetate varied between 22 and 94%, which is far larger than what can be attributed to random experimental errors. Moreover, the amount of methanol, still a reactant, in the reactor effluent was found to have increased compared to the feed. Since no distinct explanation was found for these disappointing results, the interpretation of their occurrence is not straightforward.
It was noticed during the experiments, especially those at 45°C and 60°C, that some gas bubbles popped up above the heating zone. At least one component appeared to vaporize at these temperatures, although the boiling points of all reactants, products and the standard are higher. Vapor leakage was assumed to be negligible because the bubbles had to pass a relatively long, non-heated zone before they reached the sampling point. Therefore, it was expected that most of the bubbles would have condensed there, given their relatively high boiling points. It is possible that this assumption was slightly too optimistic and that this evaporation did change the composition of the effluent substantially. However, the equally present deviations at 30°C, where no bubble formation was noticed, are not explained this way.
Alternatively, the unexpected observations may arise from ineffective or insufficient reactor stabilization, i.e. part of the reaction mixture of the foregoing experiment was not entirely flushed out and mixed with the product stream of its successor. Although this surely is a plausible argument, the measurements on the reactor stabilization prior to the actual experiments were rather convincing that the chosen sampling period would suffice. Moreover, if this mixing were indeed the culprit of the poor data quality, it would be reasonable to expect the effect to be smaller for the experiments at high volumetric flow rate, which implies a faster flushing of the reactor content. Nevertheless, this trend was not observed.
A third possible explanation was sought in poor mixing of the reactants and the internal standard. Indeed, as was mentioned in the previous section, the feed reservoir had to be stirred prior to the experiments because of phase separation of the n-decane and the polar reactants. Although no demixing was observed afterwards, neither in the feed container nor in the sample vials, it is possible that a separation of the n-decane did occur. However, if this were the case, it would suffice to neglect the corresponding GC peak area and repeat the normalization in (2-9) for the reactants and products only to obtain the required reaction stoichiometry. Unfortunately, even this procedure did not yield the desired results, once more raising doubt about the plausibility of this reasoning.
A last potential culprit was found in the invalidity of the calibration factors. As mentioned above, these values had to be taken from literature, as repeated attempts to determine them experimentally had all failed for a not yet identified reason. In this perspective, it is interesting to remark that the explanation for these unsuccessful attempts to collect qualitative calibration data may well be rooted in the same issue that caused the poor outcome of the transesterification experiments.
Although the quality of the data is doubtful, inspection and comparison of the calculated compositions of the product streams do allow for assessing the impact of the different process variables on the progress of the reaction. A useful variable to describe the degree to which a reaction has occurred is the conversion of a certain component. For the 𝑖’th component, its definition reads:
$$X_i = \frac{C_{i,in} - C_{i,out}}{C_{i,in}} = \frac{x_{i,in} - x_{i,out}}{x_{i,in}} \qquad (2\text{-}12)$$
where the latter equality holds because both the volume and the total amount of moles are expected to be constant during reaction. Here, it was opted to base the calculations on ethyl acetate, which resulted in the plots given in Figure 2-8.
Figure 2-8 Experimentally obtained conversions of ethyl acetate for varying flow rates,
temperatures and initial molar feed composition
[Figure 2-8 data: three panels showing conversion of ethyl acetate [%] versus temperature [°C] for methanol : ethyl acetate ratios of 1:1, 5:1 and 10:1, each at flow rates of 5, 8 and 15 ml/min]
The observed trends all correspond to what is intuitively expected. First, for an increasing excess of methanol, the conversion of ethyl acetate is favored. Indeed, in accordance with the rate equation (2-3), an increase of the amount of methanol augments the reaction rate and hence the degree of consumption of the reactants. The biggest gain in conversion is obtained when going from a 1:1 to a 5:1 initial feed composition; for a 10:1 molar ratio the increase is less distinct. Secondly, for each feed composition, a higher temperature induces an increase in the conversion as well. Indeed, according to the Arrhenius dependence of the rate coefficient, see (2-7), the consumption of reactants is higher at increasing temperature. Finally, a negative effect of the volumetric flow rate is observed. Indeed, the space time is lower for higher flow rates, which results in a less extensive reaction and hence a reduced conversion of the reactants.
2.5 Parameter estimation and statistical analysis
Although the quality of the observed concentrations was found to be doubtful, the data were entered in the simulation software Athena Visual Studio (AVS). The code of the kinetic model for transesterification on ion-exchange resins as described by E. Van de Steene [22], originally developed for batch-reactor setups, served as a starting point for the processing of the continuous-flow data in this work. Logically, the batch reactor model equations were adapted to their plug-flow analogues, see equation (2-2), while the section correcting for non-ideal mixing of the components was adjusted by replacing the settings for the original internal standard n-octane with those for n-decane. As for the original model, the measured ethyl acetate concentrations are used to regress the model and obtain the model parameters. The Bayesian minimization as implemented in AVS was used to perform the optimal fitting.
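The AVS estimation itself is not reproduced here, but an analogous nonlinear regression can be sketched in Python; the snippet below reuses the pfr_rhs function from the reactor-model sketch in Section 2.2.2 and fits the four parameters to a list of experiments that is, for illustration, filled with hypothetical entries.

# Sketch of a least-squares estimation of (k_SR_Tm, Ea_SR, K_EtOAc, K_MeOAc)
# from continuous-flow data, analogous to the regression performed in AVS.
# Requires the pfr_rhs function defined in the earlier reactor-model sketch.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

W_cat = 2.0e-3    # kg of catalyst in the bed

def outlet_EtOAc(theta, T, V_dot, C_feed):
    """Simulate the ethyl acetate concentration at the end of the bed."""
    sol = solve_ivp(pfr_rhs, [0.0, W_cat], C_feed, args=(T, V_dot, theta))
    return sol.y[1, -1]

def residuals(theta, experiments):
    return [outlet_EtOAc(theta, T, V_dot, C_feed) - C_meas
            for (T, V_dot, C_feed, C_meas) in experiments]

# Hypothetical data: (T [K], V_dot [l/s], feed concentrations, measured C_EtOAc)
experiments = [
    (303.15,  5.0e-3 / 60.0, np.array([20.0, 4.0, 0.0, 0.0]), 3.7),
    (318.15,  8.0e-3 / 60.0, np.array([20.0, 4.0, 0.0, 0.0]), 3.5),
    (333.15,  8.0e-3 / 60.0, np.array([20.0, 4.0, 0.0, 0.0]), 3.3),
    (333.15, 15.0e-3 / 60.0, np.array([20.0, 4.0, 0.0, 0.0]), 3.6),
]

theta0 = np.array([9.2e-3, 49.7e3, 1.15, 9.04])   # batch estimates as start
fit = least_squares(residuals, theta0, args=(experiments,), bounds=(0.0, np.inf))
print("estimated parameters:", fit.x)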
In a first step, the ethyl acetate concentrations were simulated for all studied process
conditions, using the reported values of the kinetic parameters as retrieved from batch-
reactor data and listed in Table 2-5. The results of this simulation are depicted in Figure
2-9.
Table 2-5 Reported kinetic parameters for the transesterification of ethyl acetate and
methanol on Lewatit K2629, as originally estimated from batch-reactor data [22]
Kinetic parameter Estimated value
𝒌𝑺𝑹,𝑻𝒎 9.2·10⁻³ mol kg_cat⁻¹ s⁻¹
𝑬𝒂,𝑺𝑹 49.7 kJ mol⁻¹
𝑲𝑴𝒆𝑶𝑨𝒄 9.04 [-]
𝑲𝑬𝒕𝑶𝑨𝒄 1.15 [-]
Figure 2-9 Comparison of the observed (markers) and the simulated (line) conversion of ethyl acetate, based on the reported kinetic parameter values, for the studied temperatures, feed compositions and volumetric flow rates (♦ 5, ■ 8 and ▲ 15 ml/min)
It is striking that the available kinetic parameters yield very poor predictions for the observed conversion. The increase of the observed conversion at higher temperatures is considerably less steep than the model-predicted trend. Moreover, using the original values of the kinetic parameters, the simulated conversion profiles show a much larger mutual spread for varying space time than the experimental values. Therefore, an attempt was made to determine the parameter values which fit the continuous-flow data optimally. Unfortunately, the statistical software was not able to yield statistically significant estimates for the parameters. Nevertheless, it was noticed that most of the misfit disappeared for a higher value of the rate coefficient 𝑘𝑆𝑅 of the surface reaction, obtained either by increasing the pre-exponential factor or by lowering the activation energy.
To conclude, experiments have been performed on a continuous-flow transesterification
setup. The observed composition of the reactor effluent was found to be of doubtful quality,
as the mass balance did not close and stoichiometric requirements were not met. Some
potential explanations have been formulated, although the precise reason for the deviating
results has not been identified. The collected data have been put into the kinetic model, using
the reported kinetic parameter values, which were originally estimated based on batch data.
A strong discrepancy was revealed between the model predictions and the actual
observations on the conversion of the reactant ethyl acetate. Unfortunately, due to the
uncertain reliability of the experimental data, it was not possible to draw conclusions on the
accuracy of the kinetic parameter values from this misfit.
[Figure 2-9 data: three panels showing conversion versus temperature (30, 45, 60 °C) for feed ratios of 10:1, 5:1 and 1:1]
2.6 References
1. Nigam, P.S. and A. Singh, Production of liquid biofuels from renewable resources. Progress in Energy and Combustion Science, 2011. 37(1): p. 52-68.
2. European Commission. Reducing emissions from transport. 25 March 2015; Available from: http://ec.europa.eu/clima/policies/transport/index_en.htm.
3. Van Geem, K.M., Sustainable Chemical Production Processes - University Course. 2014, Ghent University.
4. Oberling, D.F., et al., Investments of oil majors in liquid biofuels: The role of diversification, integration and technological lock-ins. Biomass and Bioenergy, 2012. 46(0): p. 270-281.
5. UN, Kyoto protocol to the united nations framework convention on climate change. 1998, United Nations New York, NY.
6. Directive 2009/28/EC of the European Parliament and of the Council of 23 April 2009 on the promotion of the use of energy from renewable sources and amending and subsequently repealing Directives 2001/77/EC and 2003/30/EC. 2009, European Parliament, Council of the European Union.
7. Rajagopal, D. and D. Zilberman, Review of Environmental, Economic and Policy Aspects of Biofuels. 2007: World Bank Publications.
8. Demirbas, A., Political, economic and environmental impacts of biofuels: A review. Applied Energy, 2009. 86, Supplement 1(0): p. S108-S117.
9. Meher, L.C., D. Vidya Sagar, and S.N. Naik, Technical aspects of biodiesel production by transesterification—a review. Renewable and Sustainable Energy Reviews, 2006. 10(3): p. 248-268.
10. Balat, M. and H. Balat, Recent trends in global production and utilization of bio-ethanol fuel. Applied Energy, 2009. 86(11): p. 2273-2282.
11. Mussatto, S.I., et al., Technological trends, global market, and challenges of bio-ethanol production. Biotechnology Advances, 2010. 28(6): p. 817-830.
12. EU energy in figures - Statistical pocketbook. Collected from editions 2005-2013. Directorate-General for Energy - European Commission.
13. Eurostat, Share of renewable energy in fuel consumption of transport.
14. Demirbaş, A., Biodiesel fuels from vegetable oils via catalytic and non-catalytic supercritical alcohol transesterifications and other methods: a survey. Energy Conversion and Management, 2003. 44(13): p. 2093-2109.
15. Di Serio, M., et al., From homogeneous to heterogeneous catalysts in biodiesel production. Industrial & Engineering Chemistry Research, 2007. 46(20): p. 6379-6384.
16. Ma, F. and M.A. Hanna, Biodiesel production: a review1. Bioresource Technology, 1999. 70(1): p. 1-15.
17. Bournay, L., et al., New heterogeneous process for biodiesel production: A way to improve the quality and the value of the crude glycerin produced by biodiesel plants. Catalysis Today, 2005. 106(1–4): p. 190-192.
18. Di Serio, M., et al., Heterogeneous Catalysts for Biodiesel Production. Energy & Fuels, 2008. 22(1): p. 207-217.
19. Semwal, S., et al., Biodiesel production using heterogeneous catalysts. Bioresource Technology, 2011. 102(3): p. 2151-2161.
20. Tesser, R., et al., Kinetics and modeling of fatty acids esterification on acid exchange resins. Chemical Engineering Journal, 2010. 157(2–3): p. 539-550.
21. Russbueldt, B.M.E. and W.F. Hoelderich, New sulfonic acid ion-exchange resins for the preesterification of different oils and fats with high content of free fatty acids. Applied Catalysis A: General, 2009. 362(1–2): p. 47-57.
22. Van de Steene, E., Kinetic study of the (trans)esterification catalyzed by gel and macroporous resins. 2014.
23. Dietz, W.A., Response Factors for Gas Chromatographic Analyses. Journal of Chromatographic Science, 1967. 5(2): p. 68-71.
Chapter 3
Case study: modeling of current-
voltage characteristics of thin
layer solar cells
3.1 Solar energy: towards a bright, sustainable future?
Over the last decades, the rapid expansion of global industrial capacity to meet the growing demand for industrial products and the evolution towards a highly technology-driven society due to major advances in research and product development have induced a massive increase in the worldwide consumption of energy. Compared to 1965, the annual need for primary energy has tripled, and a further rise in demand at a similar pace is to be expected for the near future, as is shown in Figure 3-1. Up to now, the vast majority of this energy supply is still covered by classical, carbon-based fuels, as more than 80% of the total amount of energy is produced by the combustion of oil, coal and natural gas. Over the past couple of years, the detrimental impact of this carbon-based way of energy generation on climate change has been demonstrated extensively and has created an urgent need for new sources of energy that form sustainable and environmentally friendly alternatives. Additionally, by establishing sufficient production capacity for these renewable energy sources, the dependence of the energy supply on the capricious price evolution of classical energy sources is lowered.
Figure 3-1 Evolution of the global energy demand by fuel over the last 50 years
and outlooks for the near future [1]
Besides the kinetic energy of wind and the hydrodynamic power of falling water, a great opportunity was thereby found in the valorization of the solar energy that is freely available in our surroundings. On a yearly basis, the equivalent of about 10000 times the global energy demand is incident on the earth’s surface as sunlight, yet today only a marginal part of this gargantuan amount of energy is effectively captured [2]. Due to this significant potential of solar energy to supply growing energy demands, the interest in valorization techniques has grown. Typically, three technologies are available to capture the radiant energy of the sun and transform it into a form that is more easily handled, going from heat recuperated by boiler systems, over power generated by means of solar-generated steam, to electricity produced by photovoltaic panels consisting of semiconductor, mostly silicon-rich, materials [3], the latter being the focus of this work. Moreover, due to the intrinsically renewable character of solar energy, governmental action has been undertaken to actively support the expansion of sufficient production capacity to increase the share taken by renewable sources in the energy envelope.
Under the impulse of the binding engagement of the European Commission to raise the share of energy production taken by renewable sources to at least 20% by 2020, in order to gradually reduce Europe’s dependence on conventional energy carriers and lower the negative impact of its energy supply on the environment, the development of photovoltaic technology and the extension of the production capacity were strongly stimulated by an active energy policy. This way, over the last decade Europe has become by far the leading region with regard to installed photovoltaic production capacity, as is depicted in Figure 3-2 [4]. The share of photovoltaic electricity in the entire energy production from renewable sources – wind, solar and water – has increased from 0.1% in 2000 to 9.2% in 2013 [1, 5].
Figure 3-2 Recent evolution of the total photovoltaic production capacity worldwide
At present, photovoltaic technology is still largely dominated by silicon-based devices: classical designs relying on crystalline or polycrystalline silicon wafers account for more than 80% of the photovoltaic market [6]. These solar panels consist of slices of pure, i.e. uncontaminated, silicon with a typical thickness of about 400 µm and a surface area of 100 cm², which form the core of the unit as the photocurrent generation takes place in them. By a series connection of typically 28 to 36 of these cells into modules, dc output voltages of ca. 10 V are realized, which suffices for proper device operation; a further increase in generated voltage and current is obtained by combining multiple modules in series and in parallel. To collect the generated photocurrent, a grid of metallic contacts is imprinted on the cell surface. Polycrystalline silicon wafers are easier to produce than the crystalline variant, at the expense of lower cell performance due to the presence of lattice defects in the semiconductor material. Typical commercial device efficiencies range from 15 to 22%, while lab efficiencies exceeding 40% render prospects for even higher performance in the near future.
Nevertheless, the high initial investments still associated with photovoltaics result in long payback times, which form a major impediment for this technology to become a serious challenger of conventional energy sources. Moreover, rising silicon prices in the past, caused by a market shortage of raw silicon, have demonstrated the strong sensitivity of
the economic feasibility to feedstock prices, urging a reduction of the material use and the synthesis of performant alternatives competitive with silicon [7]. Research has
therefore resulted in drastic increases in the efficiency of classical solar panels based on
silicon, but also in the development of new and promising semiconductor materials as potential replacements for silicon and in the realization of thin-layer solar cells (TLSC), enabling the production of flexible photovoltaic modules. For a long time, knowledge about the underlying physical mechanisms was very limited and the development and production of thin film solar technology relied almost entirely on trial and error and experience rather than on purposeful design, which logically strongly impeded advances in this field. Recent insights into the nature of the microscopic phenomena have led to promising models for the behavior of TLSC technology; the assessment of their overall applicability and statistical relevance is the focus of this chapter. Before taking a closer look at the proposed models for TLSC’s, the current state of the technology itself will be briefly touched upon, however not without first giving a short introduction on the physics behind solar cells.
3.1.1 General working principle of a photovoltaic device
As for all photovoltaic devices, the working principle of CIGS-based thin-layer solar cells relies on the photovoltaic effect which many semiconductor materials show when illuminated by sufficiently energetic light. Part of the incident light is absorbed by the material and the associated photonic energy is transferred to the solid’s electrons. If this energy suffices to overcome the band gap between the valence and the conduction band of the material, the electron is released from its binding orbital and gains sufficient mobility to move through the semiconductor. To prevent the excited electron from dissipating its gained energy as heat and retaking its position in the electronic shell, a potential has to be present over the material, which sweeps the generated, mobile photo-electrons towards the positive terminal where they are collected and fed to the grid. The covalent bond that is broken upon freeing of the electron is now available to bind other electrons in the lattice structure, so that, simultaneously with the electron transport, a positive hole moves through the material [8].
To ensure the separation of the electron-hole pairs, the photovoltaic device thus has to be provided with a built-in driving force. Typical designs rely on the junction principle, a well-known characteristic of the interface between two semiconductor materials that are specifically designed to show a different electronic behavior through the purposeful addition of impurities. Incorporation of these so-called dopant compounds, elements that have a higher or lower number of valence electrons compared to the pure semiconductor and are added in trace amounts to the material, indeed increases the concentration of charge carriers in, and hence the conductivity of, the semiconductor. Impurities of the first type have an excess of valence electrons compared to the number of bonds formed in the semiconductor lattice. If the energy of this additional electron exceeds the Fermi level of the pure, often named intrinsic, material, the Fermi level will increase upon doping, which in turn results in a higher electron concentration at equilibrium. Because of the excess of negative charge carriers, this type of doped material is denoted as n-type. Analogously, the addition of compounds with a relative shortage of valence electrons yields p-type semiconductors, which have a higher hole concentration at equilibrium and hence an excess of positive charge carriers compared to the pure material [9].
If, during the crystallization of the semiconductor material, a layer of p-type semiconductor is allowed to grow on an n-type base, a p-n junction is formed at the
interface. In case the n- and p-type semiconductors are intrinsically different materials,
the interface is called a heterojunction. Upon contact of the layers, a diffusional
interchange of charge carriers will start spontaneously due to the concentration gradients
across the junction; by recombination of electron-hole pairs in this zone of intense charge
interchange, a neutral, almost non-conductive depletion zone starts to form around the
interface. Simultaneously, the transfer of the charge carriers across the interface causes
the build-up of a fixed opposite charge on both sides of the junction, which opposes the
continued flow of majority carriers through the junction, see Figure 3-3 [10]. At
equilibrium, the diffusional transport of majority carriers across the depletion zone,
opposite to the electric field over the depletion zone, is balanced by the drift transfer of
minority carriers along the field.
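Although not elaborated in this text, the standard textbook treatment of a homojunction quantifies this balance through the built-in potential that develops across the depletion zone; with the conventional symbols $N_A$ and $N_D$ for the acceptor and donor concentrations, $n_i$ for the intrinsic carrier concentration, $k_B$ for Boltzmann's constant and $q$ for the elementary charge, it reads:
$$V_{bi} = \frac{k_B T}{q}\,\ln\!\left(\frac{N_A N_D}{n_i^2}\right)$$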
Figure 3-3 Transfer of electrons (black) and holes (white) across a p-n (hetero)junction by means of diffusion and drift. Recombination induces the formation of a neutral depletion zone around the interface
Hence, though strongly hindered, transport of the majority carriers over the depletion zone is not completely absent. When an external forward voltage, often referred to as bias voltage, is applied over this p-n junction, which means that the positive terminal is attached to the p-region and the negative terminal to the n-region, the built-in electric field is, at least partially,
countered, which in turn results in a narrowing of the depletion zone. Majority carriers will overcome this barrier more easily, and a higher current flows through the junction. Analogously, a reverse bias reinforces the internal electric field, which broadens the depletion zone and strongly hinders the flow of current. This rectifying characteristic of the p-n junction depending on the current direction – conducting for positive and almost completely blocking for negative voltages – resembles the working principle of a diode. Hence, in the absence of light, the current-voltage behavior of an ideal photovoltaic cell equals that of a single diode.
When sufficiently energetic light is incident on the device and photo-emission takes
place, a photocurrent is generated which moves oppositely to the “dark” current
resulting from the charge carrier diffusion described above; the total current flowing
through the cell is then equal to the superposition of both, as shown in Figure 3-4. It
follows that, to produce a maximal photocurrent, the dark current has to be minimized,
being a major challenge in the design step for suitable photovoltaic devices.
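In the simplest, ideal description this superposition can be written down explicitly; using the standard single-diode notation (the symbols below are conventional and not introduced in the original text), with $J_0$ the dark saturation current density, $A$ the diode ideality factor and $J_{ph}$ the photogenerated current density, the delivered current density reads:
$$J(V) = J_{ph} - J_0\left[\exp\!\left(\frac{qV}{A\,k_B T}\right) - 1\right]$$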
Figure 3-4 Current-voltage characteristic of a typical CIGS solar cell in the dark and under illumination
Nevertheless, real-life photovoltaic devices do show strong deviations from ideal behavior that highly affect the current flow through the cell. As will be explained in the next paragraph, TLSC’s require a multilayered structure which inevitably introduces non-idealities, especially in the form of imperfect interlayer contacts and resistive losses. Non-uniformities inside the semiconductor material or originating from the production process itself are additional sources of these so-called parasitic current pathways inside the cell. As these phenomena all have a significant impact on the current-voltage characteristics of the solar cell, a proper design of the devices requires that all these processes are accurately identified and adequately modelled.
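As a hedged illustration of how such parasitic pathways are commonly represented, the sketch below evaluates a one-diode equivalent circuit extended with a series resistance Rs and a shunt resistance Rsh; this is a generic textbook model rather than the model assessed later in this chapter, and all numerical values are purely illustrative.

# Sketch of a one-diode equivalent-circuit model with parasitic series (Rs)
# and shunt (Rsh) resistances; all parameter values are illustrative only.
import numpy as np
from scipy.optimize import brentq

q, kB = 1.602e-19, 1.381e-23   # C, J/K

def current_density(V, Jph=0.030, J0=1e-9, A=1.5, Rs=1.0, Rsh=500.0, T=300.0):
    """Solve J = Jph - J0*(exp(q(V+J*Rs)/(A*kB*T)) - 1) - (V+J*Rs)/Rsh
    for the delivered current density J [A/cm^2] at terminal voltage V [V];
    Rs and Rsh are expressed in ohm cm^2."""
    Vt = A * kB * T / q
    f = lambda J: (Jph - J0 * (np.exp((V + J * Rs) / Vt) - 1.0)
                   - (V + J * Rs) / Rsh - J)
    return brentq(f, -1.0, Jph + 0.01)   # f is monotonically decreasing in J

# Illustrative I-V sweep: the delivered current approaches Jph at short
# circuit and drops to zero near the open-circuit voltage.
for V in np.linspace(0.0, 0.7, 8):
    print(f"V = {V:.2f} V   J = {1e3 * current_density(V):.2f} mA/cm^2")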
3.1.2 Thin film solar cell technology
The development of promising alternative semiconductor materials, which show a higher absorbance of the incident light, has been the major breakthrough stimulating the expansion of TLSC’s. Allowing modules of only 10 µm thickness to be produced – 10 times smaller than conventional technology – these materials enable photovoltaic modules to become sufficiently thin to flex, making them suited for deposition on either flexible or stiff substrates, like polymer and metallic foils or glass, respectively. Additionally, this innovation strongly reduces the material costs associated with photovoltaic installations, while its faster production process, relying on deposition and sputtering rather than the slow crystal growth required for silicon-based designs, lowers the manufacturing costs as well [11]. Literature reports three potential candidates for this purpose, being amorphous silicon, cadmium telluride and a complex semiconductor containing copper, indium, gallium and selenium, therefore denoted CIGS. The latter is considered to be the most promising candidate, showing lab-scale efficiencies of over 20% combined with excellent stability and acceptable production costs [12]. Moreover, future outlooks for this technology are highly optimistic, as efficiencies exceeding 30% have come into view [13].
Nevertheless, efficiencies of actual CIGS-based modules have long been intolerably small
compared to the conventional silicon-based photovoltaic technology: module efficiencies
on monocrystalline silicon based already showed efficiencies of over 22.7 %, highly
outperforming the CIGS cells that offered only 12% [14]. One major reason for this strong
difference was found in the initial production process, which required elevated
temperatures and brought down the stability of the cell. Moreover, up-scaling of the
production process from laboratory-scale to industrial amounts were associated with the
introduction of non-uniformities in the multilayer structure of the cell, while a lack of
knowledge about the impact of these non-idealities hindered the correct addressing and
tackling of these problems. Recent advances in the production process and growing
understanding of the underlying mechanisms inside thin-layer cells have resulted in a
significant rise in performance, up to 15.7% in 2014, while maximal efficiencies of
conventional technology have stagnated, which improves its competitiveness towards
established technologies [15]. Moreover, future outlooks for this technology are strongly optimistic, attributing to it a high potential to outperform the currently used silicon-based photovoltaic “workhorses”.
3.1.2.1 Structure of CIGS-based thin-layer solar cells
As an alternative to these conventional devices, the light absorption and photo-electron generation are located in a thin layer of a quaternary semiconductor containing copper, indium, gallium and selenium, with composition CuInₓGa₁₋ₓSe₂, where x is a value between 0 and 1. Having a tetragonal chalcopyrite crystal structure with a slight copper
deficiency, the material is a p-type doped semiconductor, while its high absorption coefficient – literature reports values exceeding 10⁵ cm⁻¹ for λ < 600 nm – gives CIGS-based solar cells the highest efficiency among all thin-layer solar cells. Moreover, as the
composition of CIGS crystals spans a range of materials bounded by pure copper-
indium-selenium or CIS and copper-gallium-selenium, the band gap is tunable between
1.04 and 1.65 eV, depending on the value of 𝑥 [16, 17]. The gain in band gap arising from
the partial substitution of indium by gallium increases the open-circuit voltage
performance of the heterojunction in the cell [18].
The general structure of a CIGS-based thin layer photovoltaic cell shows a multilayer as depicted in Figure 3-5 [19]. The solar cell is deposited on a substrate layer, which consists of soda-lime glass, polymer or metallic foil; the latter two show the flexibility to allow for the coverage of flexible surfaces while being reported as slightly less
performant than the glass carrier. It goes without saying that this variety of allowed
substrate materials strongly contributes to the overall applicability of the TLSC
technology.
On top of the carrier, a conductive, metallic layer is deposited which serves as the back
contact of the cell. Often molybdenum is used, as it combines a low purchase cost with
low-resistance contacts with the substrate by the formation of 𝑀𝑜𝑆𝑒2; moreover,
diffusivity of the metal into the upper layers is low, whilst thermal stability is ensured
due to the high melting point.
On this Mo layer, the light absorbing CIGS crystals are grown; typical deposition
processes often mentioned in literature include co-evaporation and vacuum selenization.
The n-type counterpart of the junction is formed by a layer of CdS, acting as a buffer
layer as well: due to its adequate coverage of the CIGS crystals, it functions as a
passivation and protection of the absorber material during the sputtering process to
deposit the ZnO front contact. Due to its high band gap of 2.4 eV, photo-absorption is
minimized and a maximal transmittance of the incident light to the CIGS layer is assured,
explaining its naming as a window layer. Moreover, CdS has been reported to effectively
remove oxides and elementary metallic particles and to diffuse partially into the CIGS
material, thereby enhancing the interface characteristics [20]. The strong toxicity of
cadmium compounds has driven the research towards other semiconductors, yet up to
now CdS remains the most frequently industrially applied.
The front contact typically shows a bilayer structure of stacked intrinsic and n-doped
ZnO. The latter maximizes current transport due to its high conductivity and low
resistivity, while the high-resistance intrinsic layer minimizes undesired leakage currents
due to pinholes in the CdS window [21]. The combination of a high band gap, exceeding
3 eV, and a transparency over 85% results in only a minimal amount of light retained by
the ZnO layers.
On top of the front contact a metallic grid made out of an aluminum-nickel alloy is
applied, sometimes combined with an antireflective (AR) MgF2 coating to maximize the
amount of incident light captured by the cell.
Figure 3-5 Schematic overview of the multilayer structure of the CIGS thin layer solar cell
(from top to bottom: Al-Ni grid + AR coating; i- and n-ZnO front-contact, 0.25-1 µm; n-type CdS buffer, 50 nm; p-CIGS absorber, 1-3 µm; Mo back-contact, 0.5-1 µm; substrate)
Due to the specific electrical interaction of the materials in the different layers of the solar
cell, the translation of a desired performance of a solar cell design into the actual module
requires control over the structure of the cell. The combination of the very low thickness of the TLSC and its nevertheless complex structure frequently gives rise to imperfections within the crystal structure of the different layers – especially of the semiconductors – or to non-uniformities in their stacking. It goes without saying that any such deviation from ideality will be reflected in a changed electrical behavior of the device, typically resulting in a loss of generated photo-electrical power and hence a decreasing cell efficiency. In what follows, the contribution of several of these undesired leakage currents will be identified and modelled, aiming at a thorough understanding of the
underlying physical mechanisms. Knowing what goes wrong has proved repeatedly to
be a crucial, yet often heavily underrated, step in tackling an issue.
3.1.3 Outline of this chapter
In what follows, first a brief introduction will be given on the candidate mechanisms
which have been reported in literature as potential sources for power loss in a thin-layer
solar cell and on how their electrical characteristics have to be translated effectively into a
useful mathematical model. It will turn out that a number of unknown physical
parameters have to be determined to close the model. Therefore, experimental data have
to be collected, which will subsequently be used as input for a parameter estimation
procedure. The resulting parameter estimates of this model fitting and conclusions about
the performance of the cell will then be subjected to an analysis to ensure their validity,
both mathematically and statistically.
3.2 Model for the dark current-voltage characteristic of
CIGS solar cells
3.2.1 Ideal versus non-ideal electric behavior of solar cells
The current flowing through a CIGS solar device in the dark when a voltage is applied is
ideally following a diode-like characteristic, as was discussed in section 3.1.1. Therefore,
the implicit relation between the voltage 𝑉𝐷 over the ideal cell and the resulting dark
current density 𝐽 throughout the device is given by:
𝐽 = 𝐽0 ∙ [𝑒𝑥𝑝(𝐴𝑉𝐷) − 1] (3-1)
where 𝐽0 denotes the saturation current density of the heterojunction. The parameter 𝐴
reflects the mode of carrier transport through the heterojunction when being rewritten as A = q/(n·k_B·T), where q is the electronic charge, k_B the Boltzmann constant, T the
absolute temperature and n the ideality factor of the diode, expressing whether charge
transport through the junction is dominated by recombination, when n equals 2, or
diffusion, corresponding to an ideality factor of 1. Values exceeding 2 do not have an
interpretation in terms of carrier transport, and are reported as indicators for tunneling
effects [8, 22].
Comparison of a typical measured current-voltage behavior of real-life photovoltaic
devices with predictions for an ideal cell based on equation (3-1), reveals some distinct
deviations located over the entire voltage range, as is observed from Figure 3-6. The
profile is asymmetrical around zero voltage, showing a power-law non-linearity for negative voltages, while at moderate positive voltages an excess current appears, forming a distinct shoulder. At high voltages, the profile tends to flatten as well.
Figure 3-6 Ideal (dashed line) vs. non-ideal (solid) current-voltage profiles,
showing the different non-idealities to be explained
It is important to notice that not all deviations result in an increase of the dark current
flowing through the device. This is an indication that the non-ideal electric behavior of
the cell does not originate solely from parasitic phenomena, i.e. the undesired leakage
pathways that favor the flow of current through or in parallel with the junction and thus
induce a loss of the generated photo-electric power, but other effects come into play as
well. Recent studies have been focusing on linking the deviating electric behavior of the
device to physical phenomena acting in the inner structure of the cell [22-24]; each of
these effects will be briefly introduced in the upcoming paragraphs. To translate the
impact of these different side-effects on the current-voltage relation for the entire
structure into an adequate mathematical form, each mechanism will be represented by an
appropriate elementary electric component, having well-known electric behavior. This
way, the construction of an equivalent electric circuit becomes straightforward, which
will be an important aid for the derivation of the final model equations for the current-
voltage characteristic of the device.
3.2.2 Modelling parasitic current pathways and non-idealities in a CIGS
heterojunction solar device
The need for a regular multilayer structure for a heterojunction solar device at the micrometer-scale thicknesses typical for TLSCs forms a major source of highly undesirable non-uniformities in the stacking during the production
process. Since the presence of such imperfections induces significant changes in the
electric behavior of the device as a whole and therefore often results in a loss of
performance, a thorough understanding of the underlying mechanisms and a precise
insight in their actual impact are indispensable steps in the development of an improved,
ideally defect–free, production process.
The most comprehensive model described in literature up to the moment of writing of this work identifies three potential rivalling pathways through the inner structure of the cell, bypassing the desired passage of the current across the heterojunction.
Figure 3-7 Typical parallel current pathways proposed for explaining the non-ideal
behavior of real solar cells and the equivalent electric circuit [22]
Two contributions have been categorized as shunt currents. Shunting behavior in fact
captures all physical pathways through the multilayer solar cell structure that offer a less
hindered alternative to current passage through the main junction and hence partly
bypass the desired multilayer structure. Because they originate from typically extremely
localized imperfections in the cell structure, shunting effects often show significant and
unpredictable local differences, causing a shunt resistance that can vary by 1 to 2 orders of magnitude between different solar cells of the same type, or even between different spots on the same cell.
As shown in Figure 3-7, a first type of shunt behavior arises from purely resistive current
transport phenomena through the cell. The presence of microscopic pinholes in the
multilayer structure, e.g. due to the unequal coverage of the back contact by the CIGS
absorber, will induce a low-resistance interface between back contact and the window
layers. Moreover, current flow along highly conductive grain boundaries in the CIGS
structure, originating from the tendency of In and Ga to build up at the boundaries
rather than in the bulk of the CIGS crystals, forms an attractive alternative to the
heterojunction [25]. Because these pathways form a low-resistance conductive route
parallel to the heterojunction, their electric behavior is accurately modelled by an Ohmic
relation, showing a linear dependence between the passing current 𝐽𝑂ℎ𝑚𝑖𝑐 and the voltage
𝑉𝑂ℎ𝑚𝑖𝑐 over the pinhole or boundary. Due to the parallel positioning of the Ohmic
resistance and junction diode, it holds that 𝑉𝑂ℎ𝑚𝑖𝑐 = 𝑉𝐷, so that the electric behavior of a
shunt resistance 𝑅𝑠ℎ is given by:
$$J_{Ohmic} = \frac{V_D}{R_{sh}} \tag{3-2}$$
Often, modelling of the shunt leakage currents solely by a purely resistive component
does not suffice to explain the entire deviating current through the cell at low positive
voltages. Therefore, a second contribution to shunt leakage was identified as a multistep,
trap-assisted tunneling mechanism, primarily present in solar cells with a high
concentration of mid-gap defect states and a heavily-doped emitter [26]. It has been
shown that the current leakage due to such tunneling processes obeys a diode-like
relation with its voltage [27]. Hence, analogously to equation (3-1), the contribution of this shunt tunneling diode to the current-voltage profile follows from:
𝐽𝑠ℎ = 𝐽0,𝑠ℎ ∙ [exp(𝐴𝑠ℎ𝑉𝐷) − 1] (3-3)
where, in contrast to the diode representing the main junction in the cell, the factor 𝐴𝑠ℎ does not relate to any transport mechanism, so that in this case the associated ideality factor does not have any physical meaning.
Ultimately, even the incorporation of a tunneling contribution to the shunt does not yield
a proper fit of the modelled and measured J-V characteristics for reverse bias. As
mentioned earlier, experimental data obtained for negative voltages show a distinct
power-law dependence on the voltage over the cell, which is not explained by the
incorporation of an Ohmic resistance alone, since this introduced only a linear relation. In
the search for an appropriate mechanism to explain this trend, the principle known as a
space-charge limited current (SCLC) is frequently suggested as a reliable candidate.
Although originally discovered to explain the inexplicably high current passing through
an insulator material separating two electrodes [28], the SCLC mechanism has been
proposed recently as a potential source of current leakage through the semiconductor
absorber layer [29]. The occurrence of SCLC in a solar cell has been related to the
formation of metal-semiconductor-metal combinations. A major situation in which such
defects are formed is found in a non-uniform coverage of the CIGS absorber by the CdS
window layer, so that the emitter is trapped between the metallic front and back-
contacts. Additionally, diffusion of the aluminum dopant out of the front-contact
through the window layers towards the absorber material has been identified as a
potential source of leakage [30].
The current flowing through a SCL region subjected to a voltage 𝑉𝐷 – once more, this
leakage term is flowing parallel to the heterojunction – follows a power-law relation
obeying the general form:
$$J_{SCLC} = \mathrm{sgn}(V_D) \cdot k \cdot |V_D|^m \tag{3-4}$$
with the parameter 𝑘 depending on the thickness of the semiconductor layer, its conductivity and the presence of carrier traps, and the exponent 𝑚 reflecting whether current leakage is facilitated by deep traps (𝑚 > 2) or not (𝑚 ≅ 2) [24]. The incorporation of this parallel unit in the solar cell model was able to resolve the misfit between model and experiment at reverse bias.
Besides the presence of leakage pathways which cause the generated photo-electric
current to be partially lost, other non-idealities are found in a real-life solar cell that affect its electric behavior and performance, but, in contrast to parasitic currents, not necessarily in a negative manner. One intuitive source of non-ideality is found in the
intrinsic, non-zero resistance of the materials that make up the different layers of the cell
and of the interfaces in between. Primarily, the effect of the non-ideal charge conduction
through the metal contacts at the front and the back of the cell has often been reported to
be sufficiently significant as to be incorporated in the model explicitly. Because the non-
ideality of the contacts is assumed to be almost uniform along the cell base, this effect is
modelled as a series resistance 𝑅𝑆 in the equivalent electric circuit. For an Ohmic
resistance, the current-voltage characteristic is given by the linear relation:
$$J = \frac{V_R}{R_S} \tag{3-5}$$
with 𝑉𝑅 the voltage over the series resistance and 𝐽 the current density through the solar cell.
Because of the series connection of this resistance to the different parallel pathways
discussed above, the presence of the series resistance does affect the current passing
through the junction by altering the voltage in the characteristic diode equation. The
voltage over the diode and the resistance are indeed linked by:
𝑉 = 𝑉𝐷 + 𝑉𝑅 (3-6)
𝑉 being the total externally applied voltage over the entire cell. Sensitivity analyses on
the dark current-voltage characteristics showed that this parameter mainly affects the behavior
for higher voltages; correction for the split potential fall over the cell captures a major
part of the observed flattening.
Taking all these parallel current-leakage mechanisms into account, an equivalent electrical circuit as depicted in Figure 3-7 is found. The current through a thin-layer photovoltaic cell which is subject to an external voltage 𝑉 is then modelled as:
$$J = J_{0,junction}\left[\exp\!\left(A_{junction}(V - J R_S)\right) - 1\right] + J_{0,sh}\left[\exp\!\left(A_{sh}(V - J R_S)\right) - 1\right] + \frac{V - J R_S}{R_{sh}} + \mathrm{sgn}(V - J R_S)\, k\, |V - J R_S|^m \tag{3-7}$$
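Since the current density J appears on both sides of equation (3-7), through the voltage drop J·R_S over the series resistance, the relation has to be solved numerically for every applied voltage. The following Python sketch illustrates one possible way of doing so with a standard bracketing root finder; the parameter names are illustrative and this is not the routine used in this work.

```python
import numpy as np
from scipy.optimize import brentq

def dark_current(V, J0_junc, A_junc, J0_sh, A_sh, R_sh, R_s, k, m):
    """Solve the implicit dark J-V relation of equation (3-7) for J at one voltage V.
    Parameter names are illustrative, not the notation of the estimation software."""
    def rhs(Vd):
        return (J0_junc * (np.exp(A_junc * Vd) - 1.0)    # main junction diode
                + J0_sh * (np.exp(A_sh * Vd) - 1.0)       # shunt tunneling diode
                + Vd / R_sh                                # Ohmic shunt leakage
                + np.sign(Vd) * k * abs(Vd) ** m)          # space-charge limited current

    def implicit(J):
        return J - rhs(V - J * R_s)      # zero when J is consistent with V - J*R_s

    J_no_rs = rhs(V)                     # current if the series resistance were zero
    if J_no_rs == 0.0:
        return 0.0
    lo, hi = sorted((0.0, J_no_rs))      # the solution lies between 0 and J_no_rs
    return brentq(implicit, lo, hi)

# example: sweep the applied voltage from -1 V to 1 V in 10 mV steps (201 points)
V_grid = np.linspace(-1.0, 1.0, 201)
```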
Of course, the magnitudes of the different parasitic parameters are a measure for the presence of the different current losses; a notion of the relative importance of these mechanisms forms an indispensable part of the search for further steps in the improvement and intensification of the thin layer solar cell technology.
3.3 Experimental setup and procedures
The presence of multiple, parallel current pathways through a CIGS solar cell, each of
them associated with at least one unknown physical parameter, requires the collection of
data points covering the entire electrical behavior of the device to the highest possible extent. Therefore, dark current-voltage measurements will have to be performed covering both forward and reverse biases, so as to assess the particularities of the suggested model; for each experiment, the applied voltage ranges between -1 and 1 V, resulting in 201 data points for each studied temperature.
Additionally, not all electrical phenomena influence the observed dark J-V profiles to the same extent. When analyzing the experimental data, the largest contributors to parasitic current transport will tend to dominate the results, and it is therefore plausible that less pronounced leakage terms, although equally relevant for a complete understanding of the present imperfections and the subsequent improvements of the technology, will be
overshadowed. To increase their weight in the observations and hence allow for
unravelling the physics of the systems in much finer detail, the measurements will be
repeated for lower temperatures as well. By cooling the system down, the larger
contributors are “frozen”, enabling the smaller ones to be more strongly distinguished.
Using liquid nitrogen as cooling medium, temperatures down to 110 K will be reached;
measurements will be collected for temperature increments of 10 K, starting from 290 K.
Two CIGS-type solar cells have been studied, both of them being slices cut out of one
mother panel produced by the Swiss EMPA institute. Given that their original mutual
distance amounted to only a few centimeters, a potential difference in electrical behavior of both pieces will demonstrate whether or not there are strongly local irregularities in the original panel.
calculation of specific model parameter values during the estimation procedure, which
will become clear in the following section.
The solar cells were put in the experimental setup shown in Figure 3-8. The samples are
attached to a metallic support, depicted in gray in the scheme, on a small disk, which is
in close thermal contact to a vessel filled with liquid nitrogen, to provide the cooling to
cryogenic regimes. To maintain the desired temperature during the experiment, the disk
is connected to a temperature sensor. The thermal contact between the solar cell and the
support is ensured by using thermal conductive glue.
A two-wire technique is used to register the applied voltage and to measure the current separately. This way, the potential difference across the isolated wiring will not bias the results.
Figure 3-8 Experimental setup in real life and schematically,
showing the two-wire configuration
To minimize the thermal losses, a bell jar is placed over the setup, which simultaneously allows for working under vacuum conditions. The configuration is shielded from light by putting a black blanket on top.
3.3.1 Overview of the statistical analysis
The experimental data were used as input for the parameter estimation procedure in the
statistical modelling software package Athena Visual Studio (AVS), version 14.2. Because of its better-founded theoretical framework compared to ordinary nonlinear least squares regression, the alternative Bayesian procedure as implemented in AVS was used.
already mentioned in Section 2.3.4, this routine relies on the approximate approach
where inference on confidence intervals is obtained by local linearization, rather than
MCMC sampling of the posterior density. Meanwhile, the default AVS settings on prior
density distribution are used, i.e. a uniform prior on all model parameters and Jeffreys’
non-informative prior for the error covariance matrix.
To determine whether an estimate for a certain model parameter is significant, in the sense that it differs significantly from zero and hence participates actively in the model, the test variable 𝑡𝑐𝑎𝑙𝑐 has to be calculated:
$$t_{calc} = \frac{b_i}{s(b_i)} \sim t(n - p) \tag{3-8}$$
where 𝑏𝑖 is the calculated parameter estimate for model parameter 𝛽𝑖 and 𝑠(𝑏𝑖) is the corresponding standard deviation. To reject the null hypothesis 𝛽𝑖 = 0 with a certain probability 𝛼, 𝑡𝑐𝑎𝑙𝑐 is compared to the tabulated t-value with a probability 1 − 𝛼/2:
|𝑡𝑐𝑎𝑙𝑐| > 𝑡(𝑛 − 𝑝, 1 − 𝛼/2) (3-9)
Because the output file of an estimation procedure in AVS reports the confidence
intervals on the estimated parameters of a single-response model as:
$$b_i - s(b_i)\, t\!\left(n - p, 1 - \tfrac{\alpha}{2}\right) \le \beta_i \le b_i + s(b_i)\, t\!\left(n - p, 1 - \tfrac{\alpha}{2}\right) \tag{3-10}$$
meeting criterion (3-9) corresponds to excluding 0 from the confidence interval of all
parameters. Hence, it had to be assured that this condition was fulfilled for the optimal
parameter estimates, and this for all temperatures and both cells.
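As an illustration of this check, the criterion of equations (3-8) and (3-9) can be evaluated directly from the reported point estimates and their standard deviations; the Python sketch below assumes these are available as arrays and is not part of the AVS output itself.

```python
import numpy as np
from scipy import stats

def parameters_significant(b, s_b, n_obs, alpha=0.05):
    """Evaluate criterion (3-9) for every parameter: |t_calc| must exceed the
    tabulated t-value, which is equivalent to 0 lying outside the interval (3-10)."""
    b, s_b = np.asarray(b, float), np.asarray(s_b, float)
    p = b.size                                        # number of estimated parameters
    t_calc = b / s_b                                  # equation (3-8)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, n_obs - p)
    return np.abs(t_calc) > t_crit                    # True: significantly nonzero
```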
Once the individual confidence intervals were acquired and it was ascertained that all
parameters were significantly different from zero, the reliability of the estimation
procedure as a whole has to be checked. Typically, during this control loop the lack-of-fit between experimental data and model-based predictions is assessed, while the considered model fit is tested to determine to which extent the assumed normality of the residuals is satisfied or rather violated. In principle, for every estimation the lack-of-fit
has to be tested as well; however, since no replicate experiments have been performed,
no conclusions will be drawn.
In agreement with the analysis above to determine whether the estimated model
parameters are significantly different from zero individually, the significance of the
regressed model as a whole will be assessed as well. Indeed, for the parameter estimation
to be meaningful, it is crucial that the resulting model does predict the observed data
points significantly better than a model where all parameters equal zero. In the latter
case, it would indeed not be worth the effort of doing any parameter estimation at all.
To reject with a certainty 1 − 𝛼 the hypothesis that all model parameters are
simultaneously equal to zero, the following F-test has to be passed:
$$F_i = \frac{\displaystyle \sum_{k=1}^{201} I_{calc,i,k}^2 \,/\, p}{\displaystyle \sum_{k=1}^{201} \left(I_{obs,i,k} - I_{calc,i,k}\right)^2 /\, (201 - p)} > F(p, n - p; \alpha) \tag{3-11}$$
where 𝑝 gives the number of included model parameters. Values for 𝐹(𝑝, 𝑛 − 𝑝; 𝛼) are
tabulated and, taking 𝛼 equal to 0.05, amount to about 2.
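A minimal sketch of this global significance test, assuming the observed and calculated currents of one J-V sweep are available as arrays, could look as follows.

```python
import numpy as np
from scipy import stats

def regression_significant(I_obs, I_calc, p, alpha=0.05):
    """F-test of equation (3-11): the regression is significant when the calculated
    F value exceeds the tabulated F(p, n - p; alpha), roughly 2 for alpha = 0.05."""
    I_obs, I_calc = np.asarray(I_obs, float), np.asarray(I_calc, float)
    n = I_obs.size                                    # 201 points per J-V sweep
    F_calc = (np.sum(I_calc**2) / p) / (np.sum((I_obs - I_calc)**2) / (n - p))
    return F_calc, F_calc > stats.f.ppf(1.0 - alpha, p, n - p)
```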
At last, it has to be determined to what extent the model parameters are mutually
correlated. Indeed, in case of a strong dependence between two parameters, the
likelihood of compensation behavior becomes considerable. Hence, it is plausible that
part of the contribution of the first parameter to the final model predictions is taken by
the other one, which is detrimental for the reliability of the results from the estimation. A
measure for the degree of correlation between model parameters 𝑏𝑖 and 𝑏𝑗 is the binary
correlation coefficient given by:
$$\rho_{i,j} = \mathbf{V}(\mathbf{b})_{ij} \Big/ \sqrt{\mathbf{V}(\mathbf{b})_{ii}\,\mathbf{V}(\mathbf{b})_{jj}} \tag{3-12}$$
where 𝑉(𝑏) is the covariance matrix of the parameter estimates 𝑏. Strong correlation
corresponds to |𝜌𝑖𝑗| ≥ 0.95. The AVS software automatically reports the binary correlation matrix in the output file.
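If the covariance matrix of the estimates is available, the binary correlation matrix of equation (3-12) and the strongly correlated pairs can also be extracted with a few lines of Python; this is a generic post-processing sketch, not an AVS feature.

```python
import numpy as np

def binary_correlation(V_b, threshold=0.95):
    """Convert the parameter covariance matrix V(b) into the binary correlation
    matrix of equation (3-12) and flag pairs with |rho_ij| >= threshold."""
    V_b = np.asarray(V_b, dtype=float)
    d = np.sqrt(np.diag(V_b))
    rho = V_b / np.outer(d, d)
    strong = (np.abs(rho) >= threshold) & ~np.eye(len(d), dtype=bool)
    return rho, strong
```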
3.4 Analysis of the results
3.4.1 Results of the statistical assessment
Current-voltage measurements were collected for both cells. The experimental results for
the first one are depicted in Figure 3-9, showing clearly the non-ideality of the electrical
behavior. The profiles obtained from the second cell were slightly different but showed
similar deviations from ideality. It follows immediately that the impact of lowering
temperature is significant, which proves the need for cooling to fully assess the electrical
behavior of the cells. The calculation of the current density from the measured current 𝐼 is
straightforward:
𝐽 = 𝐼/𝐴𝑐𝑒𝑙𝑙 (3-13)
where 𝐽 is the current density through a cell with surface area 𝐴𝑐𝑒𝑙𝑙, which equals 0.5
cm².
Figure 3-9 Observed current-voltage characteristics of the first cell
All model parameters are expected to be temperature dependent, at least to some extent. Due to the lack of generally applicable functions that accurately describe this dependence, it was opted to perform the parameter estimation procedure for each particular
temperature separately. The physical meaning of their temperature dependence will then
be assessed by interpreting the plots of the resulting parameter estimates from the
isothermal fittings. The same modelling methodology was applied for the experimental
data from both cells and for all considered temperatures, giving a total of 38 performed
fitting operations.
The theoretical model given by equation (3-7) is in its most complete form, i.e. all terms
that have been suggested in literature up to the moment of writing are included.
Nevertheless, there is no a priori requirement for all associated leakage pathways to be
present in a particular CIGS cell. In other words, the suggested model will potentially be
too extensive, i.e. having redundant contributions, which explains the need for a proper
assessment of the relevance of each particular contribution.
The most elementary model, comprising only the main junction, is immediately excluded because of the strong deviation from ideality of the measured J-V characteristics.
Including the series resistance associated with the non-ideal contacts as the first power
loss mechanism resulted in a flattening of the simulated J-V characteristic for high
positive voltages. As is depicted in Figure 3-10 for the experiment on the first cell at 290
K, the fit to the observations in this range is remarkably good, while at lower positive
voltages the deviation becomes significant. At negative voltages, the misfit is even more
pronounced.
Figure 3-10 Best fitting curve when considering resistance of non-ideal contacts only
When adding the shunt resistance as a first candidate parasitic contribution, optimal
curve fits similar to the one as shown in Figure 3-11 are obtained. It is seen that the
inclusion of a parallel resistance improves the match between observed and predicted
values significantly for lower positive voltages, without undermining the fit for higher
positive values. Meanwhile, a major part of the misfit between model and experiments in
the negative range is overcome. Nevertheless, some small deviations remain, which
suggests the need for a further extension of the model.
Figure 3-11 Best fitting curve when considering non-ideal contacts and shunt resistance
for the first cell at 290 K
The incorporation of a space charge limited current pathway was able to resolve the remaining deviations of the model predictions for this specific situation, as is clear from Figure 3-12.
Additionally, again for this particular experiment, it was found that the inclusion of a
shunt tunneling diode, the last candidate parasitic pathway, did not improve the fit of
the observed responses significantly. Moreover, when attempting to fit the model with a
shunt tunneling contribution while removing the SCLC term, it followed that the final fit
throughout the entire voltage range was worse. Therefore, for this particular experiment, it was found that the simplest adequate model includes only two parasitic terms, besides the non-ideal resistance of the contacts.
Figure 3-12 Best fitting curve when considering non-ideal contacts, shunt resistance and
space charge limited current leakage for the first cell at 290 K
It is important to stress that the exclusion of the tunneling contribution for this particular
experiment does not hold for all observations. The identification of the most suited
model function is repeated for each experiment. During these steps, it is important to keep an eye both on the quality of the estimates, by checking the visual fit of the predicted model, and on the statistical validity of the obtained parameters and the regression as a whole.
Unfortunately, while assessing the significance of the different parameters, the simulation software did not succeed in calculating the individual confidence intervals of all model parameters simultaneously, especially for lower temperatures. As a result, only a limited set of parameters could be fully estimated in each run. Therefore, an iterative procedure has been adopted, in which the estimated parameter values were used as fixed values in a new minimization, wherein the previously indeterminable parameters were assessed. Once point estimates and confidence intervals were obtained for these parameters as well, this procedure was repeated until no additional gain in the minimization was observed. By following this method, a considerable improvement of the final fit of the model to the experimental data was realized.
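In spirit, this alternating scheme can be mimicked with any least squares routine by repeatedly estimating one subset of parameters while the remaining ones are frozen at their current values; the Python sketch below is such a generic re-implementation, assuming a user-supplied residual function, and is not the AVS procedure itself.

```python
import numpy as np
from scipy.optimize import least_squares

def alternating_fit(residual_fn, theta0, groups, max_cycles=10, rel_tol=1e-8):
    """Alternately estimate subsets of parameters (index lists in `groups`) while the
    other parameters are kept fixed, until the sum of squares stops improving."""
    theta = np.array(theta0, dtype=float)
    best = np.sum(residual_fn(theta) ** 2)
    for _ in range(max_cycles):
        for idx in groups:                            # e.g. [[0, 1], [2, 3, 4, 5]]
            def sub_res(sub, idx=idx):
                full = theta.copy()
                full[idx] = sub
                return residual_fn(full)
            theta[idx] = least_squares(sub_res, theta[idx]).x
        cost = np.sum(residual_fn(theta) ** 2)
        if best - cost < rel_tol * best:              # no further gain in minimization
            break
        best = cost
    return theta
```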
When assessing the significance of the regression as a whole in accordance with equation (3-11), the number of model parameters amounts to 6 or 8, depending on whether the shunt tunneling diode term is considered or not. Calculation of the left-hand side of the expression yields values exceeding 10⁶, for each temperature and both cells, which clearly
demonstrates that the model as a whole is indeed significant.
A strong indication for the quality of the model fitting is the construction of a parity
diagram, plotting the observed with respect to the predicted response values. In the ideal
case of a perfect fit, all points are located at the first bisector. Hence, for a model
estimation procedure to be performant, the deviation for the different experiments has to
be as low as possible. Figure 3-13 shows the parity plot for the exemplary experiment.
For all J-V measurements, the points are lying almost perfectly on the bisector. Hence, the
goodness of the fit which was already observed in Figure 3-12 is confirmed. The square
of the multiple correlation coefficient given by:
$$R_i^2 = \frac{\sum_{k=1}^{201} I_{calc,i,k}^2}{\sum_{k=1}^{201} I_{obs,i,k}^2} \tag{3-14}$$
denotes the fraction of the observed values for the current through the cell which is
captured by the model for the 𝑖’th experiment. Therefore, a higher value will, in general,
point at a better quality of the parameter estimates. For each estimation procedure, i.e. for
each temperature and both cells, this indicator amounted to more than 0.99.
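A parity check and the indicator of equation (3-14) are easily reproduced from the observed and calculated currents; the following sketch, using matplotlib for the plot, is purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def parity_plot_and_r2(I_obs, I_calc):
    """Plot calculated vs. observed currents with the first bisector as perfect-fit
    reference and return R_i^2 as defined in equation (3-14)."""
    I_obs, I_calc = np.asarray(I_obs, float), np.asarray(I_calc, float)
    r2 = np.sum(I_calc**2) / np.sum(I_obs**2)
    lims = [min(I_obs.min(), I_calc.min()), max(I_obs.max(), I_calc.max())]
    plt.plot(I_obs, I_calc, "o", markersize=3)
    plt.plot(lims, lims, "k--", label="first bisector")
    plt.xlabel("Observed current [mA]")
    plt.ylabel("Calculated current [mA]")
    plt.legend()
    return r2
```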
Figure 3-13 Parity diagram for the first cell at 290 K
The interpretation of the residual plot for each estimation step yields some useful
inference on the validity of the statistical theory that underlies the fitting routines, i.e. a
Gaussian distribution for the experimental error, with a constant variance for each data
point and a zero correlation between the different measurements at each set of
experimental conditions. In ideal situations, when plotting the misfit between
observations and model predictions, a random scatter should result, with values contained in a finite and symmetric band around the horizontal axis.
The residual plot for the type experiment is depicted in Figure 3-14. For negative and
small positive voltages, the residuals are nicely located around the voltage axis; however,
at higher voltages a significant increase of the residuals is seen. Meanwhile, a trend
seems to emerge in this region. This behavior is probably explained by the quasi-
continuous measurement of the different points of the current-voltage characteristics at
each temperature. Hence, the misfit between the experimental data and the model
predicted responses is a trending function as well. Again, it follows that the actual fit is
accurate, as no residual amounts to more than 1% of the actually measured current.
Similar residual profiles have been observed for the other experimental conditions.
Figure 3-14 Residual plot for the first cell at 290 K
At last, the correlational structure of the set of estimated model parameters is assessed.
For the type experiment, where only six model parameters had to be considered, the
result is shown in Table 3-1. For this particular temperature and cell, the only strongly correlated parameters are those associated with the main junction. This is not surprising,
as a high correlation between the pre-exponential factor and parameters in the exponent
is a commonly encountered phenomenon for the kinetic modelling of chemical reactions
as well, where a similar Arrhenius dependence exists for the rate coefficients.
Table 3-1 Binary correlation matrix of the model parameters for the first cell at 290 K
𝑱𝟎𝟏 𝑨𝟏 𝑹𝒔𝒉 𝑹𝒔 𝒌 𝒎
𝑱𝟎𝟏 1 -0.999 -0.087 -0.917 -0.007 0.226
𝑨𝟏 -0.999 1 0.082 0.929 0.004 -0.217
𝑹𝒔𝒉 -0.087 0.082 1 0.039 0.843 -0.898
𝑹𝒔 -0.917 0.929 0.039 1 -0.019 -0.139
𝒌 -0.007 0.004 0.843 -0.019 1 -0.595
𝒎 0.226 -0.217 -0.898 -0.139 -0.595 1
However, since the software was not able to estimate all model parameters simultaneously at all temperatures, no complete binary correlation matrix was generated for those cases either. Therefore, a similar analysis could not be performed for all situations, so that general conclusions on the correlational structure could not be drawn.
3.4.2 Physical interpretation of the results
The point estimates for the model parameters obtained from the Bayesian estimation
routine in Athena Visual Studio are depicted as a function of temperature in Figure 3-15
and Figure 3-16 for the first and second cell respectively.
The pre-exponential factors 𝐽01 and 𝐽02 corresponding to the main junction and the shunt
tunneling diode term respectively are depicted in a semi log plot with respect to the
inverse temperature. As is concluded from Figure 3-15a and Figure 3-16a, the pre-exponential factors strongly decrease for lower temperatures. Moreover, a linear relation
emerges, which allows for the calculation of an Arrhenius type of temperature
dependence for both cells, according to:
$$J_{0i} = J_{0,0i} \exp\!\left(-\frac{q E_i}{k_B T}\right), \quad i = 1, 2 \tag{3-15}$$
with 𝑞 the elementary electric charge and 𝑘𝐵 the Boltzmann constant. For the first cell,
the contribution of the shunt tunneling diode turned out not to be significant, and has
therefore not been reported. For the main junction term, it is found that 𝐽0,01 = 483.185 mA while 𝐸1 = 624 mJ/C. The associated exponential factor is represented in terms
of the ideality constant 𝑛1 in Figure 3-15b. It is seen that, for all studied temperatures, this
parameter takes a value between 1 and 2, which obeys the criterion for diffusion and
recombination controlled electron transport across the cell.
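The Arrhenius-type dependence of equation (3-15) follows from a linear fit of ln(J₀) against 1/T; a minimal sketch of this step, assuming the isothermal point estimates are collected in arrays, is given below (note that J/C is dimensionally a volt, so the reported mJ/C values correspond to millivolts).

```python
import numpy as np

Q = 1.602176634e-19    # elementary charge [C]
KB = 1.380649e-23      # Boltzmann constant [J/K]

def arrhenius_fit(T, J0):
    """Fit ln(J0) vs. 1/T to equation (3-15): ln J0 = ln J00 - (q*E/kB) * (1/T).
    Returns the pre-exponential factor J00 (same unit as J0) and E in J/C (= V)."""
    T, J0 = np.asarray(T, float), np.asarray(J0, float)
    slope, intercept = np.polyfit(1.0 / T, np.log(J0), 1)
    return np.exp(intercept), -slope * KB / Q
```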
For the second cell, a similar decrease of the main junction factor is observed, although less steep than for the first one: here 𝐽0,01 = 2.74·10⁻⁴ mA and 𝐸1 = 437 mJ/C were calculated. Except for the three lowest temperatures, the constraint on the value of the
ideality factor is again met, see Figure 3-16b. On the other hand, the optimal pre-
exponential factor is considerably smaller than for the first cell. In contrast to the first cell,
for the second one the shunt tunneling contribution does become significant for lower
temperatures. The corresponding values are given by the filled blocks in Figure 3-16a. To
guide the eye, the blank symbols extend the Arrhenius dependence of the shunt diode
pre-exponential to higher temperatures. Following the notation introduced in equation (3-15), it was calculated that 𝐽0,02 = 2.74·10⁻⁴ mA and 𝐸2 = 187 mJ/C. As has already been
mentioned in the theoretical discussion of the model in Section 3.2.2, the value of the
corresponding ideality factor does not have any physical interpretation and is therefore
not explicitly shown. Where the shunt diode term was significant, values for this ideality
parameter amounted to almost 10.
The optimal estimates for the shunt resistance are depicted in Figure 3-15 and Figure 3-
16c. In contrast to the parameters for the main junction and shunt tunneling diode terms,
there is no common trend observed for both cells. For the first one, the estimated shunt resistance is at first remarkably smaller than for the second cell: the difference amounts to a factor of 1000.
Figure 3-15 Parameter estimates and corresponding 95% individual confidence intervals for
the first cell
[panels: (a) J01 [A/cm²] vs. 1/T [1/K]; (b) n1 [-] vs. T [K]; (c) Rsh [Ω/cm²] vs. T [K]; (d) Rs [Ω/cm²] vs. T [K]; (e) k [mA/V^m/cm²] vs. T [K]; (f) m [-] vs. T [K]]
Figure 3-16 Parameter estimates and corresponding 95% individual confidence intervals for the second cell
[panels: (a) J0 [A/cm²] vs. 1/T [1/K], logarithmic scale; (b) n1 [-] vs. T [K]; (c) Rsh [Ω/cm²] vs. T [K]; (d) Rs [Ω/cm²] vs. T [K]; (e) k [mA/V^m/cm²] vs. T [K]; (f) m [-] vs. T [K]]
Moreover, for the first cell, the increase of the resistance follows a linear
trend for lowering temperatures. For the second cell, on the other hand, the increase of the simulated shunt resistance is rather exponential for decreasing temperatures. To ensure that this
remarkable difference in the temperature dependence of the resistance of the second cell
was not caused by a modeling mistake, the estimation routine was run again, starting
from the estimated values for the first cell. Since the obtained fit was poor, the reason for the deviation most likely has a physical origin.
A similar exponential increase for decreasing temperatures is observed for the simulated
series resistance, depicted in Figure 3-15 and Figure 3-16d. In contrast to the shunt
contribution, the estimations for the series resistance are remarkably similar for both
cells.
A stronger difference between the cells is observed for their estimated space-charge
limited current. While for the first cell the value for the multiplicative parameter 𝑘 is
almost constant for the entire considered temperature range, the second cell shows an
almost linear decrease for lower temperatures. Given the strong difference in the estimated parameter values for both cells, a similar strategy as for the shunt resistance was followed. Once more, using the estimated parameter values from the first cell in the second resulted in an inferior fit, demonstrating again that the reasons have to be sought in the physics of the system. Additionally, the resulting point values are a factor
50 higher for the first cell compared to the second one. For the exponent parameter 𝑚 to
be physically meaningful, the estimated parameters have to be located around 2 for all
temperatures. From the inspection of Figure 3-15 and Figure 3-16f, it follows that this
criterion is fulfilled, except for the experiment at 300K for the first cell. However, the
relatively wide confidence interval on this estimate, comprising the desired value of 2 as
well, points at the statistical uncertainty about its accuracy. It is interesting to notice that
the confidence intervals on the point estimates are remarkably higher for this parameter
compared to the others.
Based on these estimated parameter values, the contribution of the different suggested
parasitic current pathways to the global electric behavior of the cell can be determined.
This way, it is possible to perform a sort of electric path analysis, analogously to the often
performed reaction path analysis in the kinetic modelling of chemical reaction systems.
Therein, the importance of each potential step in a complex reaction mechanism is
assessed for varying process conditions. When doing a similar assessment for the
different parasitic pathways in a solar cell, temperature and voltage will be the most
relevant experimental variables.
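Given a set of fitted parameter values, the share of each parallel pathway at a given junction voltage follows directly from evaluating the individual terms of equation (3-7); the sketch below (valid for a non-zero junction voltage, with illustrative parameter names) computes these percentage contributions.

```python
import numpy as np

def pathway_shares(Vd, J0_junc, A_junc, J0_sh, A_sh, R_sh, k, m):
    """Percentage contribution of each parallel pathway to the total dark current at a
    junction voltage Vd = V - J*R_s; in the dark all terms carry the sign of Vd."""
    terms = {
        "main junction":    J0_junc * (np.exp(A_junc * Vd) - 1.0),
        "shunt tunneling":  J0_sh * (np.exp(A_sh * Vd) - 1.0),
        "shunt resistance": Vd / R_sh,
        "SCLC":             np.sign(Vd) * k * abs(Vd) ** m,
    }
    total = sum(terms.values())          # non-zero as long as Vd != 0
    return {name: 100.0 * value / total for name, value in terms.items()}
```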
The contribution of the different current pathways is graphically represented in Figure 3-
17, giving the share in the total current through the cell in percentage terms. It follows
immediately that both the temperature and the voltage across the cell have a strong
impact on how strongly each mechanism takes part in the conduction of the electric
current.
In the negative and lower positive voltage range the current through the cell is
dominated by the shunt resistance and SCLC. Depending on the temperature, both
mechanisms are competing for the lead role: at 290 K the current through the shunt resistance is always higher than the SCLC, while at lower temperatures the latter grows in importance and becomes the strongest current transport mechanism for reverse biases.
Nevertheless, for all temperatures, the shunt resistance almost completely represents the
current at lower positive and negative voltages. Only for higher positive voltages, the
main junction comes into play and the contributions of the parasitic current pathways
start to decrease strongly. It follows that for lower temperatures, the onset for significant
current through the main junction is shifted towards higher voltages. The contribution of
the shunt tunneling diode, which was significantly estimated for the second cell at lower temperatures, is visible only at higher voltages as well.
Moreover, by showing the graphs for both cells together, it follows that the electrical
behavior of the cells is fundamentally different. Although similar qualitative trends are
seen regarding the importance of the different pathways, strong deviations exist between their absolute contributions for both cells. Compared to the first cell, the SCLC mechanism is present in the second one to a much lower extent, favoring current through the shunt
resistance. Since both cells are cut out of the same mother panel, this demonstrates the
strongly local character of the imperfections in the cell structure which cause the
undesired power loss in the device.
Figure 3-17 Contribution of the suggested leakage pathways for both solar cells,
for different temperatures. First cell: left, second cell: right
[legend: Main junction, Shunt resistance, SCLC, Shunt tunneling; rows: 290 K, 200 K, 110 K; x-axis: Voltage [V], y-axis: contribution 0–100%]
3.5 References
1. BP Energy Outlook 2035. 2015.
2. Zhu, J. and Y. Cui, Photovoltaics: More solar cells for less. Nat Mater, 2010. 9(3): p.
183-184.
3. Integration of Renewable Energy in Europe. 2010, DNV GL - Energy.
4. Directive 2009/28/EC of the European Parliament and of the Council of 23 April 2009 on
the promotion of the use of energy from renewable sources and amending and
subsequently repealing Directives 2001/77/EC and 2003/30/EC. 2009, European
Parliament, Council of the European Union.
5. EPIA, Global market outlook for Photovoltaics 2013-2018. European Photovoltaic
Industry Association, 2014.
6. Razykov, T.M., et al., Solar photovoltaic electricity: Current status and future prospects.
Solar Energy, 2011. 85(8): p. 1580-1608.
7. Saga, T., Advances in crystalline silicon solar cell technology for industrial mass
production. NPG Asia Mater, 2010. 2: p. 96-102.
8. Nelson, J., The Physics of Solar Cells. 2003: Imperial College Press.
9. Reyniers, M.-F., Algemene Scheikunde. 2010.
10. Luque, A. and S. Hegedus, Handbook of Photovoltaic Science and Engineering. 2011:
Wiley.
11. El Chaar, L., L.A. lamont, and N. El Zein, Review of photovoltaic technologies.
Renewable and Sustainable Energy Reviews, 2011. 15(5): p. 2165-2175.
12. Jackson, P., et al., New world record efficiency for Cu(In,Ga)Se2 thin-film solar cells
beyond 20%. Progress in Photovoltaics: Research and Applications, 2011. 19(7): p.
893-897.
13. Rockett, A.A., Current status and opportunities in chalcopyrite solar cells. Current
Opinion in Solid State and Materials Science, 2010. 14(6): p. 143-148.
14. Green, M.A., et al., Solar cell efficiency tables (version 15). Progress in Photovoltaics:
Research and Applications, 2000. 8(1): p. 187-195.
15. Green, M.A., et al., Solar cell efficiency tables (Version 45). Progress in Photovoltaics:
Research and Applications, 2015. 23(1): p. 1-9.
16. Neisser, A., et al., Effect of Ga incorporation in sequentially prepared CuInS2 thin film
absorbers. Solar Energy Materials and Solar Cells, 2001. 67(1–4): p. 97-104.
17. Kaigawa, R., et al., Improved performance of thin film solar cells based on Cu(In,Ga)S2.
Thin Solid Films, 2002. 415(1–2): p. 266-271.
18. Jager-Waldau, A., Progress in chalcopyrite compound semiconductor research for
photovoltaic applications and transfer of results into actual solar cell production. Solar
Energy Materials and Solar Cells, 2011. 95(6): p. 1509-1517.
19. Decock, K., Defect related phenomena in chalcopyrite based solar cells. 2012.
20. Schock, H.-W. and R. Noufi, CIGS-based solar cells for the next millennium. Progress
in Photovoltaics: Research and Applications, 2000. 8(1): p. 151-160.
21. Chopra, K.L., P.D. Paulson, and V. Dutta, Thin-film solar cells: an overview. Progress
in Photovoltaics: Research and Applications, 2004. 12(2-3): p. 69-92.
22. Williams, B.L., et al., Identifying parasitic current pathways in CIGS solar cells by
modelling dark J–V response. Progress in Photovoltaics: Research and Applications,
2015: p. n/a-n/a.
23. Hengel, I., et al., Current transport in CuInS2:Ga/Cds/Zno – solar cells. Thin Solid
Films, 2000. 361–362(0): p. 458-462.
24. Pallarès, J., et al., A compact equivalent circuit for the dark current-voltage
characteristics of nonideal solar cells. Journal of Applied Physics, 2006. 100(8): p.
084513.
25. Bosio, A., et al., Polycrystalline CdTe thin films for photovoltaic applications. Progress
in Crystal Growth and Characterization of Materials, 2006. 52(4): p. 247-279.
26. Kaminski, A., et al. Conduction processes in silicon solar cells. in Photovoltaic
Specialists Conference, 1996., Conference Record of the Twenty Fifth IEEE. 1996.
27. Rau, U., et al., Electronic loss mechanisms in chalcopyrite based heterojunction solar
cells. Thin Solid Films, 2000. 361–362(0): p. 298-302.
28. Rose, A., Space-Charge-Limited Currents in Solids. Physical Review, 1955. 97(6): p.
1538-1544.
29. Dongaonkar, S., et al., Universality of non-Ohmic shunt leakage in thin-film solar cells.
Journal of Applied Physics, 2010. 108(12): p. 124509.
30. Liao, Y.-K., et al., A look into the origin of shunt leakage current of Cu(In,Ga)Se2 solar
cells via experimental and simulation methods. Solar Energy Materials and Solar Cells,
2013. 117(0): p. 145-151.
Chapter 4
Literature review on alternative
parameter estimation techniques
In this chapter three potential adaptations of the currently used, classical nonlinear
regression method, extracted from a literature survey on statistical techniques to estimate
unknown model parameters, will be discussed.
The first two methods originate from the rather strong requirements on the
covariance structure of the experimental errors under which classical nonlinear
regression is guaranteed to perform well. The underlying idea of these techniques is to
encode a correction system in the classical regression procedure to account explicitly for
a potential violation of the theoretical conditions. This way, the stringent theoretical
framework will be loosened and the range of applicability of the classical regression
methods will be highly extended. Logically, the adapted procedure will yield more
reliable results than the original one. The methods introduced in Section 4.1 will account
for the possibility of a non-constant error variance, or heteroscedasticity, while Section
4.2 will focus on the correction for the occurrence of a serial correlation in a set of time
series observations.
At last, the potential of the Bayesian approach to parameter estimation will be explored
in section 4.3. Starting from a fundamentally different approach to the problem of model
parameter estimation, the Bayesian framework combines some very attractive features,
including an efficient exploration of high-dimensional probability distributions, an
automatic weighing of experimental errors and the possibility to include knowledge on
the model parameter prior to any experiment. It goes without saying that, if proven to be
sufficiently performant, Bayesian parameter estimation will be a serious challenger of
classical regression schemes.
4.1 Tackling the heteroscedasticity issue: towards a
proper handling of heterogeneous variance of the
experimental error
4.1.1 Data-based weighing of the residuals
One of the fundamental assumptions that make up the mathematical framework of
ordinary least squares estimation is the homogeneity of the variance of the random error
terms which are associated with the experimental data. Only when this criterion is met,
the efficiency of the regression is assured. Moreover, in the case of nonlinear models both
the point estimates of the parameters and the confidence intervals resulting from the
regression are potentially biased by illicitly neglecting the presence of heteroscedasticity
[1]. The variance of the experimental error on a certain observation is a measure for its
precision, and, hence, of the reliability of the information it contains. A higher variance
reflects a more pronounced uncertainty about the actual response value. When assuming
a constant variance for all observations, all experimental data will contribute equally to
the objective function of the regression, irrespective of the information they contain,
which is a doubtful practice.
The presence of an inhomogeneous error variance is clearly reflected in the residual plot,
as illustrated in Figure 4-1. Where a random scatter is expected when the constant
variance criterion is met, distinct trends, like the formation of clusters or strong
fluctuations in residual order of magnitude, will be observed. Different causes for the
departure from the assumption of a constant error variance have been identified for
kinetic modelling. Reaction rates are known to depend mainly on temperature and on the concentration of the involved species. Depending on the response
variable to be measured and the analytical devices to collect the experimental data, this
will potentially bias the uncertainty on the results. Often higher variances are expected
for more ‘severe’ reaction conditions.
Figure 4-1 Illustrative example of residual plots for a case with constant variance (left) and
strong heteroscedastic experimental errors (right)
An additional source of heteroscedasticity arises from erroneous steps in the statistical
procedure itself, e.g., due to an explicit linearization of a nonlinear kinetic model by the
user. The statistical validity of the parameter estimates obtained for this linearized model
is doubtful, as not the actual dependent variables but rather functions of them are
regressed. In general, the additivity of the modelled prediction and its residual is not
invariant to any transformation, so that the assumption of a Gaussian error term on the
actual response does not necessarily hold for its transformed counterpart as well. The
detrimental effect of such transformations, when applied improvidently, on the precision
of the final parameter estimates and on the validity of the resulting model predictions has already been demonstrated [2].
The presence of heteroscedasticity is tackled straightforwardly by considering a suited
weighing of the collected data. If 𝑛 independent single-response experiments have been
carried out and variance heterogeneity is allowed, the covariance matrix of the
experimental errors 𝑽 is a diagonal matrix with the heterogeneous error variances as
diagonal elements, hence:
$$\mathbf{V} = \begin{bmatrix} \sigma_{11}^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{nn}^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} 1/w_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/w_n \end{bmatrix} = \sigma^2 \mathbf{W}^{-1} \tag{4-1}$$
where the weight matrix 𝑾 is introduced, 𝜎2 being a multiplicative constant. The
likelihood function of the 𝑝 regression parameters 𝜷 of the general nonlinear model:
𝑦 = 𝑓(𝒙, 𝜷) + 𝜖 (4-2)
is then given by:
$$L(\boldsymbol{\beta}\,|\,\mathbf{y}) = \frac{1}{\sqrt{(2\pi)^n |\mathbf{V}|}} \exp\!\left\{-\frac{1}{2}\sum_{i=1}^{n}\frac{\left[y_i - f(\boldsymbol{x}_i,\boldsymbol{\beta})\right]^2}{\sigma_{ii}^2}\right\} = \frac{\prod_{i=1}^{n}\sqrt{w_i}}{\left(\sqrt{2\pi}\,\sigma\right)^n} \exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} w_i\left[y_i - f(\boldsymbol{x}_i,\boldsymbol{\beta})\right]^2\right\} \tag{4-3}$$
with n the number of collected data points, assuming the errors are normally distributed.
From this, it follows that the heterogeneity of the error variance is bypassed if the error terms appearing in ordinary least squares regression are scaled by a factor inversely proportional to the error variances, so 𝑤𝑖 = 1/𝜎𝑖𝑖². The interpretation of this
scaling as a weighing operation follows naturally, as an observation with the lower
variance, or equivalently, a higher precision, will have a stronger contribution to the
ultimate sum of squared residuals. Since knowledge about these values is often limited or even completely absent, it is not possible to determine or predict 𝑤𝑖 exactly.
Alternative pathways towards an adequate and robust determination of the suited
weighing factors have to be explored.
An often encountered practice to implement a weighted least squares methodology relies on the estimation of the error variance from the observations of replicate experiments, i.e., by collecting a set of 𝑚 response values 𝑦𝑖,𝑗, 𝑗 = 1..𝑚, under identical conditions 𝒙𝒊. The underlying reasoning is that the sample variance $\hat{\sigma}_i^2$ associated with the experimental conditions 𝒙𝒊, given by:
$$\hat{\sigma}_i^2 = \frac{\sum_{j=1}^{m}(y_{i,j} - \bar{y}_i)^2}{m - 1} \qquad \text{(4-4)}$$
where $\bar{y}_i$ denotes the mean response value of the 𝑚 replicate experiments, is an adequate and unbiased estimator for the unknown error variance 𝜎𝑖𝑖2 of the i'th observation [3]. This methodology is far from optimal: an analysis based on sample variances is very inefficient unless the number of replicate experiments is very large. Approximations that rest on a limited number of replications have been reported to be wildly unstable, and hence introduce an additional level of variability in the regression procedure [4]. The precision of the resulting parameter estimates is therefore often observed to be inferior to the outcome of an unweighted regression procedure. Moreover, due to the lack of a solid theoretical basis, the quality and adequacy of the parameter estimates obtained by this regression is not ascertained, which makes their physical meaning uncertain [5].
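As an illustration only, the fragment below sketches this replicate-based weighting in Python: sample variances are computed per experimental condition as in equation (4-4) and used as weights in a weighted least squares fit. The model function, the data and the replicate structure are hypothetical.

# Illustrative sketch (not from this work): weighted least squares with weights
# estimated from replicate experiments, w_i = 1 / sigma_i^2, cf. equation (4-4).
# Model, data and replicate structure are hypothetical.
import numpy as np
from scipy.optimize import least_squares

def model(x, beta):
    # hypothetical first-order response: y = beta0 * (1 - exp(-beta1 * x))
    return beta[0] * (1.0 - np.exp(-beta[1] * x))

# x[i]: conditions of experiment i, y_rep[i]: m replicate responses at x[i]
x = np.array([1.0, 2.0, 4.0, 8.0])
y_rep = np.array([[0.42, 0.45, 0.40],
                  [0.68, 0.72, 0.66],
                  [0.90, 0.95, 0.88],
                  [0.99, 1.05, 0.97]])

y_bar = y_rep.mean(axis=1)                  # mean response per condition
sigma2_hat = y_rep.var(axis=1, ddof=1)      # sample variances, equation (4-4)
w = 1.0 / sigma2_hat                        # weights w_i = 1 / sigma_i^2

def weighted_residuals(beta):
    # sqrt(w_i) * (y_i - f(x_i, beta)), so the SSQ equals sum of w_i * e_i^2
    return np.sqrt(w) * (y_bar - model(x, beta))

fit = least_squares(weighted_residuals, x0=[1.0, 0.5])
print(fit.x)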
One of the onsets for a more mathematically founded technique to correct for variance heterogeneity was developed by Box and Hill (1974) [6]. The method finds its origin in the assumption that the error variance is a monotonic function of the expected value of the observation. It explicitly proposes the existence of a power law transformation of the data, given as a function of the a priori unknown transformation parameter 𝜙, which does exhibit a constant variance for all experiments. In general, the statement reads:
$$y_i^{(\phi)} = \begin{cases} \dfrac{|y_i|^{\phi} - 1}{\phi}, & \phi \neq 0 \\ \log(|y_i|), & \phi = 0 \end{cases}, \qquad y_i \neq 0 \qquad \text{(4-5)}$$
when accounting for the suggestion of Pritchard, Downie and Bacon (1977) to expand the
methodology to allow for negative response values [2]. Based on this transformation, a
closed expression for the most suited weighing factors is derived:
$$w_i \propto |\hat{y}_i|^{2\phi - 2} \qquad \text{(4-6)}$$

where $\hat{y}_i$ denotes the predicted response value for the i'th observation, based on the calculated parameter estimates.
It follows that the most suited weighing factors and the parameter estimates are mutually dependent; hence, the finally calculated values of these target variables have to be consistent with each other. This forms the basis of the iteratively reweighted least squares method, an iterative approach in which the weights and the model parameters are repeatedly recalculated in a two-stage routine [4]. The corresponding algorithm, of which a minimal sketch follows the listing, reads:
1. Calculate a preliminary estimate $\hat{\boldsymbol{\beta}}^{*}$ of the model parameters by ordinary least squares minimization;
2. Calculate the weights $w_i^{*}$ in accordance with equation (4-6);
3. Determine an updated estimate $\hat{\boldsymbol{\beta}}^{*}$ by minimizing the weighted sum of squares $\sum_{i=1}^{n} w_i^{*}\,[y_i - f(\boldsymbol{x}_i, \boldsymbol{\beta})]^2$;
4. Recalculate the weights $w_i^{*}$ based on the updated model parameters;
5. Repeat steps 3 and 4 $N-1$ times.
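A minimal Python sketch of such an iteratively reweighted least squares loop is given here, assuming a single-response model and a fixed, known value of 𝜙 in the weight expression of equation (4-6); the model function, the data and the number of cycles are invented.

# Illustrative sketch of iteratively reweighted least squares (IRLS) with
# weights w_i proportional to |y_hat_i|^(2*phi - 2), cf. equation (4-6).
# Data, model and phi are hypothetical.
import numpy as np
from scipy.optimize import least_squares

def model(x, beta):
    return beta[0] * x / (1.0 + beta[1] * x)        # hypothetical rate expression

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = np.array([0.31, 0.52, 0.74, 0.92, 1.01])
phi = 0.5                                            # assumed transformation parameter

# Step 1: preliminary ordinary least squares estimate
beta_hat = least_squares(lambda b: y - model(x, b), x0=[1.0, 0.5]).x

n_cycles = 5                                         # N - 1 reweighting cycles
for _ in range(n_cycles):
    # Steps 2 and 4: weights from the current predictions
    w = np.abs(model(x, beta_hat)) ** (2.0 * phi - 2.0)
    # Step 3: weighted least squares update
    beta_hat = least_squares(
        lambda b: np.sqrt(w) * (y - model(x, b)), x0=beta_hat).x

print(beta_hat)

In line with the remark below that most of the gain is realized in the first iterations, the number of cycles in such a loop is usually kept small.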
A clear value for the optimal number of iterations 𝑁 is not provided in the literature. It is often suggested that major gains in the performance of the weighted regression are found in the first cycles only, and that little improvement is gained by further iterations. For any number of cycles, the asymptotic behavior of the obtained model parameters is well described under the condition that the starting estimate $\hat{\boldsymbol{\beta}}^{*}$ is $\sqrt{n}$-consistent. When this criterion is met, the final parameter estimator $\hat{\boldsymbol{\beta}}$ is asymptotically normally distributed with mean 𝜷 and covariance matrix:
$$\hat{\boldsymbol{V}} = \sigma^2\left[\sum_{i=1}^{n}\hat{w}_i\,\nabla f(\boldsymbol{x}_i,\hat{\boldsymbol{\beta}})\cdot\left[\nabla f(\boldsymbol{x}_i,\hat{\boldsymbol{\beta}})\right]^{T}\right]^{-1} \qquad \text{(4-7)}$$

with $\nabla f(\boldsymbol{x}_i,\hat{\boldsymbol{\beta}})$ the $p \times 1$ gradient vector of the model function evaluated at the parameter estimator, hence $\left[\nabla f(\boldsymbol{x}_i,\hat{\boldsymbol{\beta}})\right]_k = \left.\dfrac{\partial f(\boldsymbol{x}_i,\boldsymbol{\beta})}{\partial \beta_k}\right|_{\boldsymbol{\beta}=\hat{\boldsymbol{\beta}}}$.
Theoretically, the ideal estimation corresponds to iterating until convergence towards a
self-consistent set of weights and model parameters is achieved, which corresponds
to 𝑁 = ∞. In this case, it is suggested not to run the iterative procedure, but to maximize
directly the joint log-likelihood given by:
$$\log[L(\boldsymbol{\beta},\sigma,\phi|\boldsymbol{y})] = -\frac{n}{2}\log(2\pi\sigma^2) + (\phi-1)\sum_{i=1}^{n}\log(y_i) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[y_i^{(\phi)} - f^{(\phi)}(\boldsymbol{x}_i,\boldsymbol{\beta})\right]^2 \qquad \text{(4-8)}$$
to obtain the optimal parameter estimate $\hat{\boldsymbol{\beta}}$ from its mode. Hence, an explicit transformation of both the response data and the model predictions appears, which is why this method is referred to as Power Transformation Both Sides (PTBS) [7, 8]. This is an important remark, since, due to this strict dependence of the response values on the previously unknown value of 𝜙, the implementation of this regression may not be possible in certain statistical software packages.
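A minimal sketch of such a direct maximization is given below for a hypothetical single-response model; the standard deviation is parameterized through its logarithm purely for numerical convenience, and the data, model and starting values are invented.

# Illustrative sketch (hypothetical model and data): direct maximization of the
# joint PTBS log-likelihood of equation (4-8) over (beta, phi, sigma).
import numpy as np
from scipy.optimize import minimize

def model(x, beta):
    return beta[0] * x / (1.0 + beta[1] * x)         # hypothetical rate expression

def power_transform(z, phi):
    # power transformation of equation (4-5); z is assumed nonzero
    return (np.abs(z) ** phi - 1.0) / phi if phi != 0 else np.log(np.abs(z))

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = np.array([0.31, 0.52, 0.74, 0.92, 1.01])

def neg_log_likelihood(theta):
    beta, phi, log_sigma = theta[:2], theta[2], theta[3]
    sigma2 = np.exp(2.0 * log_sigma)
    resid = power_transform(y, phi) - power_transform(model(x, beta), phi)
    n = y.size
    logL = (-0.5 * n * np.log(2.0 * np.pi * sigma2)
            + (phi - 1.0) * np.sum(np.log(y))
            - 0.5 * np.sum(resid ** 2) / sigma2)
    return -logL

start = np.array([1.0, 0.5, 1.0, np.log(0.05)])      # beta0, beta1, phi, log(sigma)
result = minimize(neg_log_likelihood, start, method="Nelder-Mead")
print(result.x)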
Alternatively, Pritchard et al. suggested estimating the parameters by means of a direct Bayesian estimation of the (𝑝 + 1)-dimensional vector (𝜷, 𝜙), especially in the case of highly nonlinear model functions. Taking the Jeffreys prior for the standard deviation 𝜎 and a uniform distribution for the parameter set (𝜷, 𝜙), the joint prior distribution is given by:

$$p(\sigma, \boldsymbol{\beta}, \phi) \propto \sigma^{-1} \qquad \text{(4-9)}$$
The corresponding joint posterior density function is then obtained by integrating out 𝜎,
leading to the expression:
$$p(\boldsymbol{\beta},\phi|\boldsymbol{y}) \propto \sqrt{\prod_{i=1}^{n} w_i}\;\left[\sum_{i=1}^{n} w_i\,[y_i - f(\boldsymbol{x}_i,\boldsymbol{\beta})]^2\right]^{-n/2} \qquad \text{(4-10)}$$
All knowledge about the unknown parameters is described by this distribution. The modal values $(\hat{\boldsymbol{\beta}}, \hat{\phi})$, obtained by maximization of this posterior density function, will eventually serve as point estimates of the parameters. However, in accordance with the Bayesian view on regression, the true statistical inference on both 𝜷 and 𝜙 is found in their highest probability intervals rather than in point estimates; see section 4.3 for more details. Assuming that the vector 𝜸 = [𝜷, 𝜙] approximately obeys a multivariate normal distribution, the associated covariance matrix $\tilde{\boldsymbol{V}}$ reads:
$$\tilde{\boldsymbol{V}} = \{\tilde{V}_{ij}\} = \left\{-\left.\frac{\partial^2 \log\left[p(\boldsymbol{\beta},\phi|\boldsymbol{y})\right]}{\partial\gamma_i\,\partial\gamma_j}\right|_{\hat{\boldsymbol{\gamma}}}\right\}^{-1} \qquad \text{(4-11)}$$
which is the inverse of the Hessian matrix of the negative log-posterior, evaluated at the modal parameter values. Closed analytical expressions are available for linear models.
As for ordinary least squares regression, the model adequacy has to be assessed by
examining the residual plots, by ensuring the physical meaning of the obtained
parameter estimates – care has to be taken that not only the modal values are checked,
but rather the entire probability interval – and by performing a lack-of-fit test.
Comparisons of the Bayesian approach and the PTBS method showed both an increased performance compared to ordinary, unweighted least squares and a high mutual resemblance of the finally calculated parameter estimates [9].
One major drawback of this type of variance modelling is the high sensitivity of the weighing factors, and thus of the final parameter estimates, to the presence of bad experimental data points. At worst, if the weighing procedure assigns a relatively high importance to these observations, the outcome of the regression will be
outperformed by the ordinary least squares method. Over the last decades, several
diagnostic techniques have been developed to distinguish outlier points from the reliable
results, and to deal with them appropriately.
4.1.2 Robust estimation and outlier detection
When fitting a model to a set of experimental data, some points have a higher impact on
the regression than others, e.g., when the associated responses strongly differ from those
of the other measurements. Such points often tend to ‘attract’ the regression curve, i.e.
pull it away from what would be the best-fitting line based on the other measurements,
and therefore strongly influence the final estimates of the model parameters. The origin
of these so-called influential points is two-fold. On the one hand, for some specific process
conditions, typically located at the boundaries of the operational range, the phenomenon
under study will probably behave somewhat unexpectedly, i.e. deviating from what is in
line with the other responses. Since such ‘extreme’ behavior contains the highest amount
of information to unravel all subtleties and details of the physicochemical mechanisms
over the entire range of process conditions, the proper inclusion of such unexpected
results in the final regression procedure is beyond dispute [5]. By applying a suited weighing operation, as suggested above, the excessive dominance of such responses on the fitting procedure is moderated.
In contrast to this class of, in fact valuable, influential points, it is also possible that the
strong deviation of some responses finds its origin in serious errors during the
experiment, e.g., by mistakes in the measurement or analysis steps. The corresponding
observations are intrinsically wrong and do not add any valuable information to the
fitting method. Nevertheless, despite their incorrectness, such measurements will bias
the regression in a similar way as the desired influential data, however, now at the
expense of the reliability of the parameter estimates. Because of their detrimental impact
on the quality of the regression, a proper handling of these measurements is required.
The major issue in dealing with outliers is in their identification. Indeed, as such points
will strongly influence the regression curve, it is not ensured that the actual wrong data
point will be further away from the final fit than the other, correct observations.
Therefore, flagging an outlier based on the inspection of the residuals is often ineffective
[10].
Additionally, literature is not unambiguous about the most effective way to deal with
outlying data points. Several statistical tests have been developed to decide whether an
observation has to be seen as an outlier or not [11]. Since most of them require a
considerable amount of replicate experiments or only detect single outliers, their overall
use in an automated scheme is hindered. Another possible methodology is found in the
field of robust regression, a collective term for all adaptations to the least squares routine
to guard it against violation of the fundamental assumptions underlying regression
theory, including the presence of undesired erroneous data [12]. Although the
information included in outliers is of inferior quality, robust fitting routines do include
those observations in the estimation procedure, yet with a lower ‘weight’ for data which
are located far from the final regression curve. The performance of this rather
conservative approach was reported as insufficient, and its inability to provide reliable
confidence intervals for the parameters is seen as a serious shortcoming. More recently, a
new method has been described which combines the strengths of robust regression with
an automatic removal of outliers. Although the removal of data points which do not fit the expected framework is not uncontroversial, the automated nature of the method prevents the infiltration of ad hoc decisions and the intentional biasing of the regression [13].
The first step of the algorithm consists of a robust regression of the model on the
complete data set. This routine differs from ordinary least squares regression since it
explicitly assumes a Lorentzian distribution of the residuals, which is claimed to be less
sensitive to response values that are located further from the 'ideal' baseline. Indeed, the Lorentzian merit function to be minimized for a set of experimental errors 𝜀 is given by:

$$\sum_{i}\ln\left[1 + \frac{\varepsilon_i^2}{2}\right] \qquad \text{(4-12)}$$
which lowers the contribution of strongly deviating data, i.e. observations for which 𝜀𝑖 is
high.
The robust residuals $e_R$ associated with the experimental data set $\boldsymbol{y}$ and the corresponding model predictions $\hat{\boldsymbol{y}}$ are defined as:

$$e_{R,i} = \frac{y_i - \hat{y}_i}{\sigma_R} \qquad \text{(4-13)}$$

where $\sigma_R$ is the robust standard deviation of the residuals (RSDR), given by:

$$\sigma_R = P_{68}\,\frac{n}{n-p} \qquad \text{(4-14)}$$
with 𝑃68 the 68.27 percentile of the absolute value of the actual residuals 𝑒, 𝑛 the number
of experiments and 𝑝 the number of model parameters to be estimated. The final
objective function to be minimized by the regression procedure is given by:
$$\sum_{i}\ln\left[1 + e_{R,i}^2\right] \qquad \text{(4-15)}$$
which is slightly different from the true merit function in equation (4-12) and therefore
more suited for robust estimation purposes. It is important to note that no user-specified
weighing factors have to be included, as this has a negative impact on the regression
quality for robust techniques. The local minimization of the objective function is done
via, e.g., the Levenberg-Marquardt algorithm. Since 𝜎𝑅 changes as the routine converges, the objective function values of two subsequent iterations have to be compared using the same value of 𝜎𝑅; before the improvement of the goodness of fit is determined, it is hence required to recalculate the merit function of the prior iteration with the most recent value of 𝜎𝑅.
Once the regression has converged and preliminary parameter estimates have been
determined, the corresponding true residuals are calculated. Their absolute values are
then ordered from lowest to highest. It is suggested to set the maximum fraction of
outlying data at 30%; therefore, only those data points with the 30% highest residuals are
subjected to the outlier detection analysis. This procedure consists of an iterative cycle,
according to the following algorithm, for all data points:
For all 𝑖 from ⌈0.7𝑛⌉ to 𝑛
1. Calculate the parameter:

$$\alpha_i = \frac{0.01\,[n - (i-1)]}{n} \qquad \text{(4-16)}$$

and the variable:

$$\hat{t} = \frac{|e_i|}{\sigma_R} \qquad \text{(4-17)}$$

2. Determine the two-tailed P value from the t-distribution with 𝑛 − 𝑝 degrees of freedom, corresponding to $\Pr(|t| > \hat{t})$;
3. If 𝑃 < 𝛼𝑖, data point 𝑖 and all observations with higher residuals are outliers and have to be removed, and the iterative cycle is stopped;
Else, include data point 𝑖 in the set of reliable observations and repeat the cycle for observation 𝑖 + 1. If 𝑖 = 𝑛, there are no outliers in the data set (see the sketch below).
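A minimal Python sketch of the two stages is given here for illustration: a robust fit with the Lorentzian merit function of equation (4-15), followed by the detection loop of equations (4-16) and (4-17). The model function and the data, including one deliberately corrupted point, are hypothetical.

# Illustrative sketch: Lorentzian robust fit followed by the iterative outlier
# detection of equations (4-16)-(4-17). Model and data are hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as t_dist

def model(x, beta):
    return beta[0] * x / (1.0 + beta[1] * x)        # hypothetical model

x = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([0.30, 0.52, 0.73, 0.82, 0.91, 0.97, 1.60, 1.03])   # one bad point
p = 2

def rsdr(resid):
    # robust standard deviation of the residuals, equation (4-14)
    return np.percentile(np.abs(resid), 68.27) * resid.size / (resid.size - p)

def robust_objective(beta):
    # for simplicity sigma_R is recomputed at every evaluation, a simplification
    # of the stepwise update described in the text
    resid = y - model(x, beta)
    e_r = resid / rsdr(resid)                       # robust residuals, eq. (4-13)
    return np.sum(np.log1p(e_r ** 2))               # Lorentzian merit, eq. (4-15)

beta_hat = minimize(robust_objective, x0=[1.0, 0.5], method="Nelder-Mead").x

# outlier detection on the 30 % largest absolute residuals
resid = y - model(x, beta_hat)
sigma_r = rsdr(resid)
order = np.argsort(np.abs(resid))                   # lowest to highest
n = y.size
outliers = []
for rank in range(int(np.ceil(0.7 * n)), n):
    i = order[rank]
    alpha_i = 0.01 * (n - rank) / n                 # equation (4-16)
    t_hat = np.abs(resid[i]) / sigma_r              # equation (4-17)
    p_val = 2.0 * t_dist.sf(t_hat, df=n - p)
    if p_val < alpha_i:
        outliers = [order[r] for r in range(rank, n)]
        break
print("flagged outliers:", outliers)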
Once the data set has been purified of the detected outliers, the actual parameter estimation is performed on the remaining data by means of weighted least squares routines, like those introduced in section 4.1.1.
4.2 Accounting for serial correlation of the error
A common procedure to study a chemical reaction in a batch reactor setup is by setting a
number of desired experimental conditions and, subsequently, following the evolution
for each of these systems over time, by periodically taking samples of the reaction
mixture. Since there is no variable feed stream, varying the clock time is the only way to
assess the effect of the residence time of the reaction mixture in the reactor.
Due to the absence of a continuous flow through the reactor, any fluctuation in the system will persist, as ideally no exchange with the environment takes place. This holds in particular for the variables that are required to elucidate the underlying kinetics: any source of experimental error that appears at a certain moment in time will build up in the reactor and influence the future dynamics of the system. Ultimately, the experimental
results at any following moment will show some trend with respect to the past, or, stated
otherwise, it is expected for any data point to be correlated to the result of the foregoing
measurement.
Experimental data showing a distinct and persistent trend over time are said to be serially correlated. The occurrence of serial correlation in time series data is a well-known phenomenon, which is often easily detected [5]. However, tackling the issue is not that straightforward, as it requires the introduction of an additional error model, which in turn causes the nonlinear regression procedure to become more complex. Moreover, since the exact expression for the time dependence of the experimental error is not available, both the diagnostics and the remedies are inevitably restricted to, though well-founded, approximations.
One of the requirements that have to be fulfilled to ensure that an ordinary least squares
regression returns reliable and accurate estimations of the unknown kinetic parameters,
is the mutual independence of the experimental errors associated with the output of the
performed measurements. Only if this criterion is met will the regression procedure return both the best (in terms of the maximization of the likelihood function) and the most efficient (in terms of the amount of required information input) estimates for the model parameters [14]. However, for an increasing degree of mutual correlation of the errors on the experimental output, the improvident application of the ordinary least squares technique to a non-linear model will result in parameter estimates that are strongly biased and inefficient, with estimated variances that deviate from their actual values in an unpredictable manner [1]. To circumvent these potential pitfalls, a measure
for the degree of serial correlation for a certain situation has to be defined, and a suitable
correction procedure has to be applied.
4.2.1 Explicit modelling of the serial correlation of the error term
One possible way to model the time-dependent experimental error 𝜀(𝑡) of a series of
continuously measured data that are mutually correlated over time is by a so-called
autoregressive model of order 𝑞, i.e. 𝐴𝑅(𝑞):
$$\varepsilon(t) = \sum_{i=1}^{q}\rho_i\,\varepsilon(t-i) + u(t) \qquad \text{(4-18)}$$
By this definition, besides the term 𝑢(𝑡), part of the experimental error at a certain moment 𝑡 consists of a contribution of the error terms at foregoing moments, with the order 𝑞 of the autoregression denoting the number of time steps incorporated. By assuming the
time series to be weakly stationary, the prefactor 𝜌𝑖, named the autocorrelation function
of the error, is given by:
𝜌𝑖 ≔ 𝜌(𝑖) = 𝑐𝑜𝑟𝑟[𝜀(𝑡), 𝜀(𝑡 − 𝑖)], ∀𝑡 (4-19)
and depends only on the time shift 𝑖 and not on the absolute clock time passed since the
start of the measurements. Hence, the correlation between the errors on two different
data points only depends on the distance in time that separates them [1]. Because of
equation (4-19), the following restriction on 𝜌𝑖 holds:
0 ≤ |𝜌𝑖| ≤ 1, ∀𝑖 (4-20)
The signal 𝑢(𝑡) in equation (4-18) contains the contribution to the experimental error that originates solely from the moment of sampling itself. To bridge the gap with the experimental error under the classical assumptions for which ordinary least squares is valid, the signal is assumed to behave like white noise, hence obeying:
$$E(u(t)) = 0; \qquad Var(u(t)) = \sigma_u^2; \qquad Cov(u(t), u(s)) = 0, \quad \forall t \neq s$$
The first criterion expresses the unbiasedness of the noise signal by stating that its time
average value is equal to zero; any instantaneous deviation of the value of the measured
variable is hence compensated over time. The requirement on the variance of the signal
limits the uncertainty of the signal to a finite and constant value 𝜎𝑢. The last characteristic
reflects the zero autocorrelation of the white noise at any moment of measuring. Stating
things this way, it follows readily that the white noise contribution is equivalent to the
experimental error under the classical assumptions that are made for the application of
ordinary least squares regression.
Due to its simplicity and its satisfactory capability to tackle the major part of the serial
correlation issue, the 𝐴𝑅(1) model is the most commonly used technique to describe the
relation between the experimental errors at different moments, allowing for the detection of a significant degree of serial correlation either graphically or by calculating a closed-form test criterion. The general definition in equation (4-18) hence simplifies
to:
𝜀(𝑡) = 𝜌𝜀(𝑡 − 1) + 𝑢(𝑡) (4-21)
so that only one autocorrelation function needs to be determined.
By plotting the couples of residuals [𝑒(𝑖), 𝑒(𝑖 − 1)] in a so-called lag-plot obtained by
ordinary least squares regression for all data points of the time series, a potential mutual
correlation of the errors will be readily visible in the form of a distinct trend, whereas a
chaotic scattering of the data points is characteristic for uncorrelated noise, as is shown in
Figure 4-2.
Figure 4-2 Typical lag-plots of the residuals of an uncorrelated (left) and
positively, first-order correlated (right) data set
Besides, a distinct test criterion has been developed to quantify whether the degree of serial correlation is sufficiently high to be considerable. The Durbin-Watson test was originally derived to detect 𝐴𝑅(1) relations between the experimental errors of linear models, but meanwhile its approximate validity for nonlinear models has been described as well. Moreover, signals showing an order of time dependence higher than 1 also fail the test, making it a robust tool to diagnose serial correlation of higher order as well [5].
The starting point of the Durbin-Watson analysis of independence is the test statistic:
$$d = \frac{\sum_{i=2}^{n}[e(i) - e(i-1)]^2}{\sum_{i=1}^{n}[e(i)]^2} \qquad \text{(4-22)}$$
Hence, the higher the positive correlation between the residuals, the lower the value of the test statistic becomes. The thus obtained value serves as a criterion to determine the significance of the serial correlation: the null hypothesis of independence, 𝐻0: 𝜌 = 0, is compared to the alternative hypotheses 𝐻𝑎1: 𝜌 > 0 and 𝐻𝑎2: 𝜌 < 0. The null hypothesis is rejected with a certainty 1 − 𝛼 if 𝑑 < 𝑑𝐿,𝛼, accepted if 𝑑 > 𝑑𝑈,𝛼, and the test is inconclusive if 𝑑𝐿,𝛼 < 𝑑 < 𝑑𝑈,𝛼. Numerical values for the critical numbers 𝑑𝐿,𝛼 and 𝑑𝑈,𝛼 depend on both the number of experimental data points and the number of regression parameters and are tabulated in literature [15, 16]. To circumvent the issue of a region for which no reliable conclusion can be drawn, it is suggested to treat it as part of the rejection zone.
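As an illustration, the fragment below computes the test statistic of equation (4-22) from a set of residuals and compares it to tabulated bounds; the residuals and the critical values 𝑑𝐿,𝛼 and 𝑑𝑈,𝛼 used here are hypothetical placeholders for values that would be looked up in the tables of [15, 16].

# Illustrative sketch: Durbin-Watson statistic of equation (4-22) from OLS residuals.
# Residuals and critical bounds are hypothetical.
import numpy as np

def durbin_watson(residuals):
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

residuals = np.array([0.12, 0.10, 0.07, 0.01, -0.03, -0.06, -0.08, -0.05, 0.02, 0.06])
d = durbin_watson(residuals)

d_L, d_U = 0.88, 1.32      # hypothetical tabulated bounds for this n, p and alpha
if d < d_L:
    verdict = "reject H0: significant positive serial correlation"
elif d > d_U:
    verdict = "accept H0: no significant serial correlation"
else:
    verdict = "inconclusive (here treated as part of the rejection zone)"
print(f"d = {d:.2f}: {verdict}")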
Once the significance of serial correlation has been ascertained, a procedure is started to
properly correct for the presence of correlation in the errors. Two different pathways are
available: the first constructs an adapted form of the OLS likelihood function and yields,
after maximization, an asymptotically efficient estimator of the kinetic parameters.
Alternatively, the choice may fall on an iterative scheme, performing a cyclic calculation
in which the regression parameters and the autocorrelation function are updated until
finally convergence is reached.
Given that equation (4-19) holds for the experimental error at every sampling point, the
covariance of the errors of the measurements separated by a time distance k is given by:
$$cov(\varepsilon(t), \varepsilon(t-k)) = \rho^{k} \qquad \text{(4-23)}$$
Therefore, the covariance matrix of the vector of experimental errors, all following
an 𝐴𝑅(1) model, is given by:
$$\boldsymbol{V} := \boldsymbol{V}(\boldsymbol{\varepsilon}) = \frac{\sigma_u^2}{1-\rho^2}\begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{bmatrix} \qquad \text{(4-24)}$$
with 𝜀 = [𝜀(1)… 𝜀(𝑛)]𝑇.
The joint probability density of the vector of experimental errors 𝜀, still assumed to
follow a multidimensional normal distribution, is then given by:
$$p(\boldsymbol{\varepsilon}|\boldsymbol{\beta},\rho) = \frac{1}{\sqrt{(2\pi)^{n}|\boldsymbol{V}|}}\,\exp\{-S(\boldsymbol{\beta},\rho)\} \qquad \text{(4-25)}$$
with the extended sum of squares 𝑆(𝜷, 𝜌) given by:
$$S(\boldsymbol{\beta},\rho) = \boldsymbol{\varepsilon}^{T}\boldsymbol{V}^{-1}\boldsymbol{\varepsilon} = [\boldsymbol{y}-\boldsymbol{f}(\boldsymbol{x},\boldsymbol{\beta})]^{T}\boldsymbol{V}^{-1}[\boldsymbol{y}-\boldsymbol{f}(\boldsymbol{x},\boldsymbol{\beta})] = \frac{1}{\sigma_u^2}\left[(1-\rho^2)\,[y_1 - f(\boldsymbol{x}_1,\boldsymbol{\beta})]^2 + \sum_{i=2}^{n}\left[y_i - f(\boldsymbol{x}_i,\boldsymbol{\beta}) - \rho\,\big(y_{i-1} - f(\boldsymbol{x}_{i-1},\boldsymbol{\beta})\big)\right]^2\right] \qquad \text{(4-26)}$$
the latter expression resulting after the introduction of (4-24) and elaborating. As in
ordinary least squares, maximization of the joint probability density yields the maximum
likelihood estimates of the kinetic parameters 𝜷, the autocorrelation function 𝜌 and the
white noise variance 𝜎𝑢2 simultaneously. Under stringent restrictions, the consistency,
asymptotic normality and independence of the regression parameters are established.
As an alternative to the maximization of the joint probability density, an iterative procedure can be adopted (a minimal sketch is given below). A possible scheme starts with a regression of the nonlinear model by ordinary least squares. The set of residuals 𝑒 obtained this way serves as the basis to determine an approximation of the autocorrelation 𝜌:

$$\hat{\rho} = \frac{\sum_{i=2}^{n} e(i)\,e(i-1)}{\sum_{i=1}^{n}[e(i)]^2} \qquad \text{(4-27)}$$
being the value of 𝜌 that minimizes equation (4-26) for the set of kinetic parameters
obtained by OLS. Given this value, the regression parameters are updated by finding the
value that minimizes the approximated sum of squares:
$$\hat{S}(\boldsymbol{\beta}) = [\boldsymbol{y}-\boldsymbol{f}(\boldsymbol{x},\boldsymbol{\beta})]^{T}\hat{\boldsymbol{V}}^{-1}[\boldsymbol{y}-\boldsymbol{f}(\boldsymbol{x},\boldsymbol{\beta})] = \frac{1}{\sigma_u^2}\left[(1-\hat{\rho}^2)\,[y_1 - f(\boldsymbol{x}_1,\boldsymbol{\beta})]^2 + \sum_{i=2}^{n}\left[y_i - f(\boldsymbol{x}_i,\boldsymbol{\beta}) - \hat{\rho}\,\big(y_{i-1} - f(\boldsymbol{x}_{i-1},\boldsymbol{\beta})\big)\right]^2\right] \qquad \text{(4-28)}$$
This cycle is repeated until convergence. Under certain regularity conditions, the thus obtained set of kinetic parameters shares the same asymptotic distribution as the results of the direct maximization of the likelihood function, and the covariance matrix of the estimated model parameters follows accordingly.
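One possible implementation of this iterative scheme is sketched below, assuming a single response measured as one time series with an 𝐴𝑅(1) error structure; the model function and the synthetically generated data are purely illustrative.

# Illustrative sketch of the iterative AR(1) correction (equations 4-27 and 4-28):
# alternate between estimating rho from the residuals and re-estimating beta from
# the 'whitened' sum of squares. Model and data are hypothetical.
import numpy as np
from scipy.optimize import least_squares

def model(t, beta):
    return beta[0] * (1.0 - np.exp(-beta[1] * t))    # hypothetical batch response

t = np.linspace(0.5, 10.0, 20)
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 0.4])
noise = np.zeros(t.size)                              # synthetic AR(1) noise, for
for i in range(1, t.size):                            # self-containedness only
    noise[i] = 0.6 * noise[i - 1] + rng.normal(scale=0.02)
y = model(t, beta_true) + noise

# start from an ordinary least squares estimate
beta_hat = least_squares(lambda b: y - model(t, b), x0=[0.8, 0.3]).x

for _ in range(10):                                   # iterate until convergence
    e = y - model(t, beta_hat)
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)     # equation (4-27)

    def whitened_residuals(b):
        r = y - model(t, b)
        first = np.sqrt(1.0 - rho ** 2) * r[0]        # (1 - rho^2) * r_1^2 term
        rest = r[1:] - rho * r[:-1]                   # (r_i - rho * r_{i-1}) terms
        return np.concatenate(([first], rest))        # SSQ equals equation (4-28)

    beta_hat = least_squares(whitened_residuals, x0=beta_hat).x

print(beta_hat, rho)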
4.2.2 Second-order statistical regression
An alternative approach to deal with the cross-correlation between the experimental errors in time series data has recently been developed by Roelant et al. [17]. Their statistical technique, called Second-Order Statistical Regression, is mathematically well-founded and enables a more or less explicit estimation of the error variance matrix based on replicate experiments. The most relevant aspects of this methodology are briefly discussed below.
Let the matrix 𝒚 ∈ ℝ𝑛×𝑣 represent all observations for 𝑛 sets of experimental conditions
with 𝑣 measured responses, so that:
$$\boldsymbol{y} = \begin{bmatrix} y_{11} & \cdots & y_{1v} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nv} \end{bmatrix} \qquad \text{(4-29)}$$
where each element 𝑦𝑖𝑗, 𝑖 = 1. . 𝑛, 𝑗 = 1. . 𝑣 contains all data points obtained during time-
series measurements at 𝑛𝑡 subsequent moments:
$$y_{ij} = \begin{bmatrix} y_{ij}(t_1) \\ \vdots \\ y_{ij}(t_{n_t}) \end{bmatrix} \qquad \text{(4-30)}$$
If 𝑛𝑟𝑖 replicate experiments are available for each set of experimental conditions:
$$y_{ij}:\; y_{ij}^{(1)},\, y_{ij}^{(2)},\, \ldots,\, y_{ij}^{(n_{r_i})} \qquad \text{(4-31)}$$
the approximate 𝑛𝑡 × 𝑛𝑟𝑖 error matrix 𝐸𝑖𝑗 of the time series is defined as:
$$E_{ij} = \begin{bmatrix} \vdots & & \vdots \\ y_{ij}^{(1)} - \bar{y}_{ij} & \cdots & y_{ij}^{(n_{r_i})} - \bar{y}_{ij} \\ \vdots & & \vdots \end{bmatrix} \qquad \text{(4-32)}$$
with $\bar{y}_{ij}$ the average observation for that particular time series. The variance matrix of the actual experimental error of the time series $V(\varepsilon_{ij})$ is then approximately given by the sample variance matrix:
$$\hat{V}(\varepsilon_{ij}) = \frac{1}{n_{r_i}-1}\,E_{ij}E_{ij}^{T} \qquad \text{(4-33)}$$
Since this matrix is symmetric and positive definite, there exists an eigenvalue
decomposition:
$$\hat{V}(\varepsilon_{ij}) = \hat{U}_{ij}\cdot\hat{\Lambda}_{ij}\cdot\hat{U}_{ij}^{T} \qquad \text{(4-34)}$$
where $\hat{\Lambda}_{ij}$ is the $n_t \times n_t$ diagonal matrix with the positive eigenvalues of $\hat{V}(\varepsilon_{ij})$ as diagonal elements, ordered from high to low. The $n_t \times n_t$ matrix $\hat{U}_{ij}$ contains the associated eigenvectors as its columns. When transforming the experimental data sets according to:

$$y_{ij}' = \hat{U}_{ij}^{T}\cdot y_{ij} \qquad \text{(4-35)}$$
the components of the associated transformed error vector $\varepsilon_{ij}'$ are given by:

$$\varepsilon_{ij}' = \hat{U}_{ij}^{T}\cdot \varepsilon_{ij} \qquad \text{(4-36)}$$

and are mutually uncorrelated, since the corresponding variance matrix $\hat{V}(\varepsilon_{ij}')$ reads:

$$\hat{V}(\varepsilon_{ij}') = \hat{\Lambda}_{ij} \qquad \text{(4-37)}$$
Moreover, after performing an additional scaling operation on the transformed data:

$$y_{ij}'' = \hat{\Lambda}_{ij}^{-1/2}\cdot\hat{U}_{ij}^{T}\cdot y_{ij} \qquad \text{(4-38)}$$

the resulting error vector:

$$\varepsilon_{ij}'' = \hat{\Lambda}_{ij}^{-1/2}\cdot\hat{U}_{ij}^{T}\cdot \varepsilon_{ij} \qquad \text{(4-39)}$$

becomes virtually homoscedastic, as the corresponding variance matrix reads:

$$\hat{V}(\varepsilon_{ij}'') = I_{n_{r_i}-1} \qquad \text{(4-40)}$$
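For illustration, the fragment below constructs the error matrix of equation (4-32), the sample variance matrix of equation (4-33) and the scaled, approximately decorrelated responses of equation (4-38) for a single hypothetical time series with three replicates; only the eigenvectors associated with the non-zero eigenvalues are retained, reflecting the dimension reduction mentioned below.

# Illustrative sketch of the decorrelating transformation (equations 4-32 to 4-38)
# for a single time series measured in n_r replicate experiments. Data are hypothetical.
import numpy as np

# rows: n_t sampling times, columns: n_r replicate runs
y_reps = np.array([[0.10, 0.12, 0.11],
                   [0.35, 0.38, 0.33],
                   [0.55, 0.60, 0.52],
                   [0.71, 0.75, 0.69],
                   [0.80, 0.86, 0.78]])
n_t, n_r = y_reps.shape

y_mean = y_reps.mean(axis=1, keepdims=True)
E = y_reps - y_mean                          # error matrix, equation (4-32)
V_hat = E @ E.T / (n_r - 1)                  # sample variance matrix, eq. (4-33)

eigval, eigvec = np.linalg.eigh(V_hat)       # eigendecomposition, eq. (4-34)
order = np.argsort(eigval)[::-1]             # order eigenvalues from high to low
keep = order[: n_r - 1]                      # keep the n_r - 1 non-zero eigenvalues
lam, U = eigval[keep], eigvec[:, keep]

# transformed and scaled responses, equations (4-35) and (4-38)
y_prime = U.T @ y_reps
y_double_prime = np.diag(lam ** -0.5) @ U.T @ y_reps
print(y_double_prime)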
The pitfall of the analysis lies in the approximate character of the variance matrix
obtained above. Hence, the above equation does not hold unambiguously for the actual,
unknown error variance matrix, and reservation has to be made when it is concluded
from the analysis above that:
$$V(\varepsilon_{ij}'') \approx I_{n_{r_i}-1} \qquad \text{(4-41)}$$
The homoscedasticity of the errors associated with different measurements in the same
time-series and their mutual independence is therefore only approximately valid when
regressing the data points modified by the suggested transformations. Moreover, due to the dimension reduction during the transformation, some statistical information is inevitably lost, which results in wider individual confidence intervals for the model parameters.
4.3 Bayesian statistical assessment
4.3.1 A Bayesian view on parameter estimation
Classical approaches to determining the unknown parameters of an – often nonlinear –
model rely fundamentally on the idea that each of these variables has a fixed value,
waiting for elucidation by the experimenter or regressor. To obtain them, typically
experimental results are collected for a wide range of process conditions, which quantify
the impact of different input parameters on the process under study. Based on that
limited data set, the proposed model curve, stating an underlying mechanism derived
from theoretical principles, is fitted by varying the unknown model parameters that have
to be determined. The set of parameter values which minimizes the objective function of
the regression is then considered as the most likely estimate for the unknown model
parameters. The predicted responses calculated based on the resulting model statement
will differ from the experimentally obtained values, which allows for the determination
of confidence intervals around the point estimates, where the model parameters will be
located with a certain probability. Calculation of these regions requires the explicit allocation of a specific probability distribution to the model parameters and, subsequently, a linearization of the model function in the neighbourhood of the point estimates. Logically, the broader these intervals, the lower the quality of the regressed parameters will be. Predictions of the future outcome for different process conditions are then based on these point estimates.
This procedure is the outcome of the frequentist view on parameter estimation, treating the actual, unknown model parameters as fixed values, derivable from experimental data as long as a sufficient number of observations is collected [18]. Although widely applied,
the technique is vulnerable to some fundamental criticism on the validity of its statistical
framework. As an alternative, the Bayesian approach to parameter estimation reasons
from an entirely different starting point [13, 19]. In contrast to the frequentist maximum
likelihood estimation, it considers the exact determination of the unknown model
parameters as impossible when only a limited set of experimental data is available.
Indeed, since information on the full response behaviour of the model, i.e. over the entire
range of possible process conditions, is incomplete, only approximate conclusions are
possible from any statistical analysis. Model parameters are therefore better treated like
statistical variables, characterized by a probability distribution, rather than by exact
values. The actual inference obtained from a Bayesian estimation procedure is hence a
confidence interval comprising a certain, user-defined probability density, rather than a
point estimate. Since these intervals are determined based on a sampling routine, as will
be discussed below, rather than a local linearization of the model function, the need for
forcing the unknown model parameters into a multivariate normal distribution is
avoided.
The interpretation of the model parameter vector as a statistical variable allows for
assessing its estimation by means of classical principles of probabilistic calculus. As will
follow from the more detailed elaborations below, this approach enables an elegant, and
therefore attractive, solution to cope with the unknown correlational structure of the
experimental error. Hence, Bayesian estimation does not require the stringent restrictions
on the regularity of the error like classical regression schemes. Additionally, Bayesian
estimation enables a more explicit capture of the available knowledge and insights about
the studied phenomenon before any experiment has been conducted. The experimenter
does have some so-called prior information about the range in parameter space where the
final parameter values are most probably located, based on personal research experience
or available literature. For example, both the pre-exponential factors and the activation
energy present in rate coefficients appearing in chemical reaction networks, require a
positive value to be physically meaningful. Moreover, the order of magnitude of these
parameters is typically readily obtained from literature, by comparison of the studied
case to similar studies performed in the past. This way, the candidate regions in parameter space in which the pursued set of kinetic parameter estimates is located are drastically reduced. This particular feature of Bayesian estimation strongly differs from regression analysis, as the only way to include prior experience in this
classical approach is by choosing a suitable initial guess for the local minimization
routine, based on for example the Levenberg-Marquardt algorithm. It goes without
saying that this rather indirect way of transferring readily available insights about the
model into the estimation procedure is far less efficient than the Bayesian approach.
Nevertheless, it can be argued that the incorporation of prior information entails the risk of adding wrong insights to the estimation procedure as well. Hence, though often presented as a beneficial feature, it will potentially hinder the statistical analysis
instead of improving it. Unfortunately, at the moment of writing no benchmark analysis
on the impact of bad prior information on the performance of Bayesian methods was
available. Hence, a cautious approach, like those discussed below, has to be followed.
4.3.2 Bayesian parameter estimation
Named after the pioneering investigator of conditional statistics, the Bayesian approach
towards the estimation of parameters in modelling is based on the well-known statement
of conditional probability, which reads:
𝑝(𝐴|𝐵)𝑝(𝐵) = 𝑝(𝐵|𝐴)𝑝(𝐴) (4-42)
where 𝑝(𝐴|𝐵) denotes the probability that an event A is observed when it is given that
event B happens, or, stated differently, the chance of event A occurring conditional on
event B. The reformulation of the estimation of unknown model parameters from a
limited number of experimental data into such a conditional framework follows quite
straightforwardly when approaching it from a Bayesian point of view. Indeed, since a
Bayesian philosophy accounts explicitly for the intrinsic uncertainty on the final findings
about the model parameters obtained from the estimation procedure, the relation of these
estimated confidence intervals and the experimental data set they are being obtained
from is very strong. Speaking in conditional terms, the results of the estimation
procedure are hence conditional to the choice of that particular set of observations that
was fed to the routine. When adopting this way of reasoning, the following statement
expressing the mutual dependency of the experimental data 𝒚 (an 𝑛 ×𝑚 matrix in
general, 𝑛 being the number of experiments performed, 𝑚 denoting the number of
outputs measured for each experiment) and the set of unknown model parameters 𝜷
holds:
$$p(\boldsymbol{\beta}|\boldsymbol{y}) = \frac{p(\boldsymbol{y}|\boldsymbol{\beta})\,p(\boldsymbol{\beta})}{p(\boldsymbol{y})} \qquad \text{(4-43)}$$
for a non-linear model given in general terms as:
𝑦𝑖𝑗 = 𝑓𝑗(𝒙𝒊, 𝜷) (4-44)
which describes the j’th output for the i’th performed experiment, for experimental
conditions given by the vector 𝒙𝑖.
The prior probability density function 𝑝(𝜷) reflects the knowledge or intuition of the
researcher about the yet unknown parameters before any experiment has been
performed. Therefore, this statistical function does not depend on the experimental
output 𝑦. Prior information is expressed in different ways, depending on the level of the
insights present at the moment of the analysis. In the limiting case in which no
information is available, the choice falls on an unprejudiced prior function which assigns
an equal likelihood to every point of the parameter space, or, in particular, of those
regions of parameter space that have not been discarded based on prior beliefs.
Analytical expressions for these so-called non-informative priors are available based on the
theory developed by Jeffreys [20]. Those priors that are suggested for implementation in
a regression strategy for physicochemical modelling will be briefly discussed in what
follows. Likewise, any prior density function that does favour a certain region in
parameter space compared to others is being referred to as informative. It goes without
saying that these latter functions are of main interest for application when having an
advanced insight in the physicochemical phenomenon under study. It is worth
mentioning that, although prior functions are named density functions, it is not strictly
required that these functions are normalized. As long as the combination of prior
distribution and likelihood function yields an integrable and normalized posterior
density, the prior is free to take any reasonable form. Prior functions that do not obey the
criterion of normalization, e.g., a uniform density function for all positive parameter
values, are called improper. Likewise, priors that do integrate up to a finite value in
parameter space are called proper.
The factor 𝑝(𝒚|𝜷) has already been touched upon briefly in the above introduction of the prior density function. It represents the probability of obtaining a certain experimental output set if the unknown model parameter values were nevertheless known. To resolve this paradoxical situation, this probability function is handled by means of Fisher's interpretation of the likelihood function 𝐿(𝜷|𝒚) in case the experimental output is known.
This analysis showed that the following equality holds:
𝐿(𝜷|𝒚) ≔ 𝑝(𝒚|𝜷) (4-45)
which tackles the issue of the problematic interpretation of this factor. In contrast to the
original probability density, expressions for likelihood functions are available and their
characteristics are well understood. A more refined discussion of the likelihood functions
in case of physicochemical modelling is given below.
The final probability density that demands elucidation is the factor in the denominator of the right-hand side of equation (4-43). As was mentioned above, this probability function depends on the experimental output alone, not on the unknown model parameters. An explicit expression for this function is given by:

$$p(\boldsymbol{y}) = \int_{\boldsymbol{\beta}} p(\boldsymbol{y}|\boldsymbol{\beta})\,p(\boldsymbol{\beta})\,d\boldsymbol{\beta} \qquad \text{(4-46)}$$
Although this equation follows directly from the normalization criterion on the posterior
density function, the practical calculation of this integral is a non-trivial task when
studying non-linear models. Indeed, while it is demonstrated theoretically that the
likelihood function of the parameters for linear models takes a nice multivariate normal
form, the likelihood for their non-linear counterparts can be highly irregular in turn, as
illustrated in Figure 4-3. Therefore, the calculation of a closed analytical solution for
expressions like (4-46) is strongly hindered.
Figure 4-3 Typical likelihood distribution for the parameters in a linear (left) and non-
linear model (right) with two parameters
The posterior density function 𝑝(𝜷|𝒚) captures all available information, improving the
original, prior beliefs about the model parameters with the inference provided by the
executed experiments. Logically, any statistical analysis about the model parameters will
be based on the behavior and characteristics of this probability distribution. Given the
difficulty of calculating the integral in equation (4-46), the original expression for the
posterior density function is often shortened to:
𝑝(𝜷|𝒚) ∝ 𝐿(𝜷|𝒚)𝑝(𝜷) (4-47)
i.e., due to the lack of knowledge about the exact value of 𝑝(𝒚), the equality is replaced by a proportionality statement. Indeed, any study and analysis based on this new expression will yield the same qualitative information as the original statement, except for a constant scaling factor. This feature is commonly used in Bayesian routines as implemented in statistical software, e.g., Athena Visual Studio. Unfortunately, as will be
extensively discussed below, the Bayesian analysis in these simulation packages will
often stop at this point due to an inability to calculate the posterior density function
efficiently. To cope with this issue, the routines will assume explicitly that the unknown
posterior density function takes a predetermined, easily manipulable form. It is believed
that any statistical findings for the model parameters, e.g. on their confidence intervals,
which are based on that associated expression, will resemble the true statistical inference
from the original posterior closely.
Because it is intuitively clear that the correctness of the above strategy is doubtful, and
because some computational methods have been developed that get the calculation of
equation (4-46) within reach, at least approximately, a different approach will be
described in what follows. Combining the best of both worlds, the method will be based
on the strong theoretical framework which underlies the Bayesian routines in Athena
Visual Studio, followed by a quantitative assessment of the posterior density function by
means of Markov Chain Monte Carlo (MCMC) sampling schemes. These techniques,
with the Metropolis-Hastings and Gibbs algorithms as the most prominent exponents,
have been frequently suggested in literature as an elegant and computationally efficient way
to approximate the posterior density function by evaluating deliberately taken samples
from parameter space [21, 22]. A more extended discussion of sampling methods will be
given in Section 4.3.4.2.
4.3.3 Posterior density distribution for relevant scenarios in
kinetic parameter estimation
The determination of an analytical expression for the likelihood that a particular set of
values equals the unknown model parameters based on a given set of experimental data,
is completely similar to the analysis underlying the classical maximum likelihood
approach. Assuming an additive error model, the link between the actually observed
response and the model prediction is given by:
𝑦𝑖𝑗 = 𝑓𝑗(𝒙𝒊, 𝜷) + 𝜀𝑖𝑗 (4-48)
for the 𝑗’th response and the 𝑖’th experiment, where 𝜀𝑖𝑗 gives the corresponding
experimental error. The error matrix 𝜺 = {𝜀𝑖𝑗}𝑖=1..𝑛𝑗=1..𝑚
is assumed to obey a multivariate
normal distribution, with expected value 𝟎𝑛×𝑚 and an a priori unknown covariance
matrix 𝑽. By allowing for a multi-response character of the model to keep the scope of the
discussion as wide as possible, this covariance matrix is 4-dimensional in general, with
𝑽 = {𝑉𝑖𝑗𝑘𝑙} = {𝐸(𝜀𝑖𝑗𝜀𝑘𝑙)}𝑖,𝑘=1..𝑛𝑗,𝑙=1..𝑚
∈ ℝ𝑛×𝑚×𝑛×𝑚, which strongly complicates the
mathematical framework of the statistical analysis. Several authors have been working
on the solution of this issue and proposed additional assumptions on the error
covariance structure to obtain closed and practically useful analytical expressions [23-25].
Two of those approaches have been reported to be useful in practice, and will therefore
be discussed below. The second introduces an additional level of complexity compared to the first, allowing for a more general application at the expense of a tougher implementation.
A first step in the reduction of the high-dimensionality of the error covariance matrix 𝑽
was suggested by Box and Draper (1965) and further elaborated by Stewart et al. (1981)
[23, 26]. Therein, it is assumed that the covariance matrix of the experimental errors
between the different responses of one particular experiment is equal for all experiments,
up to a scaling factor. In short:
$$\boldsymbol{\Sigma}_i = \{E(\varepsilon_{ij}\varepsilon_{il})\}_{j,l=1..m} = \frac{1}{w_i}\,\boldsymbol{\Sigma}, \qquad i = 1..n \qquad \text{(4-49)}$$
where 𝚺𝑖 = 𝑽𝑖,:,𝑖,:. 𝑤𝑖 is the weighing factor corresponding to the 𝑖’th experiment and has
to be specified by the user explicitly. All correlations between observations from different
experiments are assumed to be 0. To keep the scope of the reasoning as wide as possible,
𝜮 will be handled as a full, completely unknown 𝑚 × 𝑚 matrix. In this case, the
likelihood function of the model parameters associated with the 𝑖’th experiment is given
by:
$$p(\boldsymbol{y}_i|\boldsymbol{\beta},\boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2\pi)^{m}|\boldsymbol{\Sigma}|}}\exp\left\{-\frac{1}{2}\sum_{j,l=1}^{m} w_i\,\sigma^{jl}\,[y_{ij}-f_j(\boldsymbol{x}_i,\boldsymbol{\beta})]\,[y_{il}-f_l(\boldsymbol{x}_i,\boldsymbol{\beta})]\right\} \qquad \text{(4-50)}$$
where $\boldsymbol{\Sigma}^{-1} = \{\sigma^{jl}\}_{j,l=1..m}$. Since the reduced covariance matrix is as yet unknown, it is
explicitly taken as an argument of the function, besides the actual model parameters.
Following the assumption of non-correlation between the different experiments, the
following statement for the likelihood function for the complete set of observations
holds:
$$L(\boldsymbol{\beta},\boldsymbol{\Sigma}|\boldsymbol{y}) := p(\boldsymbol{y}|\boldsymbol{\beta},\boldsymbol{\Sigma}) = \prod_{i=1}^{n} p(\boldsymbol{y}_i|\boldsymbol{\beta},\boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2\pi)^{nm}|\boldsymbol{\Sigma}|^{n}}}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\sum_{j,l=1}^{m} w_i\,\sigma^{jl}\,[y_{ij}-f_j(\boldsymbol{x}_i,\boldsymbol{\beta})]\,[y_{il}-f_l(\boldsymbol{x}_i,\boldsymbol{\beta})]\right\} \qquad \text{(4-51)}$$
Upon introduction of the auxiliary matrix 𝒗(𝜷), containing the weighted sums of residual cross-products of the different responses and given by:

$$\boldsymbol{v}(\boldsymbol{\beta}) = \left\{\sum_{i=1}^{n} w_i\,[y_{ij}-f_j(\boldsymbol{x}_i,\boldsymbol{\beta})]\,[y_{il}-f_l(\boldsymbol{x}_i,\boldsymbol{\beta})]\right\}_{j,l=1..m} \qquad \text{(4-52)}$$
the expression of the likelihood function is significantly simplified:
$$L(\boldsymbol{\beta},\boldsymbol{\Sigma}|\boldsymbol{y}) = \left((2\pi)^{nm/2}\,|\boldsymbol{\Sigma}|^{n/2}\right)^{-1}\exp\left\{-\frac{1}{2}\sum_{j=1}^{m}\left[\boldsymbol{v}(\boldsymbol{\beta})\,\boldsymbol{\Sigma}^{-1}\right]_{jj}\right\} \propto |\boldsymbol{\Sigma}|^{-n/2}\exp\left\{-\frac{1}{2}\sum_{j=1}^{m}\left[\boldsymbol{v}(\boldsymbol{\beta})\,\boldsymbol{\Sigma}^{-1}\right]_{jj}\right\} \qquad \text{(4-53)}$$
which is a function of both the set of model parameters to be estimated and the unknown
covariance matrix of the experimental errors.
To complete the determination of the unknown parameters in a Bayesian framework, an
analytical expression for the prior function has to be identified. Although the exact form
of the covariance matrix 𝚺 of the experimental error is not known, its explicit appearance
in the expression for the likelihood function requires that a certain value has to be
provided for it, to allow for any inference on the model parameters 𝜷. To avoid this need
for an a priori, user-specified and hence potentially inaccurate covariance matrix, a good
practice is to treat 𝚺 as an additional variable to be determined and capture it in the
estimation procedure. Since both the model parameters and the covariance matrix of the
experimental errors are unknown in advance of the experimental program, an analytical
expression has to be found for the joint prior function 𝑝(𝜷, 𝜮). This search is significantly
facilitated by the assumption that 𝜷 and 𝜮 are not correlated, which allows for the joint
prior density function to be factorized:
𝑝(𝜷, 𝜮) = 𝑝(𝜷)𝑝(𝜮) (4-54)
so that the focus is now on finding the separate priors of both unknowns. This assumption is intuitively convincing, as the vector of model parameters and the covariance matrix of the experimental error, a measure for the quality of the performed experiments, represent in fact two different things, which makes it reasonable to assume, at least preliminarily, their mutual independence.
Since the prior insights on the error covariance matrix are typically scarce, the choice for
a non-informative prior function is seen as a cautious yet useful option. An unprejudiced
prior density function based on the method of Jeffreys was obtained for the covariance
matrix of the experimental errors as:
$$p(\boldsymbol{\Sigma}) \propto |\boldsymbol{\Sigma}|^{-(m+1)/2} \qquad \text{(4-55)}$$
where 𝑚 still denotes the number of responses [20].
The definition of a similar non-informative prior for the vector of model parameters is
not as straightforward, since its strong dependence on the specific structure of the model
hinders the stipulation of a generally applicable expression. Up to the moment of writing,
complete theoretical analyses were only found for the choice of a uniform prior in the
allowed range for the model parameters:
$$p(\boldsymbol{\beta}) = \begin{cases} 1, & \boldsymbol{\beta}_{min} \leq \boldsymbol{\beta} \leq \boldsymbol{\beta}_{max} \\ 0, & \text{elsewhere} \end{cases} \qquad \text{(4-56)}$$
It goes without saying that the available knowledge and insights about model
parameters is often appreciably higher than just having a notion about the permitted
ranges in parameter space. Information available from literature or obtained from
techniques like preliminary isothermal regression for chemical kinetics allows for a more
explicit expression of the prior beliefs on the model parameters, e.g. in the form of a
distribution. At the moment of writing, full theoretical analyses are however only
reported for prior functions like (4-56), and most, if not all, available computational
routines for Bayesian parameter estimation like, e.g. Athena Visual Studio, are making
the same assumption. A distinct and well-documented comparison of the performance of
different prior functions and their impact on the quality of the final inference on the
model parameters is not available.
Updating of equation (4-47) for the posterior density function with the results from (4-53)
and (4-55) yields:
$$p(\boldsymbol{\beta},\boldsymbol{\Sigma}|\boldsymbol{y}) \propto \begin{cases} |\boldsymbol{\Sigma}|^{-(m+n+1)/2}\exp\left\{-\dfrac{1}{2}\sum_{j=1}^{m}\left[\boldsymbol{v}(\boldsymbol{\beta})\,\boldsymbol{\Sigma}^{-1}\right]_{jj}\right\}, & \boldsymbol{\beta}_{min} \leq \boldsymbol{\beta} \leq \boldsymbol{\beta}_{max} \\ 0, & \text{elsewhere} \end{cases} \qquad \text{(4-57)}$$
giving the fullest possible information on both the model parameters and the error
covariance matrix, appearing both as arguments of this joint probability distribution.
Removal of the covariance matrix from this expression to obtain the marginal posterior distribution of the model parameters alone is achieved by replacing 𝜮 by its most probable value $\tilde{\boldsymbol{\Sigma}}(\boldsymbol{\beta})$ for each value of 𝜷 [27]. The modified posterior density function then obeys:

$$\tilde{p}(\boldsymbol{\beta}|\boldsymbol{y}) := p(\boldsymbol{\beta}, \tilde{\boldsymbol{\Sigma}}(\boldsymbol{\beta})|\boldsymbol{y}) \propto |\boldsymbol{v}(\boldsymbol{\beta})|^{-(m+n+1)/2} \qquad \text{(4-58)}$$
on the condition that 𝒗(𝜷) is non-singular, still within the permitted range for 𝜷. When minimizing |𝒗(𝜷)|, the resulting modal point $\hat{\boldsymbol{\beta}}$ serves as the Bayesian alternative to the point estimates obtained from non-linear regression analysis. Due to the replacement of the unknown covariance matrix of the error by an optimal 'representative', the user no longer has to make any prior assumptions on the correlational structure of the experimental errors. For example, in case of inter-response heteroscedasticity, an optimization of the covariance matrix, in that situation an 𝑚 × 𝑚 diagonal matrix with non-constant elements, will by itself result in a proper weighing of the errors on the responses, instead of requiring user-specified values as an input.
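As an illustration of this determinant criterion, the following sketch minimizes log|𝒗(𝜷)| for a hypothetical two-response model with unit weights; the model functions and the data are invented.

# Illustrative sketch: determinant criterion, minimizing |v(beta)| of
# equations (4-52) and (4-58) for a hypothetical two-response model.
import numpy as np
from scipy.optimize import minimize

def model(x, beta):
    # hypothetical responses, e.g. reactant conversion and product yield
    f1 = 1.0 - np.exp(-beta[0] * x)
    f2 = beta[1] * (1.0 - np.exp(-beta[0] * x))
    return np.column_stack([f1, f2])

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = np.array([[0.22, 0.11],
              [0.40, 0.19],
              [0.63, 0.33],
              [0.87, 0.42],
              [0.98, 0.51]])
w = np.ones(x.size)                       # user-specified weights of equation (4-49)

def log_det_v(beta):
    resid = y - model(x, beta)            # n x m residual matrix
    v = (w[:, None] * resid).T @ resid    # m x m matrix v(beta), equation (4-52)
    sign, logdet = np.linalg.slogdet(v)
    return logdet if sign > 0 else np.inf # minimize log|v(beta)| for stability

beta_hat = minimize(log_det_v, x0=[0.4, 0.5], method="Nelder-Mead").x
print(beta_hat)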
The assumption of a constant covariance matrix 𝜮 of errors for different responses,
independent of the experimental conditions, is not generally valid. The analysis of Box
and Draper (1972) acknowledged the issue of non-homogeneous variances associated
with different experiments and suggested an extension of the approach discussed above
[24]. This way, the former restriction on the error covariance structure posed by equation
(4-49) is abandoned and an inter-response covariance matrix is now considered for each
experiment separately, obeying:
$$\boldsymbol{\Sigma}_i = \{E(\varepsilon_{ij}\varepsilon_{il})\}_{j,l=1..m}, \qquad i = 1..n \qquad \text{(4-59)}$$
Still assuming zero correlation between the errors of the different experiments, the likelihood function for this situation reads:
$$L(\boldsymbol{\beta},\{\boldsymbol{\Sigma}_i\}|\boldsymbol{y}) = (2\pi)^{-nm/2}\prod_{i=1}^{n}\left[|\boldsymbol{\Sigma}_i|^{-1/2}\exp\left\{-\frac{1}{2}\sum_{j,l=1}^{m}\sigma^{(i)}_{jl}\,v^{(i)}_{jl}\right\}\right] \qquad \text{(4-60)}$$
where $\boldsymbol{\Sigma}_i^{-1} = \{\sigma^{(i)}_{jl}\}_{j,l=1..m}$, and:

$$\boldsymbol{v}^{(i)}(\boldsymbol{\beta}) = \{v^{(i)}_{jl}\}_{j,l=1..m} = \left\{[y_{ij}-f_j(\boldsymbol{x}_i,\boldsymbol{\beta})]\,[y_{il}-f_l(\boldsymbol{x}_i,\boldsymbol{\beta})]\right\}_{j,l=1..m} \qquad \text{(4-61)}$$
Application of the Jeffreys invariant prior for the ensemble {𝚺𝑖} of the unknown
covariance matrices reads:
$$p(\{\boldsymbol{\Sigma}_i\}) \propto \prod_{i=1}^{n}|\boldsymbol{\Sigma}_i|^{-(m+1)/2} \qquad \text{(4-62)}$$
and assuming a uniform prior function for the model parameters as given in equation (4-
56), the analogue of equation (4-58) becomes:
$$\tilde{p}(\boldsymbol{\beta}|\boldsymbol{y}) := p(\boldsymbol{\beta}, \{\tilde{\boldsymbol{\Sigma}}_i(\boldsymbol{\beta})\}|\boldsymbol{y}) \propto \prod_{i=1}^{n}\left|\boldsymbol{v}^{(i)}(\boldsymbol{\beta})\right|^{-(m+2)/2} \qquad \text{(4-63)}$$
on the condition that 𝒗(𝑖)(𝜷) is non-singular, for 𝑖 = 1. . 𝑛, and valid in the relevant
regions in parameter space. This feature has a high potential for unbiased parameter
estimation. Replacing the unknown error covariance matrices by an optimal candidate
allows for a kind of automatic weighing of the responses for each experiment. As a
consequence, the need for the researcher to stipulate the weighing factors by himself is
completely bypassed. Unfortunately, although the less stringent assumptions in (4-59) allow for a more general assessment of the experimental error than (4-49), this approach does not seem to have been encoded in a practically useful routine yet.
It is important to keep in mind that point values for the model parameters, obtained from
maximizing expressions (4-58) and (4-63), are in fact irrelevant in a Bayesian framework.
The true Bayesian inference is in the confidence intervals from their posterior density
function. Nevertheless, statistical software packages like Athena Visual Studio do
calculate such modal values, and use them for the approximate determination of
confidence intervals as well, as will be discussed in Section 4.3.4.1. It goes without saying
that the need for the minimization of a highly non-linear function, with potentially numerous local minima, introduces a pitfall similar to that encountered in classical non-linear regression. Only if the initial guess is chosen in the neighbourhood of the global minimum of the objective function will the optimal solution of the estimation procedure be found. To overcome this issue, a new methodology, based on Monte
Carlo sampling techniques, will have to be applied. The details of these methods will be
described in Section 4.3.4.2.
4.3.4 Posterior inference on model parameters
4.3.4.1 Statistical assessment by local approximation
In the introductory discussion of this chapter, attention has been paid repeatedly to the
importance of the final confidence intervals resulting from the statistical analysis when it
comes to inference about the unknown model parameters instead of point estimates. Due
to the beforehand undeterminable and often capricious behaviour of the posterior
density function, depending on the precise structure of the model, the calculation of
intervals in parameter space comprising a desired probability density is a non-trivial
task. For an optimal accuracy of the interval calculations, which accounts for all
particularities in the course of the posterior density function, a sampling routine has to be
applied which allows for a thorough scanning of parameter space.
Nevertheless, Bayesian routines in modelling software, e.g., Athena Visual Studio, often
simplify these calculations by determining the probability intervals approximately. Therein, the objective function 𝑆(𝜷) = −2𝑙𝑛[𝑝(𝜷|𝒚)] is expanded to second order around the modal value $\hat{\boldsymbol{\beta}}$ of the posterior density as:

$$\tilde{S}(\boldsymbol{\beta}) = S(\hat{\boldsymbol{\beta}}) + (\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^{T}\,\hat{\boldsymbol{H}}_{\boldsymbol{\beta}\boldsymbol{\beta}}\,(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}}) \qquad \text{(4-64)}$$
where $\hat{\boldsymbol{H}}_{\boldsymbol{\beta}\boldsymbol{\beta}} = \frac{1}{2}\left\{\frac{\partial^2 \tilde{S}}{\partial\beta_i\,\partial\beta_j}\right\}_{i,j=1..p}$. Reformulating the definition of the objective function back to the – now modified – posterior density function yields:

$$p(\boldsymbol{\beta}|\boldsymbol{y}) \propto \exp\left\{-\frac{1}{2}\,(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})^{T}\,\hat{\boldsymbol{H}}_{\boldsymbol{\beta}\boldsymbol{\beta}}\,(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\right\} \qquad \text{(4-65)}$$
which resembles a multivariate normal distribution with expected value $\hat{\boldsymbol{\beta}}$ and covariance matrix $[\hat{\boldsymbol{H}}_{\boldsymbol{\beta}\boldsymbol{\beta}}]^{-1}$. Hence, the highest posterior $(1-\alpha)$ probability density interval for the estimated parameters reads:

$$\hat{\beta}_i - \hat{\sigma}_{\beta_i}\,\mathcal{N}\!\left(\tfrac{\alpha}{2}\right) \;\leq\; \beta_i \;\leq\; \hat{\beta}_i + \hat{\sigma}_{\beta_i}\,\mathcal{N}\!\left(\tfrac{\alpha}{2}\right) \qquad \text{(4-66)}$$

where $\hat{\sigma}_{\beta_i} = \sqrt{\left\{[\hat{\boldsymbol{H}}_{\boldsymbol{\beta}\boldsymbol{\beta}}]^{-1}\right\}_{ii}}$, $i = 1..p$, and the factor $\mathcal{N}(\tfrac{\alpha}{2})$ denotes the $1-\tfrac{\alpha}{2}$ percentile of the standard normal distribution. The highest probability confidence intervals following from these approximations are hence symmetrical around the point estimates, and this applies to each model parameter.
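A minimal numerical sketch of this local approximation is given below: the mode of a hypothetical two-parameter posterior is located, the matrix of equation (4-64) is obtained by finite differences, and the symmetric intervals of equation (4-66) are reported. All numbers are illustrative.

# Illustrative sketch: approximate (1 - alpha) probability intervals from a local
# quadratic expansion of S(beta) = -2 ln p(beta|y), cf. equations (4-64)-(4-66).
# The posterior used here is hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_two_log_posterior(beta):
    # hypothetical, banana-shaped unnormalized posterior in two parameters
    return (beta[0] - 1.0) ** 2 / 0.04 + (beta[1] - beta[0] ** 2) ** 2 / 0.01

beta_hat = minimize(neg_two_log_posterior, x0=[0.8, 0.8], method="Nelder-Mead").x

def hessian(f, x, h=1e-4):
    # central finite-difference Hessian of f at x
    p = x.size
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            e_i, e_j = np.eye(p)[i] * h, np.eye(p)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4.0 * h ** 2)
    return H

H_bb = 0.5 * hessian(neg_two_log_posterior, beta_hat)     # matrix of equation (4-64)
cov = np.linalg.inv(H_bb)
alpha = 0.05
z = norm.ppf(1.0 - alpha / 2.0)                            # the N(alpha/2) factor
for i, (b, s) in enumerate(zip(beta_hat, np.sqrt(np.diag(cov)))):
    print(f"beta_{i}: {b:.3f} +/- {z * s:.3f}")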
4.3.4.2 Sampling schemes for Bayesian estimation
Since this approach explicitly approximates the unknown posterior density distribution
and forces it into a multivariate normal frame, a lot of information about the true
distribution is lost. Therefore, to explore the posterior density function in its finest detail, a Monte Carlo sampling procedure has to be followed.
As has already been noted above, the core of the Bayesian approach to parameter
estimation is the posterior density function 𝑝(𝜷|𝒚), since it comprises all information
available from both prior beliefs in the model parameters and inference from the
experimental results. All statistical analyses, including the calculation of confidence
intervals and correlational structures, require calculations of the form:

$$E[f(\boldsymbol{\beta})] = \int f(\boldsymbol{\beta})\,p(\boldsymbol{\beta}|\boldsymbol{y})\,d\boldsymbol{\beta} = \frac{\int f(\boldsymbol{\beta})\,p(\boldsymbol{y}|\boldsymbol{\beta})\,p(\boldsymbol{\beta})\,d\boldsymbol{\beta}}{\int p(\boldsymbol{y}|\boldsymbol{\beta})\,p(\boldsymbol{\beta})\,d\boldsymbol{\beta}} \qquad \text{(4-67)}$$
where the function 𝑓(𝜷) depends on the characteristic of the posterior density to be assessed, e.g., 𝑓(𝜷) = 𝜷 for the expected value 𝐸(𝜷) or 𝑓(𝜷) = [𝜷 − 𝐸(𝜷)]² to determine the variances of the different parameters.
Moreover, the confidence intervals showing where the model parameters are located
with a certain, user-defined probability are of particular interest when quantifying the
quality of the estimation procedure. To obtain such intervals for each separate parameter,
the joint posterior density, describing the statistics of all parameters simultaneously, is
integrated over parameter space to get the marginal posterior distribution 𝑝(𝛽𝑖|𝒚)
obeying:
p(\beta_i|\boldsymbol{y}) = \int p(\boldsymbol{\beta}|\boldsymbol{y})\, d\beta_1 \ldots d\beta_{i-1}\, d\beta_{i+1} \ldots d\beta_p   (4-68)
The calculation of the integrals appearing in equations (4-67) and (4-68) is a non-trivial task; their problematic computation has long been an impediment to fully quantitative results and has hence blocked the breakthrough of Bayesian methods in the field of parameter estimation.
Indeed, the classical approach of approximating the posterior density function by placing a discrete grid over parameter space and calculating the posterior density at the nodes is associated with some important drawbacks. First, before the actual discretization is performed, the relevant zone over which the grid is spanned has to be specified. Consequently, all regions outside this zone escape from the analysis, and any particularities in the posterior’s behaviour there will not be unraveled. Hence, sufficient prior
knowledge about the location of the most likely values for the kinetic parameters is
required. Secondly, the lack of knowledge about the behavior of the posterior makes it
hard, if not impossible, to specify an optimal resolution and get a sufficiently detailed
idea about the posterior’s behavior. Connected to this resolution issue, the number of
required numerical operations varies exponentially with the number of model
parameters. Hence, even for a limited number of parameters, the computational load of
the discretization scheme will strongly mount up for a decreasing cell width.
Fortunately, the development of a class of powerful Markov Chain Monte Carlo (MCMC)
algorithms allowing for a computationally efficient approximation of any probability
distribution via a sampling procedure has broadened the scope of Bayesian approaches,
which has resulted in a growing interest in the application of Bayesian routines in kinetic
modelling [19, 28]. In contrast to classical discretization procedures, these sampling
methods scan parameter space automatically for the zones with considerable posterior
probability density and are intrinsically apt to locate and elucidate those regions where
this density is the highest.
All relevant methods for handling calculations similar to equation (4-67) rely on a Monte Carlo implementation. In an attempt to get around the explicit integration of
the unknown posterior probability function, the expected value for a function 𝑓 of the
model parameters is approximated by drawing a number of random samples {𝜷𝑖}
from 𝑝(𝜷|𝒚), and calculating:
E[f(\boldsymbol{\beta})] = \frac{\int f(\boldsymbol{\beta})\, p(\boldsymbol{y}|\boldsymbol{\beta})\, p(\boldsymbol{\beta})\, d\boldsymbol{\beta}}{\int p(\boldsymbol{y}|\boldsymbol{\beta})\, p(\boldsymbol{\beta})\, d\boldsymbol{\beta}} \cong \frac{1}{n}\sum_{i=1}^{n} f(\boldsymbol{\beta}_i)   (4-69)
Hence, the expected value of any function of the model parameters is estimated by the mean value over the different samples. According to the law of large numbers, the approximation improves for an increasing value of $n$, on the condition that the samples are drawn independently [29]. Unfortunately, it is in general not straightforward to draw independent samples of the model parameters from a potentially highly complex probability distribution.
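As a minimal illustration of equation (4-69), and assuming for a moment that independent posterior samples were available (mimicked below by ordinary normal draws, purely for illustration), any expectation reduces to a simple sample average:
% Minimal sketch of eq. (4-69): approximating E[f(beta)] by a sample average.
% The samples are assumed to come from the posterior; here they are mimicked
% by draws from a normal distribution purely for illustration.
n       = 1e5;
samples = normrnd(5, 0.2, n, 1);          % hypothetical posterior samples of one parameter
f       = @(b) (b - mean(samples)).^2;    % e.g. f(beta) = [beta - E(beta)]^2 for the variance
Ef      = mean(f(samples));               % Monte Carlo estimate of E[f(beta)]
disp(Ef)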
The construction of a Markov chain to control the sampling was found to be extremely
useful to overcome this barrier. Thereby, the sample 𝜷𝑖 to be drawn at step 𝑖 is taken
from a distribution 𝑃(𝜷𝑖|𝜷𝑖−1) which depends only on the foregoing sample 𝜷𝑖−1. Hence,
the probability at which the current sample is taken is only influenced by the very near
history of the chain, i.e., earlier samples do not come into play. Nevertheless, the value of
the sample 𝜷𝑖 will depend on the choice of the starting point 𝜷0, say according to a
distribution 𝑄𝑖(𝜷𝑖|𝜷0) which explicitly depends on the time step 𝑖. Indeed, as more
samples are drawn, the ‘distance’ between the starting point and the final sample will
increase, and it is intuitively clear that this will change their mutual relation.
Surprisingly, for increasing 𝑖, 𝑄𝑖(𝜷𝑖|𝜷0) tends to converge to a stationary distribution 𝜙(𝜷)
which does no longer depend on the starting point 𝜷0. The trick that forms the bridge to
a sampling technique for the posterior density function is to design the Markov chain in
such a way that its stationary distribution equals the posterior, hence 𝑄𝑖(𝜷𝑖|𝜷0) →
𝜙(𝜷) ≡ 𝑝(𝜷|𝒚). This way, the more samples are taken, the more their distribution
resembles the desired posterior density function. The number of samples 𝑚 needed for
stabilization of the probability function is called the burn-in time.
The construction of a Markov Chain with a stationary distribution which equals the
posterior distribution is typically carried out by the Metropolis-Hastings algorithm,
named after its discoverers [30, 31]. First, a starting value 𝜷0 is chosen to initialize the
procedure. Then, during the 𝑖’th cycle, a candidate parameter set 𝑩 is sampled randomly
from a proposal distribution 𝑞(𝜷|𝜷𝑖−1) which solely depends on the previous sample
value 𝜷𝑖−1. This proposal has to be specified by the researcher and may be a fixed distribution as well as a function that is updated based solely on the parameter values of the previous step, 𝜷𝑖−1. If the proposal distribution does not depend on the current point at all, the algorithm is called independent. When the proposal distribution is symmetrical, i.e. 𝑞(𝜷|𝜷𝑖−1) = 𝑞(𝜷𝑖−1|𝜷), the technique is called Metropolis sampling. This is the case when choosing a multivariate normal proposal.
Based on the candidate 𝑩, the acceptance ratio 𝐴 is determined as follows:
A(\boldsymbol{B}, \boldsymbol{\beta}_{i-1}) = \min\left(1,\; \frac{p(\boldsymbol{B}|\boldsymbol{y})}{p(\boldsymbol{\beta}_{i-1}|\boldsymbol{y})}\, \frac{q(\boldsymbol{\beta}_{i-1}|\boldsymbol{B})}{q(\boldsymbol{B}|\boldsymbol{\beta}_{i-1})}\right)   (4-70)
When it holds that
𝐴 ≥ 𝑈 (4-71)
where 𝑈 is a randomly generated number from [0,1], 𝑩 is accepted, and hence 𝜷𝑖 = 𝑩;
otherwise, 𝜷𝑖 = 𝜷𝑖−1. It is important to remark that the unknown normalization factor in the statement of the posterior cancels out, since only ratios of the density functions appear in equation (4-70). Hence, given that the exact form of the posterior only has to be known up to a scaling factor that is constant in the model parameters, this technique is extremely well suited for this particular problem.
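The following minimal Matlab sketch illustrates the algorithm for a one-dimensional, unnormalized target density with a symmetric (Metropolis) normal proposal; the target function and the settings are illustrative assumptions rather than the posteriors treated in this work.
% Minimal random-walk Metropolis sketch for an unnormalized 1-D target density.
% The target only needs to be known up to a constant factor, as discussed above.
target = @(b) exp(-0.5*(b-2).^2) + 0.5*exp(-0.5*((b+2)/0.7).^2);  % illustrative, unnormalized
nSamples = 20000;      % total number of cycles n
burnIn   = 2000;       % burn-in samples m to be discarded
sigmaP   = 1;          % standard deviation of the symmetric normal proposal
b = zeros(nSamples,1); % Markov chain, started at beta_0 = 0
for i = 2:nSamples
    cand = b(i-1) + sigmaP*randn;              % candidate B drawn from q(.|beta_{i-1})
    A = min(1, target(cand)/target(b(i-1)));   % acceptance ratio; q cancels (symmetric proposal)
    if rand <= A
        b(i) = cand;                           % accept the candidate
    else
        b(i) = b(i-1);                         % reject and keep the previous value
    end
end
b = b(burnIn+1:end);                           % discard the burn-in samples
hist(b, 50)                                    % sampled representation of the target density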
As was discussed above, the distribution of the samples will gradually resemble the
unknown posterior density function while running the routine for a certain,
predetermined number of cycles 𝑛, as depicted in Figure 4-4 for an illustrative example,
not specifically encountered during parameter estimation. All inference about statistical
characteristics of the posterior density function 𝑝(𝜷|𝒚) is now equally accessible from the
distribution of the Markov chain samples. It goes without saying that all samples taken during the burn-in time are useless and therefore have to be discarded from the statistical analysis.
Because the explicit calculation of the posterior density function throughout the relevant
ranges in parameter space allows for the elucidation of all particular irregularities in its
course, the conclusions drawn based on the sampled distribution will be more accurate
than those relying on normal approximations as mentioned in section 4.3.4.1.
Figure 4-4 Illustration of the convergence of a MCMC sampling routine towards
an unknown probability distribution (full line) [32]
The proposal distribution 𝑞(. |. ) appearing in the calculation of the acceptance ratio has
to be specified beforehand. Although the distribution of the Markov chain will converge
to 𝑝(𝜷|𝒚) regardless of its exact form, choosing the proposal function wisely, i.e. to show
an as high as possible resemblance with the unknown posterior distribution, will
strongly enhance the rate of convergence; moreover, to avoid unnecessarily complex calculations, it is advisable to take a distribution which allows for easy sampling and evaluation. Because large-sample analysis stipulates that the posterior
probability density approaches the multivariate normal distribution for an increasing
number of samples [21], the proposal distribution is often chosen as:
𝑞(𝜷|𝜷𝑖−1) = 𝓝(𝜷|𝜷𝑖−1, 𝜎2𝑰𝑝) (4-72)
i.e. a multivariate normal distribution centered at the foregoing sample 𝜷𝑖−1 with an
uncertainty expressed by 𝜎2. The user-defined value for this variance is crucial for the
quality of the MCMC sampler, as is clearly demonstrated in Figure 4-5. Proposals that are too narrow yield a high acceptance rate but only very small steps, so that the routine scans parameter space far too slowly in its search for regions with higher posterior probability. On the other hand, in case of an excessively wide proposal, the low acceptance rate of the candidate samples causes the routine to get stuck at certain values for many iterations, which equally undermines the convergence of the sampled distribution to the targeted posterior density; such behaviour is often referred to as bad mixing [33]. When perfectly tuned, convergence to the target distribution is achieved fast, resulting in a regularly zigzagging trace plot, shown in the lower half of the figure. This plot gives the evolution of the sampled value throughout the iterations and clearly shows the fast convergence of the chain combined with an intense scanning along the parameter axis.
Figure 4-5 MCMC sampling from a one-dimensional target distribution for different
variances of the normal proposal distribution: 0.05 (left), 1 (middle) and 100 (right)
Top: actual distribution (red line) and sample histogram (blocks)
Bottom: trace-plot of the sampled parameter value as function of the iteration number
4.3.5 Including insights by an informative prior function
Instead of choosing the default uniform prior which includes no information on the
model parameters, it is proposed to follow a procedure often used in-house for ordinary
least squares minimization of chemical kinetic models. Whereas the technique aims in
fact at obtaining suitable initial guesses of the kinetic parameters to start the
minimization routine, the results will be useful in a Bayesian approach as well by
allowing for the construction of an informative prior distribution.
Specifically applied to the estimation of kinetic parameters for chemical reactions, the
technique consists of a grouping of the experimental data for each temperature being
studied. By performing a nonlinear regression on each of these data sets, for each
temperature a preliminary estimate is obtained for the rate coefficients associated with
the different reactions in the reaction mechanism. Assuming an Arrhenius dependence
on temperature, regression of the values for the rate coefficients with respect to the different temperatures yields estimates $\tilde{\boldsymbol{\beta}}_{IR} = \{\tilde{\beta}_{IR,i}\}_{i=1..p}$ of the corresponding pre-exponential factors and activation energies. In a standard nonlinear regression routine,
these values will be set as the initial guesses for the iterative optimization algorithm.
For a Bayesian analysis, these values may serve as the modes of the prior beliefs on the model parameters. By determining a ‘variance’ term for each of the parameters, reflecting the uncertainty on their estimated values, a preliminary covariance matrix $\tilde{\boldsymbol{V}}_{IR}$ of the model parameters is calculated. This allows in turn for expressing the prior
information by means of a multivariate normal distribution:
\tilde{p}(\boldsymbol{\beta}) = \frac{1}{\sqrt{(2\pi)^{p}\,|\tilde{\boldsymbol{V}}_{IR}|}}\, \exp\left\{-\tfrac{1}{2}\,[\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_{IR}]^{T}\, \tilde{\boldsymbol{V}}_{IR}^{-1}\, [\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_{IR}]\right\} \propto \exp\left\{-\tfrac{1}{2}\,[\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_{IR}]^{T}\, \tilde{\boldsymbol{V}}_{IR}^{-1}\, [\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_{IR}]\right\}   (4-73)
where 𝑝 represents the number of model parameters to be estimated and 𝑛𝑇 denotes the
number of different temperatures studied.
Following the same reasoning as before, an alternative posterior density function for the
model parameters is determined as:
\tilde{p}(\boldsymbol{\beta}|\boldsymbol{y}) \propto \prod_{i=1}^{n} \left|\boldsymbol{v}^{(i)}(\boldsymbol{\beta})\right|^{-(m+2)/2}\, \exp\left\{-\tfrac{1}{2}\,[\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_{IR}]^{T}\, \tilde{\boldsymbol{V}}_{IR}^{-1}\, [\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_{IR}]\right\}   (4-74)
The vector $\hat{\boldsymbol{\beta}}$ that maximizes this probability will serve as a new point estimate of the unknown model parameters. Given the difference in the functional form of the posterior density distribution, it is expected that the predictions acquired by implementation of this method will differ from those obtained with the non-informative prior.
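As an illustration of how such a prior enters the estimation, the sketch below evaluates the logarithm of the alternative posterior (4-74) for a single-response (m = 1) model; the isothermal estimates betaIR, their covariance VIR and the Arrhenius-type model function are hypothetical placeholders, not results from this work.
% Minimal sketch of the informative-prior posterior of eq. (4-74), single response (m = 1).
% betaIR and VIR would follow from the preliminary isothermal regressions; here they
% are hypothetical placeholders, as is the model function ycalc.
betaIR = [20; 50e3];                 % assumed prior modes (e.g. ln(A) and Ea in J/mol)
VIR    = diag([1, (5e3)^2]);         % assumed prior covariance matrix
ycalc  = @(beta,x) exp(beta(1) - beta(2)./(8.314*x));   % illustrative Arrhenius-type model
x      = linspace(300,360,20)';      % 'experimental' conditions (temperatures)
y      = ycalc(betaIR,x).*(1 + 0.05*randn(20,1));       % simulated observations
logpost = @(beta) -3/2*sum(log((y - ycalc(beta,x)).^2)) ...   % sum of ln|v_i|^(-(m+2)/2), m = 1
          - 1/2*(beta - betaIR)'*(VIR\(beta - betaIR));       % informative normal prior term
% The mode of eq. (4-74) can then be located with a generic optimizer:
bhat = fminsearch(@(b) -logpost(b), betaIR);
disp(bhat)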
4.4 References
1. Seber, G.A.F. and C.J. Wild, Nonlinear Regression. 2003: Wiley.
2. Pritchard, D.J., J. Downie, and D.W. Bacon, Further Consideration of Heteroscedasticity in
Fitting Kinetic Models. Technometrics, 1977. 19(3): p. 227-236.
3. Maria, G., A review of algorithms and trends in kinetic model identification for chemical and
biochemical systems. Vol. 18. 2004, Zagreb, CROATIE: Croatian Society of Chemical
Engineers. 28.
4. Carroll, R.J. and D. Ruppert, Transformation and Weighting in Regression. 1988: Taylor
& Francis.
5. Rawlings, J.O., S.G. Pantula, and D.A. Dickey, Applied Regression Analysis: A Research
Tool. 1998: Springer.
6. Box, G.E.P. and W.J. Hill, Correcting Inhomogeneity of Variance with Power
Transformation Weighting. Technometrics, 1974. 16(3): p. 385-389.
7. Carroll, R.J. and D. Ruppert, Diagnostics and Robust Estimation When Transforming the
Regression Model and the Response. Technometrics, 1987. 29(3): p. 287-299.
8. Carroll, R.J. and D. Ruppert, Power Transformations when Fitting Theoretical Models to
Data. Journal of the American Statistical Association, 1984. 79(386): p. 321-328.
9. Beal, S.L. and L.B. Sheiner, Heteroscedastic Nonlinear Regression. Technometrics, 1988.
30(3): p. 327-338.
10. Rousseeuw, P.J. and A.M. Leroy, Robust Regression and Outlier Detection. 2005: Wiley.
11. Barnett, V. and T. Lewis, Outliers in Statistical Data. 1994: Wiley.
12. Hampel, F.R., et al., Robust Statistics: The Approach Based on Influence Functions. 2011:
Wiley.
13. Motulsky, H.J. and R.E. Brown Detecting outliers when fitting data with nonlinear
regression - a new method based on robust nonlinear regression and the false discovery rate.
BMC bioinformatics, 2006. 7, 123 DOI: 10.1186/1471-2105-7-123.
14. Wooldridge, J., Introductory Econometrics: A Modern Approach. 2008: Cengage
Learning.
15. Durbin, J. and G.S. Watson, Testing for Serial Correlation in Least Squares Regression: I.
Biometrika, 1950. 37(3/4): p. 409-428.
16. Durbin, J. and G.S. Watson, Testing for Serial Correlation in Least Squares Regression. II.
Biometrika, 1951. 38(1/2): p. 159-177.
17. Roelant, R., Mathematical determination of reaction networks from transient kinetic
experiments. 2011.
18. Samaniego, F.J., A Comparison of the Bayesian and Frequentist Approaches to Estimation.
2010: Springer New York.
19. Hsu, S.-H., et al., Bayesian Framework for Building Kinetic Models of Catalytic Systems.
Industrial & Engineering Chemistry Research, 2009. 48(10): p. 4768-4790.
20. Jeffreys, H., The Theory of Probability. 1998: OUP Oxford.
21. Gelman, A., et al., Bayesian Data Analysis, Second Edition. 2003: Taylor & Francis.
22. Qian, S.S., C.A. Stow, and M.E. Borsuk, On Monte Carlo methods for Bayesian inference.
Ecological Modelling, 2003. 159(2–3): p. 269-277.
23. Box, G.E.P. and N.R. Draper, The Bayesian estimation of common parameters from
several responses. Biometrika, 1965. 52(3-4): p. 355-365.
24. Box, M.J. and N.R. Draper, Estimation and Design Criteria for Multiresponse Non-Linear
Models with Non-Homogeneous Variance. Journal of the Royal Statistical Society. Series
C (Applied Statistics), 1972. 21(1): p. 13-24.
25. Stewart, W.E., Multiresponse Parameter Estimation with a New and Noninformative Prior.
Biometrika, 1987. 74(3): p. 557-562.
26. Stewart, W.E. and J.P. Sørensen, Bayesian Estimation of Common Parameters from
Multiresponse Data with Missing Observations. Technometrics, 1981. 23(2): p. 131-141.
27. Stewart, W.E. and M. Caracotsios, Computer-Aided Modeling of Reactive Systems. 2008:
Wiley.
28. Galagali, N. and Y.M. Marzouk, Bayesian inference of chemical kinetic models from
proposed reactions. Chemical Engineering Science, 2015. 123(0): p. 170-190.
29. Gilks, W.R., S. Richardson, and D. Spiegelhalter, Markov Chain Monte Carlo in Practice.
1995: Taylor & Francis.
30. Metropolis, N., et al., Equation of State Calculations by Fast Computing Machines. The
Journal of Chemical Physics, 1953. 21(6): p. 1087-1092.
31. Hastings, W.K., Monte Carlo Sampling Methods Using Markov Chains and Their
Applications. Biometrika, 1970. 57(1): p. 97-109.
32. Andrieu, C., et al., An Introduction to MCMC for Machine Learning. Machine Learning,
2003. 50(1-2): p. 5-43.
33. Haario, H., E. Saksman, and J. Tamminen, Adaptive proposal distribution for random
walk Metropolis algorithm. Computational Statistics, 1999. 14(3): p. 375-395.
Chapter 5
Benchmark analysis of alternative parameter estimation techniques
In the upcoming section, the alternative techniques towards the estimation of unknown
model parameters that were suggested in Chapter 4 will be evaluated on their potential to be
a reliable, well-performing challenger of currently applied, classical nonlinear regression
theory. Routines were programmed to correct ordinary regression for heteroscedasticity and
for serial correlation and implement the Bayesian approach combined with MCMC
sampling. At first, a single response linear model was considered as a candidate case to
assess their performance. Indeed, for linear models ordinary regression methods are
straightforward to apply, which allows in turn for a relatively simple comparison to the
outcome of the alternative techniques. Specifically for the Bayesian procedures, the scope
was extended to simple nonlinear models as well. As will be discussed below more
extensively, this introduced the need for more refined sampling methods compared to those
described in Chapter 4.
All routines were coded in Matlab version 8.4 and can be found in the appendix. A set of 𝑛 = 20 experimental data points is randomly generated prior to the run of each script, in accordance with the assumption of a normally distributed, zero-mean error. For a single response model,
this boils down to:
𝒚 ~ 𝒩(𝒚𝒄𝒂𝒍𝒄, 𝑽) (5-1)
where 𝒚𝒄𝒂𝒍𝒄 is the exact response value corresponding to the independent variable 𝑥, i.e. for a
simple linear model:
𝒚𝒄𝒂𝒍𝒄 = 𝐴𝑥 + 𝐵 (5-2)
and 𝑽 a user-specified variance matrix of the ‘experimental’ error. As will follow from the
discussions below, the actual form of this matrix will vary throughout the benchmark
analysis, depending on which particular candidate method will be evaluated. The values for
the model parameters 𝜷 = [𝐴, 𝐵] were fixed beforehand at 5 and 1, respectively. The simulated experimental data set in (5-1) then serves as the basis for
the estimation of these values by the suggested techniques. Their performance and accuracy
will then be assessed by a comparison to the results from classical linear regression.
For ordinary linear least squares estimation the point estimates 𝒃 = [�̂�, �̂�] for the unknown
model parameters 𝜷 are given by:
𝒃 = (𝑿𝑇𝑿)−1𝑿𝑇𝒚 (5-3)
when assuming a normally distributed, uncorrelated and homoscedastic experimental error
[1]. In this case, the 100(1 − 𝛼)% confidence intervals on these parameter estimates are given by:
b_i - \sqrt{V_{\boldsymbol{b},ii}}\; t\!\left(1-\tfrac{\alpha}{2},\, n-p\right) \leq \beta_i \leq b_i + \sqrt{V_{\boldsymbol{b},ii}}\; t\!\left(1-\tfrac{\alpha}{2},\, n-p\right), \quad i = 1,2   (5-4)
with $\boldsymbol{X}^{T} = \begin{bmatrix} x_1 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix}$, $S(\boldsymbol{b}) = \sum_{i=1}^{n}\left(y_i - \hat{A}x_i - \hat{B}\right)^{2}$ and the number of model parameters $p = 2$. An unbiased estimator for the in general unknown covariance matrix of the parameter estimates $\boldsymbol{V_b}$ is given by:
\hat{\boldsymbol{V}}(\boldsymbol{b}) = (\boldsymbol{X}^{T}\boldsymbol{X})^{-1} s^{2} = (\boldsymbol{X}^{T}\boldsymbol{X})^{-1}\, \frac{S(\boldsymbol{b})}{n-p}   (5-5)
5.1 Data-based weighted regression
In Section 4.2, two slightly different techniques to correct for a non-constant variance of the
experimental error were discussed, i.e. the iterative Power-Transform-Both-Sides method
and the direct maximization of the joint posterior density obtained from a Bayesian approach. As has been mentioned in the literature survey, the difference in performance of
both techniques was reported to be minimal. Therefore, given that Matlab offers some built-
in optimization routines, the second method was chosen to be implemented. Based on its
performance, the need to properly account for heteroscedasticity will be evaluated. The error
covariance matrix 𝑽 was implemented as a diagonal matrix, with elements given by:
V_{ii} = \sigma^{2}\, |y_{calc,i}|^{2}   (5-6)
i.e., the uncertainty on the experimental error is set proportional to the actual magnitude of
the corresponding response, with the true ‘homoscedastic’ error variance 𝜎2 as a scaling
factor. Hence, if the weighted regression is performed, the optimal value for the
transformation parameter 𝜙 will ideally be around 0. The value for the scaling factor is free
to choose and will be varied to assess the robustness of the presented technique concerning
the quality of the experimental observations. Indeed, the higher 𝜎2, the more scattered the
generated data set will be around the true, model based response value.
After the experimental data set was simulated, both an ordinary and a weighted regression
were performed. As was mentioned above, the latter requires the use of available local optimization routines, which in turn need the definition of an initial guess, both on the model parameters and on the transformation parameter.
Figure 5-1 Estimates for the model parameters A (left) and B (right) for different values of the scaling factor 𝜎² for 10 subsequent runs. Filled symbols denote the results of the weighted regression, open markers correspond to classical least squares estimation. The dotted line corresponds to the true parameter values. Remark the varying scaling of the vertical axes.
For this purpose, the optimal
parameter estimates obtained from the ordinary regression were used, together with a
starting value of 1 for 𝜙, which corresponds to unweighted regression. The routine was then
run for 10 times for 𝜎 equal to 0.01, 0.1 and 1. The confidence intervals on the point estimates
of the model and transformation parameters are calculated approximately by application of
(5-4) in combination with the adapted covariance matrix of the parameter estimates, as
introduced in equation (4-11). To keep Figure 5-1 clear, these intervals are not shown explicitly. Nevertheless, it was found that the intervals for the weighted regression were almost as broad as those for ordinary least squares. Hence, from a purely statistical point of view, both techniques yield estimates which are equally informative.
Figure 5-1 depicts the results of this procedure. Inspection of the results shows that the
parameter estimates obtained by the automated weighing procedure are in almost all
situations considerably closer to the true values. The introduction of heteroscedasticity does
have a negative impact on the performance of ordinary least squares regression, which is
especially clear for the estimations of the intercept 𝐵, showing deviations higher than 5% for
multiple runs. In contrast, the estimates from the weighted estimation stay in the vicinity of
the true parameter values for all runs with low and all but one situations for mild values
of 𝜎, i.e. at superior and intermediate quality of the experimental data. Logically, for the
highly uncertain observations with 𝜎 = 1, the performance of both the classical and adapted
methodology decreases drastically yielding an output that has a highly variable accuracy for
different runs. Still, in most cases, the point estimates from the weighted regression still
outperform those obtained by the classical approach.
The point estimates for the transformation parameter 𝜙 are shown in Figure 5-2. Ideally, its
estimated value has to approach 0, since the optimal weighing factors are inversely
proportional to the squared model predictions for the error covariance matrix as in (5-6). It is
seen that all values, i.e. for all runs at each considered value of the scaling factor 𝜎, are estimated different from 1, the value which corresponds to ordinary least squares. However, it is noticed that for multiple situations, and irrespective of the scaling factor, the outcome of the estimation fluctuated strongly around 0, being in the close vicinity of the true value in only about half of the studied cases. This is truly remarkable, given that the inference on the model parameters was in fact quite satisfying. Obviously, the performance gain of the data-based weighing routine is hence somewhat indifferent to how well the true value of 𝜙 is estimated. Only for 𝜎 = 1 is a trend seen between the accuracy of the estimates of 𝜷 and 𝜙: the closer the latter approaches 0, the better the quality of the model parameter estimates becomes.
Figure 5-2 Optimal values for the transformation parameter for all three considered scaling
factors and for all runs
The weighted residuals {𝑒𝑤,𝑖}, 𝑖 = 1. . 𝑛, corresponding to the last run at 𝜎 = 0.01 are plotted
in Figure 5-3 for both the weighted and the ordinary regression. Given the point
estimates [�̂�, �̂�], these are given by:
𝑒𝑤,𝑖 = √𝑤(�̂�, �̂�) ∙ 𝑒𝑖 = √𝑤(�̂�, �̂�) ∙ [𝑦𝑖 − �̂�𝑥𝑖 − �̂�] (5-7)
Logically, the weight factors all equal 1 for the unweighted regression. While a clearly diverging scatter is noticed for the classical approach, even at the smallest considered value of the scaling factor, the residuals for the weighted regression have been stabilized remarkably, removing all trends and resulting in a bounded scatter around 0.
Based on this rudimentary comparison, and despite the analysis being limited to simple cases only, some trends have emerged. It is clear that unduly neglecting heteroscedasticity will have a strongly negative impact on the accuracy and quality of the inference on unknown model parameters by the analysis of experimental data. Accounting
explicitly for a non-constant variance of the experimental error during the regression was
shown to yield considerable gains in the accuracy of the parameter estimates, on the
condition that the overall quality of the observations is sufficient and that the uncertainty on
a certain response is proportional to its magnitude.
Figure 5-3 Weighted residuals for the ordinary (open symbols) and weighted least squares
estimation (filled markers)
Nevertheless, since weighted regression still relies on the minimization of a residual sum of
squares, it does not solve the issue of getting stuck in a local rather than the global minimum
of the objective function, yielding incorrect point estimates for the model parameters
especially in case of nonlinear models. Hence, although weighted regression will properly
correct for heteroscedasticity and therefore outperform ordinary least squares estimation, the
reliability of its outcome is still not ensured.
5.2 Explicit modelling of serial correlation in the data
A similar procedure was followed to evaluate the added value of correcting for serial
correlation between subsequent experiments, a phenomenon that is believed to be relevant in
particular when batch reactor setups are involved.
A set of 20 experimental data were randomly generated for the simple single-response linear
model introduced above. The covariance matrix 𝑽 was implemented in order to obey
an 𝐴𝑅(1) time dependence of the experimental errors, and hence reads:
\boldsymbol{V} = \{V_{ij}\} = \sigma^{2}\,\rho^{|i-j|}   (5-8)
with 𝜌 the tunable autocorrelation coefficient and 𝜎² once more the homoscedastic experimental error variance, free to choose. In what follows, its value will remain fixed at 0.1.
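Under these assumptions, a data set with AR(1)-correlated errors can be simulated as in the following minimal sketch; the variable names are illustrative and do not necessarily match the appendix routine.
% Minimal sketch: simulating a linear data set with AR(1)-correlated errors, eq. (5-8).
n = 20; A = 5; B = 1;
sigma2 = 0.1;                     % homoscedastic error variance
rho    = 0.99;                    % autocorrelation, to be varied between -1 and 1
x = linspace(-5,5,n)';
ycalc = A*x + B;
% Build the covariance matrix V_ij = sigma^2 * rho^|i-j| and draw correlated errors
[I,J] = meshgrid(1:n,1:n);
V = sigma2*rho.^abs(I-J);
e = mvnrnd(zeros(n,1), V)';
yobs = ycalc + e;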
The Two-Stage Iterative method as discussed in Chapter 4, which relies on an 𝐴𝑅(1)
modelling of the experimental error, was implemented in an attempt to unravel this
correlation from the scattered data points and properly correct for it. Keeping in mind that
the value of the autocorrelation is bounded between -1 and 1 to be physically relevant, the
performance of the code will be evaluated for 𝜌 equal to -0.99, 0.1 and 0.99, which
corresponds to strongly negative and mildly and strongly positive correlation, respectively.
The results from running the code ten times for each of the proposed autocorrelations are
given in Figure 5-4. The Durbin-Watson test criterion to detect considerable serial correlation
was calculated for each of the runs, and yielded values between -0.955 and 0.6986. Keeping
in mind that the null hypothesis on zero serial correlation for 20 observations and 2 model
parameters is rejected with 95% certainty when this test criterion is lower than 1.10, the test
clearly points at a non-negligible mutual dependence of the experimental data set.
In contrast to the correction for heteroscedasticity of the experimental error that was
discussed in the foregoing section and showed a considerable gain in the accuracy of the
point values of the parameter estimates, accounting for serial correlation yields less distinct
results.
For highly positive correlation, the performance of the correction method is somewhat
capricious. For about one third of the runs, the estimated point values of the slope 𝐴 via the
adapted regression are worse than those obtained by classical regression, while in 4 other
cases the difference in the estimates is negligibly small. The same applies to the results for
the parameter 𝐵, for which no improvement is remarked by applying the alternative routine.
The quality of the estimates is very low for both techniques, yielding values which are off by more than 100% for multiple experimental data sets.
At milder positive correlation, the theoretical difference between both techniques starts to
diminish. Studying the corresponding graphs indeed reveals that the difference between the
parameter estimates for both procedures is almost absent. It is a remarkable observation that
apparently, for such small autocorrelations, the impact of the presence of time propagation
of the experimental error is small enough to not be detected by the correction mechanism.
This is slightly different for the regression of the strongly negatively correlated data set.
Primarily for the slope parameter, the parameter estimates from the adapted procedure
approach the actual model parameters considerably better. The picture is more nuanced as
regards the second parameter, showing slight differences between the estimates from both
methods. In about half of the simulations, the classical regression outperformed the adapted
technique, an additional sign that the overall performance of this methodology to account for
serial correlation is at least doubtful.
Figure 5-4 Estimates for the model parameters A (left) and B (right) for different values of the
autocorrelation 𝝆 as function of the run number. Filled symbols denote the results of the Two
Stage Iterative regression, open markers correspond to classical least squares estimation. The
dotted line corresponds to the true parameter values. Remark the varying scaling of the
vertical axes.
Part of the explanation for the bad performance of the modified regression, especially at
highly positive correlation, follows from inspection of Figure 5-5. Obviously, the routine was not capable of accurately retrieving the actual autocorrelation in any of the runs. As this, now wrongly estimated, value explicitly occurs in the objective function from which the optimal values for the model parameters are determined, it is not surprising that the quality of the
resulting parameter estimates is only moderate. On the other hand, for the two other studied
autocorrelation factors, the routine was able to retrieve the specified value quite accurately
for all cases. Not truly surprisingly, this corresponds to considerably better estimates for the
model parameters as well.
Figure 5-5 Point estimates for the autocorrelation values for all runs as tinted markers with
the corresponding, true values given by the dotted lines
Although the correction for serial correlation has only a limited effect on the final parameter
estimates, a closer look at the lag plots indeed reveals its detrending effect. Figure 5-6 shows
both the corrected residuals and the lag plot for the last run for 𝜌 = −0.99. The plotted
corrected residuals are calculated as:
𝑒𝑐𝑜𝑟𝑟,𝑖 = 𝑒𝑖 − �̂�𝑒𝑖−1 (5-9)
where �̂� equals 0 for ordinary regression, and was estimated at -0.9507 for the modified
procedure. It is noticed that the introduction of a strong negative correlation between
subsequent experiments causes the sign of the associated residuals to alternate, although
their magnitude is bounded between 0 and 0.15. This regular flipping behavior is clearly
removed for the adapted regression scheme, while the variance of the residuals is
remarkably smaller as well. This demonstrates, at least in this case, that the alternative
procedure is capable of accounting correctly for the contribution of the foregoing response to
its successor. The same stabilization is noticed in the lag plot, giving each residual as a function of its predecessor. To guide the eye, the line 𝑒𝑐𝑜𝑟𝑟,𝑖 = −0.99𝑒𝑐𝑜𝑟𝑟,𝑖−1 was drawn, and it is readily seen that the residuals from the classical regression are distinctly scattered around it.
On the other hand, the corrected residuals from the modified regression are, besides smaller
in magnitude, also spread out more randomly. Hence, the latter resemble the behavior of
truly uncorrelated errors more closely.
Figure 5-6 Residual (left) and lag plot (right) for ordinary (open symbols) and corrected
regression (filled markers) for a run with 𝝆 having a pre-specified value of -0.99. The solid
line in the lag plot is given by 𝒆𝒄𝒐𝒓𝒓,𝒊 = −𝟎. 𝟗𝟗𝒆𝒄𝒐𝒓𝒓,𝒊−𝟏
Based on this analysis, the performance of the studied procedure to account for serial
correlation of the experimental data is concluded to be prone to considerable fluctuations, as
well as highly dependent on the actual degree of autocorrelation. Nevertheless, when the
method succeeded in elucidating the actual autocorrelation quite accurately, the resulting
parameter estimates were often better than those obtained from ordinary regression.
Moreover, in those situations the stabilizing effect of the method on the behavior of the
residuals was clearly demonstrated.
Given that the method’s stability is not assured even for simple models, it is questionable whether its application to complex, highly nonlinear problems will prove worthwhile.
5.3 Bayesian estimation by MCMC posterior sampling
As has already been pointed out in the second chapter, Bayesian procedures start from a
different view on the estimation of model parameters compared to classical regression. In
contrast to the methods that were evaluated above, which boiled down to tuning regression analysis to make it robust in ‘problematic’ situations, the successful implementation of a Bayesian routine generates an alternative pathway: capturing the information
contained in finite experimental data sets and translating it into useful and accurate findings on the unknown model parameters. In what follows, the number of model
parameters will be denoted as 𝑝.
The Matlab implementation relied on the theoretical findings on Bayesian estimation presented in Section 4.3. This way, it combines the insights from the most general analytical expression for the posterior density function as given in (4-63), which fully captures the lack of knowledge on the covariance structure of the experimental errors, with the strengths of Markov Chain Monte Carlo schemes to scan its 𝑝-dimensional surface efficiently and make well-founded decisions about the model parameters.
Again, the simple linear model 𝑦 = 5𝑥 + 1 was applied to test the performance of the code,
adding a normally distributed error 휀 ~ 𝒩(0,1) to the true response value to simulate the
experimental data set 𝒚 = {𝑦𝑖} for 20 conditions {𝑥𝑖}. Since the model under study is single-response, i.e. 𝑚 = 1, the posterior density function is given by:
p(A,B|\boldsymbol{y}) \propto \prod_{i=1}^{20} \left| y_i - (Ax_i + B) \right|^{-3}   (5-10)
with 𝐴 and 𝐵 the model parameters to be estimated. The Metropolis-Hastings algorithm as
introduced before was applied to perform the MCMC sampling of the posterior density. The
proposal distribution 𝑞 from which the samples are drawn was chosen as a bivariate normal,
centered at the foregoing sample value with constant variance:
q(\boldsymbol{\beta}|\boldsymbol{\beta}_{i-1}) = \mathcal{N}(\boldsymbol{\beta}_{i-1},\, c^{2}\boldsymbol{I}_2)   (5-11)
Choosing a suitable value for 𝑐 is vital for a good sampling quality, but literature on how to make a good guess is scarce. Therefore, the value was taken, quite arbitrarily, at 10. The number of samples to be drawn was set at 10000, with a burn-in time of 1000. Lastly, the parameter values to start the Markov chain were taken at 0. Once samples for the model parameters were collected, a two-dimensional histogram was constructed which resulted, after normalization, in a sampled representation of the posterior density function. Finally, by summing the probabilities along each of the dimensions, and repeating this for both parameters, the marginal sampled posterior density functions were calculated.
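A condensed sketch of this sampler, consistent with equations (5-10) and (5-11) but not necessarily identical to the appendix implementation, could read:
% Condensed sketch of the Metropolis-Hastings sampler for posterior (5-10),
% consistent with the description above (not necessarily the appendix code itself).
n = 20; x = linspace(-5,5,n)'; yobs = 5*x + 1 + randn(n,1);   % simulated data, eps ~ N(0,1)
logpost = @(b) -3*sum(log(abs(yobs - (b(1)*x + b(2)))));      % ln of eq. (5-10), up to a constant
c = 1;                          % standard deviation of the bivariate normal proposal (5-11)
nS = 10000; burnIn = 1000;
B = zeros(nS,2);                % chain of sampled [A, B] values, started at 0
for i = 2:nS
    cand = B(i-1,:) + c*randn(1,2);                           % candidate from N(beta_{i-1}, c^2*I_2)
    if rand <= min(1, exp(logpost(cand) - logpost(B(i-1,:))))
        B(i,:) = cand;
    else
        B(i,:) = B(i-1,:);
    end
end
B = B(burnIn+1:end,:);          % discard burn-in; histograms of B(:,1) and B(:,2)
                                % give the sampled marginal posteriors of A and B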
The results of this procedure are given in Figure 5-7. Inspection of the marginal probability densities shows that the accuracy of the sampled distribution is rather poor: the actual value of parameter 𝐴 is present as a small peak at 5, which is strongly dominated by two, almost equally high, peaks at around 4 and 6. Moreover, the marginal distribution of parameter 𝐵 shows two equally distinct maxima: besides a peak at the actual value 1, a similar peak is noticed around 0.5. Fortunately, the poor quality of the Bayesian inference is explained by
the trace plots at the bottom of the figure, which indicate that the mixing ability of the
sampling is too low. The specified variance of the proposal distribution is apparently too
high, which results in an excessive rejection rate of the candidate samples and causes the
scheme to get ‘stuck’ at certain values during multiple iterations. Hence, multiple peaks
result, which do not necessarily correspond to parameter values with a high posterior
probability.
Figure 5-7 Results of the Metropolis-Hastings procedure for c = 10, giving the marginal
probabilities from the sampled posterior (above) and the sampled values throughout the
iteration for the model parameters A (left) and B (right)
The code was therefore run again, now for 𝑐 = 1, for which the results are given in Figure 5-8.
Figure 5-8 Results of the Metropolis-Hastings procedure for c = 1, giving the marginal probabilities from the sampled posterior (above) and the sampled values throughout the iteration for the model parameters A (left) and B (right)
The marginal sampled posterior probabilities are now sharply peaked, having only one maximum, located in the close vicinity of its actual value. Simultaneously, the corresponding
trace plots reveal that the mixing behaviour of the Markov chain is remarkably better when
choosing a proposal with a lower variance. Nevertheless, since the iteration still tends to get stuck, an even lower value for 𝑐 seems to be required for optimal performance.
Apparently, the implemented sampling scheme shows very poor robustness to a suboptimal choice of the proposal distribution. The need for manual tuning to ensure proper functioning of the routine is logically considered very unattractive with respect to the overall applicability of the code to more complex models with a higher number of parameters to be estimated. Therefore, more advanced sampling techniques were sought in the literature, especially algorithms which allow for an automated, sample-based updating of the proposal density function.
A plausible candidate was found in the field of adaptive MCMC [2-4]. As the name suggests,
this routine uses a built-in mechanism to tune the proposal distribution ‘on the fly’ so that an
optimal performance of the MCMC sampling is obtained. For a bivariate normal distribution,
this boils down to updating both its mean and variance matrix for every iteration, based on
the values of preceding samples. The most general of the reported procedures, the so-called Global Adaptive Metropolis with componentwise adaptive scaling, was implemented. In this algorithm, the proposal distribution from which the candidate sample is drawn reads:
q(\boldsymbol{\beta}|\boldsymbol{\beta}_{i-1}) = \mathcal{N}\!\left(\boldsymbol{\beta}_{i-1},\, \boldsymbol{\Lambda}_{i-1}^{1/2}\,\boldsymbol{\Sigma}_{i-1}\,\boldsymbol{\Lambda}_{i-1}^{1/2}\right)   (5-12)
a 𝑝-dimensional multivariate distribution around the foregoing sample with a variable
covariance matrix. No specifications are made about the exact form of the latter, so that non-
constant variance of the different parameters and non-zero correlations between them are
allowed. The core of this matrix consists of a marginal contribution 𝚺𝑖−1, which gives the
variance of the collected samples, and is scaled by a 𝑝 × 𝑝 diagonal matrix 𝚲𝑖−1 to tune it
specifically for each component, i.e. each model parameter. The global property reflects that all parameters in the new sample are updated simultaneously, rather than one parameter at a time.
The complete algorithm is given by [4]:
1. Choose starting values $\boldsymbol{\beta}_0$, $\boldsymbol{\mu}_0$, $\boldsymbol{\Sigma}_0$ and $\boldsymbol{\Lambda}_0$
2. Repeat for each sample $i$:
2.1 Sample the candidate $\boldsymbol{B}_i$ as $\boldsymbol{\beta}_{i-1} + \boldsymbol{X}_i$ with $\boldsymbol{X}_i \sim \mathcal{N}(\boldsymbol{0},\, \boldsymbol{\Lambda}_{i-1}^{1/2}\boldsymbol{\Sigma}_{i-1}\boldsymbol{\Lambda}_{i-1}^{1/2})$ and draw $U$ randomly from $[0,1]$. If $U < \alpha(\boldsymbol{B}_i, \boldsymbol{\beta}_{i-1})$, set $\boldsymbol{\beta}_i = \boldsymbol{B}_i$; otherwise, set $\boldsymbol{\beta}_i = \boldsymbol{\beta}_{i-1}$;
2.2 Update, for $k = 1 \ldots p$:
$\ln(\lambda_i^k) = \ln(\lambda_{i-1}^k) + \gamma_i^k\left[\alpha(\boldsymbol{\beta}_{i-1} + X_{i,k}\boldsymbol{e}_k,\, \boldsymbol{\beta}_{i-1}) - \bar{\alpha}^{**}\right]$, with $\boldsymbol{e}_k = \{\delta_{kj}\}_{j=1..p}$ and $\lambda_i^k = \boldsymbol{\Lambda}_{i,kk}$
$\boldsymbol{\mu}_i = \boldsymbol{\mu}_{i-1} + \gamma_i(\boldsymbol{\beta}_i - \boldsymbol{\mu}_{i-1})$
$\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}_{i-1} + \gamma_i\left[(\boldsymbol{\beta}_i - \boldsymbol{\mu}_{i-1})(\boldsymbol{\beta}_i - \boldsymbol{\mu}_{i-1})^{T} - \boldsymbol{\Sigma}_{i-1}\right]$
The vanishing parameter 𝛾 is required to stabilize the adaptations of the proposal. Indeed, a continuous change of the proposal distribution can hinder the convergence of the stationary distribution of the chain to the actual posterior. Therefore, it is suggested to reduce the effect of recent samples on the properties of the proposal for an increasing number of iterations. For simplicity, 𝛾𝑖 was set at 1/𝑖. The value for the expected acceptance probability $\bar{\alpha}^{**}$ was taken at 0.44 as suggested. The starting values 𝜷0 and 𝝁0 were set at 0, while 𝚺0 and 𝚲0 were taken at 𝑰2.
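A compact Matlab sketch of the algorithm listed above, using the settings just described and the same illustrative two-parameter posterior as before, is given below; it is an interpretation of the published update rules, not the thesis code itself.
% Compact sketch of the Global Adaptive Metropolis algorithm with componentwise
% adaptive scaling, as listed above; logpost is the illustrative two-parameter posterior.
n = 20; x = linspace(-5,5,n)'; yobs = 5*x + 1 + randn(n,1);
logpost = @(b) -3*sum(log(abs(yobs - (b(1)*x + b(2)))));
p = 2; nS = 10000;
alphaStar = 0.44;                              % target acceptance probability
beta = zeros(nS,p); mu = zeros(1,p);
Sigma = eye(p); Lambda = eye(p);
acc = @(cand,prev) min(1, exp(logpost(cand) - logpost(prev)));   % acceptance probability
for i = 2:nS
    gam = 1/i;                                 % vanishing adaptation parameter
    sqrtLam = diag(sqrt(diag(Lambda)));        % Lambda^(1/2), Lambda is diagonal
    C = sqrtLam*Sigma*sqrtLam;                 % proposal covariance Lambda^1/2 Sigma Lambda^1/2
    Xi = mvnrnd(zeros(1,p), (C+C')/2);         % increment X_i, covariance symmetrized for safety
    cand = beta(i-1,:) + Xi;
    if rand < acc(cand, beta(i-1,:)), beta(i,:) = cand; else beta(i,:) = beta(i-1,:); end
    for k = 1:p                                % componentwise scaling update
        ek = zeros(1,p); ek(k) = Xi(k);
        ak = acc(beta(i-1,:) + ek, beta(i-1,:));
        Lambda(k,k) = exp(log(Lambda(k,k)) + gam*(ak - alphaStar));
    end
    d = beta(i,:) - mu;                        % deviation from the running mean mu_{i-1}
    mu = mu + gam*d;                           % global mean update
    Sigma = Sigma + gam*(d'*d - Sigma);        % global covariance update
end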
The main results of the run are shown in Figure 5-9. It is noticed immediately that, while the
marginal posterior densities are similar to those generated earlier, the mixing behaviour is
considerably worse. Apparently, the explicit incorporation of a permanent updating
mechanism does not yield the expected improvements in performance of the sampling
routine.
Both the initial guess on the proposal variance and the number of samples have been varied
manually to check for explanations for the lack of mixing, but without success. Hence, the
problematic convergence of the sampling scheme seems to be an intrinsic shortcoming of the
algorithm. Although adaptive MCMC sampling has been reported as a valuable technique,
its application to this particular problem showed that the performance of the algorithm has
at least some shortcomings. The literature survey was therefore resumed, which resulted in a recent sampling technique that is reported as a promising algorithm in cases with badly scaled proposal distributions, yet without requiring a mechanism for continuous updating [5, 6].
Figure 5-9 Results from the Adaptive Metropolis algorithm, giving the marginal probabilities
from the sampled posterior (above) and the sampled values throughout the iteration for the
model parameters A (left) and B (right)
This alternative sampling scheme introduces two new concepts compared to the previously
applied algorithms. First, the starting point of the method is to design a Markov chain generator which is affine invariant, i.e. the way in which candidate samples are drawn from the posterior distribution is not influenced by any linear transformation of its variables. This feature makes the convergence of the sampling routine more robust. Secondly, where the traditional approaches to MCMC sampling initialize one Markov chain and follow its evolution in time, i.e. throughout the iterations in the Metropolis algorithm, an alternative pathway is to start from multiple seeds which are allowed to explore parameter space in parallel, during a proportionally smaller number of iterations. By allowing explicitly for strong interactions between the different walkers, i.e. by making the evolution of one walker depend on the positions of the others, the moves of this ensemble of walkers are known to adapt automatically to the targeted posterior distribution.
One particular manner to realize an affine invariant walk through parameter space is by
means of a stretch move, as represented in Figure 5-10. This way, every candidate sample for
a certain walker 𝑿𝒌 from the ensemble is obtained by using the current value of one other
walker 𝑿𝒋, 𝑗 ≠ 𝑘, to obtain the linearly interpolated value 𝒀:
𝑋𝑘(𝑡) → 𝑌(𝑡 + 1) = 𝑋𝑗(𝑡) + 𝑍 (𝑋𝑘(𝑡) − 𝑋𝑗(𝑡)) (5-13)
where 𝑍 is a scaling factor to be sampled, as will be discussed below. Which of the complementary walkers is used for the interpolation is determined at random.
Figure 5-10 Stretch move of walker 𝑿𝒌 along the line through walker 𝑿𝒋, yielding candidate
sample 𝒀. All other walkers (grey dots) do not participate.
The complete algorithm for an ensemble consisting of 𝐾 walkers reads:
For each time step $t$:
1. Repeat, for each walker $\boldsymbol{X}_k$:
1.1 Choose a walker $\boldsymbol{X}_j$, $j \neq k$, at random
1.2 Draw a sample $Z$ from the distribution $g(z) \propto 1/\sqrt{z}$, $z \in [\tfrac{1}{2}, 2]$
1.3 Calculate $Y(t)$ according to (5-13)
1.4 Determine $q = z^{p-1}\, \dfrac{p(Y(t))}{p(X_k(t-1))}$ and sample $r$ from $[0,1]$
1.5 If $r \leq \min(1, q)$, set $X_k(t) = Y(t)$; otherwise, set $X_k(t) = X_k(t-1)$
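A compact sketch of this stretch-move ensemble sampler, again applied to the illustrative two-parameter posterior and not claimed to reproduce the thesis implementation, could read:
% Compact sketch of the affine-invariant stretch-move ensemble sampler listed above,
% applied to the illustrative two-parameter posterior of Section 5.3.
n = 20; x = linspace(-5,5,n)'; yobs = 5*x + 1 + randn(n,1);
logpost = @(b) -3*sum(log(abs(yobs - (b(1)*x + b(2)))));
p = 2; K = 100; T = 100;                      % number of parameters, walkers and time steps
X = randn(K,p);                               % ensemble of walkers, randomly initialized
samples = zeros(K*T,p);
for t = 1:T
    for k = 1:K
        j = randi(K-1); if j >= k, j = j+1; end               % complementary walker j ~= k
        z = (sqrt(0.5) + rand*(sqrt(2)-sqrt(0.5)))^2;         % inverse-CDF draw of g(z) on [1/2,2]
        Y = X(j,:) + z*(X(k,:) - X(j,:));                     % stretch move, eq. (5-13)
        q = (p-1)*log(z) + logpost(Y) - logpost(X(k,:));      % ln of the acceptance quantity q
        if log(rand) <= min(0, q)
            X(k,:) = Y;                                       % accept the stretched position
        end
    end
    samples((t-1)*K+1:t*K,:) = X;                             % store the whole ensemble at step t
end
% discard the first ensemble sweeps as burn-in before computing histograms of the columns
The inverse-CDF draw of Z and the z^(p-1) factor in the acceptance quantity are what make the move affine invariant, so no proposal variance has to be tuned by hand.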
Literature is not clear about which values to choose for the number of walkers and the
number of iterations to give optimal performance of the sampling scheme. One important
guideline is that a sufficiently large number of walkers is more beneficial for convergence
than a large number of time steps. It has to be remarked that this will increase the burn-in
time and hence the number of samples to be discarded [6].
In this case, the number of walkers was chosen at 100, so that 100 iterations were required to
generate the same number of samples as for the previous methods. The preliminary burn-in
time was taken at 10. The results of running the Matlab code are shown in Figure 5-11.
Again, two peaked marginal distributions are found, the one of the slope parameter being sharper than that of the intercept. In contrast to the previous sampling techniques, the trace-
plots now do show the desired mixing of the chain, which is promising concerning the quality of the samples.
Figure 5-11 Results from the affine invariant MCMC algorithm, giving the marginal probabilities from the sampled posterior (above) and the sampled values throughout the iteration for the model parameters A (left) and B (right)
All statistical properties of the posterior density function are determined from the samples,
including the uncertainty on the model parameters. The 𝛼-percent probability interval in
which the parameter values are most likely located, is defined as the smallest interval on the
parameter axis for which the surface integral of the marginal posterior density equals 𝛼. In
this case, the 95% probability interval on 𝐴 is given by [4.5306, 5.3469] and that for 𝐵 reads [0.5102, 1.7347]; both clearly include the actual parameter values. Classical regression on the simulated data set estimated the optimal parameter values at $[\hat{A}, \hat{B}]$ = [5.0846, 0.9966]. Application of equation (5-4) to parameters 𝐴 and 𝐵 yields 95% confidence intervals of [5.0752, 5.0940] and [0.9101, 1.0832] respectively, which are considerably smaller than those obtained from the Bayesian estimation.
The remarkable gain in performance of the last sampling scheme and its ability to yield
reasonable results for the test case, though very simplified, is an encouraging finding in the
search for reliable alternative routines towards parameter estimation. Naturally, more tests
will have to be performed on more complex models to truly assess the overall reliability of
the routine. Moreover, in the discussion above the quality of the sampling scheme was
evaluated from a visual inspection of the trace-plots, which can be argued to be a rather
subjective decision criterion. Unfortunately, at the moment of writing of this work, no solid and mathematically well-founded theory is available to draw incontestable conclusions on the convergence of the chain. Nevertheless, since Bayesian estimation procedures using MCMC
sampling are a growing field of interest in statistical research, progress on its theoretical
foundation might be expected in the near future.
5.4 References
1. Thybaut, J.W., Kinetic Modeling and Simulation - University Course. 2014, Ghent
University.
2. Haario, H., E. Saksman, and J. Tamminen, An adaptive Metropolis algorithm.
Bernoulli, 2001: p. 223-242.
3. Gelfand, A.E. and S.K. Sahu, On Markov Chain Monte Carlo Acceleration. Journal of
Computational and Graphical Statistics, 1994. 3(3): p. 261-276.
4. Andrieu, C. and J. Thoms, A tutorial on adaptive MCMC. Statistics and Computing,
2008. 18(4): p. 343-373.
5. Goodman, J. and J. Weare, Ensemble samplers with affine invariance.
Communications in Applied Mathematics and Computational Science, 2010. 5(1): p.
65-80.
6. Foreman-Mackey, D., et al., emcee: The MCMC hammer. Publications of the
Astronomical Society of the Pacific, 2013. 125(925): p. 306-312.
Chapter 6
Conclusions and future work
In this master thesis, the currently applied methodology to estimate unknown kinetic
parameters in chemical reaction modelling is critically reviewed. As these procedures rely on
classical regression analysis, the quality of the parameter estimates will only be guaranteed in those cases where the necessary theoretical assumptions, primarily on the regularity of the experimental error, are fulfilled. Unfortunately, these assumptions are often too strong and, as a consequence, the performance of the estimation methodology is not ascertained.
In the first part of this work, an attempt was made to evaluate the overall performance of the
current procedures for parameter estimation. To assess their robustness, it was attempted to evaluate whether consistent values for the kinetic parameters could be determined from data of both batch-reactor and continuous-flow experiments. Continuous-flow experiments for
the transesterification of ethyl acetate with methanol catalyzed by the ion-exchanging resin
Lewatit K2629 were performed for varying process conditions. Batch-data based kinetic
parameters were available from recent research on this topic.
A qualitative comparison of the results of the experiments yielded the expected trends. The conversion of the reactants was positively influenced by an increasing temperature and by an excess of one of the reactants, while higher flow rates through the reactor, and hence a shorter contact time of the mixture with the catalyst, were found to lower their consumption. Unfortunately, a quantitative analysis of the data showed some strong inconsistencies in the results, as the observed concentrations of ethanol and methyl acetate did not obey the required reaction stoichiometry. The explanation for this imbalance was sought in an improper functioning of the reactor setup, yet up to the moment of writing the true reason has not been identified. An attempt to fit the experimental observations with the reported kinetic model revealed strong deviations. Therefore, the evaluation of the robustness did not succeed and will have to be repeated, if desired, in the future.
Apart from its robustness, the current statistical methodology was tested on its applicability
to physical systems as well. For this purpose, the modelling of the electrical behavior of thin-
layer solar cells was chosen as it is a growing field of interest in the development of new
photovoltaic technology. Current-voltage experiments have been performed on two copper indium gallium selenide (CIGS) type solar cells which were cut out of the same mother panel, at ambient to heavily cooled temperatures. Based on these data, the parameters of the most extended model in the literature to date were successfully estimated, yielding both physically relevant and statistically significant estimates. Based on these values, a current
path analysis was made for both cells that revealed and quantified the contribution of
different current leakage mechanisms to the performance and efficiency of the entire cell.
Comparison of these analyses showed strong deviations between the cells, which
demonstrated the strongly localized nature of the parasitic effects. Therefore, the application
of the statistical methodology to physical model building was considered successful. It is
hoped and believed that this promising synergy will prove to be of even higher use in future
fundamental research on solar cells.
The second half of the thesis focuses on the evaluation of alternative techniques for
parameter estimation. Three candidate routines were selected from an extensive literature survey and coded in Matlab. Their performance was benchmarked by parameter estimation for a
simple, single-response linear model, which allowed for a first comparison with the results
from classical regression. The first method extended the scope of classical regression routines
to situations with heteroscedastic errors, i.e., with non-constant variance, by modelling the
weighing factors as proportional to a power of the model predictions. The performance of
this procedure was found to be rather variable, as its ability to retrieve the true parameter
values heavily depended on the overall quality of the data. Nevertheless, a significant gain in
accuracy was noticed with respect to ordinary least squares estimation, while the regularity
of the heteroscedastic error was drastically increased after weighing.
The second method accounted for non-zero correlation among the experimental errors, a situation that is of particular interest for time series experiments, e.g. data from batch-reactor setups for chemical modelling. Now, the experimental error was modelled as a time series obeying a first-order autoregressive model. Although again a considerable positive impact of the procedure was seen on the randomness of the errors, the improvement in the quality of the parameter estimates was not striking. For some runs of the routine, the calculated estimates were even worse. Although the routine was not explicitly
evaluated on a nonlinear model, as this introduces an additional level of complexity in the
algorithm, the only moderate performance on simple models does not leave much hope for
its added value for more complicated situations.
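As implemented in the routine of Appendix A.1.2, the experimental errors \(\varepsilon_i\) were assumed to follow

\[
\varepsilon_i = \rho\,\varepsilon_{i-1} + u_i ,
\]

with \(u_i\) independent, identically distributed disturbances, and the parameters were obtained by repeatedly minimizing the transformed sum of squares

\[
S(\beta) = \left(1-\rho^{2}\right)\left(y_1 - \hat{y}_1(\beta)\right)^{2} + \sum_{i=2}^{n} \left[\left(y_i - \hat{y}_i(\beta)\right) - \rho\left(y_{i-1} - \hat{y}_{i-1}(\beta)\right)\right]^{2},
\]

re-estimating \(\rho\) from the lag-one autocorrelation of the residuals after each pass until convergence.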
Finally, a Bayesian estimation procedure was implemented. The routine attempted to combine, for the first time, theoretical insights on the optimal design of the posterior density function, which allow, at least in theory, for an automated weighing of the experimental data, with the computational power of MCMC sampling schemes to evaluate that multidimensional posterior efficiently. Classical MCMC sampling schemes turned out not to converge properly, so that more advanced algorithms had to be sought. Although the obtained inference on the model parameters was not as efficient as the parameter estimates from classical regression, the calculated 95% confidence intervals being slightly broader, this first successful application of the new methodology is nonetheless seen as a promising result. Keeping in mind that this procedure allows not only for an automated weighing but also for the explicit inclusion of prior information on the model parameter values, further research on and testing of this interesting technique for more complex situations is highly recommended.
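For completeness, the marginal posterior density that is sampled by the affine invariant MCMC routine of Appendix A.1.3 reads, for a single-response model with uncorrelated errors,

\[
p(\theta \mid \mathbf{y}) \;\propto\; \prod_{i=1}^{n} \left| y_i - f(x_i,\theta) \right|^{-(m+2)},
\]

with \(n\) the number of observations and \(m\) the number of responses (\(m = 1\) for the benchmark model), as stated in the comment block preceding that routine.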
Appendix
A.1 Matlab routines for alternative techniques to estimate model parameters
A.1.1 Data-based weighing
clear all % format long

% Proposed model y = Aact*x+Bact
par = 2; Aact = 5; Bact = 1;

% Generate the heteroscedastic output set. It is assumed that the variance
% of the experimental error is proportional to the square of the actual
% value of the output
n = 20; sigma = 1;
x = linspace(-5,5,n)';
X = [x,ones(size(x))];
ycalc = X*[Aact;Bact];
V = diag(abs(ycalc).^2)*sigma^2;
% Mean passed to mvnrnd as a 1-by-n row vector so that a single
% n-dimensional draw is returned
yobs = ycalc+mvnrnd(zeros(1,n),V)';

% Ordinary Least Squares estimate and confidence intervals
b0 = (X'*X)^-1*X'*yobs;
yopt = X*b0;
Sb = (yobs-yopt)'*(yobs-yopt);
Vb0 = (X'*X)^-1*Sb/(n-par);
tval = tinv(1-0.05/2,n-par);
CIb0low = b0-tval*sqrt(diag(Vb0));
CIb0up = b0+tval*sqrt(diag(Vb0));

% Start of the weighted routine
weights = @(p,b) vpa(abs(X*b).^(2*p-2));
f1 = @(p,b) -1/2*log(prod(weights(p,b)))+n/2*log(sum(weights(p,b).*(yobs-X*b).^2));
f = @(v) f1(v(1),v(2:end));
vopt = fminsearch(f,[1;b0]);
popt = vopt(1); bopt = vopt(2:end);
yopt = X*bopt;
wopt = weights(popt,bopt);

% Confidence intervals
Vb = zeros(par+1,par+1);
res = yobs-yopt;
WSSR = sum(wopt.*res.^2);
a = (popt-1)./(yopt.^2)+n/WSSR*((popt-1)*(2*popt-3)*wopt.*res.^2./(yopt.^2)-4*(popt-1)*wopt.*res./yopt+wopt);
b = 1./yopt-n/WSSR*(wopt.*res.*(res./yopt+2*((popt-1)*res./yopt-1).*log(abs(yopt))));
k = sum(wopt.*res.^2.*log(abs(yopt)));
c = wopt.*res.^2.*(log(abs(yopt))).^2;
for i = 1:par+1
    for j = 1:i
        S = @(q) sum(wopt.*res.*((popt-1)*res./yopt-1).*X(:,q));
        if i <= par
            Vb(i,j) = sum(a.*X(:,i).*X(:,j))-2*n*S(i)*S(j)/WSSR^2;
            Vb(j,i) = Vb(i,j);
        elseif j <= par
            Vb(i,j) = -sum(b.*X(:,j))-2*n*k*S(j)/WSSR^2;
            Vb(j,i) = Vb(i,j);
        else
            Vb(i,j) = 2*n/WSSR*sum(c)-2*n*k^2/WSSR^2;
        end
    end
end
Vb = Vb^-1;
tval = tinv(1-0.05/2,n-(par+1));
CIblow = [bopt;popt]-tval*sqrt(diag(Vb));
CIbup = [bopt;popt]+tval*sqrt(diag(Vb));

disp([CIb0low b0 CIb0up])
disp([CIblow [bopt;popt] CIbup])

% figure
% subplot(2,2,1)
% scatter (x,yobs,'b')
% hold on
% scatter (x,X*b0,'r')
% hold off
% subplot(2,2,2)
% scatter (x,yobs,'b')
% hold on
% scatter (x,X*bopt,'r')
% hold off
% subplot(2,2,3)
% scatter (x,yobs-X*b0)
% subplot(2,2,4)
% scatter (x,sqrt(wopt).*res);
A.1.2 Correcting for serial correlation
% Benchmark study for the correction for serial correlation for a single
% response, linear model by implementing the "iterated two-stage" AR(1)
% model as suggested by Seber and Wild (2003) and introduced in Chapter 2

clear all

% Proposed model y = A*x+B
A = 5; B = 1;

% Generate the correlated output set, x = [-5,5], with predetermined value
% rhospec for the AR(1) parameter
n = 20; sigma = 0.1;
x = linspace(-5,5,n)';
X = [x,ones(size(x))];
rhospec = -0.99;
yexact = A*x+B;
V = zeros(n,n);
for i = 1:n
    for j = 1:n
        V(i,j) = rhospec^(abs(j-i));
    end
end
V = V*sigma^2;
% Mean passed to mvnrnd as a 1-by-n row vector so that a single
% n-dimensional draw is returned
yobs = yexact+mvnrnd(zeros(1,n),V)';

% Ordinary Least Squares estimate
b0 = (X'*X)^-1*X'*yobs;

% Calculate Durbin-Watson test criterion
% For alpha = 0.05, n = 20 and p = 2, dL = 1.10 and dU = 1.54
res0 = yobs-X*b0;
dL = 1.10; dU = 1.54;
d = sum(diff(res0).^2)/(res0'*res0);

% Iterated two-stage estimation of the model parameters and rho
tol = 1; bopt = b0;
while tol>10^-4
    bold = bopt;
    res = yobs-X*bopt;
    rho = res(1:n-1)'*res(2:n)/(res'*res);
    f = @(a,b) (1-rho^2)*(yobs(1)-X(1,:)*[a;b])^2+sum((yobs(2:n,:)-X(2:n,:)*[a;b]-rho*(yobs(1:n-1,:)-X(1:n-1,:)*[a;b])).^2);
    fun = @(b) f(b(1),b(2));
    bopt = fminsearch(fun,bold);
    tol = norm((bopt-bold)./bopt);
    % disp(rho)
end

% Define the new, uncorrelated experimental error vector
u = [res(1);res(2:n)-rho*res(1:n-1)];

disp(b0)
disp(bopt)
disp(rho)
disp([dL d dU])

% figure
% subplot(3,2,1)
% scatter (x,yobs,'b')
% hold on
% plot (x,X*bopt,'r')
% hold off
% subplot(3,2,2)
% scatter (x,yobs,'b')
% hold on
% plot (x,X*b0,'r')
% hold off
% subplot(3,2,3)
% scatter (x,yobs-X*b0)
% subplot(3,2,4)
% scatter (x,u)
% subplot(3,2,5)
% scatter(res0(1:end-1),res0(2:end));
% subplot(3,2,6)
% scatter(res(1:end-1),res(2:end)-rho*res(1:end-1));
A.1.3 Bayesian estimation with affine invariant MCMC
% Bayesian estimation of a single-response model
clear all

% Create the experimental data
par = 2; A = 5; B = 1;
% Number of "experimental" data points
n = 20;
% Number of responses
m = 1;
sigma = 1;
x = linspace(-5,5,n)';
y = @(a,b) a*x+b;
ycalc = y(A,B);
%y = @(a,b) a*exp(b*x);
%ycalc = y(A,B);
yobs = mvnrnd(ycalc,sigma^2);

% For a single-response model and heteroscedastic but non-correlated
% experimental data set, the marginal posterior density function p(O|y)
% is given by
% p(O|y) ~ prod[|y(i)-f(x(i))|^-(m+2)], i = 1..n
%p = @(theta) prod(abs(yobs-y(theta(1),theta(2),theta(3),theta(4))))^-(m+2);
p = @(theta) prod(abs(yobs-y(theta(1),theta(2))))^-(m+2);

% Initialize all walkers at 0;
K = 10000; T = 1000;
walkers = zeros(T*K,par);
% Initialize the walkers at a random position between -1 and 1.
% Initialization at 0 will make the walkers immobile.
walkers(1:K,:) = 2*rand(K,par)-1;

for t = 2:T
    temp = walkers((t-2)*K+1:(t-1)*K,:);
    for k = 1:K
        disp((t-1)*K+k);
        % Construct the set of complementary walkers
        if k == 1
            temp2 = temp(2:end,:);
        else
            temp2 = [temp(1:k-1,:);temp(k+1:end,:)];
        end
        % Random picking of the complementary walker for the stretch move
        pos = ceil((K-1)*rand);
        Xj = temp2(pos,:);
        % Sample from the distribution g(z) = 1/sqrt(z) , z = 1/a..a
        a = 2;
        intz = (2*a-2)/sqrt(a);
        z = (rand*intz*sqrt(a)+2)^2/4/a;
        Xk = temp(k,:);
        Y = Xj+z*(Xk-Xj);
        q = z^(par-1)*p(Y)/p(Xk);
        % Perform the acceptance/rejection of the candidate
        r = rand;
        if p(Y) == 0
            walkers((t-1)*K+k,:) = Xk;
        elseif r <= min(q,1)
            walkers((t-1)*K+k,:) = Y;
        else
            walkers((t-1)*K+k,:) = Xk;
        end
    end
end

burnin = 50;
walkers1 = walkers;
walkers = walkers(burnin*K+1:end,:);

% Now calculate the par-dimensional probability density distribution of the
% parameters of a par-dimensional histogram data set, using the function
% histcn.m added in the folder
edge1 = linspace(4,6,1000);
edge2 = linspace(0,2,1000);
[histdata,edges,mids] = histcn(walkers,edge1,edge2);
h = size(histdata,1);
edges = cell2mat(edges);
mids = cell2mat(mids);
% The vector histdata now contains the number of counts for a par-dimensional
% cube of h bins. The analogue for the marginal posterior density function
% for the j'th parameter is then found by summing over all dimensions,
% except for the j'th.
prob = zeros(h,par);

dist1 = mids(2)-mids(1);
dist2 = mids(h+2)-mids(h+1);
histdata = histdata/(sum(histdata(:)))/dist1/dist2;

% Calculate the marginal probability densities along each parameter axis
prob(:,1) = dist2*sum(histdata,2);
prob(:,2) = dist1*sum(histdata,1)';

figure
subplot(2,2,1)
plot(mids(1:h),prob(:,1));
subplot(2,2,2)
plot(mids(h+1:end),prob(:,2));
subplot(2,2,3)
plot(1:size(walkers,1),walkers(:,1));
subplot(2,2,4)
plot(1:size(walkers,1),walkers(:,2));
hold off

% Calculate the alpha percent confidence interval
alpha = 0.95;
int1 = 100; int2 = int1;
for i = 1:h-1
    for j = i+1:h
        area1 = sum(prob(i:j,1))*dist1;
        area2 = sum(prob(i:j,2))*dist2;
        if area1 >= alpha && (j-i+1)*dist1<int1
            int1 = (j-i+1)*dist1;
            unlim1 = edges(i);
            uplim1 = edges(j+1);
            Area1 = area1;
        end
        if area2 >= alpha && (j-i+1)*dist2<int2
            int2 = (j-i+1)*dist2;
            unlim2 = edges(h+1+i);
            uplim2 = edges(h+1+j+1);
            Area2 = area2;
        end
    end
end
A.2 Lab journal: table of contents
Overview of calibration experiments (pp. 1-4)
Overview of transesterification experiments (pp. 5-10)