Robust estimation of correlation coefficients among soil parameters...



  • Structural Safety 63 (2016) 21–32


    Robust estimation of correlation coefficients among soil parameters under the multivariate normal framework

    http://dx.doi.org/10.1016/j.strusafe.2016.07.002
    0167-4730/© 2016 Elsevier Ltd. All rights reserved.

    ⇑ Corresponding author. E-mail address: [email protected] (J. Ching).

    Jianye Ching a,⇑, Kok-Kwang Phoon b, Dian-Qing Li c
    a Dept of Civil Engineering, National Taiwan University, Taiwan, ROC
    b Dept of Civil and Environmental Engineering, National University of Singapore, Singapore
    c State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, PR China

    Article info

    Article history:
    Received 4 March 2016
    Received in revised form 12 July 2016
    Accepted 12 July 2016

    Keywords:
    Soil properties
    Correlation
    Multivariate normal distribution
    Statistical uncertainty

    Abstract

    Based on limited amount of multivariate soil data Y, it is only possible to reliably estimate the marginal distributions and the correlations. A common practical approach of constructing the multivariate probability distribution of Y is to transform Y into standard normal data X and construct the multivariate standard normal distribution for X. This method is called the translation method. Its success depends on whether the Pearson product-moment correlations (δij) for X can be robustly estimated. This paper investigates the robustness for four methods of estimating δij. The emphasis is on the statistical uncertainty in the estimated δij when the amount of soil data is limited. It is found that the well known method that maps the Pearson correlations for Y to δij is the least robust, suffering the most significant statistical uncertainty. The causes for this non-robustness are investigated. The two methods that map the Spearman and Kendall rank correlations for Y to δij are quite robust. The method that converts Y to X and directly estimates δij is also robust as long as the conversion is based on properly chosen marginal distributions.

    © 2016 Elsevier Ltd. All rights reserved.

    1. Introduction

    Geotechnical data are multivariate in their nature. For instance, when borehole samples are drawn, SPT-N values are usually available; moreover, the information regarding unit weight, plasticity index (PI), liquid limit (LL) and water content can be quickly obtained through laboratory tests. Many of these test indices may be simultaneously correlated to a design soil parameter such as the undrained shear strength (su). Some multivariate soil databases have been compiled in recent studies [2,4,6,8] and multivariate probability distribution models have been constructed. Table 1 shows these databases, labeled as (soil type)/(number of parameters of interest)/(number of data points). With the multivariate distribution models, these studies showed that it is possible to reduce the uncertainty in the design soil parameter by incorporating multiple site investigation information. This reduction in uncertainty can further translate to actual savings in design dimensions under the reliability-based design framework [9]. This more explicit link between site investigation efforts and possible design savings is a distinctive and important subject in geotechnical engineering [20,35,36].

    In practice, it is not possible to construct the exact multivariate distribution based on a limited amount of data. The available information is typically limited to the marginal distributions and the correlations only [30,24]. Given the marginal distributions and the correlations of a set of parameters of interest, the underlying multivariate distribution is not unique [24]. A common practical approach of constructing the multivariate distribution is to transform the non-normal data into standard normal data and construct the underlying multivariate normal distribution. To be specific, let Y = (Y1, Y2, ..., Yn) be the multivariate geotechnical parameters of interest. In general, Yi is non-normal, and the following CDF transform can be adopted to transform Yi into a standard normal variable Xi:

    $$X_i = \Phi^{-1}[F_i(Y_i)], \qquad Y_i = F_i^{-1}[\Phi(X_i)] \tag{1}$$

    where Fi is the cumulative distribution function (CDF) for Yi; Φ is the standard normal CDF; Φ⁻¹ is the inverse function for Φ; Fi⁻¹ is the inverse function for Fi. Furthermore, X = (X1, X2, ..., Xn) is assumed to follow the multivariate normal distribution with the Pearson product-moment correlation δij between (Xi, Xj). This method of constructing multivariate distribution has broad applications in the literature [29,19,30,1,24]. It is called the “Nataf model” [31] in Liu and Der Kiureghian [30], the “NORTA (NORmal To Anything)


  • Table 1. Soil databases.

    | Database | Reference | Parameters of interest | # sites/studies | Marginal PDF | Method of determining δij |
    |---|---|---|---|---|---|
    | CLAY/5/345 | Ching and Phoon [2] | LI, su, su^re, σ′p, σ′v | 37 sites | Lognormal | Method XP |
    | CLAY/6/535 | Ching et al. [8] | su/σ′v, OCR, (qt − σv)/σ′v, (qt − u2)/σ′v, (u2 − u0)/σ′v, Bq | 40 sites | Johnson | Method XP |
    | CLAY/7/6310 | Ching and Phoon [4] | su under 7 different test modes | 164 studies | Lognormal | Method XP |
    | CLAY/10/7490 | Ching and Phoon [6] | LL, PI, LI, σ′v/Pa, σ′p/Pa, su/σ′v, St, (qt − σv)/σ′v, (qt − u2)/σ′v, Bq | 251 studies | Johnson | Method XP |

    LL: liquid limit; PI: plasticity index; LI: liquidity index; su: undrained shear strength; su^re: remolded su; σ′p: preconsolidation stress; σ′v: vertical effective stress; σv: vertical total stress; OCR: overconsolidation ratio; qt: corrected cone tip resistance; u2: pore pressure behind the cone; u0: static pore pressure; Bq: CPTU pore pressure parameter; Pa: one atmosphere pressure; St: sensitivity. Method XP refers to the method of estimating the Pearson product-moment correlation (δij) by transforming the non-normal soil data into standard normal variables.


    distribution” in Cario and Nelson [1], and the “translation method” in Johnson [22] and Li et al. [24]. Some special cases of such models include the multivariate lognormal model proposed by Johnson and Ramberg [21] and the bivariate Johnson model proposed by Johnson [22], which is later extended to the multivariate Johnson model by Stanfield et al. [38]. For bivariate geotechnical data, the copula theory has been widely used for constructing bivariate distributions [25,26,28,27,39,40,41,18,17]. With the copula theory, it is possible to go beyond the bivariate normal distribution framework. However, there are only limited studies applying copulas to multivariate distributions with dimension more than 2 (n > 2), because only the elliptical copulas (e.g., Gaussian copula and t copula) have practical n-dimensional generalizations. The current study focuses on the multivariate normal distribution framework. This multivariate normal distribution framework will be referred to as the “translation method” and the multivariate model will be referred to as the “translation model” in the following.
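The CDF transform of Eq. (1) underlies everything that follows, so a minimal sketch may help fix ideas. It assumes a lognormal marginal with ln(Y) ~ N(lam, xi²); the function names and the numerical values of `lam`, `xi`, and `y` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi is .cdf, Phi^-1 is .inv_cdf

def lognormal_cdf(y, lam, xi):
    """F(y) for a lognormal variable with ln(Y) ~ N(lam, xi^2)."""
    return STD_NORMAL.cdf((math.log(y) - lam) / xi)

def lognormal_inv_cdf(p, lam, xi):
    """F^-1(p) for the same lognormal variable."""
    return math.exp(lam + xi * STD_NORMAL.inv_cdf(p))

def y_to_x(y, lam, xi):
    """Forward transform of Eq. (1): X = Phi^-1[F(Y)]."""
    return STD_NORMAL.inv_cdf(lognormal_cdf(y, lam, xi))

def x_to_y(x, lam, xi):
    """Inverse transform of Eq. (1): Y = F^-1[Phi(X)]."""
    return lognormal_inv_cdf(STD_NORMAL.cdf(x), lam, xi)

# Hypothetical lognormal parameters and data value, for illustration only
lam, xi = 3.0, 0.9
y = 25.0
x = y_to_x(y, lam, xi)        # standard normal equivalent of y
y_back = x_to_y(x, lam, xi)   # round trip back to the original scale
```

For a lognormal marginal the forward transform reduces to X = (ln Y − lam)/xi; other marginals only require swapping the two CDF helpers.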

    The success of the translation model depends on whether the marginal probability density functions (PDF) and the Pearson product-moment correlations δij can be reliably estimated. The goodness-of-fit of the marginal PDFs can be assessed through classical statistical tests such as the Chi-squared and K-S tests [12]. The focus of the current paper is on the robust estimation of the Pearson correlation δij between each (Xi, Xj) pair. The translation method is by no means a complete framework. Not every multivariate distribution can be represented as a translation model. Therefore, the translation model should be considered as an approximate albeit practical model. Li et al. [24] investigated the performance of the translation model in approximating a non-translation model. The main goal was to verify the effectiveness of the translation model when the target multivariate distribution model is beyond the multivariate normal framework. Nonetheless, the purpose of the current paper is different. The purpose is to investigate the statistical uncertainty in the estimated δij. Due to the limited amount of the geotechnical data, the estimated δij is not identical to the actual δij. The discrepancy is the statistical uncertainty. In this study, four methods of estimating δij will be investigated. The one with the least statistical uncertainty should be considered as the most robust method. Note that these four methods will produce the same correlation coefficients if the data are produced by a translation model and the sample size is infinitely large.

    To conduct this investigation, a translation model that is within the multivariate normal framework with marginal PDFs and correlation matrix is adopted to simulate Y data. This translation model was constructed by Ching and Phoon [2] based on the database CLAY/5/345 in Table 1. The simulated Y data are adopted to estimate δij by the four methods. The discrepancy between the actual and estimated δij can then be quantified, and conclusions regarding the robustness of each method will be given. To confirm the applicability of the conclusions with respect to real soil databases, further comparisons among the four methods will be conducted for the four real soil databases shown in Table 1.
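Simulating Y data from a translation model can be sketched as follows. This is a simplified bivariate version, not the five-variate model of Ching and Phoon [2]: it draws a correlated standard normal pair and pushes it through lognormal marginals. The function name is hypothetical; the (λ, ξ) values are those listed for Y1 = LI and Y3 = su^re in Table 2, and δ13 = −0.83 is the value quoted in Section 3.3.

```python
import math
import random

def simulate_translation_pair(n, delta, lam_i, xi_i, lam_j, xi_j, rng):
    """Draw n (Yi, Yj) pairs from a bivariate lognormal translation model."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Correlate two independent standard normals (2x2 Cholesky factor)
        x_i = z1
        x_j = delta * z1 + math.sqrt(1.0 - delta ** 2) * z2
        # Lognormal marginals: Y = F^-1[Phi(X)] = exp(lam + xi * X)
        pairs.append((math.exp(lam_i + xi_i * x_i),
                      math.exp(lam_j + xi_j * x_j)))
    return pairs

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Y1 = LI and Y3 = su^re parameters from Table 2; N = 345 as in CLAY/5/345
data = simulate_translation_pair(345, -0.83, 0.122, 0.459, 0.226, 1.191, rng)
```

A full n-variate simulation would replace the 2x2 factor with the Cholesky factor of the whole δ matrix.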

    2. Methods for estimating δij

    In the literature, there are at least four methods for estimating δij. They are denoted by Method P, Method S, Method K, and Method XP below:

    1. Method P. This method is the most common method adopted in the literature (e.g., [29,30,1,25]). It is based on the Pearson product-moment correlation between (Yi, Yj), denoted by ρij, which can be estimated using the following equation:

    $$\rho_{ij} \approx \frac{\frac{1}{N-1}\sum_{k=1}^{N}\left(Y_i^{(k)}-m_i\right)\left(Y_j^{(k)}-m_j\right)}{\sqrt{\frac{1}{N-1}\sum_{k=1}^{N}\left(Y_i^{(k)}-m_i\right)^2 \times \frac{1}{N-1}\sum_{k=1}^{N}\left(Y_j^{(k)}-m_j\right)^2}} \tag{2}$$

    where the superscript (k) is the sample index; mi is the sample mean of Yi; N is the total number of data points. To implement Method P, ρij is first estimated from the soil data of (Yi, Yj), then δij can be found by solving the following integral equation [29,30,25]:

    $$\rho_{ij} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left(\frac{F_i^{-1}[\Phi(x_i)]-\mu_i}{\sigma_i}\right)\left(\frac{F_j^{-1}[\Phi(x_j)]-\mu_j}{\sigma_j}\right) \times \frac{1}{2\pi\sqrt{1-\delta_{ij}^2}} \exp\left\{-\frac{x_i^2 - 2\delta_{ij}x_i x_j + x_j^2}{2(1-\delta_{ij}^2)}\right\} dx_i\, dx_j \tag{3}$$

    where μi and σi are the mean value and standard deviation for Yi. If (Y1, Y2, ..., Yn) are multivariate lognormal, i.e., Yi = exp(Xi), Eq. (3) has the following analytical form:

    $$\delta_{ij} = \frac{\ln\left[1 + \rho_{ij} \times (\sigma_i/\mu_i) \times (\sigma_j/\mu_j)\right]}{\sqrt{\ln\left[1 + (\sigma_i/\mu_i)^2\right]} \times \sqrt{\ln\left[1 + (\sigma_j/\mu_j)^2\right]}} \tag{4}$$

    2. Method S. This method is adopted in Li et al. [24]. It is based on the Spearman rank correlation between (Yi, Yj), denoted by rij, which is defined to be the Pearson correlation between [Fi(Yi), Fj(Yj)], where Fi is the CDF for Yi. rij can be estimated as the Pearson correlation between the ranks of (Yi, Yj), namely using Eq. (2) but the (Yi, Yj) data are replaced by their ranks. Because Fi(Yi) = Φ(Xi) (see Eq. (1)), it is clear that the Pearson correlation between [Fi(Yi), Fj(Yj)] is identical to that between [Φ(Xi), Φ(Xj)]. This implies that the Spearman correlation between (Yi, Yj) is identical to that between (Xi, Xj). Moreover, for bivariate normal (Xi, Xj), their Pearson and Spearman correlations are related by the following equation [16]:

    $$\delta_{ij} = 2\sin\left(\frac{\pi}{6} \times r_{ij}\right) \tag{5}$$

    Eq. (5) does not apply to arbitrary bivariate distributions, although the error of applying it in such cases has not been studied due to the difficulty of simulating genuinely non-normal multivariate data, i.e., data that deviate significantly from those produced by the Nataf or translation method. To implement Method S, rij is first estimated from the geotechnical data of (Yi, Yj), then δij can be determined through Eq. (5).

    3. Method K. This method is adopted in Li et al. [25] and Ching et al. [11]. It is based on the Kendall rank correlation [32] between (Yi, Yj), denoted by τij, which is defined as the probability of concordance minus the probability of discordance between (Yi, Yj). τij can be estimated by the following equation:

    $$\tau_{ij} \approx \sum_{k} \cdots$$

  • Table 2. Statistics of the lognormal marginal PDFs for (Y1, Y2, Y3, Y4, Y5) (from [3]).

    | Parameter | μ = mean | σ/μ = COV | λ = mean of ln(Y) | ξ = stdev* of ln(Y) |
    |---|---|---|---|---|
    | Y1 = LI | 1.251 | 0.486 | 0.122 | 0.459 |
    | Y2 = su | 31.009 kN/m² | 0.951 | 3.051 | 0.898 |
    | Y3 = su^re | 2.514 kN/m² | 1.516 | 0.226 | 1.191 |
    | Y4 = σ′p | 105.820 kN/m² | 0.975 | 4.311 | 0.835 |
    | Y5 = σ′v | 66.631 kN/m² | 0.803 | 3.891 | 0.823 |

    * stdev denotes standard deviation.
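The three mappings from a Y-space correlation to δij described for Methods P, S, and K can be sketched together. Eq. (4) and Eq. (5) are taken from the text. For Method K, the estimator and mapping equations are not available in this text, so the sketch assumes the textbook tau-a estimator (concordant minus discordant pairs over all pairs) and the standard bivariate-normal relation τ = (2/π) arcsin(δ); whether these coincide with the paper's own equations is an assumption. All function names and the toy data are hypothetical.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation, Eq. (2); the 1/(N-1) factors cancel."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def average_ranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = 0.5 * (start + end) + 1.0
        start = end + 1
    return ranks

def kendall_tau(u, v):
    """Concordant minus discordant pairs over all pairs (tau-a, O(N^2))."""
    n, s = len(u), 0
    for k in range(n):
        for m in range(k + 1, n):
            prod = (u[k] - u[m]) * (v[k] - v[m])
            s += (prod > 0) - (prod < 0)
    return s / (0.5 * n * (n - 1))

def delta_method_p(rho, cov_i, cov_j):
    """Eq. (4): closed form for lognormal marginals only."""
    num = math.log(1.0 + rho * cov_i * cov_j)
    den = math.sqrt(math.log(1.0 + cov_i ** 2) * math.log(1.0 + cov_j ** 2))
    return num / den

def delta_method_s(r):
    """Eq. (5): Spearman r_ij mapped to delta_ij."""
    return 2.0 * math.sin(math.pi * r / 6.0)

def delta_method_k(tau):
    """Assumed mapping: tau = (2/pi) * asin(delta), inverted."""
    return math.sin(math.pi * tau / 2.0)

# Toy data: perfectly monotone (decreasing) but nonlinear, hypothetical values
y_i = [1.0, 2.0, 3.0, 4.0, 5.0]
y_j = [math.exp(-v) for v in y_i]
d_s = delta_method_s(pearson(average_ranks(y_i), average_ranks(y_j)))
d_k = delta_method_k(kendall_tau(y_i, y_j))
# Eq. (4) with the Table 2 COVs of Y1 and Y3 and a sample rho of -0.56
d_p = delta_method_p(-0.56, 0.486, 1.516)
```

The rank-based mappings are bounded by construction, whereas nothing in Eq. (4) keeps `d_p` inside [−1, 1].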

  • Fig. 1. Simulated Y data versus actual data in database CLAY/5/345.


    discrepancy between the estimated and actual δij is the statistical uncertainty. This simulation and estimation is repeated 1000 times, hence the statistical uncertainty is realized 1000 times and an empirical distribution of the statistical uncertainty can be produced.

    Fig. 2 shows the estimated versus actual δij for the four methods. The vertical bar indicates the 95% confidence interval, i.e., the interval between the 2.5% and 97.5% sample percentiles of the 1000 estimated δij values. The square indicates the 50% sample percentile (median). There are ten δij (δ12, δ13, ..., δ45), hence there are ten vertical bars and ten squares. Theoretically, the estimated δij cannot be less than −1. However, the δ13 estimated by Method P is sometimes below −1 (the “prohibited zone” in Fig. 2a). This is because Eq. (4) does not prevent the δij solution from being less than −1. However, this (δij < −1) is impossible for Methods S, K, and XP because Eqs. (2), (5) and (7) strictly prohibit it. Furthermore, the δij estimated by Method P is the least robust because the confidence intervals are the widest. This is especially true for δ13: the confidence interval for δ13 estimated by Method P (Fig. 2a) is very wide. The other confidence intervals for Method P are also wider than those for Methods S, K, and XP. In general, Methods S, K, and XP seem equally robust.
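The repeated simulate-and-estimate experiment can be sketched for a single pair. This is a reduced illustration, not the paper's full study: it covers only (Y1, Y3) and Methods P and S, uses 200 repetitions instead of 1000, and assumes the COVs implied by the Table 2 ξ values through COV = sqrt(exp(ξ²) − 1). All names are hypothetical.

```python
import math
import random

def pearson(u, v):
    """Sample Pearson correlation, Eq. (2)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

def ranks(values):
    """Ranks 1..N (continuous data, so ties are not handled)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0] * len(values)
    for position, k in enumerate(order):
        out[k] = position + 1
    return out

def interval_width(estimates):
    """Width between the 2.5% and 97.5% sample percentiles."""
    s = sorted(estimates)
    return s[int(0.975 * len(s)) - 1] - s[int(0.025 * len(s))]

delta_true, xi_1, xi_3 = -0.83, 0.459, 1.191   # Section 3.3 and Table 2
# Assumption: COVs implied by xi for a lognormal variable
cov_1 = math.sqrt(math.exp(xi_1 ** 2) - 1.0)
cov_3 = math.sqrt(math.exp(xi_3 ** 2) - 1.0)
den = xi_1 * xi_3   # equals sqrt(ln(1 + cov_1^2) * ln(1 + cov_3^2))

rng = random.Random(1)
est_p, est_s = [], []
for _ in range(200):   # the paper repeats 1000 times
    x1 = [rng.gauss(0.0, 1.0) for _ in range(345)]
    x3 = [delta_true * a + math.sqrt(1.0 - delta_true ** 2) * rng.gauss(0.0, 1.0)
          for a in x1]
    # The lambda terms only rescale Y and leave both correlations unchanged
    y1 = [math.exp(xi_1 * a) for a in x1]
    y3 = [math.exp(xi_3 * b) for b in x3]
    rho = pearson(y1, y3)
    est_p.append(math.log(1.0 + rho * cov_1 * cov_3) / den)   # Method P, Eq. (4)
    r = pearson(ranks(y1), ranks(y3))
    est_s.append(2.0 * math.sin(math.pi * r / 6.0))           # Method S, Eq. (5)

width_p, width_s = interval_width(est_p), interval_width(est_s)
```

Comparing `width_p` with `width_s` gives a numerical counterpart of the vertical bars for δ13 in Fig. 2a and 2b.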

    3.3. Why is Method P the least robust?

    It is evident that Method P is the least robust method for estimating δij. To understand why this is the case, the estimation process for δ13 (where Method P performs poorly) is further investigated. The processes of estimating δ13 for the four methods are illustrated in Fig. 3. For instance, Fig. 3a shows the process for Method P:

    1. ρ13 for the simulated (Y1, Y3) data points is first estimated using Eq. (2). A set of simulated (Y1, Y3) data points are illustrated in Fig. 4a (Y1 = LI & Y3 = su^re). The estimated ρ13 value is shown as a cross mark on the horizontal axis in Fig. 3a. The horizontal axis of Fig. 3a shows 100 realizations of the estimated ρ13 values.

    2. The estimated ρ13 value is then mapped to δ13 through Eq. (4). The mapped δ13 value is shown as a cross mark on the vertical axis in Fig. 3a. The actual value of δ13 (= −0.83) is shown as the circle on the vertical axis.

    The process for Method S (or Method K) is similar, except that r13 (or τ13) is estimated in Step 1 and that the mapping in Eq. (5) (or Eq. (7)) is adopted in Step 2. The process for Method XP is somewhat different: (Y1, Y3) data points are first converted to (X1, X3) data points, then δ13 is directly estimated. There are three important observations:

    1. The ρ13 values estimated by Method P (cross marks on the horizontal axis in Fig. 3a) have larger scatter than the r13, τ13, and δ13 values estimated by the other three methods (cross marks on the horizontal axis in Fig. 3b–d). The ρ13 estimated by Method P has larger scatter because (Y1, Y3) data points are nonlinearly correlated (Fig. 4a). The Pearson product-moment correlation becomes non-robust because it only measures the “linear” correlation. Methods S and K do not have this issue because they estimate the “rank” correlation that does not require the data points to be linearly correlated. Method XP does not have this issue, either, because the converted (X1, X3) data points are indeed linearly correlated for data simulated from the translation model. The issue of nonlinear correlation becomes less significant if the COVs for (Y1, Y3) are both smaller.

  • Fig. 2. Estimated versus actual δij for the four methods.


    Fig. 4b shows the (Y1, Y3) data points if the COVs for (Y1, Y3) are both reduced to 0.3 (they were originally 0.486 and 1.516). The degree of nonlinear correlation is significantly reduced. Fig. 5a shows the process for Method P for reduced COVs. Compared to Fig. 3a, the scatter for the estimated ρ13 values on the horizontal axis is now greatly reduced.

    2. ρ13 cannot extend over the full limits (from −1 to +1). Phoon [33] discussed this issue and noted that ρij cannot extend over the full limits between +1 and −1 when the COVs of Yi and Yj are significantly different. Fig. 3a shows the ρ13–δ13 mapping for Method P (i.e., Eq. (4)). It is clear that δ13 = −1 corresponds to ρ13 = −0.49 (see the square mark in Fig. 3a). This is because the CDF transform in Eq. (1) is nonlinear. A perfect linear correlation between (X1, X3) turns into a perfect nonlinear correlation between (Y1, Y3). However, the Pearson correlation between two perfectly nonlinearly correlated (Y1, Y3) is not −1. Methods S, K, and XP do not have this issue of incomplete ρ limits. This issue is severe as long as one of the COVs for (Y1, Y3) is large. For the example in Fig. 3a, the COVs for (Y1, Y3) are 0.486 and 1.516, respectively. The COV for Y3 is quite large. The issue is still severe when the COVs are both large (see Fig. 5b). The issue becomes less severe only when the COVs are both small (see Fig. 5a).

    3. The ρ13–δ13 mapping amplifies the variability in δ13. The mapping between (ρ13, δ13) bends downwards because it has to pass through (−0.49, −1) (see the solid line in Fig. 3a). The bend-downward mapping tends to further amplify the scatter in ρ13: the estimated ρ13 ranges from −0.32 to −0.56, but the mapped δ13 now ranges from −0.57 to −1.2! Methods S, K, and XP do not have this bend-downward mapping issue (see the solid lines in Fig. 3b–d), hence the scatter does not amplify through the mappings. The bend-downward mapping issue becomes less severe only when the COVs for (Y1, Y3) are both small (see Fig. 5a).
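Observations 2 and 3 can be checked numerically from Eq. (4). The sketch below rewrites Eq. (4) in terms of the lognormal parameters ξ, using COV² = exp(ξ²) − 1, and evaluates it with the Table 2 values for Y1 and Y3. That the figures in the paper were generated with exactly these ξ values is an assumption, and the function names are hypothetical.

```python
import math

def rho_from_delta(delta, xi_i, xi_j):
    """Pearson correlation of two lognormal variables whose logs have standard
    deviations xi_i, xi_j and correlation delta (Eq. (4) inverted)."""
    num = math.exp(delta * xi_i * xi_j) - 1.0
    den = math.sqrt((math.exp(xi_i ** 2) - 1.0) * (math.exp(xi_j ** 2) - 1.0))
    return num / den

def delta_from_rho(rho, xi_i, xi_j):
    """Eq. (4) written in terms of xi, using COV^2 = exp(xi^2) - 1."""
    cov_prod = math.sqrt((math.exp(xi_i ** 2) - 1.0) * (math.exp(xi_j ** 2) - 1.0))
    return math.log(1.0 + rho * cov_prod) / (xi_i * xi_j)

xi_1, xi_3 = 0.459, 1.191                    # Table 2: Y1 = LI, Y3 = su^re
rho_min = rho_from_delta(-1.0, xi_1, xi_3)   # lowest attainable rho_13
rho_max = rho_from_delta(+1.0, xi_1, xi_3)   # highest attainable rho_13

# Spread of delta_13 produced by the spread of rho_13 quoted in Observation 3
d_hi = delta_from_rho(-0.32, xi_1, xi_3)
d_lo = delta_from_rho(-0.56, xi_1, xi_3)
amplification = (d_hi - d_lo) / (-0.32 - (-0.56))
```

Under these assumptions `rho_min` comes out at roughly −0.49, in line with the square mark in Fig. 3a, and `amplification` exceeds 1, which is the scatter amplification described in Observation 3.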

    The above three issues for Method P are the most severe when (Yi, Yj) are strongly negatively correlated. In principle, the same issues can happen when (Yi, Yj) are strongly positively correlated. For instance, consider ln(Yj) = a + b × ln(Yi) (b positive). One can see that (Xi, Xj) has Pearson correlation = +1, but the Pearson correlation for (Yi, Yj) cannot attain +1 because Yj = exp(a) × Yi^b is highly nonlinear if the constant b is very different from +1. However, if b is close to +1, (Yi, Yj) data points also exhibit fairly linear correlation. In this case, Method P can become reasonably robust. One such example can be found in Fig. 1d: the (Y2, Y4) data points (Y2 = su & Y4 = σ′p) exhibit strong positive correlation, but the correlation is fairly linear because the constant b is close to +1. Hence, Method P seems reasonably robust for the estimation of δ24 (see Fig. 2a).
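The power-law example has a closed form when Yi is lognormal, which gives a quick check of the claim. With ln(Yi) ~ N(λ, ξ²) and Yj = exp(a) × Yi^b, ln(Yj) has standard deviation bξ and correlation +1 with ln(Yi), so the Pearson correlation of (Yi, Yj) follows from the inverse of Eq. (4). The function name is hypothetical, and ξ = 0.898 (the Table 2 value for su) is used only as a plausible magnitude.

```python
import math

def pearson_power_law(b, xi):
    """Pearson correlation between Y and exp(a) * Y**b when ln(Y) ~ N(lam, xi^2).
    The result depends on neither a nor lam, which only rescale the variables."""
    num = math.exp(b * xi ** 2) - 1.0
    den = math.sqrt((math.exp(xi ** 2) - 1.0) * (math.exp((b * xi) ** 2) - 1.0))
    return num / den

xi = 0.898                             # stdev of ln(su) from Table 2 (illustrative)
rho_b1 = pearson_power_law(1.0, xi)    # b = 1: linear relation
rho_b3 = pearson_power_law(3.0, xi)    # b = 3: strongly nonlinear relation
```

With b = 1 the correlation is +1, while b = 3 pulls it well below +1 even though the underlying (Xi, Xj) pair is perfectly correlated.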

    In short, Method P is the least robust method for estimating δij, whereas Methods S, K, and XP seem more robust. Method P becomes non-robust if (Yi, Yj) are strongly correlated with large COVs. Exceptions can happen if (Yi, Yj) are strongly positively correlated with constant b close to +1. When the COVs are both small, the issues can become mild. These observations are consistent with those obtained in Li et al. [25], in which the performance of Methods P and K for estimating δij is comprehensively compared and investigated.

    3.4. Unknown marginal PDFs

    Thus far, we have investigated the scenario where the marginal PDFs for Y are known. In real applications, these marginal PDFs are

  • Fig. 3. The processes of estimating δ13 for the four methods.

    Fig. 4. Simulated (Y1, Y3) data points: (a) with original COVs; (b) with reduced COVs.


    unknown. The impact of unknown marginal PDFs is investigated in this section. Methods S and K are completely unaffected because they do not require the marginal PDFs to be pre-determined. The conclusions for Method P do not significantly change when the marginal PDFs are unknown: Method P can become non-robust if (Yi, Yj) are strongly correlated with large COVs. Therefore, the following investigation is limited to Method XP.

    It is evident that Method XP is robust when the marginal PDFs are known. The investigation in this section will reveal whether Method XP is still robust under the following scenarios:

    1. The types of the underlying marginal PDFs for Y are known (lognormal), but the parameters are unknown. (λ, ξ) are first estimated as the sample mean and sample standard deviation of the 345 ln(Y) data points. Method XP is then applied with the resulting lognormal distributions (parameters = estimated λ & ξ).

    2. The types of the underlying marginal PDFs and the PDF parameters are both unknown. One logical choice is to adopt the empirical marginal CDFs of Y. However, these empirical CDFs are not continuous functions. Continuous marginal CDFs are

  • Fig. 5. The processes of estimating δ13 for Method P: (a) COVs for Y1 and Y3 both equal to 0.3; (b) COVs for Y1 and Y3 both equal to 1.0.


    desirable if we require a smooth translation model. In this study, the unknown marginal PDFs are estimated using the Johnson system of distributions [22]. For the Johnson system, there are three PDF types (SU, SB, SL), and there are four parameters. The Johnson system can generate distributions with a wide range of shapes. Slifker and Shapiro [37] proposed an elegant selection and parameter estimation approach for the Johnson system using percentiles. The Johnson system of distributions has been implemented in Ching & Phoon [6] and Ching et al. [8]. The details for the Johnson system can be found in Phoon and Ching [34] and Ching and Phoon [10], including the CDF transforms between the standard normal Xi and Johnson Yi. Method XP is then applied with the resulting Johnson distributions.

    3. The underlying marginal PDFs are modeled as normal distributions, which is clearly an absurd choice. (μ, σ) are first estimated as the sample mean and sample standard deviation of the 345 Y data points. Method XP is then applied with the resulting normal distributions (parameters = estimated μ & σ).

    Fig. 6b–d shows the estimated δij versus actual δij under the above three scenarios. The original scenario where the lognormal marginal PDFs are completely known is shown in Fig. 6a (same as Fig. 2d). The robustness for Method XP does not degrade under Scenarios 1 & 2 (lognormal with unknown parameters & Johnson) but clearly degrades under Scenario 3 (normal). Under Scenario 3, the estimated δij is biased, as seen from Fig. 6d that some confidence intervals (vertical bars) do not contain the 1:1 line (dashed line).

    It is significant that Method XP is still robust under Scenario 2. Detailed results show that the selected PDF type is either the Johnson SU or Johnson SB distribution. Those two distribution types are in principle incorrect because they are not lognormal. Yet, the Johnson parameters are estimated based on the Y data, so a best-fit SU or SB distribution is obtained. The best-fit SU or SB resembles the lognormal distribution because the Johnson system can generate distributions with a wide range of shapes. Because the best-fit Johnson distributions resemble lognormal, the (Xi, Xj) data points obtained by the CDF transform Xi = Φ⁻¹[Fi(Yi)] resemble multivariate normal, where Fi(.) is the CDF of the best-fit Johnson distribution.

    Consider the (Y1, Y3) data points shown in Fig. 4a. For this dataset, Y1 is identified as Johnson SB, whereas Y3 is identified as Johnson SU. Fig. 7a shows the (X1, X3) data points mapped by the Johnson CDF transform as square marks. The mapped (X1, X3) data points shown in Fig. 7a resemble bivariate normal although the actual underlying PDFs are not Johnson. These (X1, X3) data points pass the line test [15,23] for the bivariate normal distribution (p-value = 0.77). For comparison, the actual (X1, X3) data points are shown in Fig. 7a as cross marks. The Pearson correlation for the mapped (X1, X3) data points (squares) is −0.8032, whereas that for the actual (X1, X3) data points (crosses) is −0.8031: they are nearly the same.

    However, Method XP is non-robust under Scenario 3. This is because an absurd distribution (normal distribution) is chosen. The mapped (Xi, Xj) data points may not resemble bivariate normal. Fig. 7b shows the (X1, X3) data points mapped by the normal CDF transform: Xi = (Yi − μi)/σi. It is clear that the mapped (X1, X3) data points (squares) do not resemble bivariate normal at all. The Pearson correlation for the mapped (X1, X3) data points is −0.4286, whereas that for the actual (X1, X3) data points (crosses) is −0.8031: they are very different.

    In short, Method XP seems robust as long as the marginal PDFs provide a reasonable fit to the data. These “reasonable” PDFs can be selected from standard goodness-of-fit tests. Based on our experience, the Johnson system of distributions performs well because it can generate distributions with a wide range of shapes.
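The contrast between Scenarios 1 and 3 can be sketched for one simulated (Y1, Y3) dataset. Because both fitted transforms are affine (in ln Y for the lognormal fit, in Y for the normal fit) and the Pearson correlation is unchanged by affine rescaling, Method XP reduces to the correlation of ln(Y) under Scenario 1 and to the raw correlation of Y under Scenario 3. The function names, seed, and simulated data are assumptions for illustration; the Johnson fit of Scenario 2 is not sketched.

```python
import math
import random

def pearson(u, v):
    """Sample Pearson correlation, Eq. (2)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

def method_xp_lognormal(y_i, y_j):
    """Scenario 1: lognormal marginals fitted by the sample mean and stdev of
    ln(Y). X = (ln Y - lam_hat) / xi_hat is affine in ln Y, so the Pearson
    correlation of X equals that of ln Y."""
    return pearson([math.log(v) for v in y_i], [math.log(v) for v in y_j])

def method_xp_normal(y_i, y_j):
    """Scenario 3: normal marginals give X = (Y - mu_hat) / sigma_hat, so the
    estimate collapses to the raw Pearson correlation of Y."""
    return pearson(y_i, y_j)

# Simulated (Y1, Y3) data: Table 2 parameters, delta_13 = -0.83, N = 345
rng = random.Random(2)
delta_true = -0.83
x1 = [rng.gauss(0.0, 1.0) for _ in range(345)]
x3 = [delta_true * a + math.sqrt(1.0 - delta_true ** 2) * rng.gauss(0.0, 1.0)
      for a in x1]
y1 = [math.exp(0.122 + 0.459 * a) for a in x1]
y3 = [math.exp(0.226 + 1.191 * b) for b in x3]

d_scenario1 = method_xp_lognormal(y1, y3)
d_scenario3 = method_xp_normal(y1, y3)
```

Under Scenario 3 the estimate is simply ρ13, which explains why it sits far from the actual δ13 in Fig. 7b.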

    4. Real soil databases

    Thus far, we have investigated the robustness of the four methods when they are applied to data points simulated from a translation model. A method is robust if the statistical uncertainty for the estimated δij is small. The conclusions for the robustness of the four methods are general in the sense that they are applicable as long as the data are produced by a translation model. However, the translation model is only one procedure to construct a valid joint probability function from marginal distributions and a correlation matrix. Alternative models produced by copulas exist. In principle, real soil data may not fit a translation model or any known model, although Ching and Phoon [10] noted that the translation model is reasonable for real soil data studied by the authors thus far. If real soil data do not follow a translation model, the conclusions concerning the robustness of the four methods may not apply. The purpose of this section is to empirically verify whether these conclusions would remain useful when real soil databases are characterized. Four soil databases are studied below.

    4.1. CLAY/5/345 database

    The robustness of the four methods will be investigated over the real soil databases shown in Table 1, starting from CLAY/5/345.

    Fig. 6. Estimated versus actual δij for Method XP under various scenarios.

    Fig. 7. (X1, X3) data points mapped by the CDF transform Xi = Φ⁻¹[Fi(Yi)].

    28 J. Ching et al. / Structural Safety 63 (2016) 21–32

    There are three main differences between the simulated database and the real database:

    1. The underlying multivariate model for the simulated database is a translation model, but the underlying model for the real database, if it exists, is not necessarily a translation model.

    2. The actual δij values for the simulated database are known, but they are unknown for the real database.

    3. The simulated database can be randomly regenerated many times, but the real database is a fixed database.

    Because the real CLAY/5/345 database is a fixed database, the bootstrapping technique [13] is applied to investigate the statistical uncertainty in the estimated δij. "Bootstrap samples" of the δij estimates can be obtained by the following steps:

    1. N = 345 random samples of (Yi, Yj) are drawn with replacement from the original (Yi, Yj) dataset (re-sampling).

    2. With the re-sampled Y data, δij estimates can be obtained by the four methods. When applying Methods P and XP, the marginal PDFs are assumed to be lognormal (Scenario 1) because the lognormal distribution seems to provide a satisfactory fit to the Y data [2].

    Steps 1 and 2 are repeated 1000 times to obtain 1000 sets of bootstrap δij samples. These bootstrap samples can then be used to obtain the 95% confidence interval, i.e., the range bounded by the 2.5% and 97.5% sample percentiles.
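    The bootstrap steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the data are simulated stand-ins for the real database, and the estimator shown is Method S using the standard bivariate-normal relation δ = 2 sin(πr/6) between the Spearman correlation r and δ (assumed here to be what the paper's Spearman mapping expresses). Percentiles are taken by a simple nearest-rank rule.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values (common after re-sampling) share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def delta_from_spearman(pairs):
    """Method S sketch: Spearman r of (Yi, Yj), mapped to delta = 2 sin(pi r / 6)."""
    r = pearson(average_ranks([p[0] for p in pairs]),
                average_ranks([p[1] for p in pairs]))
    return 2.0 * math.sin(math.pi * r / 6.0)


def bootstrap_ci(pairs, estimator, n_boot=1000, seed=0):
    """Step 1: re-sample N pairs with replacement; Step 2: re-estimate; repeat n_boot times."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    lo = estimates[int(round(0.025 * (n_boot - 1)))]  # 2.5% sample percentile
    hi = estimates[int(round(0.975 * (n_boot - 1)))]  # 97.5% sample percentile
    return lo, hi


# Simulated stand-in for a 345-point bivariate dataset (not the real CLAY/5/345 data).
rng = random.Random(42)
data = []
for _ in range(345):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append((math.exp(z1), math.exp(0.6 * z1 + 0.8 * z2)))
print(bootstrap_ci(data, delta_from_spearman))
```

    Passing the estimator as a function keeps the re-sampling loop independent of the method, so the same loop could be reused for Methods K, P and XP.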

    Fig. 8 compares the estimated δij for various methods. The results for Methods K and S are always very similar, so only the results for Method K will be presented from here on. Fig. 8a compares Methods P and K. The squares indicate the estimated δij values based on the real database CLAY/5/345. If a square is close to the 1:1 line (dashed line), the estimated δij values for Methods K and P are similar. The horizontal and vertical bars are the bootstrap 95% confidence intervals for Methods K and P, respectively. The size of the bar quantifies the magnitude of the statistical uncertainty in the estimated δij. For instance, the lower-left most square in Fig. 8a is for δ13. It has a long vertical bar and a short horizontal bar. This indicates that the statistical uncertainty in the δ13 estimated by Method P is significantly larger than that by Method K. Fig. 8b compares Methods XP and K. The results in Fig. 8 indicate that Methods K and XP have very similar δij estimates and statistical uncertainty magnitudes. However, Method P has somewhat different δij estimates with larger statistical uncertainty. These results are consistent with the previous observations from simulated data: Methods S, K, and XP are more robust than Method P. The result regarding δ13 is also consistent with the previous observation: Method P is non-robust because (Y1, Y3) are strongly negatively correlated with large COVs.

    Fig. 8. Estimated δij for Methods K, P, and XP for CLAY/5/345 database.

    Fig. 9. Estimated δij for Methods K and XP for CLAY/5/345 database (Johnson distribution).

    Previously we have seen that the robustness of Method XP can be affected by the choice of the marginal PDF types. Fig. 9 compares Methods XP and K for the scenario where the Johnson distribution is adopted (Scenario 2). Note that the results for Method K remain the same because Method K does not require the determination of the underlying marginal PDFs, but the results for Method XP will change. Nonetheless, Fig. 9 indicates that Methods K and XP still have very similar δij estimates and statistical uncertainty magnitudes. This is consistent with the previous observation that Method XP is robust when the Johnson distribution is adopted, because the Johnson system can generate distributions with a wide range of shapes.

    4.2. CLAY/6/535 database

    The CLAY/6/535 database ([8]; see Table 1) contains 535 data points for 6 clay properties: Y1 = su/σ′v, Y2 = OCR, Y3 = (qt − σv)/σ′v, Y4 = (qt − u2)/σ′v, Y5 = (u2 − u0)/σ′v, and Y6 = Bq. It is also a genuine multivariate database, i.e., Y = (Y1, Y2, Y3, Y4, Y5, Y6) are simultaneously known for each data point. The main differences between CLAY/6/535 and CLAY/5/345 are:

    1. The lognormal distribution does not provide a good fit to the Y data in CLAY/6/535. In fact, the Johnson distribution was adopted in Ching et al. [8].

    2. The COVs for the Y data in CLAY/6/535 are smaller than those in CLAY/5/345. This is because the Y data in CLAY/6/535 are normalized quantities without units, whereas those in CLAY/5/345 are not normalized. It is widely known that a normalized quantity has less variability: for instance, su can range from 10 kN/m² to 200 kN/m², but su/σ′v typically ranges from 0.2 to 1.0.

    Fig. 10 compares Methods K, P, and XP (Method S is similar to Method K). For Methods P and XP, the Johnson distribution is adopted (Scenario 2). Methods K and XP have similar δij estimates and statistical uncertainty magnitudes (Fig. 10b), whereas Method P provides slightly different δij estimates with slightly larger statistical uncertainty (Fig. 10a). The difference between Method P and the other methods is not significant (only "slightly" different), probably because the COVs of Y are relatively small. This is consistent with the previous observation that the robustness of Method P will improve if the COVs of Y are small.
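    The effect of the COVs on Method P can be made concrete for the special case of lognormal marginals. The worked relation below is an illustration under that assumption (for Johnson marginals no such closed form is implied), and it uses the standard lognormal result, which is presumably what the paper's Pearson mapping reduces to in Scenario 1. Here ξi and ξj denote the standard deviations of ln Yi and ln Yj, ρij is the Pearson correlation of (Yi, Yj), and the numerical values are chosen only for illustration.

```latex
% Lognormal marginals: COV_i = \sqrt{\exp(\xi_i^2) - 1}
\rho_{ij} = \frac{\exp(\delta_{ij}\,\xi_i \xi_j) - 1}
                 {\sqrt{\left[\exp(\xi_i^2) - 1\right]\left[\exp(\xi_j^2) - 1\right]}}
\quad\Longleftrightarrow\quad
\delta_{ij} = \frac{1}{\xi_i \xi_j}
  \ln\!\left(1 + \rho_{ij}\sqrt{\left[\exp(\xi_i^2) - 1\right]\left[\exp(\xi_j^2) - 1\right]}\right)

% Sensitivity of the mapped \delta_{ij} to an error in the estimated \rho_{ij}:
\frac{d\delta_{ij}}{d\rho_{ij}}
  = \frac{\sqrt{\left[\exp(\xi_i^2) - 1\right]\left[\exp(\xi_j^2) - 1\right]}}
         {\xi_i \xi_j \exp(\delta_{ij}\,\xi_i \xi_j)}

% Small COVs, \xi_i = \xi_j = 0.2 (COV \approx 0.20), \delta_{ij} = -0.8:
\frac{d\delta_{ij}}{d\rho_{ij}} = \frac{0.0408}{0.04 \times e^{-0.032}} \approx 1.05

% Large COVs, \xi_i = \xi_j = 1 (COV \approx 1.31):
\delta_{ij} = -0.8:\ \frac{e - 1}{e^{-0.8}} \approx 3.82
\qquad
\delta_{ij} = +0.8:\ \frac{e - 1}{e^{0.8}} \approx 0.77
```

    Under these assumptions, an estimation error in ρij passes through almost unchanged when the COVs are small, but is amplified nearly fourfold when the COVs are large and the correlation is strongly negative.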

    4.3. CLAY/7/6310 database

    The CLAY/7/6310 database ([4]; see Table 1) contains 6310 data points. About 4000 data points regarding su/σ′v of NC clays from 7 different su test modes (e.g., CIUC, CK0UC, UU, UC) are deduced. The COVs for (Y1, Y2, ..., Y7) are relatively small because su/σ′v for NC clays varies in a narrow range. This database is not a genuine multivariate database, i.e., only a subset of (Y1, Y2, ..., Y7) is known for each data point. In fact, for most of the data points, only univariate information is available (one of Y1 to Y7 is known). Bivariate information (Yi and Yj simultaneously known) is available, but the amount is limited. There are 14 (Yi, Yj) pairs with more than 30 bivariate data points. There are only four (Yi, Yj) pairs with more than 100 bivariate data points. However, univariate information is abundant, e.g., there are more than 500 univariate CK0UC data points and more than 1000 univariate data points for field vane. Based on the univariate data points, Ching et al. [5] found that the lognormal distribution provides an acceptable fit to all (Y1, Y2, ..., Y7).

    Fig. 10. Estimated δij for Methods K, P, and XP for CLAY/6/535 database.

    Fig. 12. Estimated δij for Methods K and XP for CLAY/7/6310 database (Johnson distribution).

    It is possible to estimate δij given this incomplete database. For Methods K and S, the bivariate (Yi, Yj) data points are used to estimate τij and rij, and they are converted into δij using Eqs. (5) and (7). For Method P, the univariate Yi and Yj data points are used to estimate the marginal lognormal PDFs (Scenario 1), to obtain the best-fit parameters (λi, ξi) and (λj, ξj). The bivariate (Yi, Yj) data points are used to estimate ρij, and δij can be obtained from Eq. (4) given the best-fit ξi and ξj. Note that the univariate Yi and Yj data may be from sources (sites, regions) different from the bivariate (Yi, Yj) data. For Method XP, the best-fit lognormal parameters (λi, ξi) and (λj, ξj) are adopted to convert bivariate (Yi, Yj) data into bivariate (Xi, Xj) data, and δij can be directly estimated with the bivariate (Xi, Xj) data.
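    The four estimators described above can be sketched side by side for lognormal marginals. Eqs. (4), (5) and (7) are not reproduced in this section, so the sketch assumes they are the standard bivariate-normal relations δ = sin(πτ/2) for Kendall, δ = 2 sin(πr/6) for Spearman, and the lognormal Pearson mapping; all function names are hypothetical, and the rank and Kendall routines assume continuous data without ties.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def ranks(values):
    """Ranks 1..n, assuming no tied values."""
    r = [0] * len(values)
    for position, k in enumerate(sorted(range(len(values)), key=lambda k: values[k])):
        r[k] = position + 1
    return r


def kendall_tau(xs, ys):
    """Kendall tau as (concordant - discordant) / total pairs, assuming no ties."""
    n, s = len(xs), 0
    for a in range(n - 1):
        for b in range(a + 1, n):
            prod = (xs[a] - xs[b]) * (ys[a] - ys[b])
            s += 1 if prod > 0 else (-1 if prod < 0 else 0)
    return 2.0 * s / (n * (n - 1))


def fit_lognormal(values):
    """Fit in log space: lambda = mean(ln Y), xi = std(ln Y)."""
    logs = [math.log(v) for v in values]
    lam = sum(logs) / len(logs)
    xi = math.sqrt(sum((v - lam) ** 2 for v in logs) / (len(logs) - 1))
    return lam, xi


def method_k(y_i, y_j):
    return math.sin(math.pi * kendall_tau(y_i, y_j) / 2.0)


def method_s(y_i, y_j):
    return 2.0 * math.sin(math.pi * pearson(ranks(y_i), ranks(y_j)) / 6.0)


def method_p(y_i, y_j, xi_i, xi_j):
    """Pearson rho of Y mapped to delta; the log argument can turn non-positive for noisy rho."""
    scale = math.sqrt((math.exp(xi_i ** 2) - 1.0) * (math.exp(xi_j ** 2) - 1.0))
    return math.log(1.0 + pearson(y_i, y_j) * scale) / (xi_i * xi_j)


def method_xp(y_i, y_j, lam_i, xi_i, lam_j, xi_j):
    """Convert Y to X with the fitted lognormal CDF transform, then take Pearson of X."""
    x_i = [(math.log(v) - lam_i) / xi_i for v in y_i]
    x_j = [(math.log(v) - lam_j) / xi_j for v in y_j]
    return pearson(x_i, x_j)


# Simulated bivariate data; in the incomplete-database setting the marginal fit
# could come from separate univariate data instead of these same samples.
rng = random.Random(0)
y_i, y_j = [], []
for _ in range(500):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y_i.append(math.exp(0.5 * z1))
    y_j.append(math.exp(0.5 * (0.5 * z1 + math.sqrt(0.75) * z2)))
(lam_i, xi_i), (lam_j, xi_j) = fit_lognormal(y_i), fit_lognormal(y_j)
print(method_k(y_i, y_j), method_s(y_i, y_j),
      method_p(y_i, y_j, xi_i, xi_j), method_xp(y_i, y_j, lam_i, xi_i, lam_j, xi_j))
```

    Methods K and S take only the bivariate data, whereas Methods P and XP also need marginal parameters, which is where separately sourced univariate data would enter.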

    The bootstrapping technique is applied to the database by re-sampling the 6310 data points. The four methods are then adopted to estimate δij for the re-sampled database. Again, this is repeated 1000 times to obtain 1000 bootstrap δij samples. Only the δij with more than 30 bivariate (Yi, Yj) data points are estimated. Fig. 11 compares Methods K, P, and XP (Method S is similar to Method K). The bootstrap confidence intervals for all methods are large compared to the previous results for CLAY/5/345 and CLAY/6/535 because the bivariate (Yi, Yj) data points for CLAY/7/6310 are limited. The bootstrap confidence intervals for Method P are not larger than those for the other methods, probably because the COVs of Y are small, so the issues associated with Method P are minimized. There are two Method P δij estimates that are not very consistent with the corresponding Method K estimates (Fig. 11a: two squares not close to the 1:1 line). The Method XP estimates are fairly consistent with the Method K estimates (Fig. 11b: squares close to the 1:1 line). The consistency is further improved if the Johnson distribution is adopted (Fig. 12: squares even closer to the 1:1 line).

    Fig. 11. Estimated δij for Methods K, P, and XP for CLAY/7/6310 database.

    Fig. 13. Estimated δij for Methods K, P, and XP for CLAY/10/7490 database.
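    For an incomplete database, the re-sampling and pair-selection logic described above might look like the sketch below. It assumes each data point is stored as a tuple with None for unknown parameters and that whole data points are re-sampled; the function names and the stand-in estimator are hypothetical, and the threshold of more than 30 bivariate points is applied to the original database.

```python
import math
import random


def complete_pairs(records, i, j):
    """Keep only data points in which both parameter i and parameter j are known."""
    return [(rec[i], rec[j]) for rec in records
            if rec[i] is not None and rec[j] is not None]


def log_pearson(pairs):
    """Stand-in estimator: Pearson correlation of (ln Yi, ln Yj)."""
    xs = [math.log(p[0]) for p in pairs]
    ys = [math.log(p[1]) for p in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def bootstrap_pairwise(records, i, j, estimator, n_boot=1000, min_pairs=30, seed=0):
    """Bootstrap estimates for pair (i, j); returns [] if the pair is too sparse."""
    if len(complete_pairs(records, i, j)) <= min_pairs:
        return []  # not more than min_pairs bivariate points: pair is skipped
    rng = random.Random(seed)
    n = len(records)
    estimates = []
    for _ in range(n_boot):
        # Re-sample whole data points, then extract the bivariate subset.
        resample = [records[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(complete_pairs(resample, i, j)))
    return estimates


# Toy incomplete database: parameter 1 is known for only a quarter of the points.
rng = random.Random(1)
records = []
for k in range(400):
    z = rng.gauss(0.0, 1.0)
    y0 = math.exp(0.3 * z)
    y1 = math.exp(0.3 * (0.7 * z + 0.5 * rng.gauss(0.0, 1.0))) if k % 4 == 0 else None
    records.append((y0, y1))
samples = bootstrap_pairwise(records, 0, 1, log_pearson, n_boot=200)
print(len(samples), min(samples), max(samples))
```

    Because the number of bivariate points varies from one re-sample to the next, sparse pairs naturally produce wider bootstrap intervals.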

    4.4. CLAY/10/7490 database

    The CLAY/10/7490 database [6] contains 7490 data points, covering the 10 normalized clay parameters shown in Table 1. The COVs for (Y1, Y2, ..., Y10) are large because the database contains a variety of clays from various regions. This database is not a genuine multivariate database, i.e., only a subset of (Y1, Y2, ..., Y10) is known for each data point. Bivariate information (Yi and Yj simultaneously known) is abundant: most (Yi, Yj) pairs have more than 500 bivariate data points. Univariate information is also abundant: most parameters have more than 2000 univariate data points. Based on the univariate data, it was found that the lognormal distribution is not suitable, and the Johnson distribution was adopted for all parameters (Y1, Y2, ..., Y10) in Ching and Phoon [7]. Fig. 13 compares Methods K, P, and XP by adopting the Johnson distribution as the marginal PDFs (Scenario 2). The bootstrap confidence intervals are small because the bivariate (Yi, Yj) data points are abundant. Again, the Method XP δij estimates are fairly consistent with the Method K estimates (Fig. 13b: squares close to the 1:1 line). The Method P estimates are less consistent with the Method K estimates (Fig. 13a). The magnitude of the statistical uncertainty (size of the bars) for Method P is comparable to that for the other methods, probably because there is only one pair (Yi, Yj) with significant negative correlation: (Y8, Y10) has δij ≈ −0.70 (Y8 = Bq and Y10 = ln[(qt − u2)/σ′v]). For other pairs, δij is mostly greater than −0.5. Previously, we have seen that Method P has large statistical uncertainty if (Yi, Yj) are strongly negatively correlated with large COVs. Interestingly, Method P seems to have small statistical uncertainty even for the (Y8, Y10) pair (see the lower-left most square in Fig. 13a: the associated vertical bar is not long). One possible explanation is that the negative correlation is not very strong.

    In short, Methods S and K seem to provide consistent δij estimates for the four real soil databases shown in Table 1. Method XP seems to provide consistent δij estimates as well if the marginal PDFs are properly chosen. Method P sometimes provides δij estimates that are less consistent with the other methods (squares not close to the 1:1 line) and have large statistical uncertainty (wide confidence interval). It is likely that Method P is the least robust method of estimating δij among the four methods. These conclusions obtained from the four real soil databases are consistent with the previous conclusions obtained from the simulated Y data. It is possible that the dependency structure in the soil databases can be approximated well by a translation model, but there may be other reasons as discussed below.

    4.5. Discussions

    1. It is very unlikely that the real soil databases follow the multivariate normal distribution model (the translation model) exactly, because the translation model is fairly restrictive. Methods S, K, and XP will provide consistent δij estimates if the soil databases indeed follow the translation model. However, if the soil databases do not follow the translation model, there is no guarantee that these methods will provide consistent estimates. Nonetheless, we have seen that Method XP provides δij estimates that are fairly consistent with those for Methods S and K for the real soil databases. It remains an open question why such consistency can happen for databases that do not follow the translation model exactly. One possible explanation is that all methods (Methods P, S, K, and XP) force real data into the multivariate normal "mould" even if the real data are non-translation, and the consistency exists within this "mould". Another possible explanation is that although these real soil databases do not follow the translation model exactly, they follow it "roughly". There is limited evidence in the literature [2,8,4,7] showing that the converted (Xi, Xj) data points roughly exhibit linear relations.

    2. There is no guarantee that the full correlation matrix C assembled from the pairwise δij estimates is positive definite. The only exception is Method XP with genuine multivariate data (e.g., Method XP with CLAY/5/345 or with CLAY/6/535): the resulting C matrix is always positive definite. Methods P, S, and K do not guarantee a positive definite C matrix even with genuine multivariate data. Among all methods, Method P produces a non-positive definite C matrix most frequently, because its δij estimates are the least robust.
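    Whether an assembled C matrix is positive definite can be checked by attempting a Cholesky factorization, as in the hedged sketch below. The helper name, the tolerance and the two example matrices are made up for illustration; they are not taken from the paper's databases.

```python
import math


def is_positive_definite(c, tol=1e-12):
    """Attempt a Cholesky factorization C = L L^T; it succeeds only if C is positive definite."""
    n = len(c)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: factorization fails
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


# Pairwise estimates that are mutually compatible.
c_ok = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]

# Pairwise estimates that cannot coexist: 1 and 2 strongly positive, 2 and 3
# strongly positive, yet 1 and 3 strongly negative.
c_bad = [[1.0, 0.9, -0.9],
         [0.9, 1.0, 0.9],
         [-0.9, 0.9, 1.0]]

print(is_positive_definite(c_ok), is_positive_definite(c_bad))
```

    The second matrix shows how entries that are each valid correlations can still be jointly inadmissible when they are estimated pair by pair.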

    5. Conclusions and recommendations

    Four methods of estimating the correlation coefficients in the multivariate normal distribution framework (the translation model) are assessed for their robustness. Method P first estimates the Pearson correlation between the bivariate soil data (Yi, Yj), then converts it into the Pearson correlation between the underlying bivariate normal data (Xi, Xj), denoted by δij. Methods S and K first estimate the Spearman and Kendall correlations, respectively, then convert them into δij. Method XP first converts (Yi, Yj) into (Xi, Xj) through the CDF transform, then directly estimates δij based on the converted (Xi, Xj). All four methods assume that the underlying multivariate distribution is a translation model. The robustness is quantified by the statistical uncertainty magnitude of each method.

    Based on the data simulated from a translation model, the following conclusions are obtained:

    1. Methods S and K are found to be the most robust methods of estimating δij under the multivariate normal framework. Their performances are similar: the estimated δij is consistent with the actual δij with small statistical uncertainty. They are also convenient: they do not require the marginal PDFs of Y to be pre-determined.

    2. Method XP seems to be as robust as Methods S and K if the marginal PDFs are properly chosen. It is found that the Johnson system of distributions is sufficiently flexible to provide a reasonable fit to soil data. However, Method XP can be non-robust if absurd marginal PDFs are chosen.

    3. Method P is the least robust among the four methods. It can become non-robust when (Yi, Yj) are strongly correlated with large COVs. The most serious scenario occurs when (Yi, Yj) are strongly negatively correlated with large COVs. When the COVs for (Yi, Yj) are both small, Method P may become robust.

    The above conclusions only apply to correlated data produced by a translation model. In principle, real soil data may not fit a translation model or any known model. The applicability of these conclusions to real soil data is studied using four clay databases: CLAY/5/345, CLAY/6/535, CLAY/7/6310, and CLAY/10/7490. It was found that the robustness of Methods S, K, XP, and P observed in the characterization of simulated data also applies to the characterization of real soil data.

    Based on this empirical observation, this paper tentatively recommends the adoption of Methods S and K to estimate δij. The advantage is obvious: there is no need to estimate the marginal PDFs and the robustness of the δij estimates is superior. When the marginal PDFs can be estimated, Method XP can be adopted to estimate δij as well, but caution should be taken: the δij estimates can be misleading if the marginal PDFs are poorly chosen. Method P is not recommended because it is the least robust method.

    References

    [1] Cario MC, Nelson BL. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report. Department of Industrial Engineering and Management Sciences. Evanston, Illinois: Northwestern University; 1997.

    [2] Ching J, Phoon KK. Modeling parameters of structured clays as a multivariate normal distribution. Can Geotech J 2012;49(5):522–45.

    [3] Ching J, Phoon KK. Corrigendum: modeling parameters of structured clays as a multivariate normal distribution. Can Geotech J 2012;49(12):1447–50.

    [4] Ching J, Phoon KK. Multivariate distribution for undrained shear strengths under various test procedures. Can Geotech J 2013;50(9):907–23.

    [5] Ching J, Phoon KK, Lee WT. Second-moment characterization of undrained shear strengths from different test procedures. Foundation Engineering in the Face of Uncertainty. Geotechnical Special Publication honoring Professor F. H. Kulhawy; 2013. p. 308–20.

    [6] Ching J, Phoon KK. Transformations and correlations among some clay parameters – the global database. Can Geotech J 2014;51(6):663–85.

    [7] Ching J, Phoon KK. Correlations among some clay parameters – the multivariate distribution. Can Geotech J 2014;51(6):686–704.

    [8] Ching J, Phoon KK, Chen CH. Modeling CPTU parameters of clays as a multivariate normal distribution. Can Geotech J 2014;51(1):77–91.

    [9] Ching J, Phoon KK, Yu JW. Linking site investigation efforts to final design savings with simplified reliability-based design methods. ASCE J Geotech Geoenviron Eng 2014;140(3):04013032.

    [10] Ching J, Phoon KK. Constructing multivariate distributions for soil parameters. In: Phoon KK, Ching J, editors. Chapter 1, Risk and reliability in geotechnical engineering. Taylor & Francis; 2015.

    [11] Ching J, Li DQ, Phoon KK. Statistical characterization of multivariate geotechnical data. In: Phoon KK, Retief JV, editors. Chapter 4, Reliability of geotechnical structures in ISO2394. CRC Press/Balkema; 2016.

    [12] Conover WJ. Practical nonparametric statistics. 3rd ed. New York: John Wiley & Sons Inc; 1999.

    [13] Efron B, Tibshirani R. An introduction to the bootstrap. Boca Raton, FL: Chapman and Hall/CRC; 1993.

    [14] Fang K, Kotz S, Ng KW. Symmetric multivariate and related distributions. London: Chapman & Hall; 1990.

    [15] Hald A. Statistical theory with engineering applications. New York: John Wiley and Sons; 1952.

    [16] Hotelling H, Pabst MR. Rank correlation and tests of significance involving no assumption of normality. Ann Math Stat 1936;7:29–43.

    [17] Huang D, Yang C, Zeng B, Fu G. A copula-based method for estimating shear strength parameters of rock mass. Math Prob Eng 2014;2014. http://dx.doi.org/10.1155/2014/693062.

    [18] Huffman JC, Stuedlein AW. Reliability-based serviceability limit state design of spread footings on aggregate pier reinforced clay. ASCE J Geotech Geoenviron Eng 2014;140(10):04014055.

    [19] Iman RL, Conover WJ. A distribution-free approach to inducing rank correlation among input variables. Commun Stat Simul Comput 1982;11(3):311–34.

    [20] International Organization for Standardization. General principles on reliability of structures. ISO2394:2015, Geneva; 2015.

    [21] Johnson ME, Ramberg JS. Transformations of the multivariate normal distribution with applications to simulation. In: 11th international conference on system sciences, Honolulu, Hawaii; 1978.

    [22] Johnson NL. Systems of frequency curves generated by methods of translation. Biometrika 1949;36(1/2):149–76.

    [23] Kowalski CJ. The performance of some rough tests for bivariate normality before and after coordinate transformations to normality. Technometrics 1970;12(3):517–44.

    [24] Li DQ, Wu SB, Zhou CB, Phoon KK. Performance of translation approach for modeling correlated non-normal variables. Struct Saf 2012;39:52–61.

    [25] Li DQ, Tang XS, Zhou CB, Phoon KK. Uncertainty analysis of correlated non-normal geotechnical parameters using Gaussian copula. Sci China Technol Sci 2012;55(11):3081–9.

    [26] Li DQ, Tang XS, Phoon KK, Chen YF, Zhou CB. Bivariate simulation using copula and its application to probabilistic pile settlement analysis. Int J Numer Anal Meth Geomech 2013;37(6):597–617.

    [27] Li DQ, Tang XS. Modeling and simulation of bivariate distribution of shear strength parameters using copulas. Chapter 2, Risk and reliability in geotechnical engineering. CRC Press; 2014. p. 77–128.

    [28] Li DQ, Zhang L, Tang XS, Zhou W, Li JH, Zhou CB, et al. Bivariate distribution of shear strength parameters using copulas and its impact on geotechnical system reliability. Comput Geotech 2015;68:184–95.

    [29] Li ST, Hammond JL. Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Trans Syst Man Cybern 1975;5:557–61.

    [30] Liu PL, Der Kiureghian A. Multivariate distribution models with prescribed marginals and covariances. Probab Eng Mech 1986;1(2):105–12.

    [31] Nataf A. Determination des distribution dont les marges sont donnees. C R Acad Sci 1962;225:42–3.

    [32] Nelsen RB. An introduction to copulas. 2nd ed. New York: Springer; 2006.

    [33] Phoon KK. General non-gaussian probability models for first-order reliability method (FORM): a state-of-the-art report, ICG report 2004-2-4 (NGI report 20031091–4). Oslo: International Centre for Geohazards; 2004.

    [34] Phoon KK, Ching J. Multivariate model for soil parameters based on Johnson distributions. Foundation engineering in the face of uncertainty. Geotechnical Special Publication honoring Professor F. H. Kulhawy; 2013. p. 337–53.

    [35] Phoon KK, Retief JV. ISO2394:2015 Annex D (reliability of geotechnical structures). Georisk: Assess Manage Risk Eng Syst Geohazards 2015;9(3):125–7.

    [36] Phoon KK, Retief JV, Ching J, Dithinde M, Schweckendiek T, Wang Y, et al. Some observations on ISO2394:2015 Annex D (reliability of geotechnical structures). Struct Saf 2016;62:24–33.

    [37] Slifker JF, Shapiro SS. The Johnson system: selection and parameter estimation. Technometrics 1980;22(2):239–46.

    [38] Stanfield PM, Wilson JR, Mirka GA, Glasscock NF, Psihogios JP, Davis JR. Multivariate input modeling with Johnson distributions. In: Proceedings of the 28th conference on winter simulation. IEEE Computer Society; 1996. p. 1457–64.

    [39] Tang XS, Li DQ, Rong G, Phoon KK, Zhou CB. Impact of copula selection on geotechnical reliability under incomplete probability information. Comput Geotech 2013;49:264–78.

    [40] Tang XS, Li DQ, Zhou CB, Phoon KK. Copula-based approaches for evaluating slope reliability under incomplete probability information. Struct Saf 2015;52:90–9.

    [41] Zhang J, Huang HW, Juang CH, Su WW. Geotechnical reliability analysis with limited data: consideration of model selection uncertainty. Eng Geol 2014;181:27–37.
