ENVIRONMETRICS
Environmetrics 2004; 15: 711–728 (DOI: 10.1002/env.667)
Bayesian density estimation using ranked set samples
Kaushik Ghosh1,2 and Ram C. Tiwari2*
1 Department of Statistics, George Washington University, Washington, DC 20052, U.S.A.
2 Statistical Research and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD 20892, U.S.A.
SUMMARY
In this article, we present a Bayesian approach for estimating the underlying density using a ranked set sample. We assume that the observations are generated from a Dirichlet process mixture model. The density (as well as moments) of future values generated from the process is estimated through Markov chain Monte Carlo simulations. This approach extends earlier work on density estimation based on a Dirichlet process prior from a simple random sample to a ranked set sample. We carry out a simulation study using a normal kernel to compare the effect of using a simple random sample versus a ranked set sample on the predictive density. We show that the Bayesian density estimate resulting from a ranked set sample has a smaller average mean squared error than that from a simple random sample. Additionally, the average Kullback–Leibler distance of the density estimate based on the ranked set sample is shown to be closer to zero than that based on the corresponding simple random sample. We illustrate our method by applying it to shrub data available in the literature. Copyright © 2004 John Wiley & Sons, Ltd.
key words: order statistics; Dirichlet process prior; Gibbs sampler; mixture distribution; Kullback–Leibler distance; $L_2$ distance
1. INTRODUCTION
In many sampling situations, the units in a population can be difficult (or expensive) to quantify with
respect to the characteristic of interest but are easily compared to one another. For example, in some
agricultural experiments, the yields of plots are expensive to determine but can be easily ranked
visually. In environmental problems, for example, to estimate the amount of contaminant in a
particular area, soil samples are obtained for analysis in a laboratory. However, since chemical
analysis of soil is expensive to perform, the researcher has to economize on the number of
measurements made, all the while maintaining a high degree of accuracy. It may be possible to
visually rank the soil samples with respect to the amount of contaminant present (based on physical
characteristics such as color, texture, smell, etc.) before performing a detailed chemical analysis.
Similar examples can be found in geology, biology, forestry, forensics and various other fields (for
examples and further references see Mode et al., 1999; Nussbaum and Sinha, 1997; Yu and Lam, 1997; Patil et al., 1999).

Received 10 July 2002
Accepted 19 January 2004
*Correspondence to: R. C. Tiwari, Statistical Research and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, 6116 Executive Blvd., MSC 8317, Bethesda, MD 20892-8317, U.S.A. E-mail: tiwarir@nih.gov
In situations like these, drawing a ranked set sample (RSS) is a valuable tool for data acquisition.
Developed by McIntyre (1952), RSS consists of sampling in multiple stages as follows: A random
sample of n units is drawn from the underlying population and the units are then ranked according to
the characteristic of interest. This can be done easily without judgment error if n is small enough, say
2. From this set, the smallest unit is identified and then measured. Another random sample of n units is
drawn (independent of the first sample), the units ranked, and the second smallest unit is measured.
The process is continued until, at the nth stage, a random sample of n units is taken, the units are
ordered and the largest unit is measured. This completes a cycle and the n measurements so obtained
constitute a ranked set sample of size n from the population of interest. Note that, although n2 units
were screened, the RSS consists of only n observations. Sometimes, the entire cycle is repeated several
times to get multiple replicates of each order statistic. The sample so obtained is called a balanced
ranked set sample, since each order statistic gets equal representation.
A balanced ranked set sample is thus essentially a set of independent order statistics. A
straightforward generalization of the above considers (possibly) unequal sample sizes at the
different stages, along with (possibly) unequal number of replicates of the independent order
statistics. We will call this a generalized ranked set sample and denote it by $X_{i,j}$, $i = 1, \dots, n$; $j = 1, \dots, m_i$. Here, $X_{i,j}$ stands for the $j$th replicate of $X_{(r_i : k_i)}$, the $r_i$th order statistic based on a random sample of size $k_i$ from the underlying population. Note that we have a simple random sample of size $nm$ if $r_i = 1$, $k_i = 1$ and $m_i = m$, and a balanced ranked set sample of the same size if $r_i = i$, $k_i = n$ and $m_i = m$. An unbalanced ranked set sample can also be obtained as a special case of the above generalization.
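To make the sampling scheme concrete, the balanced cycle described above can be sketched in code. This is our own illustrative sketch, not part of the original paper: the function name `balanced_rss` is an assumption, and ranking is taken to be perfect (no judgment error).

```python
import random

def balanced_rss(draw, n, m=1):
    """Draw a balanced ranked set sample of n order statistics, m cycles.

    `draw` returns one observation from the population. Each cycle
    screens n*n units but measures only n of them: at stage i, the
    (i+1)th smallest of a fresh random sample of n units is measured.
    Ranking is assumed perfect (no judgment error).
    """
    sample = []
    for _ in range(m):            # m replicates (cycles)
        for i in range(n):        # stage i measures the (i+1)th smallest
            units = sorted(draw() for _ in range(n))
            sample.append(units[i])
    return sample

# Example: RSS of size 3 with 2 cycles from a Uniform(0, 1) population;
# 18 units are screened but only 6 are measured.
random.seed(1)
rss = balanced_rss(random.random, n=3, m=2)
print(len(rss))  # 6
```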
Early research on RSS focused on the estimation of the unknown population mean (see Kaur
et al., 1995, for a comprehensive review of articles related to ranked set sampling). It has been
shown (see Takahashi and Wakimoto, 1968; Dell and Clutter, 1972) that the sample mean based on
an RSS is unbiased for the population mean and is more efficient than that based on a simple
random sample (SRS) of the same size. Often it is of interest to estimate the density of the
underlying population from an RSS. Stokes and Sager (1988) have investigated the problem of
estimating the underlying cumulative distribution function (CDF) F, using an RSS. They showed
that the empirical distribution function (EDF) based on an RSS is unbiased for F and is uniformly
more efficient in estimating F than the EDF based on an SRS of the same size. Kvam and
Samaniego (1994) derived a non-parametric maximum likelihood estimator for F which is based on
an (unbalanced) RSS. However, application of traditional frequentist methods such as kernel density estimation (Silverman, 1981, 1986) or non-parametric maximum likelihood (Lindsay, 1983) may be difficult to justify in RSS problems, since sample sizes are usually too small for any asymptotics to be applicable.
A natural alternative is to use Bayesian methods, whereby available prior information is
combined with the data to make inferences. Recently, Kvam and Tiwari (1999) derived the Bayes
estimator of F based on an (unbalanced) RSS, with the Dirichlet distribution as the prior. Their
approach assumed that F can be discretized by putting probabilities at the observed sample values.
This completely non-parametric estimator may not be appropriate in cases where, for example, it is
known beforehand that the underlying distribution is continuous or has a particular parametric form
(say mixture of normals).
In this article, we present a semiparametric Bayesian method of estimating the underlying density
from a (generalized) ranked set sample. This is essentially an extension of the method developed in
Ferguson (1983), Kuo (1986) and Escobar and West (1995) to RSS. We assume that the data
distributions are mixtures of normals in the set-up of Dirichlet processes; see Ferguson (1973), and
Basu and Tiwari (1982). The density comes from a parametric family which has a non-parametric
prior. A Markov chain Monte Carlo method is developed that allows us to incorporate this prior
information to obtain the density estimator. As a byproduct, we can obtain the mean and variance of a
future value from this distribution, conditional on the observed data. Through simulation studies, we
show that the density estimates so obtained are more precise in terms of the smaller values of average
mean-squared error, Kullback–Leibler distance and L2 distance than those obtained based on a simple
random sample. We illustrate our findings with shrub data from Muttlak and McDonald (1990).
In Section 2, we develop the method for any general parametric family. In Section 3, we present the
Markov chain Monte Carlo simulation procedure necessary to use this method. In Section 4, we carry
out a simulation study to show the optimality properties of the proposed method. For brevity, only results for mixtures of normals are presented. The method can be carried out for mixtures of other
parametric distributions as well. In Section 5, we analyze a real data set to illustrate our method.
Finally, in Section 6, we present our conclusions and discuss future research.
2. SEMIPARAMETRIC BAYES ESTIMATION
2.1. Model
For any $i$ ($1 \le i \le n$), let $X_{(r_i : k_i)}$ be the $r_i$th order statistic based on a random sample of size $k_i$ ($1 \le r_i \le k_i$) drawn from a continuous distribution with pdf $f(x \mid \theta_i)$ and cdf $F(x \mid \theta_i)$, respectively. Recall that the pdf of $X_{(r_i : k_i)}$ is given by
$$ f_i(x \mid \theta_i) = k_i \binom{k_i - 1}{r_i - 1} F(x \mid \theta_i)^{r_i - 1} \left[ 1 - F(x \mid \theta_i) \right]^{k_i - r_i} f(x \mid \theta_i) $$
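The order-statistic density above is easy to evaluate numerically. The following stdlib-only sketch is our own illustration (the function names and the choice of a normal parent distribution are assumptions, not part of the paper):

```python
from math import comb, erf, exp, pi, sqrt

def norm_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def order_stat_pdf(x, r, k, mu=0.0, sigma=1.0):
    """pdf of the r-th order statistic of a sample of size k from
    N(mu, sigma^2): k * C(k-1, r-1) * F^{r-1} * (1-F)^{k-r} * f."""
    F, f = norm_cdf(x, mu, sigma), norm_pdf(x, mu, sigma)
    return k * comb(k - 1, r - 1) * F**(r - 1) * (1 - F)**(k - r) * f

# With r = k = 1 the formula reduces to the parent density itself
print(round(order_stat_pdf(0.0, r=1, k=1), 4))  # 0.3989
```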
For any $j$ ($1 \le j \le m_i$), let $X_{(r_i : k_i)j}$ denote the $m_i$ independent replications of $X_{(r_i : k_i)}$. We can represent this set-up in the following tabular form:
$$ \left. \begin{array}{llll} X_{(r_1:k_1)1}, & X_{(r_1:k_1)2}, & \cdots, & X_{(r_1:k_1)m_1} \sim f_1(x \mid \theta_1) \\ X_{(r_2:k_2)1}, & X_{(r_2:k_2)2}, & \cdots, & X_{(r_2:k_2)m_2} \sim f_2(x \mid \theta_2) \\ \vdots & \vdots & & \vdots \\ X_{(r_n:k_n)1}, & X_{(r_n:k_n)2}, & \cdots, & X_{(r_n:k_n)m_n} \sim f_n(x \mid \theta_n) \end{array} \right\} \qquad (1) $$
For simplicity of notation, we will write $X_{i,j} \equiv X_{(r_i : k_i)j}$. Given $\boldsymbol{\theta}_n = (\theta_1, \dots, \theta_n)$, we assume that $X_{i,j}$ is independent of $X_{i',j'}$ for all $(i,j) \neq (i',j')$. Assume further that the unknown parameters $\theta_1, \dots, \theta_n$ are independent and identically distributed (i.i.d.) with a non-parametric distribution $G$. We model $G$ by a Dirichlet process (DP) with parameters $G_0$ and $M$ (written as $G \sim \mathcal{D}(M, G_0)$). See, for example, Ferguson (1973), Blackwell (1973), Blackwell and MacQueen (1973), and the review article of Ferguson et al. (1992) for more on the DP. The parameters $M$ and $G_0$ are referred to as the scale (or precision) and location parameters, respectively, of the DP. $G_0$ is the prior guess on $G$ and $M$ is the strength of belief in $G_0$ as this prior guess.
BAYESIAN DENSITY ESTIMATION USING RANKED SET SAMPLES 713
Copyright # 2004 John Wiley & Sons, Ltd. Environmetrics 2004; 15: 711–728
Thus, the full model is
$$ X_{i,j} \mid \theta_i, G \;\stackrel{\text{indep.}}{\sim}\; f_i(x \mid \theta_i), \quad j = 1, \dots, m_i; \; i = 1, \dots, n $$
$$ \theta_1, \dots, \theta_n \mid G \;\stackrel{\text{i.i.d.}}{\sim}\; G $$
$$ G \mid (M, G_0) \sim \mathcal{D}(M, G_0), \quad M > 0 \text{ known}, \; G_0 \text{ known} $$
or where $M$ and the parameters of $G_0$ may be given known prior distributions. We will call $\mathbf{X}_n = (X_{i,j})_{j=1}^{m_i}$, $i = 1, \dots, n$, a generalized ranked set sample from the mixture distribution $F(x) = \int F(x \mid \theta)\, dG(\theta)$, where $G \sim \mathcal{D}(M, G_0)$.
Our goal is to combine the data $\mathbf{X}_n$ with the prior information on the $\theta$s to make inferences on the future (and hence unobserved) value $X_{n+1} \equiv X_{(r_{n+1} : k_{n+1})}$. In particular, we are interested in the predictive density of this unobserved value, given by $f_{n+1}(x \mid \mathbf{X}_n)$. Setting $r_{n+1} = k_{n+1} = 1$, we get an estimate of the marginal density of $X_{n+1}$, given by $\int f(x \mid \theta)\, dG(\theta)$.
Special cases:

1. If $n = 1$, $r_1 = 1$, $k_1 = 1$, $m_1 = m$, the sample is $X_{1,j}$ ($j = 1, \dots, m$). This is just the first row of (1), which is an SRS of size $m$ from $f(x \mid \theta)$.
2. If $r_i = 1$, $k_i = 1$, $m_i = 1$ ($i = 1, \dots, n$), the sample is $X_{i,1}$ ($i = 1, \dots, n$). This is the first column of (1) and is an independent set of observations from $f(x \mid \theta_i)$, $i = 1, \dots, n$.
3. If $r_i = i$, $k_i = n$, $m_i = 1$, we have a balanced ranked set sample $X_i \equiv X_{(i:n)}$, $i = 1, \dots, n$, with one replication.
4. If $r_i = i$, $k_i = n$, $m_i = m$, we have a balanced ranked set sample $X_{i,j} \equiv X_{(i:n)j}$, $i = 1, \dots, n$; $j = 1, \dots, m$, with $m$ replications.
5. If $n = 1$, we have a nomination sample (see Boyles and Samaniego, 1986; Wells and Tiwari, 1990).
2.2. Predictive density
Ferguson (1973) showed that if $G \sim \mathcal{D}(M, G_0)$ and $\theta \mid G \sim G$, then
$$ G \mid \theta \sim \mathcal{D}\!\left(M + 1,\; \frac{M G_0 + \delta_\theta}{M + 1}\right) $$
where $\delta_x$ stands for point mass at $x$. Using the above argument sequentially, the (unconditional) joint distribution of $\boldsymbol{\theta}_n = (\theta_1, \dots, \theta_n)$ is given by (see Antoniak, 1974)
$$ dQ(\boldsymbol{\theta}_n) \propto \prod_{i=1}^{n} \left[ \frac{M\, dG_0(\theta_i) + \sum_{j=1}^{i-1} \delta_{\theta_j}(d\theta_i)}{M + i - 1} \right] \qquad (2) $$
From (2), the conditional density of $\theta_{n+1}$ given $\boldsymbol{\theta}_n$ is given by:
$$ dQ(\theta_{n+1} \mid \boldsymbol{\theta}_n) = \frac{M}{M + n}\, dG_0(\theta_{n+1}) + \frac{1}{M + n} \sum_{j=1}^{n} \delta_{\theta_j}(d\theta_{n+1}) $$
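The conditional distribution above is the familiar Polya-urn scheme: with probability $M/(M+n)$ a fresh value is drawn from $G_0$, and otherwise a uniformly chosen earlier $\theta_j$ is copied. A minimal sketch of sequential generation from this urn (our own illustration; the function name `polya_urn` and the standard-normal choice of $G_0$ are assumptions):

```python
import random

def polya_urn(base_draw, M, n):
    """Generate theta_1, ..., theta_n sequentially from the Polya urn:
    the (i+1)th value is new from G0 with probability M/(M+i),
    otherwise it is a copy of a uniformly chosen earlier value."""
    thetas = []
    for i in range(n):
        if random.random() < M / (M + i):
            thetas.append(base_draw())            # fresh draw from G0
        else:
            thetas.append(random.choice(thetas))  # point mass at a past theta
    return thetas

# Small M favors few distinct values (clusters); large M favors many.
random.seed(0)
draws = polya_urn(lambda: random.gauss(0.0, 1.0), M=1.0, n=20)
print(len(set(draws)))  # number of distinct clusters, at most 20
```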
Hence, the predictive density of the future unobserved value $X_{n+1} \equiv X_{(r_{n+1} : k_{n+1})}$ given the observed data $\mathbf{X}_n$ is
$$ f_{n+1}(x \mid \mathbf{X}_n) = \int f_{n+1}(x \mid \boldsymbol{\theta}_n, \mathbf{X}_n)\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) $$
where
$$ \begin{aligned} f_{n+1}(x \mid \boldsymbol{\theta}_n, \mathbf{X}_n) &= \int f_{n+1}(x \mid \theta_{n+1}, \boldsymbol{\theta}_n, \mathbf{X}_n)\, dQ(\theta_{n+1} \mid \boldsymbol{\theta}_n, \mathbf{X}_n) \\ &= \int f_{n+1}(x \mid \theta_{n+1})\, dQ(\theta_{n+1} \mid \boldsymbol{\theta}_n) \\ &= \frac{M}{M + n} \int f_{n+1}(x \mid \theta_{n+1})\, dG_0(\theta_{n+1}) + \frac{1}{M + n} \sum_{j=1}^{n} f_{n+1}(x \mid \theta_j) \end{aligned} $$
and $Q(\boldsymbol{\theta}_n \mid \mathbf{X}_n)$ is the posterior of $\boldsymbol{\theta}_n$ given by
$$ dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \propto \prod_{i=1}^{n} \prod_{j=1}^{m_i} \left[ f_i(X_{i,j} \mid \theta_i) \right] dQ(\boldsymbol{\theta}_n) \qquad (3) $$
Simplifying further, the predictive density is given by
$$ f_{n+1}(x \mid \mathbf{X}_n) = \frac{M}{M + n} \int f_{n+1}(x \mid \theta_{n+1})\, dG_0(\theta_{n+1}) + \frac{1}{M + n} \int \sum_{j=1}^{n} f_{n+1}(x \mid \theta_j)\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \qquad (4) $$
The predictive density in (4) is composed of two parts. The first part is the marginal of $X_{n+1}$ based only on the baseline prior $G_0$. The second part is based on the observed data. The relative weights of these parts are proportional to $M$ and $n$, respectively. In the case of full confidence in the baseline prior $G_0$, we have $M \to \infty$ and hence only the first part plays a role, whereas if $M \to 0$ (corresponding to no prior information), only the second part plays a role.
Writing $E(X \mid \theta) = \mu(\theta)$ and $\mathrm{Var}(X \mid \theta) = \sigma^2(\theta)$, we have from (4) (note that $\theta$ can be a vector):
$$ E(X_{n+1} \mid \mathbf{X}_n) = \frac{M}{M + n} \int \mu(\theta)\, G_0(d\theta) + \frac{1}{M + n} \int \sum_{j=1}^{n} \mu(\theta_j)\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \qquad (5) $$
and
$$ E(X_{n+1}^2 \mid \mathbf{X}_n) = \frac{M}{M + n} \int \left( \mu^2(\theta) + \sigma^2(\theta) \right) G_0(d\theta) + \frac{1}{M + n} \int \sum_{j=1}^{n} \left( \mu^2(\theta_j) + \sigma^2(\theta_j) \right) dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \qquad (6) $$
Equations (5) and (6) can be used to get the mean and variance of the future value $X_{n+1}$ conditional on the observed data. For example, if $X \mid \theta \sim N(\mu, \sigma^2)$, we have
$$ E(X \mid \mathbf{X}_n) = \frac{M}{M + n} \int \mu\, G_0(d\mu) + \frac{1}{M + n} \int \sum_{j=1}^{n} \mu_j\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) $$
3. POSTERIOR SIMULATION
3.1. Gibbs sampler
The second term in (4) requires integration with respect to the posterior distribution of hn. Except
in trivial cases, no closed-form expression of this posterior distribution is available. However, as
shown below, it is easy to construct a Markov chain whose stationary distribution coincides with
the posterior. Starting from an arbitrary initial value, we repeatedly sample from this chain until
convergence. Subsequent samples from this chain are used to make posterior inferences.
From (2) and the fact that the $\theta_i$s are exchangeable, the univariate conditional density of $\theta_i$ given $\boldsymbol{\theta}_n^{(-i)} = (\theta_1, \dots, \theta_{i-1}, \theta_{i+1}, \dots, \theta_n)$ is (see also Blackwell and MacQueen, 1973)
$$ dQ\left( \theta_i \mid \boldsymbol{\theta}_n^{(-i)} \right) = \frac{M}{M + n - 1}\, dG_0(\theta_i) + \frac{1}{M + n - 1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \delta_{\theta_j}(d\theta_i) $$
The univariate conditional posteriors are then given by
$$ \begin{aligned} dQ(\theta_i \mid \boldsymbol{\theta}_n^{(-i)}, \mathbf{X}_n) &\propto \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i)\, dQ\left( \theta_i \mid \boldsymbol{\theta}_n^{(-i)} \right) \\ &= \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i) \left[ \frac{M}{M + n - 1}\, dG_0(\theta_i) + \frac{1}{M + n - 1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \delta_{\theta_j}(d\theta_i) \right] \\ &\propto q_{0i}\, G_{0,i}(d\theta_i) + \sum_{\substack{j=1 \\ j \neq i}}^{n} q_{ji}\, \delta_{\theta_j}(d\theta_i) \end{aligned} \qquad (7) $$
where
$$ q_{0i} = c_i M \int \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i)\, dG_0(\theta_i), \qquad q_{ji} = c_i \prod_{l=1}^{m_i} f_i(X_{i,l} \mid \theta_i = \theta_j) \qquad (8) $$
and
$$ dG_{0,i}(\theta_i) \propto \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i)\, dG_0(\theta_i) \qquad (9) $$
is the baseline posterior of $\theta_i$ given $X_{i,j}$, $j = 1, \dots, m_i$. The constant of proportionality $c_i$ in (8) is chosen so that
$$ q_{0i} + \sum_{\substack{j=1 \\ j \neq i}}^{n} q_{ji} = 1 $$
Thus, the posterior update of $\theta_i$ given $\boldsymbol{\theta}_n^{(-i)}$ will be from a mixture of the baseline posterior $G_{0,i}$ and point masses at the remaining $\theta_j$s, with mixing probabilities $q_{0i}$ and $q_{ji}$, respectively. The above univariate conditional posteriors are useful in implementing the Gibbs sampler to sample from the posterior $(\boldsymbol{\theta}_n \mid \mathbf{X}_n)$. Starting from an arbitrary initial value of $\boldsymbol{\theta}_n$, the Gibbs sampler sequentially updates each component $\theta_i$ conditional on the observed data and the remaining components $\boldsymbol{\theta}_n^{(-i)}$. This process is repeated a large number of times until the Markov chain converges to its stationary distribution, which is the desired posterior of $\boldsymbol{\theta}_n$.
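A single conditional update of $\theta_i$ from the mixture in (7) can be sketched as follows. This is our own illustration with hypothetical helper functions: `lik(theta)` stands for $\prod_j f_i(X_{i,j} \mid \theta)$, `marg` for the integral $\int \prod_j f_i(X_{i,j} \mid \theta)\, dG_0(\theta)$, and `base_post_draw` for a sampler of the baseline posterior $G_{0,i}$; none of these names appear in the paper.

```python
import random

def update_theta_i(thetas, i, M, lik, marg, base_post_draw):
    """One Gibbs update of theta_i given the remaining theta values.

    Mixture weights: q0 proportional to M * marg (baseline component),
    q_j proportional to the likelihood at each remaining theta_j;
    dividing by the total plays the role of the constant c_i.
    """
    others = [t for j, t in enumerate(thetas) if j != i]
    weights = [M * marg] + [lik(t) for t in others]
    u = random.random() * sum(weights)
    cum = 0.0
    for w, candidate in zip(weights, [None] + others):
        cum += w
        if u <= cum:
            # None marks the baseline component: draw from G_{0,i}
            return base_post_draw() if candidate is None else candidate
    return others[-1]

# Toy run with a flat likelihood, so every component is equally likely
random.seed(2)
new = update_theta_i([0.1, 0.5, 0.9], i=0, M=1.0,
                     lik=lambda t: 1.0, marg=1.0,
                     base_post_draw=lambda: random.gauss(0.0, 1.0))
print(new)
```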
Since the Dirichlet process $\mathcal{D}(M, G_0)$ is discrete with probability 1 (see Ferguson, 1973; Blackwell, 1973; Basu and Tiwari, 1982), with positive probability not all of $\theta_1, \dots, \theta_n$ are distinct. At any stage, let there be $k$ ($\le n$) distinct values and let $\boldsymbol{\theta}^* = (\theta_1^*, \dots, \theta_k^*)$ be the vector of these distinct values with corresponding multiplicities $n_1, \dots, n_k$, respectively, where $n_1 + \cdots + n_k = n$. Define the configuration vector $\mathbf{s} = (s_1, \dots, s_n)$, where $s_i = l$ if and only if $\theta_i = \theta_l^*$ ($i = 1, \dots, n$; $l = 1, \dots, k$). Note that the information contained in $\mathbf{s}$ and $\boldsymbol{\theta}^*$ is equivalent to that in $\boldsymbol{\theta}_n$. For simplicity of calculation, the updates in (7) are sometimes equivalently written in terms of updates in $\mathbf{s}$ and $\boldsymbol{\theta}^*$.

Note that using the above procedure (7) requires the ability to easily calculate $\int \prod_{j=1}^{m_i} f_i(x_{i,j} \mid \theta)\, dG_0(\theta)$. Since we are dealing with order statistics, even in the presence of conjugate prior–posterior relationships, such integrals are typically difficult or expensive to obtain. Hence, we resort to 'Algorithm 8' described in Neal (2000), which is designed for dealing with such non-conjugacy. This algorithm updates $\mathbf{s}$ and $\boldsymbol{\theta}^*$ by adding auxiliary parameters so that the resulting stationary distribution coincides with the desired one.
Often, another level is added to the hierarchy whereby $G_0$ (the baseline prior) is assumed to have a parametric form and prior distributions are put on its parameters. For example, with a normal baseline we can have the hyperprior specification $\sigma^2 \sim IG(\alpha, \beta)$, $\tau \sim IG(a_\tau, b_\tau)$, $\mu \mid (\eta, \tau, \sigma^2) \sim N(\eta, \tau\sigma^2)$. Here $IG(a, b)$ denotes the inverted gamma distribution with mean $b/(a - 1)$ and variance $b^2/[(a - 1)^2 (a - 2)]$. A gamma prior on $M$ is also a popular choice. All these steps can be incorporated into the Gibbs sampler to perform a full Bayesian analysis. See Escobar and West (1995, 1998) for similar models.
We thus proceed with Gibbs sampling by iterating through the following steps in order; sampling at each stage is based on current values of the conditioning variables and the process is repeated until convergence (i.e. when the steady state is reached):

1. Sample from $\theta_i \mid (\boldsymbol{\theta}_n^{(-i)}, \mathbf{X}_n)$ for each $i = 1, \dots, n$ using the methods outlined in (7)–(9). If necessary, use 'Algorithm 8'. After completing the cycle, this results in a new configuration vector $\mathbf{s}$ and a corresponding value of $k$.
2. Given $k$ and $\mathbf{s}$, generate a new set of parameters $\boldsymbol{\theta}^*$ by sampling each $\theta_l^*$ from the relevant component posterior:
$$ dQ(\theta_l^* \mid \mathbf{X}_n, \mathbf{s}, k) \propto \prod_{i : s_i = l} \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_l^*)\, dG_0(\theta_l^*), \quad l = 1, \dots, k $$
A Metropolis–Hastings step may be necessary to generate from this distribution.
3. If appropriate, update the parameters of the baseline prior and the precision parameter $M$.
3.2. Convergence
Arguments similar to those in the Appendix of Escobar and West (1995) are applicable to our
situation, since the observations are conditionally independent.
4. A SIMULATION STUDY
Since the proposed density estimator does not have a closed-form expression, we investigate its properties through large-scale simulation studies. Our main goal is to see the effect of a simple random sample versus a ranked set sample on the precision of density estimates obtained using the proposed method. We hypothesize that the density estimates based on RSS will be more precise, with the precision increasing with $k$. For simplicity, we have used only balanced ranked set samples in our study. We have used various criteria for comparison of the precision of the methods, namely pointwise variance, average Kullback–Leibler distance and average $L_2$ distance from the 'true' generating distribution, which is taken to be $F_0 \equiv 0.5\, N(0,1) + 0.5\, N(7,1)$, the mixture of $N(0,1)$ and $N(7,1)$ with equal mixing probabilities. We have used a normal kernel for estimation of the density, i.e.
$$ f(x \mid \theta) \equiv f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty $$
where $\theta = (\mu, \sigma^2)$. We have also used a normal-inverted gamma baseline distribution $G_0$. Under $G_0$, $\mu \mid (\eta, \tau, \sigma^2) \sim N(\eta, \tau\sigma^2)$, $\sigma^2 \sim IG(12, 11)$, $\tau \sim IG(2 + 10^7, 10^7(1 + 10^7))$, $\eta \sim N(0, 10^4)$. The precision parameter $M$ was assumed to have a Gamma(0.1, 0.1) prior. The prior parameters of $IG(a, b)$ are assessed through its mean and variance. Note that if $\mu_{IG}$ and $\sigma^2_{IG}$ are the mean and variance, respectively, of the $IG(a, b)$ distribution, we have
$$ a = \frac{\mu_{IG}^2}{\sigma_{IG}^2} + 2, \qquad b = \mu_{IG} \left( \frac{\mu_{IG}^2}{\sigma_{IG}^2} + 1 \right) $$
We used the above result to choose the parameters of the prior distributions, making them as non-informative as possible. For example, we took $E(\tau) = 10^7$, $\mathrm{Var}(\tau) = 10^7$ to arrive at the previously mentioned prior for $\tau$. The prior for $\sigma^2$ was chosen so that $E(\sigma^2) = 1$ and $\mathrm{Var}(\sigma^2) = 0.1$. This allowed the $X$s to be drawn from the normal mixture with the variance $\sigma^2$ having prior mean 1. The extra variability in $\mu$ was introduced through $\eta$ and $\tau$. We also tried various other prior parameters for $\sigma^2$ and found that a large prior mean of $\sigma^2$ results in a smooth density estimate whereas a small prior mean results in sharp spikes. The simulation study was done as follows:
1. Draw a balanced RSS of size 60 from the mixture distribution $F_0$.
2. Draw an RSS $\mathbf{X}$ of size $n$ from the mixture distribution $F_0$.
3. Find the density estimator based on $\mathbf{X}$. This density estimator is based on 150 000 Gibbs iterations (with a thinning interval of 150) after a burn-in of 1000.
4. Repeat steps 1–3 100 times to get 100 density estimators.
5. Find the pointwise means and variances of these estimators.
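The moment-matching relations for $IG(a, b)$ used above to set the prior parameters can be wrapped in a small helper. This is our own illustrative sketch (the function name `ig_params` is an assumption):

```python
def ig_params(mean, var):
    """Inverted-gamma (a, b) matched to a desired mean and variance,
    using a = mean^2/var + 2 and b = mean * (mean^2/var + 1)."""
    a = mean**2 / var + 2
    b = mean * (mean**2 / var + 1)
    return a, b

# The priors of Section 4: sigma^2 with mean 1, variance 0.1 -> IG(12, 11)
print(ig_params(1.0, 0.1))  # (12.0, 11.0)
# tau with mean 1e7, variance 1e7 -> IG(2 + 1e7, 1e7 * (1 + 1e7))
a, b = ig_params(1e7, 1e7)
print(a == 2 + 1e7 and b == 1e7 * (1 + 1e7))  # True
```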
The same exercise was repeated with different combinations to come up with a sample of size 60.
For example, we have looked at k ¼ 1;m ¼ 60 (SRS of size 60), k ¼ 2, m ¼ 30, k ¼ 3, m ¼ 20.
For each of these combinations, the 100 density estimators were obtained. The pointwise means of
the 100 density estimators obtained for the three methods are shown in Figure 1.
The updates of the prior parameters used Metropolis steps (see Robert and Casella, 1999) where
necessary. All the calculations were programmed using the C language and run on a SUN Ultra 10
running Solaris 8. Random number generators were based on algorithms in Press et al. (1997). Each
simulation of 100 density estimates took about 3 h. Convergence of the Gibbs sampler was
ascertained using the CODA package (see Best et al., 1995) in R. We used the Heidelberger and
Welch test and Geweke (see Cowles and Carlin, 1996) convergence diagnostics to assess convergence
of the sampler.
To see how 'close' a density estimate is to the 'true' density, we computed the Kullback–Leibler distance (see Kullback, 1959) and the $L_2$ distance of the density estimate from the 'true'. If $\hat{f}$ is the density estimate and $f_0$ is the true density, the KL distance between $\hat{f}$ and $f_0$ was approximated by
$$ KL(\hat{f}, f_0) = \frac{1}{L} \sum_{l=1}^{L} \hat{f}(x_l) \log\!\left( \hat{f}(x_l) / f_0(x_l) \right) $$
and the $L_2$ distance by
$$ L_2(\hat{f}, f_0) = \frac{1}{L} \sum_{l=1}^{L} \left( \hat{f}(x_l) - f_0(x_l) \right)^2 $$
where $x_l$, $l = 1, \dots, L$, are regularly spaced points on the support of $f_0$. For our problem, $f_0$ is $0.5\, N(0,1) + 0.5\, N(7,1)$. We chose 40 regularly spaced points on $[-3, 10]$ as the $x_l$s. For each sample of size 60, we calculated $\hat{f}$ using the method outlined previously and obtained $KL(\hat{f}, f_0)$ and $L_2(\hat{f}, f_0)$ for that estimated density. This was repeated 100 times to get 100 density estimates (and, as a result, 100 KL distances and 100 $L_2$ distances). The means of these 100 distances were calculated
separately for the two methods. The whole procedure was repeated for three different set-ups: (i) SRS of size 60 (or RSS of size 1 with 60 replicates); (ii) RSS of size 2 with 30 replicates of each; and (iii) RSS of size 3 with 20 replicates of each. The results are summarized in Table 1. The smaller KL and $L_2$ distances clearly demonstrate the superiority of using RSS as opposed to SRS.
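The grid-based approximations of the KL and $L_2$ distances can be sketched as follows. This is our own stdlib-only illustration (function names are assumptions; the true mixture $f_0 = 0.5\,N(0,1) + 0.5\,N(7,1)$ is hard-coded):

```python
from math import exp, log, pi, sqrt

def npdf(x, mu, sigma=1.0):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def f0(x):  # true mixture: 0.5 N(0,1) + 0.5 N(7,1)
    return 0.5 * npdf(x, 0.0) + 0.5 * npdf(x, 7.0)

def distances(fhat, lo=-3.0, hi=10.0, L=40):
    """Average KL and L2 discrepancies of fhat from f0 over a regular
    grid of L points on [lo, hi], as in the formulas above."""
    xs = [lo + (hi - lo) * l / (L - 1) for l in range(L)]
    kl = sum(fhat(x) * log(fhat(x) / f0(x)) for x in xs) / L
    l2 = sum((fhat(x) - f0(x)) ** 2 for x in xs) / L
    return kl, l2

# Sanity check: the true density is at zero distance from itself
kl, l2 = distances(f0)
print(kl == 0.0 and l2 == 0.0)  # True
```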
Figure 1. Pointwise means of the density estimates using the three methods

In Figure 2, we have plotted the pointwise variances of the density estimators obtained using the three methods: SRS, RSS with $k = 2$, and RSS with $k = 3$. Again, the pointwise variances of the density estimates based on RSS are smaller than those based on SRS, once again demonstrating the superiority of the Bayes estimate of the density based on RSS.
5. AN EXAMPLE
Muttlak and McDonald (1990) report an application in which interest centers on the sizes of shrubs. Shrubs were initially sampled by the line intercept method, in which a transect is laid down and all shrubs intersecting the transect are sampled. This method yields samples proportional to size. In their example, three transects were drawn, which gave a sample of 18 + 22 + 6 = 46 shrubs. This is treated as one large sample of size 46. Assuming that only three shrubs can be ranked at a time with respect to their widths, the whole sample was broken into 15 groups of 3 each (leaving one observation out). This generated a balanced ranked set sample with $k_i = 3$ and $m = 5$ replicates, for a total sample size of 15. The resulting data are shown in Table 2.
Table 1. Average Kullback–Leibler and $L_2$ distances of estimators from the 'true' using various sampling schemes

Method              KL            $L_2$
SRS (k=1, m=60)     0.00505676    0.003294646
RSS (k=2, m=30)     -0.00546397   0.002966064
RSS (k=3, m=20)     -0.004072926  0.002601628
Figure 2. Pointwise variances of the density estimates
Since shrub sizes are non-negative, we have assumed a lognormal distribution for the observed data. Thus, we have worked with $Y = \log X$, which will then be normal. So, our model is: $X \mid (\mu, \sigma^2) \sim LN(\mu, \sigma^2)$, $\theta = (\mu, \sigma^2) \sim G$, $G \sim \mathcal{D}(M, G_0)$. Under $G_0$, $\mu \mid (\eta, \tau, \sigma^2) \sim N(\eta, \tau\sigma^2)$. We put each observation in a separate row in the set-up of (1) (i.e. $n = 15$ and $m_i = 1$). We used the following prior parameters: $\eta \sim N(0, 10^7)$, $\tau \sim IG(2 + 10^2, 10^2(1 + 10^2))$, $\sigma^2 \sim IG(12, 11)$, $M \sim \mathrm{Gamma}(0.01, 0.01)$. Note that the prior choice for $\eta$ was made to make it as non-informative as possible. The prior choice of $\tau$ was made so that the average of $\tau$ was large and the density was reasonably flat. Note that $1/\tau$ can be interpreted as the precision in the mean $\mu$, so large values of $\tau$ are used to reflect the imprecise information on $\mu$. Similarly, trial and error with various prior choices for $\sigma^2$ resulted in various degrees of smoothness of the density estimator, with a smaller prior mean resulting in rougher estimates. The first 50 000 simulations were discarded as burn-in and the next 150 000 simulations were used to do the posterior calculations for density estimation, sampling the results of every 150th iteration. Convergence was ascertained by running the results of the simulations through CODA. We present below the results of the Heidelberger and Welch stationarity and interval half-width tests on selected variables. The chain passes the stationarity criterion but some variables fail the half-width test, which suggests running the chain longer. An additional 2000 runs resulted in passing the half-width test. Selected convergence diagnostics are given in Figures 3–5 and they also confirm convergence of the chain.
5.1. Heidelberger and Welch stationarity and interval half-width tests
Iterations used = 2000:50000
Thinning interval = 150
Sample size per chain = 321
Precision of half-width test = 0.1
$chain1
Stationarity Start p-value
test iteration
mu1 passed 2000 0.167
sigma1 passed 2000 0.628
eta passed 2000 0.588
tau passed 2000 0.210
prec passed 2000 0.212
Table 2. Ranked set sample of shrub sizes

i    1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
ri   1     1     1     1     1     2     2     2     2     2     3     3     3     3     3
ki   3     3     3     3     3     3     3     3     3     3     3     3     3     3     3
Xi   0.79  0.20  0.57  0.35  0.75  1.45  0.97  0.97  0.98  1.5   0.52  0.62  2.54  2.12  1.86
Half-width Mean Half-width
test
mu1 failed -0.107 0.0250
sigma1 passed 0.665 0.0054
eta failed -0.587 0.4117
tau passed 46.953 0.6398
prec passed 1.300 0.1062
5.2. Geweke convergence diagnostic (Z-score)
Iterations used = 2000:50000
Thinning interval = 150
Sample size per chain = 321
$chain1
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
mu1 sigma1 eta tau prec
1.50635  -0.83367  -0.04089  1.16328  0.00556
Iterations = 2000:50000
Thinning interval = 150
Number of chains = 1
Sample size per chain = 321
1. Empirical mean and standard deviation for each variable, plus standard
error of the mean:
Mean SD Naive SE Time-series SE
mu1 -0.1071 0.23076 0.012880 0.012776
sigma1 0.6654 0.04816 0.002688 0.002755
eta -0.5870 3.88699 0.216950 0.210030
tau 46.9535 6.84312 0.381946 0.326422
prec 1.3004 0.91771 0.051221 0.054185
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
mu1 -0.7775 -0.1965 -0.09321 0.01570 0.2599
sigma1 0.5845 0.6232 0.67042 0.70750 0.7384
eta -8.2234 -3.2883 -0.62157 1.80865 6.5462
tau 36.8106 41.0880 46.25485 53.12339 58.5537
prec 0.3504 0.6368 1.06644 1.74583 3.3364
Figure 3. Traceplot of selected parameters
We used the above Gibbs sampler to get density estimates of $X_{n+1}$, where $r_{n+1} = k_{n+1} = 1$. Figures 6 and 7 show the effect of varying the parameters of the prior distributions of $\sigma^2$ and $\tau$. As expected, $\sigma^2$ controls the 'roughness' of the density estimator, with smaller $\sigma^2$ giving rougher estimates. The role of $\sigma^2$ is similar to that of the window width in kernel density estimation. A large $\tau$ leads to a bigger variance of $\mu$, leading to a flatter prior. The choice of prior distribution of $M$ had negligible effect (see Figure 8), showing that our analysis was quite robust.
To see the effect of using an SRS in place of RSS, we repeated similar analysis on the same
dataset, but pretending that it was an SRS (case 3 of Section 2.1). Although, in theory, an RSS of
size n costs more to collect than an SRS of the same size, the practical costs of ordering
observations are often negligible and the comparison is a valid one. The two resulting density
estimates are given in Figure 9. Note that the density estimate resulting from RSS is more peaked
than that obtained from SRS, but both have more or less the same center. Hence, any inference
about the center of the distribution will be more accurate using the density estimate based on RSS
than by using one based on SRS.
6. CONCLUSION
We have studied a semiparametric Bayesian method of estimating density when the data are a ranked
set sample. While the estimator does not have an analytically tractable expression, this method is
Figure 4. Autocorrelations of selected parameters
easily programmed on a computer. Simulation studies have shown that the estimation procedure
results in more precise density estimates when a ranked set sample is used compared to a simple
random sample of the same size.
The results presented in this article are based on simulation studies with large samples. Large
samples allow the use of non-informative priors without having to run the Gibbs sampler very long
to achieve convergence. In practice, however, we will often be dealing with small sample sizes when we
work with ranked set samples. To offset the potential non-identifiability problems related to the
Dirichlet process in such cases (see, for example, Gelfand and Sahu, 1999), stronger (i.e. more
informative) priors may be needed, which may result in severely biased density estimates should
the prior choices be wrong. One way around this problem is to use more than one
observation per unknown parameter (e.g. as in the longitudinal studies of Kleinman and Ibrahim,
1998a,b). This means forcing certain observations to share the same parameter value; for example,
we may decide that all sample minima share one value of θ and all sample maxima share another. The
effect of inaccurate prior information can also be offset by choosing a suitably large precision
parameter M in the Dirichlet process.
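The role of the precision parameter M can be illustrated with Sethuraman's stick-breaking construction of the Dirichlet process (a standard representation, not the computational scheme used in this article; the function name and stopping tolerance are our choices): larger M spreads the random weights over many more atoms drawn from the baseline distribution, so the realization adheres more closely to the baseline and any single prior guess carries less weight.

```python
import random

def stick_breaking_weights(M, rng, tol=1e-8):
    """Weights of a Dirichlet process realization via stick breaking:
    V_j ~ Beta(1, M), w_j = V_j * prod_{l<j} (1 - V_l).
    Stops once the unallocated stick length falls below tol."""
    weights, remaining = [], 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, M)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

rng = random.Random(1)
few = stick_breaking_weights(1.0, rng)    # small M: a few dominant atoms
many = stick_breaking_weights(50.0, rng)  # large M: many small atoms
print(len(few), len(many))
```

With M = 1 a handful of atoms carry almost all the mass, while M = 50 produces far more, much smaller weights; in this sense a large M dilutes the influence of any single (possibly misspecified) prior component.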
For future research, it would be of interest to study the robustness of the Bayesian density estimator
to the choice of the parametric form of the baseline distribution. Another problem of interest would be
to study the effect of balanced versus unbalanced ranked set samples.
Figure 5. Plot of Geweke test statistic
Figure 6. Effect of varying the prior parameters of σ². Prior parameters leading to smaller values of σ² result in 'rougher'
density estimates
Figure 7. Effect of varying the prior parameters of μ
Figure 8. Effect of varying the prior parameters of the precision M
Figure 9. Effect of treating the RSS as an SRS. Note that the density estimate based on the RSS shows smaller variability
in the estimated distribution
REFERENCES
Basu D, Tiwari RC. 1982. A note on the Dirichlet process. In Statistics and Probability: Essays in Honor of C. R. Rao. North-Holland: New York; 89–103.
Best NG, Cowles MK, Vines K. 1995. CODA: convergence diagnosis and output analysis software for Gibbs sampling output, version 0.30. Technical Report, MRC Biostatistics Unit, University of Cambridge.
Blackwell D. 1973. Discreteness of Ferguson selections. Annals of Statistics 1(2): 356–358.
Blackwell D, MacQueen JB. 1973. Ferguson distributions via Polya urn schemes. Annals of Statistics 1(2): 353–355.
Boyles RA, Samaniego FJ. 1986. Estimating a distribution function based on nomination sampling. Journal of the American Statistical Association 81: 1039–1045.
Cowles MK, Carlin BP. 1996. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association 91(434): 883–904.
Dell TR, Clutter JL. 1972. Ranked set sampling theory with order statistics background. Biometrics 28: 545–553.
Escobar MD, West M. 1995. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90(430): 577–588.
Escobar MD, West M. 1998. Computing nonparametric hierarchical models. In Practical Nonparametric and Semiparametric Bayesian Statistics, Dey DK, Muller P, Sinha D (eds). Springer: New York; 1–22.
Ferguson TS. 1973. A Bayesian analysis of some nonparametric problems. Annals of Statistics 1(2): 209–230.
Ferguson TS. 1983. Bayesian density estimation by mixtures of normal distributions. In Recent Advances in Statistics, Rizvi H, Rustagi J (eds). Academic Press: New York; 287–302.
Ferguson TS, Phadia EG, Tiwari RC. 1992. Bayesian nonparametric inference. In Current Issues in Statistical Inference: Essays in Honor of D. Basu, number 17 in IMS Lecture Notes–Monograph Series. Institute of Mathematical Statistics: Hayward, CA; 127–150.
Gelfand AE, Sahu SK. 1999. Identifiability, improper priors and Gibbs sampling for generalized linear models. Journal of the American Statistical Association 94: 247–253.
Kaur A, Patil GP, Sinha AK, Taillie C. 1995. Ranked set sampling: an annotated bibliography. Environmental and Ecological Statistics 2: 25–54.
Kleinman KP, Ibrahim JG. 1998a. A semi-parametric Bayesian approach to generalized linear mixed models. Statistics in Medicine 17: 2579–2596.
Kleinman KP, Ibrahim JG. 1998b. A semi-parametric Bayesian approach to the random effects model. Biometrics 54: 921–938.
Kullback S. 1959. Information Theory and Statistics. Wiley: New York.
Kuo L. 1986. Computations of mixtures of Dirichlet processes. SIAM Journal on Scientific and Statistical Computing 7: 60–71.
Kvam PH, Samaniego FJ. 1994. Nonparametric maximum likelihood estimation based on ranked set samples. Journal of the American Statistical Association 89(426): 526–537.
Kvam PH, Tiwari RC. 1999. Bayes estimation of a distribution function using ranked set samples. Environmental and Ecological Statistics 6: 11–22.
Lindsay B. 1983. The geometry of mixture likelihoods, Part I: a general theory. The Annals of Statistics 11: 86–94.
McIntyre GA. 1952. A method of unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research 3: 385–390.
Mode NA, Conquest LL, Marker DA. 1999. Ranked set sampling for ecological research: accounting for the total costs of sampling. Environmetrics 10: 179–194.
Muttlak HA, McDonald LL. 1990. Ranked set sampling with size-biased probability of selection. Biometrics 46: 435–445.
Neal RM. 2000. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9(2): 249–265.
Nussbaum BD, Sinha BK. 1997. Cost effective gasoline sampling using ranked set sampling. In American Statistical Association 1997 Proceedings of the Section on Statistics and the Environment. American Statistical Association: Alexandria, VA; 83–87.
Patil GP, Sinha AK, Taillie C. 1999. Ranked set sampling: a bibliography. Environmental and Ecological Statistics 6: 91–98.
Press WH, Teukolsky SA, Vetterling WT, Flannery BP. 1997. Numerical Recipes in C (2nd edn). Cambridge University Press: Cambridge.
Robert CP, Casella G. 1999. Monte Carlo Statistical Methods. Springer: New York.
Silverman BW. 1981. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society, Series B 43: 97–99.
Silverman BW. 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall: New York.
Stokes SL, Sager TW. 1988. Characterization of a ranked set sample with applications to estimating distribution functions. Journal of the American Statistical Association 83: 374–381.
Takahashi K, Wakimoto K. 1968. On unbiased estimates of the population mean based on samples stratified by means of ordering. Annals of the Institute of Statistical Mathematics 20: 1–31.
Wells MT, Tiwari RC. 1990. Estimating a distribution function based on minima-nominated sampling. In Topics in Statistical Dependence, Block H, Sampson A, Savits T (eds). Institute of Mathematical Statistics: Hayward, CA; 471–479.
Yu PLH, Lam K. 1997. Regression estimator in ranked set sampling. Biometrics 53: 1070–1080.