ENVIRONMETRICS
Environmetrics 2004; 15: 711–728 (DOI: 10.1002/env.667)
Bayesian density estimation using ranked set samples
Kaushik Ghosh1,2 and Ram C. Tiwari2*
1 Department of Statistics, George Washington University, Washington, DC 20052, U.S.A.
2 Statistical Research and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD 20892, U.S.A.
SUMMARY
In this article, we present a Bayesian approach for estimating the underlying density using a ranked set sample. We assume that the observations are generated from a Dirichlet process mixture model. The density (as well as moments) of future values generated from the process is estimated through Markov chain Monte Carlo simulations. This approach extends earlier work on density estimation based on a Dirichlet process prior from a simple random sample to a ranked set sample. We carry out a simulation study using a normal kernel to compare the effect of using a simple random sample versus a ranked set sample on the predictive density. We show that the Bayesian density estimate resulting from a ranked set sample has a smaller average mean squared error than that from a simple random sample. Additionally, the average Kullback–Leibler distance of the density estimate based on the ranked set sample is shown to be closer to zero than that based on the corresponding simple random sample. We illustrate our method by applying it to shrub data available in the literature. Copyright © 2004 John Wiley & Sons, Ltd.
key words: order statistics; Dirichlet process prior; Gibbs sampler; mixture distribution; Kullback–Leibler distance; $L_2$ distance
1. INTRODUCTION
In many sampling situations, the units in a population can be difficult (or expensive) to quantify with
respect to the characteristic of interest but are easily compared to one another. For example, in some
agricultural experiments, the yields of plots are expensive to determine but can be easily ranked
visually. In environmental problems, for example, to estimate the amount of contaminant in a
particular area, soil samples are obtained for analysis in a laboratory. However, since chemical
analysis of soil is expensive to perform, the researcher has to economize on the number of
measurements made, all the while maintaining a high degree of accuracy. It may be possible to
visually rank the soil samples with respect to the amount of contaminant present (based on physical
characteristics such as color, texture, smell, etc.) before performing a detailed chemical analysis.
Similar examples can be found in geology, biology, forestry, forensics and various other fields (for
examples and further references see Mode et al., 1999; Nussbaum and Sinha, 1997; Yu and Lam, 1997; Patil et al., 1999).

Received 10 July 2002
Accepted 19 January 2004
*Correspondence to: R. C. Tiwari, Statistical Research and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, 6116 Executive Blvd., MSC 8317, Bethesda, MD 20892-8317, U.S.A. E-mail: tiwarir@nih.gov
In situations like these, drawing a ranked set sample (RSS) is a valuable tool for data acquisition.
Developed by McIntyre (1952), RSS consists of sampling in multiple stages as follows: A random
sample of n units is drawn from the underlying population and the units are then ranked according to
the characteristic of interest. This can be done easily without judgment error if n is small enough, say
2. From this set, the smallest unit is identified and then measured. Another random sample of n units is
drawn (independent of the first sample), the units ranked, and the second smallest unit is measured.
The process is continued until, at the nth stage, a random sample of n units is taken, the units are
ordered and the largest unit is measured. This completes a cycle and the n measurements so obtained
constitute a ranked set sample of size n from the population of interest. Note that, although n2 units
were screened, the RSS consists of only n observations. Sometimes, the entire cycle is repeated several
times to get multiple replicates of each order statistic. The sample so obtained is called a balanced
ranked set sample, since each order statistic gets equal representation.
A balanced ranked set sample is thus essentially a set of independent order statistics. A
straightforward generalization of the above considers (possibly) unequal sample sizes at the
different stages, along with (possibly) unequal number of replicates of the independent order
statistics. We will call this a generalized ranked set sample and denote it by $X_{i,j}$, $i = 1, \dots, n$; $j = 1, \dots, m_i$. Here, $X_{i,j}$ stands for the $j$th replicate of $X_{(r_i : k_i)}$, the $r_i$th order statistic based on a random sample of size $k_i$ from the underlying population. Note that we have a simple random sample of size $nm$ if $r_i = 1$, $k_i = 1$ and $m_i = m$, and a balanced ranked set sample of the same size if $r_i = i$, $k_i = n$ and $m_i = m$. An unbalanced ranked set sample can also be obtained as a special case of the above generalization.
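To make the sampling scheme concrete, the balanced cycle described above can be sketched in code. This is our own illustrative sketch, not part of the original paper: the function name `balanced_rss` is an assumption, and ranking is taken to be perfect (no judgment error).

```python
import random

def balanced_rss(draw, n, m=1):
    """Draw a balanced ranked set sample of n order statistics, m cycles.

    `draw` returns one observation from the population. Each cycle
    screens n*n units but measures only n of them: at stage i, the
    (i+1)th smallest of a fresh random sample of n units is measured.
    Ranking is assumed perfect (no judgment error).
    """
    sample = []
    for _ in range(m):            # m replicates (cycles)
        for i in range(n):        # stage i measures the (i+1)th smallest
            units = sorted(draw() for _ in range(n))
            sample.append(units[i])
    return sample

# Example: RSS of size 3 with 2 cycles from a Uniform(0, 1) population;
# 18 units are screened but only 6 are measured.
random.seed(1)
rss = balanced_rss(random.random, n=3, m=2)
print(len(rss))  # 6
```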
Early research on RSS focused on the estimation of the unknown population mean (see Kaur
et al., 1995, for a comprehensive review of articles related to ranked set sampling). It has been
shown (see Takahashi and Wakimoto, 1968; Dell and Clutter, 1972) that the sample mean based on
an RSS is unbiased for the population mean and is more efficient than that based on a simple
random sample (SRS) of the same size. Often it is of interest to estimate the density of the
underlying population from an RSS. Stokes and Sager (1988) have investigated the problem of
estimating the underlying cumulative distribution function (CDF) F, using an RSS. They showed
that the empirical distribution function (EDF) based on an RSS is unbiased for F and is uniformly
more efficient in estimating F than the EDF based on an SRS of the same size. Kvam and
Samaniego (1994) derived a non-parametric maximum likelihood estimator for F which is based on
an (unbalanced) RSS. However, application of traditional frequentist methods such as kernel density estimation (Silverman, 1981, 1986) or non-parametric maximum likelihood (Lindsay, 1983) may be difficult to justify in RSS problems, since sample sizes are usually too small for any asymptotics to be applicable.
A natural alternative is to use Bayesian methods, whereby available prior information is
combined with the data to make inferences. Recently, Kvam and Tiwari (1999) derived the Bayes
estimator of F based on an (unbalanced) RSS, with the Dirichlet distribution as the prior. Their
approach assumed that F can be discretized by putting probabilities at the observed sample values.
This completely non-parametric estimator may not be appropriate in cases where, for example, it is
known beforehand that the underlying distribution is continuous or has a particular parametric form
(say mixture of normals).
In this article, we present a semiparametric Bayesian method of estimating the underlying density
from a (generalized) ranked set sample. This is essentially an extension of the method developed in
Ferguson (1983), Kuo (1986) and Escobar and West (1995) to RSS. We assume that the data
distributions are mixtures of normals in the set-up of Dirichlet processes; see Ferguson (1973), and
Basu and Tiwari (1982). The density comes from a parametric family which has a non-parametric
prior. A Markov chain Monte Carlo method is developed that allows us to incorporate this prior
information to obtain the density estimator. As a byproduct, we can obtain the mean and variance of a
future value from this distribution, conditional on the observed data. Through simulation studies, we
show that the density estimates so obtained are more precise in terms of the smaller values of average
mean-squared error, Kullback–Leibler distance and L2 distance than those obtained based on a simple
random sample. We illustrate our findings with shrub data from Muttlak and McDonald (1990).
In Section 2, we develop the method for any general parametric family. In Section 3, we present the
Markov chain Monte Carlo simulation procedure necessary to use this method. In Section 4, we carry
out a simulation study to show the optimality properties of the proposed method. For brevity, only results for mixtures of normals are presented. The method can be carried out for mixtures of other
parametric distributions as well. In Section 5, we analyze a real data set to illustrate our method.
Finally, in Section 6, we present our conclusions and discuss future research.
2. SEMIPARAMETRIC BAYES ESTIMATION
2.1. Model
For any $i$ ($1 \le i \le n$), let $X_{(r_i : k_i)}$ be the $r_i$th order statistic based on a random sample of size $k_i$ ($1 \le r_i \le k_i$) drawn from a continuous distribution with pdf $f(x \mid \theta_i)$ and cdf $F(x \mid \theta_i)$, respectively. Recall that the pdf of $X_{(r_i : k_i)}$ is given by
$$ f_i(x \mid \theta_i) = k_i \binom{k_i - 1}{r_i - 1} F(x \mid \theta_i)^{r_i - 1} \left[ 1 - F(x \mid \theta_i) \right]^{k_i - r_i} f(x \mid \theta_i) $$
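The order-statistic density above is easy to evaluate numerically. The following stdlib-only sketch is our own illustration (the function names and the choice of a normal parent distribution are assumptions, not part of the paper):

```python
from math import comb, erf, exp, pi, sqrt

def norm_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def order_stat_pdf(x, r, k, mu=0.0, sigma=1.0):
    """pdf of the r-th order statistic of a sample of size k from
    N(mu, sigma^2): k * C(k-1, r-1) * F^{r-1} * (1-F)^{k-r} * f."""
    F, f = norm_cdf(x, mu, sigma), norm_pdf(x, mu, sigma)
    return k * comb(k - 1, r - 1) * F**(r - 1) * (1 - F)**(k - r) * f

# With r = k = 1 the formula reduces to the parent density itself
print(round(order_stat_pdf(0.0, r=1, k=1), 4))  # 0.3989
```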
For any $j$ ($1 \le j \le m_i$), let $X_{(r_i : k_i)j}$ denote the $m_i$ independent replications of $X_{(r_i : k_i)}$. We can represent this set-up in the following tabular form:
$$ \left. \begin{array}{llll} X_{(r_1:k_1)1}, & X_{(r_1:k_1)2}, & \cdots, & X_{(r_1:k_1)m_1} \sim f_1(x \mid \theta_1) \\ X_{(r_2:k_2)1}, & X_{(r_2:k_2)2}, & \cdots, & X_{(r_2:k_2)m_2} \sim f_2(x \mid \theta_2) \\ \vdots & \vdots & & \vdots \\ X_{(r_n:k_n)1}, & X_{(r_n:k_n)2}, & \cdots, & X_{(r_n:k_n)m_n} \sim f_n(x \mid \theta_n) \end{array} \right\} \qquad (1) $$
For simplicity of notation, we will write $X_{i,j} \equiv X_{(r_i : k_i)j}$. Given $\boldsymbol{\theta}_n = (\theta_1, \dots, \theta_n)$, we assume that $X_{i,j}$ is independent of $X_{i',j'}$ for all $(i,j) \neq (i',j')$. Assume further that the unknown parameters $\theta_1, \dots, \theta_n$ are independent and identically distributed (i.i.d.) with a non-parametric distribution $G$. We model $G$ by a Dirichlet process (DP) with parameters $G_0$ and $M$ (written as $G \sim \mathcal{D}(M, G_0)$). See, for example, Ferguson (1973), Blackwell (1973), Blackwell and MacQueen (1973), and the review article of Ferguson et al. (1992) for more on the DP. The parameters $M$ and $G_0$ are referred to as the scale (or precision) and location parameters, respectively, of the DP. $G_0$ is the prior guess on $G$ and $M$ is the strength of belief in $G_0$ as this prior guess.
BAYESIAN DENSITY ESTIMATION USING RANKED SET SAMPLES 713
Copyright # 2004 John Wiley & Sons, Ltd. Environmetrics 2004; 15: 711–728
Thus, the full model is
$$ X_{i,j} \mid \theta_i, G \;\stackrel{\text{indep.}}{\sim}\; f_i(x \mid \theta_i), \quad j = 1, \dots, m_i; \; i = 1, \dots, n $$
$$ \theta_1, \dots, \theta_n \mid G \;\stackrel{\text{i.i.d.}}{\sim}\; G $$
$$ G \mid (M, G_0) \sim \mathcal{D}(M, G_0), \quad M > 0 \text{ known}, \; G_0 \text{ known} $$
or where $M$ and the parameters of $G_0$ may be given known prior distributions. We will call $\mathbf{X}_n = (X_{i,j})_{j=1}^{m_i}$, $i = 1, \dots, n$, a generalized ranked set sample from the mixture distribution $F(x) = \int F(x \mid \theta)\, dG(\theta)$, where $G \sim \mathcal{D}(M, G_0)$.
Our goal is to combine the data $\mathbf{X}_n$ with the prior information on the $\theta$s to make inferences on the future (and hence unobserved) value $X_{n+1} \equiv X_{(r_{n+1} : k_{n+1})}$. In particular, we are interested in the predictive density of this unobserved value, given by $f_{n+1}(x \mid \mathbf{X}_n)$. Setting $r_{n+1} = k_{n+1} = 1$, we get an estimate of the marginal density of $X_{n+1}$, given by $\int f(x \mid \theta)\, dG(\theta)$.
Special cases:

1. If $n = 1$, $r_1 = 1$, $k_1 = 1$, $m_1 = m$, the sample is $X_{1,j}$ ($j = 1, \dots, m$). This is just the first row of (1), which is an SRS of size $m$ from $f(x \mid \theta)$.
2. If $r_i = 1$, $k_i = 1$, $m_i = 1$ ($i = 1, \dots, n$), the sample is $X_{i,1}$ ($i = 1, \dots, n$). This is the first column of (1) and is an independent set of observations from $f(x \mid \theta_i)$, $i = 1, \dots, n$.
3. If $r_i = i$, $k_i = n$, $m_i = 1$, we have a balanced ranked set sample $X_i \equiv X_{(i:n)}$, $i = 1, \dots, n$, with one replication.
4. If $r_i = i$, $k_i = n$, $m_i = m$, we have a balanced ranked set sample $X_{i,j} \equiv X_{(i:n)j}$, $i = 1, \dots, n$; $j = 1, \dots, m$, with $m$ replications.
5. If $n = 1$, we have a nomination sample (see Boyles and Samaniego, 1986; Wells and Tiwari, 1990).
2.2. Predictive density
Ferguson (1973) showed that if $G \sim \mathcal{D}(M, G_0)$ and $\theta \mid G \sim G$, then
$$ G \mid \theta \sim \mathcal{D}\!\left(M + 1,\; \frac{M G_0 + \delta_\theta}{M + 1}\right) $$
where $\delta_x$ stands for point mass at $x$. Using the above argument sequentially, the (unconditional) joint distribution of $\boldsymbol{\theta}_n = (\theta_1, \dots, \theta_n)$ is given by (see Antoniak, 1974)
$$ dQ(\boldsymbol{\theta}_n) \propto \prod_{i=1}^{n} \left[ \frac{M\, dG_0(\theta_i) + \sum_{j=1}^{i-1} \delta_{\theta_j}(d\theta_i)}{M + i - 1} \right] \qquad (2) $$
From (2), the conditional density of $\theta_{n+1}$ given $\boldsymbol{\theta}_n$ is given by:
$$ dQ(\theta_{n+1} \mid \boldsymbol{\theta}_n) = \frac{M}{M + n}\, dG_0(\theta_{n+1}) + \frac{1}{M + n} \sum_{j=1}^{n} \delta_{\theta_j}(d\theta_{n+1}) $$
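The conditional distribution above is the familiar Polya-urn scheme: with probability $M/(M+n)$ a fresh value is drawn from $G_0$, and otherwise a uniformly chosen earlier $\theta_j$ is copied. A minimal sketch of sequential generation from this urn (our own illustration; the function name `polya_urn` and the standard-normal choice of $G_0$ are assumptions):

```python
import random

def polya_urn(base_draw, M, n):
    """Generate theta_1, ..., theta_n sequentially from the Polya urn:
    the (i+1)th value is new from G0 with probability M/(M+i),
    otherwise it is a copy of a uniformly chosen earlier value."""
    thetas = []
    for i in range(n):
        if random.random() < M / (M + i):
            thetas.append(base_draw())            # fresh draw from G0
        else:
            thetas.append(random.choice(thetas))  # point mass at a past theta
    return thetas

# Small M favors few distinct values (clusters); large M favors many.
random.seed(0)
draws = polya_urn(lambda: random.gauss(0.0, 1.0), M=1.0, n=20)
print(len(set(draws)))  # number of distinct clusters, at most 20
```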
Hence, the predictive density of the future unobserved value $X_{n+1} \equiv X_{(r_{n+1} : k_{n+1})}$ given the observed data $\mathbf{X}_n$ is
$$ f_{n+1}(x \mid \mathbf{X}_n) = \int f_{n+1}(x \mid \boldsymbol{\theta}_n, \mathbf{X}_n)\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) $$
where
$$ \begin{aligned} f_{n+1}(x \mid \boldsymbol{\theta}_n, \mathbf{X}_n) &= \int f_{n+1}(x \mid \theta_{n+1}, \boldsymbol{\theta}_n, \mathbf{X}_n)\, dQ(\theta_{n+1} \mid \boldsymbol{\theta}_n, \mathbf{X}_n) \\ &= \int f_{n+1}(x \mid \theta_{n+1})\, dQ(\theta_{n+1} \mid \boldsymbol{\theta}_n) \\ &= \frac{M}{M + n} \int f_{n+1}(x \mid \theta_{n+1})\, dG_0(\theta_{n+1}) + \frac{1}{M + n} \sum_{j=1}^{n} f_{n+1}(x \mid \theta_j) \end{aligned} $$
and $Q(\boldsymbol{\theta}_n \mid \mathbf{X}_n)$ is the posterior of $\boldsymbol{\theta}_n$ given by
$$ dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \propto \prod_{i=1}^{n} \prod_{j=1}^{m_i} \left[ f_i(X_{i,j} \mid \theta_i) \right] dQ(\boldsymbol{\theta}_n) \qquad (3) $$
Simplifying further, the predictive density is given by
$$ f_{n+1}(x \mid \mathbf{X}_n) = \frac{M}{M + n} \int f_{n+1}(x \mid \theta_{n+1})\, dG_0(\theta_{n+1}) + \frac{1}{M + n} \int \sum_{j=1}^{n} f_{n+1}(x \mid \theta_j)\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \qquad (4) $$
The predictive density in (4) is composed of two parts. The first part is the marginal of $X_{n+1}$ based only on the baseline prior $G_0$. The second part is based on the observed data. The relative weights of these parts are proportional to $M$ and $n$, respectively. In the case of full confidence in the baseline prior $G_0$, we have $M \to \infty$ and hence only the first part plays a role, whereas if $M \to 0$ (corresponding to no prior information), only the second part plays a role.
Writing $E(X \mid \theta) = \mu(\theta)$ and $\mathrm{Var}(X \mid \theta) = \sigma^2(\theta)$, we have from (4) (note that $\theta$ can be a vector):
$$ E(X_{n+1} \mid \mathbf{X}_n) = \frac{M}{M + n} \int \mu(\theta)\, G_0(d\theta) + \frac{1}{M + n} \int \sum_{j=1}^{n} \mu(\theta_j)\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \qquad (5) $$
and
$$ E(X_{n+1}^2 \mid \mathbf{X}_n) = \frac{M}{M + n} \int \left( \mu^2(\theta) + \sigma^2(\theta) \right) G_0(d\theta) + \frac{1}{M + n} \int \sum_{j=1}^{n} \left( \mu^2(\theta_j) + \sigma^2(\theta_j) \right) dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) \qquad (6) $$
Equations (5) and (6) can be used to get the mean and variance of the future value $X_{n+1}$ conditional on the observed data. For example, if $X \mid \theta \sim N(\mu, \sigma^2)$, we have
$$ E(X \mid \mathbf{X}_n) = \frac{M}{M + n} \int \mu\, G_0(d\mu) + \frac{1}{M + n} \int \sum_{j=1}^{n} \mu_j\, dQ(\boldsymbol{\theta}_n \mid \mathbf{X}_n) $$
3. POSTERIOR SIMULATION
3.1. Gibbs sampler
The second term in (4) requires integration with respect to the posterior distribution of hn. Except
in trivial cases, no closed-form expression of this posterior distribution is available. However, as
shown below, it is easy to construct a Markov chain whose stationary distribution coincides with
the posterior. Starting from an arbitrary initial value, we repeatedly sample from this chain until
convergence. Subsequent samples from this chain are used to make posterior inferences.
From (2) and the fact that the $\theta_i$s are exchangeable, the univariate conditional density of $\theta_i$ given $\boldsymbol{\theta}_n^{(-i)} = (\theta_1, \dots, \theta_{i-1}, \theta_{i+1}, \dots, \theta_n)$ is (see also Blackwell and MacQueen, 1973)
$$ dQ\left( \theta_i \mid \boldsymbol{\theta}_n^{(-i)} \right) = \frac{M}{M + n - 1}\, dG_0(\theta_i) + \frac{1}{M + n - 1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \delta_{\theta_j}(d\theta_i) $$
The univariate conditional posteriors are then given by
$$ \begin{aligned} dQ(\theta_i \mid \boldsymbol{\theta}_n^{(-i)}, \mathbf{X}_n) &\propto \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i)\, dQ\left( \theta_i \mid \boldsymbol{\theta}_n^{(-i)} \right) \\ &= \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i) \left[ \frac{M}{M + n - 1}\, dG_0(\theta_i) + \frac{1}{M + n - 1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \delta_{\theta_j}(d\theta_i) \right] \\ &\propto q_{0i}\, G_{0,i}(d\theta_i) + \sum_{\substack{j=1 \\ j \neq i}}^{n} q_{ji}\, \delta_{\theta_j}(d\theta_i) \end{aligned} \qquad (7) $$
where
$$ q_{0i} = c_i M \int \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i)\, dG_0(\theta_i), \qquad q_{ji} = c_i \prod_{l=1}^{m_i} f_i(X_{i,l} \mid \theta_i = \theta_j) \qquad (8) $$
and
$$ dG_{0,i}(\theta_i) \propto \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_i)\, dG_0(\theta_i) \qquad (9) $$
is the baseline posterior of $\theta_i$ given $X_{i,j}$, $j = 1, \dots, m_i$. The constant of proportionality $c_i$ in (8) is chosen so that
$$ q_{0i} + \sum_{\substack{j=1 \\ j \neq i}}^{n} q_{ji} = 1 $$
Thus, the posterior update of $\theta_i$ given $\boldsymbol{\theta}_n^{(-i)}$ will be from a mixture of the baseline posterior $G_{0,i}$ and point masses at the remaining $\theta_j$s, with mixing probabilities $q_{0i}$ and $q_{ji}$, respectively. The above univariate conditional posteriors are useful in implementing the Gibbs sampler to sample from the posterior $(\boldsymbol{\theta}_n \mid \mathbf{X}_n)$. Starting from an arbitrary initial value of $\boldsymbol{\theta}_n$, the Gibbs sampler sequentially updates each component $\theta_i$ conditional on the observed data and the remaining components $\boldsymbol{\theta}_n^{(-i)}$. This process is repeated a large number of times until the Markov chain converges to its stationary distribution, which is the desired posterior of $\boldsymbol{\theta}_n$.
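A single conditional update of $\theta_i$ from the mixture in (7) can be sketched as follows. This is our own illustration with hypothetical helper functions: `lik(theta)` stands for $\prod_j f_i(X_{i,j} \mid \theta)$, `marg` for the integral $\int \prod_j f_i(X_{i,j} \mid \theta)\, dG_0(\theta)$, and `base_post_draw` for a sampler of the baseline posterior $G_{0,i}$; none of these names appear in the paper.

```python
import random

def update_theta_i(thetas, i, M, lik, marg, base_post_draw):
    """One Gibbs update of theta_i given the remaining theta values.

    Mixture weights: q0 proportional to M * marg (baseline component),
    q_j proportional to the likelihood at each remaining theta_j;
    dividing by the total plays the role of the constant c_i.
    """
    others = [t for j, t in enumerate(thetas) if j != i]
    weights = [M * marg] + [lik(t) for t in others]
    u = random.random() * sum(weights)
    cum = 0.0
    for w, candidate in zip(weights, [None] + others):
        cum += w
        if u <= cum:
            # None marks the baseline component: draw from G_{0,i}
            return base_post_draw() if candidate is None else candidate
    return others[-1]

# Toy run with a flat likelihood, so every component is equally likely
random.seed(2)
new = update_theta_i([0.1, 0.5, 0.9], i=0, M=1.0,
                     lik=lambda t: 1.0, marg=1.0,
                     base_post_draw=lambda: random.gauss(0.0, 1.0))
print(new)
```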
Since the Dirichlet process $\mathcal{D}(M, G_0)$ is discrete with probability 1 (see Ferguson, 1973; Blackwell, 1973; Basu and Tiwari, 1982), with positive probability not all of $\theta_1, \dots, \theta_n$ are distinct. At any stage, let there be $k$ ($\le n$) distinct values and let $\boldsymbol{\theta}^* = (\theta_1^*, \dots, \theta_k^*)$ be the vector of these distinct values with corresponding multiplicities $n_1, \dots, n_k$, respectively, where $n_1 + \cdots + n_k = n$. Define the configuration vector $\mathbf{s} = (s_1, \dots, s_n)$, where $s_i = l$ if and only if $\theta_i = \theta_l^*$ ($i = 1, \dots, n$; $l = 1, \dots, k$). Note that the information contained in $\mathbf{s}$ and $\boldsymbol{\theta}^*$ is equivalent to that in $\boldsymbol{\theta}_n$. For simplicity of calculation, the updates in (7) are sometimes equivalently written in terms of updates in $\mathbf{s}$ and $\boldsymbol{\theta}^*$.

Note that using the above procedure (7) requires the ability to easily calculate $\int \prod_{j=1}^{m_i} f_i(x_{i,j} \mid \theta)\, dG_0(\theta)$. Since we are dealing with order statistics, even in the presence of conjugate prior–posterior relationships, such integrals are typically difficult or expensive to obtain. Hence, we resort to 'Algorithm 8' described in Neal (2000), which is designed for dealing with such non-conjugacy. This algorithm updates $\mathbf{s}$ and $\boldsymbol{\theta}^*$ by adding auxiliary parameters so that the resulting stationary distribution coincides with the desired one.
Often, another level is added to the hierarchy whereby $G_0$ (the baseline prior) is assumed to have a parametric form and prior distributions are put on its parameters. For example, with a normal baseline we can have the hyperprior specification $\sigma^2 \sim IG(\alpha, \beta)$, $\tau \sim IG(a_\tau, b_\tau)$, $\mu \mid (\eta, \tau, \sigma^2) \sim N(\eta, \tau\sigma^2)$. Here $IG(a, b)$ denotes the inverted gamma distribution with mean $b/(a - 1)$ and variance $b^2/[(a - 1)^2 (a - 2)]$. A gamma prior on $M$ is also a popular choice. All these steps can be incorporated into the Gibbs sampler to perform a full Bayesian analysis. See Escobar and West (1995, 1998) for similar models.
We thus proceed with Gibbs sampling by iterating through the following steps in order; sampling at each stage is based on current values of the conditioning variables and the process is repeated until convergence (i.e. when the steady state is reached):

1. Sample from $\theta_i \mid (\boldsymbol{\theta}_n^{(-i)}, \mathbf{X}_n)$ for each $i = 1, \dots, n$ using the methods outlined in (7)–(9). If necessary, use 'Algorithm 8'. After completing the cycle, this results in a new configuration vector $\mathbf{s}$ and a corresponding value of $k$.
2. Given $k$ and $\mathbf{s}$, generate a new set of parameters $\boldsymbol{\theta}^*$ by sampling each $\theta_l^*$ from the relevant component posterior:
$$ dQ(\theta_l^* \mid \mathbf{X}_n, \mathbf{s}, k) \propto \prod_{i : s_i = l} \prod_{j=1}^{m_i} f_i(X_{i,j} \mid \theta_l^*)\, dG_0(\theta_l^*), \quad l = 1, \dots, k $$
A Metropolis–Hastings step may be necessary to generate from this distribution.
3. If appropriate, update the parameters of the baseline prior and the precision parameter $M$.
3.2. Convergence
Arguments similar to those in the Appendix of Escobar and West (1995) are applicable to our
situation, since the observations are conditionally independent.
4. A SIMULATION STUDY
Since the proposed density estimator does not have a closed-form expression, we investigate its properties through large-scale simulation studies. Our main goal is to see the effect of a simple random sample versus a ranked set sample on the precision of density estimates obtained using the proposed method. We hypothesize that the density estimates based on RSS will be more precise, with the precision increasing with $k$. For simplicity, we have used only balanced ranked set samples in our study. We have used various criteria for comparison of the precision of the methods, namely pointwise variance, average Kullback–Leibler distance and average $L_2$ distance from the 'true' generating distribution, which is taken to be $F_0 \equiv 0.5\, N(0,1) + 0.5\, N(7,1)$, the mixture of $N(0,1)$ and $N(7,1)$ with equal mixing probabilities. We have used a normal kernel for estimation of the density, i.e.
$$ f(x \mid \theta) \equiv f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty $$
where $\theta = (\mu, \sigma^2)$. We have also used a normal-inverted gamma baseline distribution $G_0$. Under $G_0$, $\mu \mid (\eta, \tau, \sigma^2) \sim N(\eta, \tau\sigma^2)$, $\sigma^2 \sim IG(12, 11)$, $\tau \sim IG(2 + 10^7, 10^7(1 + 10^7))$, $\eta \sim N(0, 10^4)$. The precision parameter $M$ was assumed to have a Gamma(0.1, 0.1) prior. The prior parameters of $IG(a, b)$ are assessed through its mean and variance. Note that if $\mu_{IG}$ and $\sigma^2_{IG}$ are the mean and variance, respectively, of the $IG(a, b)$ distribution, we have
$$ a = \frac{\mu_{IG}^2}{\sigma_{IG}^2} + 2, \qquad b = \mu_{IG} \left( \frac{\mu_{IG}^2}{\sigma_{IG}^2} + 1 \right) $$
We used the above result to choose the parameters of the prior distributions, making them as non-informative as possible. For example, we took $E(\tau) = 10^7$, $\mathrm{Var}(\tau) = 10^7$ to arrive at the previously mentioned prior for $\tau$. The prior for $\sigma^2$ was chosen so that $E(\sigma^2) = 1$ and $\mathrm{Var}(\sigma^2) = 0.1$. This allowed the $X$s to be drawn from the normal mixture with the variance $\sigma^2$ having prior mean 1. The extra variability in $\mu$ was introduced through $\eta$ and $\tau$. We also tried various other prior parameters for $\sigma^2$ and found that a large prior mean of $\sigma^2$ results in a smooth density estimate whereas a small prior mean results in sharp spikes. The simulation study was done as follows:
1. Draw a balanced RSS of size 60 from the mixture distribution $F_0$.
2. Draw an RSS $\mathbf{X}$ of size $n$ from the mixture distribution $F_0$.
3. Find the density estimator based on $\mathbf{X}$. This density estimator is based on 150 000 Gibbs iterations (with a thinning interval of 150) after a burn-in of 1000.
4. Repeat steps 1–3 100 times to get 100 density estimators.
5. Find the pointwise means and variances of these estimators.
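The moment-matching relations for $IG(a, b)$ used above to set the prior parameters can be wrapped in a small helper. This is our own illustrative sketch (the function name `ig_params` is an assumption):

```python
def ig_params(mean, var):
    """Inverted-gamma (a, b) matched to a desired mean and variance,
    using a = mean^2/var + 2 and b = mean * (mean^2/var + 1)."""
    a = mean**2 / var + 2
    b = mean * (mean**2 / var + 1)
    return a, b

# The priors of Section 4: sigma^2 with mean 1, variance 0.1 -> IG(12, 11)
print(ig_params(1.0, 0.1))  # (12.0, 11.0)
# tau with mean 1e7, variance 1e7 -> IG(2 + 1e7, 1e7 * (1 + 1e7))
a, b = ig_params(1e7, 1e7)
print(a == 2 + 1e7 and b == 1e7 * (1 + 1e7))  # True
```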
The same exercise was repeated with different combinations to come up with a sample of size 60.
For example, we have looked at k ¼ 1;m ¼ 60 (SRS of size 60), k ¼ 2, m ¼ 30, k ¼ 3, m ¼ 20.
For each of these combinations, the 100 density estimators were obtained. The pointwise means of
the 100 density estimators obtained for the three methods are shown in Figure 1.
The updates of the prior parameters used Metropolis steps (see Robert and Casella, 1999) where
necessary. All the calculations were programmed using the C language and run on a SUN Ultra 10
running Solaris 8. Random number generators were based on algorithms in Press et al. (1997). Each
simulation of 100 density estimates took about 3 h. Convergence of the Gibbs sampler was
ascertained using the CODA package (see Best et al., 1995) in R. We used the Heidelberger and
Welch test and Geweke (see Cowles and Carlin, 1996) convergence diagnostics to assess convergence
of the sampler.
To see how 'close' a density estimate is to the 'true' density, we computed the Kullback–Leibler distance (see Kullback, 1959) and the $L_2$ distance of the density estimate from the 'true'. If $\hat{f}$ is the density estimate and $f_0$ is the true density, the KL distance between $\hat{f}$ and $f_0$ was approximated by
$$ KL(\hat{f}, f_0) = \frac{1}{L} \sum_{l=1}^{L} \hat{f}(x_l) \log\!\left( \hat{f}(x_l) / f_0(x_l) \right) $$
and the $L_2$ distance by
$$ L_2(\hat{f}, f_0) = \frac{1}{L} \sum_{l=1}^{L} \left( \hat{f}(x_l) - f_0(x_l) \right)^2 $$
where $x_l$, $l = 1, \dots, L$, are regularly spaced points on the support of $f_0$. For our problem, $f_0$ is $0.5\, N(0,1) + 0.5\, N(7,1)$. We chose 40 regularly spaced points on $[-3, 10]$ as the $x_l$s. For each sample of size 60, we calculated $\hat{f}$ using the method outlined previously and obtained $KL(\hat{f}, f_0)$ and $L_2(\hat{f}, f_0)$ for that estimated density. This was repeated 100 times to get 100 density estimates (and, as a result, 100 KL distances and 100 $L_2$ distances). The means of these 100 distances were calculated
separately for the two methods. The whole procedure was repeated for three different set-ups: (i) SRS of size 60 (or RSS of size 1 with 60 replicates); (ii) RSS of size 2 with 30 replicates of each; and (iii) RSS of size 3 with 20 replicates of each. The results are summarized in Table 1. The smaller KL and $L_2$ distances clearly demonstrate the superiority of using RSS as opposed to SRS.
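The grid-based approximations of the KL and $L_2$ distances can be sketched as follows. This is our own stdlib-only illustration (function names are assumptions; the true mixture $f_0 = 0.5\,N(0,1) + 0.5\,N(7,1)$ is hard-coded):

```python
from math import exp, log, pi, sqrt

def npdf(x, mu, sigma=1.0):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def f0(x):  # true mixture: 0.5 N(0,1) + 0.5 N(7,1)
    return 0.5 * npdf(x, 0.0) + 0.5 * npdf(x, 7.0)

def distances(fhat, lo=-3.0, hi=10.0, L=40):
    """Average KL and L2 discrepancies of fhat from f0 over a regular
    grid of L points on [lo, hi], as in the formulas above."""
    xs = [lo + (hi - lo) * l / (L - 1) for l in range(L)]
    kl = sum(fhat(x) * log(fhat(x) / f0(x)) for x in xs) / L
    l2 = sum((fhat(x) - f0(x)) ** 2 for x in xs) / L
    return kl, l2

# Sanity check: the true density is at zero distance from itself
kl, l2 = distances(f0)
print(kl == 0.0 and l2 == 0.0)  # True
```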
Figure 1. Pointwise means of the density estimates using the three methods

In Figure 2, we have plotted the pointwise variances of the density estimators obtained using the three methods: SRS, RSS with $k = 2$, and RSS with $k = 3$. Again, the pointwise variances of the density estimates based on RSS are smaller than those based on SRS, once again demonstrating the superiority of the Bayes estimate of the density based on RSS.
5. AN EXAMPLE
Muttlak and McDonald (1990) report an application in which interest centers on the sizes of shrubs. Shrubs were initially sampled by the line intercept method, in which a transect is laid down and all shrubs intersecting the transect are sampled. This method yields samples proportional to size. In their example, three transects were drawn, which gave a sample of 18 + 22 + 6 = 46 shrubs. This is treated as one large sample of size 46. Assuming that only three shrubs can be ranked at a time with respect to their widths, the whole sample was broken into 15 groups of 3 each (leaving one observation out). This generated a balanced ranked set sample with $k_i = 3$ and $m = 5$ replicates, for a total sample size of 15. The resulting data are shown in Table 2.
Table 1. Average Kullback–Leibler and $L_2$ distances of estimators from the 'true' using various sampling schemes

Method              KL            $L_2$
SRS (k=1, m=60)     0.00505676    0.003294646
RSS (k=2, m=30)     -0.00546397   0.002966064
RSS (k=3, m=20)     -0.004072926  0.002601628
Figure 2. Pointwise variances of the density estimates
Since shrub sizes are non-negative, we have assumed a lognormal distribution for the observed data. Thus, we have worked with $Y = \log X$, which will then be normal. So, our model is: $X \mid (\mu, \sigma^2) \sim LN(\mu, \sigma^2)$, $\theta = (\mu, \sigma^2) \sim G$, $G \sim \mathcal{D}(M, G_0)$. Under $G_0$, $\mu \mid (\eta, \tau, \sigma^2) \sim N(\eta, \tau\sigma^2)$. We put each observation in a separate row in the set-up of (1) (i.e. $n = 15$ and $m_i = 1$). We used the following prior parameters: $\eta \sim N(0, 10^7)$, $\tau \sim IG(2 + 10^2, 10^2(1 + 10^2))$, $\sigma^2 \sim IG(12, 11)$, $M \sim \mathrm{Gamma}(0.01, 0.01)$. Note that the prior choice for $\eta$ was made to make it as non-informative as possible. The prior choice of $\tau$ was made so that the average of $\tau$ was large and the density was reasonably flat. Note that $1/\tau$ can be interpreted as the precision in the mean $\mu$, so large values of $\tau$ are used to reflect the imprecise information on $\mu$. Similarly, trial and error with various prior choices for $\sigma^2$ resulted in various degrees of smoothness of the density estimator, with a smaller prior mean resulting in rougher estimates. The first 50 000 simulations were discarded as burn-in and the next 150 000 simulations were used to do the posterior calculations for density estimation, sampling the results of every 150th iteration. Convergence was ascertained by running the results of the simulations through CODA. We present below the results of the Heidelberger and Welch stationarity and interval half-width tests on selected variables. The chain passes the stationarity criterion but some variables fail the half-width test, which suggests running the chain longer. An additional 2000 runs resulted in passing the half-width test. Selected convergence diagnostics are given in Figures 3–5 and they also confirm convergence of the chain.
5.1. Heidelberger and Welch stationarity and interval half-width tests
Iterations used = 2000:50000
Thinning interval = 150
Sample size per chain = 321
Precision of half-width test = 0.1
$chain1
Stationarity Start p-value
test iteration
mu1 passed 2000 0.167
sigma1 passed 2000 0.628
eta passed 2000 0.588
tau passed 2000 0.210
prec passed 2000 0.212
Table 2. Ranked set sample of shrub sizes

i    1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
ri   1     1     1     1     1     2     2     2     2     2     3     3     3     3     3
ki   3     3     3     3     3     3     3     3     3     3     3     3     3     3     3
Xi   0.79  0.20  0.57  0.35  0.75  1.45  0.97  0.97  0.98  1.5   0.52  0.62  2.54  2.12  1.86
Half-width Mean Half-width
test
mu1 failed -0.107 0.0250
sigma1 passed 0.665 0.0054
eta failed -0.587 0.4117
tau passed 46.953 0.6398
prec passed 1.300 0.1062
5.2. Geweke convergence diagnostic (Z-score)
Iterations used = 2000:50000
Thinning interval = 150
Sample size per chain = 321
$chain1
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
mu1 sigma1 eta tau prec
1.50635  -0.83367  -0.04089  1.16328  0.00556
Iterations = 2000:50000
Thinning interval = 150
Number of chains = 1
Sample size per chain = 321
1. Empirical mean and standard deviation for each variable, plus standard
error of the mean:
Mean SD Naive SE Time-series SE
mu1 -0.1071 0.23076 0.012880 0.012776
sigma1 0.6654 0.04816 0.002688 0.002755
eta -0.5870 3.88699 0.216950 0.210030
tau 46.9535 6.84312 0.381946 0.326422
prec 1.3004 0.91771 0.051221 0.054185
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
mu1 -0.7775 -0.1965 -0.09321 0.01570 0.2599
sigma1 0.5845 0.6232 0.67042 0.70750 0.7384
eta -8.2234 -3.2883 -0.62157 1.80865 6.5462
tau 36.8106 41.0880 46.25485 53.12339 58.5537
prec 0.3504 0.6368 1.06644 1.74583 3.3364
Figure 3. Traceplot of selected parameters
We used the above Gibbs sampler to get density estimates of $X_{n+1}$, where $r_{n+1} = k_{n+1} = 1$. Figures 6 and 7 show the effect of varying the parameters of the prior distributions of $\sigma^2$ and $\tau$. As expected, $\sigma^2$ controls the 'roughness' of the density estimator, with smaller $\sigma^2$ giving rougher estimates. The role of $\sigma^2$ is similar to that of the window width in kernel density estimation. A large $\tau$ leads to a bigger variance of $\mu$, leading to a flatter prior. The choice of prior distribution of $M$ had negligible effect (see Figure 8), showing that our analysis was quite robust.
To see the effect of using an SRS in place of RSS, we repeated similar analysis on the same
dataset, but pretending that it was an SRS (case 3 of Section 2.1). Although, in theory, an RSS of
size n costs more to collect than an SRS of the same size, the practical costs of ordering
observations are often negligible and the comparison is a valid one. The two resulting density
estimates are given in Figure 9. Note that the density estimate resulting from RSS is more peaked
than that obtained from SRS, but both have more or less the same center. Hence, any inference
about the center of the distribution will be more accurate using the density estimate based on RSS
than by using one based on SRS.
6. CONCLUSION
We have studied a semiparametric Bayesian method of estimating density when the data are a ranked
set sample. While the estimator does not have an analytically tractable expression, this method is
Figure 4. Autocorrelations of selected parameters
easily programmed on a computer. Simulation studies have shown that the estimation procedure
results in more precise density estimates when a ranked set sample is used compared to a simple
random sample of the same size.
The results presented in this article are based on simulation studies with large samples. Large
samples allow the use of non-informative priors without having to run the Gibbs sampler very long
to achieve convergence. In practice, however, we will often be dealing with small sample sizes when we
work with ranked set samples. To offset the potential non-identifiability problems related to the
Dirichlet process in such cases (see, for example, Gelfand and Sahu, 1999), stronger (i.e. more
informative) priors may be needed, which may result in severely biased density estimates should
the prior choices be wrong. One way around this problem is to use more than one
observation per unknown parameter (e.g. as in the longitudinal studies of Kleinman and Ibrahim,
1998a,b). This means forcing certain observations to share the same parameter value; for example,
we may decide that all sample minima share one value of θ and all sample maxima share another. The
effect of inaccurate prior information can also be offset by choosing a suitably large precision
parameter M in the Dirichlet process.
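The role of the precision parameter M can be illustrated with Sethuraman's stick-breaking construction of the Dirichlet process (a standard representation, not the computational scheme used in this article; the function name and stopping tolerance are our choices): larger M spreads the random weights over many more atoms drawn from the baseline distribution, so the realization adheres more closely to the baseline and any single prior guess carries less weight.

```python
import random

def stick_breaking_weights(M, rng, tol=1e-8):
    """Weights of a Dirichlet process realization via stick breaking:
    V_j ~ Beta(1, M), w_j = V_j * prod_{l<j} (1 - V_l).
    Stops once the unallocated stick length falls below tol."""
    weights, remaining = [], 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, M)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

rng = random.Random(1)
few = stick_breaking_weights(1.0, rng)    # small M: a few dominant atoms
many = stick_breaking_weights(50.0, rng)  # large M: many small atoms
print(len(few), len(many))
```

With M = 1 a handful of atoms carry almost all the mass, while M = 50 produces far more, much smaller weights; in this sense a large M dilutes the influence of any single (possibly misspecified) prior component.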
For future research, it would be of interest to study the robustness of the Bayesian density estimator
to the choice of the parametric form of the baseline distribution. Another problem of interest would be
to study the effect of balanced versus unbalanced ranked set samples.
Figure 5. Plot of Geweke test statistic
Figure 6. Effect of varying the prior parameters of σ². Prior parameters leading to smaller values of σ² result in 'rougher'
density estimates
Figure 7. Effect of varying the prior parameters of μ
Figure 8. Effect of varying the prior parameters of the precision M
Figure 9. Effect of treating the RSS as an SRS. Note that the density estimate based on the RSS shows smaller variability
in the estimated distribution
REFERENCES
Basu D, Tiwari RC. 1982. A note on the Dirichlet process. In Statistics and Probability: Essays in Honor of C. R. Rao. North-Holland: New York; 89–103.
Best NG, Cowles MK, Vines K. 1995. CODA: convergence diagnosis and output analysis software for Gibbs sampling output, version 0.30. Technical Report, MRC Biostatistics Unit, University of Cambridge.
Blackwell D. 1973. Discreteness of Ferguson selections. Annals of Statistics 1(2): 356–358.
Blackwell D, MacQueen JB. 1973. Ferguson distributions via Polya urn schemes. Annals of Statistics 1(2): 353–355.
Boyles RA, Samaniego FJ. 1986. Estimating a distribution function based on nomination sampling. Journal of the American Statistical Association 81: 1039–1045.
Cowles MK, Carlin BP. 1996. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association 91(434): 883–904.
Dell TR, Clutter JL. 1972. Ranked set sampling theory with order statistics background. Biometrics 28: 545–553.
Escobar MD, West M. 1995. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90(430): 577–588.
Escobar MD, West M. 1998. Computing nonparametric hierarchical models. In Practical Nonparametric and Semiparametric Bayesian Statistics, Dey DK, Muller P, Sinha D (eds). Springer: New York; 1–22.
Ferguson TS. 1973. A Bayesian analysis of some nonparametric problems. Annals of Statistics 1(2): 209–230.
Ferguson TS. 1983. Bayesian density estimation by mixtures of normal distributions. In Recent Advances in Statistics, Rizvi H, Rustagi J (eds). Academic Press: New York; 287–302.
Ferguson TS, Phadia EG, Tiwari RC. 1992. Bayesian nonparametric inference. In Current Issues in Statistical Inference: Essays in Honor of D. Basu, number 17 in IMS Lecture Notes–Monograph Series. Institute of Mathematical Statistics: Hayward, CA; 127–150.
Gelfand AE, Sahu SK. 1999. Identifiability, improper priors and Gibbs sampling for generalized linear models. Journal of the American Statistical Association 94: 247–253.
Kaur A, Patil GP, Sinha AK, Taillie C. 1995. Ranked set sampling: an annotated bibliography. Environmental and Ecological Statistics 2: 25–54.
Kleinman KP, Ibrahim JG. 1998a. A semi-parametric Bayesian approach to generalized linear mixed models. Statistics in Medicine 17: 2579–2596.
Kleinman KP, Ibrahim JG. 1998b. A semi-parametric Bayesian approach to the random effects model. Biometrics 54: 921–938.
Kullback S. 1959. Information Theory and Statistics. Wiley: New York.
Kuo L. 1986. Computations of mixtures of Dirichlet processes. SIAM Journal on Scientific and Statistical Computing 7: 60–71.
Kvam PH, Samaniego FJ. 1994. Nonparametric maximum likelihood estimation based on ranked set samples. Journal of the American Statistical Association 89(426): 526–537.
Kvam PH, Tiwari RC. 1999. Bayes estimation of a distribution function using ranked set samples. Environmental and Ecological Statistics 6: 11–22.
Lindsay B. 1983. The geometry of mixture likelihoods, Part I: a general theory. The Annals of Statistics 11: 86–94.
McIntyre GA. 1952. A method of unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research 3: 385–390.
Mode NA, Conquest LL, Marker DA. 1999. Ranked set sampling for ecological research: accounting for the total costs of sampling. Environmetrics 10: 179–194.
Muttlak HA, McDonald LL. 1990. Ranked set sampling with size-biased probability of selection. Biometrics 46: 435–445.
Neal RM. 2000. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9(2): 249–265.
Nussbaum BD, Sinha BK. 1997. Cost effective gasoline sampling using ranked set sampling. In American Statistical Association 1997 Proceedings of the Section on Statistics and the Environment. American Statistical Association: Alexandria, VA; 83–87.
Patil GP, Sinha AK, Taillie C. 1999. Ranked set sampling: a bibliography. Environmental and Ecological Statistics 6: 91–98.
Press WH, Teukolsky SA, Vetterling WT, Flannery BP. 1997. Numerical Recipes in C (2nd edn). Cambridge University Press: Cambridge.
Robert CP, Casella G. 1999. Monte Carlo Statistical Methods. Springer: New York.
Silverman BW. 1981. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society, Series B 43: 97–99.
Silverman BW. 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall: New York.
Stokes SL, Sager TW. 1988. Characterization of a ranked set sample with applications to estimating distribution functions. Journal of the American Statistical Association 83: 374–381.
Takahashi K, Wakimoto K. 1968. On unbiased estimates of the population mean based on samples stratified by means of ordering. Annals of the Institute of Statistical Mathematics 20: 1–31.
Wells MT, Tiwari RC. 1990. Estimating a distribution function based on minima-nominated sampling. In Topics in Statistical Dependence, Block H, Sampson A, Savits T (eds). Institute of Mathematical Statistics: Hayward, CA; 471–479.
Yu PLH, Lam K. 1997. Regression estimator in ranked set sampling. Biometrics 53: 1070–1080.