The mclust Package
January 18, 2005
Version 2.1-8
Author C. Fraley and A.E. Raftery, Dept. of Statistics, University of Washington.
Title Model-based cluster analysis
Description Model-based cluster analysis: the 2002 version of MCLUST
Depends R (>= 1.7.0)
License See http://www.stat.washington.edu/mclust/license.txt
Maintainer Ron Wehrens <R.Wehrens@science.ru.nl>
URL http://www.stat.washington.edu/mclust
R topics documented:
Defaults.Mclust
EMclust
EMclustN
Mclust
bic
bicE
bicEMtrain
cdens
cdensE
chevron
clPairs
classError
compareClass
coordProj
cv1EMtrain
decomp2sigma
dens
density
diabetes
em
emE
estep
estepE
grid1
hc
hcE
hclass
hypvol
lansing
map
mapClass
mclust-internal
mclust1Dplot
mclust2Dplot
mclustDA
mclustDAtest
mclustDAtrain
mclustOptions
me
meE
mstep
mstepE
mvn
mvnX
partconv
partuniq
plot.Mclust
plot.mclustDA
randProj
sigma2decomp
sim
simE
spinProj
summary.EMclust
summary.EMclustN
summary.Mclust
summary.mclustDAtest
summary.mclustDAtrain
surfacePlot
uncerPlot
unmap
Defaults.Mclust List of values controlling defaults for some MCLUST functions.
Description
A named list of values including tolerances for singularity and convergence assessment, and an enumeration of models used as defaults in MCLUST functions.
Details
A function mclustOptions is supplied for assigning values to the .Mclust list.
Value
A list with the following components:
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is the relative machine precision .Machine$double.eps, which is approximately 2e-16 on IEEE-compliant machines.
tol A vector of length two giving relative convergence tolerances for the loglikelihood and for parameter convergence in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(1.e-5,1.e-5).
itmax A vector of length two giving integer limits on the number of EM iterations and on the number of iterations in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(Inf,Inf), allowing termination to be completely governed by tol.
equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. Default: equalPro = FALSE.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. Default: warnSingular = TRUE.
emModelNames A vector of character strings indicating the models to be used for multivariate data in the functions such as EMclust and mclustDAtrain that involve multiple models. The default is all of the multivariate models available in MCLUST:
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
hcModelName A vector of two character strings giving the name of the model to be used in the hierarchical clustering phase for univariate and multivariate data, respectively, in EMclust and EMclustN. The default is c("V","VVV"), giving the unconstrained model in each case.
symbols A vector whose entries are either integers corresponding to graphics symbols or single characters for plotting for classifications. Classes are assigned symbols in the given order.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association. See http://www.stat.washington.edu/tech.reports (No. 380, 2000).
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/tech.reports .
See Also
mclustOptions, EMclust, mclustDAtrain, em, me, estep, mstep
Examples
n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
even <- odd + 1
test <- mclustDAtest(x[even, ], train) ## compute model densities

data(iris)
irisMatrix <- iris[,1:4]
irisClass <- iris[,5]

.Mclust
.Mclust <- mclustOptions(tol = 1.e-6, emModelNames = c("VII", "VVI", "VVV"))
.Mclust
irisBic <- EMclust(irisMatrix)
summary(irisBic, irisMatrix)
.Mclust <- mclustOptions() # restore defaults
.Mclust
EMclust BIC for Model-Based Clustering
Description
BIC for EM initialized by hierarchical clustering for parameterized Gaussian mixture models.
Usage
EMclust(data, G, emModelNames, hcPairs, subset, eps, tol, itmax, equalPro,
        warnSingular, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
G An integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. The default is 1:9.
emModelNames A vector of character strings indicating the models to be fitted in the EM phase of clustering. Possible models:
"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
The default is .Mclust$emModelNames.
hcPairs A matrix of merge pairs for hierarchical clustering such as produced by function hc. The default is to compute a hierarchical clustering tree by applying function hc with modelName = .Mclust$hcModelName[1] to univariate data and modelName = .Mclust$hcModelName[2] to multivariate data or a subset as indicated by the subset argument. The hierarchical clustering results are used as starting values for EM.
subset A logical or numeric vector specifying the indices of a subset of the data to be used in the initial hierarchical clustering phase.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.
itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.
equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular=FALSE.
... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.
Value
Bayesian Information Criterion for the specified mixture models and numbers of clusters. Auxiliary information is returned as attributes.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust .
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust .
See Also
summary.EMclust, EMclustN, hc, me, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
irisBic
plot(irisBic)

irisBic <- EMclust(irisMatrix, subset = sample(1:nrow(irisMatrix), 100))
irisBic
plot(irisBic)
EMclustN BIC for Model-Based Clustering with Poisson Noise
Description
BIC for EM initialized by hierarchical clustering for parameterized Gaussian mixture models with Poisson noise.
Usage
EMclustN(data, G, emModelNames, noise, hcPairs, eps, tol, itmax,
         equalPro, warnSingular=FALSE, Vinv, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
G An integer vector specifying the numbers of MVN (Gaussian) mixture components (clusters) for which the BIC is to be calculated. The default is 0:9, where 0 indicates only a noise component.
emModelNames A vector of character strings indicating the models to be fitted in the EM phase of clustering. Possible models:
"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
The default is .Mclust$emModelNames.
noise A logical or numeric vector indicating whether or not observations are initially estimated to be noise in the data. If there is no noise, EMclust should be used rather than EMclustN.
hcPairs A matrix of merge pairs for hierarchical clustering such as produced by function hc. The default is to compute a hierarchical clustering tree by applying function hc with modelName = .Mclust$hcModelName[1] to univariate data and modelName = .Mclust$hcModelName[2] to multivariate data or a subset as indicated by the subset argument. The hierarchical clustering results are used as starting values for EM.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.
itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.
equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.
Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular=FALSE.
... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.
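To make the Vinv argument concrete, the following base R sketch computes one simple estimate of a reciprocal hypervolume: the inverse of the volume of the smallest axis-aligned box containing the data. This is an assumption for illustration only; the function name boxVinv is hypothetical, and the estimate produced by hypvol may be computed differently.

```r
# Assumed estimate (not necessarily what hypvol does): reciprocal volume of
# the axis-aligned bounding box of the data.
boxVinv <- function(data) {
  data <- as.matrix(data)                               # vectors become one column
  ranges <- apply(data, 2, function(x) diff(range(x)))  # side length per variable
  1 / prod(ranges)
}

# Three observations in two dimensions; the box has sides 2 and 3.
x <- cbind(c(0, 2, 1), c(0, 3, 1))
boxVinv(x)
```

A value obtained this way could be passed as Vinv when the default is not appropriate, for example when the noise is known to occupy a larger region than the observed data.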
Value
Bayesian Information Criterion for the specified mixture models and numbers of clusters. Auxiliary information is returned as attributes.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust .
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust .
See Also
summary.EMclustN, EMclust, hc, me, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

b <- apply( irisMatrix, 2, range)
n <- 450
set.seed(0)
poissonNoise <- apply(b, 2, function(x, n=n)
                      runif(n, min = x[1]-0.1, max = x[2]+.1), n = n)

set.seed(0)
noiseInit <- sample(c(TRUE,FALSE),size=150+450,replace=TRUE,prob=c(3,1))
Bic <- EMclustN(data=rbind(irisMatrix, poissonNoise), noise = noiseInit)
Bic
plot(Bic)
Mclust Model-Based Clustering
Description
Clustering via EM initialized by hierarchical clustering for parameterized Gaussian mixture models. The number of clusters and the clustering model are chosen to maximize the BIC.
Usage
Mclust(data, minG, maxG)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
minG An integer vector specifying the minimum number of mixture components (clusters) to be considered. The default is 1 component.
maxG An integer vector specifying the maximum number of mixture components (clusters) to be considered. The default is 9 components.
Value
A list representing the best model (according to BIC) for the given range of numbers of clusters.The following components are included:
BIC A matrix giving the BIC value for each model (rows) and number of clusters(columns).
bic A scalar giving the optimal BIC value.
modelName The MCLUST name for the best model according to BIC.
classification The classification corresponding to the optimal BIC value.
uncertainty The uncertainty in the classification corresponding to the optimal BIC value.
mu For multidimensional models, a matrix whose columns are the means of each group in the best model. For one-dimensional models, a vector whose entries are the means for each group in the best model.
sigma For multidimensional models, a three dimensional array in which sigma[,,k] gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro The mixing probabilities for each component in the best model.
z A matrix whose [i,k]th entry is the probability that observation i belongs to the kth component in the model. The optimal classification is derived from this, choosing the class to be the one giving the maximum probability.
loglik The log likelihood for the data under the best model.
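The relationship between z and the classification can be illustrated in base R. The sketch below uses a made-up matrix of membership probabilities. The rule for the classification follows the description of z; the uncertainty measure (one minus the largest membership probability) is an assumption about how the uncertainty component is defined, not something stated in this manual.

```r
# Hypothetical 4 x 2 matrix of membership probabilities (each row sums to 1).
z <- rbind(c(0.90, 0.10),
           c(0.20, 0.80),
           c(0.55, 0.45),
           c(0.01, 0.99))

# Classification: the component with the largest probability in each row.
classification <- apply(z, 1, which.max)

# Assumed uncertainty measure: one minus the largest probability in each row.
uncertainty <- 1 - apply(z, 1, max)

cbind(classification, uncertainty)
```

Observations whose largest probability is close to 1 are classified with little uncertainty, while rows such as the third one above sit near the boundary between two components.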
Details
The following models are compared in Mclust:
"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation
Mclust is intended to combine EMclust and its summary in a simplified one-step model-based clustering function. The latter provide more flexibility including choice of models.
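The selection step that Mclust performs can be pictured with a small base R sketch. The BIC values below are invented for illustration, and the layout (models in rows, numbers of clusters in columns) follows the description of the BIC component above; the variable names are hypothetical.

```r
# Hypothetical BIC table: models in rows, numbers of clusters in columns.
# NA marks a combination for which no estimate could be obtained.
BIC <- rbind(EII = c(-1800, -1200, -1100),
             VVI = c(-1750, -1150,  -900),
             VVV = c(-1700,  -950,    NA))
colnames(BIC) <- 1:3

# Locate the largest BIC value, ignoring missing entries.
best <- which(BIC == max(BIC, na.rm = TRUE), arr.ind = TRUE)
bestModel <- rownames(BIC)[best[1, "row"]]
bestG <- as.integer(colnames(BIC)[best[1, "col"]])

bestModel
bestG
```

If several entries tie for the maximum, this sketch simply takes the first one found; how the package itself breaks ties is not covered here.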
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust .
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust .
See Also
plot.Mclust, EMclust
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
irisMclust <- Mclust(irisMatrix)

## Not run: plot(irisMclust,irisMatrix)
bic BIC for Parameterized MVN Mixture Models
Description
Compute the BIC (Bayesian Information Criterion) for parameterized mixture models given the loglikelihood, the dimension of the data, and number of mixture components in the model.
Usage
bic(modelName, loglik, n, d, G, ...)
Arguments
modelName A character string indicating the model. Possible models:
"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
loglik The loglikelihood for a data set with respect to the MVN mixture model specified in the modelName argument.
n The number of observations in the data used to compute loglik.
d The dimension of the data used to compute loglik.
G The number of components in the MVN mixture model used to compute loglik.
... Arguments for diagonal-specific methods, in particular
equalPro A logical variable indicating whether or not the components in the model are assumed to be present in equal proportion. The default is .Mclust$equalPro.
noise A logical variable indicating whether or not the model includes an optional Poisson noise component. The default is to assume that the model does not include a noise component.
Value
The BIC or Bayesian Information Criterion for the given input arguments.
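For orientation, the calculation can be sketched in base R. The sketch assumes the sign convention BIC = 2 * loglik - npar * log(n), under which larger values are better (consistent with Mclust choosing the model that maximizes the BIC), and it assumes the usual parameter count for the "VVI" model. The function names bicSketch and nparVVI are hypothetical and are not part of MCLUST.

```r
# Assumed convention: BIC = 2 * loglik - npar * log(n); larger is better.
bicSketch <- function(loglik, n, npar) {
  2 * loglik - npar * log(n)
}

# Assumed number of free parameters in the "VVI" model:
#   G * d means, G * d diagonal variances, and G - 1 mixing proportions
#   (no proportion parameters when the proportions are constrained equal).
nparVVI <- function(d, G, equalPro = FALSE) {
  G * d + G * d + if (equalPro) 0 else G - 1
}

# Example with made-up inputs: 150 observations, 4 variables, 3 components.
nparVVI(d = 4, G = 3)
bicSketch(loglik = -300, n = 150, npar = nparVVI(d = 4, G = 3))
```

The penalty term grows with both the number of components and the flexibility of the covariance model, which is why less constrained models need a correspondingly larger loglikelihood to be preferred.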
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust .
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust .
See Also
bicE, ..., bicVVV, EMclust, estep, mclustOptions, do.call.
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

n <- nrow(irisMatrix)
d <- ncol(irisMatrix)
G <- 3

emEst <- me(modelName="VVI", data=irisMatrix, unmap(irisClass))
names(emEst)

args(bic)
bic(modelName="VVI",loglik=emEst$loglik,n=n,d=d,G=G)
## Not run: do.call("bic", emEst) ## alternative call
bicE BIC for a Parameterized MVN Mixture Model
Description
Compute the BIC (Bayesian Information Criterion) for a parameterized mixture model given the loglikelihood, the dimension of the data, and number of mixture components in the model.
Usage
bicE(loglik, n, G, equalPro, noise = FALSE, ...)
bicV(loglik, n, G, equalPro, noise = FALSE, ...)
bicEII(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVII(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEEI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVEI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEVI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVVI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEEE(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEEV(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVEV(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVVV(loglik, n, d, G, equalPro, noise = FALSE, ...)
Arguments
loglik The loglikelihood for a data set with respect to the MVN mixture model.
n The number of observations in the data used to compute loglik.
d The dimension of the data used to compute loglik.
G The number of components in the MVN mixture model used to compute loglik.
equalPro A logical variable indicating whether or not the components in the model are assumed to be present in equal proportion. The default is .Mclust$equalPro.
noise A logical variable indicating whether or not the model includes an optional Poisson noise component. The default is to assume that the model does not include a noise component.
... Catch unused arguments from a do.call call.
Value
The BIC or Bayesian Information Criterion for the MVN mixture model and data set corresponding to the input arguments.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust .
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust .
See Also
bic, EMclust, estepE, mclustOptions, do.call
Examples
## To run an example, see man page for bic
## Not run:
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

n <- nrow(irisMatrix)
d <- ncol(irisMatrix)
G <- 3

emEst <- meVVI(data=irisMatrix, unmap(irisClass))
names(emEst)

bicVVI(loglik=emEst$loglik, n=n, d=d, G=G)
do.call("bicVVI", emEst) ## alternative call
## End(Not run)
bicEMtrain Select models in discriminant analysis using BIC
Description
For the ten available discriminant models the BIC is calculated. The models for one-dimensional data are "E" and "V"; for higher dimensions they are "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "EEV", "VEV" and "VVV". This function is much faster than cv1EMtrain.
Usage
bicEMtrain(data, labels, modelNames)
Arguments
data A data matrix
labels Labels for each row in the data matrix
modelNames Vector of model names that should be tested.
Value
Returns a vector where each element is the BIC for the corresponding model.
Author(s)
C. Fraley
See Also
cv1EMtrain
Examples
data(lansing)
odd <- seq(from=1, to=nrow(lansing), by=2)
round(bicEMtrain(lansing[odd,-3], labels=lansing[odd, 3]), 1)
cdens Component Density for Parameterized MVN Mixture Models
Description
Computes component densities for observations in parameterized MVN mixture models.
Usage
cdens(modelName, data, mu, ...)
Arguments
modelName A character string indicating the model. Possible models:
"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
For fitting a single Gaussian:
"X": one-dimensional
"XII": spherical
"XXI": diagonal
"XXX": ellipsoidal
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
... Arguments for model-specific functions. Specifically:
• logarithm: A logical value indicating whether or not the logarithm of the component densities should be returned. The default is to return the component densities, obtained from the log component densities by exponentiation.
• An argument describing the variance (depends on the model):
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list with the following components:
d The dimension of the data.
G The number of components in the mixture model.
scale Either a G-vector giving the scale of the covariance (the dth root of its determinant) for each component in the mixture model, or a single numeric value if the scale is the same for each component.
shape Either a G by d matrix in which the kth column is the shape of the covariance matrix (normalized to have determinant 1) for the kth component, or a d-vector giving a common shape for all components.
orientation Either a d by d by G array whose [,,k]th entry is the orthonormal matrix of eigenvectors of the covariance matrix of the kth component, or a d by d orthonormal matrix if the mixture components have a common orientation. The orientation component of decomp can be omitted in spherical and diagonal models, for which the principal components are parallel to the coordinate axes so that the orientation matrix is the identity.
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
sigma for the unconstrained variance model "VVV". A d by d by G matrix array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
The form of the variance specification is the same as for the output for the em, me, or mstep methods for the specified mixture model.
• eps: A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for eps, in which case the second value is used for determining singularity in the M-step.
• warnSingular: A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
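The scale, shape, and orientation components of decomp can be related to a single covariance matrix through an eigendecomposition. The base R sketch below assumes the relation Sigma = scale * O diag(shape) t(O), which agrees with the definitions given above (scale is the dth root of the determinant, shape has determinant 1, and O holds the eigenvectors). The function names toDecompSketch and fromDecompSketch are hypothetical; the package's own conversion functions may order or store the components differently.

```r
# Split one covariance matrix into scale, shape, and orientation
# (assumed relation: Sigma = scale * O %*% diag(shape) %*% t(O)).
toDecompSketch <- function(Sigma) {
  d <- nrow(Sigma)
  e <- eigen(Sigma, symmetric = TRUE)
  scale <- prod(e$values)^(1 / d)    # dth root of the determinant
  shape <- e$values / scale          # normalized so that prod(shape) is 1
  list(d = d, scale = scale, shape = shape, orientation = e$vectors)
}

# Rebuild the covariance matrix from the three pieces.
fromDecompSketch <- function(dec) {
  O <- dec$orientation
  dec$scale * O %*% diag(dec$shape, dec$d) %*% t(O)
}

Sigma <- matrix(c(4, 1, 1, 3), 2, 2)
dec <- toDecompSketch(Sigma)
dec$scale
dec$shape
fromDecompSketch(dec)   # recovers Sigma up to rounding error
```

Under this reading, the model names indicate which of the three pieces are shared across components: for example, "VEV" lets scale and orientation vary while shape is common.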
Value
A numeric matrix whose [i,j]th entry is the density of observation i in component j. The densities are not scaled by mixing proportions.
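What "not scaled by mixing proportions" means can be shown with a small base R sketch for a one-dimensional model with unequal variances. All numbers and variable names below are made up for illustration, and the code uses dnorm directly rather than any MCLUST function.

```r
# Hypothetical one-dimensional mixture with two components.
x <- c(-1, 0, 4)        # observations
mu <- c(0, 4)           # component means
sigmasq <- c(1, 4)      # component variances
pro <- c(0.7, 0.3)      # mixing proportions

# Component densities: entry [i, j] is the density of x[i] in component j.
compDens <- sapply(seq_along(mu), function(j) {
  dnorm(x, mean = mu[j], sd = sqrt(sigmasq[j]))
})

# Scaling each column by its mixing proportion and normalizing each row
# gives conditional membership probabilities.
weighted <- sweep(compDens, 2, pro, "*")
z <- weighted / rowSums(weighted)

compDens
z
```

The unscaled matrix is the quantity described in this Value section; the scaled and normalized matrix corresponds to the kind of membership probabilities described for z elsewhere in this manual.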
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust .
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust .
See Also
cdensE, ..., cdensVVV, dens, EMclust, mstep, mclustDAtrain, mclustDAtest, mclustOptions, do.call
Examples
n <- 100 ## create artificial data
set.seed(0)x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])xclass <- c(rep(1,n),rep(2,n))clPairs(x, cl = xclass, sym = c("1","2")) ## display the data
set.seed(0)I <- sample(1:(2*n)) ## random ordering of the datax <- x[I, ]xclass <- xclass[I]
odd <- seq(1, 2*n, by = 2)oddBic <- EMclust(x[odd, ])oddSumry <- summary(oddBic, x[odd, ]) ## best parameter estimatesnames(oddSumry)
even <- odd + 1temp <- cdens(modelName = oddSumry$modelName, data = x[even, ],
mu = oddSumry$mu, decomp = oddSumry$decomp)cbind(class = xclass[even], temp)
## alternative call
## Not run:
temp <- do.call("cdens", c(list(data = x[even, ]), oddSumry))
cbind(class = xclass[even], temp)
## End(Not run)
cdensE Component Density for a Parameterized MVN Mixture Model
Description
Computes component densities for points in a parameterized MVN mixture model.
Usage
cdensE(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensV(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensEII(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensVII(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensEEI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVEI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensEVI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVVI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensEEE(data, mu, eps, warnSingular, logarithm = FALSE, ...)
cdensEEV(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVEV(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVVV(data, mu, eps, warnSingular, logarithm = FALSE, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in more detail in cdens.
logarithm A logical value indicating whether or not the logarithm of the component densities should be returned. The default is to return the component densities, obtained from the log component densities by exponentiation.
... An argument giving the variance that takes one of the following forms:
decomp for models "EII" and "VII"; see above.
cholSigma see Sigma, for "EEE".
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
cholsigma see sigma, for "VVV".
sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
The form of the variance specification is the same as for the output of the em, me, or mstep methods for the specified mixture model.
Also used to catch unused arguments from a do.call call.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
Value
A numeric matrix whose [i,j]th entry is the density of observation i in component j. The densities are not scaled by mixing proportions.
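As an illustration of the returned matrix, the following base-R sketch computes unscaled component densities for a spherical model with one variance per component. It assumes the standard spherical Gaussian density; the helper name cdens_spherical and the toy data are hypothetical and are not part of mclust.

```r
# Hypothetical helper (not part of mclust): unscaled component densities
# for a spherical Gaussian model, one column per component.
cdens_spherical <- function(data, mu, sigmasq, logarithm = FALSE) {
  data <- as.matrix(data)                  # n x d, rows are observations
  d <- ncol(data)
  if (is.null(dim(mu))) mu <- matrix(mu, nrow = d)  # d x G, columns are means
  G <- ncol(mu)
  sigmasq <- rep(sigmasq, length.out = G)  # a scalar means a common variance
  logdens <- sapply(seq_len(G), function(k) {
    centered <- sweep(data, 2, mu[, k])    # subtract the kth mean from each row
    -0.5 * d * log(2 * pi * sigmasq[k]) - rowSums(centered^2) / (2 * sigmasq[k])
  })
  logdens <- matrix(logdens, nrow = nrow(data), ncol = G)
  if (logarithm) logdens else exp(logdens)
}

x <- rbind(c(0, 0), c(3, 3), c(0, 1))      # three observations in two dimensions
mu <- cbind(c(0, 0), c(3, 3))              # two component means as columns
dens_xk <- cdens_spherical(x, mu, sigmasq = c(1, 4))
dens_xk                                    # entry [i, j]: observation i, component j
```

The densities are computed on the log scale first and exponentiated only on request, which mirrors the description of the logarithm argument.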
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
cdens, dens, EMclust, mstep, mclustOptions, do.call
Examples
n <- 100 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
clPairs(x, cl = xclass, sym = c("1","2")) ## display the data

modelVII <- meVII(x, z = unmap(xclass))
modelVVI <- meVVI(x, z = unmap(xclass))
modelVVV <- meVVV(x, z = unmap(xclass))

names(modelVII)
args(cdensVII)
cdenVII <- cdensVII(data = x, mu = modelVII$mu, pro = modelVII$pro,
                    decomp = modelVII$decomp)
names(modelVVI)
args(cdensVVI)
cdenVVI <- cdensVVI(data = x, mu = modelVVI$mu, pro = modelVVI$pro,
                    decomp = modelVVI$decomp)
names(modelVVV)
args(cdensVVV)
cdenVVV <- cdensVVV(data = x, mu = modelVVV$mu, pro = modelVVV$pro,
                    cholsigma = modelVVV$cholsigma)
cbind(class=xclass,VII=map(cdenVII),VVI=map(cdenVVI),VVV=map(cdenVVV))
## alternative call
## Not run:
cdenVII <- do.call("cdensVII", c(list(data = x), modelVII))
cdenVVI <- do.call("cdensVVI", c(list(data = x), modelVVI))
cdenVVV <- do.call("cdensVVV", c(list(data = x), modelVVV))

cbind(class=xclass,VII=map(cdenVII),VVI=map(cdenVVI),VVV=map(cdenVVV))
## End(Not run)
chevron Simulated minefield data
Description
A two-dimensional data set of simulated minefield data (1104 observations).
Usage
data(chevron)
References
C. Fraley and A.E. Raftery, Computer J., 41:578-588 (1998)
clPairs Pairwise Scatter Plots showing Classification
Description
Creates a scatter plot for each pair of variables in given data. Observations in different classes are represented by different symbols.
Usage
clPairs(data, classification, symbols, labels=dimnames(data)[[2]],
        CEX=1, col, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
classification A numeric or character vector representing a classification of observations (rows) of data.
symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in order of appearance in the sequence of observations (the order used by the function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet. If no classification argument is given the default symbol is ".".
labels A vector of character strings for labeling the variables. The default is to use the column dimension names of data.
CEX An argument specifying the size of the plotting symbols. The default value is 1.
col Color vector to use. Default is one color per class. Splus default: all black.
... Additional arguments to be passed to the graphics device.
Side Effects
Scatter plots for each combination of variables in data are created on the current graphics device. Observations of different classifications are labeled with different symbols.
References
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
pairs, coordProj, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
clPairs(irisMatrix, cl=irisClass, symbols=as.character(1:3))
classError Classification error.
Description
Error for a given classification relative to a known truth. Location of errors in a given classification relative to a known truth.
Usage
classError(classification, truth)
Arguments
classification A numeric or character vector of class labels.
truth A numeric or character vector of class labels. Must have the same length as classification.
Details
classErrors will only return one possibility if more than one mapping between classification and truth results in the minimum error.
Value
classError gives the fraction of elements misclassified for classification relative to truth. classErrors is a logical vector of the same length as classification and truth which gives the location of misclassified elements in classification relative to truth.
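The idea of a minimum error over mappings between the two label sets can be sketched in base R by brute force. This is only an illustration under stated assumptions: both label vectors use the same number of distinct classes, the mapping is one-to-one, and the helper names permutations and class_error_sketch are hypothetical rather than functions from the package.

```r
# Hypothetical helper: all permutations of a vector (fine for a few classes).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Minimum misclassification rate over one-to-one relabelings.
# Assumes both label vectors contain the same number of distinct classes.
class_error_sketch <- function(classification, truth) {
  stopifnot(length(classification) == length(truth))
  cl <- as.character(classification)
  tr <- as.character(truth)
  from <- unique(cl)
  to <- unique(tr)
  stopifnot(length(from) == length(to))
  best <- list(error = Inf, wrong = NULL)
  for (p in permutations(to)) {
    relabeled <- p[match(cl, from)]   # map each class label to a truth label
    wrong <- relabeled != tr
    # strict "<" keeps the first mapping found when several tie
    if (mean(wrong) < best$error) best <- list(error = mean(wrong), wrong = wrong)
  }
  best
}

a <- rep(1:3, 3)
b <- rep(c("A", "B", "C"), 3)
class_error_sketch(a, b)$error      # labels agree up to renaming

b2 <- b
b2[1] <- "B"                        # introduce a single disagreement
res <- class_error_sketch(a, b2)
res$error
which(res$wrong)
```

Because ties are resolved by keeping the first mapping encountered, the sketch also reports only one possibility when several mappings reach the minimum.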
See Also
compareClass, mapClass, table
Examples
a <- rep(1:3, 3)
a
b <- rep(c("A", "B", "C"), 3)
b
classError(a, b)
classErrors(a, b)

a <- sample(1:3, 9, replace = TRUE)
a
b <- sample(c("A", "B", "C"), 9, replace = TRUE)
b
classError(a, b)
compareClass Compare classifications.
Description
Compare classifications via the normalized variation of information criterion.
Usage
compareClass(a, b)
Arguments
a A numeric or character vector of class labels.
b A numeric or character vector of class labels. Must have the same length as a.
Value
The variation of information criterion (Meila 2002) for a and b divided by the log of the length of the sequences so that it falls in [0,1].
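A minimal base-R sketch of this quantity follows, assuming the usual definition of the variation of information as H(a|b) + H(b|a) computed from the joint label frequencies. The function name vi_normalized is hypothetical, and the package's own implementation may differ in detail.

```r
# Hypothetical helper: variation of information between two labelings,
# divided by log(n) so that the result lies in [0, 1].
vi_normalized <- function(a, b) {
  stopifnot(length(a) == length(b))
  n <- length(a)
  joint <- table(a, b) / n             # joint distribution of the two labelings
  pa <- rowSums(joint)                 # marginal distribution of a
  pb <- colSums(joint)                 # marginal distribution of b
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }
  # VI = H(a | b) + H(b | a) = 2 H(a, b) - H(a) - H(b)
  vi <- 2 * entropy(as.vector(joint)) - entropy(pa) - entropy(pb)
  max(vi, 0) / log(n)                  # guard against tiny negative round-off
}

a <- rep(1:3, 3)
b <- rep(c("A", "B", "C"), 3)
vi_normalized(a, b)                    # same partition under different names

vi_normalized(1:9, rep(1, 9))          # all singletons versus one single class
```

The two calls illustrate the ends of the range: identical partitions give the smallest value, and comparing all singletons with a single class gives the largest.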
References
Marina Meila (2002). Comparing clusterings. Technical Report no. 418, Department of Statistics, University of Washington.
See http://www.stat.washington.edu/www/research/reports.
See Also
mapClass, classError, table
Examples
a <- rep(1:3, 3)
a
b <- rep(c("A", "B", "C"), 3)
b
compareClass(a, b)
a <- sample(1:3, 9, replace = TRUE)
a
b <- sample(c("A", "B", "C"), 9, replace = TRUE)
b
compareClass(a, b)
coordProj Coordinate projections of data in more than two dimensions modelled by an MVN mixture.
Description
Plots coordinate projections given data in more than two dimensions and parameters of an MVN mixture model for the data.
Usage
coordProj(data, ..., dimens = c(1, 2),
          type = c("classification","uncertainty","errors"), ask = TRUE,
          quantiles = c(0.75, 0.95), symbols, scale = FALSE,
          identify = FALSE, CEX = 1, PCH = ".", xlim, ylim)
Arguments
data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
dimens A vector of length 2 giving the integer dimensions of the desired coordinate projections. The default is c(1,2), in which the first dimension is plotted against the second.
... One or more of the following:
classification A numeric or character vector representing a classification of observations (rows) of data.
uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.
z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren’t available.
truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.
mu A matrix whose columns are the means of each group.
sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.
decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.
type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.
ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.
quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75,0.95).
symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in sorted order. Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.
scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE
identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.
CEX An argument specifying the size of the plotting symbols. The default value is 1.
PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".
xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.
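The relationship between z, classification, uncertainty and the quantiles argument described above can be sketched in base R. The sketch assumes that uncertainty is one minus the largest membership probability and that symbol sizes are chosen by comparing each uncertainty with the two sample quantiles; the z values are made up for illustration.

```r
# z: n x G matrix of membership probabilities (made-up values for illustration)
z <- rbind(c(0.98, 0.01, 0.01),
           c(0.60, 0.30, 0.10),
           c(0.34, 0.33, 0.33),
           c(0.05, 0.90, 0.05),
           c(0.10, 0.15, 0.75),
           c(0.50, 0.45, 0.05))

classification <- max.col(z, ties.method = "first")  # most probable class per row
uncertainty <- 1 - apply(z, 1, max)                  # assumed definition of uncertainty

quantiles <- c(0.75, 0.95)
breaks <- quantile(uncertainty, probs = quantiles)
# 1 = small symbol (low uncertainty), 2 = medium, 3 = large (high uncertainty)
size_group <- findInterval(uncertainty, breaks) + 1

data.frame(classification, uncertainty, size_group)
```

With the default quantiles most points fall in the first group, so only the least certain observations are drawn with the medium and large symbols.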
Side Effects
Coordinate projections of the data, possibly showing location of the mixture components, classification, uncertainty, and/or classification errors.
References
C. Fraley and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
clPairs, randProj, mclust2Dplot, mclustOptions, do.call
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepVVV(irisMatrix, unmap(irisClass))

par(pty = "s", mfrow = c(1,2))
coordProj(irisMatrix,dimens=c(2,3), truth = irisClass,
          mu = msEst$mu, sigma = msEst$sigma, z = msEst$z)
do.call("coordProj", c(list(data=irisMatrix, dimens=c(2,3), truth=irisClass),
                       msEst))
cv1EMtrain Select discriminant models using cross validation
Description
For the ten available discriminant models the leave-one-out cross validation error is calculated. The models for one-dimensional data are "E" and "V"; for higher dimensions they are "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "EEV", "VEV" and "VVV".
Usage
cv1EMtrain(data, labels, modelNames)
Arguments
data A data matrix
labels Labels for each row in the data matrix
modelNames Vector of model names that should be tested.
Value
Returns a vector where each element is the error rate for the corresponding model.
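The leave-one-out idea can be illustrated with a small base-R sketch. It is not the package's implementation: it assumes one-dimensional data and a single Gaussian per class with its own mean and variance, and the function name loo_error and the toy data are hypothetical.

```r
# Hypothetical helper: leave-one-out error rate for a simple one-dimensional
# Gaussian discriminant rule (one normal density per class).
loo_error <- function(x, labels) {
  labels <- as.character(labels)
  classes <- unique(labels)
  wrong <- logical(length(x))
  for (i in seq_along(x)) {
    train_x <- x[-i]                  # refit without observation i
    train_lab <- labels[-i]
    score <- sapply(classes, function(cl) {
      xc <- train_x[train_lab == cl]
      # log(class proportion) + log N(x_i | class mean, class variance)
      log(length(xc) / length(train_x)) +
        dnorm(x[i], mean = mean(xc), sd = sd(xc), log = TRUE)
    })
    wrong[i] <- classes[which.max(score)] != labels[i]
  }
  mean(wrong)
}

x <- c(-2.2, -2.1, -2.0, -1.9, -1.8, 1.8, 1.9, 2.0, 2.1, 2.2)
labels <- rep(c("a", "b"), each = 5)
loo_error(x, labels)                  # two well separated classes

x2 <- x
x2[5] <- 2.0                          # move one "a" point into the "b" cluster
loo_error(x2, labels)
```

Each observation is classified by a model fitted to the remaining observations, so the returned rate is an estimate of the error on unseen data for that one model.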
Author(s)
C. Fraley
See Also
bicEMtrain
Examples
data(lansing)
odd <- seq(from=1, to=nrow(lansing), by=2)
round(cv1EMtrain(data=lansing[odd,-3], labels=lansing[odd,3]), 3)

cv1Modd <- mstepEEV(data=lansing[odd,-3], z=unmap(lansing[odd,3]))
cv1Zodd <- do.call("estepEEV", c(cv1Modd, list(data=lansing[odd,-3])))$z
compareClass(map(cv1Zodd), lansing[odd,3])
even <- (1:nrow(lansing))[-odd]
cv1Zeven <- do.call("estepEEV", c(cv1Modd, list(data=lansing[even,-3])))$z
compareClass(map(cv1Zeven), lansing[even,3])
decomp2sigma Convert mixture component covariances to matrix form.
Description
Converts a set of covariances from a parameterization by eigenvalue decomposition to representation as a 3-D array.
Usage
decomp2sigma(d, G, scale, shape, orientation, ...)
Arguments
d The dimension of the data.
G The number of components in the mixture model.
scale Either a G-vector giving the scale of the covariance (the dth root of its determinant) for each component in the mixture model, or a single numeric value if the scale is the same for each component.
shape Either a d by G matrix in which the kth column is the shape of the covariance matrix (normalized to have determinant 1) for the kth component, or a d-vector giving a common shape for all components.
orientation Either a d by d by G array whose [,,k]th entry is the orthonormal matrix of eigenvectors of the covariance matrix of the kth component, or a d by d orthonormal matrix if the mixture components have a common orientation. The orientation component of decomp can be omitted in spherical and diagonal models, for which the principal components are parallel to the coordinate axes so that the orientation matrix is the identity.
... Catch unused arguments from a do.call call.
Value
A 3-D array whose [,,k]th component is the covariance matrix of the kth component in an MVN mixture model.
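The conversion can be sketched in base R as scale times orientation times diag(shape) times the transposed orientation. This is a sketch under stated assumptions, not the package code: the columns of each orientation matrix are taken to be eigenvectors, shape is stored with one column per component, and the function name decomp_to_sigma is hypothetical.

```r
# Hypothetical helper: rebuild covariance matrices from scale, shape and
# orientation. Assumes eigenvectors are the columns of each orientation matrix.
decomp_to_sigma <- function(d, G, scale, shape, orientation = diag(d)) {
  scale <- rep(scale, length.out = G)          # common or per-component scale
  shape <- matrix(shape, nrow = d, ncol = G)   # a common shape is recycled
  sigma <- array(0, dim = c(d, d, G))
  for (k in seq_len(G)) {
    O <- if (length(dim(orientation)) == 3) orientation[, , k] else orientation
    # Sigma_k = scale_k * O_k diag(shape_k) t(O_k)
    sigma[, , k] <- scale[k] * O %*% diag(shape[, k], nrow = d) %*% t(O)
  }
  sigma
}

theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
sig <- decomp_to_sigma(d = 2, G = 2, scale = c(1, 4),
                       shape = c(2, 0.5), orientation = rot)
sig[, , 1]
det(sig[, , 2])^(1 / 2)   # the dth root of the determinant recovers the scale
```

Because the shape vector has product 1, the determinant of each covariance equals the scale raised to the power d, which is why the last line returns the scale of the second component.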
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation, and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
sigma2decomp
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

meEst <- meVEV(irisMatrix, unmap(irisClass))
names(meEst)
meEst$decomp
meEst$sigma

dec <- meEst$decomp
decomp2sigma(d=dec$d, G=dec$G, shape=dec$shape, scale=dec$scale,
             orientation = dec$orientation)
## Not run:
do.call("decomp2sigma", meEst$decomp) ## alternative call
## End(Not run)
dens Density for Parameterized MVN Mixtures
Description
Computes densities of observations in parameterized MVN mixtures.
Usage
dens(modelName, data, mu, logarithm, ...)
Arguments
modelName A character string indicating the model. Possible models:
"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
For fitting a single Gaussian,
"X": one-dimensional
"XII": spherical
"XXI": diagonal
"XXX": ellipsoidal
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
logarithm Return logarithm of the density, rather than the density itself. Default: FALSE
... Other arguments, such as an argument describing the variance. See cdens.
Value
A numeric vector whose ith component is the density of observation i in the MVN mixture specified by mu and ... .
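The link between unscaled component densities and the mixture density can be sketched in base R for a one-dimensional mixture. The parameter values are made up, and the log-sum-exp step is a common numerical device assumed here for illustration; the source does not say how the package evaluates the logarithm.

```r
# Two-component, one-dimensional mixture with made-up parameters
x <- c(-1, 0, 2.5)
mu <- c(0, 3)
sigmasq <- c(1, 0.25)
pro <- c(0.7, 0.3)

# Unscaled component densities: one column per component
cden <- sapply(seq_along(mu), function(k) dnorm(x, mu[k], sqrt(sigmasq[k])))

mix <- as.vector(cden %*% pro)        # mixture density of each observation
z <- sweep(cden, 2, pro, "*") / mix   # conditional membership probabilities

# Log scale via log-sum-exp, which avoids underflow far from every mean
lcden <- sapply(seq_along(mu), function(k)
  dnorm(x, mu[k], sqrt(sigmasq[k]), log = TRUE))
lw <- sweep(lcden, 2, log(pro), "+")  # log(pro_k) + log component density
m <- apply(lw, 1, max)
logmix <- m + log(rowSums(exp(lw - m)))

cbind(x, mix, logmix)
```

The mixture density weights each column by its mixing proportion, whereas the component densities on their own carry no such weighting.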
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
grid1, cdens, mclustOptions, do.call
Examples
n <- 100 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
clPairs(x, cl = xclass, sym = c("1","2")) ## display the data

set.seed(0)
I <- sample(1:(2*n))
x <- x[I, ]
xclass <- xclass[I]

odd <- seq(1, 2*n, by = 2)
oddBic <- EMclust(x[odd, ])
oddSumry <- summary(oddBic, x[odd, ]) ## best parameter estimates
names(oddSumry)

oddDens <- dens(modelName = oddSumry$modelName, data = x,
                mu = oddSumry$mu, decomp = oddSumry$decomp, pro = oddSumry$pro)

## Not run:
oddDens <- do.call("dens", c(list(data = x), oddSumry)) ## alternative call
## End(Not run)
even <- odd + 1
evenBic <- EMclust(x[even, ])
evenSumry <- summary(evenBic, x[even, ]) ## best parameter estimates
evenDens <- do.call("dens", c(list(data = x), evenSumry))
cbind(class = xclass, odd = oddDens, even = evenDens)
density Kernel Density Estimation
Description
This is exactly the same function as in the base package except for the method argument: if it is given and equals "mclust", the mclust density estimation is used. Optionally, the number of Gaussians to be considered can be given as well (G).
Usage
density(..., method, G)
Arguments
... Arguments to the density function in the base package.
method If equal to "mclust", EMclust is used to estimate the density.
G The number of Gaussians to consider in the model-based density estimation. Default: 1:9. Ignored if method is not equal to "mclust".
Value
If give.Rkern is true, the number R(K), otherwise an object with class "density" whose underlying structure is a list containing the following components.
x the n coordinates of the points where the density is estimated.
y the estimated density values.
bw the bandwidth used.
N the sample size after elimination of missing values.
call the call which produced the result.
data.name the deparsed name of the x argument.
has.na logical, for compatibility (always FALSE).
References
Fraley, C. and Raftery, A.E. (2002) MCLUST: software for model-based clustering, density estimation and discriminant analysis. Technical Report No. 415, Dept. of Statistics, University of Washington.
Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Sheather, S. J. and Jones M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 683–690.
Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.
Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. New York: Springer.
See Also
density (base package), bw.nrd, plot.density, hist.
Examples
plot(density(c(-20,rep(0,98),20)), xlim = c(-4,4)) # IQR = 0

# The Old Faithful geyser data
data(faithful)
d <- density(faithful$eruptions, bw = "sj")
d
plot(d)
dmc <- density(faithful$eruptions, method="mclust")
plot(dmc, type = "n")
polygon(dmc, col = "wheat")
lines(d, col="red")

## Missing values:
x <- xx <- faithful$eruptions
x[i.out <- sample(length(x), 10)] <- NA
doRmc <- density(x=x, method="mclust", na.rm = TRUE)
lines(doRmc, col="blue")
doR <- density(x, bw = 0.15, na.rm = TRUE)
lines(doR, col = "green")
rug(x)
points(xx[i.out], rep(0.01, 10))

## function formals returns something different now the original
## density function is masked...
base.density <- if(exists("density", envir = NULL)) {
  get("density", envir = NULL)
} else
  stats::density
(kernels <- eval(formals(base.density)$kernel))

## show the kernels in the R parametrization
plot (density(0, bw = 1), xlab = "",
      main="R's density() kernels with bw = 1")
for(i in 2:length(kernels))
  lines(density(0, bw = 1, kern = kernels[i]), col = i)
legend(1.5,.4, legend = kernels, col = seq(kernels),
       lty = 1, cex = .8, y.int = 1)

data(precip)
bw <- bw.SJ(precip) ## sensible automatic choice
plot(density(precip, bw = bw, n = 2^13))
lines(density(precip, G=2:5, method="mclust"), col="red")
rug(precip)
diabetes Diabetes data
Description
Diabetes data from Reaven and Miller. Number of objects: 145; 3 variables. Three classes.
Usage
data(diabetes)
References
G.M. Reaven and R.G. Miller, Diabetologia 16:17-24 (1979).
em EM algorithm starting with E-step for parameterized MVN mixture models.
Description
Implements the EM algorithm for parameterized MVN mixture models, starting with the expectation step.
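To make the iteration concrete, here is a minimal base-R sketch of EM that starts with the E-step, for a one-dimensional mixture with a separate variance per component. It illustrates the general algorithm rather than the package's code: the function name em_1d_sketch, the stopping rule and the simulated data are assumptions, and there is no handling of singular variances or noise.

```r
# Hypothetical helper: EM for a one-dimensional Gaussian mixture,
# starting from parameter values with an E-step.
em_1d_sketch <- function(data, mu, sigmasq, pro, tol = 1e-8, itmax = 200) {
  G <- length(mu)
  loglik <- -Inf
  for (iter in seq_len(itmax)) {
    # E-step: conditional membership probabilities z[i, k]
    wdens <- sapply(seq_len(G), function(k)
      pro[k] * dnorm(data, mu[k], sqrt(sigmasq[k])))
    mix <- rowSums(wdens)
    z <- wdens / mix
    new_loglik <- sum(log(mix))
    # Stop on relative convergence of the loglikelihood (assumed rule)
    converged <- is.finite(loglik) &&
      abs(new_loglik - loglik) <= tol * (1 + abs(new_loglik))
    loglik <- new_loglik
    if (converged) break
    # M-step: weighted proportions, means and variances
    nk <- colSums(z)
    pro <- nk / length(data)
    mu <- colSums(z * data) / nk
    sigmasq <- colSums(z * outer(data, mu, "-")^2) / nk
  }
  list(z = z, loglik = loglik, mu = mu, sigmasq = sigmasq, pro = pro,
       iterations = iter)
}

set.seed(1)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 6, sd = 1))
fit <- em_1d_sketch(x, mu = c(1, 4), sigmasq = c(1, 1), pro = c(0.5, 0.5))
round(fit$mu, 2)
round(fit$pro, 2)
```

Starting with the E-step means the caller supplies initial parameters (mu, variance, pro); starting with the M-step would instead require initial membership probabilities.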
Usage
em(modelName, data, mu, ...)
Arguments
modelName A character string indicating the model:
"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
... Arguments for model-specific em functions. Specifically:
• An argument describing the variance (depends on the model):
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). For a description, see cdens.
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
The form of the variance specification is the same as for the output of the em, me, or mstep methods for the specified mixture model.
• pro: Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.
• eps: A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
For those models with iterative M-step ("VEI", "VEV"), two values can be entered for eps, in which case the second value is used for determining singularity in the M-step.
• tol: A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.
For those models with iterative M-step ("VEI", "VEV"), two values can be entered for tol, in which case the second value governs parameter convergence in the M-step.
• itmax: An integer limit on the number of EM iterations. The default is .Mclust$itmax.
For those models with iterative M-step ("VEI", "VEV"), two values can be entered for itmax, in which case the second value is an upper limit on the number of iterations in the M-step.
• equalPro: Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.
• warnSingular: A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
• Vinv: An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.
Details
This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.
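The mechanism behind such a list call can be shown with a small base-R sketch. The function total_loglik and the list fit are hypothetical stand-ins for a model-specific routine and the output of an earlier fitting step; the point is only that named list elements are matched to formal arguments and that ... absorbs the elements the function does not use.

```r
# Hypothetical function standing in for a model-specific routine:
# it uses 'data', 'mu' and 'sigmasq', and lets '...' absorb anything else.
total_loglik <- function(data, mu, sigmasq, ...) {
  sum(dnorm(data, mean = mu, sd = sqrt(sigmasq), log = TRUE))
}

# A list shaped like the output of an earlier fitting step (made-up values);
# 'modelName' and 'pro' are not formal arguments of total_loglik.
fit <- list(modelName = "E", mu = 0.5, sigmasq = 2, pro = 1)
x <- c(-1, 0, 1, 2)

# Direct call: every parameter is spelled out
direct <- total_loglik(data = x, mu = fit$mu, sigmasq = fit$sigmasq)

# Indirect call: the whole list is passed and unused elements go to '...'
indirect <- do.call("total_loglik", c(list(data = x), fit))

identical(direct, indirect)
```

Without ... in the function signature the indirect call would stop with an unused argument error, which is why the model-specific functions accept it.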
Value
A list including the following components:
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
loglik The loglikelihood for the data in the mixture model.
mu A matrix whose kth column is the mean of the kth component of the mixture model.
sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model.
For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.
modelName A character string identifying the model (same as the input argument).
Attributes: • "info": Information on the iteration.
• "warn": An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
emE, ..., emVVV, estep, me, mstep, mclustOptions, do.call
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstep(modelName = "EEE", data = irisMatrix,
               z = unmap(irisClass))
names(msEst)

em(modelName = msEst$modelName, data = irisMatrix,
   mu = msEst$mu, Sigma = msEst$Sigma, pro = msEst$pro)
## Not run:
do.call("em", c(list(data = irisMatrix), msEst)) ## alternative call
## End(Not run)
emE EM algorithm starting with E-step for a parameterized MVN mixture model.
Description
Implements the EM algorithm for a parameterized MVN mixture model, starting with the expectation step.
Usage
emE(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
    Vinv, ...)
emV(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
    Vinv, ...)
emEII(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVII(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVEI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEVI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVVI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEE(data, mu, Sigma, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEV(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVEV(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVVV(data, mu, sigma, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in more detail in cdens.
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
... An argument giving the variance that takes one of the following forms:
decomp for models "VVV", "EII" and "VII"; see cdens.
cholSigma see Sigma, for "EEE".
cholsigma see sigma, for "VVV".
sigma see sigma, for "VVV".
Sigma see Sigma, for "EEE".
The form of the variance specification is the same as for the output of the em, me, or mstep methods for the specified mixture model.
Also used to catch unused arguments from a do.call call.
pro Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
tol A scalar tolerance for relative convergence of the loglikelihood values. The default is .Mclust$tol.
itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.
equalPro A logical value indicating whether or not the components in the model are present in equal proportions. The default is .Mclust$equalPro.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.
Details
This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.
Value
A list including the following components:
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
loglik The loglikelihood for the data in the mixture model.
mu A matrix whose kth column is the mean of the kth component of the mixture model.
sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.
modelName Character string identifying the model.
Attributes:
• "info": Information on the iteration.
• "warn": An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
em, mstep, mclustOptions, do.call
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
msEst <- mstepEEE(data = irisMatrix, z = unmap(irisClass))
names(msEst)
emEEE(data = irisMatrix, mu = msEst$mu, pro = msEst$pro,
      cholSigma = msEst$cholSigma)
## Not run:
do.call("emEEE", c(list(data=irisMatrix), msEst)) ## alternative call
## End(Not run)
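For readers who want to see what an EM run of this kind computes, the following is a minimal base-R sketch for the one-dimensional equal-variance model ("E"). It is not the package's implementation: the function name em_E_sketch, the exact form of the relative convergence test, and the simulated data are assumptions made for illustration only. The returned list mirrors the component names described under Value.

```r
# Hypothetical helper (not part of mclust): EM for a 1-D Gaussian mixture
# with a common variance, alternating E-steps and M-steps until the
# loglikelihood stops changing (relative tolerance tol) or itmax is reached.
em_E_sketch <- function(y, mu, sigmasq, pro, tol = 1e-5, itmax = 200) {
  n <- length(y)
  G <- length(mu)
  loglik_old <- -Inf
  for (iter in seq_len(itmax)) {
    # E-step: weighted component densities, one column per component
    dens <- sapply(seq_len(G),
                   function(k) pro[k] * dnorm(y, mu[k], sqrt(sigmasq)))
    dens <- matrix(dens, nrow = n)
    rowtot <- rowSums(dens)
    z <- dens / rowtot                 # conditional probabilities
    loglik <- sum(log(rowtot))
    # assumed form of the relative convergence test
    if (abs(loglik - loglik_old) <= tol * (1 + abs(loglik))) break
    loglik_old <- loglik
    # M-step: update proportions, means, and the common variance
    nk <- colSums(z)
    pro <- nk / n
    mu <- colSums(z * y) / nk
    sigmasq <- sum(z * outer(y, mu, "-")^2) / n
  }
  list(z = z, loglik = loglik, mu = mu, sigma = sigmasq, pro = pro,
       iterations = iter)
}

set.seed(1)
y <- c(rnorm(100, -3), rnorm(100, 3))   # simulated data, two groups
fit <- em_E_sketch(y, mu = c(-1, 1), sigmasq = 1, pro = c(0.5, 0.5))
fit$mu
```

The starting values passed to the sketch play the role that the output of an M-step plays in the package example above.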
estep E-step for parameterized MVN mixture models.
Description
Implements the expectation step of the EM algorithm for parameterized MVN mixture models.
Usage
estep(modelName, data, mu, ...)
Arguments
modelName A character string indicating the model:
"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
... Arguments for model-specific functions. Specifically:
• An argument describing the variance (depends on the model):
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in cdens.
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
The form of the variance specification is the same as for the output of the em, me, or mstep methods for the specified mixture model.
pro Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.
Details
This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.
Value
A list including the following components:
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
loglik The loglikelihood for the data in the mixture model.
modelName A character string identifying the model (same as the input argument).
Attribute:
• "warn": An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
estepE, ..., estepVVV, em, mstep, do.call, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
msEst <- mstep(modelName = "EII", data = irisMatrix,
               z = unmap(irisClass))
names(msEst)
estep(modelName = msEst$modelName, data = irisMatrix,
      mu = msEst$mu, sigmasq = msEst$sigmasq, pro = msEst$pro)
## Not run:
do.call("estep", c(list(data = irisMatrix), msEst)) ## alternative call
## End(Not run)
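As an illustration of what the E-step returns, here is a base-R sketch of the computation for the spherical, equal-volume model ("EII"). It is a sketch under stated assumptions, not the package code: the name estep_EII_sketch, the fixed value sigmasq = 0.25, and the use of class means as mu are all chosen for illustration, and no noise term is handled.

```r
# Hypothetical helper (not part of mclust): E-step for spherical components
# that share one variance sigmasq. Returns z and the loglikelihood.
estep_EII_sketch <- function(data, mu, sigmasq, pro) {
  data <- as.matrix(data)
  n <- nrow(data)
  d <- ncol(data)
  G <- ncol(mu)
  logdens <- matrix(0, n, G)
  for (k in seq_len(G)) {
    # squared Euclidean distance of every observation to the kth mean
    sqdist <- rowSums(sweep(data, 2, mu[, k])^2)
    logdens[, k] <- log(pro[k]) - 0.5 * d * log(2 * pi * sigmasq) -
      sqdist / (2 * sigmasq)
  }
  # log-sum-exp across components for numerical stability
  m <- apply(logdens, 1, max)
  logtot <- m + log(rowSums(exp(logdens - m)))
  list(z = exp(logdens - logtot), loglik = sum(logtot))
}

irisMatrix <- as.matrix(iris[, 1:4])
# columns of mu are the class means (illustrative starting parameters)
mu <- sapply(split(as.data.frame(irisMatrix), iris$Species), colMeans)
res <- estep_EII_sketch(irisMatrix, mu, sigmasq = 0.25, pro = rep(1/3, 3))
dim(res$z)
```

Each row of res$z sums to 1, matching the description of z as a matrix of conditional probabilities.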
estepE E-step in the EM algorithm for a parameterized MVN mixture model.
Description
Implements the expectation step in the EM algorithm for a parameterized MVN mixture model.
Usage
estepE(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepV(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepEII(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepVII(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepEEI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVEI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepEVI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVVI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepEEE(data, mu, Sigma, pro, eps, warnSingular, Vinv, ...)
estepEEV(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVEV(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVVV(data, mu, sigma, pro, eps, warnSingular, Vinv, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in more detail in cdens.
sigma for the unconstrained variance model "VVV" or the equal variance model "EEE". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
pro Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.
... Other arguments to describe the variance, in particular decomp, sigma or cholsigma for model "VVV", decomp for models "VII" and "EII", and Sigma or cholSigma for model "EEE". Sigma is a d by d matrix giving the common covariance for all components of the mixture model. Also used to catch unused arguments from a do.call call.
Details
This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.
Value
A list including the following components:
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
loglik The loglikelihood for the data in the mixture model.
modelName Character string identifying the model.
Attribute:
• "warn": An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
estep, em, mstep, do.call, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
msEst <- mstepEII(data = irisMatrix, z = unmap(irisClass))
names(msEst)
estepEII(data = irisMatrix, mu = msEst$mu, pro = msEst$pro,
         sigmasq = msEst$sigmasq)
## Not run:
do.call("estepEII", c(list(data=irisMatrix), msEst)) ## alternative call
## End(Not run)
grid1 Generate grid points
Description
Generate grid points in one or two dimensions.
Usage
grid1(n, range = c(0, 1), edge = TRUE)
grid2(x, y)
Arguments
n Number of grid points.
range Range of grid points.
edge Logical: include edges or not?
x, y Vectors.
Value
grid1 generates a vector; grid2 generates a matrix.
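A rough base-R analogue of the two functions is sketched below to show the shapes involved. The names grid1_sketch and grid2_sketch are hypothetical stand-ins; the handling of the edge argument and the ordering of rows in the package's grid2 are not documented here, so neither is reproduced.

```r
# Hypothetical stand-ins (not the package functions).
# n equally spaced points spanning the range, end points included
grid1_sketch <- function(n, range = c(0, 1)) {
  seq(range[1], range[2], length.out = n)
}
# all (x, y) combinations as a two-column matrix, one grid point per row
grid2_sketch <- function(x, y) {
  as.matrix(expand.grid(x = x, y = y))
}

gx <- grid1_sketch(5)
g <- grid2_sketch(gx, gx)
dim(g)
```

A matrix of this form can be passed as data to a density function and the result reshaped with matrix(..., ncol = length(gx)) for contour or image plots.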
Author(s)
C. Fraley
See Also
lansing, dens
Examples
data(lansing)
maples <- lansing[as.character(lansing[,"species"]) == "maple", -3]
maplesBIC <- EMclust(maples)
maplesModel <- summary(maplesBIC, maples)
x <- grid1(100, range=c(0,1))
y <- x
xyDens <- do.call("dens", c(list(data=grid2(x, y)), maplesModel))
xyDens <- matrix(xyDens, ncol=100)
contour(xyDens)
points(maples, cex=.2, col="red")
image(xyDens)
points(maples, cex=.5)
hc Model-based Hierarchical Clustering
Description
Agglomerative hierarchical clustering based on maximum likelihood criteria for MVN mixture models parameterized by eigenvalue decomposition.
Usage
hc(modelName, data, ...)
Arguments
modelName A character string indicating the model. Possible models:
"E": equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
... Arguments for the method-specific hc functions. See hcE.
Details
Most models have memory usage of the order of the square of the number of groups in the initial partition for fast execution. Some models, such as equal variance or "EEE", do not admit a fast algorithm under the usual agglomerative hierarchical clustering paradigm. These use less memory but are much slower to execute.
Value
A numeric two-column matrix in which the ith row gives the minimum index for observations in each of the two clusters merged at the ith stage of agglomerative hierarchical clustering.
References
J. D. Banfield and A. E. Raftery (1993). Model-based Gaussian and non-Gaussian Clustering. Biometrics 49:803-821.
C. Fraley (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
Note
If modelName = "E" (univariate with equal variances) or modelName = "EII" (multivariate with equal spherical covariances), then the method is equivalent to Ward's method for hierarchical clustering.
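For comparison, Ward's method is also available in base R through hclust. The sketch below only shows that base-R route; it does not call hc, and whether the resulting partitions coincide exactly with hc output for a given data set (for example in the presence of ties) has not been checked here.

```r
# Ward's minimum-variance clustering with base R (stats package).
# method = "ward.D2" applies Ward's criterion to Euclidean distances.
irisMatrix <- as.matrix(iris[, 1:4])
wardTree <- hclust(dist(irisMatrix), method = "ward.D2")
wardCl <- cutree(wardTree, k = 3)   # classification into 3 groups
table(wardCl)
```

cutree plays a role similar to hclass: it converts the merge history into a classification for a chosen number of groups.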
See Also
hcE, ..., hcVVV, hclass
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
hcTree <- hc(modelName = "VVV", data = irisMatrix)
cl <- hclass(hcTree,c(2,3))
par(pty = "s", mfrow = c(1,1))
clPairs(irisMatrix,cl=cl[,"2"])
clPairs(irisMatrix,cl=cl[,"3"])
par(mfrow = c(1,2))
dimens <- c(1,2)
coordProj(irisMatrix, classification=cl[,"2"], dimens=dimens)
coordProj(irisMatrix, classification=cl[,"3"], dimens=dimens)
hcE Model-based Hierarchical Clustering
Description
Agglomerative hierarchical clustering based on maximum likelihood for a MVN mixture model parameterized by eigenvalue decomposition.
Usage
hcE(data, partition, minclus = 1, ...)
hcV(data, partition, minclus = 1, alpha = 1, ...)
hcEII(data, partition, minclus = 1, ...)
hcVII(data, partition, minclus = 1, alpha = 1, ...)
hcEEE(data, partition, minclus = 1, ...)
hcVVV(data, partition, minclus = 1, alpha = 1, beta = 1, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
partition A numeric or character vector representing a partition of observations (rows) of data. If provided, group merges will start with this partition. Otherwise, each observation is assumed to be in a cluster by itself at the start of agglomeration.
minclus A number indicating the number of clusters at which to stop the agglomeration. The default is to stop when all observations have been merged into a single cluster.
alpha, beta Additional tuning parameters needed for initialization in some models. For details, see Fraley (1998). The defaults provided are usually adequate.
... Catch unused arguments from a do.call call.
Details
Most models have memory usage of the order of the square of the number of groups in the initial partition for fast execution. Some models, such as equal variance or "EEE", do not admit a fast algorithm under the usual agglomerative hierarchical clustering paradigm. These use less memory but are much slower to execute.
Value
A numeric two-column matrix in which the ith row gives the minimum index for observations in each of the two clusters merged at the ith stage of agglomerative hierarchical clustering.
References
J. D. Banfield and A. E. Raftery (1993). Model-based Gaussian and non-Gaussian Clustering. Biometrics 49:803-821.
C. Fraley (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
hc, hclass
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
hcTree <- hcEII(data = irisMatrix)
cl <- hclass(hcTree,c(2,3))
par(pty = "s", mfrow = c(1,1))
clPairs(irisMatrix,cl=cl[,"2"])
clPairs(irisMatrix,cl=cl[,"3"])
par(mfrow = c(1,2))
dimens <- c(1,2)
coordProj(irisMatrix, classification=cl[,"2"], dimens=dimens)
coordProj(irisMatrix, classification=cl[,"3"], dimens=dimens)
hclass Classifications from Hierarchical Agglomeration
Description
Determines the classifications corresponding to different numbers of groups given merge pairs from hierarchical agglomeration.
Usage
hclass(hcPairs, G)
Arguments
hcPairs A numeric two-column matrix in which the ith row gives the minimum index for observations in each of the two clusters merged at the ith stage of agglomerative hierarchical clustering.
G An integer or vector of integers giving the number of clusters for which the corresponding classifications are wanted.
Value
A matrix with length(G) columns, each column corresponding to a classification. Columns are indexed by the character representation of the integers in G.
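The conversion from merge pairs to classifications can be sketched in base R as follows. This is an illustration under assumptions, not the package code: hclass_sketch is a hypothetical name, the merge matrix is assumed to start from singleton clusters and run down to a single cluster, and each cluster is labelled by its minimum observation index (the package may label classes differently).

```r
# Hypothetical helper (not part of mclust): replay the first n - g merges
# to recover the classification into g groups, for every g in G.
hclass_sketch <- function(hcPairs, G) {
  n <- nrow(hcPairs) + 1              # assumes merging ran to one cluster
  out <- sapply(G, function(g) {
    lab <- seq_len(n)                 # every observation starts alone
    for (s in seq_len(n - g)) {
      keep <- min(hcPairs[s, ])
      drop <- max(hcPairs[s, ])
      lab[lab == drop] <- keep        # merged cluster keeps the smaller index
    }
    lab
  })
  out <- matrix(out, nrow = n)
  colnames(out) <- as.character(G)    # columns indexed by "2", "3", ...
  out
}

# four observations: merge (1,2), then (3,4), then the two pairs
mergePairs <- rbind(c(1, 2), c(3, 4), c(1, 3))
cl <- hclass_sketch(mergePairs, c(2, 3))
cl[, "2"]
```

As in the package examples, a single classification is then extracted by column name, e.g. cl[, "3"].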
References
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
hc, hcE
Examples
data(iris)
irisMatrix <- iris[,1:4]
hcTree <- hc(modelName="VVV", data = irisMatrix)
cl <- hclass(hcTree,c(2,3))
par(pty = "s", mfrow = c(1,1))
clPairs(irisMatrix,cl=cl[,"2"])
clPairs(irisMatrix,cl=cl[,"3"])
hypvol Approximate Hypervolume for Multivariate Data
Description
Computes a simple approximation to the hypervolume of a multivariate data set.
Usage
hypvol(data, reciprocal=FALSE)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
reciprocal A logical variable indicating whether or not the reciprocal hypervolume is desired rather than the hypervolume itself. The default is to return the approximate hypervolume.
Value
Computes the hypervolume by two methods: simple variable bounds and principal components, and returns the minimum value.
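One plausible reading of the two methods is sketched below in base R: the volume of the bounding box in the original variables, and the volume of the bounding box in principal-component coordinates. The name hypvol_sketch and the details (for example, centering without scaling in prcomp) are assumptions; the package may compute the bounds differently.

```r
# Hypothetical helper (not part of mclust): minimum of two bounding-box
# volumes, optionally returned as a reciprocal.
hypvol_sketch <- function(data, reciprocal = FALSE) {
  data <- as.matrix(data)
  side <- function(v) diff(range(v))          # length of one box side
  # method 1: simple variable bounds
  vol_box <- prod(apply(data, 2, side))
  # method 2: bounds along the principal components
  scores <- prcomp(data)$x
  vol_pc <- prod(apply(scores, 2, side))
  vol <- min(vol_box, vol_pc)
  if (reciprocal) 1 / vol else vol
}

v <- hypvol_sketch(iris[, 1:4])
vinv <- hypvol_sketch(iris[, 1:4], reciprocal = TRUE)
```

The reciprocal form corresponds to the Vinv argument used by the estep and em functions when a noise component is present.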
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
hypvol(irisMatrix)
lansing Maple trees in Lansing Woods
Description
The lansing data frame has 1217 rows and 3 columns. The first two columns give the location, the third column the tree type.
Usage
data(lansing)
Format
This data frame contains the following columns:
x a numeric vector
y a numeric vector
species a factor with levels hickory and maple
Source
D.J. Gerrard, Research Bulletin No. 20, Agricultural Experimental Station, Michigan State University, 1969.
See Also
grid1, dens
Examples
data(lansing)
plot(lansing[,1:2], pch=as.integer(lansing[,3]),
     col=as.integer(lansing[,3]), main="Lansing Woods tree types")
map Classification given Probabilities
Description
Converts a matrix in which each row sums to 1 into the nearest matrix of (0,1) indicator variables.
Usage
map(z, warn=TRUE, ...)
Arguments
z A matrix (for example a matrix of conditional probabilities in which each row sums to 1 as produced by the E-step of the EM algorithm).
warn A logical variable indicating whether or not a warning should be issued when there are some columns of z for which no row attains a maximum.
... Provided to allow lists with elements other than the arguments to be passed in indirect or list calls with do.call.
Value
An integer vector with one entry for each row of z, in which the i-th value is the column index at which the i-th row of z attains a maximum.
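The row-wise maximum described above can be reproduced in base R as a quick check of what the returned vector means. The small matrix z below is made up for illustration, and the indicator matrix ind is shown only to connect the result to the (0,1) form mentioned in the Description.

```r
# Made-up conditional probabilities: 3 observations, 3 components
z <- rbind(c(0.7, 0.2, 0.1),
           c(0.1, 0.3, 0.6),
           c(0.2, 0.5, 0.3))
cls <- apply(z, 1, which.max)   # column index of each row maximum
ind <- diag(ncol(z))[cls, ]     # corresponding 0/1 indicator matrix
cls
```

Here cls gives one class label per row of z, and each row of ind has a single 1 in the column named by cls.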
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington.
See http://www.stat.washington.edu/mclust.
See Also
unmap, estep, em, me
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
emEst <- me(modelName = "VVV", data = irisMatrix, z = unmap(irisClass))
map(emEst$z)
mapClass Correspondence between classifications.
Description
Best correspondence between classes given two vectors viewed as alternative classifications of the same object.
Usage
mapClass(a, b)
Arguments
a A numeric or character vector of class labels.
b A numeric or character vector of class labels. Must have the same length as a.
Value
A list with two named elements, aTOb and bTOa, which are themselves lists. The aTOb list has a component corresponding to each unique element of a, which gives the element or elements of b that result in the closest class correspondence.
The bTOa list has a component corresponding to each unique element of b, which gives the element or elements of a that result in the closest class correspondence.
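The structure of the returned list can be illustrated with a cross-tabulation in base R. In the sketch below, "closest class correspondence" is taken to mean largest overlap in the contingency table; that reading, the name mapClass_sketch, and the use of character labels in the result are assumptions, not documented behaviour of mapClass.

```r
# Hypothetical helper (not part of mclust): for each class in one labelling,
# report the class or classes in the other labelling with the largest overlap.
mapClass_sketch <- function(a, b) {
  tab <- table(a, b)
  pick <- function(m) {
    out <- lapply(seq_len(nrow(m)),
                  function(i) colnames(m)[m[i, ] == max(m[i, ])])
    names(out) <- rownames(m)
    out
  }
  list(aTOb = pick(tab), bTOa = pick(t(tab)))
}

a <- rep(1:3, 3)
b <- rep(c("A", "B", "C"), 3)
res <- mapClass_sketch(a, b)
res$aTOb
```

When two classes overlap equally with a third, the corresponding component holds more than one label, which matches the "element or elements" wording above.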
See Also
mapClass, classError, table
Examples
a <- rep(1:3, 3)
a
b <- rep(c("A", "B", "C"), 3)
b
mapClass(a, b)
a <- sample(1:3, 9, replace = TRUE)
a
b <- sample(c("A", "B", "C"), 9, replace = TRUE)
b
mapClass(a, b)
mclust-internal Internal MCLUST functions
Description
Internal tools functions.
Details
These are not to be called by the user directly.
mclust1Dplot Plot one-dimensional data modelled by an MVN mixture.
Description
Plot one-dimensional data given parameters of an MVN mixture model for the data.
Usage
mclust1Dplot(data, ..., type = c("classification","uncertainty","density","errors"),
             ask = TRUE, symbols, grid = 100, identify = FALSE, CEX = 1, xlim)
Arguments
data A numeric vector of observations. Categorical variables are not allowed.
... One or more of the following:
classification A numeric or character vector representing a classification of observations (rows) of data.
uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.
z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren't available.
truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.
mu A vector whose entries are the means of each group.
sigma Either a vector whose entries are the variances for each group or a scalar giving a common variance for the groups.
pro The vector of mixing proportions.
type Any subset of c("classification","uncertainty","density","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.
ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.
symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in the observations (the order used by the function unique). The default is to use a single plotting symbol |. Classes are delineated by showing them in separate lines above the whole of the data.
grid Number of grid points to use.
identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.
CEX An argument specifying the size of the plotting symbols. The default value is 1.
xlim An argument specifying bounds of the plot. This may be useful when comparing plots.
Side Effects
One or more plots showing location of the mixture components, classification, uncertainty, density and/or classification errors. Points in the different classes are shown in separate lines above the whole of the data.
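The quantities behind the "density" and "uncertainty" plots can be computed directly in base R. The sketch below uses made-up parameters (three components with means -5, 0, 5 and a common unit variance) and takes uncertainty to be one minus the largest conditional probability, which is the usual definition in model-based clustering; it does not call mclust1Dplot.

```r
# Made-up one-dimensional mixture parameters
mu <- c(-5, 0, 5)
sigmasq <- 1
pro <- rep(1/3, 3)

x <- seq(-9, 9, length.out = 100)          # evaluation grid
# weighted component densities, one column per component
comp <- sapply(seq_along(mu),
               function(k) pro[k] * dnorm(x, mu[k], sqrt(sigmasq)))
density <- rowSums(comp)                    # mixture density on the grid
z <- comp / density                         # conditional probabilities
uncertainty <- 1 - apply(z, 1, max)         # largest near class boundaries

plot(x, density, type = "l")
```

Uncertainty is close to 0 near the component means and peaks midway between neighbouring means.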
References
C. Fraley and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mclust2Dplot, clPairs, coordProj, do.call
Examples
n <- 250 ## create artificial data
set.seed(0)
y <- c(rnorm(n,-5), rnorm(n,0), rnorm(n,5))
yclass <- c(rep(1,n), rep(2,n), rep(3,n))
yEMclust <- summary(EMclust(y),y)
mclust1Dplot(y, identify = TRUE, truth = yclass, z = yEMclust$z, ask=FALSE,
             mu = yEMclust$mu, sigma = yEMclust$sigma, pro = yEMclust$pro)
do.call("mclust1Dplot",
        c(list(data = y, identify = TRUE, truth = yclass, ask=FALSE), yEMclust))
mclust2Dplot Plot two-dimensional data modelled by an MVN mixture.
Description
Plot two-dimensional data given parameters of an MVN mixture model for the data.
Usage
mclust2Dplot(data, ..., type = c("classification","uncertainty","errors"), ask = TRUE,
             quantiles = c(0.75, 0.95), symbols, scale = FALSE,
             identify = FALSE, CEX = 1, PCH = ".", xlim, ylim,
             swapAxes = FALSE)
Arguments
data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables. In this case the data are two dimensional, so there are two columns.
... One or more of the following:
classification A numeric or character vector representing a classification of observations (rows) of data.
uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.
z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren't available.
truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.
mu A matrix whose columns are the means of each group.
sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.
decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.
type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.
ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.
quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75,0.95).
symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in the observations (the order used by the S-PLUS function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.
scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE
identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.
CEX An argument specifying the size of the plotting symbols. The default value is 1.
PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".
xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot. This may be useful when comparing plots.
swapAxes A logical variable indicating whether or not the axes should be swapped for the plot.
Side Effects
One or more plots showing location of the mixture components, classification, uncertainty, and/or classification errors.
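The way the quantiles argument partitions points for the uncertainty plot can be sketched with quantile and cut in base R. The uncertainty values below are simulated, and the treatment of points falling exactly on a quantile is an assumption; the plotting function itself may handle boundaries differently.

```r
set.seed(0)
uncertainty <- runif(200, 0, 0.5)            # simulated uncertainties
breaks <- quantile(uncertainty, probs = c(0.75, 0.95))
# lowest 75%: small symbols; next 20%: medium (open); top 5%: large (filled)
sizeClass <- cut(uncertainty, c(-Inf, breaks, Inf),
                 labels = c("small", "medium", "large"))
table(sizeClass)
```

With the default quantiles, most points fall in the "small" class, so only the least certain observations stand out in the plot.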
References
C. Fraley and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
surfacePlot, clPairs, coordProj, randProj, spinProj, mclustOptions, do.call
Examples
n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
xEMclust <- summary(EMclust(x),x)
mclust2Dplot(x, truth = xclass, z = xEMclust$z, ask=FALSE,
             mu = xEMclust$mu, sigma = xEMclust$sigma)
do.call("mclust2Dplot", c(list(data = x, truth = xclass, ask=FALSE), xEMclust))
mclustDA MclustDA discriminant analysis.
Description
MclustDA training and testing.
Usage
mclustDA(trainingData, labels, testData, G=1:6, verbose = FALSE)
Arguments
trainingData A numeric vector, matrix, or data frame of training observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
labels A numeric or character vector assigning a class label to each training observation.
testData A numeric vector, matrix, or data frame of test observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
G An integer vector specifying the numbers of mixture components (clusters) to be considered for each class. Default: 1:6.
verbose A logical variable telling whether or not to print an indication that the function is in the training phase, which may take some time to complete.
Value
A list with the following components:
testClassification mclustDA classification of the test data.
trainingClassification mclustDA classification of the training data.
VofIindex Meila's Variation of Information index, to compare classification of the training data to the known labels.
summary Gives the best model and number of clusters for each training class.
models The mixture models used to fit the known classes.
postProb A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class.
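The Variation of Information index mentioned above can be illustrated in base R. The following is a minimal sketch, assuming the standard definition VI = H(A) + H(B) - 2 I(A, B) with natural logarithms; the function name variationOfInformation is hypothetical, and the package's own computation may differ in scaling or log base.

```r
# Sketch only: variation of information between two labelings of the same
# observations, assuming VI = H(A) + H(B) - 2 * I(A, B) with natural logs.
variationOfInformation <- function(a, b) {
  p  <- table(a, b) / length(a)   # joint distribution of the two labelings
  pa <- rowSums(p)                # marginal distribution of labeling a
  pb <- colSums(p)                # marginal distribution of labeling b
  entropy <- function(q) -sum(q[q > 0] * log(q[q > 0]))
  pos <- p > 0                    # skip empty cells to avoid log(0)
  mutual <- sum(p[pos] * log(p[pos] / outer(pa, pb)[pos]))
  entropy(pa) + entropy(pb) - 2 * mutual
}

a <- c(1, 1, 2, 2)
viSame  <- variationOfInformation(a, c("x", "x", "y", "y"))  # same partition, relabeled
viIndep <- variationOfInformation(a, c(1, 2, 1, 2))          # independent partitions
```

Under this definition the index is zero when the two classifications induce the same partition, regardless of how the classes are labeled, and grows as the partitions disagree.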
Details
The following models are compared in Mclust:
  "E" for spherical, equal variance (one-dimensional)
  "V" for spherical, variable variance (one-dimensional)
  "EII": spherical, equal volume
  "VII": spherical, unequal volume
  "EEI": diagonal, equal volume, equal shape
  "VVI": diagonal, varying volume, varying shape
  "EEE": ellipsoidal, equal volume, shape, and orientation
  "VVV": ellipsoidal, varying volume, shape, and orientation
mclustDA is a simplified function combining mclustDAtrain and mclustDAtest and their summaries.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
M. Meila (2002). Comparing clusterings. Technical Report 418, Department of Statistics, University of Washington. See http://www.stat.washington.edu/www/research/reports.
See Also
plot.mclustDA, mclustDAtrain, mclustDAtest, compareClass, classError
Examples
n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))

## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(from = 1, to = 2*n, by = 2)
even <- odd + 1
testMclustDA <- mclustDA(trainingData = x[odd, ], labels = xclass[odd],
                         testData = x[even,])

clEven <- testMclustDA$testClassification ## classify test set
compareClass(clEven,xclass[even])

## Not run:
plot(testMclustDA, trainingData = x[odd, ], labels = xclass[odd],
     testData = x[even,])
## End(Not run)
mclustDAtest MclustDA Testing
Description
Testing phase for MclustDA discriminant analysis.
Usage
mclustDAtest(data, models)
Arguments
data A numeric vector, matrix, or data frame of observations to be classified.
models A list of MCLUST-style models including parameters, usually the result of applying mclustDAtrain to some training data.
Value
A matrix in which the [i,j]th entry is the density for test observation i in the model for class j.
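A density matrix of this shape can be turned into a classification by applying Bayes' rule row by row. The sketch below uses base R and a made-up density matrix; the equal class priors are an assumption for illustration, and summary.mclustDAtest may weight the classes differently.

```r
# Hypothetical density matrix: rows are test observations, columns are classes.
dens <- matrix(c(0.20, 0.01,
                 0.02, 0.30,
                 0.05, 0.05),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("1", "2")))

# Assumed equal class priors (an illustration, not the package default).
prior <- rep(1 / ncol(dens), ncol(dens))

# Bayes' rule: weight each class density by its prior, then normalise each row.
weighted <- sweep(dens, 2, prior, "*")
postProb <- weighted / rowSums(weighted)

# Assign each observation to the class with the largest posterior probability;
# ties go to the first class.
cl <- colnames(dens)[max.col(postProb, ties.method = "first")]
```

Each row of postProb sums to one, and cl holds one class label per test observation.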
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
summary.mclustDAtest, mclustDAtrain
Examples
n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))

## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)

even <- odd + 1
test <- mclustDAtest(x[even, ], train) ## compute model densities
summary(test)$class ## classify test set
mclustDAtrain MclustDA Training
Description
Training phase for MclustDA discriminant analysis.
Usage
mclustDAtrain(data, labels, G, emModelNames, eps, tol, itmax,
              equalPro, warnSingular, verbose)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
labels A numeric or character vector assigning a class label to each observation.
G An integer vector specifying the numbers of Gaussian mixture components (clusters) for which the BIC is to be calculated (the same specification is used for all classes). Default: 1:9.
emModelNames A vector of character strings indicating the models to be fitted in the EM phase of clustering. Possible models:
  "E" for spherical, equal variance (one-dimensional)
  "V" for spherical, variable variance (one-dimensional)
  "EII": spherical, equal volume
  "VII": spherical, unequal volume
  "EEI": diagonal, equal volume, equal shape
  "VEI": diagonal, varying volume, equal shape
  "EVI": diagonal, equal volume, varying shape
  "VVI": diagonal, varying volume, varying shape
  "EEE": ellipsoidal, equal volume, shape, and orientation
  "EEV": ellipsoidal, equal volume and equal shape
  "VEV": ellipsoidal, equal shape
  "VVV": ellipsoidal, varying volume, shape, and orientation
  The default is .Mclust$emModelNames.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.
itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.
equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular=FALSE.
verbose A logical value indicating whether or not to print the models and numbers of components for each class. Default: verbose=TRUE.
Value
A list in which each element gives the optimal parameters for the model best fitting each class according to BIC.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
summary.mclustDAtrain, mclustDAtest, EMclust, hc, mclustOptions
Examples
n <- 250 ## create artificial data
set.seed(0)
par(pty = "s")
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))

## Not run:
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)

even <- odd + 1
test <- mclustDAtest(x[even, ], train) ## compute model densities
clEven <- summary(test)$class ## classify test set
compareClass(clEven,xclass[even])
mclustOptions Set control values for use with MCLUST.
Description
Supplies a list of values including tolerances for singularity and convergence assessment, and an enumeration of models for use with MCLUST.
Usage
mclustOptions(eps, tol, itmax, equalPro, warnSingular, emModelNames,
              hcModelName, symbols)
Arguments
eps A scalar tolerance associated with deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is the relative machine precision .Machine$double.eps, which is approximately 2e-16 on IEEE-compliant machines.
tol A vector of length two giving relative convergence tolerances for the loglikelihood and for parameter convergence in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(1.e-5,1.e-5).
itmax A vector of length two giving integer limits on the number of EM iterations and on the number of iterations in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(Inf,Inf), allowing termination to be completely governed by tol.
equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. Default: equalPro = FALSE.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular = TRUE.
emModelNames A vector of character strings associated with multivariate models in MCLUST. The default includes strings encoding all of the multivariate models available:
  "EII": spherical, equal volume
  "VII": spherical, unequal volume
  "EEI": diagonal, equal volume and shape
  "VEI": diagonal, varying volume, equal shape
  "EVI": diagonal, equal volume, varying shape
  "VVI": diagonal, varying volume and shape
  "EEE": ellipsoidal, equal volume, shape, and orientation
  "EEV": ellipsoidal, equal volume and equal shape
  "VEV": ellipsoidal, equal shape
  "VVV": ellipsoidal, varying volume, shape, and orientation
hcModelName A vector of two character strings giving the name of the model to be used in the hierarchical clustering phase for univariate and multivariate data, respectively, in EMclust and EMclustN. The default is c("V","VVV"), giving the unconstrained model in each case.
symbols A vector whose entries are either integers corresponding to graphics symbols or single characters for plotting for classifications. Classes are assigned symbols in the given order. The default is c(17,0,10,4,11,18,6,7,3,16,2,12,8,15,1,9,14,13,5).
Details
mclustOptions is provided for assigning values to the .Mclust list, which is used to supply default values to various functions in MCLUST.
Calls to mclustOptions do not in themselves affect the outcome of computations.
Value
A named list in which the names are the names of the arguments and the values are the values supplied to the arguments.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
.Mclust
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

.Mclust

.Mclust <- mclustOptions(tol = 1.e-6, emModelNames = c("VII", "VVI", "VVV"))

.Mclust
irisBic <- EMclust(irisMatrix)
summary(irisBic, irisMatrix)
.Mclust <- mclustOptions() # restore default values
.Mclust
me EM algorithm starting with M-step for parameterized MVN mixture models.
Description
Implements the EM algorithm for parameterized MVN mixture models, starting with the maximization step.
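To make "starting with the maximization step" concrete, here is a minimal base-R sketch of EM for a univariate mixture with component-specific variances (the "V" parameterization). The function name emFromZ and its stopping rule are illustrative assumptions, not the package's implementation, which also handles singularities and noise.

```r
# Sketch only: EM that begins with an M-step, given initial conditional
# probabilities z (an n x G matrix). Univariate data, unequal variances.
emFromZ <- function(x, z, tol = 1e-5, itmax = 200) {
  n <- length(x)
  loglik <- -Inf
  for (iter in seq_len(itmax)) {
    # M-step: mixing proportions, means and variances weighted by z
    nk      <- colSums(z)
    pro     <- nk / n
    mu      <- colSums(z * x) / nk
    sigmasq <- colSums(z * outer(x, mu, "-")^2) / nk
    # E-step: mixing proportion times component density, one column per component
    dens <- sapply(seq_along(mu), function(k)
      pro[k] * dnorm(x, mean = mu[k], sd = sqrt(sigmasq[k])))
    newLoglik <- sum(log(rowSums(dens)))
    z <- dens / rowSums(dens)
    # Stop when the relative change in loglikelihood is small
    converged <- abs(newLoglik - loglik) < tol * (1 + abs(newLoglik))
    loglik <- newLoglik
    if (converged) break
  }
  list(mu = mu, sigmasq = sigmasq, pro = pro, z = z,
       loglik = loglik, iterations = iter)
}

set.seed(1)
x  <- c(rnorm(100, mean = -3), rnorm(100, mean = 3))
z0 <- cbind(as.numeric(x < 0), as.numeric(x >= 0))  # initial hard assignment
fit <- emFromZ(x, z0)
```

Because the loop opens with the M-step, the caller supplies an initial z (for instance from a known or hierarchical classification) rather than initial parameter values.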
Usage
me(modelName, data, z, ...)
Arguments
modelName A character string indicating the model:
  "E": equal variance (one-dimensional)
  "V": variable variance (one-dimensional)
  "EII": spherical, equal volume
  "VII": spherical, unequal volume
  "EEI": diagonal, equal volume and shape
  "VEI": diagonal, varying volume, equal shape
  "EVI": diagonal, equal volume, varying shape
  "VVI": diagonal, varying volume and shape
  "EEE": ellipsoidal, equal volume, shape, and orientation
  "EEV": ellipsoidal, equal volume and equal shape
  "VEV": ellipsoidal, equal shape
  "VVV": ellipsoidal, varying volume, shape, and orientation
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
... Any number of the following:
  eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
  For those models with iterative M-step ("VEI", "VEV"), two values can be entered for eps, in which case the second value is used for determining singularity in the M-step.
  tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for tol, in which case the second value governs parameter convergence in the M-step.
  itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for itmax, in which case the second value is an upper limit on the number of iterations in the M-step.
  equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.
  warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
  noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.
  Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when noise = TRUE.
Value
A list including the following components:
mu A matrix whose kth column is the mean of the kth component of the mixture model.
sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
loglik The loglikelihood for the data in the mixture model.
modelName A character string identifying the model (same as the input argument).
Attributes:
  "info" Information on the iteration.
  "warn" An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
meE, ..., meVVV, em, mstep, estep, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

me(modelName = "VVV", data = irisMatrix, z = unmap(irisClass))
meE EM algorithm starting with M-step for a parameterized MVN mixture model.
Description
Implements the EM algorithm for a parameterized MVN mixture model, starting with the maximization step.
Usage
meE(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meEII(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meVII(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meEEI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meVEI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meEVI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meVVI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meEEE(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meEEV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meVEV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
meVVV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
tol A scalar tolerance for relative convergence of the loglikelihood values. The default is .Mclust$tol.
itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.
equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.
noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.
Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when noise = TRUE.
Value
A list including the following components:
mu A matrix whose kth column is the mean of the kth component of the mixture model.
sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
modelName Character string identifying the model.
loglik The loglikelihood for the data in the mixture model.
Attributes: The return value also has the following attributes:
  "info": Information on the iteration.
  "warn": An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
em, me, estep, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

meVVV(data = irisMatrix, z = unmap(irisClass))
mstep M-step in the EM algorithm for parameterized MVN mixture models.
Description
Maximization step in the EM algorithm for parameterized MVN mixture models.
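As a rough illustration of what a maximization step computes, the base-R sketch below handles the unconstrained ("VVV") case: each component gets its own mixing proportion, mean, and covariance, all weighted by z. The function name mstepSketch is hypothetical, and the indicator matrix is built with outer() as a base-R stand-in for the package's unmap.

```r
# Sketch only: M-step for an unconstrained Gaussian mixture, given conditional
# probabilities z (n x G). Not the package's implementation.
mstepSketch <- function(data, z) {
  data <- as.matrix(data)
  n <- nrow(data); d <- ncol(data); G <- ncol(z)
  nk  <- colSums(z)                           # effective size of each component
  pro <- nk / n                               # mixing proportions
  mu  <- sweep(t(data) %*% z, 2, nk, "/")     # d x G; kth column is the kth mean
  sigma <- array(0, dim = c(d, d, G))
  for (k in seq_len(G)) {
    centred <- sweep(data, 2, mu[, k], "-")
    # Weighted scatter: sum_i z[i,k] (x_i - mu_k)(x_i - mu_k)^T / nk[k]
    sigma[, , k] <- crossprod(centred * sqrt(z[, k])) / nk[k]
  }
  list(mu = mu, sigma = sigma, pro = pro)
}

irisMatrix <- as.matrix(iris[, 1:4])
# 0/1 indicator matrix of the known classes
z <- outer(as.character(iris[, 5]), levels(iris[, 5]), "==") + 0
fit <- mstepSketch(irisMatrix, z)
```

With a 0/1 indicator matrix as z, the result reduces to the per-class sample means and maximum-likelihood covariances (divisor n rather than n - 1).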
Usage
mstep(modelName, data, z, ...)
Arguments
modelName A character string indicating the model:
  "E": equal variance (one-dimensional)
  "V": variable variance (one-dimensional)
  "EII": spherical, equal volume
  "VII": spherical, unequal volume
  "EEI": diagonal, equal volume and shape
  "VEI": diagonal, varying volume, equal shape
  "EVI": diagonal, equal volume, varying shape
  "VVI": diagonal, varying volume and shape
  "EEE": ellipsoidal, equal volume, shape, and orientation
  "EEV": ellipsoidal, equal volume and equal shape
  "VEV": ellipsoidal, equal shape
  "VVV": ellipsoidal, varying volume, shape, and orientation
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
... Any number of the following:
  equalPro A logical value indicating whether or not the components in the model are present in equal proportions. The default is .Mclust$equalPro.
  noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.
  eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps. Not used for models "EII", "VII", "EEE", "VVV".
  tol For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), a scalar tolerance for relative convergence of the parameters. The default is .Mclust$tol.
  itmax For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), an integer limit on the number of EM iterations. The default is .Mclust$itmax.
  warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular. Not used for models "EII", "VII", "EEE", "VVV".
Value
A list including the following components:
mu A matrix whose kth column is the mean of the kth component of the mixture model.
sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
modelName A character string identifying the model (same as the input argument).
Attributes:
  "info": Information on the iteration.
  "warn": An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mstepE, ..., mstepVVV, me, estep, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

mstep(modelName = "VII", data = irisMatrix, z = unmap(irisClass))
mstepE M-step in the EM algorithm for a parameterized MVN mixture model.
Description
Maximization step in the EM algorithm for a parameterized MVN mixture model.
Usage
mstepE(data, z, equalPro, noise = FALSE, ...)
mstepV(data, z, equalPro, noise = FALSE, ...)
mstepEII(data, z, equalPro, noise = FALSE, ...)
mstepVII(data, z, equalPro, noise = FALSE, ...)
mstepEEI(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepVEI(data, z, equalPro, noise = FALSE, eps, tol, itmax, warnSingular, ...)
mstepEVI(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepVVI(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepEEE(data, z, equalPro, noise = FALSE, ...)
mstepEEV(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepVVV(data, z, equalPro, noise = FALSE, ...)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
equalPro A logical value indicating whether or not the components in the model are present in equal proportions. The default is .Mclust$equalPro.
noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps. Not used for models "EII", "VII", "EEE", "VVV".
tol For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), a scalar tolerance for relative convergence of the parameters. The default is .Mclust$tol.
itmax For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), an integer limit on the number of EM iterations. The default is .Mclust$itmax.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular. Not used for models "EII", "VII", "EEE", "VVV".
... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.
Value
A list including the following components:
mu A matrix whose kth column is the mean of the kth component of the mixture model.
sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
modelName A character string identifying the model (same as the input argument).
Attributes:
  "info" Information on the iteration.
  "warn" An appropriate warning if problems are encountered in the computations.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mstep, me, estep, mclustOptions
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

mstepVII(data = irisMatrix, z = unmap(irisClass))
mvn Multivariate Normal Fit
Description
Computes the mean, covariance, and loglikelihood from fitting a single MVN or Gaussian to given data.
Usage
mvn(modelName, data)
Arguments
modelName A character string representing a model name. This can be either "Spherical", "Diagonal", or "Ellipsoidal" or an MCLUST-style model name:
  "E", "V", "X" (one-dimensional)
  "EII", "VII", "XII" (spherical)
  "EEI", "VEI", "EVI", "VVI", "XXI" (diagonal)
  "EEE", "EEV", "VEV", "VVV", "XXX" (ellipsoidal)
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
Value
A list including the parameters of the Gaussian model best fitting the data, and the corresponding loglikelihood for the data under the model.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mvnX, mvnXII, mvnXXI, mvnXXX, mstep
Examples
n <- 1000

set.seed(0)
x <- rnorm(n, mean = -1, sd = 2)
mvn(modelName = "X", x)

mu <- c(-1, 0, 1)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% (2*diag(3)),
           MARGIN = 2, STATS = mu, FUN = "+")
mvn(modelName = "XII", x)
mvn(modelName = "Spherical", x)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% diag(1:3),
           MARGIN = 2, STATS = mu, FUN = "+")
mvn(modelName = "XXI", x)
mvn(modelName = "Diagonal", x)
Sigma <- matrix(c(9,-4,1,-4,9,4,1,4,9), 3, 3)
set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% chol(Sigma),
           MARGIN = 2, STATS = mu, FUN = "+")
mvn(modelName = "XXX", x)
mvn(modelName = "Ellipsoidal", x)
mvnX Multivariate Normal Fit
Description
Computes the mean, covariance, and loglikelihood from fitting a single MVN or Gaussian.
Usage
mvnX(data)
mvnXII(data)
mvnXXI(data)
mvnXXX(data)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
Details
mvnXII computes the best fitting Gaussian with the covariance restricted to be a multiple of the identity. mvnXXI computes the best fitting Gaussian with the covariance restricted to be diagonal. mvnXXX computes the best fitting Gaussian with ellipsoidal (unrestricted) covariance.
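The three covariance restrictions can be compared with a short base-R sketch. It assumes maximum-likelihood estimates (divisor n) and uses a hypothetical helper named gaussFit; it is meant to show the structure of the fits, not to reproduce the package's output format.

```r
# Sketch only: ML fit of a single Gaussian under three covariance restrictions.
gaussFit <- function(data, type = c("ellipsoidal", "diagonal", "spherical")) {
  type <- match.arg(type)
  data <- as.matrix(data)
  n <- nrow(data); d <- ncol(data)
  mu <- colMeans(data)
  S  <- crossprod(sweep(data, 2, mu, "-")) / n    # unrestricted ML covariance
  Sigma <- switch(type,
    ellipsoidal = S,                              # no restriction
    diagonal    = diag(diag(S), d),               # keep only the variances
    spherical   = diag(sum(diag(S)) / d, d))      # one common variance
  # Gaussian loglikelihood evaluated at the fitted parameters
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  quad   <- n * sum(diag(solve(Sigma, S)))
  loglik <- -0.5 * (n * d * log(2 * pi) + n * logdet + quad)
  list(mu = mu, Sigma = Sigma, loglik = loglik)
}

set.seed(0)
x <- matrix(rnorm(300), 100, 3) %*% diag(1:3)
ll <- sapply(c("spherical", "diagonal", "ellipsoidal"),
             function(type) gaussFit(x, type)$loglik)
```

Because the three models are nested, the fitted loglikelihood cannot decrease when moving from the spherical to the diagonal to the ellipsoidal fit.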
Value
A list including the parameters of the Gaussian model best fitting the data, and the corresponding loglikelihood for the data under the model.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mvn, mstepE
Examples
n <- 1000

set.seed(0)
x <- rnorm(n, mean = -1, sd = 2)
mvnX(x)

mu <- c(-1, 0, 1)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% (2*diag(3)),
           MARGIN = 2, STATS = mu, FUN = "+")
mvnXII(x)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% diag(1:3),
           MARGIN = 2, STATS = mu, FUN = "+")
mvnXXI(x)

Sigma <- matrix(c(9,-4,1,-4,9,4,1,4,9), 3, 3)
set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% chol(Sigma),
           MARGIN = 2, STATS = mu, FUN = "+")
mvnXXX(x)
partconv Convert partitioning into numerical vector.
Description
partconv converts a partitioning into a numerical vector. The second argument is used to force consecutive numbers (default) or not.
Usage
partconv(x, consec=TRUE)
Arguments
x Partitioning. May be numerical or not.
consec Logical flag, whether or not to use consecutive class numbers.
Value
Vector of class numbers.
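The idea of consecutive class numbers can be sketched in base R with match and unique. This assumes classes are numbered in sorted order of their labels, which is an illustrative choice; partconv may order the classes differently.

```r
# Sketch only: map arbitrary class labels to consecutive integers 1..K,
# numbering the classes in sorted order of their labels (an assumption).
cl <- c(7, 3, 7, 10, 3)
consecutive <- match(cl, sort(unique(cl)))

# The same idea for a non-numeric partitioning such as a factor
species <- as.integer(factor(iris[, 5]))
```

In this example the labels 3, 7, and 10 become classes 1, 2, and 3, so consecutive is 2 1 2 3 1.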
Examples
data(iris)
partconv(iris[,5])

cl <- sample(1:10, 25, replace=TRUE)
partconv(cl, consec=FALSE)
partconv(cl, consec=TRUE)
partuniq Classifies Data According to Unique Observations
Description
Gives a one-to-one mapping from unique observations to rows of a data matrix.
Usage
partuniq(x)
Arguments
x Matrix of observations.
Value
A vector of length nrow(x) with integer entries. An observation k is assigned an integer i whenever observation i is the first row of x that is identical to observation k (note that i <= k).
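The mapping described above can be mimicked in base R by building one key per row and matching the keys against themselves. The helper name firstIdenticalRow is hypothetical, and the sketch compares rows through their character representation, so values that differ only beyond about 15 significant digits would be treated as identical.

```r
# Sketch only: for each row k, return the index i of the first row identical to it.
firstIdenticalRow <- function(x) {
  x <- as.matrix(x)
  key <- apply(x, 1, paste, collapse = "\r")  # one string key per row
  match(key, key)                             # first position at which each key occurs
}

x <- rbind(c(1, 2), c(3, 4), c(1, 2), c(3, 4), c(5, 6))
firstRow <- firstIdenticalRow(x)
```

For this matrix the result is 1 2 1 2 5: rows 3 and 4 repeat rows 1 and 2, and row 5 is unique.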
Examples
data(iris)
partuniq(as.matrix(iris[,1:4]))
plot.Mclust Plot Model-Based Clustering Results
Description
Plot model-based clustering results: BIC, classification, uncertainty and (for one- and two-dimensional data) density.
Usage
plot.Mclust(x, data, dimens = c(1, 2), scale = FALSE, ...)
Arguments
x Output from Mclust.
data The data used to produce x.
dimens An integer vector of length two specifying the dimensions for coordinate projections if the data is more than two-dimensional. The default is c(1,2) (the first two dimensions).
scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE
... Further arguments to the lower level plotting functions.
Value
Plots selected via a menu including the following options: BIC values used for choosing the number of clusters. For data in more than two dimensions, a pairs plot showing the classification, and coordinate projections of the data showing location of the mixture components, classification, and/or uncertainty. For one- and two-dimensional data, plots showing location of the mixture components, classification, uncertainty, and/or density.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
Mclust
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisMclust <- Mclust(irisMatrix)

## Not run: plot(irisMclust,irisMatrix)
plot.mclustDA Plotting method for MclustDA discriminant analysis.
Description
Plots training and test data, known training data classification, mclustDA test data classification, and/or training errors.
Usage
plot.mclustDA(x, trainingData, labels, testData, dimens=c(1,2),
              scale = FALSE, identify=FALSE, ...)
Arguments
x The object produced by applying mclustDA with trainingData and classification labels to testData.
trainingData The numeric vector, matrix, or data frame of training observations used to obtain x.
labels The numeric or character vector assigning a class label to each training observation.
testData A numeric vector, matrix, or data frame of test observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
dimens An integer vector of length two specifying the dimensions for coordinate projections if the data is more than two-dimensional. The default is c(1,2) (the first two dimensions).
scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE
identify A logical variable indicating whether or not to print a title identifying the plot. Default: identify=FALSE
... Further arguments to the lower level plotting functions.
Value
Plots selected via a menu including the following options: training and test data, known training data classification, mclustDA test data classification, training errors.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mclustDA
Examples
n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
## Not run:
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)
odd <- seq(from = 1, to = 2*n, by = 2)
even <- odd + 1
testMclustDA <- mclustDA(trainingData = x[odd, ], labels = xclass[odd],
                         testData = x[even,])

clEven <- testMclustDA$testClassification ## classification of the test set
compareClass(clEven,xclass[even])

## Not run:
plot(testMclustDA, trainingData = x[odd, ], labels = xclass[odd],
     testData = x[even,])
## End(Not run)
randProj Random projections for data in more than two dimensions modelled by an MVN mixture.
Description
Plots random projections given data in more than two dimensions and parameters of an MVN mixture model for the data.
Usage
randProj(data, seeds = 0, ...,
         type = c("classification", "uncertainty", "errors"), ask = TRUE,
         quantiles = c(0.75,0.95), symbols, scale = FALSE, identify = FALSE,
         CEX = 1, PCH = ".", xlim, ylim)
Arguments
data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
seeds A vector of integers between 0 and 1000, specifying seeds for the random projections. The default value is the single seed 0.
... Any number of the following:
classification A numeric or character vector representing a classification of observations (rows) of data.
uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.
z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments are not available.
truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.
mu A matrix whose columns are the means of each group.
sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.
decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.
type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.
ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.
quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75,0.95).
symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in classification (the order used by the S-PLUS function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.
scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE
identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.
CEX An argument specifying the size of the plotting symbols. The default value is 1.
PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".
xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.
Value
Random projections of the data, possibly showing location of the mixture components, classification, uncertainty, and classification errors.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
coordProj, spinProj, mclust2Dplot, mclustOptions, do.call
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepVVV(irisMatrix, unmap(irisClass))

par(pty = "s", mfrow = c(2,3))
randProj(irisMatrix, seeds = 0:5, truth=irisClass,
         mu = msEst$mu, sigma = msEst$sigma, z = msEst$z)
do.call("randProj", c(list(data = irisMatrix, seeds = 0:5, truth=irisClass),
                      msEst))
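What a single random 2-D projection involves can be sketched in base R. The helper random_projection is hypothetical: it orthonormalizes a random Gaussian matrix by QR, and the way it uses the seed is an assumption that need not reproduce the seed/projection correspondence used by randProj.

```r
# Sketch of one random 2-D projection; the helper name and the way the seed
# is used are assumptions, not the package's seed/projection mapping.
random_projection <- function(data, seed = 0) {
  data <- as.matrix(data)
  set.seed(seed)
  # Orthonormalize a random Gaussian d x 2 matrix with a QR decomposition.
  Q <- qr.Q(qr(matrix(rnorm(ncol(data) * 2), ncol(data), 2)))
  list(basis = Q, projected = data %*% Q)
}

irisMatrix <- as.matrix(iris[, 1:4])
rp <- random_projection(irisMatrix, seed = 0)
dim(rp$projected)
```

Under the same basis Q, a component mean mu[,k] projects to t(Q) %*% mu[,k] and a covariance sigma[,,k] to t(Q) %*% sigma[,,k] %*% Q, which is how mixture components can be drawn in the projected plane.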
sigma2decomp Convert mixture component covariances to decomposition form.
Description
Converts a set of covariance matrices from representation as a 3-D array to a parameterization by eigenvalue decomposition.
Usage
sigma2decomp(sigma, G, tol, ...)
Arguments
sigma Either a 3-D array whose [,,k]th component is the covariance matrix for the kth component in an MVN mixture model, or a single covariance matrix in the case that all components have the same covariance.
G The number of components in the mixture. When sigma is a 3-D array, the number of components can be inferred from its dimensions.
tol Tolerance for determining whether or not the covariances have equal volume, shape, and/or orientation. The default is the square root of the relative machine precision, sqrt(.Machine$double.eps), which is about 1.e-8.
... Catch unused arguments from a do.call call.
Value
The covariance matrices for the mixture components in decomposition form, including the following components:
d The dimension of the data.
G The number of components in the mixture model.
scale Either a G-vector giving the scale of the covariance (the dth root of its determinant) for each component in the mixture model, or a single numeric value if the scale is the same for each component.
shape Either a G by d matrix in which the kth column is the shape of the covariance matrix (normalized to have determinant 1) for the kth component, or a d-vector giving a common shape for all components.
orientation Either a d by d by G array whose [,,k]th entry is the orthonormal matrix of eigenvectors of the covariance matrix of the kth component, or a d by d orthonormal matrix if the mixture components have a common orientation. The orientation component of decomp can be omitted in spherical and diagonal models, for which the principal components are parallel to the coordinate axes so that the orientation matrix is the identity.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation, and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
decomp2sigma
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

meEst <- meEEE(irisMatrix, unmap(irisClass))
names(meEst)
meEst$sigma

sigma2decomp(meEst$sigma)
## Not run:
do.call("sigma2decomp", meEst) ## alternative call
## End(Not run)
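The scale, shape and orientation components described under Value can be reproduced for a single covariance matrix with base R alone. The sketch below follows those definitions; decompose_covariance is a hypothetical helper, not the package code, and it omits the tolerance-based test for equal components.

```r
# Sketch of the volume/shape/orientation decomposition for ONE covariance
# matrix, following the definitions under Value. Not the package code.
decompose_covariance <- function(Sigma) {
  d <- nrow(Sigma)
  ev <- eigen(Sigma, symmetric = TRUE)
  scale <- prod(ev$values)^(1 / d)      # dth root of the determinant
  shape <- ev$values / scale            # product of shape entries is 1
  list(d = d, scale = scale, shape = shape, orientation = ev$vectors)
}

Sigma <- matrix(c(9,-4,1,-4,9,4,1,4,9), 3, 3)
dec <- decompose_covariance(Sigma)
# Rebuild Sigma = scale * O %*% diag(shape) %*% t(O)
rebuilt <- dec$scale * dec$orientation %*% diag(dec$shape) %*% t(dec$orientation)
```

Comparing rebuilt with Sigma (for example with all.equal) confirms that the three pieces carry the same information as the original matrix.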
sim Simulate from Parameterized MVN Mixture Models
Description
Simulate data from parameterized MVN mixture models.
Usage
sim(modelName, mu, ..., seed = 0)
Arguments
modelName A character string indicating the model. Possible models:
"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
... Arguments for model-specific functions. Specifically:
• An argument describing the variance (depends on the model):
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in cdens.
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
The form of the variance specification is the same as for the output of the em, me, or mstep methods for the specified mixture model.
pro Component mixing proportions. If missing, equal proportions are assumed.
n An integer specifying the number of data points to be simulated.
seed An integer between 0 and 1000, inclusive, for specifying a seed for random class assignment. The default value is 0.
Details
This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep, em, me, or EMclust to be passed directly without the need to specify individual parameters as arguments.
Value
A data set consisting of n points simulated from the specified MVN mixture model.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
simE, ..., simVVV, EMclust, mstep, do.call
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
irisSumry <- summary(irisBic,irisMatrix)
names(irisSumry)
irisSim <- sim(modelName = irisSumry$modelName, n = dim(irisMatrix)[1],
               mu = irisSumry$mu, decomp = irisSumry$decomp, pro = irisSumry$pro)

## Not run:
irisSim <- do.call("sim", irisSumry) ## alternative call
## End(Not run)

par(pty = "s", mfrow = c(1,2))
dimens <- c(1,2)
xlim <- range(rbind(irisMatrix,irisSim)[,dimens][,1])
ylim <- range(rbind(irisMatrix,irisSim)[,dimens][,2])

cl <- irisSumry$classification
coordProj(irisMatrix, par=irisSumry, classification=cl, dimens=dimens,
          xlim=xlim, ylim=ylim)
cl <- attr(irisSim,"classification")
coordProj(irisSim, par=irisSumry, classification=cl, dimens=dimens,
          xlim=xlim, ylim=ylim)

irisSumry3 <- summary(irisBic,irisMatrix, G=3)
irisSim3 <- do.call("sim", c(list(n = 500, seed = 1), irisSumry3))
clPairs(irisSim3, cl = attr(irisSim3,"classification"))
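The simulation step itself can be sketched in base R for the unconstrained case, where covariances are given as sigma[,,k]. The helper simulate_mixture is hypothetical and is not the package implementation; it assumes at least two dimensions, and it attaches the class labels as a "classification" attribute only to mirror how the examples above read the output of sim.

```r
# Hypothetical helper, not the package implementation: draw n points from a
# Gaussian mixture given means (columns of mu), covariances sigma[,,k], and
# mixing proportions pro (equal proportions if missing). Assumes d >= 2.
simulate_mixture <- function(n, mu, sigma, pro, seed = 0) {
  mu <- as.matrix(mu)
  d <- nrow(mu)
  G <- ncol(mu)
  if (missing(pro)) pro <- rep(1 / G, G)
  set.seed(seed)
  # Step 1: random class assignment according to the mixing proportions.
  cl <- sample(seq_len(G), n, replace = TRUE, prob = pro)
  x <- matrix(0, n, d)
  # Step 2: draw each class from its own multivariate normal.
  for (k in seq_len(G)) {
    idx <- which(cl == k)
    if (length(idx) > 0) {
      z <- matrix(rnorm(length(idx) * d), length(idx), d)
      x[idx, ] <- sweep(z %*% chol(sigma[, , k]), 2, mu[, k], "+")
    }
  }
  structure(x, classification = cl)
}

mu <- cbind(c(0, 0), c(5, 5))
sigma <- array(c(diag(2), diag(c(1, 9))), c(2, 2, 2))
simdat <- simulate_mixture(200, mu, sigma, pro = c(0.5, 0.5))
table(attr(simdat, "classification"))
```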
simE Simulate from a Parameterized MVN Mixture Model
Description
Simulate data from a parameterized MVN mixture model.
Usage
simE(mu, sigmasq, pro, ..., seed = 0)
simV(mu, sigmasq, pro, ..., seed = 0)
simEII(mu, sigmasq, pro, ..., seed = 0)
simVII(mu, sigmasq, pro, ..., seed = 0)
simEEI(mu, decomp, pro, ..., seed = 0)
simVEI(mu, decomp, pro, ..., seed = 0)
simEVI(mu, decomp, pro, ..., seed = 0)
simVVI(mu, decomp, pro, ..., seed = 0)
simEEE(mu, pro, ..., seed = 0)
simEEV(mu, decomp, pro, ..., seed = 0)
simVEV(mu, decomp, pro, ..., seed = 0)
simVVV(mu, pro, ..., seed = 0)
Arguments
mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").
decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in cdens.
pro Component mixing proportions. If missing, equal proportions are assumed.
... Other terms describing variance:
Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.
sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
The form of the variance specification is the same as for the output of the em, me, or mstep methods for the specified mixture model.
n An integer specifying the number of data points to be simulated.
seed An integer between 0 and 1000, inclusive, for specifying a seed for random class assignment. The default value is 0.
Details
This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep, em, me, or EMclust to be passed directly without the need to specify individual parameters as arguments.
Value
A data set consisting of n points simulated from the specified MVN mixture model.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
sim, EMclust, mstepE, do.call
Examples
d <- 2
G <- 2
scale <- 1
shape <- c(1, 9)

O1 <- diag(2)
O2 <- diag(2)[,c(2,1)]
O <- array(cbind(O1,O2), c(2, 2, 2))
O

decomp <- list(d= d, G = G, scale = scale, shape = shape, orientation = O)
mu <- matrix(0, d, G) ## center at the origin
simdat <- simEEV(n=200, mu=mu, decomp=decomp, pro = c(1,1))

cl <- attr(simdat, "classification")
sigma <- array(apply(O, 3, function(x,y) crossprod(x*y),
                     y = sqrt(scale*shape)), c(2,2,2))
paramList <- list(mu = mu, sigma = sigma)
coordProj( simdat, paramList = paramList, classification = cl)
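The sigma computation in the example above is compact because it relies on R's recycling rule. The sketch below, which uses base R only and introduces the illustrative names sigma_compact and sigma_explicit, spells out the same product one component at a time so the two forms can be compared.

```r
# Rebuild the covariance array from the example's decomposition, step by step.
scale <- 1
shape <- c(1, 9)
O <- array(cbind(diag(2), diag(2)[, c(2, 1)]), c(2, 2, 2))

# Compact form used in the example: x * y scales row i of x by y[i],
# so crossprod(x * y) equals t(x) %*% diag(y^2) %*% x.
sigma_compact <- array(apply(O, 3, function(x, y) crossprod(x * y),
                             y = sqrt(scale * shape)), c(2, 2, 2))

# Explicit form of the same product, one component at a time.
sigma_explicit <- array(0, c(2, 2, 2))
for (k in 1:2) {
  sigma_explicit[, , k] <- t(O[, , k]) %*% diag(scale * shape) %*% O[, , k]
}
```

For these orientations the first component has variances 1 and 9 along the two axes and the second has them swapped, which is what the two orientation matrices are meant to express.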
spinProj Planar spin for random projections of data in more than two dimensions modelled by an MVN mixture.
Description
Plots random 2-D projections with successive rotations through specified angles given data in more than two dimensions and parameters of an MVN mixture model.
Usage
spinProj(data, ..., angles, seed = 0, reflection = FALSE,
         type = c("classification", "uncertainty", "errors"),
         ask = TRUE, quantiles = c(0.75,0.95), symbols, scale = FALSE,
         identify = FALSE, CEX = 1, PCH = ".", xlim, ylim)
Arguments
data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
... Any number of the following:
classification A numeric or character vector representing a classification of observations (rows) of data.
uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.
z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments are not available.
truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.
mu A matrix whose columns are the means of each group.
sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.
decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.
angles The angles (in radians) through which successive projections should be rotated or reflected.
seed An integer between 0 and 1000, inclusive, for specifying a seed for generating the initial random projection. The default value is 0. The seed/projection correspondence is the same as in randProj.
reflection A logical variable telling whether or not the data should be reflected or rotated through the given angles. The default is rotation.
type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.
ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.
quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75,0.95).
symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in classification (the order used by the S-PLUS function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.
scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE
identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.
CEX An argument specifying the size of the plotting symbols. The default value is 1.
PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".
xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.
Value
Rotations or reflections of a random projection of the data, possibly showing location of the mixture components, classification, uncertainty and/or classification errors.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
coordProj, randProj, mclust2Dplot, mclustOptions, do.call
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepVVV(irisMatrix, unmap(irisClass))

par(pty = "s", mfrow = c(2,2))
spinProj(irisMatrix, seed = 1, truth=irisClass,
         mu = msEst$mu, sigma = msEst$sigma, z = msEst$z)
do.call("spinProj", c(list(data = irisMatrix, seed = 2, truth=irisClass),
                      msEst))
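The three symbol sizes described for the quantiles argument correspond to a three-way split of the uncertainty values. A minimal base-R sketch of that split follows; the helper uncertainty_groups and the toy values are illustrative assumptions, and the treatment of points lying exactly on a quantile is a choice made here rather than something the text specifies.

```r
# Sketch of the three-way split described for the quantiles argument.
# The helper name and the toy uncertainties are illustrative assumptions.
uncertainty_groups <- function(uncertainty, quantiles = c(0.75, 0.95)) {
  breaks <- quantile(uncertainty, probs = quantiles)
  # findInterval gives 0 below the first break, 1 between the breaks,
  # and 2 at or above the second break.
  grp <- findInterval(uncertainty, breaks) + 1L
  factor(grp, levels = 1:3, labels = c("low", "medium", "high"))
}

u <- seq(0.01, 1, by = 0.01) / 2      # 100 toy uncertainty values
table(uncertainty_groups(u))
```

With the default quantiles, roughly three quarters of the points fall in the lowest group and about one in twenty in the highest.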
summary.EMclust Summary function for EMclust
Description
Optimal model characteristics and classification for EMclust results.
Usage
summary.EMclust(object, data, G, modelNames, ...)
Arguments
object An "EMclust" object, which is the result of applying EMclust to data.
data The matrix or vector of observations used to generate ‘object’.
G A vector of integers giving the numbers of mixture components (clusters) over which the summary is to take place (as.character(G) must be a subset of the column names of object). The default is to summarize over all of the numbers of mixture components used in the original analysis.
modelNames A vector of character strings denoting the models over which the summary is to take place (must be a subset of the row names of ‘object’). The default is to summarize over all models used in the original analysis.
... Not used. For generic/method consistency.
Value
A list giving the optimal (according to BIC) parameters, conditional probabilities z, and loglikelihood, together with the associated classification and its uncertainty.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
EMclust
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
summary(irisBic, irisMatrix)
summary(irisBic, irisMatrix, G = 1:6, modelName = c("VII", "VVI", "VVV"))
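The selection step behind the summary can be sketched on a toy BIC table laid out as the G and modelNames arguments imply, with models in rows and numbers of components in columns. The values below are made up, best_model is a hypothetical helper, and the sketch assumes the MCLUST convention that a larger BIC indicates a better model.

```r
# Toy BIC table with models in rows and numbers of components in columns,
# mirroring the layout implied by the G and modelNames arguments.
# The values are made up for illustration; NA marks a model that failed.
bic <- matrix(c(-1800, -1200, -1100, -1150,
                -1750, -1000,  -950,  -990,
                -1700,  -900,  -920,    NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("VII", "VVI", "VVV"), as.character(1:4)))

best_model <- function(bic, G = colnames(bic), modelNames = rownames(bic)) {
  sub <- bic[modelNames, as.character(G), drop = FALSE]
  # Assumption: larger BIC is better, as in the MCLUST convention.
  top <- max(sub, na.rm = TRUE)
  pos <- which(sub == top, arr.ind = TRUE)[1, ]
  list(modelName = rownames(sub)[pos[["row"]]],
       G = as.integer(colnames(sub)[pos[["col"]]]),
       bic = top)
}

best_model(bic)
best_model(bic, G = 3:4, modelNames = c("VII", "VVI"))
```

Restricting G or modelNames simply narrows the sub-table that is searched, which is the effect of the corresponding arguments of the summary method.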
summary.EMclustN summary function for EMclustN
Description
Optimal model characteristics and classification for EMclustN results.
Usage
summary.EMclustN(object, data, G, modelNames, ...)
Arguments
object An "EMclustN" object, which is the result of applying EMclustN to data with an initial noise estimate.
data The matrix or vector of observations used to generate ‘object’.
G A vector of integers giving the numbers of mixture components (clusters) over which the summary is to take place (as.character(G) must be a subset of the column names of ‘object’). The default is to summarize over all of the numbers of mixture components used in the original analysis.
modelNames A vector of character strings denoting the models over which the summary is to take place (must be a subset of the row names of ‘object’). The default is to summarize over all models used in the original analysis.
... Not used. For generic/method consistency.
Value
A list giving the optimal (according to BIC) parameters, conditional probabilities z, and loglikelihood, together with the associated classification and its uncertainty.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
EMclustN
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])

b <- apply( irisMatrix, 2, range)
n <- 450
set.seed(0)
poissonNoise <- apply(b, 2, function(x, n=n)
                      runif(n, min = x[1]-0.1, max = x[2]+.1), n = n)
set.seed(0)
noiseInit <- sample(c(TRUE,FALSE),size=150+450,replace=TRUE,prob=c(3,1))
irisNoise <- rbind(irisMatrix, poissonNoise)

Bic <- EMclustN(data=irisNoise, noise = noiseInit)
summary(Bic, irisNoise)
summary(Bic, irisNoise, G = 0:6, modelName = c("VII", "VVI", "VVV"))
summary.Mclust Very brief summary of an Mclust object.
Description
Function gives a brief summary of an Mclust object: the type of model that is picked and the number of clusters.
Usage
summary.Mclust(object, ...)
Arguments
object The result of a call to function Mclust.
... Not used.
summary.mclustDAtest Classification and posterior probability from mclustDAtest.
Description
Classifications from mclustDAtest and the corresponding posterior probabilities.
Usage
summary.mclustDAtest(object, pro, ...)
Arguments
object The output of mclustDAtest.
pro Prior probabilities for each class in the training data.
... Not used. For generic/method consistency.
Value
A list with the following two components:
classification The classification from mclustDAtest.
z Matrix of posterior probabilities in which the [i,j]th entry is the probability of observation i belonging to class j.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mclustDAtest
Examples
set.seed(0)
n <- 100 ## create artificial data

x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])

xclass <- c(rep(1,n),rep(2,n))
## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)

even <- odd + 1
test <- mclustDAtest(x[even, ], train) ## compute model densities
testSummary <- summary(test) ## classify test set

names(testSummary)
testSummary$class
testSummary$z
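How posterior probabilities and a classification follow from per-class model densities and the prior probabilities pro can be sketched with Bayes' rule in base R. The helper posterior_from_densities and the toy density matrix are hypothetical; the exact structure of the mclustDAtest output is not assumed here.

```r
# Sketch: posterior class probabilities from per-class densities and priors.
# dens[i, j] is a hypothetical density of observation i under class j's model.
posterior_from_densities <- function(dens, pro = rep(1 / ncol(dens), ncol(dens))) {
  unnorm <- sweep(dens, 2, pro, "*")      # prior times density
  z <- unnorm / rowSums(unnorm)           # normalize each row to sum to 1
  list(classification = apply(z, 1, which.max), z = z)
}

dens <- rbind(c(0.20, 0.10),
              c(0.05, 0.30))
post <- posterior_from_densities(dens, pro = c(0.5, 0.5))
post$z
post$classification
```

Each observation is assigned to the class with the largest posterior probability, so changing pro can change the classification even when the densities stay the same.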
summary.mclustDAtrain Models and classifications from mclustDAtrain
Description
The models selected in mclustDAtrain and the corresponding classifications.
Usage
summary.mclustDAtrain(object, ...)
Arguments
object The output of mclustDAtrain.
... Not used. For generic/method consistency.
Value
A list identifying the model selected by mclustDAtrain for each class of training data and the corresponding classification.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mclustDAtrain
Examples
set.seed(0)
n <- 100 ## create artificial data

x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])

xclass <- c(rep(1,n),rep(2,n))
## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)
surfacePlot Density or uncertainty surface for two dimensional mixtures.
Description
Plots a density or uncertainty surface given bivariate data and parameters of an MVN mixture model for the data.
Usage
surfacePlot(data, mu, pro, ...,
            type = c("contour", "image", "persp"),
            what = c("density", "uncertainty", "skip"),
            transformation = c("none", "log", "sqrt"),
            grid = 50, nlevels = 20, scale = FALSE, identify = FALSE,
            verbose = FALSE, xlim, ylim, swapAxes = FALSE)
Arguments
data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
mu A matrix whose columns are the means of each group.
pro Component mixing proportions.
... An argument specifying the covariance structure of the model. If the function is called indirectly via do.call (see example below), it is usually not necessary to know the precise form for this argument. This argument usually takes one of the following forms:
sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.
decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.
type Any subset of c("contour","image","persp") indicating the plot type. For more than one selection, users will be asked to choose from a menu.
what Any subset of c("density","uncertainty","skip") indicating what to plot. For more than one selection, users will be asked to choose from a menu. The "skip" option produces an empty plot, which may be useful if multiple plots are displayed simultaneously.
transformation Any subset of c("none","log","sqrt") indicating a transformation to be applied to the surface values before plotting. For more than one selection, users will be asked to choose from a menu.
grid The number of grid points (evenly spaced on each axis). The mixture density and uncertainty are computed at grid x grid points to produce the surface plot. Default: 50.
nlevels The number of levels to use for a contour plot. Default: 20.
scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale = FALSE.
identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.
verbose A logical variable telling whether or not to print an indication that the function is in the process of computing values at the grid points, which typically takes some time to complete.
xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.
swapAxes A logical variable indicating whether or not the axes should be swapped for the plot.
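The sigma and decomp forms of the covariance argument carry the same information. The following is a minimal sketch of converting one into the other in base R, assuming the common convention sigma_k = scale_k * O_k diag(shape_k) t(O_k); the exact normalisation used by the package is not stated in this section, and the toy values below are made up for illustration.

```r
## Assumed convention (not confirmed by this manual page):
##   sigma_k = scale_k * O_k %*% diag(shape_k) %*% t(O_k)
theta <- pi / 6
O <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)  # rotation

## Toy decomposition for G = 2 groups in 2 dimensions
decomp <- list(scale = c(1, 4),
               shape = cbind(c(2, 0.5), c(2, 0.5)),  # one column per group
               orientation = array(c(O, diag(2)), dim = c(2, 2, 2)))

G <- length(decomp$scale)
sigma <- array(0, dim = c(2, 2, G))
for (k in 1:G) {
  Ok <- decomp$orientation[, , k]
  ## %*% binds tighter than *, so the scalar multiplies the full product
  sigma[, , k] <- decomp$scale[k] * Ok %*% diag(decomp$shape[, k]) %*% t(Ok)
}
```

Here sigma[,,k] is the covariance matrix for the kth group, in the layout described for the sigma form above.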
Value
An invisible list with components x, y, and z in which x and y are the values used to define the grid and z is the transformed density or uncertainty at the grid points.
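To make the shape of the returned list concrete, here is a rough base R sketch of evaluating a bivariate Gaussian mixture density on a grid x grid lattice. The helpers dmvn2 and mixSurface are hypothetical names introduced only for this illustration; they are not part of the package and no transformation is applied.

```r
## Hypothetical helpers for illustration only (not mclust functions).
dmvn2 <- function(pts, mu, sigma) {
  ## bivariate normal density at each row of pts
  d <- sweep(pts, 2, mu)                    # centre the points
  q <- rowSums((d %*% solve(sigma)) * d)    # squared Mahalanobis distances
  exp(-q / 2) / (2 * pi * sqrt(det(sigma)))
}

mixSurface <- function(mu, sigma, pro, xlim, ylim, grid = 50) {
  x <- seq(xlim[1], xlim[2], length.out = grid)
  y <- seq(ylim[1], ylim[2], length.out = grid)
  pts <- as.matrix(expand.grid(x = x, y = y))  # x varies fastest
  dens <- numeric(nrow(pts))
  for (k in seq_along(pro))
    dens <- dens + pro[k] * dmvn2(pts, mu[, k], sigma[, , k])
  list(x = x, y = y, z = matrix(dens, grid, grid))  # z[i, j] is at (x[i], y[j])
}

mu <- cbind(c(0, 0), c(3, 3))                 # columns are the group means
sigma <- array(c(diag(2), diag(c(1, 4))), dim = c(2, 2, 2))
pro <- c(0.5, 0.5)                            # mixing proportions
surf <- mixSurface(mu, sigma, pro, xlim = c(-4, 8), ylim = c(-6, 10), grid = 50)
## contour(surf$x, surf$y, surf$z, nlevels = 20)
```

The list has the layout expected by the base graphics functions contour, image and persp, which is presumably why the three plot types share one return value.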
Side Effects
One or more plots showing location of the mixture components, classification, uncertainty, and/or classification errors.
Details
For an image plot, a color scheme may need to be selected on the display device in order to view the plot.
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
mclust2Dplot, do.call
Examples
n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
xEMclust <- summary(EMclust(x),x)
surfacePlot(x, mu = xEMclust$mu, sigma = xEMclust$sigma, pro=xEMclust$pro,
            type = "contour", what = "density", transformation = "none")
## Not run: do.call("surfacePlot", c(list(data = x), xEMclust))
uncerPlot Uncertainty Plot for Model-Based Clustering
Description
Plots the uncertainty in converting a conditional probability from EM to a classification in model-based clustering.
Usage
uncerPlot(z, truth, ...)
Arguments
z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.
truth A numeric or character vector giving the true classification of the data.
... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.
Details
When truth is provided and the number of classes is compatible with z, the function compareClass is used to find the best correspondence between classes in truth and z.
Value
A plot of the uncertainty profile of the data, with uncertainties in increasing order of magnitude. If truth is supplied and the number of classes is the same as the number of columns of z, the uncertainty of the misclassified data is marked by vertical lines on the plot.
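As a rough illustration of what the uncertainty profile contains, the sketch below computes it in base R for a made-up z matrix. It assumes that the uncertainty of an observation is one minus its largest conditional probability, and that the labels in truth already correspond to the columns of z (the matching step done by compareClass is skipped here).

```r
## Toy conditional probabilities: 5 observations, 3 components (made-up values)
z <- rbind(c(0.90, 0.05, 0.05),
           c(0.40, 0.35, 0.25),
           c(0.10, 0.80, 0.10),
           c(0.34, 0.33, 0.33),
           c(0.02, 0.03, 0.95))

classification <- apply(z, 1, which.max)  # most probable component per row
uncertainty <- 1 - apply(z, 1, max)       # assumed definition of uncertainty
ord <- order(uncertainty)                 # increasing order, as in the profile

truth <- c(1, 1, 2, 3, 3)                 # made-up labels, already matched to z
wrong <- classification != truth          # observations that would be marked

## plot(uncertainty[ord], type = "l")
## abline(v = which(wrong[ord]))          # vertical lines at misclassified points
```

With these values only the fourth observation is misclassified, and it is also the one with the largest uncertainty, so it would appear at the right-hand end of the profile.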
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
EMclust, em, me, mapClass
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisBic <- EMclust(irisMatrix)
irisSumry3 <- summary(irisBic, irisMatrix, G = 3)
uncerPlot(z = irisSumry3$z)
uncerPlot(z = irisSumry3$z, truth = rep(1:3, rep(50,3)))
do.call("uncerPlot", c(irisSumry3, list(truth = rep(1:3, rep(50,3)))))
unmap Indicator Variables given Classification
Description
Converts a classification into a matrix of indicator variables.
Usage
unmap(classification, noise, ...)
Arguments
classification A numeric or character vector. Typically the distinct entries of this vector would represent a classification of observations in a data set.
noise A single numeric or character value used to indicate observations corresponding to noise.
... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.
Value
An n by m matrix of (0,1) indicator variables, where n is the length of classification and m is the number of unique values or symbols in classification. Columns are labeled by the unique values in classification, and the [i,j]th entry is 1 if classification[i] is the jth unique value or symbol in order of appearance in the classification. If a noise value or symbol is designated, the corresponding indicator variables are located in the last column of the matrix.
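The construction described above can be sketched in a few lines of base R. The function indicatorMatrix below is a hypothetical stand-in written for illustration, not the package's implementation of unmap; it follows the stated rules (columns in order of appearance, noise moved to the last column).

```r
## Hypothetical stand-in for illustration; not the mclust implementation.
indicatorMatrix <- function(classification, noise = NULL) {
  classification <- as.character(classification)
  u <- unique(classification)               # unique values in order of appearance
  if (!is.null(noise) && as.character(noise) %in% u)
    u <- c(u[u != as.character(noise)], as.character(noise))  # noise last
  z <- outer(classification, u, "==") * 1   # logical comparisons -> 0/1
  dimnames(z) <- list(NULL, u)              # columns labeled by unique values
  z
}

cl <- c("b", "a", "noise", "b", "c")
z <- indicatorMatrix(cl, noise = "noise")
```

Each row of the result contains a single 1, so the matrix can serve as a hard starting assignment wherever conditional probabilities are expected.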
References
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
See Also
map, estep, me
Examples
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
z <- unmap(irisClass)
z
emEst <- me(modelName = "VVV", data = irisMatrix, z = z)
emEst$z
map(emEst$z)