CLADAG is a member of the International Federation of ... · CLADAG is a member of the...
Transcript of CLADAG is a member of the International Federation of ... · CLADAG is a member of the...
CLADAG is a member of the International Federation of ClassificationSocieties(IFCS).Amongitsactivities,CLADAGorganizesabiennialscientificmeeting, schools related to classification and data analysis, publishes anewsletter, and cooperates with othermember societies of the IFCS to theorganization of their conferences. The scientific program comprises threeKeynote Lectures, an Invited Session, 10 Specialized Sessions, 15 SolicitedSessions and 15 Contributed Sessions. All the Specialized and SolicitedSessions have been promoted by the members of the Scientific ProgramCommittee. The organizers wish to thank them for their cooperation incontributingtothesuccessofCLADAG2015.TheBookofAbstractscontainsshortpapersofallthepresentationsscheduledintheconferenceprogram.Itis organized according to type of session/lecture: Keynote Lectures,SpecializedSessions,SolicitedSessionsandContributedSessions.
CLADAG2015
10thScientificMeetingoftheClassificationandDataAnalysisGroup
oftheItalianStatisticalSociety
FlamingoResort,SantaMargheritadiPula,October8-10,2015
BOOKOFABSTRACTS
Editors:
FrancescoMola,ClaudioConversano
CUECEditricebySardegnaNovamediaSoc.Coop.ViaBasilicatan.57/5909127Cagliari,ItalyTel.&Fax(+39)[email protected]
FirstElectronicEditionCUEC©2016ISBN:978-88-8467-949-9Euro6,90
eBookDesign:GiovanniCaprioliwww.servizi-per-editoria.it•[email protected]
ParticipatingOrganizations
TableofContents
PrefaceConferenceThemesCommitteesKEYNOTELECTURES
MiningTextNetworks[DavidBanks]
Variableselectionformodel-basedclusteringofcategoricaldata[BrendanMurphy]
Eigenvaluesinmixturemodeling:geometric,robustnessandcomputationalissues[SalvatoreIngrassia]
SPECIALIZEDSESSION•RobustmethodsfortheanalysisofEconomic(Big)data[OrganizerandChair:SilviaSalini]
FastandRobustSeeminglyUnrelatedRegression[MiaHubert,TimVerdonck,OzlemYorulmaz]
ApplicationtotheDetectionofCustomsFraudoftheGoodness-of-fitTestingfortheNewcomb-BenfordLaw[LucioBarabesi,AndreaCerasa,AndreaCerioli,DomenicoPerrotta]
MonitoringtheRobustAnalysisofaSingleMultivariateSample[MarcoRiani,AnthonyC.Atkinson,AndreaCerioli]
•Bayesiannonparametricclustering[Organizer:FabrizioRuggeriChair:RenataRotondi]
ABaysiannonparametricApproachtoModelAssociationbetweenClustersofSNPsandDiseaseResponses[RaffaeleArgiento,AlessandraGuglielmi,ChuhsingKateHsiao,FabrizioRuggeri,CharlotteWang]
ABayesiannonparametricModelforClusteringandBorrowingInformation[AntonioLijoi,BernardoNipoti,IgorPrünster]
SequentialClusteringbasedonDirichletProcessPriors[RobertoCasarin,AndreaPastore,StefanoF.Tonellato]
•CausalInferencewithComplexDataStructures[OrganizerandChair:AlessandraMattei]
ShorttermimpactofPM10exposureonmortality:Apropensityscoreapproach[MichelaBaccini,AlessandraMattei,FabriziaMealli]
IdentificationandEstimationofCausalMechanismsinClusteredEncouragementDesigns:DisentanglingBedNetsusingBayesianPrincipalStratification[LauraForastiere,FabriziaMealli,TylervanderWeele]
Theeffectsofadropoutpreventionprogramonsecondarystudents’outcomes[EnricoConti,SilviaDuranti,AlessandraMattei,FabriziaMealli,NicolaSciclone]
•ClusteringinTimeSeries[OrganizerandChair:MicheleLa
Rocca]
ProbabilisticBoosted-OrientedClusteringofTimeSeries[AntonioD'Ambrosio,GianlucaFrasso,CarmelaIorio,RobertaSiciliano]
Copula-basedfuzzyclusteringoftimeseries[PierpaoloD'Urso,MartaDisegna,FabrizioDurante]
Comparingmulti-stepaheadforecastingfunctionsfortimeseriesclustering[MarcellaCorduas,GiancarloRagozini]
•MultiwayAnalysis[OrganizerandChair:GiuseppeBove]
(Interactive)visualisationofthreewaydata[CasperJ.Albers,JohnC.Gower]
Robustfuzzyclusteringofmultivariatetimetrajectories[PierpaoloD'Urso,RiccardoMassari]
EstimationproceduresforavoidingdegeneratesolutionsinCandecomp/Parafac[PaoloGiordani]
•BigDataAnalysis[OrganizerandChair:DonatoMalerba]
Towardsastatisticalframeworkforattributecomparisoninverylargerelationaldatabases[CesareAlippi,ElisaQuintarelli,ManuelRoveri,LetiziaTanca]
MiningBigDatawithhighperformancecomputingsolutions[FabrizioAngiulli,StefanoBasta,StefanoLodi,GianlucaMoro,ClaudioSartori]
EnhancingBigDataExplorationwithFacetedBrowsing[SoniaBergamaschi,GiovanniSimoniniandSongZhu]
•NewMethodologiesforCompositeIndicators[OrganizerandChair:AgostinoDiCiaccio]
AdvancesinComposite-basedPathModelingforSyntheticIndicators[VincenzoEspositoVinzi,LauraTrinchera,GiorgioRussolillo]
CompositeIndicatorsModeling[MaurizioVichi]
Measuringtheimportanceofvariablesincompositeindicators[WilliamBecker,MichaelaSaisana,PaoloParuolo,AndreaSaltelli]
•Clusteranalysissoftwareandvalidation[OrganizerandChair:ChristianHennig]
AdaptiveChoiceOfInputParametersInRobustClustering[LuisA.Garcìa-Escudero,AugustinMayo-Iscar]
RobustModel-basedClusteringwithCovarianceMatrixConstraints[PietroCoretto,ChristianHennig]
FlexibleImplementationofResamplingSchemesforClusterValidation[FriedrichLeisch]
•Selectingamixturemodelwithaclusteringfocus[OrganizerandChair:GillesCeleux]
ClusteringinfinitemixturesusinganIntegratedCompleted
Likelihoodcriterion[MarcoBertoletti,NialFrielandRiccardoRastelli]
EstimationandModelSelectionforModel-BasedClusteringwiththeConditionalClassificationLikelihood[Jean-PatrickBaudry]
OnthedifferentwaystocomputetheIntegratedCompletedLikelihoodcriterion[GillesCeleux]
•Exploringrelationshipsbetweenblocksofvariables[OrganizerandChair:GiorgioRussolillo]
WeightedMultiblockClustering[NdéyeNiang,MoryOuattara]
ThematicModelExplorationthroughMultipleCo-Structuremaximisation:MethodandSoftware[XavierBry,ThomasVerron]
ANewComponent-basedApproachofRegularisationforMultivariateGeneralisedLinearRegression[CatherineTrottier,XavierBry,FredericMortier,GuillaumeCornu]
SOLICITEDSESSION•AdvancesinDensity-basedclustering[OrganizerandChair:FrancescaGreselin]
ANonparametricClusteringmethodforImageSegmentation[GiovannaMenardi]
RobustClusteringforHeterogenousSkewData[LuisA.
Garcìa-Escudero,FrancescaGreselin,AgustinMayo-Iscar]
RegularizingfinitemixturesofGaussianDistributions[BettinaGrün,GertraudMalsiner-Walli]
•LatentvariablemodelsforlongitudinaldataPartI[OrganizerandChair:SilviaBacci]
AJointModelForLongitudinalandSurvivalDataBasedonanAR(1)LatentProcess[SilviaBacci,FrancescoBartolucci,SilviaPandolfi]
FiniteMixtureModelsforMixedData:EMAlgorithmsandParafacRepresentations[MarcoAlfò,PaoloGiordani]
OntheuseofthecontaminatedGaussiandistributioninHiddenMarkovmodelsforlongitudinaldata[AntonioPunzo,AntonelloMaruotti]
•LatentvariablemodelsforlongitudinaldataPartII[OrganizerandChair:FrancescoBartolucci]
AhiddenMarkovapproachtotheanalysisofincompletemultivariatelongitudinaldata[FrancescoLagona]
LatentMarkovandgrowthmixturemodels:acomparison[FulviaPennoni,IsabellaRomeo]
Latentworthsandlongitudinalpairedcomparison.AMarkovmodelofdependence[BrianFrancis,AlexandraGrand,ReginaDittrich]
•Multivariatedataanalysisinenvironmentalsciences[Organizer:FabrizioRuggeri;Chair:RaffaeleArgiento]
Multivariatedownscalingfornon-Gaussiandata[DanielaCocchi,LuciaPaci,CarloTrivisano]
PreliminaryresultsontaperingmultivariatespatiotemporalmodelsforexposuretoairbornemultipollutantsinEurope[AlessandroFassò,FrancescoFinazziandFerdinandNdongo]
Clusteringmacroseismicfieldsbystatisticaldatadepthfunctions[ClaudioAgostinelli,RenataRotondiandElisaVarini]
•Advancedmodelsfortourismanalysis[OrganizerandChair:StefaniaMignani]
Analysingterritorialheterogenetyintourist’satisfactiontowardsItaliandestinations[CristinaBernini,AugustoCerquaandGuidoPellegrini]
Micro-economicdeterminantsoftouristexpenditure:Aquantileregressionapproach[EmanuelaMarrocu,RaffaelePaciandAndreaZara]
Inequalitiesandtourismconsumptionbehaviour:amixturemodelanalysis[CristinaBernini,MariaFrancescaCracolici,CinziaViroli]
•BayesianNetworksandGraphicalModelsinSocio-EconomicSciences[OrganizerandChair:PaolaVicard]
BayesianNetworksforFirmPerformanceEvaluation[MariaE.DeGiuli,PietroGottardo,AnnaM.MoiselloandClaudiaTarantola]
Graphicalmodelusingcopulasformeasurementerrormodeling[DanielaMarella,PaolaVicard]
•TimeSeriesinClustering[OrganizerandChair:MicheleLaRocca]
ParsimoniousClusteringofTimeSeries[CarmelaIorio,AntonioD’Ambrosio,GianlucaFrasso,RobertaSiciliano]
DynamicTimeWarping-basedfuzzyclusteringforspatialtimeseries[PierpaoloD'Urso,MartaDisegna,RiccardoMassari]
PeriodicalFeatureBasedTimeSeriesClustering[FrancescoGiordano,MicheleLaRoccaandMariaLuciaParrella]
•BigDataAnalysis[OrganizerandChair:DonatoMalerba]
InteractiveMachineLearningwithR[GiorgioMariaDiNunzio]
Workloadestimationforacallcenter[PierluigiRivaandRuggieroScommegna]
PredictioninOliveOilTradeusingRegressionModelsonTemporalDataNetwork[CorradoLoglisci,UmbertoMedicamento,ArturoCasieri]
•AdvancesinOrdinalandPreferenceData[OrganizerandChair:AntonioD’Ambrosio]
Measuringconsensusinthesettingofnon-uniformqualitativescales[JoséL.Garcìa-Lapresta,DavidPérez-Romàn]
AccurateAlgorithmsforConsensusRankingDetection[GiulioMazzeo,AntonioD’Ambrosio,RobertaSiciliano]
LogisticRegressionTreesforOrdinalandPreferenceData[ThomasRusch,AchimZeileis,KurtHornik]
•CasestudiesindatasciencefromLiguriancompanies[OrganizerandChair:DelioPanaro]
Statisticalmethodsfortheanalysisof«Ostreopsisovata»bloomeventsfrommeteo-marinedata[EnnioOttaviani,ValentinaAsnaghi,MariachiaraChiantore,AndreaPedroncini,RosellaBertolotto]
Dataminingforoptimalgambling[GabrieleTorre,FabrizioMalfanti]
AFraudDetectionAlgorithmforOnlineBanking[DelioPanaro,EvaRiccomagno,FabrizioMalfanti]
Doesdirectors’backgroundmatter?Firmperformance,boardfeaturesandfinancialreportingreliability[DelioPanaro,SilviaFerramoscaandSaraTrucco]
•Modelingordinaldata[OrganizerandChair:MaurizioCarpita]
PosteriorpredictivemodelchecksforassessingthegoodnessoffitofBayesianmultidimensionalIRTmodels[MariagiuliaMatteucci,StefaniaMignani]
InternationaltourisminItaly:aBayesianNetworkapproach[FedericaCugnata,GiovanniPerucca]
Clusteringupperlevelunitsinmultilevelmodelsforordinaldata[LeonardoGrilli,AgnesePanzera,CarlaRampichini]
•Functionaldataanalysisforenvironmentaldata[OrganizerandChair:TonioDiBattista]
ClusteringSpatiallydependentFunctionalData:amethodbasedontheconceptofspatialdispersionfunctionofacurve[ElviraRomano,AntonioBalzanella,RosannaVerde]
Twocasestudiesonobjectorientedspatialstatistics[PiercesareSecchi,SimoneVantini,ValeriaVitelli]
Inferenceonfunctionalbiodiversitytools[TonioDiBattista,FrancescaFortuna,FabrizioMaturo]
•Advancesinquantileregression[OrganizerandChair:CristinaDavino]
M-quantileregression:diagnosticsandparametricrepresentationofthemodel[AnnamariaBianchi,EnricoFabrizi,NicolaSalvati,NikosTzavidis]
QuantileRegression:aBayesianRobustApproach[MarcoBottone,MauroBernardi,LeaPetrella]
Acomparisonamongestimatorsforlinearregressionmethods[MarilenaFurno,DomenicoVistocco]
HandlingheterogeneityamongunitsinQuantileRegression[CristinaDavino,DomenicoVistocco]
•DirectionalData[OrganizerandChair:GiovanniC.Porzio]
Smallbiasedcirculardensityestimation[MarcoDiMarzio,StefaniaFensore,AgnesePanzera,CharlesC.Taylor]
Adepth-basedclassifierforcirculardata[GiuseppePandolfo]
Nonparametricestimatesofthemodefordirectionaldata[ThomasKirschstein,SteffenLiebscher,GiovanniC.Porzio,GiancarloRagozini]
•Recentdevelopmentsinstatisticalanalysisofnetworkdata[OrganizerandChair:DomenicoDeStefano]
GameTheoryandNetworkModelsfortheReconstructionofArchaeologicalNetworks[VivianaAmati,UlrikBrandes]
AmodelforclusteringaspatialnetworkwithapplicationtoLocalLabourSystemidentification[FrancescoPauli,NicolaTorelli,SusannaZaccarin]
OnthesamplingdistributionsoftheMLestimatorsinNetworkEffectModels[MicheleLaRocca,GiovanniC.Porzio,MariaProsperinaVitale,PatrickDoreian]
CorrespondenceAnalysiswithDoublingforTwo-ModeValuedNetworks[GiancarloRagozini,DomenicoDeStefanoDanielaD’Ambrosio]
•Currentchallengesinclusteringandclassificationofbiomedicaldata[OrganizerandChair:AdalbertF.X.Wilhelm]
Semanticmulticlassifiersystemsforthedetectionofagingrelatedprocesses[HansA.Kestler,LudwigLausser,Lyn-RouvenSchirra,FlorianSchmid]
Emotionrecognitioninhumancomputerinteractionusingmultipleclassifiersystems[FriedhelmSchwenker]
Ensembleofselectedclassifiers[BertholdLausen,AsmaGul,ZardadKhanandOsamaMahmoud]
CONTRIBUTEDPAPERS
Ageneralizeddistanceforinferenceonfunctionaldata[AndreaGhiglietti,AnnaM.Paganoni]
Longgapsinmultivariatespatio-temporaldata:anapproachbasedonFunctionalDataAnalysis[MariantoniettaRuggieri,AntonellaPlaiaandFrancescaDiSalvo]
Effectsoncurveclusteringofdifferenttransformationsof
chronologicaltextualdata[MatildeTrevisaniandArjunaTuzzi]
Anoteonthereliabilityofaclassifier[LucaFrigau]
Robustifiedclassificationofmultivariatefunctionaldata[FrancescaIeva,AnnaM.Paganoni]
SizeControlofRobustRegressionEstimators[SilviaSalini,AndreaCerioli,FabrizioLaurini,MarcoRiani]
TheMovementsofEmotions:anExploratoryClassificationonAffectiveMovementData[PasqualeDente,ArvidKappas,AdalbertF.X.Wilhelm]
ElectreTri-MachineLearningApproachtotheRecordLinkageProblem[ValentinaMinnetti,RenatoDeLeone]
QualityofClassificationapproachesforthequantitativeanalysisofinternationalconflict[AdalbertF.X.Wilhelm]
ThertclustProcedureforRobustClustering[FrancescoDotto,AlessioFarcomeni,LuisAngelGarcìa-Escudero,AgustinMayo-Iscar]
Whatarethetrueclusters?[ChristianHennig]
Anovelmodel-basedclusteringapproachformassivedatasetsofspatiallyregisteredtimeseries.Withapplicationtoseasurfacetemperatureremotesensinfdata[FrancescoFinazzi,MarianScott]
BigDataClassification:Simulationsinthemanyfeaturescase[ClausWeihs]
FromBigDatatoinformation:statisticalissuesthroughexamples[SilviaBiffignandi,SerenaSignorelli]
Bigdatameetpharmaceuticalindustry:anapplicationonsocialmediadata[CaterinaLiberati,PaoloMariani]
Definingthesubjectsdistanceinhierarchicalclusteranalysisbycopulaapproach[AndreaBonanomi,MartaNaiRuscone,SilviaAngelaOsmetti]
Supervisedclassificationofdefectivecrankshaftsbyimageanalysis[BeatrizRemeseiro,JavierTarrìo-Saavedra,MarioFrancisco-Fernàndez,ManuelG.Penedo,SalvadorNaya,RicardoCao]
ArchetypalAnalysisforData-DrivenPrototypeIdentification[GiancarloRagozini,FrancescoPalumbo,MariaR.D’Esposito]
PrincipalComponentAnalysisofComplexDataandApplicationtoClimatology[SergioCamizandSilviaCreta]
SparseexploratorymultidimensionalIRTmodels[LaraFontanella,SaraFontanella,PasqualeValentini,NickolayTrendafilov]
IterativeFactorClusteringforCategoricaldataReconsidered[AlfonsoIodiceD’Enza,AngelosMarkos,FrancescoPalumbo]
TestingAntipodalSymmetryofCircularData[GiovanniCasale,GiuseppePandolfo,GiovanniC.Porzio]
Howtodefinedevianceresidualsinmultinomialregression[GiovanniRomeo,MariangelaSciandra,MarcelloChiodi]
DiagnostictoolsforGAMLSSfittedobjects[AndreaMarletta,MariangelaSciandra]
BayesianRegressionAnalysiswithLinkedandDuplicatedData[AndreaTancredi,RebeccaSteorts,BruneroLiseo]
Asemi-parametricFayHerriot-typemodelwithunknownsamplingvariances[SilviaPolettini]
PosteriorDistributionsfromOptimallyB-RobustEstimatingfunctionsandApproximateBayesianComputation[IvanLucianoDanesi,FabioPiacenza,ErlisRuli,LauraVentura]
MCABasedCommunityDetection[CarloDrago]
Classifyingsocialrolesbynetworkstructures[SimonaGozzo,VeneraTomaselli]
AmultilevelHeckmanmodeltoinvestigatefinancialassetsamongoldpeopleinEurope[OmarPaccagnella,ChiaraDalBianco]
OptimalPricingUsingBayesianSemiparametricPriceResponseModels[WinfriedJ.Steiner,AnettWeber,StefanLangandPeterWechselberger]
Monetarytransmissionmodelsforbankinginterestrates[LauraParisi,PaoloGiudici,IgorGianfrancescoandCamilloGiliberto]
Estimatingtheeffectofprenatalcareonbirthoutcomes[EmilianoSironi,MassimoCannas]
Recursivepartitioning:anapproachbasedontheweightedKemenydistance[MariangelaSciandra,AntonellaPlaia,VeronicaPicone]
Whytostudyabroad?Anexampleofclustering[ValeriaCaviezel,AnnaM.Falzoni]
Agraphicalcopula-basedtoolfordetectingtaildependence[RobertaPappadà,FabrizioDurante,NicolaTorelli]
Classificationmodelsastoolofbankruptcyprediction–Polishexperience[JózefPociecha,BarbaraPawełek,MateuszBaryła,SabinaAugustyn]
Therelationshipbetweenindividualpriceresponseofbeerconsumersandtheirdemographic/psychographiccharacteristics[FriederikePaetz]
TheEnsembleConceptualClusteringofSymbolicDataforCustomerLoyaltyAnalysis[MarcinPełka]
Consumers’perceptionsofCorporateSocialResponsibilitiesandwillingnesstopay:APartialLeastSquares[KarstenLübke,ChristianHose,ThomasObermeier]
InspectingthequalityofItalianwinethroughcausalreasoning[EugenioBrentari,MaurizioCarpita,SilviaGolia]
Exploringsocio-economicfactorsassociatedwithadherencetotheMediterraneandiet:amultilevelapproach[TizianaLaureti,LucaSecondi]
Bigdataand‘social’reputation:afinancialexample[PaolaCerchiello]
BayesianNetworksforStockPicking[AlessandroGreppi,MariaElenaDeGiuli,ClaudiaTarantola]
PortfolioselectionwithLassoalgorithm[RiccardoBramante,SilviaFacchinetti,DiegoZappa]
SunspotinEconomicModelswithExternalities[BeatriceVenturiandAlessandroPirisinu]
SequentialClusteringbasedonDirichletProcessPriors
RobertoCasarin,AndreaPastoreandStefanoF.Tonellato1
1 Department of Economics, Ca’ Foscari University of Venice, (e-mail:[email protected])
Abstract: This paper proposes a new sequential clusteringmethod based on the sequential estimation of the randompartitioninducedbytheDirichletprocess.OurapproachreliesonSequentialImportanceResampling(SIR)andontheestimationofthe posterior probabilities that each pair of individuals aregeneratedbythesamemixturecomponent.Suchestimatesdonotrequire the identification ofmixture components, and thereforeare not affected by label switching. Then, a dissimilaritymatrixcan be easily built, allowing for the implementation ofagglomerativeclusteringmethods.
Keywords:Dirichlet process, sampling importance resampling,agglomerativeclustering.
1DirichletprocessmixtureA very important class ofmodels inBayesiannonparametrics isbasedontheDirichletprocessandisknownasDirichletprocessmixture (Antoniak, 1974). In thismodel, theobservable randomvariables, Xi, i = 1,…, n, are assumed to be exchangeable andgeneratedbythefollowinghierarchicalmodel:
where DP(α, G0) denotes a Dirichlet process (DP) with basemeasure G0 and precision parameter α > 0. Since the DPgenerates almost surely discrete random measures on theparameterspaceΘ,tiesamongtheparametervalueshavepositiveprobability,leadingtoabatchofclustersoftheparametervectorθ=[θ1,…,θn]T.ExploitingthePolyaurnrepresentationoftheDP,themodelcanberewrittenas
where{k}={1,…,k},s<i={sj,j∈{i−1}}(intherestofthepaper,thesubscript<iwillrefertothosequantitiesthat involvealltheobservationsXi'suchthati'<i),sj∈{k}forj∈{k−1},andnjisthenumber of θ1’s equal to θ*j. In this model representation, theparameterθcanbeexpressedas(s,θ*),withs={si:si∈{k}, i∈
{n}},θ*=[θ*1,…,θ*k]Twithθ*j~iidG0andθi=θ*si.Consequently,
themarginal distribution ofXi is amixturewithk components,wherekisanunknownrandominteger.
In a parametric non Bayesian approach, it would be quitestraightforwardtoclusterthedatabymaximisingtheprobabilityof the allocation of each datum to one of the k clusters (with kfixed and known), conditionally on the observed sample
(McLachlan&Peel,2000).Unfortunately,undertheassumptionswemade, such computations are not feasible even numerically,due to the well known label switching problem (Frühwirth-Schnatter, 2006). Nevertheless, equations (1)-(4) will be veryhelpfulinbuildingahierarchicalclusteringalgorithmbasedonaBayesiannonparametricmodelspecification.
2SamplingimportanceresamplingUnder the assumptions we introduced above, following thearguments of MacEachern et al., 1999, we can write theconditionalposteriordistributionofsigivenx1,…,xi,as
Wecanmarginalisetheconditionalposteriorofsiwithrespecttoθ*,obtaining
NoticethatwhenG0 isaconjugatepriorfor(1),thecomputationof(5)and(6)isoftenstraightforward.
MacEachernetal.,1999,introducedthefollowingimportancesampler.
SISalgorithm.Fori=1,…,n,repeatsteps(A)and(B)(A)Compute
withnk+1=α.(B)Generatesifromthemultinomialdistributionwith
TakingR independentreplicasofthisalgorithmweobtainsi(r), i
=1,…,n,r=1,…,R,andθ*j~p(θ|x(j)),withx(j)={xi:i∈{n},si=j},andcomputetheimportanceweights
such that ∑Rr=1wr = 1. Should the variance of the importance
weights be too small, the efficiency of the sampler could beimprovedbyresamplingasfollows(Cappéetal.,2005).ComputeNeff = (∑R
r=1w2r)(−1) = 1. IfNeff<R/2, drawR particles from the
current particle set with probabilities equal to their weights,replace the old particle with the new ones and assign themconstantweightswr=1/R.
3PairwisedissimilaritiesandhierarchicalclusteringIntuitively,wecanstatethattwoindividuals,iandj,aresimilarifxiandxjaregeneratedbythesamemixturecomponent,i.e.ifsi=
sj. Label switching prevents us from identifying mixturecomponents, but not from assessing similarities amongindividuals. In fact, the algorithm introduced in the previoussection may help us in estimating dissimilarities betweenindividuals.Theposteriorprobabilitythatxiandxjaregeneratedbythesamecomponent,i.e.theposteriorprobabilityoftheevent{si=sj},canbeestimatedas
whereI(x,y)=1 ifx=yandI(x,y)=0otherwise.Wecanthendefine a dissimilaritymatrixD with i j-th elementdij = 1 −pˆij,allowingustousestandardagglomerativehierarchicalclusteringmethodsbasedonposteriorevidence.
4DiscussionThe flexibility of Bayesian nonparametric models improvesrobustnessofclassificationwithrespecttofinitemixturemodels.Sampling importance resampling algorithms allow for efficientcomputations,particularlywhenthebasemeasureisconjugatetomodel likelihood. No restrictions on the parameters or postprocessingoftheposteriorsimulationsarerequired.
ReferencesAntoniak, C.E. 1974. Mixtures of Dirichlet processes with applications to
Bayesiannonparametricproblems.AnnalsofStatistics.,2,1152-1174.Cappé, O., Moulines, E., & T., Rydén. 2005. Inference in Hidden Markov
Models.NewYork:Springer.Frühwirth-Schnatter,S.2006.FiniteMixtureandMarkovSwitchingModels.
Berlin:Springer.MacEachern, S.N., Clyde, M., & Liu, J.S. 1999. Sequential importance
sampling for nonparametric Bayes models: The next generation. TheCanadianJournalofStatistics,27,251-267.
McLachlan,G.,&Peel,D.2000.FiniteMixtureModels.NewYork:Wiley.