46TH SCIENTIFIC MEETING OF THE ITALIAN
STATISTICAL SOCIETY
Sapienza University of Rome - Faculty of Economics
Via del Castro Laurenziano, 9
Roma, IT
June 20, 2012 – June 22, 2012
CALL FOR PAPERS
SPRINGER BOOK: Studies in Theoretical and Applied Statistics
A volume in the new international series Studies in Theoretical and Applied Statistics will be published
by Springer under the supervision of the Italian Statistical Society. The book will be edited by Prof. G.
Alleva (Sapienza University of Rome), Prof. A. Giommi (University of Florence) and Prof. C.D.
Paulino (University of Lisbon).
All those who presented a paper/poster at the SIS2012 meeting are invited to submit an extended version
of their presentation for possible publication. The submission deadline is October 31st. Papers are sent
to reviewers immediately after submission. Early submissions are encouraged. Submissions will
undergo a blind reviewing process and only the selected papers will be included in the book.
Papers must be written in English according to the same standards used for submission to the conference
(http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/overview)
and must not exceed 10 pages. Use of the LaTeX standards is strongly encouraged.
Papers can be submitted on-line through the conference web site by selecting the "Springer Book" track. To
submit a paper, log in with the same username and password you used to submit your presentation at the
conference. Please submit PDF files only. The Word or LaTeX source will be requested from the authors
after acceptance of the paper.
Every two years the Italian Statistical Society (SIS) promotes an international scientific meeting that
welcomes both methodological and applied statistical research papers. Founded in 1939, the Italian
Statistical Society is a non-profit scientific society whose aim is to promote the development of the
statistical sciences and their applications. It publishes an international journal (Statistical Methods and
Applications).
Today there are about one thousand members including academics and researchers from governmental
and private organisations, active in statistical methodology, probability, economic statistics and
demography.
The 46th Scientific Meeting of the Italian Statistical Society will take place in Rome, in the period June
20, 2012 – June 22, 2012.
The conference will include 4 plenary sessions, 17 specialized sessions, 22 solicited sessions, and a
number of contributed sessions. The specialized sessions include 6 sessions, organized by the Royal
Statistical Society, the Société Française de Statistique, the Sociedad de Estadística e Investigación
Operativa, the Sociedade Portuguesa de Estatística and the Deutsche Statistische Gesellschaft. See the
conference overview for a list of session titles.
SOCIAL DINNER
The social dinner will be held on Thursday evening, 21st June, at the Cloister of S. Pietro in Vincoli,
Faculty of Engineering, Sapienza University of Rome, Via Eudossiana, 18.
Before dinner, there will be a short guided tour of the Basilica of S. Pietro in Vincoli which houses
Michelangelo's famous Moses statue. After dinner, the orchestra MUSA-Sapienza will perform a
concert of classical music.
The cost of the social dinner is €35 for registered participants and €50 for non-registered participants
and accompanying persons. The visit to the Basilica and the concert are free.
For organizational reasons, please confirm your participation in the dinner by sending an e-mail to the
Local Organizing Committee ([email protected]) at your convenience, but preferably no later than
18th June 2012.
OVERVIEW
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/overview
Conference Venue
The 46th SIS Scientific Meeting will take place at Sapienza University of Rome, Faculty of
Economics, Via del Castro Laurenziano, 9.
Session Arrangements
The Conference will include plenary, specialized, solicited, contributed and poster sessions. There will
also be invited sessions from representatives of some European statistical societies.
Contributed papers must be submitted according to the directions below, before February 15, 2012: the Program
Committee will take care of the review process of these papers. Invited talks (to be included in
specialized and solicited sessions) must be submitted to the organizer (chair) of the session, who will
take care of the review process.
Contributed, solicited and specialized papers must be written in English, according to the conference
style standards (see the author guidelines below). Contributed and solicited papers cannot exceed 4
pages (including tables, figures and references). Specialized papers cannot exceed 8 pages.
Organization
The 46th Scientific Meeting of the Italian Statistical Society will be organized by the Department of
Methods and Models for Economics, Territory and Finance (MEMOTEF) and Department of Statistical
Sciences (DSS) of Sapienza University of Rome.
Organizing Team
The organizers for this international conference are:
SCIENTIFIC PROGRAM COMMITTEE
----------------------------------------------
Andrea Giommi (Chairman)
Giorgio Alleva
Silvia Biffignandi
Francesco Billari
Vincenza Capursi
Giuliana Coccia
Damiana Costanzo
Corrado Crocetta
Gustavo De Santis
Luigi Fabbris
Fabrizia Mealli
Andrea Pastore
Cira Perna
Marilena Pillati
Roberto Rocci
Roberta Siciliano
Nicola Torelli
Roberto Zelli
LOCAL ORGANIZING COMMITTEE
----------------------------------------------
Giorgio Alleva (Chairman)
Maria Maddalena Barbieri
Maria Caterina Bramati
Alessandra De Rose
Augusto Frascatani
Daniele Frongia
Stefano Antonio Gattone
Cristina Giudici
Giuseppina Guagnano
Francesco Lagona
Brunero Liseo
Vincenzo Lo Moro
Guido Pellegrini
Maria Grazia Pittau
Francesco Maria Sanna
Isabella Santini
Maria Rita Sebastiani
Riccardo Sucapane
Andrea Tancredi
Donatella Vicari
Sapienza University of Rome
Sapienza University of Rome was founded in 1303 by Pope Boniface VIII. It is the first university in
Rome and the largest university in Europe: a city within a city, with over 700 years of history, 145,000
students, over 4,500 professors and almost 5,000 administrative and technical staff.
Sapienza's governance includes an internal body: a Vice Rector and a group of Deputy Rectors, each
charged with specific activities, who support the Rector in the management of the University, also with
the cooperation of ad hoc committees.
Sapienza has a broad academic offering which includes over 250 degree programmes and 200 one- or
two-year professional courses. Sapienza has 59 libraries and 21 museums, as well as efficient student
services such as Ciao (information, welcoming and counselling centre), SoRT (counselling and
tutorship services) and assistance for disabled students.
As for students' origins, over 30,000 come from all parts of Italy and over 7,000 come
from abroad. Incoming and outgoing Erasmus students number about 1,000 per year. Sapienza
is implementing ICT services for students, such as online enrolment, a University e-mail address and
wireless hotspots around campus.
Sapienza plans and carries out important scientific investigations in almost all disciplines, achieving
high-standard results at both the national and the international level, thanks to the work of its 11
faculties, 66 departments and several centres devoted to scientific research. There are also more than
100 PhD programmes which cover almost all major fields of knowledge.
The first university in Rome is proud to have had many famous scholars among its students, such as
the poet Giuseppe Ungaretti, and to be considered an institution of capital importance in the field of
archaeological excavations, having achieved significant results in Libya, Syria, Turkey and on the
Palatine Hill in Rome. In physics, members of the so-called 'Via Panisperna' group – including the
scientists Enrico Fermi, Edoardo Amaldi and Emilio Segrè – made a crucial contribution to the
discipline and left an important legacy in subjects such as quantum physics, the physics of disordered
systems and astrophysics.
Sapienza also enhances research by offering opportunities to international scholars. Thanks to
a special programme for visiting professors, many foreign researchers and professors periodically come
to Sapienza, consolidating the quality of its education and research programmes.
Professor Luigi Frati has been the Rector of Sapienza University since November 2008. He has launched
a major innovation process which envisages full tax exemption as a reward for outstanding students, the
elimination of useless structures and the reorganisation of faculties.
Sapienza University of Rome is a public, autonomous and free university, involved in the development
of society through research, higher education and international cooperation.
Editorial standards
Papers must be written in English, in accordance with SIS technical standards. Follow these links for the
current standards: LaTeX and Word. Papers to be presented in specialized sessions must not exceed 8
pages. All the other papers (solicited and contributed) must not exceed 4 pages. All submissions, except
invited talks, are subject to a blind refereeing process.
PROGRAM
The program for this conference is available via the following link:
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/program
PRESENTATIONS AND AUTHORS
The presentations and authors for this conference are available via the following link:
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/presentations
PLENARY
Research advances and new challenges in Cluster Analysis
Maurizio Vichi
Handling Measurement Error in Survey Estimation using Accuracy Indicators
Chris Skinner
Integrating micro and macro data in historical demography
Marco Breschi
SPECIALIZED
A Bayesian nonparametric model for count functional data
Antonio Canale, David B. Dunson
ROI analysis of pharmafMRI data: an adaptive approach for global testing
Giorgos Minas, John A.D. Aston, Thomas E Nichols, Nigel Stallard
Distance - Based Statistics for Covariance Operators in Functional Data Analysis
Davide Pigoli
Clustering Multivariate Longitudinal Data: Hidden Markov of Factor Analyzers
Antonello Maruotti, Francesca Martella
Model based clustering of multivariate spatio-temporal data: a matrix-variate
approach
Cinzia Viroli
Random coefficient based dropout models: a finite mixture approach
Alessandra Spagnoli, Marco Alfò
Bayesian inference for causal effects in randomized experiments with
noncompliance: The role of multivariate outcomes
Fan Li, Alessandra Mattei, Fabrizia Mealli
Unconditional and Conditional Quantile Treatment Effect: Identification
Strategies and Interpretations
Margherita Fort
Dealing with complex problems of confounding in mediation analysis
Stijn Vansteelandt
A Unified Approach for Defining Optimal Multivariate and Multi-Domains
Sampling Designs
Pietro Demetrio Falorsi, Paolo Righi
Forest Inventories: Multi-Phase Sampling Strategies for Estimating Forest and
Non-Forest Resources Over Large Areas
Lorenzo Fattorini
Recent Advances in Estimation of Poverty Indicators for Domains and Small
Areas
Risto Lehtonen, Ari Veijanen
Weighted likelihood in Bayesian inference
Claudio Agostinelli, Luca Greco
Disclosure risk estimation via nonparametric log-linear models
Cinzia Carota, Maurizio Filippone, Roberto Leombruni, Silvia Polettini
Bayesian Latent Class Models in Veterinary and Human Epidemiology
Luzia Goncalves, Ana Subtil, Nuno Brites, M. Rosario de Oliveira, Ana
Margarida Alho, Jose Meireles, L. M. Madeira de Carvalho, Silvana Belo
STAR modeling of pulmonary tuberculosis delay-time in diagnosis
Bruno de Sousa, Dulce Gomes, Patrícia A. Filipe, Cristina Areias, Teodoro
Briz, Carla Nunes
ROC Curves in medical decision
Ana Cristina Braga, Lino Costa, Pedro Oliveira
On Dividing an Empirical Distribution into Optimal Segments
Jan W. Owsiński
A composite indicator of sustainable well-being: the relative importance of
weights in the European Strategy for Sustainable Development
Elena Giachin Ricca, Stefano Tarantola
Non-aggregative assessment of subjective well-being
Marco Fattore
On the Extraction of a Common Persistent Component from Several Volatility
Indicators
Fabrizio Cipollini, Giampiero M. Gallo
Estimating jumps in volatility using realized-range measures
Massimiliano Caporin, Eduardo Rossi, Paolo Santucci de Magistris
The Value of Model Sophistication: DJIA Option Pricing
Jeroen V.K. Rombouts, Lars Stentoft, Francesco Violante
Families in Italy: the Quiet Revolution
Silvana Salvini, Gustavo De Santis
Enterprise in a globalised context and public and private statistical setups
Fabrizio Guelpa, Giovanni Foresti, Stefania Trenti
Imputation and outlier detection in banking datasets
Andrea Pagano, Domenico Perrotta, Spyros Arsenis
Something for nothing?
Shirley Coleman
On consistency of Bayesian variable selection procedures
Elias Moreno, Javier Giron, George Casella, Lina Martinez, F. Vazquez-Polo, Maria Martel
Dynamic Classification Trees for imprecise data
Massimo Aria, Valentina Cozza
An innovative procedure for smoothing parameter selection
Gianluca Frasso, Paul Eilers
The Italian Statistical Institute Macroeconometric Model - MEMo-It
Fabio Bacchini, et al.
A statistical overview of the economic situation in the euro area
Gian Luigi Mazzi, Filippo Moauro, Rosa Ruggeri Cannata
M-Estimation of Shape Matrices under Incomplete and Serially Dependent Data
Gabriel Frahm
Convergence of Depths and Depth-Trimmed Regions
Rainer Dyckerhoff
On Robustifying Some Blind Source Separation Methods for Second Order
Nonstationary Data
Klaus Nordhausen
The impact of the three crises on health in Italy: evidence and lack of adequate
information systems
Giovanni Fattore
LISA map based on distances for functional data
Pedro Delicado, Sonia Broner
Smoothing mortality risks in space and time using flexible models
Maria Dolores Ugarte, Toma Goicoa, Jaione Etxeberria, Ana F. Militino
Economic recession and fertility in the developed world: Past evidence and recent
trends
Tomas Sobotka
Generalized boosted additive models
Sonia Amodio, Jacqueline Meulman
Geometrical approaches to the analysis of threshold exceedances
José M. Angulo, Ana E. Madrid
Conditional simulations for spatial max-stable processes for climate applications
Liliane Bel
Nonparametric estimation of the division rate of a size-structured population
Vincent Rivoirard
PM10 forecasting using mixture linear regression models
Jean-Michel Poggi
Employment outcomes of Short-time work scheme and Unemployment insurance
program beneficiaries: a longitudinal approach.
Maurizio Sorcioni, Giuseppe De Blasio
SOLICITED
Which family model makes couples more happy - dual earner or male
breadwinner ?
Anna Baranowska-Rataj, Anna Matysiak
Socioeconomic determinants of persistence in poor subjective health
Paolo Li Donni, Daria Mendola
Family structures and subjective wellbeing in Italy
Silvia Montecolle, Francesca Rinesi, Alessandra Tinto
Identifiability of Discrete Graphical Models with Hidden Variables
Marco Valtorta, Elizabeth S. Allman, John A. Rhodes
Identifiability of hierarchical loglinear models with one hidden binary variable
Barbara Vantaggi
Binary models of marginal independence: a comparison of different approaches
Monia Lupparelli, Luca La Rocca
Small Area Estimation with Uncertain Random Effects
Gauri Sankar Datta, Abhyuday Mandal, Anthony Wanjoya
Interacting Multiple Try Algorithms
Roberto Casarin, Radu Craiu, Fabrizio Leisen
Higher-order asymptotics in Bayesian inference
Laura Ventura, Walter Racugno
Online Detection of Outliers and Structural Breaks
Giovanni Petris
Estimating the prevalence of cancer patients who have recurred
Angela B. Mariotto, Roberta De Angelis, Lucia Martina
Multivariate Permutation Test to Compare Survival Curves for Matched Data
Stefania Galimberti
Patterns of care and related costs of cancer prevalent cases by phase of disease
Silvia Francisci, Anna Gigli
Estimating the incidence of cancer disease using a Bayesian backcalculation
approach
Leonardo Ventura
Bayesian T-optimal designs by simulation: a case-study on model discrimination
Rossella Berni, Federico M Stefanini
From Markov moves in contingency tables to linear model estimability
Roberto Fontana, Fabio Rapallo, Maria Piera Rogantin
Sensitivity Analysis and FANOVA Graphs for Computer Experiments
Jana Fruth, Sonja Kuhnt
Factorial Graphical Lasso and Slowly Changing Graphical Models for Estimating
Dynamic Networks
Antonino Abbruzzo, Ernst Wit
New Statistics for Estimating the Parameter of the Stochastic Actor-Oriented Model
Viviana Amati
Graph embedding via dissimilarity mapping for network comparison
Domenico De Stefano
Statistical models for virtual water network analysis
Alessandra Petrucci, Emilia Rocco
Latent Class CUB Models
Leonardo Grilli, Maria Iannario, Domenico Piccolo, Carla Rampichini
Formative and reflective models to determine latent construct
Anna Simonetto
Log-ratios analysis to study the relative information of ordinal variables
Michele Gallo
Ordinal Models for Financial Evaluation
Paola Cerchiello
Bayesian model averaging for financial evaluation
Silvia Figini
Labour market response models for university evaluation
Daniele Checchi, Silvia Salini
A Statistical Framework to Measure Reputation Risk
Tiziano Bellini, Luigi Grossi
The integration of administrative data to analyse the business economic
performance: methodological aspects and results of a study
Fulvia Cerroni, Viviana De Giorgi, Marianna Mantuano
The micro economics of trade patterns and firm performances
Giovanni Dosi, Marco Grazzi, Federico Tamagni, Chiara Tomasi
The post-entry effect of exporting on productivity: inference on the counterfactual
distribution
Maria Ferrante, Marzia Freo, A. Viviani
Data Integration and Productivity Estimation at a Firm Level
Filippo Oropallo, Stefania Rossetti
A PLS algorithm version working with ordinal variables
Giuseppe Boari, Gabriele Cantaluppi
Bivariate logistic models for the analysis of the Students' University “Success”
Marco Enea, Massimo Attanasio
University admission test and students’ careers: an analysis through a regression
chain graph with a hurdle model for the credits
Leonardo Grilli, Carla Rampichini, Roberta Varriale
Comparing degree programs using unadjusted performance indicators. Assessing
the bias from the Potential Confounding Factors
Mariano Porcu, Isabella Sulis
University of Pisa and academic performance: a sample survey on students with
no exams in 2011
Lucio Masserini, Monica Pratesi
Assessing the Impact of Financial Aids to Firms: Causal Inference in the presence
of Interference
Bruno Arpino, Alessandra Mattei
Inverse probability weighting to estimate causal effects of sequential treatments: a
latent class extension to deal with unobserved confounding
Francesco Bartolucci, Leonardo Grilli, Luca Pieroni
A two-part geoadditive model for geographical domain estimation
Chiara Bocci, Alessandra Petrucci, Emilia Rocco
Application of Marginal Structural Models in Chronic Kidney Disease (CKD)
Epidemiology: practical implementation in the Swedish National CKD Registry
Elena Pasquali, Marie Evans, Juan Jesus Carrero, Rino Bellocco
A Dimension Reduction Method for Approximating Integrals in Latent Variable
Models for Binary Data
Silvia Bianconcini, Silvia Cagnone, Dimitris Rizopoulos
Kalman Filter for Maximum Likelihood Estimation of GMRFs
Luigi Ippoliti, Luca Romagnoli
Monte Carlo Likelihood Inference in Multivariate Model-Based Geostatistics
Marco Minozzo, Clarissa Ferrari
Statistical Modelling of Spatial Extremes
Anthony Davison, Simone Padoan, Mathieu Ribatet
Nonparametric smoothing of circular data
Agnese Panzera, Charles C. Taylor
Inverse Batschelet Distributions as Models for Circular Data
Arthur Pewsey
Depth Analysis of Directional Data
Claudio Agostinelli, Mario Romanazzi
Simulation of random rotation matrices
John T. Kent, Asaad M. Ganeiber
Dynamically modelling of fuzzy sets for flexible data retrieval
Miroslav Hudec
Factor PD-Co-clustering in Official Statistics
Marina Marino, Germana Scepi, Cristina Tortora
Extracting meta-information by using Network Analysis tools
Agnieszka Stawinoga, Maria Spano, Nicole Triunfo
How the text mining measures complex phenomena in official statistics
Sergio Bolasco, Pasquale Pavone
Assessing assumptions for data fusion procedures
Alfonso Piscitelli, Antonio D'ambrosio
Filling in long gap sequences by performing EOF and FDA jointly
Antonella Plaia, Francesca Di Salvo, Mariantonietta Ruggeri, Gianna Agrò
Missing Data Imputation within the Statistical learning Paradigm
Antonio D'Ambrosio
The use of administrative registers in the 2011 Census in Germany
Stephanie Hirner
Robust methods for correction and control of Italian Agriculture Census data
Alessandra Reale, Francesca Torti, Marco Riani
The "industry and services continuous census" based on administrative sources:
opportunities and problems
Manlio Calzaroni, Caterina Viviano
Using coarse resolution satellite images for crop area estimation: benchmarking
their efficiency
Javier Gallego, Mohamed El-Aydam
How to select sample sites onto a study area?
Lucio Barabesi, S. Franceschi, M. Marcheselli
Do personal characteristics affect the Rasch measures of perceived physical risk?
A quantile regression approach.
Fabio Aiello, Giovanni Boscaino, Monica Mandalà
Modeling nonignorable missingness in multidimensional latent class IRT models
Silvia Bacci, Francesco Bartolucci, Bruno Bertaccini
Risk profile versus portfolio selection: a case study
Valeria Caviezel, Sergio Ortobelli Lozza, Lucio Bertoli Barsotti
Cycles, Syllogisms and Semantics: Examining the Idea of Spurious Cycles
Stephen Pollock
Spectral filtering for trend estimation
Marco Donatelli, Alessandra Luati, Andrea Martinelli
Robust estimation for multivariate data under the independent contamination
model
Claudio Agostinelli, R. A. Maronna, V. J. Yohai
Adaptive robust location-scale estimation
Pietro Coretto
Minimum Volume Peeling: a Multivariate Mode Estimator
Giovanni Porzio, Giancarlo Ragozini, Steffen Liebscher, Thomas Kirschstein
A comparison of robust methods with small sample experimental data
Marco Riani, Andrea Cerioli, Maria Adele Milioli, Gianluca Morelli
Patterns of Mortality Decline and Individual Ageing: An Overview
Elisabetta Barbi
Survival predictive models in centenarians
Rossella Miglio, Paola Gueresi
Health status in over 85 years old living in Residential Facilities in Italy
Giulia Cavrini, Claudia Di Priamo, Lorella Sicuro, Alessandra Battisti,
Alessandro Solicapa, Giovanni de Girolamo
CONTRIBUTED
Variation in Obstetric Intervention Rates across Hospitals in Sardinia
Massimo Cannas, Emiliano Sironi
On the role of normalized inverse-Gaussian priors in continuous-time models
Matteo Ruggiero
Randomly Reinforced Urn Designs whose Allocation Proportions Converge to
Arbitrary Prespecified Values
Giacomo Aletti, Andrea Ghiglietti, Anna Maria Paganoni
Calibration estimation in dual frame surveys
Maria Giovanna Ranalli, Annalisa Teodoro
Comparing model-assisted estimators of structural variables in forest surveys
Ivan Arcangelo Sciascia, Matteo Garbarino, Giorgio Vacchiano, Renzo
Motta
A study in panel cointegration and poolability: Long-run money demand equations
for Gulf Cooperation Council countries
Stefano Fachin
Uncertainty in statistical matching for discrete categorical variables
Pier Luigi Conti, Daniela Marella, Mauro Scanu
Independent Component Analysis of Milan Mobile Network Data
Paolo Zanini, Piercesare Secchi, Simone Vantini
An Objective Bayesian analysis of dichotomous sensitive data
Maria Maddalena Barbieri, Brunero Liseo
Ensuring comparability over time and between domains by means of complex
sample techniques
Tiziana Tuoto, Francesca Inglese
Confidence intervals for the Berger & Boos’ procedure in the 2x2 Binomial Trial
Enrico Ripamonti, Piero Quatto
Bayesian inference for the multivariate skew-normal model: a Population Monte
Carlo approach
Antonio Parisi, Brunero Liseo
Reconstructing a multinormal covariance matrix from its spherically truncated
projection
Filippo Palombi, Simona Toti, Romina Filippini
Clustering of financial time series in extreme scenarios
Roberta Pappadà, Fabrizio Durante
Investments in Renewable energies: evidence from a panel of countries
Giuseppe Scandurra
A Topological Definition of Phase and Amplitude Variability of Functional Data
Simone Vantini
Nonparametric saddlepoint test and pairwise likelihood inference
Nicola Lunardon, Elvezio Ronchetti
On the stationarity of the Threshold Autoregressive process: the two regimes case
Marcella Niglio, Francesco Giordano, Cosimo Damiano Vitale
Filling in long gap sequences by performing EOF and FDA jointly
Francesca Di Salvo, Mariantonietta Ruggieri, Gianna Agrò
Parallel Adaptive Markov chain Monte Carlo with applications
Mauro Bernardi, Lea Petrella
Modern Bayesian Inference in Zero-Inflated Poisson Models
Erlis Ruli, Laura Ventura
A comparison of different procedures for combining high-dimensional
multivariate volatility forecasts
Giuseppe Storti, Alessandra Amendola
Estimation of wind speed prediction intervals by multi-objective genetic
algorithms and neural networks
Valeria Vitelli
On a predictive measure of discrepancy between classical and Bayesian estimators
Stefania Gubbiotti
A prediction error for a linear regression model with fuzzy random elements
Maria Brigida Ferraro
Some further results for the two-parameter Poisson-Dirichlet partition model
Annalisa Cerquetti
Matching immigrant and native workers: evidence from the recent downturn in
Italy
Adriano Paggiaro
The analysis of firm demography: an approach based on micro-geographic data
Diego Giuliani, Simonetta Cozzi, Giuseppe Espa
Regression estimators for capture-recapture frequency data
Irene Rocchetti
Towards an integrated surveillance system of road accidents
Tiziana Tuoto, Silvia Bruzzone, Luca Valentino, Giordana Baldassarre,
Nicoletta Cibella, Marilena Pappagallo
On the Design Based Inference for Continuous Spatial Populations
Giorgio Eduardo Montanari, Giuseppe Cicchitelli
A Critical Look at Compositional Analysis for Assessing Habitat Selection
Caterina Pisani
Small area estimation for panel data
Annamaria Bianchi
Interpreting Deviations from Long-run Parity in an I(2) Model
Giuliana Passamani
Contributions from income components to Zenga's point and synthetic inequality
measures: an application to EU countries
Michele Zenga, Leo Pasquazzi
An income mobility measure based on Zenga’s inequality index
Mauro Mussini
Equivalence scales, inflation, and PPP: a unique (and simple) approach to
estimation
Gustavo De Santis
Sensitivity analysis on a Cellular Automata model for the diffusion of Pleural
Mesothelioma
Claudia Furlan, Cinzia Mortarino
Reproducibility Probability Estimation and Testing for some common
nonparametric tests
Daniele De Martini, Lucio De Capitani
Multilevel algorithmic models to measure item importance on latent variables'
indicators
Marica Manisera, Marika Vezzoli
A multiple imputation procedure of censored values in family-based genetic
association studies
Fabiola Del Greco M., Cristian Pattaro, Cosetta Minelli, Peter P.
Pramstaller, John R Thompson
Ridge analysis through profile likelihoods
Valeria Sambucini, Ludovico Piccinato
Migrant students classroom allocation policy in Italian schools
Andrea Scagni
Testing Phase and Amplitude Variability in Functional Data Analysis: a
Hierarchical Permutation Test Approach
Alessia Pini, Simone Vantini
A hierarchical bayesian model for modelling benthic macroinvertebrates densities
in lagoons
Serena Arima, Alberto Basset, Giovanna Jona Lasinio, Alessio Pollice, Ilaria
Rosati
Life-Course Transitions, Market Work and Domestic Work of Italian Couples
Antonino Di Pino
Adjusting Time Series of Possible Unequal Lengths
Ilaria Lucrezia Amerise, Agostino Tarsitano
Variable selection in competing risks model
Marialuisa Restaino, Alessandra Amendola
A Bayesian Semiparametric Fay-Herriot-type model for Small Area Estimation
Silvia Polettini
Assessing Multivariate Measurement Systems in Multisite Testing
Michele Scagliarini, Stefania Evangelisti
Predicting EQ-5D responses from SF-12: should we take into account dependence
and ordering?
Caterina Conigliani, Andrea Tancredi, Andrea Manca
Large sample properties of Gibbs-type priors
Pierpaolo De Blasi, Antonio Lijoi, Igor Pruenster
Bayesian Unit Root Tests: a Monte Carlo Study
Margherita Gerolimetto, Isabella Procidano
Data fusion in pharmaceutical marketing: new perspective from administrative
data.
Paolo Mariani
How internal and international migrations shape the age structure of the Italian
regions, 1955-2008
Paola Di Giulio, Cecilia Reynaud, Luca Vergaglia
Bayesian modeling of presence-only data
Fabio Divino, Giovanna Jona Lasinio, Natalia Golini
An Evaluation of the Student Satisfaction based on CUB Models
Barbara Cafarelli
Limited Information Estimation Methods for Paired Comparison Data
Manuela Cattelan
Closed Likelihood-Ratio Testing Procedures to Assess Similarity of Covariance
Matrices
Francesca Greselin, Antonio Punzo
Fiducial Distributions for Real Exponential Families
Piero Veronese, Eugenio Melilli
Hedonic Indexes and GDP estimate in the USA
Gabriele Serafini
Some results on stochastic comparisons of ROC curves
Silvia Figini, Chiara Gigliarano, Pietro Muliere
Efficiency in the use of natural non-renewable resources from mining and
quarrying in Italy. Time series analysis and Economy-wide Material Flows
Accounts
Donatella Vignani
Marital Disruption and Subjective Well-being: Evidence from an Italian Panel
Survey
Giulia Rivellini, Alessandro Rosina, Emiliano Sironi
The Decision Making Process of Leaving Home: A Longitudinal Analysis of
Italian Women
Giulia Ferrari, Alessandro Rosina, Emiliano Sironi
Alternative Bayesian analysis of capture recapture data with behavioral effect
modelling
Danilo Alunni Fegatelli
PDE penalization for spatial fields smoothing
Laura Azzimonti, Maurizio Domanin, Laura Maria Sangalli, Piercesare
Secchi
Multivariate Nonlinear Least Squares: Direct and Beauchamp and Cornell
Methodologies
Renato Guseo, Cinzia Mortarino
Handling weak dependence structures with copulas
Enrico Foscolo, Fabrizio Durante
Bayesian nonparametric predictions for count time series
Luisa Bisaglia, Antonio Canale
How to integrate macro and micro perspectives: an example on Human
Development and Multidimensional Poverty.
Silvia Terzi
Composite Indicator of Social Inclusion in the European Union
Erasmo Vassallo, Francesca Giambona
Depth measures for the study of real and simulated ECG signals
Francesca Ieva
Experimental design for the estimation of Rician-distributed intensity fields in
MRI
Stefano Baraldo, Francesca Ieva, Luca Mainardi, Anna Maria Paganoni
A von Mises Markov random field model for the analysis of spatial circular data
Francesco Lagona
A Multivariate VEC-BEKK Model for Portfolio Selection
Andrea Federico Pierini, Alessia Naccarato
Combining the complete-data and nonresponse models for drawing imputations
under MAR
Shahab Jolani, Stef van Buuren, Laurence E. Frank
A Well-being Index Based on the Weighted Product Method
Matteo Mazziotta, Adriano Pareto
A comparison of semiparametric density estimation methods for multivariate risk
management
Marco Bee
Modelling poverty transitions in Luxembourg: true state dependence or
heterogeneity?
Alessio Fusco, Nizamul Islam
Causal analysis of education and birth inequalities through a latent class SEM
Silvia Bacci, Francesco Bartolucci, Luca Pieroni
Prediction of nonstationary functional data: Universal Kriging in a Hilbert Space
Alessandra Menafoglio, Matilde Dalla Rosa, Piercesare Secchi
School tracking and equality of opportunity in a multilevel perspective
Isabella Romeo, Emanuela Raffinetti
Indexing the Worthiness of Social Agents
Giulio D'Epifanio
Asymptotic estimation of right and left kurtosis measures, with applications to
finance
Anna Maria Fiori, Davide Beltrami
Ordinal Lorenz Regression with application in Customer Satisfaction Surveys
Emanuela Raffinetti
A computational method to estimate sparse multiple Gaussian graphical models
Rossella Onorati, Luigi Augugliaro, Angelo Marcello Mineo
Deterministic or stochastic seasonality in daily electricity prices?
Paolo Chirico
Social capital and its impact on poverty reduction: measurement issues in
longitudinal and cross-country comparisons. The case of the EU.
Isabella Santini, Anna De Pascale
The diffusion of nuclear energy in the developing countries
Alessandra Dalla Valle, Claudia Furlan
A model for the joint distribution of income and wealth
Markus Jantti, Eva Sierminska, Philippe Van Kerm
Mothers with children aged 0-2 years: work/family reconciliation and support
networks
Cinzia Castagnaro, Alessandra Fasano, Antonella Guarneri
A novel method for spatial smoothing
Laura M. Sangalli, James O. Ramsay
Poverty transitions in Italy
Lucia Coppola, Davide Di Laurea, Daniela Lo Castro, Mattia Spaziani
Spatial smoothing over non-planar domains
Bree Ettinger, Simona Perotto, Laura M Sangalli
Lattice Models for the analysis of Urban Crime
Enrico di Bella, Luca Persico, Lucia Leporatti
Family resources and cognitive decline among elderly in Italy
Stefano Mazzuco
The median of a set of histogram data
Lidia Rivoli, Rosanna Verde, Antonio Irpino
Estimating the Homeless Population through Indirect Sampling and Weight
Sharing Method
Claudia De Vitiis
Rates for Bayesian estimation of location-scale mixtures of super-smooth densities
Catia Luisa Scricciolo
Data gathering for elusive population. The case of foreigners during the XV
Italian Census. A focus on Prato
Linda Porciani
Immigrant entrepreneurship through the economic crisis in Italy
Benedetta Cassani, Cristina Giudici, Roberta Rizzi
International Mobility of University Students: the Italian case
Domenica Fioredistella Iezzi, Mario Mastrangelo, Scipione Sarlo
Chronological analysis of textual data and curve clustering: preliminary results
based on wavelets
Matilde Trevisani, Arjuna Tuzzi
Exponential Random Graph Model for multivariate networks: an application in
knowledge network analysis
Domenico De Stefano, Susanna Zaccarin
The Role of Social Capital in Preventing Irregular Work in Italian Regions
Maria Felice Arezzo
Bayesian model averaging for financial credit risk measurement
Silvia Figini
Considerations about the Quotient of two Correlated Normals
Angiola Pollastri, Vanda Tulli
Estimating Business Statistics from administrative data: a study on small and
medium enterprises
Orietta Luzi, Giovanni Seri, Viviana De Giorgi, Giampiero Siesto
The Coverage Survey of the 6th Agricultural Census
Matteo Mazziotta, Antonella Bernardini, Loredana De Gaetano, Lorenzo
Soriani
A Local Price Observatory – Price minimarket: innovations and additional
knowledge about prices - The experience of Umbria
Cristina Carbonari, Sabrina Angiona, Francesca Paradisi
Frailty Multi-State Models based on Maximum Penalized Partial Likelihood
Federico Rotolo, Catherine Legrand
Reconciliation of Time Series according to a Growth Rates Preservation Principle
Tommaso Di Fonzo
A decision support system for duopolies with incomplete information
Paola Vicard, Julia Mortera
False discovery rate control and the dependence structure of test statistics
Claudio Lupi
Neural Network Approach Applied for Classification in Business and Trade
Statistics
Jana Juriová
The analysis of the material deprivation of foreigners in Italy
Anna Maria Milito, Annalisa Busetta, Antonino Mario Oliveri
Effective Facebook population: the Italian case
Cristiano Tessitore, Ester Macrì
A Clustream strategy for Functional Boxplots on multiple streaming time series
Antonio Balzanella, Elvira Romano
New approach to the identification of the Inverse Weibull model
Biagio Palumbo, Giuliana Pallotta
Stochastic Frontiers Approach: an Empirical Analysis of Italian Environmental
Spending
Sabrina Auci, Annalisa Castelli, Donatella Vignani
Estimating student learning value-added models from repeated cross-sections
Dalit Contini
Intergenerational Mobility and Gender Gap: Evidence from Mediterranean
Countries
Rosalia Castellano, Gennaro Punzo, Antonella Rocca
The role of Istat territorial offices for data quality control in the 15th Population
and Housing Census. The case of Tuscany
Alessandro Valentini, Sabina Giampaolo
Longitudinal patterns of financial product ownership: a latent growth mixture
approach
Francesca Bassi
Measuring job quality: a composite indicator
Giovanna Boccuzzo, Martina Gianecchini
On Measuring Inequity in Taxation Between Groups of Tax Payers
Achille Vernizzi, Simone Pellegrino, Maria Giovanna Monti
Capital income inequality: evidences from ECHP data
Francesca Greselin, Leo Pasquazzi, Ricardas Zitikis
Note on a new generalization of the skew-normal distribution
Valentina Mameli, Monica Musio
Estimates of Foreign Trade Using Genetic Programming
Miroslav Klucik, Miroslav Klucik
Machine learning techniques for Propensity score matching with clustered data. A
simulation study.
Massimo Cannas, Bruno Arpino, Francesco Billari
Impact of Audio Tools in Web Surveys
Daniele Toninelli, Silvia Biffignandi
Early-life circumstances and late-life income
Omar Paccagnella, Christelle Garrouste
A Simple Risk-Adjusted CUSUM chart for monitoring binary health data
Marco Marchi
From theory to practice: a methodological proposal for operationalising and
summarizing the concept of quality of work
Marco Centra, Maurizio Curtarelli, Valentina Gualtieri
Partners’ income and decision making
Lucia Coppola, Domenica Quartuccio
Dealing With a Potential Bias in Estimating the Share of Discriminated Women
Rosa Giaimo, Giovanni Luca Lo Magno
Border surveys and Time Location Sampling (TLS): an application on incoming
tourism in Sicily
Stefano De Cantis, Mauro Ferrante
The Use of Administrative Data for Short Term Business Statistics: Lessons from
a Cross-Country Experience
Ciro Baldi, Francesca Ceccato, Silvia Pacini, Donatella Tuzi
Price transmission and market power in the food market
Maria Caterina Bramati
Data imputation processes based on statistical analysis: the case of Kosovo census
data
Marco Scarnò, Bekim Canolli, Servete Muriqi, Hisni Ferizi
Dimension reduction for measuring the multidimensional demographic
convergence
Maria Rita Sebastiani
Is financial fragility a matter of illiquidity? An appraisal for Italian households
Marianna Brunetti
Burnout, learning and self-esteem at school: an empirical study.
Cristiana Ceccatelli
Autocorrelated non-normal data in control charts
Claudio Giovanni Borroni, Manuela Cazzaro, Paola Maddalena Chiodini
Social welfare orderings of the Generalized-Lorenz Type: applications of an
extended equivalence theorem
Alessandra Giovagnoli
Timely Indices for Residential Construction Sector
Attilio Gardini, Enrico Foscolo
Dimensions of well-being and their statistical measurements
Carla Ferrara, Francesca Martella, Maurizio Vichi
The diagnostics of the mean squared error of the Eblup in small area estimation
models
Renato Salvatore, Maria Chiara Pagliarella
REGISTRATION
The registration method for this conference is available via the following link:
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/registration
ACCOMMODATION
Rooms are optioned for attendees.
To book a room, please send an e-mail message or make a phone call to the hotel (don't use the form on
the hotel website) and, when booking, specify that you will be attending the SIS 2012 meeting (code
SIS2012).
Daily rates per room. Buffet breakfast is always included. City tax is NOT included (Euro 3,00 per night
per person, to be paid directly upon arrival).
Walking distance to the conference site (from Google Maps) is shown in square brackets. If you need
information on public transportation, here is the link to the bus route planner: (English) (Italiano)
CONFERENCE TIMELINE AND INFORMATION
CONFERENCE
First day of conference June 20, 2012
Last day of conference June 22, 2012
WEBSITE
Go Live (as a Current Conference) December 2, 2011
Move to Conference Archive December 31, 2014
SUBMISSIONS
Author registration opened December 2, 2011
Author registration closed June 20, 2013
Call for Papers posted July 2, 2011
Submissions accepted July 28, 2012
Submissions closed November 5, 2012
REVIEWS
Reviewer registration opened December 2, 2011
Reviewer registration closed April 10, 2014
WEBSITE POSTING
Accepted papers May 1, 2012
XLVI Riunione Scientifica
Roma, 20-22 giugno 2012
Editore: CLEUP – Padova
ISBN 978-88-6129-882-8
Research advances and new challenges in Cluster Analysis
Maurizio Vichi
Abstract Methodologies for Cluster Analysis are among the most well-known and appreciated statistical techniques of multivariate analysis. In the last twenty years they have been increasingly applied in new disciplines, and frequently almost reinvented, in many areas of research such as computer science, engineering and bioinformatics, and in specific fields including machine learning, data mining and pattern recognition. In this presentation we show recent statistical research advances in methodologies for clustering. The illustrated methods have in common the statistical approach of formulating a mathematical model for partitioning or hierarchically clustering multivariate observations, estimating the parameters of the model and finally fitting it to the data.
1. Introduction
The interpretation of the relationships within a set of objects can be helped by obtaining a hard partition of the objects into disjoint classes, with the property that objects in the same class are perceived as similar to one another, while objects in different classes are considered dissimilar. Such partitions can be achieved by applying Cluster Analysis methodologies. Several methods have been proposed for clustering a set of multivariate objects. In this presentation we concentrate on the model-based approach of formulating a clustering model for the data, e.g., a partition or a hierarchy specified for reconstructing the data (multivariate observations or dissimilarities), and then solving the corresponding least-squares or maximum likelihood fitting problem.
1 Maurizio Vichi, Sapienza Università di Roma, Dipartimento di Scienze Statistiche; e-mail: [email protected]
The presentation is divided into three parts: model-based partitioning and hierarchical clustering of a set of units for dissimilarity data; multi-partitioning of the modes of a three- and two-way data matrix including multivariate observations; and clustering of longitudinal multivariate observations.
2. Model-Based partitioning and hierarchical clustering
The Cluster Analysis problem of partitioning or hierarchically clustering a set of units, when dissimilarity data are observed, is here handled with the statistical model-based approach of fitting the "closest" classification matrix to the observed dissimilarities. A classification matrix represents a clustering model expressed in terms of dissimilarities. Three models for partitioning a set of units from dissimilarity data are illustrated, and their estimation via least squares is given together with new fast coordinate descent algorithms. Following the same statistical fitting approach, a new model for hierarchically clustering objects starting from dissimilarity data is also illustrated.
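The least-squares fitting idea can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a deliberately simplified classification-matrix model in which within-cluster dissimilarities are reconstructed as 0 and all between-cluster dissimilarities by a single constant b, fitted by coordinate descent over the cluster labels.

```python
import numpy as np

def fit_partition(Delta, G, n_iter=50):
    """Least-squares fit of a partition to a dissimilarity matrix Delta:
    within-cluster dissimilarities are modeled as 0, between-cluster
    ones as a common constant b, and cluster labels are updated by
    coordinate descent (a simplified stand-in for the models in the talk)."""
    n = Delta.shape[0]
    labels = np.arange(n) % G                      # deterministic start
    for _ in range(n_iter):
        # b: optimal between-cluster constant given the current labels
        between = labels[:, None] != labels[None, :]
        b = Delta[between].mean() if between.any() else 0.0
        changed = False
        for i in range(n):
            errs = []
            for g in range(G):
                same = labels == g
                same[i] = True                     # unit i would join cluster g
                err = (Delta[i, same] ** 2).sum() + ((Delta[i, ~same] - b) ** 2).sum()
                errs.append(err)
            best = int(np.argmin(errs))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels

# Two well-separated blocks of units
Delta = np.array([[0., 1., 9., 9.],
                  [1., 0., 9., 9.],
                  [9., 9., 0., 1.],
                  [9., 9., 1., 0.]])
labels = fit_partition(Delta, G=2)
print(labels)  # units {0, 1} and {2, 3} fall into different clusters
```

The real methods use richer classification matrices and faster update schemes; the sketch only conveys the "fit a clustering model to dissimilarities" viewpoint.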
3. Bi-partitioning, multi-partitioning, clustering and disjoint principal component analysis
New methodologies for three-mode (units, variables and occasions) and two-mode (units and variables) symmetrical or asymmetrical partitioning or multi-partitioning of three- and two-way data are presented. In particular, by reanalyzing the double k-means, which identifies a unique partition for each mode of the data, a relevant extension is discussed that allows the classes of each mode to be synthesized symmetrically, by means of mean vectors or linear combinations (components) for all modes, or asymmetrically, by mixing a different strategy for each mode. Furthermore, the model allows the partition of one mode conditionally on the partition of the other. The performance of such generalized double k-means has been tested by both a simulation study and an application to gene microarray data. Clustering and disjoint principal component analysis allows one to identify a partition of the units and a partition of the variables, together with a principal component for each class of the partition of variables. This technique can be seen as a special case of the generalized double k-means.
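A minimal numerical sketch may help fix the double k-means idea: rows and columns of a data matrix X are partitioned simultaneously, and X is approximated by expanding a small G x H matrix C of block centroids over the row and column labels. The code below is a simplified illustration written for this summary, not the generalized method described above.

```python
import numpy as np

def double_kmeans(X, G, H, n_iter=50):
    """Simplified double k-means: alternate (i) block-centroid update,
    (ii) row reassignment, (iii) column reassignment, so that X is
    approximated by the G x H block-mean matrix C expanded over the
    row labels u and the column labels v."""
    n, p = X.shape
    u = np.arange(n) * G // n                       # initial row labels
    v = np.arange(p) * H // p                       # initial column labels
    for _ in range(n_iter):
        # block centroids: mean of X over each (row-cluster, column-cluster) block
        C = np.array([[X[np.ix_(u == g, v == h)].mean()
                       if (u == g).any() and (v == h).any() else 0.0
                       for h in range(H)] for g in range(G)])
        # reassign each row to the closest centroid profile C[g, v]
        u_new = np.array([np.argmin([((X[i] - C[g, v]) ** 2).sum()
                                     for g in range(G)]) for i in range(n)])
        # reassign each column to the closest centroid profile C[u, h]
        v_new = np.array([np.argmin([((X[:, j] - C[u_new, h]) ** 2).sum()
                                     for h in range(H)]) for j in range(p)])
        if (u_new == u).all() and (v_new == v).all():
            break
        u, v = u_new, v_new
    return u, v, C

# Two row types and two column blocks, rows deliberately shuffled
a, b = [0., 0., 10., 10.], [8., 8., 0., 0.]
X = np.array([a, a, b, b, a, b])
u, v, C = double_kmeans(X, G=2, H=2)
print(u, v)   # rows of type a share one label, rows of type b the other
```

The generalized version replaces block means by linear combinations (components) for some modes and allows one partition to be conditional on the other; the alternating structure stays the same.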
4. Clustering longitudinal multivariate observations
Longitudinal multivariate data involve repeated observations of different features of the same statistical units over a period of time. The aim is to study the developmental trends of the units across at least a part of their life span.
The dynamic evolution of the partitions of units over time is studied in this presentation in an unsupervised clustering context, using a model-based clustering approach. A clustering model and a vector autoregression VAR(P) model, where P is the lag length of the VAR, are combined into a new technique that identifies a homogeneous partition into G classes for each time t and the autoregressive dynamic evolution of the clusters. The proposed clustering/VAR model can also be used to forecast a partition at time T+1. The parameters of the model are estimated in both a least-squares and a maximum likelihood framework, and efficient recursive algorithms are given. A simulation study, together with some applications of the proposed methodologies, is shown to appreciate the performance of the models and the quality of their estimates. In the final part of the presentation, similarities between trajectories describing the histories of units are studied. Trend, velocity and acceleration are three characteristics of trajectories considered to assess pairwise dissimilarities between trajectories. The Tucker model for three-way data, modified for clustering units together with a dimensional reduction of the observed variables, is estimated in the metric space specified by trend, velocity and acceleration. An application is given to show the performance of the methodology.
References
Martella F., Alfò M., Vichi M. (2010). Hierarchical mixture models for biclustering in microarray. Statistical Modelling, 11(6): 489-505.
Martella F., Vichi M. (2012). Clustering microarray data using model-based double K-means. Journal of Applied Statistics, DOI: 10.1080/02664763.2012.683172.
Maruotti A., Vichi M. (2012). Clustering Longitudinal Multivariate Observations: Model-Based Autoregressive K-means. Submitted.
Rocci R., Gattone A., Vichi M. (2011). A New Dimension Reduction Method: Factor Discriminant K-means. Journal of Classification, vol. 28, DOI: 10.1007/s00357-011
Vicari D., Vichi M. (2009). Structural Classification Analysis of Three-Way Dissimilarity Data. Journal of Classification, vol. 26, ISSN: 0176-4268.
Vichi M., Rocci R. (2008). Two-mode Multi-partitioning. Computational Statistics & Data Analysis, vol. 52, pp. 1984-2003, ISSN: 0167-9473.
Vichi M., Saporta G. (2009). Clustering and Disjoint Principal Component Analysis. Computational Statistics & Data Analysis, vol. 53, pp. 3194-3208, ISSN: 0167-9473, DOI: 10.1016/j.csda.2008.05.028.
Vichi M. (2011). Fitting Hierarchical Clustering Models to Dissimilarity Data. Submitted.
Vichi M. (2008). Fitting Semiparametric Clustering Models to Dissimilarity Data. Advances in Data Analysis and Classification, vol. 2(2), pp. 121-161, DOI: 10.1007/s11634-008-0025-4.
A Bayesian nonparametric model for count functional data
Antonio Canale and David B. Dunson
Abstract Count functional data arise in a variety of applications, including longitudinal, spatial and imaging studies measuring functional count responses for each subject under study. The literature on statistical models for dependent count data is dominated by models built from hierarchical Poisson components. The Poisson assumption is not warranted in many applications, and hierarchical Poisson models make restrictive assumptions about over-dispersion in marginal distributions. This article discusses a class of nonparametric Bayes count functional data models introduced in Canale and Dunson [3], which are constructed through rounding real-valued underlying processes. Computational algorithms are developed using Markov chain Monte Carlo and the methods are illustrated through an application to asthma inhaler usage.
Key words: Generalized linear mixed model; Hierarchical model; Longitudinal data; Splines; Stochastic process.
1 Introduction
A stochastic process y = {y(s), s ∈ D} is a collection of random variables indexed by s ∈ D, with the domain D commonly corresponding to a set of times or spatial locations, and y(s) to the random variable observed at a specific time or location s. There is a rich frequentist and Bayesian literature on stochastic processes, with common choices including Gaussian processes and Lévy processes, such as the Poisson, Wiener, beta or gamma process. Gaussian processes provide a convenient and well-studied choice when y : D → ℜ is a continuous function. Our interest focuses on the case in which y : D → N = {0, 1, . . .}, so that y is a count-valued stochastic process over the domain D. There are many applications of such processes, including developmental toxicity epidemiology studies monitoring a count health response over time.
Antonio Canale, University of Turin and Collegio Carlo Alberto, Turin, Italy; e-mail: [email protected]
David B. Dunson, Duke University, Durham, NC; e-mail: [email protected]
Although there is a rich literature on count stochastic process models for longitudinal and spatial data, most models rely on Poisson hierarchical specifications. Although such models have a flexible mean structure, the Poisson assumption is restrictive in limiting the variance to be equal to the mean, with over-dispersion introduced in marginalizing out the latent processes. Such modeling frameworks have several disadvantages. Firstly, the dependence structure is confounded with marginal over-dispersion and, secondly, under-dispersed count data are not accommodated. To relax the usual Poisson parametric assumptions, [10] exploited a hierarchical specification of the Faddy model [6]. Despite the gain in flexibility, computation for this model is challenging.
In considering models that separate the marginal distribution from the dependence structure, it is natural to focus on copulas. Nikoloulopoulos and Karlis [15] proposed a copula model for bivariate counts that incorporates covariates into the marginal model. Erhard and Czado [5] proposed a copula model for high-dimensional counts, which can potentially allow under-dispersion in the marginals via a Faddy or Conway-Maxwell-Poisson [16] model. Genest and Neslehova [8] provide a review of copula models for counts.
An alternative approach relies on rounding of a stochastic process. For classification it is common to threshold Gaussian process regression [4, 9]. For example, [12] rounded a real discrete autoregressive process to induce an integer-valued time series, while [2] used rounding of continuous kernel mixture models to induce nonparametric models for count distributions. In this article we discuss a class of stochastic processes introduced in [3] that map a real-valued stochastic process y* : D → ℜ to a count stochastic process y : D → N.
2 Rounded Stochastic Processes
2.1 Notation and model formulation
Let y ∈ C denote a count-valued stochastic process, with D ⊂ ℜ^p compact and C the set of all D → N step functions with unit step and a finite number of jumps in D. Such an assumption is a count process version of the continuity condition routinely assumed for D → ℜ functions. It ensures that for sufficiently small changes in the input the corresponding change in the output is small, being either zero or one. We are particularly motivated by applications in which counts do not change erratically at nearby times but maintain some degree of similarity.
We choose a prior y ∼ Π, where Π is a probability measure over (C, B(C)), with B(C) the Borel σ-algebra of subsets of C. The measure Π induces the marginal probability mass functions

pr{y(s) = j} = Π{y : y(s) = j} = π_j(s),  j ∈ N, s ∈ D,  (1)

and the joint probability mass functions

pr{y(s_1) = j_1, . . . , y(s_k) = j_k} = Π{y : y(s_1) = j_1, . . . , y(s_k) = j_k} = π_{j_1 . . . j_k}(s_1, . . . , s_k),  (2)

for j_h ∈ N and s_h ∈ D, h = 1, . . . , k, and any k ≥ 1.
In introducing the Dirichlet process, [7] mentioned three appealing characteristics for nonparametric Bayes priors: large support, interpretability and ease of computation. Our goal is to specify a prior Π that gets as close to this ideal as possible. Starting with large support, we would like to choose a Π that allocates positive probability to arbitrarily small neighborhoods around any y_0 ∈ C with respect to an appropriate distance metric, such as L_1. To our knowledge, there is no previously defined stochastic process that satisfies this large support condition. In the absence of prior knowledge that allows one to assume y belongs to a pre-specified subset of C with probability one, priors must satisfy the large support property to be coherently Bayesian. Large support is also a necessary condition for the posterior for y to concentrate in small neighborhoods of any true y_0 ∈ C.
With this in mind, we propose to induce a prior y ∼ Π through

y = h(y*),  y* ∼ Π*,  (3)

where y* : D → ℜ is a real-valued stochastic process, h is a thresholding operator from Y → C, Y is the set of all D → ℜ continuous functions, and Π* is a probability measure over (Y, B(Y)), with B(Y) the Borel sets of Y. Unlike count-valued stochastic processes, there is a rich literature on real-valued stochastic processes. For example, Π* could be chosen to correspond to a Gaussian process, or could be induced through various basis or kernel expansions of y*.
There are various ways in which the thresholding operator h can be defined. For interpretability and simplicity, it is appealing to maintain similarity between y* and y in applying h, while restricting y ∈ C. Hence, using the informal definition of rounding as an operation that reduces the number of digits while keeping the values similar, we focus on a rounding operator that lets y(s) = 0 if y*(s) < 0 and y(s) = j if j − 1 ≤ y*(s) < j, for j = 1, . . . , ∞. Negative values are mapped to zero, the closest non-negative integer, while positive values are rounded up to the nearest integer. This type of restricted rounding ensures that y(s) is a non-negative integer. Using a fixed rounding function h in (3), we rely on the flexibility of the prior y* ∼ Π* to induce a flexible prior y ∼ Π. For notational convenience and generality, we let y(s) = j if y*(s) ∈ A_j = [a_j, a_{j+1}), with a_0 < a_1 < · · ·, and we focus on a_0 = −∞ and a_j = j − 1, j = 1, . . . , ∞.
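The rounding operator h is easy to state in code. A small sketch, using the thresholds a_0 = −∞ and a_j = j − 1, so that a latent value in [j − 1, j) becomes the count j:

```python
import numpy as np

def h(y_star):
    """Rounding operator of equation (3): y = 0 when y* < 0, and
    y = j when j - 1 <= y* < j, i.e. floor(y*) + 1 for y* >= 0."""
    y_star = np.asarray(y_star, dtype=float)
    return np.where(y_star < 0, 0, np.floor(y_star) + 1).astype(int)

print(h([-1.3, -0.2, 0.4, 1.0, 2.7]))  # [0 0 1 2 3]
```

Applying h pointwise to a continuous sample path (e.g. a Gaussian process draw) yields a count-valued step function, which is how the prior Π is induced from Π*.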
In certain applications, count data can be naturally viewed as arising through integer-valued rounding of an underlying continuous process. For example, in longitudinal tumor count studies, it tends to be difficult to distinguish individual tumors, and it is natural to posit a continuous time-varying tumor burden, with tumors fusing together and falling off over time. In collecting the data, tumor biologists attempt to make an accurate count, but counts taken even at the same time can vary. It is natural to accommodate this with a smoothly-varying continuous tumor burden specific to each animal, with measurement errors and rounding producing the observed tumor counts. However, even when there is no clear applied context motivating the existence of an underlying continuous process, the proposed formulation nonetheless leads to a highly flexible and computationally convenient model.
2.2 Count functional data
We have focused on the case in which there is a single count process y observed at locations s = (s_1, . . . , s_n)^T. In many applications, there are instead multiple related count processes y_i, i = 1, . . . , n, with the ith process observed at locations s_i = (s_{i1}, . . . , s_{in_i})^T. We refer to such data as count functional data. As in other functional data settings, it is of interest to borrow information across the individual functions through use of a hierarchical model. This can be accomplished within our rounded stochastic processes framework by first defining a functional data model for a collection of underlying continuous functions y*_i, i = 1, . . . , n, and then letting y_i = h(y*_i), for i = 1, . . . , n. There is a rich literature on appropriate models for y*_i, i = 1, . . . , n, ranging from hierarchical Gaussian processes [1] to wavelet-based functional mixed models [14].
Let y_i(s) denote the count for subject i at time s, y_it = y_i(s_it) the number at the tth observation time, and x_it a predictor for subject i at time t. As a simple model motivated by the asthma inhaler use application described below, we let

y_it = h(y*_it),  y*_it = ξ_i + b(x_it)^T θ + ε_it,  ξ_i ∼ Q,  ε_it ∼ N(0, τ^{-1}),  (4)

where ξ_i is a subject-specific random effect, b(·) are B-spline basis functions that depend on predictors and time, θ are unknown basis coefficients, and ε_it is a residual. To induce a penalization on finite differences of the coefficients of adjacent B-splines, we let p(θ | λ) ∝ exp(−(λ/2) θ^T P θ), where P = D^T D is a penalty matrix with D the rth-order difference matrix, and λ ∼ Ga(ν/2, δν/2), δ ∼ Ga(a, b), so that λ acts as a roughness penalty. This construction is known as the Bayesian P-spline (penalized B-spline) model [13]. The hyperparameter δ controls the dispersion of the prior. By choosing a hyperprior with small a, b values, one induces a prior with heavy tails and good performance in a variety of settings [11]. We additionally choose a hyperprior for the residual precision, p(τ) ∝ τ^{-1}. To allow the random effect distribution to be unknown, we choose a Dirichlet process prior, Q ∼ DP(αQ_0), with α a precision parameter and the base measure Q_0 chosen as N(0, ψ). As commonly done, we fix α = 1.
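The penalty matrix P = D^T D in the P-spline prior can be built directly from differences of the identity matrix. A small sketch of this standard construction (not code from the paper):

```python
import numpy as np

def pspline_penalty(K, r=2):
    """Penalty matrix P = D'D of the prior p(theta | lambda)
    proportional to exp(-(lambda/2) theta' P theta), with D the
    r-th order difference matrix on K B-spline coefficients."""
    D = np.diff(np.eye(K), n=r, axis=0)   # (K - r) x K difference matrix
    return D.T @ D

P = pspline_penalty(K=5, r=2)
theta_linear = np.array([1., 2., 3., 4., 5.])   # second differences all zero
theta_rough = np.array([1., 5., 1., 5., 1.])
print(theta_linear @ P @ theta_linear)  # 0.0: linear trends are unpenalized for r = 2
print(theta_rough @ P @ theta_rough)    # 192.0: wiggly coefficients are heavily penalized
```

The quadratic form θ^T P θ equals the sum of squared rth-order differences of θ, which is why larger λ shrinks the fitted spline toward a polynomial of degree r − 1.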
3 Asthma inhaler use applications
We analyze data on daily usage of albuterol asthma inhalers [10]. Daily counts of inhaler use were recorded for periods of between 36 and 122 days at the Kunsberg School at National Jewish Health in Denver, Colorado, for 48 students previously diagnosed with asthma. The total number of observations was 5209. As discussed by Grunwald and coauthors [10], the data are under-dispersed.
Let y_it denote the number of times the ith student used the inhaler on day t. Interest focuses on the impact of morning levels of PM25 (fine air-pollution particles less than 2.5 µm in diameter) on asthma inhaler use. On each day t, a vector x_t = (x_t1, . . . , x_tp)^T of environmental variables is recorded, including PM25, average daily temperature (Fahrenheit degrees/100), % humidity and barometric pressure (mmHg/1000). We modify (4) to include multiple predictors with an additive model structure as follows:

y_it = h(y*_it),  y*_it = ξ_i + ∑_{j=1}^{4} b_j(x_jt)^T θ_j + ε_it,  (5)

where ξ_i is a random effect modeled as described in the previous section, b_j is a B-spline basis with θ_j the basis coefficients relative to the jth predictor, and ε_i ∼ N(0, τ^{-1} R), with R an AR-1 tridiagonal correlation matrix with correlation parameter ρ. The prior for each θ_j is identical to the prior described above, leading to an additive Bayesian P-splines model. Each predictor is normalized to have mean zero and unit variance prior to analysis. The correlation parameter is given a uniform prior on [−1, 1]. Computational details are reported in [3].
We ran our Markov chain Monte Carlo algorithm for 10,000 iterations, with a 1,000-iteration burn-in discarded. To obtain interpretable summaries of the non-linear covariate effects on the inhaler use counts, we recorded for each predictor, at a dense grid of x_jt values at each sample after burn-in, the conditional expectation of the count for a typical student having ξ_i = 0,

µ_j(x_jt) = E(y_it | x_jt, x_j′t = 0, j′ ≠ j, ξ_i = 0, θ, τ, ρ)
          ≈ ∑_{k=0}^{K} k [Φ{a_{k+1}; µ*_j(x_jt), τ} − Φ{a_k; µ*_j(x_jt), τ}],  (6)

where Φ(·; µ, τ) is the cumulative distribution function of a normal random variable with mean µ and precision τ, K is the 99.99% quantile of N{µ*_j(x_jt), τ^{-1}}, and

µ*_j(x_jt) = b_j(x_jt)^T θ_j + ∑_{l ≠ j} b_l(0)^T θ_l,  (7)

with the other predictors fixed at their mean value. Based on these samples, we calculated posterior means and pointwise 95% credible intervals, with the results reported in Figure 1. Interestingly, each of the predictors had a non-linear impact on the frequency of inhaler use, with inhaler use increasing with morning levels of PM25.
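Equation (6) is simply an expectation over rounded-normal probabilities, so it can be evaluated with normal CDF differences. A standard-library sketch, using the thresholds a_0 = −∞ and a_j = j − 1 from Section 2 (illustrative code, not the authors' implementation):

```python
import math

def expected_count(mu, tau, K=100):
    """E(y) for a rounded N(mu, 1/tau) latent variable, as in
    equation (6): sum_k k * [Phi(a_{k+1}) - Phi(a_k)], with
    a_0 = -inf and a_j = j - 1; the sum is truncated at K."""
    sd = 1.0 / math.sqrt(tau)          # tau is a precision
    def Phi(x):                        # normal CDF with mean mu, precision tau
        return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
    a = [-math.inf] + list(range(K + 1))   # thresholds a_0, a_1 = 0, ..., a_{K+1} = K
    return sum(k * (Phi(a[k + 1]) - Phi(a[k])) for k in range(K + 1))

print(expected_count(mu=2.5, tau=100.0))  # close to 3.0: the latent draw is almost surely in [2, 3)
```

In the paper this expectation is evaluated at µ*_j(x_jt) for each posterior draw of (θ, τ), giving the curves summarized in Figure 1.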
The previous analysis conducted in [10] tackles the problem under a generalized linear mixed model setup with the Faddy distribution. The mean for each subject i at time t was

µ_it = exp(x_1t β_1 + · · · + x_pt β_p + u_i + e_it),  (8)

where u_i is a subject-specific random effect and e_it an error modeled as an AR-1 process. They estimated a coefficient of just 0.013 for PM25, which is close to zero with 95% intervals including zero. In contrast, we obtain clear evidence of non-linear effects of several of the covariates, including PM25.
Fig. 1 Posterior mean and 95% pointwise credible bands for the effect of (a) average daily temperature, (b) % humidity, (c) barometric pressure, and (d) concentration of the PM25 pollutant on asthma inhaler use, calculated with equation (6).
A Bayesian nonparametric model for count functional data 7
4 Discussion
We have discussed a simple approach, introduced in [3], for modeling count stochastic processes based on rounding continuous stochastic processes. The general strategy is flexible and allows one to leverage existing algorithms and code for posterior computation for continuous stochastic processes. Although rounding of continuous underlying processes is quite common for binary and categorical data, such approaches have not, to our knowledge, been applied to induce new families of count stochastic processes. Instead, the vast majority of the literature for count processes relies on Poisson process and hierarchical Poisson constructions, which have some well-known limitations in terms of flexibility. The modeling framework can be easily generalized to the setting of count functional data, i.e. when one observes n different realizations of a stochastic process, and its performance has been shown in an application to asthma inhaler use.
Acknowledgements
This research was partially supported by grant number R01 ES017240-01 from the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (NIH) and by grant CPDA097208/09 from the University of Padua.
References
1. Behseta, S., Kass, R.E., Wallstrom, G.L.: Hierarchical models for assessing variability among functions. Biometrika 92(2), 419–434 (2005)
2. Canale, A., Dunson, D.B.: Bayesian kernel mixtures for counts. Journal of the American Statistical Association 106(496), 1528–1539 (2011)
3. Canale, A., Dunson, D.B.: Nonparametric Bayes modeling of count processes (2012). Submitted
4. Chu, W., Ghahramani, Z.: Gaussian processes for ordinal regression. Journal of Machine Learning Research 6, 1019–1041 (2005)
5. Erhard, V., Czado, C.: Sampling count variables with specified Pearson correlation - a comparison between a naive and a C-vine sampling approach. In: D. Kurowicka, H. Joe (eds.) Dependence Modeling - Handbook on Vine Copulae, pp. 73–87. World Scientific (2009)
6. Faddy, M.J.: Extended Poisson process modeling and analysis of count data. Biometrical Journal 39, 431–440 (1997)
7. Ferguson, T.S.: A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1, 209–230 (1973)
8. Genest, C., Neslehova, J.: A primer on copulas for count data. Astin Bulletin 37, 475–515 (2007)
9. Ghosal, S., Roy, A.: Posterior consistency of Gaussian process prior for nonparametric binary regression. The Annals of Statistics 34(5), 2413–2429 (2006)
10. Grunwald, G.K., Bruce, S.L., Jiang, L., Strand, M., Rabinovitch, N.: A statistical model for under- or overdispersed clustered and longitudinal count data. Biometrical Journal 53(4), 578–594 (2011)
Antonio Canale and David B. Dunson
11. Jullion, A., Lambert, P.: Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics and Data Analysis 51(5), 2542–2558 (2007)
12. Kachour, M., Yao, J.F.: First order rounded integer-valued autoregressive (RINAR(1)) process. Journal of Time Series Analysis 30(4), 417–448 (2009)
13. Lang, S., Brezger, A.: Bayesian P-splines. Journal of Computational and Graphical Statistics 13, 183–212 (2004)
14. Morris, J., Carroll, R.: Wavelet-based functional mixed models. JRSS-B 68, 179–199 (2006)
15. Nikoloulopoulos, A., Karlis, D.: Regression in a copula model for bivariate count data. Journal of Applied Statistics 37(9), 1555–1568 (2010)
16. Shmueli, G., Minka, T.P., Kadane, J.B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. JRSS-C 54(1), 127–142 (2005)
ROI analysis of pharmafMRI data: an adaptive approach for global testing
Giorgos Minas, John A.D. Aston, Thomas E. Nichols and Nigel Stallard
Abstract Pharmacological fMRI (pharmafMRI) is a new, highly innovative technique utilizing the power of functional Magnetic Resonance Imaging (fMRI) to study drug-induced modulations of brain activity. fMRI recordings are very informative surrogate measures for brain activity but still very expensive, and therefore pharmafMRI studies typically have small sample sizes. The high dimensionality of fMRI data and the resulting high complexity require sensitive statistical analysis in which dimensionality reductions are often crucial. We consider Region of Interest (ROI) analysis and propose an adaptive two-stage testing procedure for respectively formulating and testing the fundamental hypothesis as to whether the drug modulates the control brain activity in selected ROI. The proposed tests are proved to control the type I error rate and are optimal in terms of the predicted chance of a true positive result at the end of the trial. Power analysis is performed by re-expressing the high dimensional domain of the power function in a lower dimensional, easily interpretable space which still gives a complete description of the power. Based on these results, we show under which circumstances our procedure outperforms standard single-stage and sequential two-stage procedures, focusing on the small sample sizes typical in pharmafMRI. We also apply our methods to ROI data of a pharmafMRI study.
Giorgos Minas
Department of Statistics and Warwick Centre of Analytical Sciences, University of Warwick, UK e-mail: [email protected]

John A.D. Aston
CRiSM, Department of Statistics, University of Warwick, UK e-mail: [email protected]

Thomas E. Nichols
Department of Statistics and Warwick Manufacturing Group, University of Warwick, UK e-mail: [email protected]

Nigel Stallard
Division of Health Sciences, Warwick Medical School, University of Warwick, UK e-mail: [email protected]
Key words: functional Magnetic Resonance Imaging, global testing, dimension reduction, adaptive designs, predictive power
1 Introduction
Pharmacological fMRI (pharmafMRI) is an exciting new technique employing functional Magnetic Resonance Imaging (fMRI) to study brain activity under drug administration. The so-called Blood Oxygenation Level Dependent (BOLD) fMRI contrast, often used in pharmafMRI studies, measures local blood flow changes known to be associated with changes in brain activity. While becoming more established, pharmafMRI faces a number of challenges, some of which are statistical.
fMRI datasets are extremely high dimensional, with enormous spatial resolution (≈ 3 mm) and moderate temporal resolution (≈ 3 s). The typical fMRI dataset produced by a single scanning session consists of BOLD recordings acquired during a relatively short period of time (a few hundred time points) from around $10^5$ voxels (3-dimensional volume elements) throughout the brain. To handle such high dimensional datasets it is often appropriate to formulate specific regional hypotheses for the drug action and reduce the dimension of the data accordingly. The need for this type of analysis, which can provide regional summary measures of drug effect, is particularly acute in the typical pharmafMRI setting, in which, due to the high cost of fMRI scans, only a small number of subjects can be recruited.
Region of Interest (ROI) analysis can reduce an fMRI dataset to a relatively small number of ROI response summary measures expressing the local strength of the treatment effect across the selected brain regions. If both the definition of the ROI and the computation of the ROI response measures are cautiously conducted, a statistical analysis based on these ROI measures can potentially achieve high levels of sensitivity. We wish to go along this path and apply a multivariate test assessing the fundamental null hypothesis as to whether the new compound of interest changes the underlying brain activity in the selected ROI.
In previous work [5], we showed that tests based on a scalar linear combination of multivariate ROI responses can outperform fully multivariate methods, especially for the typically small sample sizes of fMRI studies. The decisive question for the former tests is the selection of the weights applied to the ROI responses. In his seminal contribution, O'Brien [6] uses equal weights for all coordinates, while Lauter [3] extracts the weighting vector from the data sums of products matrix. In Minas et al. [5] the weights are optimally derived based on prior information and pilot data.
Here, we develop an adaptive two-stage procedure where a weighting vector, initially chosen based on prior information, is optimally adapted at a subsequent interim analysis based on the collected first stage data. The first and the second weighting vector are applied to the first and second stage responses, respectively, to produce the stage-wise linear combination test statistics. A combination function, combining the test statistics of the two stages, is used to perform the final analysis.
Both weighting vectors are optimal in terms of the predictive power [7] of this two-stage test, which is analytically proved to control the type I error rate.
Finally, we perform power analysis of the proposed tests and power comparisons to alternative methods. Note that the performance of a test with such a high dimensional domain of the power function can be hard to interpret. We tackle this problem by proving that the high dimensional power domain can be re-expressed in a lower dimensional, easily interpretable space which still gives a complete description of the power. Using these results, our power analysis shows clearly those circumstances where our procedure outperforms standard single-stage and two-stage sequential procedures. We also apply our methods to ROI data of a pharmafMRI study in which our tests are shown to be far more powerful than the latter methods.
2 Formulation
In this section we formally introduce our problem. We start by giving a brief description of the methods for extracting ROI measures from fMRI data.

ROI measures are typically extracted from mass univariate General Linear Models (GLMs) applied to the preprocessed series of 3-dimensional fMRI images at voxel-by-voxel resolution (see figure 1). Estimates of the treatment effect in each voxel of each subject are first extracted from these GLMs and then averaged across the ROI, predefined based on either brain anatomy or brain function. The coordinates of the produced multivariate outcomes correspond to representative measures of the treatment effect within each ROI of each subject.
In our methods, we assume that the ROI responses of the $n_j$ subjects participating in stage $j$ of the study are independent multivariate Normal random variables

$$ Y_{ji} \sim N_K(\mu, \Sigma), \quad i = 1, 2, \dots, n_j, \quad j = 1, 2, \qquad (1) $$

with mean $\mu$ and covariance matrix $\Sigma$. Normality is typically an acceptable assumption for modeling ROI linear measures in fMRI [2].
We summarise the ROI responses using scalar linear combinations
Fig. 1 Typical steps of fMRI data analysis producing a multivariate ROI outcome. The preprocessed series of fMRI images is modeled at voxel-by-voxel resolution using mass univariate GLMs. Suitable estimates of parameter values ($\beta$) expressing the treatment effect in each voxel are first extracted from the GLM and then averaged across the predefined ROI.
$$ L_{ji} = \sum_{k=1}^{K} w_{jk} Y_{jik}, \qquad (2) $$

where $w_{jk}$ is the non-zero weight applied to the $k$-th ROI response, $k = 1, \dots, K$, of stage $j$. Using these linear combinations, we wish to test the global null hypothesis of no treatment effect across all ROI, $H_0 : \mu = 0$ $(= (0, 0, \dots, 0)^T)$, against the two-sided alternative $H_1 : \mu \neq 0$.
The stage-wise test statistics in our design are the linear combination $z$ and $t$ statistics

$$ Z_j = \frac{\bar{L}_j}{\sigma_j / n_j^{1/2}}, \qquad T_j = \frac{\bar{L}_j}{s_j / n_j^{1/2}}, \qquad (3) $$

for $\Sigma$ known or unknown, respectively. Here, $\sigma_j^2$, $\bar{L}_j$ and $s_j^2$ are the variance, sample mean and sample variance of the linear combination $L_j$, respectively. The two-sided $p$ values, $p_j$, $j = 1, 2$, may be obtained from the $z$ or $t$ statistics in (3). We use a two-stage design which instructs the investigators to:
1. stop the trial (after the first stage) and reject $H_0$ if $p_1 < \alpha_1$, or stop the trial without rejection if $p_1 > \alpha_0$;
2. continue to the second stage if $\alpha_1 \le p_1 \le \alpha_0$ and reject $H_0$ if $p_1 p_2 < c$.
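A minimal sketch of the stage-one computation (the dimension, sample size, weighting vector and simulated data below are hypothetical): apply the weights to the multivariate ROI responses, form the $t$ statistic of (3) with its two-sided $p$ value, and apply the stopping rules above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
K, n1 = 4, 12                      # hypothetical number of ROI and stage-1 subjects
alpha0, alpha1 = 1.0, 0.01         # stopping boundaries

w1 = np.ones(K)                    # stage-1 weighting vector (placeholder choice)
Y1 = rng.multivariate_normal(0.3 * np.ones(K), np.eye(K), size=n1)

L1 = Y1 @ w1                                        # scalar linear combinations L_1i
T1 = L1.mean() / (L1.std(ddof=1) / np.sqrt(n1))     # t statistic in (3)
p1 = 2 * stats.t.sf(abs(T1), df=n1 - 1)             # two-sided p value

# Stage-one decision rule:
if p1 < alpha1:
    decision = "stop and reject H0"
elif p1 > alpha0:
    decision = "stop without rejection"
else:
    decision = "continue to stage two"
print(decision, p1)
```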
Here, Fisher's product combination function [1], $p_1 p_2$, is used for the final analysis. We also consider alternative functions, including the inverse Normal combination function [4]. Under this design, the type I error rate is controlled at the nominal $\alpha$ level if the rejection probability of the two-stage $z$ or $t$ test,

$$ \mathrm{pr}(p_1 < \alpha_1) + \int_{\alpha_1}^{\alpha_0} \mathrm{pr}(p_1 p_2 < c \mid p_1)\, g(p_1)\, dp_1, \qquad g(\cdot)\ \text{the density of}\ p_1, \qquad (4) $$

is equal to $\alpha$ under the null hypothesis $H_0$.

We target maximizing the power of the above two-stage tests, i.e. the rejection probability in (4) under $H_1$, with respect to the weighting vectors $w_1$, $w_2$, while controlling the type I error rate. In other words, we wish to find the optimal direction in which the projection of the treatment effect vector produces optimal power.
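Under $H_0$ the stage-wise $p$ values are independent Uniform(0,1), so for Fisher's product rule with $c \le \alpha_1$ the rejection probability in (4) has the closed form $\alpha_1 + c(\log\alpha_0 - \log\alpha_1)$. A quick numerical check, using the boundary values quoted later in the Fig. 2 caption (the Monte Carlo sample size is an arbitrary choice), confirms the type I error is held at $\alpha = 0.05$:

```python
import numpy as np

alpha0, alpha1, c = 1.0, 0.01, 0.0087   # boundaries from the Fig. 2 caption

# Closed form of (4) under H0 (valid because c <= alpha1, so that
# pr(p1*p2 < c | p1) = c/p1 over the continuation region).
type1 = alpha1 + c * (np.log(alpha0) - np.log(alpha1))

# Monte Carlo check of the same quantity.
rng = np.random.default_rng(2)
p1, p2 = rng.uniform(size=(2, 1_000_000))
reject = (p1 < alpha1) | ((p1 >= alpha1) & (p1 <= alpha0) & (p1 * p2 < c))
print(type1, reject.mean())   # both close to 0.05
```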
3 Methods
Here, we develop the proposed adaptive two-stage testing procedure. We start by providing the optimal weighting vector for the two-stage $z$ and $t$ tests described above.
Theorem 1. Under the assumption in (1), the power of the above two-stage tests, i.e. the rejection probability in (4) under $H_1$, is maximized with respect to $w_1$ and $w_2$ if and only if the latter are both proportional to $\omega = \Sigma^{-1}\mu$.
The optimal weighting vector $\omega$ is unknown, and therefore we use the available information at the planning stage (prior) and at the interim stage (posterior) to select $w_1$, $w_2$.
Prior information $D_0$, elicited from previous studies and expert clinical opinion, is used to inform the following Normal and inverse-Wishart priors for $\mu$ and $\Sigma$, respectively:

$$ (\mu \mid \Sigma, D_0) \sim N_K(m_0, \Sigma/n_0), \qquad (\Sigma \mid D_0) \sim IW_{K \times K}(\nu_0, S_0^{-1}). \qquad (5) $$

Here, $m_0$ represents a prior estimate for $\mu$, $n_0$ the number of observations $m_0$ is based on, and $\nu_0$, $S_0$ respectively represent the degrees of freedom and scale matrix of the inverse-Wishart prior.
Under this Bayesian model, the posterior distributions, given the prior information $D_0$ and the first stage data $y_1$, have the same form as the prior distributions:

$$ (\mu \mid \Sigma, D_0, y_1) \sim N_K(m_1, \Sigma/(n_0 + n_1)), \qquad (\Sigma \mid D_0, y_1) \sim IW_{K \times K}(\nu_0 + n_1, S_1^{-1}), \qquad (6) $$

where the posterior estimates

$$ m_1 = \frac{n_0 m_0 + n_1 \bar{y}_1}{n_0 + n_1}, \qquad S_1 = S_0 + (n_1 - 1) S_{y_1} + \frac{n_0 n_1}{n_0 + n_1} (\bar{y}_1 - m_0)(\bar{y}_1 - m_0)^T \qquad (7) $$

can be thought of as "weighted averages" of the prior and first stage estimates of $\mu$ and $\Sigma$, respectively.
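The conjugate updates in (7) are straightforward to compute; a small sketch with hypothetical prior settings and simulated first stage data:

```python
import numpy as np

rng = np.random.default_rng(3)
K, n0, n1 = 3, 5, 10               # hypothetical dimensions and sample sizes

m0 = np.zeros(K)                   # prior mean estimate (hypothetical)
S0 = np.eye(K)                     # prior scale matrix (hypothetical)
y1 = rng.normal(size=(n1, K))      # simulated first stage data

ybar = y1.mean(axis=0)
Sy = np.cov(y1, rowvar=False)      # sample covariance with divisor n1 - 1

# Posterior estimates in (7)
m1 = (n0 * m0 + n1 * ybar) / (n0 + n1)
S1 = S0 + (n1 - 1) * Sy + (n0 * n1 / (n0 + n1)) * np.outer(ybar - m0, ybar - m0)
print(m1)
```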
We wish to optimally select the weighting vectors of the two stages. Here optimality is defined in terms of the predictive power of the test. Predictive power expresses "the chance, given the data so far, that the planned test rejects $H_0$ when the trial is completed". Given $D_0$, the predictive powers $B_{z,1}$ and $B_{t,1}$ of the two-stage $z$ and $t$ tests, respectively, are defined as

$$ \mathrm{pr}(p_1 < \alpha_1 \mid D_0) + \mathrm{pr}(p_1 \in [\alpha_1, \alpha_0],\ p_1 p_2 < c \mid D_0), \qquad (8) $$

and, if we continue to the second stage, the predictive powers $B_{z,2}$ and $B_{t,2}$, given the prior information $D_0$ and the first stage data $y_1$, are defined as

$$ \mathrm{pr}(p_1 p_2 < c \mid D_0, y_1), \qquad (9) $$

for $p_j$ corresponding to either the $z$ or $t$ statistics in (3), respectively.
Theorem 2. Under the assumptions (1) and (6), the first and second stage predictive powers of the $z$ test, $B_{z,1}$ and $B_{z,2}$, are maximized with respect to $w_1$, $w_2$, respectively, if the latter are proportional to $w_{z,1} = \Sigma^{-1} m_0$ and $w_{z,2} = \Sigma^{-1} m_1$, respectively. Further, for large $\nu_0$, i.e. $\nu_0 \to \infty$, the weighting vectors $w_{t,1} = S_0^{-1} m_0$ and $w_{t,2} = S_1^{-1} m_1$ maximise the predictive power functions $B_{t,1}$ and $B_{t,2}$.
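Given the prior (or posterior) summaries, the weighting vectors of Theorem 2 are obtained by solving a linear system rather than forming an explicit inverse; a sketch with hypothetical values of $m_0$ and $S_0$ (the stage-two vector $w_{t,2} = S_1^{-1} m_1$ is computed in exactly the same way):

```python
import numpy as np

# Hypothetical prior summaries (K = 3)
m0 = np.array([0.2, 0.1, 0.3])
S0 = np.array([[1.0, 0.3, 0.0],
               [0.3, 1.0, 0.2],
               [0.0, 0.2, 1.0]])

# Theorem 2 (large nu_0): stage-1 weights proportional to S0^{-1} m0.
w_t1 = np.linalg.solve(S0, m0)
print(w_t1)
```

Any positive rescaling of the weights leaves the test statistics in (3) unchanged, so only the direction of $w_{t,1}$ matters.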
We can now describe the proposed adaptive two-stage $z$ and $t$ tests. These follow the two-stage design described earlier, with the first and second stage weighting vectors of the stage-wise $z$ and $t$ statistics being equal to the vectors $w_{z,1}, w_{z,2}$ and $w_{t,1}, w_{t,2}$, respectively. These tests are power optimal based on the collected information. We can also prove that they control the type I error rate.
4 Power analysis
The design variables that need to be considered for the analysis of the power function of the above $z$ and $t$ tests are: (i) the stopping boundaries $\alpha_0$, $\alpha_1$ and $c$; (ii) the sample sizes $n_0$, $n_1$ and $n_2$ (and $\nu_0$); (iii) the parameters $\mu$ and $\Sigma$; and (iv) the prior estimate(s) $m_0$ (and $S_0$). While the variables in (i) and (ii) are scalar, those in (iii) and (iv) are high dimensional ($\mathbb{R}^K \times \mathbb{R}^{K \times K} \times \mathbb{R}^K\ (\times\ \mathbb{R}^{K \times K})$). Without any dimensionality reduction, it would be challenging to get a full picture and explain the power performance of our tests. However, we can prove that for the $z$ test, (iii) and (iv) can be replaced by: (a) the Mahalanobis distance $(\mu^T \Sigma^{-1} \mu)^{1/2}$ of the null $N_K(0, \Sigma)$ to the alternative $N_K(\mu, \Sigma)$ distribution, expressing the strength of the treatment effect, and (b) the angle $\theta$ between the transformed optimal weighting vector $\tilde{\omega} = \Sigma^{-1/2}\mu$ and the transformed selected first stage weighting vector $\tilde{w}_{z,1} = \Sigma^{-1/2} m_0$ (both transformations correspond to left multiplication by $\Sigma^{1/2}$). Considering the $t$ test, the angular distance in (b) is replaced by one expressed in terms of easily interpretable vectors in $[0, \pi/2]^K \times [0, \pi/2]^K \times \mathbb{R}_+^K$. In figure 2, we illustrate how these results can be used to compare our procedure to standard testing procedures. For small sample sizes, the power of the single-stage $t$ test is larger (smaller) than the power of the
Fig. 2 Simulation-based approximation of the power, $\beta_t$, of the single-stage (green, dashed) and adaptive (blue, solid) linear combination $t$ test, as well as of Hotelling's $T^2$ test (red, dotted), plotted against the total sample size $n_T$. The angle $\theta$ between $\tilde{\omega}$ and the transformed selected weighting vectors $\tilde{w}$ and $\tilde{w}_{z,1}$ of the single-stage $t$ test and of the first stage of the adaptive $t$ test, respectively, is taken equal to 0, 15, 30, 45, 60, 75 and 90 degrees. Further, $\alpha_0 = 1$, $\alpha_1 = 0.01$, $c = 0.0087$ ($\alpha = 0.05$), $K = 15$, $n_0 = 5$, $\nu_0 = 4$, $f = n_1/n_T = 0.5$ and $D_1 = \Sigma^{-1/2} S_0 \Sigma^{-1/2} = I$.
adaptive $t$ test if the selected weighting vector is relatively close (distant) to the optimal weighting vector. For relatively large sample sizes, in contrast to the single-stage test, the adaptive $t$ test reaches high power levels even for a first stage weighting vector orthogonal ($\theta = 90$ degrees) to the optimal one. For increasing $n_T$, with all other design variables remaining fixed, the angle $\theta$ for which the power of Hotelling's $T^2$ test (applicable only for $n_T > K$) equals the power of the $t$ tests is decreasing.
4.1 Application to a pharmafMRI study
We use the sample mean and sample covariance matrix (see table 1) of ROI data extracted from a GlaxoSmithKline pharmafMRI study ($K = 11$, $n_T = 13$) to perform power comparisons. As we can see in table 1, effect sizes differ across ROI and generally high correlations are observed. Further, the prior estimates presented are fairly poor, resulting in an angle $\theta$ between $\tilde{\omega}$ and $\tilde{w}_{t,1}$ equal to 67 degrees. However, even for these prior estimates and such small sample sizes, the adaptive $t$ test might be considered sufficiently powered ($\beta_t = 0.82$). This is in contrast to standard single-stage tests, such as Hotelling's $T^2$, OLS [6], SS and PC [3] $t$ tests ($\beta_{T^2} = 0.30$, $\beta_{OLS} = 0.13$, $\beta_{SS} = 0.13$, $\beta_{PC} = 0.14$), as well as their corresponding sequential two-stage versions ($\beta^s_{OLS} = 0.10$, $\beta^s_{SS} = 0.09$, $\beta^s_{PC} = 0.10$; the sequential Hotelling's $T^2$ test is not applicable for $n_T = 13$), which give very low power values. Note that for
Table 1 Means (row 1), variances (row 3) and correlations (upper triangle of the matrix in rows 5-15), and the corresponding prior estimates (rows 2, 4 and lower triangle of the matrix in rows 5-15), of the ROI data of the sample ($n_T = 13$) of a GSK pharmafMRI study. The ROI are: Anterior Cingulate (AC), Atlas Amygdala (A), Caudate (C), Dorsolateral Prefrontal Cortex (DLPFC), Globus Pallidus (GP), Insula (I), Orbitofrontal Cortex (OFC), Putamen (P), Substantia Nigra (SA), Thalamus (T), Ventral Striatum (VS).

   ROI      AC     A      C      DLPFC  GP     I      OFC    P      SA     T      VS
1  µ_k     -0.01   0.06  -0.08  -0.08  -0.14  -0.02  -0.08  -0.06  -0.10  -0.10  -0.13
2  m_{0,k}  0      0.10  -0.10  -0.10  -0.15   0     -0.15   0     -0.10  -0.10  -0.15
3  σ_k      0.11   0.11   0.03   0.05   0.11   0.08   0.13   0.15   0.10   0.11   0.10
4  s_{0,k}  0.15   0.10   0.02   0.10   0.10   0.10   0.15   0.15   0.10   0.10   0.10
5  AC       1      0.70   0.87   0.88   0.73   0.89   0.66   0.81   0.26   0.95   0.70
6  A        0.70   1      0.54   0.61   0.72   0.77   0.65   0.68   0.59   0.68   0.66
7  C        0.70   0.50   1      0.89   0.72   0.87   0.47   0.80   0.27   0.90   0.74
8  DLPFC    0.70   0.70   0.70   1      0.71   0.76   0.73   0.77   0.27   0.87   0.62
9  GP       0.70   0.70   0.70   0.70   1      0.86   0.51   0.90   0.54   0.70   0.90
10 I        0.70   0.70   0.70   0.70   0.70   1      0.45   0.85   0.46   0.86   0.84
11 OFC      0.50   0.50   0.50   0.70   0.50   0.50   1      0.44   0.09   0.65   0.30
12 P        0.70   0.70   0.70   0.70   0.70   0.70   0.50   1      0.49   0.82   0.89
13 SA       0.50   0.70   0.30   0.50   0.50   0.50   0.50   0.30   1      0.30   0.55
14 T        0.70   0.70   0.70   0.70   0.70   0.70   0.50   0.70   0.50   1      0.74
15 VS       0.70   0.50   0.70   0.70   0.70   0.70   0.50   0.70   0.50   0.70   1
improved prior estimates (smaller angles), the power of the adaptive $t$ test can be increased further.
5 Discussion
The formulation of specific regional hypotheses for drug action and the associated dimensionality reductions are crucial for the further establishment of pharmafMRI. As we illustrate in our methods, ROI analysis combined with multivariate methods can be successfully used to answer the fundamental question as to whether the drug modulates the brain activity over the regions of greatest interest for a particular study. We show that reduction of the ROI responses to a scalar linear combination may substantially increase sensitivity compared to fully multivariate methods on ROI responses, without any cost in terms of specificity. For the latter reduction, we propose deriving the weights of the linear combination by exploiting the available prior information and allowing for data-dependent adaptation at an interim analysis. These weights are optimal in terms of the predictive power given the available information at each selection time. Further, we show how the high dimensional power function domain can be reduced to a lower dimensional, easily interpretable space which allows us to show clearly under which circumstances the improvement over single-stage and sequential designs is achieved. We finally show that our methods can outperform standard single-stage and sequential two-stage multivariate tests in a pharmafMRI study.
References
1. Bauer, P., Kohne, K.: Evaluation of experiments with adaptive interim analyses. Biometrics 50, 1029–1041 (1994)
2. Friston, K., Ashburner, J., Kiebel, S., Nichols, T., Penny, W.: Statistical parametric mapping: the analysis of functional brain images. Elsevier/Academic Press, Amsterdam; Boston (2007)
3. Lauter, J.: Exact t and F tests for analyzing studies with multiple endpoints. Biometrics 52, 964–970 (1996)
4. Lehmacher, W., Wassmer, G.: Adaptive sample size calculations in group sequential trials. Biometrics 55, 1286–1290 (1999)
5. Minas, G., Rigat, F., Nichols, T.E., Aston, J.A.D., Stallard, N.: A hybrid procedure for detecting global treatment effects in multivariate clinical trials: theory and applications to fMRI studies. Statist. Med. 31, 253–268 (2012)
6. O'Brien, P.C.: Procedures for comparing samples with multiple endpoints. Biometrics 40, 1079–1087 (1984)
7. Spiegelhalter, D.J., Freedman, L.S., Blackburn, P.: Monitoring clinical trials: conditional or predictive power? Control. Clin. Trials 7, 8–17 (1986)
Distance-Based Statistics for Covariance Operators in Functional Data Analysis
Davide Pigoli
Abstract The statistical analysis of covariance operators in a functional data analysis setting is considered. Several suitable distances for comparing covariance operators are presented and, in particular, the problem of estimating the average covariance operator among different groups is addressed. Finally, an applied problem in which this methodology has proved useful is introduced, namely, exploring phonetic relationships among Romance languages by looking at covariance operators across frequencies.
Key words: Trace class Operators, Functional Data, Linguistic Data
1 Introduction
The aim of this work is to set up a framework for the comparison of covariance operators on $L^2(\Omega)$, $\Omega \subseteq \mathbb{R}$. This problem arises in Functional Data Analysis when the features of curve populations lie in their covariance structure rather than in the mean function. In Section 2 some definitions and properties of operators on $L^2(\Omega)$ are recalled. Section 3 illustrates suitable distances to measure differences between covariance operators and explores their properties. In Section 4, the application of the proposed methodology to a linguistic problem is introduced and some preliminary results are shown.
Davide Pigoli
MOX - Department of Mathematics, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy. e-mail: [email protected]
2 Some remarks on compact operators on L2(Ω)
In this section we review some properties and definitions that will be of use when describing our proposed methodology. More details and proofs can be found, e.g., in Zhu (2007).
Definition 1. Let $B_1$ be the closed unit ball in $L^2(\Omega)$, i.e. the set of all $f \in L^2(\Omega)$ such that $||f||_{L^2(\Omega)} \le 1$. A bounded linear operator $T : L^2(\Omega) \to L^2(\Omega)$ is compact if the closure of $T(B_1)$ is compact in the norm of $L^2(\Omega)$. A bounded linear operator $T$ is self-adjoint if $T = T^*$.
An important property of compact operators on $L^2(\Omega)$ is the existence of a canonical decomposition. This means that two orthonormal bases $\{u_k\}_k$, $\{v_k\}_k$ exist so that

$$ T f = \sum_{k=1}^{+\infty} \sigma_k \langle f, v_k \rangle u_k, $$

or, equivalently, $T v_k = \sigma_k u_k$, where $\langle \cdot, \cdot \rangle$ indicates the scalar product in $L^2(\Omega)$ and $\{\sigma_k\}_k$ is called the sequence of singular values of $T$. If the operator is self-adjoint, a basis $\{v_k\}_k$ exists such that

$$ T f = \sum_{k=1}^{+\infty} \lambda_k \langle f, v_k \rangle v_k, $$

or, equivalently, $T v_k = \lambda_k v_k$, and $\{\lambda_k\}_k$ is called the sequence of eigenvalues of $T$.

A compact operator $T$ is said to be trace class if

$$ \mathrm{trace}(T) := \sum_{k=1}^{+\infty} \langle T e_k, e_k \rangle < +\infty $$

for an orthonormal basis $\{e_k\}_k$. It has been proved that this definition is independent of the choice of the basis and

$$ \mathrm{trace}(T) = \sum_{k=1}^{+\infty} \sigma_k, $$

where $\{\sigma_k\}_k$ are the singular values of $T$. We indicate with $S(L^2(\Omega))$ the space of trace class operators on $L^2(\Omega)$.
A compact operator $T$ is said to be Hilbert-Schmidt if its Hilbert-Schmidt norm is bounded, i.e.

$$ ||T||^2_{HS} = \mathrm{trace}(T^* T) < +\infty. $$

This is a generalization of the Frobenius norm for finite-dimensional matrices.
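Both quantities are easy to approximate once an operator's kernel is discretized on a grid. A sketch using the Brownian-motion covariance kernel $k(s,t) = \min(s,t)$ on $[0,1]$ (an illustrative, analytically tractable case: trace $= \int_0^1 t\,dt = 1/2$ and $||T||^2_{HS} = \int_0^1\!\int_0^1 \min(s,t)^2\,ds\,dt = 1/6$):

```python
import numpy as np

# Midpoint grid on [0, 1] with spacing h = 1/n.
n = 2000
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
Kmat = np.minimum.outer(t, t)          # discretized kernel k(s, t) = min(s, t)

trace = np.sum(np.diag(Kmat)) * h      # Riemann sum for ∫ k(t, t) dt ≈ 1/2
hs_sq = np.sum(Kmat**2) * h**2         # Riemann sum for ∫∫ k(s, t)^2 ds dt ≈ 1/6

print(trace, hs_sq)
```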
Definition 2. A bounded linear operator $R$ on $L^2(\Omega)$ is said to be unitary if

$$ ||R f||_{L^2(\Omega)} = ||f||_{L^2(\Omega)} \quad \forall f \in L^2(\Omega). $$

We indicate with $SO(L^2(\Omega))$ the space of unitary operators on $L^2(\Omega)$.

Let now $f$ be a random variable which takes values in $L^2(\Omega)$, $\Omega \subseteq \mathbb{R}$, such that $E[||f||^2_{L^2(\Omega)}] < +\infty$. Then the covariance operator $C_f$, with kernel $c_f(s,t) = \mathrm{cov}(f(s), f(t))$, is a trace class compact operator on $L^2(\Omega)$ (see Bosq, 2000, Section 1.5).
3 Distances between covariance operators
In this section novel distances to compare trace class compact operators are proposed. These are generalizations to the functional setting of metrics that have proved useful in the case of positive definite matrices.
Distance between kernels in $L^2(\Omega \times \Omega)$

Every covariance operator $S$ on $L^2(\Omega)$ can be associated with an integral kernel $s(x,y) \in L^2(\Omega \times \Omega)$, so that

$$ S f = \int_\Omega s(x,y) f(y)\, dy, \quad \forall f \in L^2(\Omega). $$
Thus, a distance between covariance operators can be naturally defined through the distance between their kernels in $L^2(\Omega \times \Omega)$:

$$ d_L(S_1, S_2) = ||s_1 - s_2||_{L^2(\Omega \times \Omega)} = \sqrt{\int_\Omega \int_\Omega (s_1(x,y) - s_2(x,y))^2\, dx\, dy}. $$

This distance is correctly defined, since it inherits all the properties of the distance in the Hilbert space $L^2(\Omega \times \Omega)$. However, it does not exploit in any way the particular structure of covariance operators and therefore may not highlight the significant differences between covariance structures.
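Discretizing the kernels on a common grid, $d_L$ is just a Riemann-sum approximation of the double integral. An illustrative check with $s_1(x,y) = \min(x,y)$ and $s_2 = 2 s_1$ (hypothetical kernels), for which $d_L = (\int\!\int \min(x,y)^2\, dx\, dy)^{1/2} = (1/6)^{1/2}$:

```python
import numpy as np

# Two discretized covariance kernels on a common midpoint grid over [0, 1].
n = 500
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
s1 = np.minimum.outer(t, t)    # Brownian-motion kernel
s2 = 2.0 * s1                  # a scaled version of it

# d_L(S1, S2) = sqrt( ∫∫ (s1 - s2)^2 dx dy ), approximated by a Riemann sum.
d_L = np.sqrt(np.sum((s1 - s2) ** 2) * h**2)

print(d_L)   # ≈ sqrt(1/6) ≈ 0.408, since s1 - s2 = -min(x, y)
```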
Spectral distance

A second possibility is to see the covariance operator as an element of $L(L^2(\Omega))$, the space of bounded linear operators on $L^2(\Omega)$. It follows that the distance between $S_1$ and $S_2$ can be defined as the operator norm of their difference. We recall that the norm of a self-adjoint bounded linear operator on $L^2(\Omega)$ is defined as

$$ ||T||_{L(L^2(\Omega))} = \sup_{v \in L^2(\Omega),\, v \neq 0} \frac{|\langle T v, v \rangle|}{||v||^2_{L^2(\Omega)}} $$

and for a covariance operator it coincides with the absolute value of the first (i.e. largest) eigenvalue. Thus,

$$ d_L(S_1, S_2) = ||S_1 - S_2||_{L(L^2(\Omega))} = |\lambda_1|, $$

where $\lambda_1$ is the first eigenvalue of the operator $S_1 - S_2$. This distance generalizes the matrix spectral norm, which is often used in the finite-dimensional case (see, e.g., El Karoui, 2008). It takes into account the spectral structure of the covariance operators, but it seems somehow restrictive to focus only on the behavior of the first mode of variation.
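After discretization, $S_1 - S_2$ becomes a symmetric matrix (kernel values times the grid spacing) whose extreme eigenvalue approximates the spectral distance. For the same hypothetical pair as above, $S_1 - S_2$ is minus the Brownian-motion operator, whose largest eigenvalue is $4/\pi^2$:

```python
import numpy as np

n = 1000
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
s1 = np.minimum.outer(t, t)    # Brownian-motion kernel
s2 = 2.0 * s1

# Discretize the operator S1 - S2: kernel values times the grid spacing.
A = (s1 - s2) * h

# Spectral distance = largest absolute eigenvalue of the symmetric difference.
lam = np.linalg.eigvalsh(A)
d_spec = np.abs(lam).max()

print(d_spec)   # ≈ 4/pi^2 ≈ 0.405, the top eigenvalue of the min(s, t) operator
```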
Procrustes size-and-shape distance

In Dryden et al. (2009), a Procrustes size-and-shape distance is proposed to compare two positive definite matrices. Our aim is to generalize this distance to the case of covariance operators on $L^2(\Omega)$. Let $S_1$ and $S_2$ be two trace class covariance operators on $L^2(\Omega)$. We define the Procrustes distance in $S(L^2(\Omega))$ as

$$ d_P(S_1, S_2)^2 = \inf_{R \in SO(L^2(\Omega))} ||L_1 - L_2 R||^2_{HS} = \inf_{R \in SO(L^2(\Omega))} \mathrm{trace}\big((L_1 - L_2 R)^*(L_1 - L_2 R)\big), $$

where $||\cdot||_{HS}$ indicates the Hilbert-Schmidt norm on $L^2(\Omega)$ and the $L_i$ are such that $S_i = L_i^* L_i$. The evaluation of the Procrustes distance asks for the solution of a minimization problem. However, an analytical solution is available and the distance therefore has an expression based on the canonical decomposition of the operator $L_2^* L_1$. The unitary operator $R$ that minimizes $||L_1 - L_2 R||^2_{HS}$ is defined by

$$ R v_k = u_k, \quad \forall k = 1, \dots, +\infty, $$

where $\{u_k\}_k$, $\{v_k\}_k$ are the orthonormal bases in the canonical decomposition of $L_2^* L_1$.
Proposition 1. The Procrustes distance in $S(L^2(\Omega))$ satisfies

$$ d_P(S_1, S_2)^2 = ||L_1||^2_{HS} + ||L_2||^2_{HS} - 2 \sum_{k=1}^{+\infty} \sigma_k, $$

where $\{\sigma_k\}_k$ are the singular values of the compact operator $L_2^* L_1$.
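In finite dimensions (e.g. after discretization), Proposition 1 reduces to a computation with matrix square roots and singular values. The sketch below uses two hypothetical 2x2 positive definite matrices standing in for covariance operators, and checks the closed form against a brute-force search over rotations:

```python
import numpy as np
from scipy.linalg import sqrtm

# Hypothetical SPD matrices standing in for discretized covariance operators.
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, 0.2], [0.2, 3.0]])

L1 = sqrtm(S1).real    # symmetric square roots, so that S_i = L_i* L_i
L2 = sqrtm(S2).real

# Proposition 1: d_P^2 = ||L1||_HS^2 + ||L2||_HS^2 - 2 * (sum of singular
# values of L2* L1), i.e. the nuclear norm of L2.T @ L1.
sigma = np.linalg.svd(L2.T @ L1, compute_uv=False)
dP2 = np.trace(S1) + np.trace(S2) - 2.0 * sigma.sum()

# Sanity check: direct search over 2x2 rotation matrices R.
angles = np.linspace(0.0, 2.0 * np.pi, 2001)
best = min(
    np.sum((L1 - L2 @ np.array([[np.cos(a), -np.sin(a)],
                                [np.sin(a),  np.cos(a)]])) ** 2)
    for a in angles
)
print(dP2, best)   # the two values agree closely
```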
Square root operator distance

We can also generalize the square root matrix distance (see Dryden et al., 2009) to compare $S_1, S_2 \in S(L^2(\Omega))$. Since $S_i^{1/2}$ is a Hilbert-Schmidt operator,

$$ d_R(S_1, S_2) = ||S_1^{1/2} - S_2^{1/2}||_{HS}. $$

This is a special case of the Procrustes distance above, when no unitary transformation is allowed.
3.1 Averaging of covariance operators
Once appropriate distances for covariance operators have been defined, many statistical tools can be developed by suitably generalizing traditional methods based on the Euclidean distance. For the sake of brevity, only the estimation of the average from a sample of covariance operators is presented here. Let $S_1, \dots, S_g$ be the covariance operators for $g$ different groups. Then a possible estimator of the common covariance operator $\Sigma$ is

$$ \hat{\Sigma} = \frac{1}{n_1 + \dots + n_g} (n_1 S_1 + \dots + n_g S_g). $$
However, this formula arises from the minimization of squared Euclidean deviations, weighted with the number of observations. If we choose a different distance to compare covariance operators, it is more coherent to average the covariance operators with respect to the chosen distance. A least squares estimator for $\Sigma$ can be defined for a general distance $d(\cdot,\cdot)$:

$$ \hat{\Sigma} = \arg\min_S \sum_{i=1}^g n_i\, d(S, S_i)^2. $$

The actual computation of the sample Frechet mean $\hat{\Sigma}$ depends on the choice of the distance $d(\cdot,\cdot)$. In general it asks for the solution of a high dimensional minimization problem, but some distances allow for an analytic solution, while for others efficient minimization algorithms are available. Concerning the kernel distance, it is easy to see that the kernel of the Frechet average is the weighted average of the kernels $s_1(x,y), \dots, s_g(x,y)$ of the data, i.e.

$$ \hat{\sigma}(x,y) = \frac{1}{n_1 + \dots + n_g} (n_1 s_1(x,y) + \dots + n_g s_g(x,y)). $$
For the square root distance, the following result can be proved.

Proposition 2.

$$ \hat{\Sigma} = \arg\min_S \sum_{i=1}^g n_i\, d_R(S, S_i)^2 = \left( \frac{1}{G} \sum_{i=1}^g n_i S_i^{1/2} \right)^2, \qquad (1) $$

where $G = n_1 + \dots + n_g$.
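Proposition 2 gives the square-root-distance Frechet mean in closed form. A finite-dimensional sketch (the group covariances and sample sizes below are hypothetical), with a perturbation check that the closed form is not beaten by nearby candidates:

```python
import numpy as np
from scipy.linalg import sqrtm

# Hypothetical group covariances and sample sizes.
S = [np.array([[1.0, 0.3], [0.3, 1.0]]),
     np.array([[2.0, 0.0], [0.0, 0.5]]),
     np.array([[1.5, 0.4], [0.4, 1.2]])]
n = [10, 20, 15]
G = sum(n)

# Closed form (1): square of the weighted average of operator square roots.
avg_root = sum(ni * sqrtm(Si).real for ni, Si in zip(n, S)) / G
Sigma_hat = avg_root @ avg_root

# Weighted least-squares objective in the square root distance.
def objective(M):
    R = sqrtm(M).real
    return sum(ni * np.sum((R - sqrtm(Si).real) ** 2) for ni, Si in zip(n, S))

print(objective(Sigma_hat), objective(Sigma_hat + 0.05 * np.eye(2)))
```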
The Procrustes mean can be obtained by an adaptation of the algorithm proposed in Gower (1975) or Ten Berge (1977). This works very well in practice if the algorithm is initialized with the estimate provided by (1).
4 Exploring phonetic relationships among Romance languages
The traditional way of exploring relationships among languages consists in looking at textual similarity. However, this often neglects the phonetic characteristics of the languages. Here a novel approach is proposed to compare languages on the basis of their phonetic structure.
In particular, people speaking different languages (French, Italian, Portuguese, Iberian Spanish and American Spanish) are recorded while pronouncing the words corresponding to the numbers from one to ten in each language. The output of the recording, for each word and each speaker, consists of the intensity of the sound over time and frequencies.
Fig. 1 Frechet average along time of the covariance operators of the log-spectrogram among frequencies for five Romance languages, using the square root distance.
The aim is to use these data to explore linguistic hypotheses concerning the relationships among different languages. However, while many possible phonetic features may be of interest, it has been shown that covariance operators associated with frequencies can provide some phonetic insight (Hajipantelis et al., 2012). Frequency covariances can indeed summarize phonetic information for the language, disregarding particular characteristics of speakers and words. For the scope of this work, we focus on the covariance operators among frequencies obtained from the log-spectrogram, with estimates being obtained using the sample of all speakers of the language. We consider different time points as replicates of the same covariance operator among frequencies. It is clear that this is a major simplification of the rich structure in the data, but it already leads to some interesting conclusions. Here some
preliminary results are reported, focusing on the covariance operator for the word "one".
Fig. 2 Left: distance matrix among the Frechet averages of Fig. 1, obtained with the square root distance. Right: dendrogram obtained from the distance matrix using average linkage, where I=Italian, F=French, P=Portuguese, SA=American Spanish, SI=Iberian Spanish.
Fig. 1 shows the covariance operator estimated for each language via Fréchet averaging along time, using the square root distance, for the word "one". Fig. 2 shows the dissimilarity matrix among the average covariance operators for each language and the corresponding dendrogram, while Fig. 3 compares a two-dimensional projection of the data, obtained with classical (metric) multidimensional scaling, with the map coming from linguistic experts, which encodes historical and geographical relationships among the languages. Indeed, it seems that focusing on the covariance operator captures some important information about the languages. There is an overall similarity between the map predicted by the experts and the relationships among the covariance structures. However, some unexpected features may suggest new research lines. For example, it is worth noticing that the Portuguese covariance structure is at a considerable distance from all the others, thus highlighting particular linguistic influences on that language.
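To make the metric concrete: in the finite-dimensional (discretized) case, the square root distance between two covariance matrices is the Frobenius (Hilbert-Schmidt) norm between their symmetric square roots, and the corresponding Fréchet average has a closed form: square the average of the square roots. A minimal sketch, assuming symmetric positive semi-definite inputs (function names are illustrative, not from the paper):

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root via eigendecomposition (S assumed PSD)."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def sqrt_distance(S1, S2):
    """Square root distance: d(S1, S2) = || S1^{1/2} - S2^{1/2} ||_F."""
    return np.linalg.norm(sym_sqrt(S1) - sym_sqrt(S2), 'fro')

def frechet_mean_sqrt(covs):
    """Fréchet mean under the square root distance:
    average the square roots, then square the average."""
    R = np.mean([sym_sqrt(S) for S in covs], axis=0)
    return R @ R
```

The pairwise distances produced by `sqrt_distance` are exactly what feeds the dendrogram of Fig. 2 and the metric multidimensional scaling of Fig. 3.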
5 Conclusions
In this work the problem of dealing with covariance operators has been addressed. The choice of an appropriate metric is crucial in the analysis of covariance operators. Here some suitable metrics have been proposed and their properties have been highlighted. On the basis of an appropriate metric, statistical methods can be developed to deal with covariance operators in the functional data analysis framework. The notable case of estimating the average from a sample of covariance operators is
Davide Pigoli
Fig. 3 Left: Map of languages built by linguistic experts using historical and geographical information. Some languages for which phonetic data are not available are also shown. Right: Two-dimensional metric multidimensional scaling. The extreme behavior of the Portuguese language leads to a slightly different configuration. Labels correspond to languages: I=Italian, F=French, P=Portuguese, SA=American Spanish, SI=Iberian Spanish.
illustrated. Moreover, in many applications the covariance operator itself is the object of interest, as illustrated by the linguistic data of Section 4. Using the square root distance between covariance operators among frequencies, some significant phonetic features of Romance languages have been found.
References
1. Bosq, D.: Linear processes in function spaces. Springer, New York (2000)
2. Dryden, I.L., Koloydenko, A., Zhou, D.: Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann. Appl. Stat. 3, 1102-1123 (2009)
3. El Karoui, N.: Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat. 36, 2717-2756 (2008)
4. Gower, J.C.: Generalized Procrustes analysis. Psychometrika 40, 33-50 (1975)
5. Hadjipantelis, P.Z., Aston, J.A.D., Evans, J.P.: Characterizing fundamental frequency in Mandarin: a functional principal component approach utilizing mixed effect models. J. Acoust. Soc. Am. In press (2012)
6. Ten Berge, J.M.F.: Orthogonal Procrustes rotation for two or more matrices. Psychometrika 42, 267-276 (1977)
7. Zhu, K.: Operator theory in function spaces (2nd ed.). American Mathematical Society (2007)
Clustering Multivariate Longitudinal Data: Hidden Markov of Factor Analyzers
Antonello Maruotti and Francesca Martella
Abstract Parsimonious Hidden Markov of Factor Analyzers models are developed by using a modified factor analysis covariance structure. This framework can be seen as an extension of the Parsimonious Gaussian mixture models (PGMMs), accounting for heterogeneity in a longitudinal setting. In particular, a class of 12 models is introduced, and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class includes parsimonious models that have not previously been developed. The performance of these models is discussed on benchmark gene expression data. The results are encouraging and deserve further discussion.
Key words: Clustering Longitudinal Data, Factor Analyzers, Hidden Markov Models, Dimensionality reduction
1 Introduction
In a longitudinal setting, repeated measurements are collected on the same (independent) units over several periods of time. Standard methods for longitudinal data analysis focus on the dependence of the variables on covariates, serial dependence, and heterogeneity among the individuals/units (see, e.g., [9]). Growing interest has recently been devoted to appropriately accounting for heterogeneity across the individual sequences (see, e.g., [16]). To capture heterogeneity in a longitudinal setting, it is common to assume the existence of a latent process, driving and characterizing
Antonello Maruotti
Dipartimento di Istituzioni Pubbliche, Economia e Società, Università di Roma Tre, Via Gabriello Chiabrera, 199 - 00145 Roma, e-mail: [email protected]
Francesca Martella
Dipartimento di Scienze Statistiche, Sapienza Università di Roma, P.le Aldo Moro, 5 - 00185 Roma, e-mail: [email protected]
different data generation mechanisms ([6, 10, 12] provide interesting reviews on this topic in different contexts).
Recently, [16] introduced a model-based clustering technique for clustering longitudinal data in a finite mixture framework. Each longitudinal sequence can be considered as a single object or entity belonging to one of the mixture components, and all the individual sequences within the same component are characterized by the same generating mechanism. Other approaches have been developed by using hierarchical models ([4, 5, 11, 1, 13]), at the cost of an increasing computational burden.
In the following we consider a multivariate Gaussian hidden Markov model (HMM; see [23] for a general introduction to HMMs), which can be seen as an extension of the finite mixture model [14] in which individuals are allowed to move between the (hidden) components during the period of observation. Starting from the Parsimonious Gaussian mixture models (PGMMs) introduced by [15] and further extended by [17], we introduce a hidden Markov of factor analyzers by specifying a modified factor analysis covariance structure, including the possibility of imposing constraints, which leads to a family of 12 models, including parsimonious ones.

Parameter estimates can be obtained by an Alternating Expectation Conditional Maximization algorithm (AECM, [18]) in an HMM framework, by adapting the well-known forward-backward algorithm ([3, 21]). The hidden Markov framework of factor analyzers is illustrated in the clustering of a representative dataset in the microarray literature: the yeast galactose data of [8]. The paper is organized as follows. Section 2 introduces the model, specifying some preliminaries on HMMs and providing extensions of the basic HMM in a multivariate clustering setting. Computational details are briefly described in Section 3, while Section 4 provides an illustrative example of the proposed models.
2 Model-based clustering of longitudinal data
In this section we first introduce the basic notation and the main assumptions on HMMs. Afterwards, we introduce in detail the hidden Markov of factor analyzers, pointing out the considered covariance structures and the computational aspects related to the estimation of the model parameters.
2.1 Hidden Markov models
In a basic HMM for longitudinal data, the existence of two processes is assumed: an unobservable finite-state first-order Markov chain $S_{it}$, $i = 1,\dots,n$, $t = 0,\dots,T$, with state space $\mathcal{S} = \{1,\dots,m\}$, and an observed process $Y_{it} = (Y_{it1}, Y_{it2}, \dots, Y_{itJ})$, where $Y_{itj}$ denotes the $j$-th response variable for individual $i$ at time $t$ (similarly for $S_{it}$).
We assume that the distribution of $Y_{it}$ depends only on $S_{it}$; specifically, the $Y_{it}$, $t = 1,\dots,T$, are conditionally independent given the $S_{it}$:

$$f(Y_{it} = y_{it} \mid Y_{i0} = y_{i0}, \dots, Y_{it-1} = y_{it-1}, S_{i0} = s_{i0}, \dots, S_{it} = s_{it}) = f(Y_{it} = y_{it} \mid S_{it} = s_{it}) \quad (1)$$
Typically it is assumed that the state-dependent distributions, i.e. the distributions of $Y_{it}$ given $S_{it}$, come from a parametric family of continuous or discrete distributions. Thus, the unknown parameters in an HMM involve both the parameters of the Markov chain and those of the state-dependent distributions of the random variables $Y_{it}$. In particular, the parameters of the Markov chain are the elements of the transition probability matrices $Q = \{q_{itlk}\}$, where $q_{itlk} = \Pr(S_{it} = k \mid S_{it-1} = l)$, $l,k \in \mathcal{S}$, is the probability that individual $i$ visits state $k$ at time $t$ given that at time $t-1$ he/she was in state $l$, and the initial probabilities $\delta = \{\delta_{il}\}$, where $\delta_{il} = \Pr(S_{i0} = l)$, i.e. the probability of being in state $l$ at time 0. The simplest model in this framework is the homogeneous HMM, which assumes common transition and initial probabilities, i.e. $q_{itlk} = q_{lk}$ and $\delta_{il} = \delta_l$. We will focus on homogeneous HMMs to simplify the discussion, but the hidden Markov chain can of course be assumed to be non-homogeneous: the transition probabilities may be individual- and/or time-varying and modeled via a logit function of explanatory variables.
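The two processes of a homogeneous HMM (common $\delta_l$ and $q_{lk}$) can be illustrated by direct simulation; the following sketch samples one hidden path and the corresponding Gaussian observations (all names are illustrative, not from the paper):

```python
import numpy as np

def simulate_hmm(delta, Q, mu, Sigma, T, rng):
    """Sample (S_i0..S_iT, Y_i0..Y_iT) from a homogeneous Gaussian HMM.
    delta: (m,) initial probabilities; Q: (m, m) transition matrix;
    mu: (m, J) state means; Sigma: (m, J, J) state covariances."""
    m = len(delta)
    s = np.empty(T + 1, dtype=int)
    s[0] = rng.choice(m, p=delta)                    # S_i0 ~ delta
    for t in range(1, T + 1):
        s[t] = rng.choice(m, p=Q[s[t - 1]])          # S_it | S_it-1 = l ~ q_l.
    y = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in s])
    return s, y
```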
2.2 Hidden Markov of Factor Analyzers
Consider an HMM with $Y_{it}$ being multidimensional, with the conditional distribution of $Y_{it}$ given $S_{it} = s_{it}$ being $N(\mu_{s_{it}}, \Sigma_{s_{it}})$, i.e. multivariate Gaussian with state-dependent mean $\mu_{s_{it}}$ and covariance matrix $\Sigma_{s_{it}}$. In line with the more general mixture of factor analyzers framework, we assume that, conditionally on the $s_{it}$-th state, the random vector $y_{it}$ is modelled using an $H$-dimensional vector of latent factors $w_{is_{it}}$ (typically $H \ll J$) as $y_{it} = \mu_{s_{it}} + \Lambda_{s_{it}} w_{is_{it}} + e_{it}$, where $\Lambda_{s_{it}}$ is a $J \times H$ matrix of factor weights, the latent variables $w_{is_{it}} \sim MVN(0, I_H)$, and $e_{it} \sim MVN(0, \Psi_{s_{it}})$, where $\Psi_{s_{it}}$ is a $J \times J$ diagonal matrix. Thus, conditionally on the $s_{it}$-th state, the density of $y_{it}$ is $MVN(\mu_{s_{it}}, \Lambda_{s_{it}}\Lambda'_{s_{it}} + \Psi_{s_{it}})$. Therefore, the marginal density of a hidden Markov of factor analyzers is given by:

$$f(y_i) = \sum_{\mathcal{S}^T} \delta_{s_{i0}} \prod_{t=1}^{T} q_{s_{it-1} s_{it}} \prod_{t=0}^{T} \frac{\exp\left[-\frac{1}{2}(y_{it}-\mu_{s_{it}})'(\Lambda_{s_{it}}\Lambda'_{s_{it}}+\Psi_{s_{it}})^{-1}(y_{it}-\mu_{s_{it}})\right]}{(2\pi)^{J/2}\,|\Lambda_{s_{it}}\Lambda'_{s_{it}}+\Psi_{s_{it}}|^{1/2}} \quad (2)$$
where $\sum_{\mathcal{S}^T}$ denotes summation over all realizations $\{s_{it},\ t = 0,\dots,T\}$ for individual $i$. Note that the proposed model can be seen as an extension of the mixture of factor analyzers model, by allowing time dependence and, following the idea in [17], constraints across groups on the $\Lambda_{s_{it}}$ and $\Psi_{s_{it}}$ matrices, and on whether or not $\Psi_{s_{it}} = \psi_{s_{it}}\Xi_{s_{it}}$, where $\psi_{s_{it}} \in \mathbb{R}^+$ and $\Xi_{s_{it}} = \mathrm{diag}\{\xi_1,\dots,\xi_J\}$ such that $|\Xi_{s_{it}}| = 1$. The
4 Antonello Maruotti and Francesca Martella
full range of possible constraints provides a class of 12 different Hidden Markov of Factor Analyzers models, which are given in Table 1. Note that CCCC and CCCU assume equal isotropic noise, whereas UCCC and CUUU assume unequal isotropic noise. The other eight covariance structures, incorporating constraints on the loading matrices, dramatically reduce the number of covariance parameters and lead to parsimonious models.
Table 1 Covariance structure in a hidden Markov of factor analyzers framework
Model ID   Λ_sit = Λ      Ξ_sit = Ξ      ψ_sit = ψ      Ξ_sit = I_J    Covariance structure
CCCC       Constrained    Constrained    Constrained    Constrained    Σ = ΛΛ' + ψ I_J
CCCU       Constrained    Constrained    Constrained    Unconstrained  Σ = ΛΛ' + ψ Ξ
CCUC       Constrained    Constrained    Unconstrained  Constrained    Σ_sit = ΛΛ' + ψ_sit I_J
CUUU       Constrained    Unconstrained  Unconstrained  Unconstrained  Σ_sit = ΛΛ' + ψ_sit Ξ_sit
UCCC       Unconstrained  Constrained    Constrained    Constrained    Σ_sit = Λ_sit Λ'_sit + ψ I_J
UCCU       Unconstrained  Constrained    Constrained    Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ Ξ
UCUC       Unconstrained  Constrained    Unconstrained  Constrained    Σ_sit = Λ_sit Λ'_sit + ψ_sit I_J
UUUU       Unconstrained  Unconstrained  Unconstrained  Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ_sit Ξ_sit
CCUU       Constrained    Constrained    Unconstrained  Unconstrained  Σ_sit = ΛΛ' + ψ_sit Ξ
UCUU       Unconstrained  Constrained    Unconstrained  Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ_sit Ξ
CUCU       Constrained    Unconstrained  Constrained    Unconstrained  Σ_sit = ΛΛ' + ψ Ξ_sit
UUCU       Unconstrained  Unconstrained  Constrained    Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ Ξ_sit
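The constraint patterns in Table 1 translate into code directly: each letter of the model ID says whether the corresponding quantity is shared across states (C) or state-specific (U). A hypothetical helper, not taken from the paper, assembling the state covariance for any of the 12 models:

```python
import numpy as np

def covariance_structure(model_id, Lambdas, psis, Xis, state):
    """Assemble Sigma_sit for one of the 12 models in Table 1.
    model_id: 4-letter string over {C, U} for (Lambda, Xi, psi, Xi = I_J);
    Lambdas: per-state (J, H) loading matrices; psis: per-state scalars;
    Xis: per-state diagonal vectors (with prod(Xi) == 1 in the paper)."""
    cL, cXi, cpsi, cI = (c == 'C' for c in model_id)
    L = Lambdas[0] if cL else Lambdas[state]     # constrained -> shared loadings
    psi = psis[0] if cpsi else psis[state]
    J = L.shape[0]
    if cI:                                       # Xi_sit = I_J constrained
        Xi = np.ones(J)
    else:
        Xi = Xis[0] if cXi else Xis[state]
    return L @ L.T + psi * np.diag(Xi)
```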
3 Computational details
Even if this form of the likelihood has several appealing properties, as it stands expression (2) is of little or no computational use, because it involves a sum over $m^T$ terms for each unit $i$ and cannot be directly evaluated. It quickly becomes infeasible to compute even for small values of $m$ as $T$ grows to moderate size. Clearly, a more efficient procedure is needed to compute the likelihood function. This issue may be addressed via the so-called forward variables ([3, 21]). To estimate the model parameters, the Alternating Expectation Conditional Maximization (AECM) algorithm introduced by [18] is used. This algorithm is an extension of the EM algorithm which uses different definitions of missing data at different stages. The AECM algorithm tends to be preferred to its alternatives because of its robustness and ease of application in various scenarios, especially when the model parameters are constrained. For homogeneous HMMs, the AECM reduces to an iterative procedure with simple, closed-form expressions for the parameter estimates at each iteration. It is based on the complete-data log-likelihood, i.e., the log-likelihood of the observations (the incomplete data) plus the states (the missing data). Before deriving the complete-data log-likelihood, we define $u_{itl} = I(S_{it} = l)$ as an indicator variable equal to
1 if unit $i$ is in state $l$ at time $t$ and 0 otherwise, and $v_{itlk} = I(S_{it} = k, S_{it-1} = l)$ as an indicator variable equal to 1 if unit $i$ is in state $l$ at time $t-1$ and in state $k$ at time $t$, and 0 otherwise. Moreover, we partition the vector of unknown parameters $\Phi$ into $(\Phi_1, \Phi_2)$: $\Phi_1$ contains the transition probabilities $q_{lk}$, the initial probabilities $\delta_l$ and the means $\mu_{s_{it}}$; $\Phi_2$ contains the elements of $\Lambda_{s_{it}}$, $\Psi_{s_{it}}$ and $w_{is_{it}}$. At the first stage of the algorithm, we define the state labels as missing data, and the complete-data log-likelihood function has the following form:
$$\ell_{c1}(\theta) = \sum_{i=1}^{n}\left\{\sum_{l=1}^{m} u_{i0l}\log\delta_l + \sum_{t=1}^{T}\sum_{l=1}^{m}\sum_{k=1}^{m} v_{itlk}\log q_{lk} + \sum_{t=0}^{T}\sum_{l=1}^{m} u_{itl}\log f(y_{it}\mid S_{it}=l)\right\}, \quad (3)$$
where
$$f(y_{it}\mid S_{it}=l) = \frac{\exp\left[-\frac{1}{2}(y_{it}-\mu_l)'(\Lambda_l\Lambda'_l+\Psi_l)^{-1}(y_{it}-\mu_l)\right]}{(2\pi)^{J/2}\,|\Lambda_l\Lambda'_l+\Psi_l|^{1/2}}$$
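The forward variables mentioned above reduce the $m^T$-term sum of expression (2) to an $O(Tm^2)$ recursion. A sketch for a single unit of a homogeneous HMM, assuming the state-dependent log-densities $\log f(y_{it} \mid S_{it} = l)$ have already been evaluated (function names are illustrative):

```python
import numpy as np

def log_forward(log_dens, delta, Q):
    """Forward recursion for one unit's HMM log-likelihood.
    log_dens: (T+1, m) array with log f(y_t | S_t = l);
    delta: (m,) initial probabilities; Q: (m, m) transition matrix.
    Works in log-scale with log-sum-exp for numerical stability."""
    T1, m = log_dens.shape
    alpha = np.log(delta) + log_dens[0]               # alpha_0(l)
    for t in range(1, T1):
        a = alpha[:, None] + np.log(Q)                # a[l, k] = alpha_{t-1}(l) + log q_lk
        amax = a.max(axis=0)
        alpha = amax + np.log(np.exp(a - amax).sum(axis=0)) + log_dens[t]
    amax = alpha.max()
    return amax + np.log(np.exp(alpha - amax).sum())  # log f(y_i)
```

For small $m$ and $T$ the result can be checked against the brute-force sum over all $m^{T+1}$ state sequences.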
Thus, the first E-step consists of calculating the conditional expectation of expression (3), replacing the quantities $u_{itl}$ and $v_{itlk}$ with their conditional expectations $\hat{u}_{itl}$ and $\hat{v}_{itlk}$, given the current values of the parameters and the observed data. At the first CM-step, the expected complete-data log-likelihood is maximized with respect to $\mu_l$, $\delta_l$ and $q_{lk}$, obtaining:
$$\hat{\mu}_l = \frac{\sum_{i=1}^{n}\sum_{t=0}^{T}\hat{u}_{itl}\,y_{it}}{\sum_{i=1}^{n}\sum_{t=0}^{T}\hat{u}_{itl}}, \qquad \hat{\delta}_l = \frac{\sum_{i=1}^{n}\hat{u}_{i0l}}{n}, \qquad \hat{q}_{lk} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\hat{v}_{itlk}}{\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{k=1}^{m}\hat{v}_{itlk}}.$$
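Once the expected indicators $\hat{u}_{itl}$ and $\hat{v}_{itlk}$ are available, these closed-form CM-step updates are weighted averages and row normalizations; a sketch (array names are illustrative):

```python
import numpy as np

def cm_step_stage1(u, v, y):
    """Closed-form stage-1 CM-step updates given posterior indicators.
    u: (n, T+1, m) expected indicators u_itl;
    v: (n, T, m, m) expected transition indicators v_itlk;
    y: (n, T+1, J) observations."""
    denom = u.sum(axis=(0, 1))                        # sum_i sum_t u_itl
    mu = np.einsum('itl,itj->lj', u, y) / denom[:, None]
    delta = u[:, 0, :].sum(axis=0) / u.shape[0]       # initial probabilities
    num = v.sum(axis=(0, 1))                          # sum_i sum_t v_itlk
    q = num / num.sum(axis=1, keepdims=True)          # row-normalized transitions
    return mu, delta, q
```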
At the second stage of the AECM algorithm, we use the $\hat{\mu}_l$, $\hat{\delta}_l$ and $\hat{q}_{lk}$ obtained above when estimating $\Lambda_l$ and $\Psi_l$, and consider the state labels and the factors to be the missing data. Therefore, the complete-data log-likelihood is
$$\ell_{c2}(\theta) = \sum_{i=1}^{n}\left\{\sum_{l=1}^{m} u_{i0l}\log\delta_l + \sum_{t=1}^{T}\sum_{l=1}^{m}\sum_{k=1}^{m} v_{itlk}\log q_{lk} + \sum_{t=0}^{T}\sum_{l=1}^{m} u_{itl}\log f(y_{it}\mid S_{it}=l, w_{il}) + \sum_{t=0}^{T}\sum_{l=1}^{m} u_{itl}\log f(w_{il})\right\}. \quad (4)$$
In a similar manner as before, the estimates of $\Lambda_l$ and $\Psi_l$ can be easily derived under the different imposed constraints (not shown here for the sake of brevity). The AECM algorithm iteratively updates the parameters until convergence to the maximum likelihood estimates. As a by-product of the estimation procedure, we have the possibility of classifying genes on the basis of their posterior probability estimates $\hat{u}_{itl}$. In fact, the $i$-th gene can be classified into the $l$-th group (component of
the estimated mixture) if $\hat{u}_{itl} = \max(\hat{u}_{it1}, \hat{u}_{it2}, \dots, \hat{u}_{itm})$. It is worth noticing that each group is characterized by homogeneous values of the estimated parameters.
4 The yeast galactose data
To discuss the empirical performance of the proposed model, we use a typical gene expression dataset where the expression levels are measured at many time points or under different conditions, to elucidate genetic networks or some important biological process. Specifically, this dataset has been used to study integrated genomic and proteomic analyses of a systemically perturbed metabolic network ([8]). The experiments included single gene deletions involving nine of the key genes (GAL1, GAL2, GAL3, GAL4, GAL5(PGM2), GAL6(LAP3), GAL7, GAL10, GAL80) that participate in yeast galactose metabolism. For each experiment, one of the nine genes was deleted, or alternatively, the experiment used a wild-type cell wherein no genes were deleted. For each of those 10 experimental conditions, galactose was available extracellularly in one set of experiments and absent in another set. Thus, there were a total of $T = 20$ different experimental conditions. Since each of those 20 experiments comprises $J = 4$ replicate measurements, the overall dataset contains 80 experiments. As in [22] and [19], we imputed all the missing values using a k-nearest neighbor method. The resulting $n = 205$ gene expression levels reflect four functional categories in the Gene Ontology (GO) listings ([2]). Thus, we applied a hidden Markov of factor analyzers to group genes into $m = 4$ states; we do not discuss the fit for varying numbers of states $m$, since we aim to analyze the performance of our proposal in reproducing the known functional categories. Genes are allowed to move among the states during the period of observation. In fact, a gene can be associated with multiple biological functions, since genes often have several distinct roles in regulation processes. Therefore, the assumption of assigning a gene to only one state (or cluster) is an oversimplification for a biological system.
In the following we summarize the potential of the proposed approach. We look at three of the twelve factorial parameterizations as illustrative examples. The evolution over time is presented in Figure 1, while a comparison with PGMMs in terms of $BIC = 2\ell + \#\text{parameters} \times \log(n)$ and goodness-of-classification is provided in Table 2. We classify each gene into the state maximizing its posterior membership probability, thus deriving the unobserved sequence of states. Figure 1 shows the estimated sequences of hidden states; it is clear that time dependence and heterogeneity play an important role in the classification, since genes seem to change their behavior, moving across states over time.
Furthermore, we provide a measure of the quality of the classification by theindex
$$S = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\left(\max(\hat{u}_{it1},\hat{u}_{it2},\dots,\hat{u}_{itm}) - \frac{1}{m}\right)}{\left(1-\frac{1}{m}\right)\sum_{i=1}^{n} T_i}$$
Index $S$ always lies between 0 and 1, with 1 corresponding to the absence of uncertainty in the classification, since one of the posterior probabilities is then equal to 1 for every individual at every time, with all the other probabilities equal to 0. It helps in identifying whether the population clusters are sufficiently well separated. It is worth noting that each state is characterized by homogeneous values of the estimated random effects; thus, conditionally on the observed covariate values, subjects from that state have a similar propensity to the event of interest. The UCCU HMM is the preferred model, providing the best goodness-of-classification and the best BIC. This confirms the importance of appropriately accounting for all the characteristics of longitudinal data.
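The index is straightforward to compute from the matrix of posterior probabilities; a sketch (names are illustrative), where each row collects the posteriors of one unit at one time point:

```python
import numpy as np

def separation_index(u):
    """Classification-quality index S from posterior probabilities.
    u: (N, m) array, one row per (unit, time) pair, rows summing to 1.
    S = 1 for crisp classification, S = 0 for uniform posteriors."""
    N, m = u.shape
    return (u.max(axis=1) - 1.0 / m).sum() / ((1.0 - 1.0 / m) * N)
```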
Table 2 Summary results
Model   H   PGMM BIC   PGMM S   HMM BIC    HMM S
UUUU    2   14821.53   0.758    16867.12   0.931
UUUU    3   14759.23   0.756    16840.02   0.932
UCCU    2   14659.44   0.795    16820.41   0.934
UCCU    3   14859.23   0.758    16903.45   0.932
UCUC    2   7611.761   0.735    10413.41   0.970
UCUC    3   12162.73   0.864    14131.45   0.919
Fig. 1 Hidden state sequences for the 205 genes over 20 time points (x-axis: Times; y-axis: Genes).
References
1. Alfo, M. and Maruotti, A.: A hierarchical model for time dependent multivariate longitudinal data. In: Data Analysis and Classification. Springer Series on Studies in Classification, Data Analysis and Knowledge Organization. C. Lauro, F. Palumbo (eds), Springer-Verlag, 271-279 (2010).
2. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M. et al.: Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25-29 (2000).
3. Baum, L.E., Petrie, T., Soules, G. and Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41, 164-171 (1970).
4. Celeux, G., Martin, O. and Lavergne, C.: Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments. Stat. Model. 5, 243-267 (2005).
5. De la Cruz-Mesía, R., Quintana, F.A. and Marshall, G.: Model-based clustering for longitudinal data. Comp. Stat. and Data Anal. 52, 1441-1457 (2008).
6. Frühwirth-Schnatter, S.: Panel data analysis: a survey on model-based clustering of time series. Adv. Data Anal. Classif. 5, 251-280 (2011).
7. Ghahramani, Z. and Hinton, G.E.: The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto (1997).
8. Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K. et al.: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929-934 (2001).
9. Laird, N.M. and Ware, J.H.: Random effects models for longitudinal data. Biometrics 38, 963-974 (1982).
10. Martella, F. and Vermunt, J.K.: Model-based approaches to synthesize microarray data: a unifying review using mixture of SEMs. Stat. Methods Med. Res. (2012), doi: 10.1177/0962280211419482.
11. Maruotti, A. and Ryden, T.: A semiparametric approach to hidden Markov models under longitudinal observations. Statist. Comput. 19, 381-393 (2009).
12. Maruotti, A.: Mixed hidden Markov models for longitudinal data: an overview. Int. Stat. Rev. 79, 427-454 (2011).
13. Maruotti, A. and Rocci, R.: A mixed non-homogeneous hidden Markov model for categorical data, with application to alcohol consumption. Stat. Med. (2012), doi: 10.1002/sim.4478.
14. McLachlan, G.J. and Peel, D.: Finite mixture models. Wiley Series in Probability and Statistics. Wiley, New York (2000).
15. McNicholas, P.D. and Murphy, T.B.: Parsimonious Gaussian mixture models. Stat. Comput. 18, 285-296 (2008).
16. McNicholas, P.D. and Murphy, T.B.: Model-based clustering of longitudinal data. The Canadian Journal of Statistics 38, no. 1, 153-168 (2010).
17. McNicholas, P.D. and Murphy, T.B.: Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics 26, no. 21, 2705-2712 (2010).
18. Meng, X.L. and Van Dyk, D.A.: The EM algorithm - an old folk song sung to a fast new tune. J. R. Statist. Soc. B 59, 511-567 (1997).
19. Ng, S.K., McLachlan, G.J., Bean, R.W. and Ng, S.W.: Clustering replicated microarray data via mixtures of random effects models for various covariance structures. In: Boden, M. and Bailey, T.L. (eds.) Conferences in Research and Practice in Information Technology. The Australian Computer Society, Sydney, 73, 29-33 (2006).
20. Tipping, M.E. and Bishop, C.M.: Mixtures of probabilistic principal component analysers. Neural Comp. 11, 443-482 (1999).
21. Welch, L.R.: Hidden Markov models and the Baum-Welch algorithm. IEEE Inf. Theory Soc. Newsl. 53, 1-13 (2003).
22. Yeung, K.Y., Medvedovic, M., Bumgarner, R.E.: Clustering gene-expression data with repeated measurements. Genome Biol. 4(R34) (2003).
23. Zucchini, W. and MacDonald, I.L.: Hidden Markov Models for Time Series: An Introduction Using R. Chapman & Hall, Boca Raton, FL (2009).
Random coefficient based dropout models: a finite mixture approach
Alessandra Spagnoli and Marco Alfo
Abstract In longitudinal studies, subjects may be lost to follow-up (a phenomenon which is often referred to as attrition) or miss some of the planned visits, thus generating incomplete responses. When the probability of nonresponse, once conditioned on observed covariates and responses, still depends on the unobserved responses, the dropout mechanism is known to be informative. A common objective in these studies is to build a general, reliable association structure to account for dependence between the longitudinal and the dropout processes. Starting from the existing literature, we introduce a random coefficient based dropout model where the association between outcomes is modeled through discrete latent effects; these latent effects are outcome-specific and account for heterogeneity in the univariate profiles. Dependence between profiles is introduced by using a bidimensional representation for the corresponding distribution. In this way, we define a flexible latent class structure, with possibly different numbers of locations in each margin, and a full association structure connecting each location in a margin to each location in the other one. By using this representation we show how, unlike standard (unidimensional) finite mixture models, an informative dropout model may properly nest a non-informative dropout counterpart.
Key words: Finite mixtures, informative dropout, concomitant latent variables
Alessandra Spagnoli
Dipartimento di Scienze Statistiche, Sapienza Università di Roma, e-mail: [email protected]
Marco Alfo
Dipartimento di Scienze Statistiche, Sapienza Università di Roma, e-mail: [email protected]
1 Introduction
In longitudinal studies, measurements from the same individuals (units) are taken repeatedly over time. These kinds of studies often suffer from attrition, since individuals may drop out of the study before the scheduled completion time and thus present incomplete data. When the reason for dropout is related to the unobserved responses, even after controlling for available covariates and responses, the missingness is known to be informative. In such studies, scientific interest may focus on the association structure between the longitudinal measurements and the missingness process. In a seminal paper, [10] discuss a class of statistical models for non-ignorable dropout, referred to as Random Coefficient Based Dropout Models (RCBDMs), where the marginal association between the longitudinal and the survival processes arises only through dependent, outcome-specific random coefficients. Separate models are hypothesized for the two partially observed processes, which share a common (correlated) set of random coefficients. In the context of binary responses, [2] propose an extension of these models by defining a semi-parametric selection model where the longitudinal and the dropout processes are linked through correlated random effects. The random effects are usually assumed to be Gaussian, but this assumption has been questioned by some authors, see e.g. [14], since the resulting inferences can be sensitive to assumptions that cannot be verified from the available data. In this perspective, [13] investigated the effect of misspecifying the random effect distribution on parameter estimates and standard errors when a shared parameter model is considered. They showed that, as the number of repeated longitudinal measurements per individual grows, the effect of misspecifying the random effect distribution vanishes for certain parameter estimates, referring implicitly to the theoretical results in [4].
In several contexts, however, for example in clinical research, the follow-up times are usually short, and individual sequences carry only little information on the random effects; therefore, the choice of the random effect distribution may be important. As far as selection models are concerned, just to mention a few, [19] used a Monte Carlo EM algorithm for a linear mixed model with Gaussian random effects, while [8] propose a Laplace approximation to overcome the high-dimensional integration over the distribution of the random effects. Numerical integration techniques, such as standard or adaptive Gaussian quadrature, can be used as well. In this paper, we investigate the association structure between the measurement and dropout processes when the random coefficient distribution is left completely unspecified, adopting a finite mixture perspective. We consider a bivariate distribution for the random coefficients that is equal to the product of the marginal distributions only when the dropout mechanism is ignorable. The structure of the paper is as follows. Section 2 discusses a random coefficient based dropout model where the association between outcomes is modeled through discrete latent effects. Section 3 describes the proposed ML algorithm. The last section contains concluding remarks.
2 Random coefficient-based models
Let $Y_{it}$ represent a set of longitudinal measurements recorded on $i = 1,\dots,n$ subjects at times $t = 1,\dots,T$, associated with a row vector of $p$ covariates $x_{it} = (x_{it1},\dots,x_{itp})$. Let us assume that the observed responses $y_{it}$ are realizations of a random variable with density in the exponential family and canonical parameter $\theta_{it}$. The canonical parameter is defined as follows:
$$\theta_{it} = x_{it}^{T}\beta + z_{it}^{T}b_i \quad (1)$$
The terms $b_i$, $i = 1,\dots,n$, are used to model unobserved individual-specific (time-invariant) heterogeneity common to each lower-level unit (time) within the same $i$-th upper-level unit (individual), while $\beta$ is a $p$-dimensional vector of fixed regression parameters. The effects that vary across individuals are collected in the design vector $z_{it} = (z_{it1},\dots,z_{itm})$. We denote by $R_i$ the missing data indicator vector, with generic element defined as $R_{it} = 1$ if the $i$-th unit drops out at any point in the window $(t-1, t)$, and $R_{it} = 0$ otherwise. Using this representation, we are implicitly assuming a discrete structure for the time to dropout; however, the following arguments apply to continuous-time survival processes as well. We assume that, once a person drops out, he or she is out forever (attrition). If the designed completion time is denoted by $T$, we will have $T_i \le T$ measurements for each unit. We may introduce an explicit model for the dropout mechanism, conditioning on a set of dropout-specific covariates $v_i$ and the random coefficients in the longitudinal response model:
$$h(r_i\mid v_i, y_i, b_i) = h(r_i\mid v_i, b_i) = \prod_{t=1}^{T_i} h(r_{it}\mid v_i, b_i), \quad i = 1,\dots,n \quad (2)$$
where the corresponding canonical parameter is $\phi_{it} = v_{it}^{T}\gamma + d_{it}^{T}b_i$. These models are usually referred to as shared parameter models, see [21], [22], and are based on the assumption of conditional independence between the longitudinal response and the dropout indicator; as can easily be noticed, they assume a perfect linear correlation between the latent variables in the two equations. In this framework, the joint density of the measurement process $Y_{it}$ and the missingness process $R_{it}$ may be written as:

$$\int\left[\prod_{t=1}^{T} f(y_{it}\mid x_{it}, b_i)\prod_{t=1}^{T_i} h(r_{it}\mid v_{it}, b_i)\right] dG(b_i) \quad (3)$$
where $G(\cdot)$ represents a discrete or continuous random coefficient distribution. Here, the measurement and missingness processes are assumed to be independent given the random effects $b_i$; therefore, the association, if any, is completely accounted for by this latent structure. Correlated random effects represent a further alternative, see e.g. [1]: the unobservable latent characteristics control for potential overdispersion in the univariate profiles and for the association between the measurement and missingness processes; this structure, however, avoids unit correlation estimates, and represents a more flexible approach when compared to shared random effects, where conditional
independence still holds. Let $b_i = (b_{1i}, b_{2i})$ denote a set of subject- and outcome-specific random coefficients; then, the joint density of the measurement process $Y_{it}$ and the missingness process $R_{it}$ can be factorized as:
$$\int\left[\prod_{t=1}^{T} f(y_{it}\mid x_{it}, b_{1i})\prod_{t=1}^{T_i} h(r_{it}\mid v_{it}, b_{2i})\right] dG(b_{1i}, b_{2i}) \quad (4)$$
An extension of this association structure between the random coefficients in the two equations may be defined following [5], where a general random effect model is introduced in which common, partially shared and independent (response-specific) random effects influence the measurement and the dropout processes. While it is common to assume that the random effects follow a Gaussian distribution, this does not usually lead to a tractable form of the integral in eqs. (3) and (4). Among others, [20], [15] and [17] show that the choice of the random effect distribution does not have a great impact on parameter estimates, except in extreme cases, such as discrete distributions. On the same line, [13] show that when all subjects have a relatively large number of repeated measurements, the effects of misspecifying the random effect distribution become minimal for model parameter estimates. However, [18] observe that the choice of an appropriate random effect distribution is generally difficult for at least three reasons. First, there is often little information about these unobservables, so any distributional assumption is difficult to justify by looking only at the observed data. Second, when high-dimensional random coefficients are considered, the use of a parametric multivariate distribution imposing the same shape on every dimension can be restrictive. Third, a potential dependence of the random effects on unobserved covariates induces heterogeneity that cannot be captured by common parametric assumptions. In studies where some subjects have few measurements, e.g. due to dropout, the choice of the random coefficient distribution may therefore be important. A finite mixture approach avoids any unverifiable assumptions on this distribution, frequently referred to as the mixing distribution. In this perspective, [18] propose a semi-parametric shared parameter model to analyze continuous longitudinal responses while adjusting for non-monotone missingness.
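Under a finite mixture (discrete mixing) perspective, the integral in (4) collapses to a weighted sum over $K$ support points, so the log-likelihood can be evaluated directly. A sketch assuming the component log-densities have been precomputed at each location (all names are illustrative, not from the paper):

```python
import numpy as np

def mixture_loglik(log_f_y, log_h_r, log_pi):
    """Finite-mixture log-likelihood for the joint measurement/dropout model.
    log_f_y[i, k] = log f(y_i | x_i, b_1k);  log_h_r[i, k] = log h(r_i | v_i, b_2k);
    log_pi[k] = log pi_k. Uses log-sum-exp over the K components."""
    a = log_f_y + log_h_r + log_pi                    # (n, K) joint log terms
    amax = a.max(axis=1, keepdims=True)
    return float((amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))).sum())
```

With a product structure $\pi_k = \pi_{1g}\pi_{2l}$ over a grid of locations, the same function evaluates the ignorable-dropout special case discussed below.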
On the same line, [2] jointly analyze longitudinal binary responses subject to dropout through a selection model with correlated, outcome-specific, random coefficients. Using a finite mixture approach, the log-likelihood function in equation (4) can be written as follows:
\ell(\cdot) = \sum_{i=1}^{n} \log\left\{ \sum_{k=1}^{K} f(y_i \mid x_i, b_{1k})\, h(r_i \mid v_i, b_{2k})\, \pi_k \right\} = \sum_{i=1}^{n} \log\left\{ \sum_{k=1}^{K} f(y_i, r_i \mid x_i, v_i, b_k)\, \pi_k \right\} \qquad (5)
where π_k = Pr(b_k) = Pr(b_{1k}, b_{2k}) is the joint probability of the locations b_k = (b_{1k}, b_{2k}), k = 1, ..., K. The use of finite mixtures has several significant advantages over parametric models; for instance, this approach is computationally efficient, and the discrete nature of the estimate may help classify subjects into components corresponding to clusters characterized by homogeneous values of the random parameters. However, we may notice that the latent variables, as well as the corresponding number of locations, considered in the model to account for individual extra-model departures can
Random coefficient based dropout models: a finite mixture approach 5
be different when the longitudinal and the missingness processes are considered. For this reason, following [3], we propose to consider different numbers of components, locations and/or masses for the latent variables in the two equations. When compared to the previously mentioned proposals, see e.g. equation (5), this is a more flexible representation for the random coefficient distribution and, in particular, this model properly nests a model which describes the dropout as being non-informative. That is, the proposed MNAR model properly nests a MAR counterpart, while in the case of equation (5) this is not true. Let us suppose the joint bivariate distribution of the random effects has the following marginal representation [9]:
P_1 = (u_{1g}, \pi_{1g}),\ g = 1, \dots, K_1 \qquad P_2 = (u_{2l}, \pi_{2l}),\ l = 1, \dots, K_2
with π_{1g} = Pr(b_{1i} = b_{1g}), g = 1, ..., K_1, and π_{2l} = Pr(b_{2i} = b_{2l}), l = 1, ..., K_2. That is, we associate to each couple of random coefficients, say (b_{1g}, b_{2l}), g = 1, ..., K_1, l = 1, ..., K_2, a mass π_{gl} = Pr(b_{1i} = b_{1g}, b_{2i} = b_{2l}), where we do not restrict the two profiles to share the same number of components. While the marginals control for heterogeneity in the univariate profiles, the joint probabilities describe the association between the latent effects in the two submodels. This approach can be related to a standard finite mixture approach where K = K_1 × K_2 components are used and each of the K_1 locations in the first profile is paired with each of the K_2 locations in the second profile. Theorem 1 in [6] shows that the elements of any probability matrix π ∈ Π_{K_1 K_2}, where the latter represents the set of K_1 × K_2 probability matrices, can be decomposed as:
\pi_{gl} = \sum_{h=1}^{M} \tau_h\, \pi_{1g|h}\, \pi_{2l|h} \qquad (6)
for an appropriate choice of M. Obviously, the following constraints hold:
\sum_{h} \tau_h = \sum_{g} \pi_{1g|h} = \sum_{l} \pi_{2l|h} = \sum_{g} \sum_{l} \pi_{gl} = 1
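The decomposition in equation (6) and its constraints can be checked numerically. The dimensions and probabilities below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Illustrative dimensions: K1 locations in the measurement profile,
# K2 in the dropout profile, M upper-level latent classes.
K1, K2, M = 3, 2, 2

rng = np.random.default_rng(0)
tau = rng.dirichlet(np.ones(M))              # tau_h, sums to one
pi1 = rng.dirichlet(np.ones(K1), size=M)     # pi_{1g|h}, each row sums to one
pi2 = rng.dirichlet(np.ones(K2), size=M)     # pi_{2l|h}, each row sums to one

# pi_{gl} = sum_h tau_h * pi_{1g|h} * pi_{2l|h}   (equation 6)
pi = np.einsum("h,hg,hl->gl", tau, pi1, pi2)

assert np.isclose(pi.sum(), 1.0)             # joint masses sum to one
# Marginals recover the profile-specific masses sum_h tau_h * pi_{.|h}
assert np.allclose(pi.sum(axis=1), tau @ pi1)
assert np.allclose(pi.sum(axis=0), tau @ pi2)
# With M = 1 the two profiles are independent: pi_gl = pi_1g * pi_2l
pi_ind = np.einsum("h,hg,hl->gl", np.ones(1), pi1[:1], pi2[:1])
assert np.allclose(pi_ind, np.outer(pi1[0], pi2[0]))
```

The last assertion illustrates the point made below: M = 1 forces the joint mass matrix to be the outer product of its margins, i.e. independence between the two sets of random coefficients.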
Therefore, the two sets of random coefficients b_{1i} and b_{2i}, i = 1, ..., n, are independent conditional on belonging to the h-th (upper level) latent class, h = 1, ..., M. The random coefficients control for heterogeneity in the univariate profiles, while the hierarchy of the latent components controls for potential dependence between the outcome-specific random coefficients; this leads to a separation of univariate heterogeneity and bivariate dependence. In some way, the hierarchical structure for π_{gl} resembles a copula-based model, where dependence between profiles is modeled through a copula function joining the marginal distributions of the outcome-specific random coefficients, see [13]. The independence case arises simply when M = 1; in this case, the dropout mechanism is non-informative: the dropout probability still depends on unobserved sources of variation, but these are independent of those influencing the longitudinal measurements. When M ≥ 2 we have some form of dependence, and we can define different non-ignorable dropout mechanisms according to the values assumed by the parameter M. In this sense, it may be interesting to investigate the
6 Alessandra Spagnoli and Marco Alfo
sensitivity of the results with respect to model assumptions as M moves away from 1, as for example in [16].
3 ML Parameter Estimation
The data vector is composed of an observable part y_i and of the unobservables z_i = (z_{i1}, ..., z_{iK}) and ζ_i = (ζ_{i1}, ..., ζ_{iM}), representing the lower and upper level membership vectors. For fixed K_1, K_2 and M, we assume z_i and ζ_i have multinomial distributions, with probabilities π_{gl}, g = 1, ..., K_1, l = 1, ..., K_2, and τ_h, h = 1, ..., M. The complete data likelihood is given by
L_c(\cdot) = \prod_{i=1}^{n} \prod_{g=1}^{K_1} \prod_{l=1}^{K_2} \left\{ f(y_i, r_i \mid z_{igl}) \prod_{h=1}^{M} \left[ \pi_{1g|h}\, \pi_{2l|h}\, \tau_h \right]^{\zeta_{ih}} \right\}^{z_{igl}}
= \prod_{i=1}^{n} \prod_{g=1}^{K_1} \prod_{l=1}^{K_2} \left\{ \prod_{t=1}^{T_i} f(y_{it} \mid z_{igl})\, h(r_{it} \mid z_{igl}) \prod_{h=1}^{M} \left[ \pi_{1g|h}\, \pi_{2l|h}\, \tau_h \right]^{\zeta_{ih}} \right\}^{z_{igl}}
where τ_h is the prior probability of the h-th upper level latent class, and π_{1g|h} and π_{2l|h} are the probabilities of belonging to the g-th and the l-th lower level components, conditional on being in the h-th class. We partition the parameter vector as Ψ = (Ψ_g, Ψ_l, Ψ_{glh}), where Ψ_g and Ψ_l denote the parameter vectors for the longitudinal and the dropout process, respectively, while Ψ_{glh} = (π_{1g|h}, π_{2l|h}, τ_h). By writing f_{igl} = f(y_i, r_i | z_{igl}) = f(y_i | z_{igl}) h(r_i | z_{igl}), the score functions are:
S_c(\Psi_g) = \sum_{i=1}^{n} \sum_{g=1}^{K_1} \sum_{l=1}^{K_2} w_{igl}\, \frac{\partial}{\partial \Psi_g} \left[ \log f_{igl} + \log \pi_{gl} \right] = \sum_{i=1}^{n} \sum_{g=1}^{K_1} w_{ig}\, \frac{\partial}{\partial \Psi_g} \log f_{ig}

S_c(\Psi_l) = \sum_{i=1}^{n} \sum_{g=1}^{K_1} \sum_{l=1}^{K_2} w_{igl}\, \frac{\partial}{\partial \Psi_l} \left[ \log f_{igl} + \log \pi_{gl} \right] = \sum_{i=1}^{n} \sum_{l=1}^{K_2} w_{il}\, \frac{\partial}{\partial \Psi_l} \log h_{il}

S_c(\Psi_{glh}) = \sum_{i=1}^{n} \sum_{g=1}^{K_1} \sum_{l=1}^{K_2} \sum_{h=1}^{M} w_{igl}\, \omega_{ih|gl}\, \frac{\partial}{\partial \Psi_{glh}} \left[ \log \pi_{1g|h} + \log \pi_{2l|h} + \log \tau_h \right]

where f_{ig} = f(y_i | z_{igl}), h_{il} = h(r_i | z_{igl}), and ω_{ih|gl} is the posterior probability that the i-th unit belongs to the h-th upper level component, given the observed data, the lower level components and the current parameter estimates Ψ^{(r)}. The terms w_{igl} represent the posterior probabilities of the unit being in the g-th and the l-th components of the measurement and dropout profiles, respectively. In this way, we may test for independence of the two processes through standard Wald-type or χ²-based statistics; in particular, when the probability of dropout depends on unobserved sources of variation, e.g. unobserved heterogeneity, which also influence the longitudinal response, then the dropout process is non-ignorable. Molenberghs et al. [12] show
that for every MNAR model there is an MAR counterpart that produces exactly the same fit to the observed data. This can be more easily understood by noting that the previous score equations resemble the score equations of univariate mixture regression models, representing a potential MAR solution. The ML estimates can be obtained, conditional on w_{igl}^{(r)}, in subsequent maximization steps. To speed up the EM algorithm, and to ensure identifiability of a two-level latent structure with only one observation level, we may proceed by discretizing w_{igl}^{(r)} using a MAP rule, as in the CEM algorithm (condition choice = "C" in the algorithm below), or by drawing the component indicator z_{igl}^{(r)} from a multinomial distribution with the posterior probabilities, as in SEM algorithms (condition else below), see [11]. In this case, the last score equation resembles the one for a polytomous latent class model. The resulting EM algorithm is sketched below.
begin
    initialize w_{igl}^{(0)}, Ψ^{(0)}, ε > 0
    repeat
        update w_{igl}^{(t)}                                              // Expectation step
        if (choice = "C") then
            z_{igl}^{(t)} = 1  ⟺  w_{igl}^{(t)} = max_{r,v} w_{irv}^{(t)}  // Classification step
        else
            draw z_{igl}^{(t)} with probabilities given by w_{igl}^{(t)}   // Stochastic step
        estimate β_1^{(t)}, β_2^{(t)}, u_1, u_2 given z_{igl}^{(t)}        // Maximization step
        estimate π_{g|h}^{(t)}, π_{l|h}^{(t)}, τ_h^{(t)}                   // Maximization step
    until Q(·)^{(t)} − Q(·)^{(t−1)} < ε
end

Algorithm 1: Pseudo-code of the proposed SEM-CEM algorithm
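The classification and stochastic steps of Algorithm 1 can be sketched as follows. The posterior weights w_igl here are random placeholders standing in for the output of an actual E-step:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K1, K2 = 5, 3, 2

# w[i, g, l]: posterior weight of unit i for component pair (g, l);
# random placeholders for what the E-step would produce.
w = rng.dirichlet(np.ones(K1 * K2), size=n).reshape(n, K1, K2)

def classification_step(w):
    """CEM: set z_igl = 1 for the (g, l) pair maximizing w_igl (MAP rule)."""
    z = np.zeros_like(w)
    flat = w.reshape(len(w), -1).argmax(axis=1)
    z.reshape(len(w), -1)[np.arange(len(w)), flat] = 1.0
    return z

def stochastic_step(w, rng):
    """SEM: draw z_igl from a multinomial with probabilities w_igl."""
    z = np.zeros_like(w)
    for i in range(len(w)):
        k = rng.choice(w[i].size, p=w[i].ravel())
        z.reshape(len(w), -1)[i, k] = 1.0
    return z

z_c = classification_step(w)
z_s = stochastic_step(w, rng)
# Either way, each unit is allocated to exactly one (g, l) pair
assert (z_c.sum(axis=(1, 2)) == 1).all() and (z_s.sum(axis=(1, 2)) == 1).all()
```

Both steps return hard component indicators, which is what makes the subsequent maximization steps of Algorithm 1 simple weighted-by-membership fits.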
4 Conclusions
We have defined a random coefficient based dropout model where the association between the longitudinal and the dropout processes is modeled through discrete outcome-specific latent effects. A bidimensional representation for the random effect distribution is used, with possibly different numbers of locations in each margin and a full association structure connecting each location in one margin to each location in the other. The proposed approach may also be used, for example, in clinical contexts where only a few repeated measurements are available per subject. The main advantage of a more flexible representation for the random effect distribution is that the general MNAR model properly nests a model where the dropout mechanism is non-informative. This opens the way to a sensitivity analysis of changes in model parameter estimates as the number of upper level components, M, moves away from 1.
References
1. Aitkin, M., Alfò, M.: Variance component models for longitudinal count data with baseline information: epilepsy data revisited. Stat. Comput. 3, 291–303 (2003)
2. Alfò, M., Maruotti, A.: A selection model for longitudinal binary responses subject to non-ignorable attrition. Statist. Med. 28, 2435–2450 (2009)
3. Alfò, M., Rocchetti, I.: A flexible approach to finite mixture regression models for multivariate mixed responses. Submitted (2012)
4. Carlin, B.P., Louis, T.A.: Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, New York (2000)
5. Creemers, A., Hens, N., Aerts, M., Molenberghs, G., Verbeke, G., Kenward, M.: Generalized shared-parameter models and missingness at random. Statist. Model. 11, 279–310 (2011)
6. Dunson, D., Xing, C.: Nonparametric Bayes modeling of multivariate categorical data. J. Am. Statist. Assoc. 104, 1042–1051 (2009)
7. Follmann, D., Wu, M.: An approximate generalized linear model with random effects for informative missing data. Biometrics 51, 151–168 (1995)
8. Gao, S.: A shared random effect parameter approach for longitudinal dementia data with non-ignorable missing data. Statist. Med. 23, 211–219 (2004)
9. Lagona, F.: Model-based classification of clustered binary data with non-ignorable missing values. In: Proceedings of the Italian Statistical Society. CLEUP, Padova (2010)
10. Little, R.J.A.: Modeling the drop-out mechanism in repeated-measures studies. J. Am. Statist. Assoc. 90, 1112–1121 (1995)
11. McCulloch, C.: Maximum likelihood algorithms for generalized linear mixed models. J. Am. Statist. Assoc. 92, 162–170 (1997)
12. Molenberghs, G., Beunckens, C., Sotto, C., Kenward, M.G.: Every missingness not at random model has a missingness at random counterpart with equal fit. J. Roy. Statist. Soc. Ser. B 70, 371–388 (2008)
13. Rizopoulos, D., Verbeke, G., Lesaffre, E., Vanrenterghem, Y.: A two-part joint model for the analysis of survival and longitudinal binary data with excess zeros. Biometrics 64, 611–619 (2008)
14. Scharfstein, D., Rotnitzky, A., Robins, J.: Adjusting for nonignorable drop-out using semiparametric nonresponse models. J. Am. Statist. Assoc. 94, 1096–1120 (1999)
15. Song, X., Davidian, M., Tsiatis, A.: A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics 58, 742–753 (2002)
16. Troxel, A.B., Ma, G., Heitjan, D.F.: An index of local sensitivity to nonignorability. Statist. Sinica 14, 1221–1237 (2004)
17. Tsiatis, A., Davidian, M.: An overview of joint modeling of longitudinal and time-to-event data. Statist. Sinica 14, 793–818 (2004)
18. Tsonaka, R., Verbeke, G., Lesaffre, E.: A semi-parametric shared parameter model to handle nonmonotone nonignorable missingness. Biometrics 65, 81–87 (2009)
19. Verzilli, C.J., Carpenter, J.R.: A Monte Carlo EM algorithm for random-coefficient-based dropout models. J. Appl. Statist. 29, 1011–1021 (2002)
20. Wang, Y., Taylor, J.: Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J. Am. Statist. Assoc. 96, 895–905 (2001)
21. Wu, M., Carroll, R.: Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175–188 (1988)
22. Wu, M., Bailey, K.: Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics 45, 939–955 (1989)
Bayesian inference for causal effects in randomized experiments with noncompliance: The role of multivariate outcomes
Fan Li, Alessandra Mattei and Fabrizia Mealli
Abstract Principal stratification (PS) is a principled framework for addressing noncompliance issues. Due to the latent nature of principal strata, model-based PS analysis usually involves weakly identified models, and identification of causal effects relies on untestable structural assumptions, such as exclusion restrictions. This article develops a Bayesian approach that exploits multivariate outcomes to sharpen inferences for weakly identified models within PS. Simulation studies are performed to illustrate the potential gains in identifiability from jointly modelling more than one outcome. The method is applied to evaluate the causal effect of a job search program on depression.
Key words: Bayesian statistics, Causal inference, Principal stratification, Mixture models, Multivariate outcome, Noncompliance
1 Introduction
Many randomized experiments suffer from noncompliance, which breaks randomization, implying that the assignment to the treatment, rather than the treatment itself, is randomly administered to individuals. In the presence of noncompliance, the treatment actually received is a post-treatment intermediate variable, which is potentially affected by the assignment and may also affect the response. A standard intention-to-treat analysis gives a valid inference of the effect of assignment on outcome,
Fan Li, Department of Statistical Science, Duke University, e-mail: [email protected]
Alessandra Mattei, Dipartimento di Statistica “G. Parenti”, Università di Firenze, e-mail: [email protected]
Fabrizia Mealli, Dipartimento di Statistica “G. Parenti”, Università di Firenze, e-mail: [email protected]
but usually the goal is to study the effect of receiving the treatment rather than the assignment.

A principled framework for noncompliance is principal stratification (PS) (Frangakis and Rubin, 2002), a generalization of the instrumental variable approach to noncompliance by Angrist et al. (1996) and Imbens and Rubin (1997). While PS is applicable to a wide range of situations involving intermediate variables, such as truncation by death and mediation, this paper focuses on the special case of noncompliance. A principal stratification with respect to the intermediate variable “receipt of the treatment” is a cross-classification of units into latent classes defined by the joint potential compliance statuses under both treatment and control. Principal causal effects (PCEs), that is, comparisons of potential outcomes under different treatment levels within compliance principal strata, are in general the causal estimands of primary interest in a PS analysis.
Since at most one potential outcome is observed for any unit, compliance principal strata are generally latent, and the key issue in PS analysis is the identifiability of PCEs. There are two streams of work in the existing literature regarding this: (1) deriving nonparametric bounds for the causal effects under minimal structural assumptions (e.g., Manski, 1990); (2) specifying additional structural or modelling assumptions, such as exclusion restrictions, to identify PCEs, and conducting sensitivity analyses to check the consequences of violations of such assumptions (e.g., Schwartz et al., 2012).
Using auxiliary information from covariates to identify causal effects has also been discussed (e.g., Jo, 2002). However, the importance of exploiting multiple outcomes is less acknowledged. In fact, information on multiple outcomes is routinely collected in randomized experiments and observational studies, but it is rarely used in the analysis unless the goal is to study the relationships between outcomes. Exceptions include Jo and Muthen (2001), Mattei et al. (2012) and Mealli and Pacini (2012). In this article we further investigate the role of multivariate outcomes in sharpening inferences for weakly identified models within PS, proceeding from a parametric perspective, particularly under the Bayesian paradigm.
The article is organized as follows. Section 2 introduces the PS framework and Section 3 proposes a Bayesian approach that exploits multivariate outcomes to sharpen inferences for weakly identified models within PS. In Section 4, we perform simulation studies to examine the benefit of using multivariate outcomes under various scenarios. In Section 5, we re-analyze the Job Search Intervention Study (JOBS II) using the proposed bivariate approach. Section 6 concludes with a discussion.
2 The principal stratification approach to noncompliance
Discussion of causal inference in this article is carried out under the potential outcome framework, also known as the Rubin Causal Model (RCM) (Rubin, 1978). Consider a large population of units, each of which can potentially be assigned a treatment indicated by z, with z = 1 for treatment and z = 0 for control. A random sample of n units from this population comprises the participants in a study designed to evaluate the effect of the treatment on all or a subset of M outcomes Y = (Y_1, ..., Y_M)′.

Assuming the standard Stable Unit Treatment Value Assumption (SUTVA) (Rubin, 1980), for each outcome Y_m we can define two potential outcomes for each unit, Y_im(0) and Y_im(1), corresponding to the two possible treatment levels. For each unit i, let Y_i(z) = (Y_i1(z), ..., Y_iM(z))′ be the vector of the potential outcomes given assignment z.
In the presence of noncompliance, the actual taking of the treatment is beyond the control of the researcher; therefore, there are also two potential treatment receipt indicators for each unit: D_i(0) and D_i(1). Let S_i = (D_i(0), D_i(1)) be the joint potential treatment receipt. Applying the idea of principal stratification, units can be classified into four principal strata according to their compliance behaviour, defined by S_i: compliers (S_i = (0,1) = c); never-takers (S_i = (0,0) = n); always-takers (S_i = (1,1) = a); and defiers (S_i = (1,0) = d). By definition, the principal stratum membership S_i is not affected by treatment assignment. Therefore, comparisons of summaries of Y(1) and Y(0) within a principal stratum, the so-called principal causal effects (PCEs), have a causal interpretation because they compare quantities defined on a common set of units. The causal estimands of interest in this article are the population principal average causal effects for the first outcome:
\tau_s = E(Y_{i1}(1) - Y_{i1}(0) \mid S_i = s), \qquad (1)
for s = c, a, n, where τ_c is the well-known complier average causal effect (CACE).

Since D_i(0) and D_i(1) are never jointly observed, the principal stratum S_i is latent. Specifically, for each unit i and for each post-treatment variable, only one potential outcome is observed. Let Z_i, for i = 1, ..., n, be the binary variable indicating whether unit i is assigned to the treatment (Z_i = 1) or to the control (Z_i = 0). Then the observed potential outcomes are D_i^obs = D_i(Z_i) and Y_i^obs = Y_i(Z_i); the other potential outcomes, D_i^mis = D_i(1 − Z_i) and Y_i^mis = Y_i(1 − Z_i), are missing. Henceforth, bold symbols such as Z, D^obs and Y^obs denote column vectors/matrices of the corresponding unit-level variables. Without loss of generality, we concentrate on the case of two outcomes (M = 2). Since we focus on randomized experiments, the following assumption holds by design:
Assumption 1. Randomization of treatment assignment
Pr(Zi|Di(0),Di(1),Yi(0),Yi(1)) = Pr(Zi).
3 Multivariate Bayesian principal stratification analysis
Following Imbens and Rubin (1997), we model the distribution of the compliance type, π_s = Pr(S_i = s), s = a, c, d, n, and the conditional distribution of the potential outcomes given the compliance type, f_{sz}^i = Pr(Y_i^obs | S_i = s, Z_i = z; θ_{s,z}), z = 0, 1. Let θ = (π_a, π_c, π_d, π_n, {θ_{s,z}}_{s=a,c,d,n; z=0,1}) be the parameter vector and let p(θ) denote its prior distribution. Then the posterior distribution of θ can be written as
\Pr(\theta \mid Z, D^{obs}, Y^{obs}) \propto p(\theta) \prod_{i:\, Z_i=1,\, D_i^{obs}=1} \left[ \pi_c f^i_{c1} + \pi_a f^i_{a1} \right] \prod_{i:\, Z_i=1,\, D_i^{obs}=0} \left[ \pi_n f^i_{n1} + \pi_d f^i_{d1} \right] \prod_{i:\, Z_i=0,\, D_i^{obs}=1} \left[ \pi_a f^i_{a0} + \pi_d f^i_{d0} \right] \prod_{i:\, Z_i=0,\, D_i^{obs}=0} \left[ \pi_n f^i_{n0} + \pi_c f^i_{c0} \right]
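Each of the four products groups units by their observed (Z_i, D_i^obs) cell, and each unit contributes a two-component mixture over the strata compatible with that cell. A minimal sketch, assuming univariate Gaussian outcome densities and made-up parameter values (not the paper's):

```python
import numpy as np

def npdf(y, mu, sd=1.0):
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Strata compatible with each observed (Z, D_obs) cell:
# (1,1): c or a;  (1,0): n or d;  (0,1): a or d;  (0,0): n or c.
compatible = {(1, 1): ("c", "a"), (1, 0): ("n", "d"),
              (0, 1): ("a", "d"), (0, 0): ("n", "c")}

# Hypothetical strata probabilities and stratum/arm-specific outcome means
pi = {"c": 0.5, "n": 0.3, "a": 0.15, "d": 0.05}
mu = {("c", 1): 1.0, ("a", 1): 0.5, ("n", 1): 0.0, ("d", 1): 0.2,
      ("c", 0): 0.0, ("a", 0): 0.4, ("n", 0): 0.1, ("d", 0): 0.3}

def loglik_unit(y, z, d):
    """Observed-data contribution: a mixture over the two compatible strata."""
    return np.log(sum(pi[s] * npdf(y, mu[(s, z)]) for s in compatible[(z, d)]))

ll = loglik_unit(0.8, 1, 1)   # pi_c * f_c1 + pi_a * f_a1 for a (Z=1, D=1) unit
assert -1.38 < ll < -1.37
```

The mixture structure of each contribution is exactly what makes the model only weakly identified: different splits of a cell across its two compatible strata can fit the observed data almost equally well.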
Without additional assumptions, inference on the PCEs τ_s, though possible and relatively straightforward from a Bayesian perspective, can be very imprecise, even in large samples, because the models are only weakly identified. Jointly modelling multiple outcomes may help to reduce uncertainty about the treatment effects on the primary outcomes. Specifically, although additional outcomes play no extra role in the compliance model, they can improve the prediction of principal stratum membership through the outcome model. In addition, some key substantive identifying assumptions, such as the exclusion restriction (ER), may be more plausible for secondary outcomes than for the primary one. This condition is referred to as “partial exclusion restriction (PER)” in Mealli and Pacini (2012):
Assumption 2. Stochastic Partial Exclusion Restriction
Pr(Y_i2(0) | S_i = s) = Pr(Y_i2(1) | S_i = s) for s = a, n.
Restrictions on secondary outcomes reduce the parameter space of the joint distribution of all outcomes and, in turn, of the marginal distribution of the primary one.

In our setting a strong monotonicity assumption holds by design, D_i(1) ≥ D_i(0) and D_i(0) = 0 for all i, implying that π_d = 0 and π_a = 0, so that the population is composed only of compliers and never-takers. Therefore a simple Bernoulli model is used for compliance principal stratum membership: Pr(S_i = c) = π_c, π_c ∈ (0, 1).
The outcome variables we focus on consist of either two continuous variables or a continuous variable and a binary indicator. For two continuous outcomes, conditional on the principal stratum, we assume a bivariate normal distribution:

Y_i(z) | S_i = s ~ N_2(μ_{s,z}, Σ_{s,z}),  where  μ_{s,z} = (μ_1^{s,z}, μ_2^{s,z})′  and  Σ_{s,z} = [ σ_11^{s,z}  σ_12^{s,z} ; σ_12^{s,z}  σ_22^{s,z} ],  s = c, n; z = 0, 1.

In the model for a continuous outcome Y_1 and a binary outcome Y_2, we replace Y_i2 in the previous normal model by a latent variable Y*_i2 and assume in addition that Y_i2(z) = I(Y*_i2(z) > 0), with σ_22^{s,z} = 1. This is equivalent to assuming a generalized linear model with probit link for Y_2: Pr(Y_i2(z) = 1 | S_i = s) = Φ(μ_2^{s,z}). The full set of parameters is θ = (π_c, {μ_{s,z}, Σ_{s,z}}). We assume that the parameters are a priori independent. A conjugate Beta prior distribution is used for the compliance principal strata: π_c ~ Beta(α_0, β_0). Conjugate prior distributions are also used for the parameters of the bivariate continuous outcome model: Σ_{s,z} ~ Inv-Wishart_{ν_0}((Λ_0^{s,z})^{-1}) and μ_{s,z} | Σ_{s,z} ~ N_2(μ_0^{s,z}, Σ_{s,z}/k_0^{s,z}). For continuous-binary outcomes, we use semi-conjugate diffuse normal prior distributions for the mean parameters, μ_{s,z} ~ N_2(μ_0^{s,z}, Σ_0^{s,z}). For the covariance matrices Σ_{s,z} there is no conjugate prior, due to the constraint σ_22 = 1. As in Chib and Hamilton (2000), we assume a flexible truncated bivariate normal prior for the covariance parameters σ_{s,z} = (σ_11^{s,z}, σ_12^{s,z}): σ_{s,z} ~ N_2(σ_0^{s,z}, V_0^{s,z}) I_A(σ_{s,z}), where σ_0^{s,z} and V_0^{s,z} are hyperparameters, A = {σ_{s,z} ∈ ℜ² : σ_11^{s,z} > (σ_12^{s,z})²} is the region where Σ_{s,z} is a positive definite matrix, and I_A is the indicator function taking the value one if σ_{s,z} is in A and zero otherwise.

Table 1  True values of the parameters for simulation scenarios I-III.

Scenario I:
  μ_{c,0} = (2.5, 8)′,  μ_{c,1} = (0.5, 6.5)′,  μ_{n,0} = (2.75, 12)′,  μ_{n,1} = (4.25, 13)′
  Σ_{c,0} = [0.09 0.24; 0.24 1],  Σ_{c,1} = [0.01 0.08; 0.08 1],  Σ_{n,0} = [0.16 0.16; 0.16 4],  Σ_{n,1} = [0.04 0.08; 0.08 4]
Scenario II:
  μ_{c,0} = (2.5, 8)′,  μ_{c,1} = (0.5, 6.5)′,  μ_{n,0} = (2.75, 12)′,  μ_{n,1} = (4.25, 24)′
  Σ_{c,0} = [0.09 0.24; 0.24 1],  Σ_{c,1} = [0.01 0.08; 0.08 1],  Σ_{n,0} = [0.16 0.16; 0.16 4],  Σ_{n,1} = [0.04 0.12; 0.12 9]
Scenario III:
  μ_{c,0} = (2.5, 8)′,  μ_{c,1} = (0.5, 6.5)′,  μ_{n,0} = (2.75, 24)′,  μ_{n,1} = (4.25, 36)′
  Σ_{c,0} = [0.09 0.24; 0.24 1],  Σ_{c,1} = [0.01 0.08; 0.08 1],  Σ_{n,0} = [0.16 0.96; 0.96 9],  Σ_{n,1} = [0.04 0.8; 0.8 25]
The joint posterior distribution, Pr(θ, D^mis | Y^obs, D^obs, Z), is obtained from a Markov chain algorithm that uses the Data Augmentation method (Tanner and Wong, 1987) to impute the missing compliance indicators at each step and to exploit the complete-compliance-data posterior distributions to update the parameter distribution.
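One scan of the Data Augmentation scheme can be sketched for the simplest case of a single continuous outcome with known stratum means (all generating values below are illustrative assumptions): units with Z_i = 1 reveal their stratum through D_i^obs, the stratum of control units is imputed from its posterior probability, and π_c is then drawn from its Beta full conditional.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated observed data under strong monotonicity: S in {complier, never-taker},
# with D_obs = 1 only if Z = 1 and S = complier. Made-up generating values.
n, pic_true = 200, 0.7
S = rng.random(n) < pic_true                 # True -> complier
Z = rng.random(n) < 0.5
D = Z & S
Y = rng.normal(np.where(S, 0.0, 1.0), 1.0)   # outcome mean depends on stratum

def npdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi)

def da_step(pic, rng):
    """One Data Augmentation scan: impute latent strata, then draw pi_c."""
    S_imp = np.where(Z, D, False)            # Z=1 units: stratum observed via D
    amb = ~Z                                 # Z=0 units: stratum latent
    # Posterior probability of being a complier for the ambiguous units
    pc = pic * npdf(Y[amb], 0.0)
    pn = (1 - pic) * npdf(Y[amb], 1.0)
    S_imp[amb] = rng.random(amb.sum()) < pc / (pc + pn)
    # Beta(1, 1) prior => Beta full conditional for pi_c
    return rng.beta(1 + S_imp.sum(), 1 + n - S_imp.sum())

pic = 0.5
for _ in range(200):
    pic = da_step(pic, rng)
assert 0.0 < pic < 1.0
```

The full algorithm of the paper additionally updates the outcome parameters from their (semi-)conjugate full conditionals; here the outcome means are fixed purely to keep the sketch short.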
4 Simulations
To assess the improvement in the estimation of the PCEs obtained by exploiting multivariate outcomes, we conduct simulation studies comparing the posterior inferences obtained by jointly modelling two outcomes with those based on only one outcome. Consistently with the JOBS II data, the primary outcome is taken to be the depression score, measured on a 5-point rating scale (1 = not at all distressed to 5 = extremely distressed). To simplify the computation, we focus on two continuous outcomes, using alcohol use (in percent) as the auxiliary outcome in the simulation study.
Here we present simulation results under three different scenarios, accounting for different correlation structures between the outcomes for compliers and never-takers and various deviations from the ER for the secondary outcome. The true simulation parameters are shown in Table 1, and all simulated data sets have N = 600 sample units, generated using principal strata probabilities of 0.7 for compliers and 0.3 for never-takers. The simulated samples are randomly divided into two groups, half assigned to the treatment and half to the control.
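The data-generating step just described can be sketched as follows, using the scenario I parameter values as reconstructed from Table 1:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 600
# Scenario I parameters (as reconstructed from Table 1)
mu = {("c", 0): [2.5, 8.0], ("c", 1): [0.5, 6.5],
      ("n", 0): [2.75, 12.0], ("n", 1): [4.25, 13.0]}
Sigma = {("c", 0): [[0.09, 0.24], [0.24, 1.0]],
         ("c", 1): [[0.01, 0.08], [0.08, 1.0]],
         ("n", 0): [[0.16, 0.16], [0.16, 4.0]],
         ("n", 1): [[0.04, 0.08], [0.08, 4.0]]}

strata = rng.choice(["c", "n"], size=N, p=[0.7, 0.3])   # compliers / never-takers
Z = rng.permutation(np.repeat([0, 1], N // 2))           # half treated, half control
D = (Z == 1) & (strata == "c")                           # strong monotonicity
Y = np.array([rng.multivariate_normal(mu[(s, z)], Sigma[(s, z)])
              for s, z in zip(strata, Z)])

assert Y.shape == (N, 2) and Z.sum() == N // 2
```

In the analysis only (Z, D, Y) would be passed to the sampler; the `strata` labels are retained here only to check recovery of the PCEs.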
Figure 1 shows the histograms and 95% posterior intervals of the PCEs for compliers and never-takers on the primary outcome, in both the univariate and bivariate cases. The results clearly demonstrate that simultaneous modelling of both outcomes significantly reduces posterior uncertainty about the causal estimates, providing considerably more precise estimates of the PCEs for compliers and never-takers.

Fig. 1  Simulation results: histograms and 95% posterior intervals of the PCEs for compliers and never-takers (scenarios I-III; true value, univariate approach and bivariate approach marked).
In addition, the histograms in the upper and lower panels of Figure 1 suggest that the posterior distributions of the PCEs are much more informative in the bivariate case. Specifically, the histograms for scenario (I) show that the posterior distributions of the PCEs for compliers and never-takers are somewhat flat under the univariate approach, but become much tighter in the bivariate case. The improvement is even more dramatic in scenarios (II) and (III), where the histograms show that the posterior distributions of the PCEs for compliers and never-takers are bimodal in the univariate case, but both become unimodal in the bivariate case. Biases and MSEs (based on the posterior mean) were also calculated (not shown here) and suggest that jointly modelling the two outcomes reduces the average biases by more than 64% and the MSEs by more than 79% in these scenarios. Several other scenarios with additional structural assumptions were also examined: the magnitude of the improvement varies, but the pattern is consistent with what is described here.
5 Application to the JOBS II study
The Job Search Intervention Study (JOBS II) (Vinokur et al., 1995) is a randomized field experiment intended to prevent poor mental health and to promote high-quality reemployment among unemployed workers. The intervention consisted of five half-day job-search skills seminar sessions. The control condition consisted of a mailed booklet briefly describing job-search methods and tips. Our analysis focuses on a sample of 398 subjects who were at high risk of depression.

Table 2  Posterior distributions of the PCEs on depression for compliers and never-takers.

                 Bivariate approach                 Univariate approach
                 Without PER      With PER          Without ER        With ER
                 τ_c     τ_n      τ_c     τ_n       τ_c     τ_n       τ_c
Mean            −0.135  −0.192   −0.211  −0.110    −0.207  −0.097    −0.269
SD               0.157   0.176    0.196   0.229     0.178   0.207     0.170
2.5%            −0.486  −0.526   −0.620  −0.587    −0.573  −0.532    −0.621
50%             −0.122  −0.197   −0.200  −0.100    −0.201  −0.086    −0.262
97.5%            0.143   0.179    0.144   0.306     0.123   0.281     0.045
Width PCI_0.95   0.629   0.706    0.764   0.893     0.696   0.812     0.666

Note: PER in the bivariate model is for reemployment, whereas ER in the univariate model is for depression.
Since the treatment condition is available only to the individuals assigned to the intervention in JOBS II, there are no defiers or always-takers. Noncompliance arises in JOBS II because a substantial proportion (45%) of the individuals invited to participate in the job-search seminar did not show up to the intervention. Our focus is on estimating the causal effects of the intervention on a depression score measured six months after the intervention, relaxing the ER and using reemployment status as the secondary outcome. The ER on depression may be controversial because, for example, never-takers randomized to the intervention might feel more demoralized by their inability to take advantage of the opportunity.
Table 2 reports summaries of the posterior distributions of the PCEs for compliers and never-takers on depression in the bivariate (columns 1 through 4) and univariate (columns 5 through 7) cases. Although the benefits of the bivariate approach are not pronounced, jointly modelling the two outcomes improves inference: the bivariate approach (without PER) provides more precise estimates of the PCEs for compliers and never-takers, and tighter 95% posterior credible intervals. It is also worth noting that the bivariate approach leads to posterior distributions of τ_c and τ_n centered at different means and medians. In the light of the simulation results, which show that jointly modelling two outcomes generally reduces the average biases, these findings lend more credibility to the bivariate estimates, suggesting that the univariate estimates may be affected by larger biases.
6 Discussion
We develop a Bayesian parametric bivariate model that exploits multiple outcomes of different types to improve the estimation of weakly identified causal estimands.
Although we focus on randomized experiments with noncompliance, our approach is immediately applicable to causal inference problems with other confounded post-treatment variables, and also in observational studies, where the exclusion restriction assumptions for the instrument are often questionable.
Our approach has several benefits. First, the Bayesian approach provides a refined map of identifiability, clarifying what can be learned when causal estimands are intrinsically not fully identified, but only weakly identified. Second, in a Bayesian setting, the effect of relaxing or maintaining assumptions (whether structural or modelling assumptions) can be directly checked by examining how the posterior distributions of the causal estimands change, so the approach serves as a natural framework for sensitivity analysis. Third, the use of multiple outcomes improves model identifiability, leading to smaller posterior variances of the parameters. However, the additional information provided by secondary outcomes is obtained at the cost of having to specify more complex multivariate models, which may increase the possibility of misspecification. Therefore, model checking procedures to ensure sensible model specifications are a valuable topic for future research.
References
1. Angrist, J.D., Imbens, G. W., Rubin, D.B.: Identification of causal effects using instrumentalvariables. Journal of the American Statistical Association, 91, 444–455 (1996)
2. Chib, S., Hamilton, B.H.: Bayesian analysis of cross-section and clustered data treatment models. Journal of Econometrics, 97, 25–50 (2000)
3. Frangakis, C.E., Rubin, D.B.: Principal stratification in causal inference. Biometrics, 58, 21–29 (2002)
4. Imbens, G. W., Rubin, D. B.: Bayesian inference for causal effects in randomized experimentswith noncompliance. Annals of Statistics, 25, 305–327 (1997).
5. Jo, B.: Estimation of intervention effects with noncompliance: Alternative model specifica-tions. Journal of Educational and Behavioral Statistics. 27, 385–420 (2002).
6. Jo, B., Muthen, B.: Modeling of intervention effects with noncompliance: a latent variable ap-proach for randomized trials. In G. Marcoulides, R. Schumacker (Eds.), New developmentsand techniques in structrual equation modeling, 57–87. Lawrence Erlbaum Associates, Pub-lishers. Mahwah, New Jersey (2001).
7. Manski, C.: Nonparametric bounds on treatment effects. The American Economic Review,80, 319–323 (1990).
8. Mattei A., Mealli F., Pacini B.: Exploiting Multivariate Outcomes in Bayesian Inferencefor Causal Effects with Noncompliance. In Studies in Theoretical and Applied Statistics(SIS2010 Scientific Meeting). Forthcoming (2012).
9. Mealli, F., Pacini, B.: Using secondary outcomes and covariates to sharpen inference in ran-domized experiments with noncompliance. Technical report, Department of Statistics, Uni-versity of Florence (2012).
10. Rubin, D.B: Comment on ‘Randomization analysis of experimental data: The Fisher random-ization test’ by D. Basu. Journal of the American Statistical Association, 75, 591–593 (1980).
11. Schwartz, S., Li, F., Reiter J.: Sensitivity analysis for unmeasured con- founding in principalstratification. Statistics in Medicine. In press (2012).
12. Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation (withdiscussion). Journal of the American Statistical Association, 82, 528–550 (1987).
13. Vinokur, A., Price, R., Schul, Y.: Impact of the jobs intervention on unemployed workers vary-ing in risk for depression. Journal of American Community Psychology, 23, 39–74 (1995).
Unconditional and Conditional Quantile Treatment Effect: Identification Strategies and Interpretations
Margherita Fort
Abstract This paper reviews strategies that allow one to identify the effects of policy interventions on the unconditional or conditional distribution of the outcome of interest. This distinction is irrelevant when one focuses on average treatment effects, since identifying assumptions typically do not affect the parameter's interpretation. Conversely, finding the appropriate answer to a research question on the effects over the distribution requires particular attention in the choice of the identification strategy. Indeed, quantiles of the conditional and unconditional distribution of a random variable carry a different meaning, even if identification of both these sets of parameters may require conditioning on observed covariates.
Key words: impact heterogeneity, quantile treatment effects, rank invariance
1 Introduction
In recent years there has been a growing interest in the evaluation literature in models that allow essential heterogeneity in the treatment parameters and, more generally, in models that are informative on the impact distribution. The recent increase in the attention devoted to the identification and estimation of quantile treatment effects (QTEs) is due to their intrinsic ability to characterize the heterogeneous impact of the treatment on various points of the outcome distribution. QTEs are informative about the impact distribution when the potential outcomes observed under various levels of the treatment are comonotonic random variables. The variable describing the relative position of an individual in the outcome distribution thus plays a special role in this setting, representing at the same time the main dimension along which treatment effects are allowed to vary as well as a key ingredient to relate potential outcomes. Several identification approaches currently used in the literature for
Department of Economics, University of Bologna, IZA and CHILD; Piazza Scaravilli 2, Bologna, e-mail: [email protected]
the assessment of mean effects have thus been extended to quantiles. Most of these strategies require conditioning on a set of variables to achieve identification. While conditioning on a set of observed regressors does not affect the interpretation of the parameters in a mean regression, this is not the case for quantiles. The law of iterated expectations guarantees that the parameters of a mean regression have both a conditional and an unconditional mean interpretation. This does not carry over to quantiles, where conditioning on covariates affects the interpretation of the residual disturbance term. Indeed, since quantile regression allows one to characterize the heterogeneity of the treatment response only along this latter dimension, conditioning on covariates in quantile regression generally affects the interpretation of the results.

This paper reviews strategies aimed at identifying quantile treatment effects, covering strategies that deal with the identification of conditional and unconditional quantile treatment effects, with particular attention to cross-sectional data applications in which the treatment is endogenous without conditioning on additional covariates. The aim of the paper is to provide useful guidance for users of quantile regression methods in choosing the most appropriate approach while addressing a specific research question.

The remainder of the paper is organized as follows. After introducing the basic notation and the key parameters of interest in Section 2, Section 3 reviews solutions to the identification of quantile treatment effects. The review covers strategies that are appropriate only when the outcome of interest is a continuous variable, i.e. in cases where the quantiles of the outcome distribution are unambiguously defined. It concludes by illustrating some of the methods through two examples aimed at assessing the distributional impacts of training on earnings and of education on wages. Section 4 concludes.
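The asymmetry between means and quantiles noted above can be checked numerically. The sketch below is illustrative only (the grouping variable and distributions are invented for the example): averaging conditional means recovers the marginal mean, while the analogous combination of conditional medians does not recover the marginal median.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A binary covariate X and an outcome Y whose distribution depends on X.
x = rng.integers(0, 2, size=n)
y = np.where(x == 1, rng.normal(2.0, 1.0, n), rng.normal(0.0, 3.0, n))

p1 = (x == 1).mean()

# Law of iterated expectations: E[Y] = E[E[Y|X]] holds exactly.
iterated_mean = p1 * y[x == 1].mean() + (1 - p1) * y[x == 0].mean()
assert np.isclose(iterated_mean, y.mean())

# The same recombination of conditional medians misses the marginal median:
# the conditional medians are about 2.0 and 0.0, so their mixture-weighted
# combination is about 1.0, while the marginal median of the pooled data is
# about 1.5 for these two distributions.
combined_medians = p1 * np.median(y[x == 1]) + (1 - p1) * np.median(y[x == 0])
print(combined_medians, np.median(y))  # clearly different
```

This is the sense in which conditional-quantile parameters have no automatic unconditional-quantile interpretation.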
2 What Are We After: Notation and Parameters of Interest
In this section I first introduce the notation used throughout the paper and then define the objects whose identification is sought.
Y denotes the observed outcome, D the intensity of the treatment received and W a set of observable individual characteristics. W may include exogenous variables X and instruments Z.1 Y is restricted to be continuous, while D and W can be either continuous or discrete random variables. Both Y and D can be decomposed into two components, one deterministic and one stochastic; these components need not be additively separable. The stochastic components account for differences in the distribution of D and Y across otherwise identical individuals. The econometric models reviewed in Section 3 place restrictions on: i) the scale of D; ii) the number of independent sources of stochastic variation in the model; iii) the distribution (joint, marginal, conditional) of these stochastic components and D or W ≡ (X, Z); iv) the scale of Z. Y^d_i denotes the potential outcome for individual i if the value of the treatment is d: it represents the outcome that would be observed
1 Capital letters denote random variables and lower case letters denote realizations.
had individual i been exposed to level d of the treatment. F_{Y^d}(·), f_{Y^d}(·) and F^{-1}_{Y^d}(·) = q(d, ·) denote the corresponding cumulative distribution function, density function and quantile function. The conditional distribution and conditional quantile are denoted by F_{Y^d}(·|x) and F^{-1}_{Y^d}(·|x) = q(d, x, ·).

We are interested in characterizing the dependence structure between Y and D
eventually conditioning on a set of covariates W, in the presence of essential heterogeneity and in the absence of general equilibrium effects. Knowledge of the joint distribution (Y^d)_{d∈D} or the conditional joint distribution (Y^d|x)_{d∈D} would allow one to characterize a distribution for the outcome for any possible level of the treatment. When potential outcomes are comonotonic, they can be described as different functions of the same (single) random variable, and quantile treatment effects (QTEs) are informative on the impact distribution. The potential outcome can be written as y^d = q(d, u), u ~ U(0,1), where q(d, u) is increasing in u and is referred to in the literature as the structural quantile function. If the potential outcomes are not comonotonic, QTEs are informative on the distance between the potential outcome distributions, which may be interesting per se, but not on the impact distribution. We thus concentrate on strategies that focus on QTEs.2 In the binary case, QTEs (see equation (2)) are defined as the horizontal distance between the distribution function in the presence and in the absence of the treatment ([9]; [15]).3
δ(τ) = F^{-1}_{Y^1}(τ) − F^{-1}_{Y^0}(τ),   0 < τ < 1    (2)
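With two samples of potential outcomes in hand, equation (2) can be evaluated empirically as a difference of sample quantiles. The sketch below uses simulated data (the two distributions are invented for illustration) in which the treatment both shifts and stretches the outcome distribution, so δ(τ) grows with τ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical potential outcomes: treatment adds a location shift and
# amplifies dispersion, so gains are larger at higher quantiles.
y0 = rng.normal(10.0, 2.0, n)              # Y^0 ~ N(10, 2^2)
y1 = 11.0 + 1.5 * rng.normal(0.0, 2.0, n)  # Y^1 ~ N(11, 3^2)

def qte(y1, y0, taus):
    """delta(tau) = F^{-1}_{Y^1}(tau) - F^{-1}_{Y^0}(tau):
    the horizontal distance between the two distribution functions."""
    taus = np.asarray(taus)
    return np.quantile(y1, taus) - np.quantile(y0, taus)

effects = qte(y1, y0, [0.25, 0.50, 0.75])
print(effects)  # increasing in tau for this design
```

Note that this only has a causal reading when treatment is randomized (or the comparison is otherwise valid); the later sections deal with the endogenous case.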
We can distinguish conditional and unconditional quantile treatment effects by characterizing the uniformly distributed random variable that describes the quantile of the outcome variable. This distinction becomes clearer if we think about a specific empirical example.

Motivating Example: Returns to Education or Training. There is a large literature that studies the returns to education. Key questions in this literature (e.g. does additional education cause a wage increase? does additional schooling increase wages more for the more able than for the less able? does additional schooling increase or decrease wage inequality?) can be addressed using quantile regression methods. In these applications, the treatment is likely endogenous in the outcome equation without conditioning on additional covariates: typically researchers seek instruments that allow them to isolate exogenous variation in education in the wage equation. Suppose we could measure the individual ability a_i that drives the endogeneity of education in the wage equation. Now, consider the alternative specifications for the wage model presented in equations (3) and (4), where D denotes schooling (the treatment).
2 The review will not cover strategies that focus on other objects and may deliver QTEs as a byproduct, such as [8], for instance.
3 In the continuous case δ(τ) represents the change in Y induced by a change in D from d to d + ε when ε is small:

δ(τ) = ∂Q_Y(τ|d)/∂d,   0 < τ < 1    (1)
4 Margherita Fort
Y_i = α_0(f(ε_i, a_i)) + α_1(f(ε_i, a_i))D    (3)

Y_i = β_0(ε_i)a_i + β_1(ε_i)D    (4)
These specifications differ because they impose different structures on the variables governing the heterogeneity in the returns to education. In equation (3) the relative position of an individual in the wage distribution is determined by (ε_i, a_i), i.e. by both an unobserved uniformly distributed error component ε_i and the observed individual ability level, while in equation (4) the relative position of the individual is determined only by ε_i. In both cases, we can think about the relative position of an individual in the wage distribution as his/her proneness ([9]) to earn a high wage for a given level of schooling D. However, in model (3) we would refer to the total proneness/ability, while in model (4) we would be speaking only about unobserved proneness/ability.4 Using model (3) we can explore whether the returns to education vary depending on the individuals' total ability levels, while using model (4) we can study how the returns to education vary for given observed ability levels. Individuals who earn high wages conditional on some specific level of ability may not be the same individuals who earn high wages in the sample. However, conditioning on observed ability may be important to be able to isolate the causal effect of schooling D on the distribution of wages Y. Equations (5) and (6) represent the structural quantile functions corresponding to models (3) and (4) respectively5: equation (5) is an example of an unconditional quantile regression model, while equation (6) is an example of a conditional quantile regression model. This distinction might be empirically relevant since, in general, for a given τ ∈ (0,1), α_1(τ) ≠ β_1(τ).
f(ε, a) ≡ ε*, ε* ~ U(0,1):   Q_Y(τ|d) = α_0(τ) + α_1(τ)d    (5)

ε ~ U(0,1):   Q_Y(τ|d) = β_0(τ)a_i + β_1(τ)d    (6)
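The gap between α_1(τ) and β_1(τ) is easy to see in a simulation. The data-generating process below is invented for illustration: ability a is binary and observed, schooling d is randomly assigned, and the return to schooling is larger for high-ability individuals. The median effect conditional on each ability group then differs from the effect on the median of the unconditional (pooled) wage distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

a = (rng.random(n) < 0.3).astype(int)     # observed ability (binary, 30% high)
d = rng.integers(0, 2, n)                 # schooling, randomly assigned
# Return to schooling: 1.0 if a = 0, 2.5 if a = 1.
y = 1.5 * a + (1.0 + 1.5 * a) * d + rng.normal(0, 1, n)

def median_effect(y, d, mask=None):
    """Treated-minus-untreated difference of medians, optionally within a
    subgroup defined by mask."""
    if mask is None:
        mask = np.ones(y.size, dtype=bool)
    return np.median(y[mask & (d == 1)]) - np.median(y[mask & (d == 0)])

cond_low  = median_effect(y, d, a == 0)   # conditional effect, low ability
cond_high = median_effect(y, d, a == 1)   # conditional effect, high ability
uncond    = median_effect(y, d)           # effect on the pooled (unconditional) median
print(cond_low, cond_high, uncond)
```

The unconditional estimate is not the population-weighted average of the two conditional estimates at the same τ: it answers a different question, namely how the middle of the overall wage distribution moves.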
Table 1 Moment conditions under assumptions in [2] and [11]

Quantile   conditional                                          unconditional
Y1         E[1(Y < q(1,x)) − τ · w_{y,d,x} · D | X] = 0          E[1(Y < q(1)) − τ · w_{y,d} · D] = 0
Y0         E[1(Y < q(0,x)) − τ · w_{y,d,x} · (1−D) | X] = 0      E[1(Y < q(0)) − τ · w_{y,d} · (1−D)] = 0
weight     w_{y,d,x} = 1 − D[1 − P(Z=1|Y,D,X)]/(1 − P(Z=1|X)) − (1−D)·P(Z=1|Y,D,X)/P(Z=1|X)
           w_{y,d} = E[(Z − P(Z=1|X))/(P(Z=1|X)[1 − P(Z=1|X)]) | Y, D]·(2D − 1)

Note: Positive weights are reported. See [2] and [11] for other definitions of weights.
3 Identification Strategies and Estimation
In cross-sectional applications, two main identification approaches have been extended to QTEs: strategies based on the unconfoundedness assumption and strategies based on the availability of an instrumental variable. In the first case, the researcher must be willing to assume that the joint distribution of the potential outcomes is independent of the treatment conditional on a set of exogenous covariates. Under this assumption, conditional QTEs can be estimated as originally proposed by [14] and unconditional QTEs can be estimated as proposed by [10]. [2] and [6], [7] propose identifying assumptions for conditional quantiles when an instrumental variable is available. The assumptions of [2] guarantee identification of conditional and unconditional QTEs when the treatment is binary and endogenous and a binary instrument is available. They lead to the moment conditions described in Table 1: in both cases, identification relies on previous results ([1], [13]) that guarantee that, in the subpopulation of compliers, comparisons by treatment D, conditional on X, have a causal interpretation. Recall that compliers are individuals whose treatment status is affected by the instrument Z, but that this sub-population cannot be identified directly from the data, because it is defined by means of potential outcomes. The moment conditions highlight that it is possible to construct weights that 'find compliers in the population in an average sense' ([1]). The weights differ depending on whether one is interested in the conditional or in the unconditional quantiles. Only the weights considered in the second case 'simultaneously balance the distribution of the covariates between treated and non-treated compliers' ([12]). In both cases weights are functions of P(Z = 1|X) and observed variables. Estimation thus proceeds in two steps: 1) weights are estimated; 2) weighted quantile regressions are run.6 Estimation also requires two steps under the identification strategy proposed by [6], [7] and [17], [18], but does not involve re-weighting. The crucial assumption for identification in the approach by [6] is rank invariance or rank similarity, i.e. we require that the individual's rank in the potential outcome distribution, conditional on exogenous covariates, is not systematically affected by the treatment.

4 To the best of my knowledge, [17] is the first to distinguish between total and observed proneness.
5 Under comonotonicity of potential outcomes, the structural quantile function describes the link between potential outcomes.
The assumptions in [6] lead to the moment condition in equation (7). Equation (7) suggests an estimation procedure that first computes the conditional quantiles of the random variable Y − q(d, x, τ) given X and Z, and then chooses as estimate of q(d, x, τ) the value that minimizes the absolute value of the coefficient associated with Z in the first step.7
Pr[Y − q(d, x, τ) ≤ 0 | X, Z] = τ.    (7)
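Equation (7) suggests a particularly simple estimator in the special case of a binary treatment, a binary instrument and no covariates: the conditional τ-quantile given Z is then just a within-group sample quantile, so one can scan candidate values α of the treatment effect and keep the one for which the τ-quantile of Y − αD is the same in both instrument groups. The sketch below is a toy implementation under these simplifying assumptions, with an invented data-generating process in which the true effect is 2 at every quantile.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

z = rng.integers(0, 2, n)                      # binary instrument
e = rng.normal(0, 1, n)                        # outcome disturbance
v = rng.normal(0, 1, n)
d = (0.8 * z + 0.5 * e + v > 0.6).astype(int)  # endogenous treatment (depends on e)
y = 2.0 * d + e                                # true effect: 2 at every quantile

def iv_quantile(y, d, z, tau, grid):
    """Grid version of inverse quantile regression: choose the candidate
    effect that equalizes the tau-quantile of Y - a*D across Z groups."""
    best, best_gap = None, np.inf
    for a in grid:
        r = y - a * d
        gap = abs(np.quantile(r[z == 1], tau) - np.quantile(r[z == 0], tau))
        if gap < best_gap:
            best, best_gap = a, gap
    return best

est = iv_quantile(y, d, z, 0.5, np.linspace(0.0, 4.0, 81))
# The naive treated/untreated quantile contrast is biased upward here,
# because D selects on the disturbance e.
naive = np.quantile(y[d == 1], 0.5) - np.quantile(y[d == 0], 0.5)
print(est, naive)
```

With covariates, the first step becomes an actual quantile regression on X and Z, but the logic of inverting on the Z coefficient is the same.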
The instrumental variable approach for the identification of unconditional QTEs proposed by [17] delivers the moment conditions in equation (8):
E[Z · 1(Y ≤ q(d, τ)) − τ_X] = 0,  τ_X ≡ P[Y ≤ q(d, τ)|X];   E[1(Y ≤ q(d, τ)) − τ] = 0.    (8)
These moment conditions reflect the idea that, first, the instrument Z does not affect the distribution of the disturbance once X is controlled for and, second, the joint distribution of X and the disturbance is unrestricted. Estimation involves first an estimation of the quantiles of Y − q(d, τ) given X and Z and of τ_X; a second step then chooses as estimate of q(d, τ) the value that minimizes the coefficient of Z, averaging over all possible values of X.

6 When identification is achieved relying on unconfoundedness, the moment conditions are similar but the weights are identically 1 for conditional quantiles ([14]) and are D/P(D = 1|X) + (1 − D)/(1 − P(D = 1|X)) for unconditional quantiles ([10]).
7 This approach can be used when the treatment and instrument are binary, discrete as well as continuous.
We now apply these strategies to two illustrative examples taken from the literature. Table 2 reports estimates of the effect of training (or education) on the conditional and unconditional distribution of earnings (or log wages), using data on males from [2] and data from [5], respectively.8 Columns (1) and (2) report the results delivered when training or education is treated as exogenous in the estimation of conditional and unconditional quantiles, respectively. Columns (3) and (4) report estimates that address the endogeneity of training or education in the outcome equation relying on [2]. These estimates apply to the sub-population of compliers. Columns (5)-(8) report estimates based on [6] or [17]. These approaches guarantee global identification of conditional and unconditional QTEs. We discuss the top-panel estimates first: in the example from [2] the treatment assignment is randomized, thus covariates are not needed for identification. Indeed, under both the identification approaches considered, training effects on the conditional and unconditional quantiles do not exhibit substantial differences in magnitude, and all suggest that the effect of training is larger at the top of the earnings distribution.9 In addition, both identification strategies deliver similar results, suggesting that key assumptions are unlikely to be violated in either case. Let us now turn to the estimates in the bottom part of the table. In this example, covariates are needed for identification: we need to control for country-specific secular trends in education and for differences across countries in the levels of education and wages to be able to isolate the exogenous variation in education induced by school reforms.
In this example, addressing endogeneity seems to have relevant consequences: the estimates in columns (1) and (2) suggest that returns are increasing over the wage distribution, while the estimates in column (3) suggest the opposite (although the precision of these estimates is low) and in column (4) we find no evidence of heterogeneity.10 Estimates of conditional QTEs under rank invariance are reported in column (5); the estimates of unconditional QTEs in column (6) assume rank invariance and do not use covariates for identification. The estimates in column (5) are unrealistic and suggest that rank invariance is unlikely to hold. The estimates in column (6) are negative and confirm that controlling for covariates is necessary for identification.

8 In the second example, only reforms that increased compulsory schooling by 3 years are considered (i.e. only Greece, Italy and Finland) and the original treatment (years of education) and instrument (years of compulsory schooling) were recoded to binary. Estimates in columns (1), (2), (3), (4) have been computed by the author using the STATA package ivqte by [12], except column (3) for the first example (taken from the article). Estimates in column (1) replicate the original results in the papers, except that standard errors are now robust to heteroskedasticity; estimates in columns (5)-(8) are taken from [18] for the AAI02 example and obtained using the STATA package ivqreg by Do Wan Kwack, available from Christian Hansen's research page.
9 When the endogeneity of training is addressed, point estimates of the returns to training are generally lower in the unconditional distribution than the returns observed holding race, age, education and marital status fixed.
10 In this example, we look at the effect of three additional years of schooling on wages. Assuming linearity and dividing the reported point estimates by three, the results in columns (1)-(3) are fairly consistent with the literature: the association is lower than the causal effects; the causal estimates suggest a return between 10% and 4% for each additional year of education.
Table 2 Effect of Training on the Conditional and Unconditional Distribution of Earnings ([2], males only) and Effect of Education on the Conditional and Unconditional Distribution of Log Wages ([5], males, Italy, Greece and Finland, treatment and instrument recoded to binary)

         Exogenous Training          Endogenous Training
Strategy                             Monotonicity                Rank Invariance
         Conditional  Unconditional  Conditional  Unconditional  Conditional  Unconditional
q        (1) KB78     (2) F07        (3) AAI02    (4) FM10       (5) CH08     (6) CH08 w/o controls  (7) P11 logit  (8) P11 probit

Effect of Training on Earnings, Abadie et al. (2002), Obs. 5102
0.25     2510         3058           702          414            530          200          100          100
         (417)***     (377)***       (670)        (754)          (629)        (746)        (753)        (750)
0.50     4420         4678           1544         1291           310          1320         790          790
         (613)***     (771)***       (1074)       (1239)         (1101)       (1234)       (1151)       (1161)
0.75     4678         4626           3131         2457           2660         1710         1490         1490
         (901)***     (1056)***      (1376)**     (1650)         (1845)       (1712)       (1542)       (1530)
0.85     4806         5532           3378         3971           3190         3580         3410         3410
         (1045)***    (1241)***      (1811)*      (1886)**       (1185)**     (1427)**     (1542)*      (1550)*

Effect of Education on Log Wages, Brunello et al. (2009), Obs. 2292
0.30     0.168        0.223          0.303        0.514          0.836        -0.198       -            -
         (0.024)***   (0.064)***     (0.142)**    (16.48)        (0.063)***   (0.033)***
0.50     0.177        0.208          0.328        0.521          0.985        -5.119       -            -
         (0.024)***   (0.062)***     (0.126)***   (5.95)         (0.063)***   (0.124)***
0.75     0.213        0.297          0.154        0.599          1.868        0.996        -            -
         (0.026)***   (0.072)***     (0.168)***   (10.13)        (0.998)**    (0.037)***

Legend: Column labels refer to the estimation method. KB78: as in [14]; F07: as in [10]; AAI02: as in [2]; FM10: as in [11], [12]; CH08: as in [7]; P11: as in [18].
4 Conclusions
In this paper, I reviewed approaches that guarantee the identification of quantile treatment effects (QTEs). In many cases, these approaches correspond to extensions of strategies conventionally used in linear regression models (selection on observables, instrumental variables, fixed effects) to quantile regressions. An important consequence of the difference between the statistical tools applied in these two settings is that the interpretation of treatment parameters differs between conditional and unconditional quantile regressions, while, conversely, the law of iterated expectations guarantees that the treatment parameter in a linear regression has both a conditional and an unconditional mean interpretation. It is crucial to bear this in mind while using QTEs to answer a specific research question. Consider the recent proposal of [3] to link educator compensation to the ranks of their students within what the authors call appropriately defined comparison sets. The authors suggest employing the methods in [4] to contrast the actual ranks of the students of a given teacher with some predicted counterfactual rank. Betebenner ([4]), however, employs conditional quantile regression methods aimed specifically at answering questions like 'Are there students with unusually low growth who need special attention?', i.e. a value-added specification of achievement. Barlevy and Neal ([3]) instead look for a method that allows one to isolate the teacher's contribution to a student's rank in the achievement distribution in a given period, eventually conditioning on covariates for identification. In other words, Barlevy and Neal would like to avoid attributing to a teacher changes in the performance of a student that are due only to his initial proficiency level. Standard value-added specifications for students' achievement in a quantile regression context are not the appropriate instrument to address questions about the heterogeneity in students' achievement depending on their initial ability level. Those quantile regressions instead describe how students experiencing the largest gains in performance over a given time period perform relative to students experiencing the lowest gains in the same period. Cross-sectionally, some of the high-gain students may be in the lower part of the test score distribution.11
Acknowledgements This paper benefited from comments by E. Rettore, B. Pacini and F. Mealli. Financial support from the MIUR-FIRB 2008 project RBFR089QQC-003-J31J10000060001 grant is gratefully acknowledged.
References
1. Abadie, A.: Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics 113, 231-263 (2003)
2. Abadie, A., Angrist, J., Imbens, G.: Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70, 91-117 (2002)
3. Barlevy, G., Neal, D.: Pay for Percentile. NBER Working Paper 17194 (2010)
4. Betebenner, D.W.: Norm and Criterion-Referenced Student Growth. Educational Measurement: Issues and Practice 28(4), 42-51 (2009)
5. Brunello, G., Fort, M., Weber, G.: Changes in Compulsory Schooling, Education and the Distribution of Wages in Europe. Economic Journal 119(536), 516-539 (2009)
6. Chernozhukov, V., Hansen, C.: An IV model of quantile treatment effects. Econometrica 73, 245-261 (2005)
7. Chernozhukov, V., Hansen, C.: Instrumental variable quantile regression: A Robust Inference Approach. Journal of Econometrics 142(1), 379-398 (2008)
8. Chesher, A.: Identification in nonseparable models. Econometrica 71, 1405-1441 (2003)
9. Doksum, K.: Empirical probability plots and statistical inference for nonlinear models in the two-sample case. The Annals of Statistics 2, 267-277 (1974)
10. Firpo, S.: Efficient semiparametric estimation of quantile treatment effects. Econometrica 75, 259-276 (2007)
11. Froelich, M., Melly, B.: Unconditional Quantile Treatment Effects Under Endogeneity. IZA DP 3288 (2010a)
12. Froelich, M., Melly, B.: Estimation of Quantile Treatment Effects with STATA. The Stata Journal 10(3), 423-457 (2010b)
13. Imbens, G., Rubin, D.: Estimating the Outcome Distribution for Compliers in Instrumental Variables Models. Review of Economic Studies 64, 555-574 (1997)
14. Koenker, R., Bassett, G.S.: Regression Quantiles. Econometrica 46, 33-50 (1978)
15. Lehmann, E.H.: Nonparametrics: Statistical Models Based on Ranks. San Francisco, CA (1974)
16. Powell, D.: Unconditional Quantile Regression for Panel Data with Exogenous or Endogenous Treatment Variables. RAND Working Paper No. WR-710 (2010a)
17. Powell, D.: Unconditional Quantile Treatment Effects in the Presence of Covariates. RAND Working Paper No. WR-816 (2010b)
18. Powell, D.: Unconditional Quantile Regression for Exogenous or Endogenous Treatment Variables. RAND Working Paper No. WR-824 (2011)
11 A similar point was made by [16] in his discussion of the analysis of the effect of vouchers on student achievement.
Dealing with complex problems of confounding in mediation analysis
Stijn Vansteelandt
Abstract Mediation analysis is frequently utilized in diverse scientific fields, such as psychology, sociology and epidemiology, to develop insight into the causal mechanism whereby an exposure affects an outcome. It concerns the study of indirect effects of that exposure that are mediated through a given intermediate variable or mediator, and/or the study of the remaining direct effect. Despite its popularity, the traditional approach to mediation analysis proceeds in a predominantly heuristic fashion, which can largely be ascribed to the lack of precise definitions of direct and indirect effect in the traditional mediation analysis literature. Moreover, problems of confounding bias have been largely ignored.

James Robins, Sander Greenland and Judea Pearl laid the foundations for a rigorous approach to mediation analysis, which is based on counterfactuals. They gave precise definitions of direct and indirect effect and elucidated the kind of data that must be collected in order to control for confounding bias. In addition, they provided generic ways to decompose a total effect into a direct and an indirect effect that are not tied to a specific statistical model. In this presentation, after a brief review of some of these developments, I will concentrate on the (partly unsolved) methodological challenges that arise when confounders of the mediator-outcome association are affected by the exposure. In particular, I will present results on the identification of (natural) direct and indirect effects in such settings, and on the estimation of (controlled) direct effects, thereby focussing on matched case-control studies and/or survival analysis.
Key words: causal inference, direct effect, G-estimation, indirect effect, intermediate confounding, mediation, time-varying confounding
Stijn Vansteelandt, Ghent University, Department of Applied Mathematics and Computer Science, Krijgslaan 281, S9, 9000 Gent, Belgium, e-mail: [email protected]
1 Introduction
For many decades, scientists from diverse scientific fields (most notably psychology, sociology and epidemiology) have been occupied with questions as to whether an exposure affects an outcome through pathways other than those involving a given mediator or intermediate variable. The answer to such questions is of interest because it brings insight into the mechanisms that explain the effect of the exposure on the outcome [12]. Mediation analyses are used for this purpose. They attempt to separate so-called 'indirect effects' from 'direct effects'. The former term is typically used in a loose sense to designate that part of an exposure effect which arises indirectly by affecting a (given) set of intermediate variables; the latter then refers to the remaining exposure effect.
In traditional mediation analysis, the direct effect is commonly identified with the residual association between outcome and exposure after adjusting for the mediator(s); the indirect effect is then obtained through a combination of the exposure's effect on the mediator and the mediator's effect on the outcome [1, 5]. For instance, when the associations between exposure A, mediator M and outcome Y can be modeled through linear regressions as
E(Y|A, M) = β_0 + β_a A + β_m M
E(M|A) = α_0 + α_a A,
then β_a is commonly interpreted as a direct effect and β_m α_a as an indirect effect [1]. It is well known from the causal inference literature that these interpretations are often not justified as a result of confounding of the mediator-outcome association [9, 3]. Even when confounders L of this association have been measured, standard regression adjustment is not applicable when, as often happens, some of these confounders are themselves affected by the exposure, in which case we say that there is intermediate or time-varying confounding [4, 10, 13]. Furthermore, decomposition of a total effect into a direct and an indirect effect becomes subtle when certain nonlinear associations exist between mediator and outcome [9, 7], e.g. when a logistic regression model for a dichotomous outcome is adopted [11].
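Under the linear model above, and only under the strong assumption of no unmeasured mediator-outcome confounding, the traditional quantities can be computed with two least-squares fits. The sketch below simulates data satisfying those assumptions (all coefficients are invented for the example) and recovers β_a as the direct effect and β_m·α_a as the indirect effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Simulated data consistent with the linear mediation model and with NO
# unmeasured confounding -- exactly the assumption criticized in the text.
a = rng.normal(0, 1, n)                       # exposure A
m = 0.4 * a + rng.normal(0, 1, n)             # mediator M:  alpha_a = 0.4
y = 0.3 * a + 0.5 * m + rng.normal(0, 1, n)   # outcome Y:   beta_a = 0.3, beta_m = 0.5

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y)] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

_, beta_a, beta_m = ols(y, a, m)   # outcome regression E(Y|A,M)
_, alpha_a = ols(m, a)             # mediator regression E(M|A)

direct = beta_a                    # ~ 0.3
indirect = beta_m * alpha_a        # ~ 0.5 * 0.4 = 0.2
total = ols(y, a)[1]               # equals direct + indirect in linear models
print(direct, indirect, total)
```

If a confounder of the M-Y association were added to the simulation without being included in the regressions, β_a would no longer equal the direct effect, which is the point developed next.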
Robins and Greenland [9] and Pearl [7] introduced model-free definitions of direct and indirect effect. Unlike the foregoing development due to Baron and Kenny [1], their formalism of so-called natural direct and indirect effects can therefore accommodate nonlinear associations between mediator and outcome. Natural direct and indirect effects are defined in terms of so-called composite or nested counterfactuals such as Y(a, M(0)), which denotes the counterfactual outcome that would have been observed if the exposure A were set to a and the mediator M to the value M(0) that it would have taken at some reference exposure level 0. Because such composite counterfactuals are unobservable when a ≠ 0, strong assumptions must be imposed for identification. The development of Robins and Greenland [9] precludes the existence of moderation, i.e. exposure effect modification by the mediator on the additive scale; it precludes such moderation even at the unit level. The development of Pearl [7] precludes the possibility of intermediate confounding of the
mediator-outcome association. This places severe restrictions on the range of realistic applications that can be addressed. In fact, the prior absence of methodology to deal with intermediate confounding has been one of the difficulties with the causal inference literature on mediation.
This presentation will primarily focus on the problem of intermediate confounding in mediation analysis. First, I will consider the problem of estimating so-called controlled direct effects in the presence of exposure-induced confounding of the association between mediator and outcome. I will thereby focus on diverse settings such as survival analysis and the analysis of matched case-control studies. Next, I will propose novel results on the identification of natural direct and indirect effects in the presence of intermediate confounding.
Fig. 1 Causal diagram with exposure A, mediator M, outcome Y, intermediate confounder L, and U an unmeasured confounder of the L-Y relationship.
2 The problem of intermediate confounding in mediation analysis
The causal diagram of Figure 1 displays a setting with intermediate confounding. It visualizes prognostic factors L of the mediator (other than the exposure) that may also be associated with the outcome, and which thereby confound the association between mediator and outcome. This situation is representative of most empirical studies, including randomized experiments, because the fact that the exposure is randomly assigned does not prevent confounding of the mediator-outcome association. In the presence of such confounding, the residual association between outcome and exposure after adjusting for the mediator(s) (cf. βa in the above model) does not encode a direct exposure effect. This is technically seen because adjustment for a collider M (i.e. a node in which two edges converge) along the path A → M ← L ← U → Y may render exposure A and outcome Y dependent along that path, and may thus induce a non-causal association [8, 3]. One of the major contributions of the causal inference literature has been to point this out and to make clear
that specialized estimation techniques are often needed to adjust for such confounders, as these may themselves be affected by the exposure (as illustrated in Figure 1). Indeed, additional regression adjustment for the confounder L once again amounts to adjustment for a collider L along the path A → L ← U → Y. It thereby renders A and Y dependent along that path, even in the absence of a direct effect.
3 Estimation of controlled direct effects in the presence of intermediate confounding
Let Y(a, m) denote the counterfactual outcome that would have been observed for a given subject if the exposure were set to a and the mediator to m. Then a controlled direct effect [9, 7] refers to a contrast between two counterfactual outcomes for the same subject, corresponding to different exposure levels, but the same fixed mediator level. For instance, the controlled direct effect of exposure level a versus reference exposure level 0, controlling for M at level m, can then be defined as the expected contrast
E{Y(a, m) − Y(0, m)}.
Likewise, the conditional controlled direct effect, given covariates C, of exposure level a versus reference exposure level 0, controlling for M at level m, can then be defined as the expected contrast
E{Y(a, m) − Y(0, m) | C}.
Robins [8] showed that, under specific identification assumptions that we shall describe next, controlled direct effects can be identified in the presence of intermediate confounding. Specifically, provided that data have been recorded on all confounders of the exposure-outcome relationship, as well as all confounders of the mediator-outcome relationship, the conditional controlled direct effect can be identified using the so-called G-formula:
E{Y(a, m) − Y(0, m) | C} = ∫ E(Y | A = a, M = m, L) f(L | A = a, C) dL − ∫ E(Y | A = 0, M = m, L) f(L | A = 0, C) dL.
It thus follows that parametric models for the outcome and intermediate confounders can be combined to yield an expression for the controlled direct effect. However, the G-formula does not readily translate into a practical estimation approach. It requires parametric models for the intermediate confounders, which can be problematic when the confounder is high-dimensional. Moreover, it can be computationally cumbersome as a result of the possibly high-dimensional integration which it involves. Finally, even simple models for the outcome and intermediate confounder may combine into intractable expressions for the controlled direct effect, which depend on the exposure level a and covariate C in a highly contrived way. This not only makes results impractical for reporting, but also makes interesting hypotheses difficult to test [8].
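Before turning to alternatives, the plug-in G-formula itself can be sketched in a few lines for the simple case of a single binary intermediate confounder L and empty C, on simulated data following the structure of Figure 1. The data-generating process and all coefficients below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

# Invented data with the structure of Figure 1: A -> L -> M -> Y,
# and an unmeasured U confounding the L-Y relationship.
U = rng.normal(size=n)
A = rng.binomial(1, 0.5, n)
L = rng.binomial(1, expit(A + 2 * U))
M = rng.binomial(1, expit(A + L))
Y = 1.0 + 0.5 * A + 0.7 * M + L + 2 * U + rng.normal(size=n)

def g_formula_mean(a, m):
    """Plug-in G-formula for E{Y(a, m)} with a single binary L and empty C:
    average E(Y | A=a, M=m, L=l) over the distribution of L given A=a."""
    return sum(Y[(A == a) & (M == m) & (L == l)].mean() * (L[A == a] == l).mean()
               for l in (0, 1))

cde = g_formula_mean(1, 1) - g_formula_mean(0, 1)

# Naive contrast: conditions on the collider M without any adjustment.
naive = Y[(A == 1) & (M == 1)].mean() - Y[(A == 0) & (M == 1)].mean()

# Monte-Carlo ground truth in this design: E{Y(a,m)} = 1 + 0.5a + 0.7m + E{L(a)}.
true_cde = 0.5 + (rng.binomial(1, expit(1 + 2 * U)).mean()
                  - rng.binomial(1, expit(2 * U)).mean())
print(cde, naive, true_cde)
```

In this toy design the G-formula contrast recovers the controlled direct effect, while the naive exposure contrast among subjects with M = 1 is visibly biased by collider stratification.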
Various approaches have been developed to address these problems, some of which we will review in this presentation.
One class of approaches involves weighting each subject's data by the reciprocal of the likelihood of the observed mediator, given exposure and confounders, and then regressing the outcome on exposure and mediator [8, 10]. Since the weighting corrects for confounding bias, the weighted regression analysis of the outcome can ignore confounders and therefore does not suffer from the aforementioned problem of collider-stratification that was observed in Figure 1. However, a limitation of inverse probability weighting approaches is that they can perform poorly when some individuals get assigned large weights.
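A minimal sketch of this weighting idea, again on invented simulated data with the structure of Figure 1 (binary A, L, M), with the mediator probabilities estimated nonparametrically from the A × L cells rather than from a parametric model:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

# Invented data with the structure of Figure 1 (binary A, L, M).
U = rng.normal(size=n)
A = rng.binomial(1, 0.5, n)
L = rng.binomial(1, expit(A + 2 * U))
M = rng.binomial(1, expit(A + L))
Y = 1.0 + 0.5 * A + 0.7 * M + L + 2 * U + rng.normal(size=n)

# Weights 1 / P(M = m_i | A_i, L_i), estimated from the four A x L cells.
p_m1 = np.zeros(n)
for a in (0, 1):
    for l in (0, 1):
        cell = (A == a) & (L == l)
        p_m1[cell] = M[cell].mean()
w = 1.0 / np.where(M == 1, p_m1, 1.0 - p_m1)

# Weighted regression of Y on exposure and mediator only: the weights
# handle the confounder, so L is deliberately left out of the regression.
X = np.column_stack([np.ones(n), A, M]).astype(float)
coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * Y))
cde_ipw = coef[1]

# Monte-Carlo ground truth, as in this design: 0.5 + E{L(1)} - E{L(0)}.
true_cde = 0.5 + (rng.binomial(1, expit(1 + 2 * U)).mean()
                  - rng.binomial(1, expit(2 * U)).mean())
print(cde_ipw, true_cde)
```

Here the largest weights stay moderate, but in sparser data a few subjects with a rarely observed mediator value can dominate the weighted fit, which is the instability the text warns about.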
An alternative class of approaches avoids inverse probability weighting by using G-estimation strategies instead. These involve transforming the outcome in a way that removes the mediator's effect on the outcome and thereby the indirect effect. Next, the resulting transformed outcome is regressed on the exposure to obtain a measure of direct effect. This idea has been considered for additive and multiplicative models [8, 4, 13], for logistic regression models [14], for survival models [6], and for unmatched [13, 14] and matched [2] retrospective studies; see Vansteelandt [15] for a detailed review.
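For an additive model, the two-step transform-then-regress idea can be sketched as follows, once more on invented simulated data with the structure of Figure 1; the saturated adjustment over the binary A × L cells is my own simplification for this toy setting:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

# Invented data with the structure of Figure 1 (binary A, L, M).
U = rng.normal(size=n)
A = rng.binomial(1, 0.5, n)
L = rng.binomial(1, expit(A + 2 * U))
M = rng.binomial(1, expit(A + L))
Y = 1.0 + 0.5 * A + 0.7 * M + L + 2 * U + rng.normal(size=n)

# Step 1: estimate the mediator's effect on the outcome, adjusting for
# exposure and confounder (saturated in the four binary A x L cells).
X1 = np.column_stack([A * L, A * (1 - L), (1 - A) * L,
                      (1 - A) * (1 - L), M]).astype(float)
beta_m = np.linalg.lstsq(X1, Y, rcond=None)[0][-1]

# Step 2: transform the outcome to strip out the mediator's effect, ...
Y_tilde = Y - beta_m * M

# Step 3: ... then regress the transformed outcome on the exposure alone.
X2 = np.column_stack([np.ones(n), A])
cde_gest = np.linalg.lstsq(X2, Y_tilde, rcond=None)[0][1]

# Monte-Carlo ground truth in this design: 0.5 + E{L(1)} - E{L(0)}.
true_cde = 0.5 + (rng.binomial(1, expit(1 + 2 * U)).mean()
                  - rng.binomial(1, expit(2 * U)).mean())
print(beta_m, cde_gest, true_cde)
```

Note that the final regression ignores L entirely: the confounding has already been handled in step 1, so the collider problem of adjusting for L in the exposure-outcome regression never arises.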
4 Identification results for natural direct and indirect effects in the presence of intermediate confounding
These developments on controlled direct effects have a number of limitations. First, the concept of controlling the mediator at level m uniformly in the population is often rather restrictive, as it is often difficult to conceptualize a single level of the mediator that is realistic for all units in the population. Second, the difference between the total effect and a controlled direct effect cannot generally be interpreted as an indirect effect [9]. To overcome these limitations, alternative definitions have been proposed of so-called natural direct and indirect effects [9, 7]. These are more natural in that they allow for variation between subjects in the level at which the mediator is controlled and, moreover, combine to the total effect regardless of the underlying data distribution. However, natural direct effects require stronger identification conditions than controlled direct effects. In particular, it remains unclear to date how natural direct and indirect effects can be identified in the presence of intermediate confounding, except in the unrealistic case where the exposure and mediator do not interact (at the unit level) in the effect that they produce on the outcome.
Vansteelandt and VanderWeele [16] overcome this limitation by basing their development on the following definitions of natural direct and indirect effects in the exposed:
E{Y − Y(0, M) | A = a} and E{Y(0, M) − Y(0, M(0)) | A = a},
respectively. The first expresses, within each exposure stratum, how much the outcome would change on average if the exposure were set to the reference level 0, but the mediator were held fixed at its observed level. The second evaluates how much the outcome would change on average if the exposure's effect acted only through modifying the mediator. These definitions enable decomposition of the total effect (in the exposed) into a direct and indirect effect (in the exposed), as follows:
E{Y − Y(0) | A = a} = E{Y − Y(0, M(0)) | A = a} = E{Y − Y(0, M) | A = a} + E{Y(0, M) − Y(0, M(0)) | A = a}.
Vansteelandt and VanderWeele [16] show that natural direct and indirect effects on the exposed allow for effect decomposition under weaker identification conditions than population natural direct and indirect effects. When no confounders of the mediator-outcome association are affected by the exposure, identification is possible under essentially the same conditions as for controlled direct effects. Otherwise, identification is still possible with additional knowledge on a non-identifiable selection-bias function which measures the dependence of the mediator effect on the observed exposure within confounder levels, and which evaluates to zero in a large class of realistic data-generating mechanisms.
Vansteelandt and VanderWeele [16] furthermore argue that natural direct and indirect effects on the exposed are of intrinsic interest in various applications. They moreover show that these natural direct and indirect effects on the exposed coincide with the corresponding population natural direct and indirect effects when the exposure is randomly assigned. In such settings, their results are thus also of relevance for assessing population natural direct and indirect effects in the presence of exposure-induced mediator-outcome confounding, which existing methodology has not been able to address.
Acknowledgements The author acknowledges support from IAP research network grant nr. P06/03 from the Belgian government (Belgian Science Policy) and is grateful to Carlo Berzuini (University of Cambridge), Torben Martinussen (University of Copenhagen) and Tyler VanderWeele (Harvard University), with whom parts of this work have been developed.
References
[1] R.M. Baron and D.A. Kenny. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol., 51:1173–1182, 1986.
[2] C. Berzuini, S. Vansteelandt, L. Foco, R. Pastorino, and L. Bernardinelli. Direct genetic effects and their estimation from matched case-control data. Technical report, University of Cambridge, 2011.
[3] S.R. Cole and M.A. Hernan. Fallibility in estimating direct effects. International Journal of Epidemiology, 31:163–165, 2002.
[4] S. Goetgeluk, S. Vansteelandt, and E. Goetghebeur. Estimation of controlled direct effects. Journal of the Royal Statistical Society, Series B, 70:1049–1066, 2008.
[5] D.P. MacKinnon. An Introduction to Statistical Mediation Analysis. New York: Lawrence Erlbaum Associates, 2008.
[6] T. Martinussen, S. Vansteelandt, M. Gerster, and J.v.B. Hjelmborg. Estimation of direct effects for survival data using the Aalen additive hazards model. Journal of the Royal Statistical Society, Series B, 73(5):773–788, 2011.
[7] J. Pearl. Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 411–420, San Francisco, 2001. Morgan Kaufmann.
[8] J.M. Robins. Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In Computation, causation, and discovery, pages 349–405. AAAI Press, Menlo Park, CA, 1999.
[9] J.M. Robins and S. Greenland. Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3:143–155, 1992.
[10] T.J. VanderWeele. Marginal structural models for the estimation of direct and indirect effects. Epidemiology, 20:18–26, 2009.
[11] T.J. VanderWeele and S. Vansteelandt. Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface, 2:457–468, 2009.
[12] T.J. VanderWeele. Mediation and mechanism. European Journal of Epidemiology, 24:217–224, 2009.
[13] S. Vansteelandt. Estimating direct effects in cohort and case-control studies. Epidemiology, 20:851–860, 2009.
[14] S. Vansteelandt. Estimation of controlled direct effects on a dichotomous outcome using logistic structural direct effect models. Biometrika, 97:921–934, 2010.
[15] S. Vansteelandt. Estimation of direct and indirect effects. In C. Berzuini, P. Dawid, and L. Bernardinelli, editors, Causal Inference: Statistical Perspectives and Applications. Wiley and Sons, 2012.
[16] S. Vansteelandt and T.J. VanderWeele. Natural direct and indirect effects on the exposed: effect decomposition under weaker assumptions. Biometrics, in press, 2012.
Which family model makes couples more happy - dual earner or male breadwinner?
Anna Baranowska-Rataj and Anna Matysiak
Abstract We investigate the effects of men's and women's employment on their spouses' subjective well-being in Poland. We use panel data techniques that allow us to account for the selection of intrinsically happy individuals into male-breadwinner or dual-earner models. We find that women's employment has a positive impact on women's well-being, but reduces the happiness of their husbands. Our findings suggest that the sex-role specialisation model is rooted in the perceptions of Polish men.
1 Introduction
The research discussion on the effects of partners' involvement in the labour market on marital stability has been ongoing for several decades (e.g. Ross and Sawhill, 1975; Simpson and England, 1981; Oppenheimer, 1997; Cherlin, 2000; Jalovaara, 2003; Raz-Yurovich, 2012). At the core of this debate, yet still hardly investigated in detail, are the effects of spouses' labour force participation on both partners' personal satisfaction with life. This issue is the main point of interest in our study.
Most empirical studies show that husbands' work reduces anxiety and psychological distress among spouses (see Ross et al., 1990 for a review; Stolzenberg, 2001). There is more controversy when it comes to the effects of wives' employment on the subjective well-being of their partners. The role specialisation model suggests that women's labour market participation lowers the gains from marriage (Becker et al., 1977). Moreover, Stolzenberg (2001) argued that women are socialised to promote household members' well-being whereas men are socialised to earn income and simultaneously to ignore their own physical and mental health, which implies that women's involvement outside the household might be detrimental to their spouses' health. Furthermore, a woman's involvement in paid work might be indicative of a man's failure to fulfil his breadwinner duties and consequently may lead to psychological distress among men (Macmillan and Gartner, 1999).

1 Anna Baranowska-Rataj, Institute of Statistics and Demography, Warsaw School of Economics; email: [email protected]
Anna Matysiak, Institute of Statistics and Demography, Warsaw School of Economics; email: [email protected]
These arguments have been challenged recently because of changes in gender roles and a social shift from household production to household consumption (Cherlin, 2000; Raz-Yurovich, 2012). It has been emphasized that in modern societies similarity of economic activities and interests may improve understanding between spouses (Simpson and England, 1981) and hence improve their subjective well-being. Furthermore, it has been argued that women's earnings have the potential to increase living standards (Oppenheimer, 1997; Cherlin, 2000). The benefits of women's employment for their partner's subjective well-being may be particularly evident in countries with less traditional gender roles or lower living standards coupled with high or strongly increasing material aspirations (Rogers and DeBoer, 2001).
Empirical research on the effects of spouses' employment on satisfaction with marriage or, more generally, psychological well-being is relatively scarce. The existing studies, carried out in the US, suggest that while women's employment is usually beneficial or neutral for their own well-being, it seems to be detrimental to their husbands' (Rosenfield, 1992; Stolzenberg, 2001; Rogers and DeBoer, 2001; Schoen et al., 2006). Such effects may not be present in all country contexts, however. For instance, Lee and Ono (2008) found an opposite effect for Japan and attributed it to the strong prevalence of the role specialisation model in that country. Apart from their restricted geographical and cultural range, an important limitation of the empirical studies mentioned above is that they fail to control for the selection of intrinsically (un)happy individuals into unions with (non)working partners; these selection effects may bias the results.
In this paper we contribute to the literature on the effects of spouses' employment on their subjective well-being by extending the discussion to Poland. This country adopted a so-called dual-earner/female-double-burden model, meaning that women are perceived as the major care providers but are also expected to contribute to the household budget. Furthermore, this study takes a methodological step forward and uses panel data techniques to account for time-invariant unobserved characteristics of individuals that jointly determine marriage behaviours and happiness levels. This approach thus allows us to account for the selection of intrinsically happy individuals into male-breadwinner or dual-earner models.
2 Data and Method
In our study we used data from Social Diagnosis, a nationally representative panel survey whose subsequent waves took place in 2003, 2005, 2007, 2009, and 2011. For our analysis, we selected women who entered the survey at ages 18-35. This gave us a sample of 27,251 female observations. Our dependent variable is self-rated happiness, derived from the question: "Taking all things together, would you say you are very happy, quite happy, somewhat happy or not at all happy?", with responses coded on a four-point scale. In order to account for unobserved time-constant individual characteristics we estimated a fixed-effects ordered logit model. The fixed-effects approach removes the potential selection bias, but adapting it to models with dependent
variables measured on an ordinal scale is problematic. In order to solve this problem we employed the "Blow-up and Cluster" estimator (BUC) recently developed by Baetschmann et al. (2011). This method dichotomizes the dependent variable at all possible cutpoints and then estimates the resulting fixed-effects logit models jointly by conditional maximum likelihood. This method is less prone to the incidental parameters problem than the previously developed estimator of Ferrer-i-Carbonell and Frijters (2004), which performs the dichotomization at only one a priori specified cutpoint (i.e. the mean of the dependent variable).
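The data-expansion step at the heart of the BUC method can be sketched as follows. The toy panel, variable names and function are invented for illustration; the subsequent conditional-logit fit, which would require a dedicated routine, is omitted:

```python
# A minimal sketch of the BUC data-expansion step (hypothetical toy data):
# each observation with a K-category ordinal outcome is duplicated K-1
# times, and copy k dichotomizes the outcome at cutpoint k. The K-1 copies
# of a person are then treated as separate "groups" in a standard
# conditional (fixed-effects) logit, with person-clustered standard errors.

def buc_expand(rows, n_categories):
    """rows: list of (person_id, period, y, x) with y in 1..n_categories.
    Returns a list of (group_id, period, y_dichotomized, x)."""
    expanded = []
    for cut in range(2, n_categories + 1):          # cutpoints 2..K
        for person, period, y, x in rows:
            group = (person, cut)                   # person-copy identifier
            expanded.append((group, period, int(y >= cut), x))
    return expanded

# Toy panel: two persons, two periods, happiness on a 4-point scale.
panel = [(1, 2003, 2, 0), (1, 2005, 4, 1),
         (2, 2003, 3, 1), (2, 2005, 3, 0)]
expanded = buc_expand(panel, 4)
print(len(expanded))   # 4 observations x 3 cutpoints = 12 rows
```

Each person-copy keeps its own fixed effect in the conditional logit, which is how the method pools information from all cutpoints instead of a single a priori one.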
Additionally, as a sensitivity analysis, we also used two other estimators: the above-mentioned Ferrer-i-Carbonell and Frijters (2004) ordered-logit estimator and the correlated random-effects ordered probit model proposed by Mundlak (1978). The results from these two additional models were consistent with the main findings presented here; for the sake of brevity we do not include them, but they are available from the authors on request.
3 Results and conclusions
Our findings show that all three variables of interest, namely marital status, the employment status of the individual and the employment status of his/her partner, are important determinants of psychological well-being for both women and men. Consistent with the majority of the empirical studies conducted so far, we find that employment improves the subjective well-being of both women and men, whereas unemployment has a negative effect. Interestingly, we do not observe any significant difference between employment and inactivity.
Table 1. Results from the ordered logit model with BUC estimator

                                            Model for women        Model for men
Covariates                                  Coeff.     S.E.        Coeff.     S.E.
Labour market status (ref. employed)
  Unemployment                              -0.460***  (0.147)     -0.932***  (0.162)
  Inactivity                                -0.018     (0.130)     -0.234     (0.164)
Partnership and partner's employment (ref. non-working spouse)
  Single                                    -0.884***  (0.286)     -0.757**   (0.326)
  Working spouse                             0.121     (0.170)     -0.347**   (0.175)
  Divorced / Widowed                        -1.285***  (0.426)     -1.167**   (0.560)
LL                                          -1321.993              -1139.579
N                                           4395                   3768

Source: authors' calculations based on Social Diagnosis. * p<0.05; ** p<0.01; *** p<0.001. Control variables (not displayed) include: age, education, self-rated health and income, number of children and age of the youngest child.
We observe that life satisfaction is lowest among the divorced or widowed, followed by the single; married persons are clearly the happiest. Nevertheless, there are interesting gender differences among the married with respect to the effects of the partner's employment status. It turns out that a husband's involvement in the labour market does not affect the subjective well-being of his wife. Among men, however, we
observe a clearly detrimental effect of wives' employment on husbands' psychological well-being. Thus men in Poland appear satisfied with the male-breadwinner family model: they are happier if they work and they prefer to have non-working wives.
References
1. Baetschmann, G., Staub, K. E. and Winkelmann, R. (2011) 'Consistent Estimation of the Fixed Effects Ordered Logit Model', IZA Discussion Paper 5443. Bonn, IZA.
2. Becker, G. S., Landes, E. M. and Michael, R. T. (1977) 'An economic analysis of marital instability', Journal of Political Economy, 85: 1141-1188.
3. Cherlin, A. (2000) 'Toward a new home socioeconomics of union formation', in Waite, L. J., Bachrach, C., Hindin, M., Thomson, E. and Thornton, A. (eds) The ties that bind - Perspectives on marriage and cohabitation, pp. 126-144. Hawthorne, New York, Aldine De Gruyter.
4. Ferrer-i-Carbonell, A. and Frijters, P. (2004) 'How Important Is Methodology For The Estimates Of The Determinants Of Happiness?', The Economic Journal, 114: 641–659.
5. Jalovaara, M. (2003) 'The joint effects of marriage partners’ socioeconomic positions on the risk of divorce', Demography, 40(1): 67–81.
6. Lee, K. S. and Ono, H. (2008) 'Specialization and happiness in marriage: A U.S.-Japan comparison', Social Science Research, 37(4): 1216-1234.
7. Macmillan, R. and Gartner, R. (1999) 'When She Brings Home the Bacon: Labor-Force Participation and the Risk of Spousal Violence against Women', Journal of Marriage and Family, 61(4): 947-958.
8. Mundlak, Y. (1978) 'On the pooling of time series and cross section data', Econometrica, 46: 69–85.
9. Oppenheimer, V. K. (1997) 'Women's Employment and the Gain to Marriage: The Specialization and Trading Model', Annual Review of Sociology, 23: 431-453.
10. Raz-Yurovich, L. (2012) 'Economic Determinants of Divorce Among Dual-Earner Couples: Jews in Israel', European Journal of Population/Revue européenne de Démographie: 1-27.
11. Rogers, S. J. and DeBoer, D. D. (2001) 'Changes in Wives' Income: Effects on Marital Happiness, Psychological Well-Being, and the Risk of Divorce', Journal of Marriage and Family, 63(2): 458-472.
12. Rosenfield, S. (1992) 'The Costs of Sharing: Wives' Employment and Husbands' Mental Health', Journal of Health and Social Behavior, 33(3): 213-225.
13. Ross, C. E., Mirowsky, J. and Goldsteen, K. (1990) 'The Impact of the Family on Health: The Decade in Review', Journal of Marriage and Family, 52(4): 1059-1078.
14. Ross, H. L. and Sawhill, I. V. (1975) Time of Transition. The Growth of Families Headed by Women, Washington, DC: The Urban Institute.
15. Schoen, R., Rogers, S. J. and Amato, P. R. (2006) 'Wives' Employment and Spouses' Marital Happiness', Journal of Family Issues, 27(4): 506-528.
16. Simpson, I. H. and England, P. (1981) 'Conjugal Work Roles and Marital Solidarity', Journal of Family Issues, 2(2): 180-204.
17. Stolzenberg, Ross M. (2001) 'It's about Time and Gender: Spousal Employment and Health', American Journal of Sociology, 107(1): 61-100.
Family structures and subjective wellbeing in
Italy
Silvia Montecolle, Francesca Rinesi and Alessandra Tinto
Abstract In recent decades increasing attention has been paid to the issue of subjective wellbeing. This paper aims to shed light on this topic by analysing which characteristics are associated with high levels of subjective wellbeing in Italy. Special emphasis will be given to selected socio-demographic characteristics and to characteristics from other life domains, such as the role of individuals within the family structure.
1 Introduction and aim of the work
In recent decades renewed attention has been given to the concept of subjective wellbeing. At the same time, studies on the interrelation between socio-demographic variables and subjective wellbeing in Italy are relatively scarce. This paper aims at contributing to the existing literature by investigating the association between subjective wellbeing and a set of domains of individual life. We will take into account not only individual socio-demographic characteristics but also socio-economic aspects, health conditions and interpersonal trust. Particular attention will be given to selected socio-demographic characteristics such as marital status, family structure and the role of individuals within the family.
Subjective wellbeing can be seen as a construct made up of two distinct, yet interrelated, components: cognitive and affective [6,9,10]. The cognitive component of subjective wellbeing, measured through life satisfaction, grows out of the process of comparison between individual life conditions and personal standards (expectations, ideals, beliefs). Life satisfaction, as a consequence, can be seen as an individual and retrospective evaluation of one's own life as a whole. The affective component (divided into positive and negative affects) is identified by the emotions or affects that people experience during their daily life, and it is considered conceptually distinct from the cognitive component because it is influenced by different variables [2,3,7]. While the cognitive component implies a retrospective reflection on one's own life at a given point in time, the affective component has to do with the present situation.

1 Silvia Montecolle, Italian National Institute of Statistics (ISTAT); email:
Francesca Rinesi, Italian National Institute of Statistics (ISTAT); email: [email protected]
Alessandra Tinto, Italian National Institute of Statistics (ISTAT); email: [email protected]
The present study focuses on the cognitive component of subjective wellbeing, and
aims at highlighting which characteristics of individuals are associated with higher
levels of life satisfaction in Italy.
2 Data and Method
Data used come from the cross-sectional survey "Aspects of Daily Life", carried out annually since 1993 by the Italian National Institute of Statistics (ISTAT). The survey is based on a representative sample of about 20,000 households, comprising over 50,000 individuals. This annual multipurpose survey collects a set of data on individuals, households and events which enables a wide range of social phenomena to be investigated [8]. The advantages of this data source are its large sample, the prompt release of data and the fact that each member of the household is interviewed.
In the latest available wave of the survey a question on life satisfaction was introduced for the first time, for all individuals aged 14 and over; our subsample therefore consists of approximately 42,000 individuals. The question is harmonized with the majority of international sample surveys and reads: "All things considered, how satisfied are you with your life as a whole these days? Use a 0 to 10 scale, where 0 is completely dissatisfied and 10 is completely satisfied". This wording follows the research path developed in the 1970s in the United States [1,4,5], as each respondent was asked to evaluate their life satisfaction autonomously on a 0 to 10 scale.
A logistic regression model was estimated with the aim of studying the association between the variables considered. The dependent variable y (subjective wellbeing) is equal to 1 when the individual gives a score between 8 and 10, and equal to 0 when the individual gives a lower score.
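A toy sketch of this coding of the dependent variable, and of how an odds ratio for a single binary covariate arises from it. The scores below are entirely synthetic (invented group means and spread), and the actual model adjusts for many covariates simultaneously:

```python
import random

random.seed(0)

# Synthetic 0-10 satisfaction scores for two groups (made-up parameters):
# group 1 tends to score somewhat higher than group 0.
scores = [(g, min(10, max(0, round(random.gauss(7.5 if g else 6.8, 1.8)))))
          for g in [0, 1] * 5000]

# Dependent variable: y = 1 for scores 8-10, y = 0 otherwise.
data = [(g, 1 if s >= 8 else 0) for g, s in scores]

def odds(rows):
    p = sum(y for _, y in rows) / len(rows)
    return p / (1 - p)

odds_ratio = odds([r for r in data if r[0] == 1]) / odds([r for r in data if r[0] == 0])
print(odds_ratio)   # > 1: the higher-scoring group has higher odds of y = 1
```

An odds ratio above 1, as for couples without children in Table 1 below, indicates higher odds of reporting a high satisfaction score relative to the reference group.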
The independent variables considered are:
- socio-demographic variables: age group, gender, marital status, role within the
household, geographical area of residence;
- socio-economic variables: level of education, occupational status;
- social participation: meeting friends, religious participation;
- interpersonal trust: degree of trust in the majority of people;
- health conditions: self-perceived health.
3 Preliminary findings and future developments
The mean life-satisfaction score in Italy is 7.2, and the median is 7.0 out of 10. Moreover, as shown in Figure 1, the distribution of interviewees according to their life-satisfaction score is concentrated toward the upper end of the scale: in particular, 43.3% of the population aged 14 and over scores 8 or more. However, it must be noted that the life satisfaction score varies considerably according to individual characteristics.
Figure 1: Answers to the question "All things considered, how satisfied are you with your life as a whole these days?" Source: ISTAT - Aspects of daily life (2010)
The results of the logistic regression model show that the majority of the explanatory variables introduced in the model are significantly associated with reporting a high level of life satisfaction.
Particularly relevant is the association between the latter and the role of individuals within the household (Table 1): adults living in couples and parents living with children display significantly higher levels of life satisfaction than individuals living alone (the reference category). On the other hand, significantly lower levels of life satisfaction characterize lone-parent households (both the parent and the children living in this kind of household) and members of extended households.
Other variables positively associated with the level of life satisfaction are attendance of places of worship and the degree of trust in the majority of people. In particular, the analysis of interactions between variables shows that the impact of trust on life satisfaction is much stronger among individuals with a higher level of education.
Table 1: Characteristics associated with the expression of a high level of life satisfaction in Italy. Results from a logistic model.

Role within the family                  Odds Ratio   p-value
Living alone (Reference Category)       ---          ---
Member of extended household            0.759        0.0034
Couple without children                 1.395        <0.0001
Parent within a couple with children    1.299        <0.0001
Child within a couple with children     1.055        0.3418
Parent within a lone-parent family      0.861        0.0172
Child within a lone-parent family       0.767        0.0001
Other                                   0.917        0.1704

Source: ISTAT - Aspects of daily life (2010). Results are controlled for: gender, age class, educational level, perceived health, marital status, geographical area of residence, role within the family, occupational status, attendance of places of worship, trust in the majority of people, meeting friends.
In conclusion, the present work gives us the opportunity to analyse which individual characteristics are most strongly associated with a high level of life satisfaction. However, it must be noted that, due to the cross-sectional nature of the available data, it is not possible to take into account possible (non-observable) heterogeneity among individuals and the consequent selection effects. Further developments of this research aim at introducing environmental and social contextual variables which can help capture, through a multilevel approach, the effect of the context on the individual definition of life satisfaction.
References
1. Andrews, F.M., Withey, S.B.: Social Indicators of Well-Being: Americans’ Perceptions of
life quality. Plenum Press, New York (1976)
2. Argyle, M.: The Psychology of Happiness. Methuen, London, (1987)
3. Bradburn, N.M.: The Structure of Psychological Well-being. Aldine, Chicago (1969)
4. Campbell, A., Converse, P.: The Human Meaning of Social Change. Russell Sage
Foundation, New York (1972)
5. Campbell, A., Converse, P., Rodgers, W.: The Quality of American Life. Russell Sage
Foundation, New York (1976)
6. Diener, E.: Subjective Well-being. Psychol. Bull. 95, 542–575 (1984)
7. Diener, E., Emmons, R.A.: The independence of positive and negative affect. J. Pers. Soc.
Psychol. 47 (5), 1105–1117 (1984)
8. ISTAT: Il sistema di indagini sociali multiscopo, Metodi e norme, n. 31, Istat, Rome (2006)
9. OECD: Subjective well-being. In: Factbook 2009: Economic, Environmental and Social
Statistics,
http://miranda.sourceoecd.org/vl=73684696/cl=14/nw=1/rpsv/factbook2009/11/02/02/index.htm.
Cited 15 Mar 2012
10. Stiglitz, J.E., Sen, A., Fitoussi, J.-P.: Report by the Commission on the Measurement of
Economic Performance and Social Progress (2009), http://www.stiglitz-sen-fitoussi.fr. Cited
15 Mar 2012
Identifiability of Discrete Graphical Models with Hidden Variables
Marco Valtorta, Elizabeth S. Allman, and John A. Rhodes
Abstract We define a space of identifiability problems in causal Bayesian networks and concentrate on two of them. The first problem involves the generic identifiability of all parameters with restrictions on the state space of the variables. We present a technique that, given an arbitrary directed graphical model with a single hidden variable, modifies the model in such a way that we can apply Kruskal's theorem and solve the first identifiability problem. The second problem involves the global identifiability of the causal effect of a set T of variables on a set S of variables. Pearl's do-calculus solves the second identifiability problem.
Key words: Causal Bayesian networks, Semi-Markovian models, Intervention, Identifiability, Unidentifiability
1 Two Settings for Identifiability
Markovian models are popular graphical models for encoding distributional and causal relationships. A Markovian model consists of an acyclic directed graph (DAG) G over a set of variables V = {V_1, ..., V_n}, called a causal graph, and a probability distribution over V which satisfies two constraints: each variable in the graph is independent of all its non-descendants given its direct parents, and the directed edges in G represent direct causal influences. A Markovian model for which only the first constraint holds is called a Bayesian network. This explains why Markovian models are also called causal Bayesian networks.
Marco Valtorta
Department of Computer Science and Engineering, University of South Carolina, e-mail: [email protected]

Elizabeth S. Allman and John A. Rhodes
Department of Mathematics and Statistics, University of Alaska, Fairbanks AK 99775 USA, e-mail: e.allman,[email protected]
The chain rule for Bayesian networks states that the joint probability function P(v) = P(v_1, ..., v_n) can be factorized as

  P(v) = \prod_{V_i \in V} P(v_i \mid pa(V_i))   (1)
The simplest kind of intervention [4] is fixing a subset T of V to some constants t, denoted by do(T = t) or just do(t); the post-intervention distribution P_t(v) is then compatible with the excision semantics and given by:

  P_t(v) = \begin{cases} \prod_{V_i \in V \setminus T} P(v_i \mid pa(V_i)) & \text{if } v \text{ is consistent with } t \\ 0 & \text{if } v \text{ is inconsistent with } t \end{cases}   (2)
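The truncated factorization (2) is easy to compute on a toy example. The three-node chain U → T → S and its conditional probability tables below are our own illustration, not a network from the paper:

```python
import itertools

# A toy causal Bayesian network over binary variables: U -> T -> S.
# Each CPT maps (value, parent values) -> probability.
parents = {"U": (), "T": ("U",), "S": ("T",)}
cpt = {
    "U": {(0, ()): 0.6, (1, ()): 0.4},
    "T": {(0, (0,)): 0.7, (1, (0,)): 0.3, (0, (1,)): 0.2, (1, (1,)): 0.8},
    "S": {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.5, (1, (1,)): 0.5},
}
order = ["U", "T", "S"]

def joint(v):
    """Chain rule (1): product of P(v_i | pa(V_i)) over all variables."""
    p = 1.0
    for name in order:
        pa = tuple(v[q] for q in parents[name])
        p *= cpt[name][(v[name], pa)]
    return p

def post_intervention(v, do):
    """Truncated factorization (2): drop the factors of intervened variables."""
    if any(v[name] != val for name, val in do.items()):
        return 0.0  # v inconsistent with t
    p = 1.0
    for name in order:
        if name in do:
            continue
        pa = tuple(v[q] for q in parents[name])
        p *= cpt[name][(v[name], pa)]
    return p

states = [dict(zip(order, bits)) for bits in itertools.product([0, 1], repeat=3)]
print(sum(joint(v) for v in states))                        # both sum to 1
print(sum(post_intervention(v, {"T": 1}) for v in states))
```

Note that `post_intervention` simply skips the factor P(t | pa(T)), which is exactly the excision semantics of do(T = t).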
Let N and U stand for the sets of observable (observed) and unobservable (hidden) variables in graph G, i.e., N and U partition V. The observed probability distribution is:

  P(n) = \sum_{U} \prod_{V_i \in N} P(v_i \mid pa(V_i)) \prod_{V_j \in U} P(v_j \mid pa(V_j))   (3)
One can define a space of identifiability problems based on equation (3). We concentrate on three dimensions of this space: identifiability of all parameters or only some of them; identifiability of parameters in their whole range (global identifiability) or with the exception of some subspace of measure zero (generic identifiability); and identifiability with restrictions on the cardinality of the state space of variables or without them. We call identifiability 1 the generic identifiability of all the probabilities in (3) with appropriate bounds on the state spaces of variables, and identifiability 2 the global identifiability, with no bounds on the state spaces of variables, of the causal effect P_t(s), given by:

  P_t(s) = \begin{cases} \sum_{V_l \in (N \setminus S) \setminus T} \sum_{U} \prod_{V_i \in N \setminus T} P(v_i \mid pa(V_i)) \times \prod_{V_j \in U} P(v_j \mid pa(V_j)) & \text{if } s \text{ is consistent with } t \\ 0 & \text{if } s \text{ is inconsistent with } t \end{cases}   (4)
2 Kruskal Theorem and Its Use to Solve Identifiability 1
Kruskal's theorem applies to a simple latent class model, in which three observed variables are independent when conditioned on a single hidden one. We outline a technique that, given an arbitrary directed graphical model with a single hidden variable, modifies the model in such a way that we can apply Kruskal's theorem. The technique, which we have been developing and which generalizes the one in [1], is based on the following operations:

• Clump several variables (all hidden or all observed) into a single one, with a larger state space.
• Condition on the state of an observed variable.
• Marginalize over an observed variable (making it hidden).
Each of these can be done multiple times, and in combination with one another. The goal in applying these modifications is always to produce a model to which Kruskal's theorem applies, so one needs to use them so that:

• At least 3 observed variables remain, which are independent when conditioned on the hidden variable.
• The resulting hidden state spaces are "not too large" relative to the observed ones. (Letting a, b, c, and q be the sizes of the state spaces of the observed and hidden variables, in order, we need min(a,q) + min(b,q) + min(c,q) ≥ 2q + 2.)
• Parameters for the resulting model are easily related to those of the original one.
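The state-space condition in the second requirement is purely arithmetic and can be checked mechanically; a small sketch (the function name is ours):

```python
def kruskal_condition(a, b, c, q):
    """Check min(a,q) + min(b,q) + min(c,q) >= 2q + 2, the 'not too large'
    condition relating observed state-space sizes a, b, c to hidden size q."""
    return min(a, q) + min(b, q) + min(c, q) >= 2 * q + 2

# Three binary observed variables support a binary hidden variable...
print(kruskal_condition(2, 2, 2, 2))  # True: 2 + 2 + 2 >= 6
# ...but not a ternary one.
print(kruskal_condition(2, 2, 2, 3))  # False: 6 < 8
```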
It is easy to show by a counting argument that all Bayesian networks of four nodes in which there is at least one edge between children of the hidden variable are not identifiable. An example of such a network is in Figure 1(a).

The causal Bayesian network of Figure 1(b) is identifiable by conditioning on variable 2, applying Kruskal's theorem to the resulting network of three observed nodes, and inverting the resulting conditional probability tables.
3 Using the Do-calculus to Solve Identifiability 2
The do-calculus consists of three rules that allow the replacement of interventions with observations in modified graphs [4]. Let X, Y, Z be arbitrary disjoint sets of nodes in a causal graph G. We denote by G_{\overline{X}} the graph obtained by deleting from G all edges pointing to nodes in X, and by G_{\underline{X}} the graph obtained by deleting from G all edges emerging from nodes in X. To represent the deletion of both incoming and outgoing edges, we use the notation G_{\overline{X}\,\underline{Z}}.

(Rules of Do-Calculus) Let G be the DAG of a causal Bayesian network, and let P(·) stand for its probability distribution. For any disjoint subsets of variables X, Y, Z, and W we have the following rules.
Fig. 1 The causal Bayesian network (whose graph is) (a) is not identifiable 1, but the causal effect P_t(s) is identifiable 2. The causal Bayesian network (b) is identifiable 1, but P_t(s) is not identifiable 2.

Rule 1 (Insertion/deletion of observations)

  P_x(y \mid z, w) = P_x(y \mid w) \quad \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X}}}   (5)

Rule 2 (Action/observation exchange)

  P_{x,z}(y \mid w) = P_x(y \mid z, w) \quad \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X}\,\underline{Z}}}   (6)

Rule 3 (Insertion/deletion of actions)

  P_{x,z}(y \mid w) = P_x(y \mid w) \quad \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}   (7)
where Z(W) is the set of Z-nodes that are not ancestors of any W-node in G_{\overline{X}}. It was shown that the do-calculus is sound and complete [2] for the identifiability 2 problem, i.e., a causal effect is identifiable 2 if and only if the quantity P_t(s) can be transformed into a formula that includes only observable quantities (i.e., quantities derivable from P(N)) by using the rules of the do-calculus and standard probability manipulations. To show that a causal effect is unidentifiable, it is however more convenient to use the algorithm of Tian [6], which was also shown to be sound and complete [5, 3]. For example, P_T(S) is identifiable 2 in the graph of Figure 1(a), because P_T(S) = P(S); in other words, T has no causal effect on S. This can also be shown by applying rule 3 (equation (7)), with X = ∅, Y = S, Z = T, W = ∅; consequently, Z(W) = T, and G_{\overline{X},\overline{Z(W)}} is the graph of Figure 1(a) without the edge (U,T). P_T(S) is not identifiable 2 in the graph of Figure 1(b).
Acknowledgements The authors thank the American Institute of Mathematics for financial support of this research, through the Square initiative, which also involves Elena Stanghellini, whose contribution is also gratefully acknowledged.
References
1. Allman, E.S., Matias, C., Rhodes, J.A.: Identifiability of Parameters in Latent Structure Models with Many Observed Variables. The Annals of Statistics 37, 3099–3132 (2009)
2. Huang, Y., Valtorta, M.: Pearl's Calculus of Intervention Is Complete. In: Proceedings of the 22nd Conf. on Uncertainty in Artificial Intelligence (UAI-06), pp. 217–224, Cambridge, MA, July 13–16, 2006
3. Huang, Y., Valtorta, M.: Identifiability in Causal Bayesian Networks: A Sound and Complete Algorithm. In: Proceedings of the 21st National Conf. on Artificial Intelligence (AAAI-06), pp. 1149–1154, Boston, MA, July 16–20, 2006
4. Pearl, J.: Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge University Press (2009)
5. Shpitser, I., Pearl, J.: Identification of Conditional Interventional Distributions. In: Proceedings of the 22nd Conf. on Uncertainty in Artificial Intelligence (UAI-06), pp. 437–444, Cambridge, MA, July 13–16, 2006
6. Tian, J.: Studies in Causal Reasoning and Learning. Technical Report R-309, Cognitive Systems Laboratory, Department of Computer Science, University of California, Los Angeles (August 2002)
Bayesian T-optimal designs by simulation: a case study on model discrimination
Rossella Berni and Federico M. Stefanini
Abstract In a case study on wine making, total anthocyanins are measured during wine pre-fermentation. An inhomogeneous Markov Chain is developed to obtain the Bayesian T-optimal design for the next year. Results are discussed in view of extensions of the utility function often needed in actual applications.
Key words: Bayesian T-optimal designs, utility function, Monte Carlo simulation
1 Introduction
Optimal design criteria have received growing attention in the last ten years, both at the theoretical and computational levels, in part following the increase in computational power. Since the 1970s there has been a long history of seminal papers in the literature on D- and T-optimality, both to estimate model parameters and to discriminate among models (for example, [7], [2], [3], [1]). Each experimental point and the final optimal design are selected according to the General Equivalence Theorem (G.E.T.). Model dependency may be considered the main disadvantage of an optimal design, since the result depends on the hypothesized statistical model and its parameters: in the presence of uncertainty on model and parameters this dependence is crucial.
More recently, this dependence was considered in a Bayesian framework [5], by introducing prior distributions on models and parameters. Later, [6] extended T-optimality by adopting the Kullback-Leibler distance to address the heteroscedasticity and the non-Gaussian nature of response variables: they defined KL-optimality.
Notwithstanding the generality achieved, in actual applications further flexibility is often needed, for example by defining a utility function in which the cost of each observation depends on the value taken by the independent variable.
Stefanini F.M., e-mail: [email protected] · Berni R., e-mail: [email protected]
Department of Statistics "G. Parenti", University of Florence
In this paper, a utility function based on the T-optimality criterion is defined and an inhomogeneous Markov Chain algorithm is developed [4] to perform Monte Carlo optimization.
2 Basic Theory
The class of linear and non-linear parametric models considered is:

  E(Y_i \mid x_i, \theta_j) = \eta_j(x_i, \theta_j); \quad j = 1, 2; \; x_i \in \mathcal{X}, \; \theta_j \in \Theta_j   (1)
with j = 1, 2 the index of the considered model. The vector x_i = (x_{i1}, ..., x_{ik}) denotes the set of k independent variables for the i-th observation (i = 1, ..., n); for simplicity we assume there are no replicates; the two vectors of unknown parameters θ_j, j = 1, 2, have sizes m_j, with θ_j ∈ Θ_j ⊂ R^{m_j}. We assume that the η_j(x_i, θ_j) are continuous real functions of (x_i, θ_j) ∈ X × Θ_j. In an experimental design setting each x_i is the i-th trial, chosen and/or controlled by the experimenter; x_i belongs to the compact set X, the experimental region defined in R^k. Regarding the random errors ε_{i,j}, we assume that ε_{i,j} ~ i.i.d. N(0, σ_j²).

In this context we consider a discrete or exact design, i.e. a design formed by a set of n points (x_1, ..., x_n) in X and denoted by D_n. Furthermore, ξ is the design measure, defined as a probability measure on the compact set X and satisfying ∫_X ξ(x) dx = 1. It must be noted that a continuous design depends only on the assumed probability measure ξ, without considering the number of experimental points. A discrete design is defined for a specific set of trials and, given the total number of observations n, by assigning masses p_i to each design point x_i of D_n: ξ_n = {(p_i, x_i); i = 1, ..., n; ∑_i p_i = 1}, where ρ(p_i n) is the rounded number of observations taken at x_i, with ∑_i ρ(p_i n) = n. In [5], the theory is extended to situations in which model uncertainty is present; it is described by an elicited prior distribution with parameters π_{0j}, j = 1, 2. Moreover, the elicited prior distributions of the model parameters are p(θ_j), j = 1, 2. The central result is that the Bayesian T-optimal criterion satisfies the G.E.T. while it no longer depends on unknown values of
Fig. 1 Maximum likelihood estimates of the polynomial (left, continuous line) and logistic (right, continuous line) models. Dashed lines join observed values. [Plot: Total Anthocyanins (50–200) against Day (5–15), panels Pol and Log]
the true model parameters. The two noncentrality parameters are:

  \Delta_j(\xi, \theta_{\bar{j}}) = \inf_{\theta_j \in \Theta_j} \int_{\mathcal{X}} \big( \eta_{\bar{j}}(x, \theta_{\bar{j}}) - \eta_j(x, \theta_j) \big)^2 \, \xi(dx)   (2)

with j ∈ {1, 2} and \bar{j} its complement in {1, 2}; thus \bar{j} is in turn the index of the true model, after [2]. A T-optimal design ξ* maximizes the criterion:

  \Gamma(\xi) = \sum_{j \in \{1,2\}} \pi_{0\bar{j}} \int_{\Theta_{\bar{j}}} \Delta_j(\xi, \theta_{\bar{j}}) \, p(\theta_{\bar{j}}) \, d\theta_{\bar{j}}   (3)
3 A Monte Carlo algorithm for wine making
The kinetics of total anthocyanins (TAs) during a vinification procedure which is quite popular in Tuscany is considered. Year 2010 data were collected by daily sampling during maceration, just after a pumping over. One hundred ml were withdrawn from the sampling valve and the UV/VIS spectra of the supernatants were recorded after centrifugation. Two models were considered (Figure 1): a polynomial (η_1), which is motivated by the possible role played by sulfitation, and a logistic curve (η_2), which is based on chemical considerations about the presence of TAs in solution. The expected values are:

  \eta_1(x, \theta_1) = \theta_{1,0} + \theta_{1,1} x + \theta_{1,2} x^2 + \theta_{1,3} x^3 + \theta_{1,4} x^4   (4)
  \eta_2(x, \theta_2) = \theta_{2,2} + \frac{\theta_{2,1} - \theta_{2,2}}{1 + \exp((\theta_{2,3} - x)/\theta_{2,4})}   (5)

thus discrimination is performed between two functions, non-linear in coefficients and/or factors, with only one independent variable (k = 1).
The prior probability values on the models are π_{0,1} = 0.1 and π_{0,2} = 0.9. Discretized posterior distributions on a grid of 25 points, given the observed data d, were derived after calculating Laplace approximations under weakly informative priors, respectively p(θ_1 | d) and p(θ_2 | d). Following [4], we defined an inhomogeneous Markov Chain to optimize the utility function:

  u(\xi, \theta_1, \theta_2) = 1 \cdot 10^{-10} + \pi_{01} \Delta_2(\xi, \theta_1) + \pi_{02} \Delta_1(\xi, \theta_2)   (6)

with a multivariate normal jump distribution g(ξ̃ | ξ) defined on suitably transformed coordinates and weights (h_x(x_i), h_p(p_i)), i = 1, ..., 5. Given the current state (ξ, v) of this chain, the steps of our algorithm are:
1. Generate a candidate design ξ̃ given the current state, using g.
2. Generate J points (θ_1^{(j)}, θ_2^{(j)}), j = 1, ..., J, from the distributions of the model parameters p(θ_1 | d) and p(θ_2 | d). Calculate

     \tilde{v} = J^{-1} \sum_{j=1}^{J} \log\big( u(\tilde{\xi}, \theta_1^{(j)}, \theta_2^{(j)}) \big)

   and set the candidate (ξ̃, ṽ).
3. Calculate α_J = min(1, exp(J ṽ − J v)), and with probability α_J accept the candidate (ξ̃, ṽ); otherwise leave (ξ, v) unchanged.
4. Increase the current value of J according to a suitable cooling schedule J_t, t = 1, 2, ..., with J_{t+1} ≥ J_t; e.g. every fifty steps increase J by 25.
5. Go to step (1) up to convergence.
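The steps above can be sketched in a few lines. Everything below is a toy stand-in under our own assumptions: two illustrative mean functions replace the paper's posterior draws and utility (6), and the jump scales and cooling schedule are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_thetas(J):
    """Toy stand-in for J draws from p(theta1 | d) and p(theta2 | d)."""
    return rng.normal(1.0, 0.1, J), rng.normal(0.5, 0.1, J)

def mean_log_utility(xs, ps, J):
    """Monte Carlo estimate of the mean log-utility over J posterior draws,
    with a single squared-discrepancy term standing in for (6)."""
    t1, t2 = draw_thetas(J)
    disc = t1[:, None] * xs - t2[:, None] * xs**2   # eta1 - eta2, J x n_pts
    return np.mean(np.log(1e-10 + (ps * disc**2).sum(axis=1)))

def mc_design_search(n_pts=3, steps=500):
    xs, ps = rng.uniform(0, 2, n_pts), np.full(n_pts, 1.0 / n_pts)
    J = 25
    v = mean_log_utility(xs, ps, J)
    for t in range(steps):
        # step 1: candidate design from a normal jump around the current one
        xs_new = np.clip(xs + rng.normal(0, 0.05, n_pts), 0, 2)
        ps_new = np.abs(ps + rng.normal(0, 0.02, n_pts))
        ps_new /= ps_new.sum()
        # step 2: estimate the candidate's mean log-utility with J draws
        v_new = mean_log_utility(xs_new, ps_new, J)
        # step 3: accept with probability alpha_J = min(1, exp(J*v_new - J*v))
        if np.log(rng.random()) < J * (v_new - v):
            xs, ps, v = xs_new, ps_new, v_new
        # step 4: cooling schedule, J_{t+1} >= J_t
        if (t + 1) % 50 == 0:
            J += 25
    return xs, ps

xs, ps = mc_design_search()
print(np.round(np.sort(xs), 2), np.round(ps, 2))
```

Raising the Monte Carlo average to the power J (equivalently, multiplying the log by J) is what makes the chain inhomogeneous: as J grows, the stationary distribution concentrates on designs maximizing the expected utility.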
Among the designs made up of six distinct points, we found the optimum at (x_i, p_i): (1.00, 0.48), (3.40, 0.35), (3.26, 0.11), (8.87, 0.02), (11.64, 0.02), (16.06, 0.02).
4 Discussion
The proposed algorithm is based on a utility function which is remarkably close to what is optimized in the classical framework (up to a small constant that makes it positive). Nevertheless, the implementation does not change much if equation (6) is modified to incorporate the cost of observations, which may depend on the value taken by x. Similarly, the total number of observations could be explicitly accounted for.

From the standpoint of the quality of calculations, approximated prior distributions were considered in order to better compare the results of the simulations with the solution provided by strictly following [5]. In [4] a more general algorithm is presented to simultaneously obtain the posterior distribution and the optimized decision. In the literature, it has already been hypothesized that the prior distribution on models could be updated given the observed data, as we did for the model parameters, even though a substantial increase in computational burden is expected after such an extension.
References
1. Atkinson, A.C., Cox, D.R.: Planning experiments for discriminating between models (with discussion). J. R. Stat. Soc. Ser. B 36, 321–348 (1974)
2. Atkinson, A.C., Fedorov, V.V.: The design of experiments for discriminating between two rival models. Biometrika 62, 57–70 (1975)
3. Atkinson, A.C., Fedorov, V.V.: The design of experiments for discriminating between several models. Biometrika 62, 289–303 (1975)
4. Müller, P., Sansó, B., De Iorio, M.: Optimal Bayesian design by inhomogeneous Markov chain simulation. J. Am. Stat. Assoc. 99, 788–798 (2004)
5. Ponce de Leon, A.C., Atkinson, A.C.: Optimum experimental design for discriminating between two rival models in the presence of prior information. Biometrika 78, 601–618 (1991)
6. Tommasi, C., López-Fidalgo, J.: Bayesian optimum designs for discriminating between models with any distribution. Comput. Stat. Data An. 54, 143–150 (2010)
7. Wynn, H.P.: The sequential generation of D-optimum experimental designs. Ann. Math. Stat. 41, 1655–1664 (1970)
Monte Carlo Likelihood Inference in Multivariate Model-Based Geostatistics
Marco Minozzo and Clarissa Ferrari
Abstract Though in the last decade many works have appeared in the literature dealing with model-based extensions of the classical (univariate) geostatistical mapping methodology based on linear kriging, very few authors have concentrated, mainly because of the inferential problems they pose, on model-based extensions of classical multivariate geostatistical techniques like the linear model of coregionalization or the related 'factorial kriging analysis'. Nevertheless, in the presence of multivariate spatial non-Gaussian data, in particular count data, as in many environmental applications, the use of these classical techniques can lead to incorrect predictions about the underlying factors. To overcome this problem, here we discuss a hierarchical geostatistical factor model that extends, following a model-based geostatistical approach, the classical geostatistical proportional covariance model. For this model we investigate a likelihood-based inferential procedure using the Monte Carlo EM algorithm. In particular, we discuss some of its theoretical properties and show, through some thorough simulation studies, its sampling performances.
Key words: Cokriging, generalized linear mixed models, linear model of coregionalization, Monte Carlo EM, spatial factor model, spatial prediction
1 Introduction
The classical linear model of coregionalization, or its simpler counterpart, the proportional covariance model, otherwise known as the intrinsic correlation model, and
Marco Minozzo
Department of Economics, University of Verona, Via dell'Artigliere 19, IT-37129 Verona, Italy, e-mail: [email protected]

Clarissa Ferrari
Department of Economics, University of Verona, Via dell'Artigliere 19, IT-37129 Verona, Italy, e-mail: [email protected]
the related 'factorial kriging analysis', have become standard tools in many areas of application for the analysis of multivariate spatial data. However, in the presence of non-Gaussian data, in particular count or skew data, the use of these geostatistical instruments can lead to misleading predictions and to erroneous conclusions about the underlying factors. To cope with these situations, following the proposal put forward in the univariate case by Diggle et al. (1998), and somehow extending the works of Zhang (2007) and of Zhu et al. (2005), we propose in Section 2 a hierarchical multivariate spatial model, built upon a generalization of the classical geostatistical proportional covariance model. Adopting a non-Bayesian inferential framework, and assuming that the number of underlying common factors and their spatial autocorrelation structure are known, in Section 3 we show how to carry out likelihood inference on the parameters of the model by exploiting the capabilities of the Monte Carlo EM (MCEM) algorithm (see Wei and Tanner, 1990).
2 The Modeling Framework
Let us consider the following hierarchical extension of the classical geostatistical linear model of coregionalization. Let y_i(x_k), i = 1, ..., m, k = 1, ..., K, be a set of geo-referenced data measurements relative to m regionalized variables, gathered at K spatial locations x_k. These m regionalized variables are seen as a partial realization of a set of m random functions Y_i(x), i = 1, ..., m, x ∈ R². For these functions we assume, for any x, and for i ≠ j,

  Y_i(x) \perp\!\!\!\perp Y_j(x) \mid Z_i(x) \quad \text{and} \quad Y_i(x) \perp\!\!\!\perp Z_j(x) \mid Z_i(x),   (1)

and, for x′ ≠ x″, and i, j = 1, ..., m,

  Y_i(x') \perp\!\!\!\perp Y_j(x'') \mid Z_i(x') \quad \text{and} \quad Y_i(x') \perp\!\!\!\perp Z_j(x'') \mid Z_i(x'),   (2)
where Z_i(x), i = 1, ..., m, x ∈ R², are mean-zero jointly stationary Gaussian processes.

Moreover, for any given i and x, we assume that, conditionally on Z_i(x), the random variables Y_i(x) have conditional distributions f_i(y; M_i(x)), that is, Y_i(x) | Z_i(x) ~ f_i(y; M_i(x)), specified by the conditional expectations M_i(x) = E[Y_i(x) | Z_i(x)], and that h_i(M_i(x)) = β_i + Z_i(x), for some parameters β_i and some known link functions h_i(·). For instance, we might assume that for some or all i, and for any given x, the data are conditionally Poisson distributed, that is, that

  f_i(y; M_i(x)) = \exp\{-M_i(x)\} \, (M_i(x))^y / y!, \quad y = 0, 1, 2, \ldots,   (3)
and that the linear predictor β_i + Z_i(x) is related to the conditional mean M_i(x) through a logarithmic link function, so that ln(M_i(x)) = β_i + Z_i(x). On the other hand, for the rest of the i, we might assume that, for any given x, conditionally on Z_i(x), the random variables Y_i(x) are Gamma distributed with conditional expectations M_i(x) = E[Y_i(x) | Z_i(x)] = exp{β_i + Z_i(x)} = νb (here again h_i(·) = ln(·)) and conditional variances Var[Y_i(x) | Z_i(x)] = νb² = ν^{-1} exp{2β_i + 2Z_i(x)} = ν^{-1}(M_i(x))², where ν > 0 and b > 0 are parameters; that is, we might assume

  f_i(y; M_i(x)) = \big( y^{\nu-1} / \Gamma(\nu) \big) \exp\{-y\nu / M_i(x)\} \, (\nu / M_i(x))^{\nu}, \quad y > 0.   (4)

Here the 'shape' parameter ν is constant for x ∈ R², whereas the 'scale' parameter b varies over R² depending on the conditional expectation M_i(x). In addition to the Poisson or Gamma distributions, other discrete or continuous distributions could be considered to account for particular sets of data.
For the latent part of the model, we adopt the following structure. For the m jointly stationary Gaussian processes Z_i(x), let us assume the linear factor model

  Z_i(x) = \sum_{p=1}^{P} a_{ip} F_p(x) + \xi_i(x),   (5)

where the a_{ip} are m × P coefficients, F_p(x), p = 1, ..., P, are P ≤ m non-observable spatial components (common factors) responsible for the cross-correlation between the variables Z_i(x), and ξ_i(x) are non-observable spatial components (unique factors) responsible for the residual autocorrelation in the Z_i(x) unexplained by the common factors. We assume that F_p(x) and ξ_i(x) are mean-zero stationary Gaussian processes with covariance functions Cov[F_p(x), F_p(x+h)] = ρ(h) and Cov[ξ_i(x), ξ_i(x+h)] = ψ_i ρ(h), where h ∈ R², ρ(h) is a real spatial autocorrelation function common to all factors such that ρ(0) = 1 and ρ(h) → 0 as ||h|| → ∞, and the ψ_i are non-negative real parameters. We also assume that the processes F_p(x) and ξ_i(x) have all cross-covariances identically equal to zero.
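The hierarchy above can be simulated directly. The sketch below uses our own illustrative choices (exponential ρ(h), m = 2 variables, P = 1 common factor, Poisson observations with log link); none of the numbers come from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# m variables, P common factor, K random sites, shared rho(h) = exp(-||h||/range_)
m, P, K, range_ = 2, 1, 100, 0.3
coords = rng.random((K, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
R = np.exp(-d / range_)                        # rho(0) = 1, decays with distance
L = np.linalg.cholesky(R + 1e-9 * np.eye(K))   # jitter for numerical stability

A = np.array([[0.8], [0.6]])                   # m x P loadings a_ip (assumed)
psi = np.array([0.4, 0.2])                     # unique-factor variances psi_i
beta = np.array([1.0, 0.5])                    # intercepts beta_i

F = L @ rng.standard_normal((K, P))            # common factors, Cov = R
xi = np.sqrt(psi) * (L @ rng.standard_normal((K, m)))  # unique, Cov = psi_i * R
Z = F @ A.T + xi                               # equation (5), K x m latent field
Y = rng.poisson(np.exp(beta + Z))              # conditionally Poisson, log link
print(Y.shape)  # (100, 2)
```

The Cholesky factor of the common correlation matrix R is reused for every factor, which is exactly what "autocorrelation function common to all factors" buys computationally.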
Assuming that the number P of latent common factors and the spatial autocorrelation function ρ(h) have already been chosen, the model depends on the parameter vector θ = (β, A, ψ), where β = (β_1, ..., β_m)^T, A = (a_1, ..., a_m)^T, with a_i = (a_{i1}, ..., a_{iP}), for i = 1, ..., m, and ψ = (ψ_1, ..., ψ_m)^T. Let us note that, like the classical linear factor model, our model is not identifiable. However, the only indeterminacy lies in a rotation of the matrix A.
3 Inference with the Monte Carlo EM algorithm
Likelihood inference on the parameters of the model would require the maximization, with respect to θ = (β, A, ψ), of the likelihood based on the marginal density function of the observations y_i(x_k). However, since this marginal density is not available, and since the integration required in the E-step of the EM algorithm would not be easy, here, to maximize the log-likelihood, we will resort to the MCEM algorithm.
Our implementation of the algorithm proceeds as follows. Let us define ξ = (ξ_1, ..., ξ_m), where ξ_i = (ξ_i(x_1), ..., ξ_i(x_K))^T, i = 1, ..., m, and F = (F_1, ..., F_P), where F_p = (F_p(x_1), ..., F_p(x_K))^T, p = 1, ..., P, and let f(y, ξ, F; θ) be the joint distribution of the model, that is, the complete likelihood, accounting also for the unobserved factors. Assuming that the current guess for the parameters after the (s−1)-th iteration is given by θ^{s−1}, and that R_s is a fixed positive integer, the s-th iteration of the MCEM algorithm involves the following three steps (stochastic, expectation, maximization):

S step – draw R_s samples (ξ^{(r)}, F^{(r)}), r = 1, ..., R_s, from the (filtered) conditional distribution f(ξ, F | y; θ^{s−1});
E step – compute Q_s(θ, θ^{s−1}) = (1/R_s) ∑_{r=1}^{R_s} ln f(y, ξ^{(r)}, F^{(r)}; θ);
M step – take as the new guess θ^s the value of θ which maximizes Q_s(θ, θ^{s−1}).

In our modeling framework, the S-step of the algorithm can be implemented through importance sampling or Markov chain Monte Carlo (MCMC) techniques, whereas the M-step typically requires the use of numerical procedures.
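The three steps can be sketched on a deliberately simplified model. The non-spatial toy below (independent latent effects, known variance, closed-form M-step) is our own illustration of the S/E/M loop, not the authors' spatial factor model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model: Y_k | Z_k ~ Poisson(exp(beta + Z_k)), Z_k ~ N(0, sigma2) i.i.d.,
# sigma2 known; estimate beta by MCEM.
K, beta_true, sigma2 = 200, 0.7, 0.25
y = rng.poisson(np.exp(beta_true + rng.normal(0.0, np.sqrt(sigma2), K)))

def log_f(z, beta):
    """Complete log-likelihood in z and beta (up to constants)."""
    return y * (beta + z) - np.exp(beta + z) - z**2 / (2 * sigma2)

def s_step(beta, R, n_mcmc=200):
    """S step: R (correlated) draws from f(z | y; beta) via random-walk Metropolis."""
    z, draws = np.zeros(K), []
    for it in range(n_mcmc):
        prop = z + rng.normal(0.0, 0.5, K)
        accept = np.log(rng.random(K)) < log_f(prop, beta) - log_f(z, beta)
        z = np.where(accept, prop, z)
        if it >= n_mcmc - R:           # keep the last R states
            draws.append(z.copy())
    return np.array(draws)             # R x K

beta = 0.0
for s in range(1, 21):
    R_s = 10 + 5 * s                   # increasing sequence R_s
    Z = s_step(beta, R_s)              # S step
    # E + M steps: Q_s(beta) = (1/R_s) sum_r sum_k [y_k (beta + z) - exp(beta + z)]
    # is concave in beta and maximized in closed form:
    beta = np.log(Z.shape[0] * y.sum() / np.exp(Z).sum())
print(round(beta, 2))
```

With this seed the estimate settles near beta_true; in the authors' setting the M-step has no closed form and requires numerical maximization, but the loop structure is the same.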
When the matrix A is known, and the conditional distributions f_i(y; M_i(x)) are, for instance, Poisson, Gamma or Binomial, it is possible to see that the complete log-likelihood belongs to the curved exponential family, and by choosing an appropriate increasing sequence R_s the algorithm converges to the maximum likelihood estimate (Fort and Moulines, 2003). On the other hand, when the matrix A is unknown, the complete likelihood no longer belongs to the curved exponential family and theoretical convergence properties are not available. In this case, and assuming, for instance, f_i(y; M_i(x)) to be Poisson, Gamma or Binomial, it is possible to show that the complete log-likelihood to be maximized in the M-step of the MCEM algorithm is concave and so admits just one local maximum. Although this does not by itself guarantee the convergence of the algorithm to some local maximum, it allows a straightforward computational implementation of the M-step of the algorithm. Despite the lack of theoretical results on the sampling properties of the MCEM estimator, whether A is known or unknown, we show, through some extensive simulation studies, that the MCEM algorithm provides estimates with quite reasonable sampling distributions.
Acknowledgements We gratefully acknowledge funding from the Italian Ministry of Education,University and Research (MIUR) through PRIN 2008 project 2008MRFM2H.
References
1. Diggle, P.J., Moyeed, R.A., Tawn, J.A.: Model-based geostatistics (with discussion). Appl. Stat. 47, 299–350 (1998)
2. Fort, G., Moulines, E.: Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Stat. 31, 1220–1259 (2003)
3. Wei, G.C.G., Tanner, M.A.: A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. J. Am. Stat. Assoc. 85, 699–704 (1990)
4. Zhang, H.: Maximum-likelihood estimation for multivariate spatial linear coregionalization models. Environmetrics 18, 125–139 (2007)
5. Zhu, J., Eickhoff, J.C., Yan, P.: Generalized linear latent variable models for repeated measures of spatially correlated multivariate data. Biometrics 61, 674–683 (2005)
Simulation of random rotation matrices
John T. Kent and Asaad M. Ganeiber
Key words: matrix Fisher distribution, acceptance-rejection simulation, angular central Gaussian distribution, Bingham distribution
1 Introduction
Directional data analysis is concerned with statistical analysis on various non-Euclidean manifolds, starting with the circle and the sphere, and extending to related manifolds [6]. Directional distributions can be used as building blocks in more sophisticated statistical models which are studied using MCMC methods. For example, [2] used the matrix Fisher distribution in a Bayesian model to align two configurations of points in R³ in an unlabelled version of shape analysis, and they applied the model to a problem of protein alignment in bioinformatics. Hence there is a need to develop simulation methods for directional distributions which are efficient over a wide range of concentration parameters. In this paper we focus on the simulation of the matrix Fisher distribution on the space of 3 × 3 rotations, using a new acceptance-rejection method to simulate the Bingham distribution.
John T. Kent
University of Leeds, Leeds LS2 9JT, UK, e-mail: [email protected]

Asaad M. Ganeiber
University of Leeds, Leeds LS2 9JT, UK, e-mail: [email protected]
2 Directional distributions
The following table gives some of the common spaces associated with directional data analysis, together with the main distributions.

Space                      Notation   Distributions
circle                     S^1        von Mises, wrapped Cauchy
sphere                     S^p        Fisher (p = 2) and von Mises-Fisher (p ≥ 1), Fisher-Bingham
real projective space      RP^p       Bingham, angular central Gaussian
special orthogonal group   SO(q)      matrix Fisher
The sphere S^p = {x ∈ R^{p+1} : x^T x = 1} represents the space of "directions" in R^{p+1}. Real projective space consists of the "axes" or "unsigned directions" ±x. In some sense this space is half of a sphere; it can also be represented as the space of rank 1 projection matrices,

  RP^p = \{ P \in R^{(p+1) \times (p+1)} : P = P^T, \; P^2 = P, \; \mathrm{tr}(P) = 1 \}.   (1)

A rank one projection matrix can be written as P = x x^T, where x is a unit vector. The special orthogonal group of q × q rotation matrices is defined by

  SO(q) = \{ R \in R^{q \times q} : \det R = 1, \; R^T R = I_q \}.

On each of these spaces there is a unique uniform distribution which is invariant under rotations. Further, each of these spaces is naturally embedded in a Euclidean space. A natural "linear-exponential" family of distributions can be generated by letting the density (with respect to the uniform measure) be proportional to the exponential of a linear function of the Euclidean variables. This construction generates the first-named distribution in each row of the table above. It should be noted that the Bingham distribution, whose log density is linear in P = x x^T in (1), can also be viewed as a distribution on the sphere whose log density is quadratic in x.
3 The matrix Fisher distribution
The linear-exponential family on SO(p) is known as the matrix Fisher distribution, with density

f(X) = c_F exp{tr(F^T X)}, X ∈ SO(p),
with respect to the underlying invariant Haar measure. This density was introduced by [4]; it is unimodal about a fixed rotation matrix determined by the p × p parameter matrix F.
Now specialize to the case p = 3. A matrix X ∈ SO(3) can be written in the form X = H_23(φ) H_13(θ) H_12(ψ), where for 1 ≤ i < j ≤ 3, H_ij(θ) denotes a 3 × 3 matrix which looks like an identity matrix except for values cos θ in locations (i, i) and (j, j), and values sin θ and −sin θ in locations (i, j) and (j, i). Thus X is constructed as a product of three two-dimensional rotations about each of the coordinate axes in turn. The angles φ, θ, ψ are known as Euler angles. They lie in the ranges 0 ≤ φ, ψ < 2π and −π/2 ≤ θ ≤ π/2. In these coordinates the underlying Haar measure can be represented as
[dX] = cos θ dθ dφ dψ.
Note the presence of the cos θ term. This arises because small circles of constant latitude have a smaller circumference near the poles than near the equator.
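The Euler-angle construction described above can be sketched in a few lines of Python; `H` and `euler_rotation` are illustrative names, not from the paper:

```python
import math

def H(i, j, a):
    """3x3 plane rotation H_ij(a) (0-based indices, i < j): identity except
    cos(a) at (i,i) and (j,j), sin(a) at (i,j) and -sin(a) at (j,i)."""
    M = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    M[i][i] = M[j][j] = math.cos(a)
    M[i][j] = math.sin(a)
    M[j][i] = -math.sin(a)
    return M

def matmul(A, B):
    """Plain 3x3 matrix product."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def euler_rotation(phi, theta, psi):
    """X = H23(phi) H13(theta) H12(psi), in the text's 1-based plane labels."""
    return matmul(H(1, 2, phi), matmul(H(0, 2, theta), H(0, 1, psi)))
```

Since each factor is a plane rotation, the product is automatically orthogonal with determinant 1.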
Let F have the signed singular value decomposition
F = U ∆ V^T,
where U and V are 3 × 3 rotation matrices and ∆ = diag(δ_j) is diagonal with δ_1 ≥ δ_2 ≥ |δ_3|. It differs from the usual singular value decomposition by requiring U and V to be rotation matrices (rather than just orthogonal matrices) and by allowing the smallest singular value to be negative if necessary to compensate.
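The signed SVD can be obtained from an ordinary SVD by flipping signs so that both factors land in SO(3); this is a sketch assuming numpy is available, and `signed_svd` is a hypothetical helper name:

```python
import numpy as np

def signed_svd(F):
    """Signed SVD F = U diag(d) V^T with U, V in SO(3) and
    d[0] >= d[1] >= |d[2]|; d[2] may become negative to compensate."""
    U, d, Vt = np.linalg.svd(F)      # ordinary SVD: d[0] >= d[1] >= d[2] >= 0
    d = d.copy()
    # Force det(U) = +1, absorbing the sign change into the smallest value.
    if np.linalg.det(U) < 0:
        U[:, 2] *= -1
        d[2] *= -1
    # Same for V (stored as Vt = V^T).
    if np.linalg.det(Vt) < 0:
        Vt[2, :] *= -1
        d[2] *= -1
    return U, d, Vt.T
```

Each flip changes a column of U (or a row of V^T) together with d[2], so the product U diag(d) V^T is unchanged at every step.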
This distribution reduces to the uniform distribution if F = 0 and becomes more concentrated about its modal value X = UV^T as the overall concentration ||F|| = √tr(F^T F) increases. For theoretical purposes it suffices to limit attention to the diagonal case F = ∆. As the concentration increases, the distribution becomes concentrated near θ = φ = ψ = 0, and asymptotically f(X) becomes a trivariate normal density,
f(X) ∝ exp{−(1/2)[(δ_2 + δ_3)φ^2 + (δ_1 + δ_3)θ^2 + (δ_1 + δ_2)ψ^2]},
with respect to Lebesgue measure dθ dφ dψ.
4 Simulation
When developing simulation methods for directional distributions, there are several issues to consider:
• the need for good efficiency for a wide range of concentration parameters, from near uniform to highly concentrated. In similar problems on R^p, the task is simpler when distributions are closed under affine transformations; in such cases it is sufficient to consider just a single standardized form of the distribution.
• the need for a tractable envelope distribution.
• the presence of trigonometric factors in the base measure.
Efficient acceptance-rejection methods are available for the simpler directional distributions, most notably the Best-Fisher method [1] for the von Mises distribution. For the more complicated distributions, several MCMC algorithms have recently been proposed, e.g. [2], [5] and [3]. However, acceptance-rejection methods with reasonable acceptance probabilities are to be preferred when available. The following simulation method is based on a new acceptance-rejection algorithm for the Bingham distribution, and rests on the following observations.
• A classic result from differential geometry states that the space SO(3) can be identified with real projective space RP^3 under a one-to-one mapping, or equivalently with the unit sphere S^3 in R^4 under a one-to-two mapping. Each rotation matrix in SO(3) maps to two antipodal points on this unit sphere. This identification is limited to the case p = 3. There does not seem to be any useful analogue for SO(p), p > 3.
• The matrix Fisher distribution on SO(3) corresponds to the Bingham distribution on S^3.
• The PhD thesis of the second author gives a new method to simulate from the Bingham distribution on S^p for any p ≥ 1 using an acceptance-rejection algorithm with the angular central Gaussian distribution as an envelope.
• The angular central Gaussian distribution on S^p is very simple to simulate. Given a (p + 1) × (p + 1) covariance matrix Σ, simulate y ∼ N_{p+1}(0, Σ) and set z = y/||y||. Given the parameters of a Bingham distribution, it is possible to determine a choice of Σ to give a good envelope.
• The use of an angular central Gaussian envelope for the Bingham distribution is closely related to the use of a multivariate Cauchy density as an envelope for simulating a multivariate normal distribution.
• The acceptance ratio is typically about 45% for a wide range of parameters. This value is very reasonable for practical purposes.
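The envelope ingredient above can be sketched directly: the angular central Gaussian is simulated exactly as stated (y ∼ N(0, Σ), z = y/||y||). For simplicity this sketch passes Σ through a factor A with Σ = AA′ rather than computing a Cholesky decomposition; `rACG` is an illustrative name:

```python
import random

def rACG(A, rng):
    """One draw from the angular central Gaussian on the unit sphere:
    y = A g with g standard normal (so y ~ N(0, A A')), then z = y/||y||."""
    m = len(A)                                   # m = p + 1
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    y = [sum(A[r][c] * g[c] for c in range(m)) for r in range(m)]
    norm = sum(v * v for v in y) ** 0.5
    return [v / norm for v in y]
```

With A = I this gives the uniform distribution on the sphere; a non-spherical Σ concentrates mass along its dominant axes, which is what makes it usable as a rejection envelope for the Bingham distribution.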
References
1. Best, D. J. and Fisher, N. I. Efficient simulation of the von Mises distribution. J. Appl. Statist. 28, 152–157 (1979).
2. Green, P. J. and Mardia, K. V. Bayesian alignment using hierarchical models, with applications in protein bioinformatics. Biometrika 93, 235–254 (2006).
3. Hoff, P. D. Simulation of the matrix Bingham-von Mises-Fisher distribution with application to multivariate and relational data. Journal of Computational and Graphical Statistics 18, 438–456 (2009).
4. Khatri, C. G. and Mardia, K. V. The von Mises-Fisher matrix distribution in orientation statistics. J. Roy. Statist. Soc. B 39, 95–106 (1977).
5. Kume, A. and Walker, S. G. Sampling from compositional and directional distributions. Statistics and Computing 16, 261–265 (2006).
6. Mardia, K. V. and Jupp, P. E. Directional Statistics. Wiley, Chichester (2000).
Dynamic modelling of fuzzy sets for flexible data retrieval
Miroslav Hudec
Abstract Flexible querying allows users to use linguistic terms to better qualify the data they wish to obtain and the rules to reveal. The question is how to properly construct fuzzy sets for each linguistic term. This issue is considered from two aspects: the user's view of a particular linguistic term, and the current content of the database. Evidently, the user can obtain a picture of the stored data before running a query. This approach can also be used in situations when a non-commutative operator is required. Rule extraction by linguistic quantifiers is another task where this modelling of fuzzy sets can be applied. Institutions of official statistics deal with large amounts of surveyed data and potentially useful administrative data, which makes them interesting for this approach.
1 Introduction
The increasing use of computers by business and governmental agencies has created mountains of data that contain potentially valuable knowledge (Rasmussen and Yager, 1997). The same holds for agencies of official statistics. Firstly, databases could contain crisp values which are not always accurately surveyed. Secondly, data from administrative sources contain valuable information which should be examined.
Flexible querying allows users to use linguistic terms to better qualify the data they wish to obtain and the rules to reveal. For example, to find municipalities where migration is small and unemployment is high, or to find to what extent the rule most of the companies which report to Intrastat have a value of trade near the exemption threshold is true. The linguistic terms clearly suggest that there is a smooth transition between acceptable and unacceptable records.
1 Miroslav Hudec, Institute of Informatics and Statistics, Bratislava; email: [email protected]
Several fuzzy query implementations have been proposed, e.g. (Bosc and Pivert, 2000; Hudec, 2009; Kacprzyk and Zadrożny, 1995), and fuzzy queries for data mining (Rasmussen and Yager, 1997). In all approaches, the matching degree critically depends on the constructed membership functions (Hudec and Sudzina, 2012).
This paper examines the construction of fuzzy sets for flexible queries and its use in aggregation by fuzzy linguistic quantifiers and in situations when commutative operators are not appropriate.
2 Defining appropriate fuzzy sets for each linguistic term
Let Dmin and Dmax be the lowest and the highest domain values of the attribute A (database column), i.e. Dom(A) = [Dmin, Dmax], and let L and H be the lowest and the highest values in the current database content; that is, [L, H] ⊆ [Dmin, Dmax]. For many attributes in databases [L, H] ⊂ [Dmin, Dmax] holds; that is, the intervals [Dmin, L] and/or [H, Dmax] are non-empty. This fact should be considered in data retrieval and rule extraction. Theoretically, the domain of the attribute value of export is [exemption threshold value, ∞). The highest value of realized export is far from the "upper limit" of R+. In constructing the term high, we need to consider the stored real values.
Let the linguistic domain have the elements small, medium, high. The linguistic domain covers the crisp subdomain of an attribute in the way illustrated in Figure 1.
Figure 1: Linguistic and crisp domain
The first aspect allows users to freely define the parameters of the fuzzy sets (A, B, C, D). If the user is not familiar with the current database content, the query might easily end up with an empty answer. Moreover, the user is usually familiar with the values of Dmin and Dmax but not with the values of L and H.
The second aspect is focused on the construction of membership functions (A, B, C, D) directly from the current content of a database. The first method is the uniform domain covering method (Tudorie, 2008), depicted in Figure 1. At the beginning, the values of L and H are obtained from the current database content. The length of the fuzzy set core β and the slope α (Figure 1) are created in the following way (Tudorie, 2008):
α = (1/8)(H − L)   (1)

β = (1/4)(H − L)   (2)
Consequently, it is easy to calculate the required parameters A, B, C and D.
The uniform domain covering method is appropriate when the distribution of attribute values in the domain is more or less uniform. If this is not the case, the uniform domain coverage could lead to the conclusion that the meaning of the linguistic term is far from the real data. For these situations, the method can be improved by the statistical mean (Tudorie, 2008) or the logarithmic transformation (Hudec and Sudzina, 2012).
For the solution of the data retrieval task both aspects should be taken into account. The above-mentioned methods could be used to suggest the parameters of fuzzy sets. In a second step, users can modify these parameters before running a query, if they are not satisfied with the suggested ones (Hudec and Sudzina, 2012).
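The uniform domain covering construction can be sketched as follows, under the assumption that the slope is α = (H − L)/8 and the core length is β = (H − L)/4, so that three trapezoids tile [L, H] exactly; all function names are illustrative:

```python
def uniform_domain_covering(L, H):
    """Trapezoid parameters (A, B, C, D) for the terms small/medium/high
    on [L, H], assuming core beta = (H - L)/4 and slope alpha = (H - L)/8,
    so that 3*beta + 2*alpha = H - L."""
    beta = (H - L) / 4.0
    alpha = (H - L) / 8.0
    small = (L, L, L + beta, L + beta + alpha)
    medium = (L + beta, L + beta + alpha,
              L + 2 * beta + alpha, L + 2 * beta + 2 * alpha)
    high = (L + 2 * beta + alpha, L + 2 * beta + 2 * alpha, H, H)
    return small, medium, high

def trapezoid(x, A, B, C, D):
    """Membership degree of x in the trapezoid with support [A, D], core [B, C]."""
    if B <= x <= C:
        return 1.0
    if A < x < B:
        return (x - A) / (B - A)
    if C < x < D:
        return (D - x) / (D - C)
    return 0.0
```

Adjacent slopes coincide, so at every point of [L, H] the memberships of the overlapping terms sum to 1, which is the "covering" property the method aims at.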
3 Linguistic quantifiers
Linguistic quantifiers such as most or few play a special role among the aggregation operators. For example, to find out whether in the Intrastat database most businesses have a small value of intra-EU trade (are near the exemption threshold).
This problem is expressed in the form Qx(P(x)), where Q denotes a linguistic quantifier, X = {x} is a universe of discourse (the set of all companies) and P(x) is a predicate corresponding to a query condition. In the first step we need to construct the membership function for the term small value of trade. The uniform domain covering method, (1) and (2), is the best option, because the main goal is not to retrieve data but to reveal rules. The value of L (Figure 1) is the exemption value. The truth value of the statement is computed by the following equation (Zadrożny and Kacprzyk, 2009):
Truth(Qx(P(x))) = µQ( (1/n) ∑_{i=1}^{n} µP(x_i) )   (3)
where n is the cardinality of X and µQ (the quantifier most) might be given as:
µQ(y) = 1 for y ≥ 0.85;  µQ(y) = (y − 0.5)/0.35 for 0.5 < y < 0.85;  µQ(y) = 0 for y ≤ 0.5.
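The quantified truth computation of equation (3) can be sketched as follows, with the quantifier most assumed to be piecewise linear with breakpoints 0.5 and 0.85 (an assumption; only the breakpoints are taken from the text):

```python
def mu_most(y):
    """Assumed membership function of the relative quantifier 'most':
    0 below 0.5, 1 above 0.85, linear in between."""
    if y >= 0.85:
        return 1.0
    if y <= 0.5:
        return 0.0
    return (y - 0.5) / 0.35

def truth_of_quantified(memberships, mu_q=mu_most):
    """Truth of 'Q x (P(x))': mu_Q applied to the mean membership degree,
    as in equation (3)."""
    return mu_q(sum(memberships) / len(memberships))
```

For instance, if 9 of 10 companies fully satisfy the predicate, the mean degree is 0.9 and the statement "most companies satisfy P" is fully true.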
4 Non-commutative aggregation operator
T-norm functions are used for aggregation under uncertainty. Since, by axiom, all t-norm functions are commutative, they are applicable only if the order of the elementary conditions is irrelevant. There exists a class of problems where the elementary conditions are not independent, that is, the second elementary condition depends on the answers obtained from the first one. Obviously this requires a non-commutative operator. The among operator (Tudorie, 2008) meets this requirement:
µ_{P1 AMONG P2}(a1, a2) = min( µ_{P1/P2}(a1), µ_{P2}(a2) )   (4)
where a1 and a2 are database attributes, µ_{P2} is the membership function defining the fulfilment of the independent elementary condition, and µ_{P1/P2} is the fulfilment degree of the dependent elementary condition relative to the independent one.
An example of this query is: select companies which exported a small amount of goods (P1) among companies having a high value of trade (P2).
In the first step, companies with a high value of trade (vt) are selected. The membership function of the linguistic term high is calculated by one of the methods examined in Section 2 for the domain [Lvt, Hvt] from the current content of the database. The companies selected by P2 form a subset of all companies in the database. This subset constitutes a reduced subdomain [Lag-red, Hag-red] ⊆ [Lag, Hag] of amount of goods (ag). The fuzzy set small amount of goods is created on the subdomain [Lag-red, Hag-red]. Even if the user can define the parameters for the membership function µ_{P2} without a suggestion from the current database content, defining the membership function for µ_{P1/P2} is beyond his capabilities.
5 Conclusion
In this paper, we suggested a flexible SQL-like query approach for data retrieval and data mining. The problem of constructing membership functions for data retrieval and data mining tasks can be satisfactorily solved if we merge the user's opinion about linguistic terms with the current content of the database. This approach is also a supporting tool for queries where the elementary conditions are not independent, and for extracting rules by linguistic quantifiers.
In addition, this approach is open to further improvements, such as querying over missing values when users know the functional dependencies between attributes, and querying using priorities between elementary conditions.
The research reported herein was funded by the European Commission via the SeventhFramework Programme for Research (FP7/2007-2013) under Grant agreementn°244767. This work was supported by the Slovak Research and Development Agencyunder the contract No. DO7RP-0024-10.
References
1. Bosc, P., Pivert, O.: SQLf query functionality on top of a regular relational database management system. In: Pons, M., Vila, M.A., Kacprzyk, J. (eds.) Knowledge Management in Fuzzy Databases, pp. 171-190. Physica-Verlag, Heidelberg (2000).
2. Hudec, M.: An approach to fuzzy database querying, analysis and realisation. Comput. Sci. Inf. Syst. 6(2), 127-140 (2009).
3. Hudec, M., Sudzina, F.: Construction of fuzzy sets and applying aggregation operators for fuzzy queries. In: 14th International Conference on Enterprise Information Systems (ICEIS 2012), Wrocław (2012). Accepted for publication.
4. Kacprzyk, J., Zadrożny, S.: FQUERY for Access: Fuzzy querying for Windows-based DBMS. In: Bosc, P., Kacprzyk, J. (eds.) Fuzziness in Database Management Systems, pp. 415-433. Physica-Verlag, Heidelberg (1995).
5. Rasmussen, D., Yager, R.R.: Summary SQL - A Fuzzy Tool for Data Mining. Intell. Data Anal. 1, 49-58 (1997).
6. Tudorie, C.: Qualifying objects in classical relational database querying. In: Galindo, J. (ed.) Handbook of Research on Fuzzy Information Processing in Databases, pp. 218-245. IGI Global, London (2008).
7. Zadrożny, S., Kacprzyk, J.: Issues in the practical use of the OWA operators in fuzzy querying. J. Intell. Inf. Syst. 33, 307-325 (2009).
How text mining measures complex phenomena in official statistics
Come il text mining misura fenomeni complessi nelle statistiche ufficiali
Sergio Bolasco, Dipartimento Memotef, Università di Roma "La Sapienza"; Pasquale Pavone, Scuola Superiore S.Anna di Pisa, [email protected]
Abstract:
This work uses text mining tools to quantitatively measure characteristics of the daily activities described in the individual diaries of the Istat Time Use Survey (TUS). In particular, it studies phenomena concerning the location of the relational activities that can be traced back to "communicating with". The greater potential of an analysis conducted directly on natural-language information is due to the better "resolution" of the measurement, since the analysis of concepts is more flexible, precise and accurate than one based on coding. This improves the production of official statistics, including in traditional form, and opens new perspectives in the evaluation of complex phenomena such as those to be measured in time-use accounts.
Keywords: text mining, information extraction, ETL, linguistic resources
1 Introduction
The applications of textual statistics1 handling information expressed in natural
language (unstructured textual data) in the same way as classical structured (quantitative
and / or categorical) data have increased in recent years. The greatest potential for the
direct analysis of textual information depends on the better "resolution" of the
measurement, because analyses based on concepts are more flexible, precise and
accurate than those conducted through keywords or coding. This paper aims, through
lexical and textual analysis, to measure quantitatively the characteristics of the everyday
activities of individuals described in the diaries of the Istat Time Use Survey (TUS).
The survey aims to establish a free text daily diary to describe the activities carried out
during the day. For the first time in the survey of 2002-2003 (Romano, 2007), Istat has
agreed to acquire the full text of individual diaries, thereby providing an archive of great
importance, not only in size (over 50,000 diaries, equivalent to 16,000 pages of text) but
especially in content, clearing the way for numerous developments. The limits resulting
from the ambiguity of natural language are largely resolved at the start of the treatment,
by appropriate tools for this type of data2. Each application of the models and
1 Lebart et al. (1998); Aureli & Bolasco (2004), Dulli et al. (2004); www.jadt.org : online JADT Proceedings, 2000--2010.
2 There are several software packages for natural language processing and the automatic analysis of texts, which differ according to the type
of analysis to be conducted. In this study, considering the statistical purpose of the analysis, we used the TaLTaC2 software, which
techniques of text mining is characterized by strong multidisciplinary integration
involving statistics, computer science and linguistics in equal measure.
We will illustrate the procedure adopted to automatically extract information from the
non-structured text of the diaries, record them in a structured way (as a Boolean or
frequency) in a matrix of individual data and then cross the variables generated by the
textual analysis with the categorical characteristics of individuals in order to produce
official statistics. In particular, phenomena concerning the intensity of social interaction
– that can be related to the "who you are communicating with" – and the different
locations of this type of activity are regarded here. The study is conducted by
considering individuals as units of analysis, where the diary of a day is regarded as a
single context (see Bolasco et al. 2007).
2 Definition of the resource "place" and relational activities
The places of individual daily activities described in the diaries are captured through a
general model presented in our previous work (Bolasco and Pavone 2010). This model
allows us to identify a wide variety of adverbial locutions indicating place, based on the
linguistic structure of a prepositional syntagm, as follows:
PREPOSITION (ADJECTIVE) SUBSTANTIVE (ADJECTIVE)
where the adjectives are in brackets because their presence is optional and / or repeated.
For example, starting from the primary term "home", the model recognizes sequences
such as "at home", "my second home", "nella mia casa futura (in my future home)".
The whole syntagm may be repeated several times, with the adjectival function of the
first noun, for example: <on the seat | of the car>; <alla festa | di compleanno | di un
amico (at the Birthday Party | of a friend)> (Table 1).
Table 1 - Examples of expressions of place from the model
PREP POSS AGG SOST
a casa
davanti a casa
nella mia seconda casa
nella mia casa futura
a casa mia
a casa di mia madre
a casa del vicino
vicino (a) casa
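The prepositional-syntagm pattern above can be illustrated with a deliberately tiny regular expression; the real model uses 16 semantic dictionaries and 39 sequences in OR, while the dictionaries below are toy stand-ins:

```python
import re

# Tiny illustrative dictionaries, standing in for the semantic dictionaries.
PREP = r"(?:a|in|nella|davanti a|vicino a?)"
POSS = r"(?:mia|mio|sua|del|di)"
ADJ = r"(?:seconda|futura|nuova)"
NOUN = r"(?:casa|ufficio|parco)"

# PREPOSITION (POSS) (ADJECTIVE) NOUN (ADJECTIVE): bracketed slots optional.
PLACE = re.compile(
    rf"\b{PREP}(?:\s+{POSS})?(?:\s+{ADJ})?\s+{NOUN}(?:\s+{ADJ})?\b")

text = "sono nella mia seconda casa, poi vado a casa di mia madre"
matches = [m.group(0) for m in PLACE.finditer(text)]
```

The sketch recognizes "nella mia seconda casa" and "a casa"; chaining a second syntagm ("di mia madre") onto the first, as the full model does, would require repeating the pattern.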
The model, based on a hybrid system consisting of rules and dictionaries, is built in two stages: an initial exploratory training phase, used to develop the basic components of the model, and a second application stage to detect their actualization in the corpus of the TUS. The application of the model is divided into: i) the launch of a query,
consisting of a single regular expression composed of 39 sequences in OR, for a total
of over 150 relations (rules) between the concepts expressed by 16 semantic dictionaries
able to extract locutions, ii) the evaluation of the entities found, iii) the calculation of
the occurrences of each term, for a total number of occurrences (redundant) equal to
stands for Automatic Treatment for Lexical and Textual Analysis of the Contents of a Corpus, developed from research at the University of Rome "La Sapienza" (Bolasco 2010; www.taltac.it).
22% of the entire corpus. These sequences, as space-time modifiers, were divided ex-post into sub-thematic classes, distinguishing activities "at home" (one's own, with relatives, friends or others) from activities "away" related to movement (walking,
cycling, on public transport, ...) or activities related to roles-places (at the hairdressers,
newsagents, ...) or linked to different environments / sites (in the office, at the bank, in a
shop, among the market stalls, ...).
Relational activities are identified by studying a sequence of two "components", interlaced by the keyword <con> (in some cases <a>). In particular, the first component is of the verbal type, limited to verbs expressing communication: "talk / communicate with" and "call / tell". These concepts have been captured even when expressed in similar terms (phone call, phone) or in a compound verb phrase: "make (a) p." or "be (on) the p.". The second component is the "who", i.e. the actor who is addressed by the speaker. Several previously defined classes of actors (parents, spouses, children,
grandchildren, grandparents, friends etc.) are used to reconstruct the sequence, even
with more complex expressions such as: <parlo di politica con mia moglie (I’m talking
about politics with my wife)>. For a list of verbs and actors considered, see Bolasco et
al. (2007).
3 Measuring the characteristic places of relational activities
After having defined the entities and their concepts and created thematic dictionaries, the
search in the text was based on the construction of complex queries, using regular
expressions, in order to identify the sequences in the diaries that realise these activities
in relation to different categories of actors (relatives / friends) in conjunction with the
different classes of place identified by the model as described above. In particular, in
our case the set of queries takes the following form:
"CATSEM(Verb) LAG3 CATSEM(Prep) LAG4 CATSEM(Actor#) LAG8 WH LAG3 CATSEM(Place#) LAG2 |"
where CATSEM denotes the classes of: i) verbs of "communicating", ii) actors
(“relatives / friends"), iii) prepositions “con/a/tra/in (with / to / between / in)”, iv) places
("own home / home of other people / other places / means of transport"). The LAG #
expresses the maximum number of words in the interval between two operands of the
expression and the token <|> denotes the end of the sentence. Some examples of the
sequences extracted are shown in Table 2.
Table 2 – Some examples of sequences
raccontato a mia moglie cosa ho fatto oggi WH a casa mia |
litigo con mia sorella WH a letto a casa |
parlo con mio marito WH a casa di amici |
giocato a calcio con mio fratello e i nostri amici WH parco |
parlavo con i miei familiari con l' autoradio accesa WH in macchina |
chiacchierato con gli amici § ho ascoltato la radio WH a casa mia |
gioco con un amichetto § WH a casa della nonna |
chiacchierato con amici e parenti aspettando gli sposi § WH al ristorante |
chiacchiero con degli amici e ascolto musica WH in corriera |
Each query captures an instance, whenever the sequence is present in the diary. The
result of the query produces a new variable that measures the presence / absence (or
frequency) of the entity for each individual. This new structured information can be
placed in connection with the individual a priori information, such as structural
variables (age, sex, marital status, education level) to produce traditional statistics.
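The step from matched sequences to structured variables crossed with categorical characteristics can be sketched as follows; the diaries and verb markers below are toy stand-ins for the TaLTaC2 queries described in the text:

```python
from collections import Counter

# Toy diaries with a priori structural variables -- illustrative only.
diaries = [
    {"gender": "M", "age": "25-44", "text": "parlo con mio marito a casa"},
    {"gender": "F", "age": "25-44", "text": "chiacchiero con degli amici in corriera"},
    {"gender": "F", "age": "65+", "text": "ho letto un libro"},
]

def has_sequence(text, verb_markers=("parlo", "chiacchiero", "racconto")):
    """Boolean indicator: 1 if the diary contains a 'communicating' verb."""
    return int(any(v in text for v in verb_markers))

# The new structured variable, crossed with gender and age group.
table = Counter()
for d in diaries:
    if has_sequence(d["text"]):
        table[(d["gender"], d["age"])] += 1
```

Aggregating the counter by the categorical variables yields exactly the kind of traditional cross-tabulation shown in Table 3.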
By applying this model to a sub-sample (10,000 units) of the Istat survey, we obtain a
statistic of the type shown in Table 3, corresponding to 18,628 sentences.
Table 3 – Sentences concerning relational activities with relatives/friends of the sub-sample by gender, age groups and type of place (percentage values)
References
Aureli E., Bolasco S. (a cura di) (2004) Applicazioni di analisi statistica di dati testuali
Casa Editrice Università "La Sapienza", Roma.
Bolasco S. (2010). Taltac2.10 Sviluppi, esperienze ed elementi essenziali di analisi
automatica dei testi, LED, Milano.
Bolasco S., Canzonetti A., Capo F. M. (2005) Text mining: uno strumento strategico
per imprese e istituzioni, CISU, Roma.
Bolasco S., D’Avino E., Pavone P. (2007) Analisi dei diari giornalieri con strumenti di
statistica testuale e text mining, in: I tempi della vita quotidiana. Un approccio
multidisciplinare all'analisi dell'uso del tempo, Romano, M. C. (ed.), ISTAT,
Roma, 309-340.
Bolasco S., Pavone P. (2010) Automatic Dictionary and Rule-Based Systems for
Extracting Information from Text, in: Data Analysis and Classification
Proceedings of the 6th Conference of the Classification and Data Analysis Group
of the Società Italiana di Statistica, Palumbo, F. , Lauro, C. N. , Greenacre, M.
(Eds.), Springer, Berlin-Heidelberg, 189-198.
Dulli S., Polpettini P., Trotta M. (2004) Text mining: teoria e applicazioni, Franco
Angeli, Milano.
Lebart L., Salem A., Berry L. (1998) Exploring textual data, Kluwer Academic Publ.,
Dordrecht.
Romano M. C. (ed.) (2007) L'uso del tempo - Indagine multiscopo sulle famiglie "Uso
del tempo" - Anni 2002-2003, Collana: Informazioni, n. 2, ISTAT, Roma.
Relation activities                    |      Men (age groups)           |     Women (age groups)          | Total
                                       | 14-24 25-44 45-64  65+   Total  | 14-24 25-44 45-64  65+   Total  | place
with relatives at own home             |  2.4   9.0   7.6   4.5   23.5   |  2.9  14.7  10.9   6.2   34.7   |  58.2
with relatives at home of other people |  0.1   0.2   0.1   0.1    0.4   |  0.1   0.2   0.1   0.1    0.4   |   0.8
with relatives in other places         |  0.3   1.8   1.5   0.5    4.2   |  0.5   2.4   1.6   0.7    5.2   |   9.4
with relatives on a means of transport |  0.3   1.6   1.4   0.5    3.8   |  0.5   2.4   1.6   0.7    5.2   |   9.0
with friends at own home               |  0.3   0.5   0.2   0.1    1.1   |  0.5   0.8   0.5   0.4    2.1   |   3.3
with friends at home of other people   |  0.0   0.3   0.1   0.0    0.5   |  0.1   0.3   0.1   0.1    0.6   |   1.0
with friends in other places           |  2.2   3.0   1.7   1.1    8.0   |  1.8   1.7   0.6   0.3    4.5   |  12.5
with friends on a means of transport   |  1.1   1.2   0.5   0.4    3.2   |  1.2   1.0   0.2   0.2    2.7   |   5.9
Total gender by age                    |  6.7  17.6  13.2   7.2   44.7   |  7.6  23.4  15.6   8.7   55.3   | 100.0
Robust estimation for multivariate data under the independent contamination model
C. Agostinelli, R.A. Maronna, and V.J. Yohai
Abstract We introduce a new class of robust procedures for estimating the mean vector and covariance matrix of multivariate normal observations when outliers are generated according to an independent contamination model. These estimators, named composite likelihood M-estimates (CLM-estimates), are related to composite likelihood methods.
Key words: Multivariate scatter, independent contamination model, M-estimators
1 Composite Likelihood M-estimates
In Alqallaf et al (2009) a new contamination model, called the independent contamination model, is introduced. In this model each component of a multivariate observation has a probability ε of being replaced by an outlier. Then, even if ε is small, the fraction of observations with at least one contaminated component tends to one as the dimension of the data p increases. Alqallaf et al (2009) showed that for this type of contamination the breakdown point of the usual affine equivariant robust methods for estimating multivariate location tends to 0 as p increases. A similar result can be proved for affine equivariant robust estimates of the scatter matrix.
Scatter estimates which are robust under the independent contamination model can be obtained using separate robust estimates of the covariances for each pair of variables. A shortcoming of this approach is that the resulting covariance matrix may not be positive definite. This is especially true in the presence of outliers.

C. Agostinelli, Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, e-mail: [email protected]

R.A. Maronna, Department of Mathematics, University of La Plata, Argentina

V.J. Yohai, Department of Mathematics, University of Buenos Aires, Argentina
In this talk we will present a new class of robust procedures for estimating the covariance matrix which are related to the composite likelihood methods introduced by Lindsay (1988). The maximum composite likelihood estimates were introduced as an alternative procedure for situations where the maximum likelihood estimates become too complicated.
Suppose we have a sample x_1, ..., x_n of p-dimensional vectors and we want to estimate the location vector µ and the scatter matrix Σ. Let ρ be a non-decreasing function such that tρ′(t) is non-decreasing and bounded. A monotone M-estimate minimizes

L(µ, Σ) = ∑_{i=1}^{n} M(x_i, µ, Σ),   (1)

where

M(x, µ, Σ) = c det(Σ) + ρ( (x − µ)′ Σ^{−1} (x − µ) ),

where c is a positive constant. The estimates we propose minimize (1) but with M(x, µ, Σ) replaced by
M(x, µ, Σ) = ∑_{j=1}^{p−1} ∑_{k=j+1}^{p} [ d det(Σ_jk) + ρ( (x_jk − µ_jk)′ Σ_jk^{−1} (x_jk − µ_jk) ) ],
where Σ_jk is the 2 × 2 submatrix of Σ corresponding to rows and columns j and k, and x_jk and µ_jk are the two-dimensional vectors formed with components j and k of the vectors x and µ respectively. Finally, d is a tuning constant chosen so that the estimate of Σ is Fisher consistent in the case that the observations are multivariate normal. We call these estimators composite likelihood M-estimates (CLM-estimates).
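The pairwise structure of the CLM objective can be sketched as follows; ρ(t) = log(1 + t) is used here as one function satisfying the stated conditions (tρ′(t) = t/(1 + t) is non-decreasing and bounded) — an illustrative choice, not the authors' implementation:

```python
import math

def clm_objective(X, mu, Sigma, d=1.0, rho=lambda t: math.log1p(t)):
    """Composite likelihood M objective: for each observation, sum over all
    2x2 marginal blocks (j, k) the term d*det(Sigma_jk) plus rho of the
    pairwise Mahalanobis distance.  d and rho are illustrative defaults."""
    p = len(X[0])
    total = 0.0
    for x in X:
        for j in range(p - 1):
            for k in range(j + 1, p):
                a, b, c = Sigma[j][j], Sigma[j][k], Sigma[k][k]
                det = a * c - b * b
                u, v = x[j] - mu[j], x[k] - mu[k]
                # (u, v)' Sigma_jk^{-1} (u, v), written out via the 2x2 inverse
                maha = (c * u * u - 2.0 * b * u * v + a * v * v) / det
                total += d * det + rho(maha)
    return total
```

Because only 2 × 2 blocks enter the objective, an outlier in a single component can contaminate only the p − 1 pairwise terms involving that component, which is the intuition behind robustness under independent contamination.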
CLM-estimates can be extended to the case where µ depends on a vector of regressors z and a vector of parameters β, i.e., µ = µ(z, β) and Σ = σ²D, where D is a known correlation matrix. This setup covers linear models. CLM-estimates are defined in this case as
(β̂, σ̂²) = argmin_{β, σ²} L(β, σ²),

where

L(β, σ²) = ∑_{i=1}^{n} M(x_i, z_i, β, σ²),
M(x, z, β, σ²) = ∑_{j=1}^{p−1} ∑_{k=j+1}^{p} [ d σ⁴ det(D_jk) + ρ( σ^{−2} (x_jk − µ_jk(z, β))′ D_jk^{−1} (x_jk − µ_jk(z, β)) ) ].
References
Alqallaf F, Aelst SV, Zamar R, Yohai V (2009) Propagation of outliers in multivariate data. The Annals of Statistics 37(1):311–331, DOI 10.1214/07-AOS588
Lindsay B (1988) Composite likelihood methods. Contemporary Mathematics 80(1):221–239
A comparison of different procedures for combining high-dimensional multivariate volatility forecasts
Alessandra Amendola and Giuseppe Storti
Abstract The aim of this paper is to investigate the effect of model uncertainty on multivariate volatility prediction. This effect is expected to be particularly relevant in applications to vast dimensional datasets since it is well known that, in this case, the need for tractable model structures requires the imposition of severe and often untested constraints on the volatility dynamics. By means of an application to the optimization of a vast dimensional portfolio of stock returns, the paper compares the performances of different models and combination procedures. The main finding is that the results are highly sensitive not only to the choice of the model but also to the specific combination procedure being used.
Key words: multivariate volatility, forecast combination, weights estimation
1 Introduction
In multivariate volatility prediction, model uncertainty is a relevant problem to be faced by researchers and practitioners. The risk of model misspecification is particularly sizeable in large dimensional problems where highly restrictive assumptions on the volatility dynamics are usually required (see e.g. Pesaran, Schleicher & Zaffaroni, 2009). In order to reduce the impact of misspecification at the forecasting stage, a typical approach is to consider the combination of forecasts from different competing models. Although some recent papers have focused on the evaluation of the forecast accuracy of MGARCH models (Patton & Sheppard, 2008; Laurent,
Alessandra Amendola, Department of Economics and Statistics, University of Salerno, Via Ponte don Melillo, 84084 Fisciano, Salerno (Italy), e-mail: [email protected]
Giuseppe Storti, Department of Economics and Statistics, University of Salerno, Via Ponte don Melillo, 84084 Fisciano, Salerno (Italy), e-mail: [email protected]
Rombouts & Violante, 2011), less attention has been paid to the combination of volatility forecasts from different models as a strategy for improving predictive accuracy (Amendola & Storti, 2008). Also, it has to be considered that, in theory, different combination strategies could be implemented but, for a given application, only one must be chosen. A combination strategy is defined by the identification of two different elements: a combination rule, which is a function of the alternative forecasts available, and an estimator of the weights assigned to each model. As a consequence of these choices, an additional source of uncertainty, related to the choice of the combination strategy, is introduced into the analysis.
The aim of this work is to discuss some alternative forecast combination strategies for (possibly high-dimensional) multivariate volatility forecasts and to compare their empirical performances. Section 2 introduces the reference model used for the analysis, while some alternative estimators of the combination weights are discussed in Section 3. The statistical properties of the estimators have been assessed by a Monte Carlo simulation whose results are not presented here but are available upon request. Section 4 concludes by illustrating the results of an application to the optimization of a portfolio of stock returns.
2 The reference model
The data generating process is assumed to be given by
r_t = S_t z_t,   t = 1, …, T, T+1, …, T+N,

where T is the end of the in-sample period, z_t ~ iid(0, I_k), and S_t is any (k × k) positive definite (p.d.) matrix such that S_t S_t' = H_t = Var(r_t | I_{t-1}), with H_t = C(H_{1,t}, …, H_{n,t}; w) and H_{j,t} a symmetric p.d. (k × k) matrix. In practice H_{j,t} is a conditional covariance matrix forecast by a given 'candidate model'. The function C(.) is an appropriately chosen combination function and w is a vector of combination parameters. The weights assigned to each candidate model depend on the values of the elements of w but do not necessarily coincide with them. Different combination functions C(.) can in principle be used and there is no a priori valid procedure for selecting the optimal function. Among all the possible choices of C(.), the most common is the linear combination function
H_t = w_1 H_{1,t} + … + w_n H_{n,t},   w_j ≥ 0,
where w coincides with the vector of combination weights. The assumption of non-negative weights is required in order to guarantee the positive definiteness of H_t, but can be too restrictive. Alternatively, in order to get rid of the positivity constraint on the w_j, two different combination functions can be selected: the exponential and the square root combination functions. The exponential combination is defined as
H_t = Expm[w_1 Logm(H_{1,t}) + … + w_n Logm(H_{n,t})]
where Expm(.) and Logm(.) indicate the matrix exponential and logarithm, respectively. Differently from the other two functions, the square root combination (for S_t) is not directly performed on the H_{j,t} but on the S_{j,t}:

S_t = w_1 S_{1,t} + … + w_n S_{n,t}

with H_t = S_t S_t' and H_{j,t} = S_{j,t} S_{j,t}'.
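As an illustration, the three combination functions above can be sketched in a few lines of Python (a sketch of ours using NumPy/SciPy, not code from the paper; the function names are our own):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def linear_combination(H_list, w):
    """H_t = w_1 H_{1,t} + ... + w_n H_{n,t}; w_j >= 0 preserves positive definiteness."""
    return sum(wj * Hj for wj, Hj in zip(w, H_list))

def exponential_combination(H_list, w):
    """H_t = Expm(w_1 Logm(H_{1,t}) + ... + w_n Logm(H_{n,t})); no sign constraint on w_j."""
    M = sum(wj * logm(Hj) for wj, Hj in zip(w, H_list))
    return expm(M).real  # p.d. by construction, for any real weights

def sqrt_combination(H_list, w):
    """S_t = w_1 S_{1,t} + ... + w_n S_{n,t} with H_{j,t} = S_{j,t} S_{j,t}'; returns H_t = S_t S_t'."""
    S = sum(wj * sqrtm(Hj).real for wj, Hj in zip(w, H_list))
    return S @ S.T
```

Here `sqrtm` picks the symmetric p.d. square root, which is only one of the admissible choices of S_{j,t}; any S_{j,t} with S_{j,t} S_{j,t}' = H_{j,t} fits the definition in the text.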
3 Weights estimators
For the estimation of the combination parameters we consider three different estimation approaches: Composite Quasi ML (CQML), Composite GMM (CGMM) and 'Pooled' Mincer-Zarnowitz (MZ) regressions. All the estimators considered share the following features: i) they do not imply any assumption on the conditional distribution of returns; ii) they can be applied to large dimensional problems. In the CQML method the estimated w_i are obtained by performing the following optimization:
ŵ = argmax_w Σ_{i≠j} L(r^{(ij)} | w, I_N),

where r^{(ij)}_t = (r_{i,t}, r_{j,t})', w = (w_1, …, w_n)' and

L(r^{(ij)} | w, I_N) = −0.5 Σ_{h=1}^{N} log(|H^{(ij)}_{T+h}|) − 0.5 Σ_{h=1}^{N} (r^{(ij)}_{T+h})' (H^{(ij)}_{T+h})^{-1} r^{(ij)}_{T+h}

is the (bivariate) quasi log-likelihood for the pair of assets (i, j) computed over the prediction period [T+1, T+N].
The CGMM estimator extends the same framework to a GMM setting. The w_i are obtained by performing the following optimization:
ŵ = argmin_w Σ_{i≠j} m(r^{(i,j)}; w)' Ω^{(i,j)}_N m(r^{(i,j)}; w),

where r^{(i,j)}_t = (r_{i,t}, r_{j,t})' for t = T+1, …, T+N,

m(r^{(i,j)}; w) = (1/N) Σ_{t=T+1}^{T+N} μ(r^{(i,j)}_t; w),

μ(r^{(i,j)}_t; w) is a (p × 1) vector of moment conditions and Ω^{(i,j)}_N is a consistent p.d. estimator of

Ω^{(i,j)} = lim_{N→∞} N E(m(r^{(i,j)}; w*) m(r^{(i,j)}; w*)'),

with w* being the solution to the moment conditions, i.e. E(m(r^{(i,j)}; w*)) = 0.
Finally, in the 'Pooled' MZ regressions the w_i are the OLS estimates of the parameters of the pooled regression model
vech(Σ̃_{T+h}) = w_1 vech(H̃_{1,T+h}) + … + w_n vech(H̃_{n,T+h}) + e_{T+h}

for h = 1, …, N, where, depending on the type of combination chosen, Σ̃_t and H̃_{i,t} are appropriate transformations of Σ_t and H_{i,t}, with Σ_t = r_t r_t'.
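To make the pooled MZ estimator concrete, the following is a minimal sketch of ours (not the authors' code), assuming a linear combination so that Σ̃_t = Σ_t and H̃_{i,t} = H_{i,t}: stack vech(r_t r_t') over the prediction period as the dependent variable, the vech of each candidate forecast as a regressor column, and run OLS.

```python
import numpy as np

def vech(M):
    """Half-vectorization: stack the lower-triangular part of a symmetric matrix."""
    i, j = np.tril_indices(M.shape[0])
    return M[i, j]

def pooled_mz_weights(Sigma_list, H_cand):
    """OLS weights of vech(Sigma_{T+h}) on vech(H_{i,T+h}), pooled over h = 1..N.

    Sigma_list: length-N list of (k x k) outer products r_t r_t'
    H_cand:     length-n list, each a length-N list of (k x k) candidate forecasts
    """
    y = np.concatenate([vech(S) for S in Sigma_list])
    X = np.column_stack(
        [np.concatenate([vech(H) for H in Hi]) for Hi in H_cand]
    )
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```

In a noise-free check where Σ_t is an exact linear combination of the candidate forecasts, OLS recovers the combination weights exactly.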
4 An application to stock returns
We consider an application to the optimization of a portfolio of stocks using data from Chiriac and Voev (2011). The data refer to 2156 open-to-close daily returns on 6 NYSE stocks from 3/1/2000 to 30/7/2008. Six different candidate models and five combination strategies are considered. For each of these we compute the associated minimum variance portfolio and compare the empirical volatilities of the optimized portfolios (Table 1). The CGMM gives the lowest variance, but the results appear to be very sensitive to the choice of the model or combination strategy used.
Model       Portfolio Variance*    Comb. strategy   Portfolio Variance*
DCC         2.33188                REG(rv)          2.08441
CC          2.37658                REG              2.08733
ES          2.33857                CGMM             2.07337
MCOV(22)    2.67185                CQML             2.10192
MCOV(100)   2.10778                EW               2.08147
VECH        2.09339

Table 1 Realized portfolio variances ((*): ×10^4) for different models: constant conditional correlation (CC), dynamic conditional correlation (DCC), exponential smoothing (ES), k-days moving covariance (MCOV(k)); and weights estimators: CGMM, CQML, equally weighted (EW), MZ regression (REG), MZ regression using realized covariance as dependent variable (REG(rv)). In all cases a linear combination function is used.
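The minimum variance portfolio used in the comparison solves min_u u' H u subject to u'1 = 1, whose well-known closed form is u = H^{-1}1 / (1' H^{-1} 1). A sketch of ours (not the authors' code; u denotes portfolio weights, to avoid confusion with the combination weights w):

```python
import numpy as np

def min_variance_weights(H):
    """Global minimum variance portfolio weights for a covariance forecast H."""
    ones = np.ones(H.shape[0])
    x = np.linalg.solve(H, ones)   # H^{-1} 1 without forming the inverse
    return x / (ones @ x)          # normalize so the weights sum to one

def realized_portfolio_variance(u, Sigma):
    """Ex-post variance u' Sigma u of the optimized portfolio."""
    return float(u @ Sigma @ u)
```

For a diagonal H the resulting weights are proportional to the inverse variances of the individual assets.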
References
1. Amendola, A., Storti, G.: A GMM procedure for combining volatility forecasts. Computational Statistics & Data Analysis, 52(6), 3047–3060 (2008)
2. Patton, A., Sheppard, K.: Evaluating volatility and correlation forecasts. Oxford Financial Research Centre, OFRC Working Papers Series (2008)
3. Laurent, S., Rombouts, J.V.K., Violante, F.: On the forecasting accuracy of multivariate GARCH models. Forthcoming in Journal of Applied Econometrics (2011)
4. Chiriac, R., Voev, V.: Modelling and forecasting multivariate realized volatility. Journal of Applied Econometrics, 26(6), 922–947 (2011)
5. Pesaran, M.H., Schleicher, C., Zaffaroni, P.: Model averaging in risk management with an application to futures markets. Journal of Empirical Finance, 16(2), 280–305 (2009)