46TH SCIENTIFIC MEETING OF THE ITALIAN
STATISTICAL SOCIETY
Sapienza University of Rome - Faculty of Economics
Via del Castro Laurenziano, 9
Roma, IT
June 20, 2012 – June 22, 2012
CALL FOR PAPERS
SPRINGER BOOK: Studies in Theoretical and Applied Statistics
A volume in the new international series Studies in Theoretical and Applied Statistics will be published
by Springer under the supervision of the Italian Statistical Society. The book will be edited by Prof. G.
Alleva (Sapienza University of Rome), Prof. A. Giommi (University of Florence) and Prof. C.D.
Paulino (University of Lisbon).
All those who presented a paper/poster at the SIS2012 meeting are invited to submit an extended version
of their presentation for possible publication. The submission deadline is October 31st. Papers are sent
to reviewers immediately after submission. Early submissions are encouraged. Submissions will
undergo a blind reviewing process and only the selected papers will be included in the book.
Papers must be written in English according to the same standards used for submission to the conference
(http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/overview)
and must not exceed 10 pages. Use of the LaTeX standards is strongly encouraged.
Papers can be submitted on-line through the conference web site by selecting the "Springer Book" track. To
submit a paper, log in with the same username and password you used to submit your presentation at the
conference. Please submit PDF files only. The Word or LaTeX source will be requested from the authors
after acceptance of the paper.
Every two years the Italian Statistical Society (SIS) promotes an international scientific meeting that
welcomes both methodological and applied statistical research papers. Founded in 1939, the Italian
Statistical Society is a non-profit scientific society whose aim is to promote the development of the
statistical sciences and their applications. It publishes an international journal (Statistical Methods and
Applications).
Today there are about one thousand members including academics and researchers from governmental
and private organisations, active in statistical methodology, probability, economic statistics and
demography.
The 46th Scientific Meeting of the Italian Statistical Society will take place in Rome, in the period June
20, 2012 – June 22, 2012.
The conference will include 4 plenary sessions, 17 specialized sessions, 22 solicited sessions, and a
number of contributed sessions. The specialized sessions include 6 sessions, organized by the Royal
Statistical Society, the Société Française de Statistique, the Sociedad de Estadística e Investigación
Operativa, the Sociedade Portuguesa de Estatística and the Deutsche Statistische Gesellschaft. See the
conference overview for a list of session titles.
SOCIAL DINNER
The social dinner will be held on Thursday evening, 21st June, at the Cloister of S. Pietro in Vincoli,
Faculty of Engineering, Sapienza University of Rome, Via Eudossiana, 18.
Before dinner, there will be a short guided tour of the Basilica of S. Pietro in Vincoli which houses
Michelangelo's famous Moses statue. After dinner, the orchestra MUSA-Sapienza will perform a
concert of classical music.
The cost of the social dinner is €35 for registered participants and €50 for non-registered participants
and accompanying persons. The visit to the Basilica and the concert are free.
For organizational reasons, please confirm your participation in the dinner by sending an e-mail to the
Local Organizing Committee ([email protected]) at your convenience, but preferably no later than
18th June 2012.
OVERVIEW
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/overview
Conference Venue
The 46th SIS Scientific Meeting will take place at Sapienza University of Rome, Faculty of
Economics, Via del Castro Laurenziano, 9.
Session Arrangements
The Conference will include plenary, specialized, solicited, contributed and poster sessions. There will
also be invited sessions from representatives of some European statistical societies.
Contributed papers must be submitted according to the directions below, before February 15, 2012: the Program
Committee will take care of the review process of these papers. Invited talks (to be included in
specialized and solicited sessions) must be submitted to the organizer (chair) of the session, who will
take care of the review process.
Contributed, solicited and specialized papers must be written in English, according to the conference
style standards (see the author guidelines below). Contributed and solicited papers cannot exceed 4
pages (including tables, figures and references). Specialized papers cannot exceed 8 pages.
Organization
The 46th Scientific Meeting of the Italian Statistical Society will be organized by the Department of
Methods and Models for Economics, Territory and Finance (MEMOTEF) and Department of Statistical
Sciences (DSS) of Sapienza University of Rome.
Organizing Team
The organizers for this international conference are:
SCIENTIFIC PROGRAM COMMITTEE
----------------------------------------------
Andrea Giommi (Chairman)
Giorgio Alleva
Silvia Biffignandi
Francesco Billari
Vincenza Capursi
Giuliana Coccia
Damiana Costanzo
Corrado Crocetta
Gustavo De Santis
Luigi Fabbris
Fabrizia Mealli
Andrea Pastore
Cira Perna
Marilena Pillati
Roberto Rocci
Roberta Siciliano
Nicola Torelli
Roberto Zelli
LOCAL ORGANIZING COMMITTEE
----------------------------------------------
Giorgio Alleva (Chairman)
Maria Maddalena Barbieri
Maria Caterina Bramati
Alessandra De Rose
Augusto Frascatani
Daniele Frongia
Stefano Antonio Gattone
Cristina Giudici
Giuseppina Guagnano
Francesco Lagona
Brunero Liseo
Vincenzo Lo Moro
Guido Pellegrini
Maria Grazia Pittau
Francesco Maria Sanna
Isabella Santini
Maria Rita Sebastiani
Riccardo Sucapane
Andrea Tancredi
Donatella Vicari
Sapienza University of Rome
Sapienza University of Rome was founded in 1303 by Pope Boniface VIII. It is the first university in
Rome and the largest university in Europe: a city within a city, with over 700 years of history, 145,000
students, over 4,500 professors and almost 5,000 administrative and technical staff.
Sapienza's governance includes an internal body: a Vice Rector and a group of Deputy Rectors, each
charged with specific activities, who support the Rector in the management of the University, also with
the cooperation of ad hoc committees.
Sapienza has a broad academic offering which includes over 250 degree programmes and 200 one- or
two-year professional courses. Sapienza has 59 libraries and 21 museums, as well as efficient student
services such as Ciao (information, welcoming and counselling centre), SoRT (counselling and
tutorship services) and assistance for disabled students.
As for students' origins, over 30,000 come from all parts of Italy and over 7,000 come
from abroad. Incoming and outgoing Erasmus students number about 1,000 per year. Sapienza
is implementing ICT services for students, such as online enrolment, a University e-mail address and
wireless hotspots around campus.
Sapienza plans and carries out important scientific investigations in almost all disciplines, achieving
high-standard results at both the national and the international level, thanks to the work of its 11
faculties, 66 departments and several centres devoted to scientific research. There are also more than
100 PhD programmes which cover almost all major fields of knowledge.
The first university in Rome is proud to have had many famous scholars among its students, such as
the poet Giuseppe Ungaretti, and to be considered an institution of capital importance in the field of
archaeological excavations, having achieved significant results in Libya, Syria, Turkey and on the
Palatine Hill in Rome. In physics, members of the so-called 'Via Panisperna' group – including the
scientists Enrico Fermi, Edoardo Amaldi and Emilio Segrè – made a crucial contribution to the
discipline and left an important legacy in subjects such as quantum physics, the physics of disordered
systems and astrophysics.
Sapienza also enhances research by offering opportunities to international scholars. Thanks to
a special programme for visiting professors, many foreign researchers and professors periodically come
to Sapienza, consolidating the quality of its education and research programmes.
Professor Luigi Frati has been the Rector of Sapienza University since November 2008. He has launched
a major innovation process which envisages full tax exemption as a reward for outstanding students, the
elimination of useless structures and the reorganisation of faculties.
Sapienza University of Rome is a public, autonomous and free university, involved in the development
of society through research, higher education and international cooperation.
Editorial standards
Papers must be written in English, in accordance with SIS technical standards. Follow these links for the
current standards: LaTeX and Word. Papers to be presented in specialized sessions must not exceed 8
pages. All the other papers (solicited and contributed) must not exceed 4 pages. All submissions, except
invited talks, are subject to a blind refereeing process.
PROGRAM
The program for this conference is available via the following link:
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/program
PRESENTATIONS AND AUTHORS
The presentations and authors for this conference are available via the following link:
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/presentations
PLENARY
Research advances and new challenges in Cluster Analysis
Maurizio Vichi
Handling Measurement Error in Survey Estimation using Accuracy Indicators
Chris Skinner
Integrating micro and macro data in historical demography
Marco Breschi
SPECIALIZED
A Bayesian nonparametric model for count functional data
Antonio Canale, David B. Dunson
ROI analysis of pharmafMRI data: an adaptive approach for global testing
Giorgos Minas, John A.D. Aston, Thomas E Nichols, Nigel Stallard
Distance - Based Statistics for Covariance Operators in Functional Data Analysis
Davide Pigoli
Clustering Multivariate Longitudinal Data: Hidden Markov of Factor Analyzers
Antonello Maruotti, Francesca Martella
Model based clustering of multivariate spatio-temporal data: a matrix-variate
approach
Cinzia Viroli
Random coefficient based dropout models: a finite mixture approach
Alessandra Spagnoli, Marco Alfò
Bayesian inference for causal effects in randomized experiments with
noncompliance: The role of multivariate outcomes
Fan Li, Alessandra Mattei, Fabrizia Mealli
Unconditional and Conditional Quantile Treatment Effect: Identification
Strategies and Interpretations
Margherita Fort
Dealing with complex problems of confounding in mediation analysis
Stijn Vansteelandt
A Unified Approach for Defining Optimal Multivariate and Multi-Domains
Sampling Designs
Pietro Demetrio Falorsi, Paolo Righi
Forest Inventories: Multi-Phase Sampling Strategies for Estimating Forest and
Non-Forest Resources Over Large Areas
Lorenzo Fattorini
Recent Advances in Estimation of Poverty Indicators for Domains and Small
Areas
Risto Lehtonen, Ari Veijanen
Weighted likelihood in Bayesian inference
Claudio Agostinelli, Luca Greco
Disclosure risk estimation via nonparametric log-linear models
Cinzia Carota, Maurizio Filippone, Roberto Leombruni, Silvia Polettini
Bayesian Latent Class Models in Veterinary and Human Epidemiology
Luzia Goncalves, Ana Subtil, Nuno Brites, M. Rosario de Oliveira, Ana
Margarida Alho, Jose Meireles, L. M. Madeira de Carvalho, Silvana Belo
STAR modeling of pulmonary tuberculosis delay-time in diagnosis
Bruno de Sousa, Dulce Gomes, Patrícia A. Filipe, Cristina Areias, Teodoro
Briz, Carla Nunes
ROC Curves in medical decision
Ana Cristina Braga, Lino Costa, Pedro Oliveira
On Dividing an Empirical Distribution into Optimal Segments
Jan W. Owsiński
A composite indicator of sustainable well-being: the relative importance of
weights in the European Strategy for Sustainable Development
Elena Giachin Ricca, Stefano Tarantola
Non-aggregative assessment of subjective well-being
Marco Fattore
On the Extraction of a Common Persistent Component from Several Volatility
Indicators
Fabrizio Cipollini, Giampiero M. Gallo
Estimating jumps in volatility using realized-range measures
Massimiliano Caporin, Eduardo Rossi, Paolo Santucci de Magistris
The Value of Model Sophistication: DJIA Option Pricing
Jeroen V.K. Rombouts, Lars Stentoft, Francesco Violante
Families in Italy: the Quiet Revolution
Silvana Salvini, Gustavo De Santis
Enterprise in a globalised context and public and private statistical setups
Fabrizio Guelpa, Giovanni Foresti, Stefania Trenti
Imputation and outlier detection in banking datasets
Andrea Pagano, Domenico Perrotta, Spyros Arsenis
Something for nothing?
Shirley Coleman
On consistency of Bayesian variable selection procedures
Elias Moreno, Javier Giron, George Casella, Lina Martinez, F. Vazquez-Polo, Maria Martel
Dynamic Classification Trees for imprecise data
Massimo Aria, Valentina Cozza
An innovative procedure for smoothing parameter selection
Gianluca Frasso, Paul Eilers
The Italian Statistical Institute Macroeconometric Model - MEMo-It
Fabio Bacchini, et al.
A statistical overview of the economic situation in the euro area
Gian Luigi Mazzi, Filippo Moauro, Rosa Ruggeri Cannata
M-Estimation of Shape Matrices under Incomplete and Serially Dependent Data
Gabriel Frahm
Convergence of Depths and Depth-Trimmed Regions
Rainer Dyckerhoff
On Robustifying Some Blind Source Separation Methods for Second Order
Nonstationary Data
Klaus Nordhausen
The impact of the three crises on health in Italy: evidence and lack of adequate
information systems
Giovanni Fattore
LISA map based on distances for functional data
Pedro Delicado, Sonia Broner
Smoothing mortality risks in space and time using flexible models
Maria Dolores Ugarte, Toma Goicoa, Jaione Etxeberria, Ana F. Militino
Economic recession and fertility in the developed world: Past evidence and recent
trends
Tomas Sobotka
Generalized boosted additive models
Sonia Amodio, Jacqueline Meulman
Geometrical approaches to the analysis of threshold exceedances
José M. Angulo, Ana E. Madrid
Conditional simulations for spatial max-stable processes for climate applications
Liliane Bel
Nonparametric estimation of the division rate of a size-structured population
Vincent Rivoirard
PM10 forecasting using mixture linear regression models
Jean-Michel Poggi
Employment outcomes of Short-time work scheme and Unemployment insurance
program beneficiaries: a longitudinal approach.
Maurizio Sorcioni, Giuseppe De Blasio
SOLICITED
Which family model makes couples more happy - dual earner or male
breadwinner ?
Anna Baranowska-Rataj, Anna Matysiak
Socioeconomic determinants of persistence in poor subjective health
Paolo Li Donni, Daria Mendola
Family structures and subjective wellbeing in Italy
Silvia Montecolle, Francesca Rinesi, Alessandra Tinto
Identifiability of Discrete Graphical Models with Hidden Variables
Marco Valtorta, Elizabeth S. Allman, John A. Rhodes
Identifiability of hierarchical loglinear models with one hidden binary variable
Barbara Vantaggi
Binary models of marginal independence: a comparison of different approaches
Monia Lupparelli, Luca La Rocca
Small Area Estimation with Uncertain Random Effects
Gauri Sankar Datta, Abhyuday Mandal, Anthony Wanjoya
Interacting Multiple Try Algorithms
Roberto Casarin, Radu Craiu, Fabrizio Leisen
Higher-order asymptotics in Bayesian inference
Laura Ventura, Walter Racugno
Online Detection of Outliers and Structural Breaks
Giovanni Petris
Estimating the prevalence of cancer patients who have recurred
Angela B. Mariotto, Roberta De Angelis, Lucia Martina
Multivariate Permutation Test to Compare Survival Curves for Matched Data
Stefania Galimberti
Patterns of care and related costs of cancer prevalent cases by phase of disease
Silvia Francisci, Anna Gigli
Estimating the incidence of cancer disease using a Bayesian backcalculation
approach
Leonardo Ventura
Bayesian T-optimal designs by simulation: a case-study on model discrimination
Rossella Berni, Federico M Stefanini
From Markov moves in contingency tables to linear model estimability
Roberto Fontana, Fabio Rapallo, Maria Piera Rogantin
Sensitivity Analysis and FANOVA Graphs for Computer Experiments
Jana Fruth, Sonja Kuhnt
Factorial Graphical Lasso and Slowly Changing Graphical Models for Estimating
Dynamic Networks
Antonino Abbruzzo, Ernst Wit
New Statistics for Estimating the Parameter of the Stochastic Actor-Oriented Model
Viviana Amati
Graph embedding via dissimilarity mapping for network comparison
Domenico De Stefano
Statistical models for virtual water network analysis
Alessandra Petrucci, Emilia Rocco
Latent Class CUB Models
Leonardo Grilli, Maria Iannario, Domenico Piccolo, Carla Rampichini
Formative and reflective models to determine latent construct
Anna Simonetto
Log-ratios analysis to study the relative information of ordinal variables
Michele Gallo
Ordinal Models for Financial Evaluation
Paola Cerchiello
Bayesian model averaging for financial evaluation
Silvia Figini
Labour market response models for university evaluation
Daniele Checchi, Silvia Salini
A Statistical Framework to Measure Reputation Risk
Tiziano Bellini, Luigi Grossi
The integration of administrative data to analyse the business economic
performance: methodological aspects and results of a study
Fulvia Cerroni, Viviana De Giorgi, Marianna Mantuano
The micro economics of trade patterns and firm performances
Giovanni Dosi, Marco Grazzi, Federico Tamagni, Chiara Tomasi
The post-entry effect of exporting on productivity: inference on the counterfactual
distribution
Maria Ferrante, Marzia Freo, A. Viviani
Data Integration and Productivity Estimation at a Firm Level
Filippo Oropallo, Stefania Rossetti
A PLS algorithm version working with ordinal variables
Giuseppe Boari, Gabriele Cantaluppi
Bivariate logistic models for the analysis of the Students' University “Success”
Marco Enea, Massimo Attanasio
University admission test and students’ careers: an analysis through a regression
chain graph with a hurdle model for the credits
Leonardo Grilli, Carla Rampichini, Roberta Varriale
Comparing degree programs using unadjusted performance indicators. Assessing
the bias from the Potential Confounding Factors
Mariano Porcu, Isabella Sulis
University of Pisa and academic performance: a sample survey on students with
no exams in 2011
Lucio Masserini, Monica Pratesi
Assessing the Impact of Financial Aids to Firms: Causal Inference in the presence
of Interference
Bruno Arpino, Alessandra Mattei
Inverse probability weighting to estimate causal effects of sequential treatments: a
latent class extension to deal with unobserved confounding
Francesco Bartolucci, Leonardo Grilli, Luca Pieroni
A two-part geoadditive model for geographical domain estimation
Chiara Bocci, Alessandra Petrucci, Emilia Rocco
Application of Marginal Structural Models in Chronic Kidney Disease (CKD)
Epidemiology: practical implementation in the Swedish National CKD Registry
Elena Pasquali, Marie Evans, Juan Jesus Carrero, Rino Bellocco
A Dimension Reduction Method for Approximating Integrals in Latent Variable
Models for Binary Data
Silvia Bianconcini, Silvia Cagnone, Dimitris Rizopoulos
Kalman Filter for Maximum Likelihood Estimation of GMRFs
Luigi Ippoliti, Luca Romagnoli
Monte Carlo Likelihood Inference in Multivariate Model-Based Geostatistics
Marco Minozzo, Clarissa Ferrari
Statistical Modelling of Spatial Extremes
Anthony Davison, Simone Padoan, Mathieu Ribatet
Nonparametric smoothing of circular data
Agnese Panzera, Charles C. Taylor
Inverse Batschelet Distributions as Models for Circular Data
Arthur Pewsey
Depth Analysis of Directional Data
Claudio Agostinelli, Mario Romanazzi
Simulation of random rotation matrices
John T. Kent, Asaad M. Ganeiber
Dynamically modelling of fuzzy sets for flexible data retrieval
Miroslav Hudec
Factor PD-Co-clustering in Official Statistics
Marina Marino, Germana Scepi, Cristina Tortora
Extracting meta-information by using Network Analysis tools
Agnieszka Stawinoga, Maria Spano, Nicole Triunfo
How the text mining measures complex phenomena in official statistics
Sergio Bolasco, Pasquale Pavone
Assessing assumptions for data fusion procedures
Alfonso Piscitelli, Antonio D'ambrosio
Filling in long gap sequences by performing EOF and FDA jointly
Antonella Plaia, Francesca Di Salvo, Mariantonietta Ruggeri, Gianna Agrò
Missing Data Imputation within the Statistical learning Paradigm
Antonio D'Ambrosio
The use of administrative registers in the 2011 Census in Germany
Stephanie Hirner
Robust methods for correction and control of Italian Agriculture Census data
Alessandra Reale, Francesca Torti, Marco Riani
The "industry and services continuous census" based on administrative sources:
opportunities and problems
Manlio Calzaroni, Caterina Viviano
Using coarse resolution satellite images for crop area estimation: benchmarking
their efficiency
Javier Gallego, Mohamed El-Aydam
How to select sample sites onto a study area?
Lucio Barabesi, S. Franceschi, M. Marcheselli
Do personal characteristics affect the Rasch measures of perceived physical risk?
A quantile regression approach.
Fabio Aiello, Giovanni Boscaino, Monica Mandalà
Modeling nonignorable missingness in multidimensional latent class IRT models
Silvia Bacci, Francesco Bartolucci, Bruno Bertaccini
Risk profile versus portfolio selection: a case study
Valeria Caviezel, Sergio Ortobelli Lozza, Lucio Bertoli Barsotti
Cycles, Syllogisms and Semantics: Examining the Idea of Spurious Cycles
Stephen Pollock
Spectral filtering for trend estimation
Marco Donatelli, Alessandra Luati, Andrea Martinelli
Robust estimation for multivariate data under the independent contamination
model
Claudio Agostinelli, R. A. Maronna, V. J. Yohai
Adaptive robust location-scale estimation
Pietro Coretto
Minimum Volume Peeling: a Multivariate Mode Estimator
Giovanni Porzio, Giancarlo Ragozini, Steffen Liebscher, Thomas Kirschstein
A comparison of robust methods with small sample experimental data
Marco Riani, Andrea Cerioli, Maria Adele Milioli, Gianluca Morelli
Patterns of Mortality Decline and Individual Ageing: An Overview
Elisabetta Barbi
Survival predictive models in centenarians
Rossella Miglio, Paola Gueresi
Health status in over 85 years old living in Residential Facilities in Italy
Giulia Cavrini, Claudia Di Priamo, Lorella Sicuro, Alessandra Battisti,
Alessandro Solicapa, Giovanni de Girolamo
CONTRIBUTED
Variation in Obstetric Intervention Rates across Hospitals in Sardinia
Massimo Cannas, Emiliano Sironi
On the role of normalized inverse-Gaussian priors in continuous-time models
Matteo Ruggiero
Randomly Reinforced Urn Designs whose Allocation Proportions Converge to
Arbitrary Prespecified Values
Giacomo Aletti, Andrea Ghiglietti, Anna Maria Paganoni
Calibration estimation in dual frame surveys
Maria Giovanna Ranalli, Annalisa Teodoro
Comparing model-assisted estimators of structural variables in forest surveys
Ivan Arcangelo Sciascia, Matteo Garbarino, Giorgio Vacchiano, Renzo
Motta
A study in panel cointegration and poolability: Long-run money demand equations
for Gulf Cooperation Council countries
Stefano Fachin
Uncertainty in statistical matching for discrete categorical variables
Pier Luigi Conti, Daniela Marella, Mauro Scanu
Independent Component Analysis of Milan Mobile Network Data
Paolo Zanini, Piercesare Secchi, Simone Vantini
An Objective Bayesian analysis of dichotomous sensitive data
Maria Maddalena Barbieri, Brunero Liseo
Ensuring comparability over time and between domains by means of complex
sample techniques
Tiziana Tuoto, Francesca Inglese
Confidence intervals for the Berger & Boos’ procedure in the 2x2 Binomial Trial
Enrico Ripamonti, Piero Quatto
Bayesian inference for the multivariate skew-normal model: a Population Monte
Carlo approach
Antonio Parisi, Brunero Liseo
Reconstructing a multinormal covariance matrix from its spherically truncated
projection
Filippo Palombi, Simona Toti, Romina Filippini
Clustering of financial time series in extreme scenarios
Roberta Pappadà, Fabrizio Durante
Investments in Renewable energies: evidence from a panel of countries
Giuseppe Scandurra
A Topological Definition of Phase and Amplitude Variability of Functional Data
Simone Vantini
Nonparametric saddlepoint test and pairwise likelihood inference
Nicola Lunardon, Elvezio Ronchetti
On the stationarity of the Threshold Autoregressive process: the two regimes case
Marcella Niglio, Francesco Giordano, Cosimo Damiano Vitale
Filling in long gap sequences by performing EOF and FDA jointly
Francesca Di Salvo, Mariantonietta Ruggieri, Gianna Agrò
Parallel Adaptive Markov chain Monte Carlo with applications
Mauro Bernardi, Lea Petrella
Modern Bayesian Inference in Zero-Inflated Poisson Models
Erlis Ruli, Laura Ventura
A comparison of different procedures for combining high-dimensional
multivariate volatility forecasts
Giuseppe Storti, Alessandra Amendola
Estimation of wind speed prediction intervals by multi-objective genetic
algorithms and neural networks
Valeria Vitelli
On a predictive measure of discrepancy between classical and Bayesian estimators
Stefania Gubbiotti
A prediction error for a linear regression model with fuzzy random elements
Maria Brigida Ferraro
Some further results for the two-parameter Poisson-Dirichlet partition model
Annalisa Cerquetti
Matching immigrant and native workers: evidence from the recent downturn in
Italy
Adriano Paggiaro
The analysis of firm demography: an approach based on micro-geographic data
Diego Giuliani, Simonetta Cozzi, Giuseppe Espa
Regression estimators for capture-recapture frequency data
Irene Rocchetti
Towards an integrated surveillance system of road accidents
Tiziana Tuoto, Silvia Bruzzone, Luca Valentino, Giordana Baldassarre,
Nicoletta Cibella, Marilena Pappagallo
On the Design Based Inference for Continuous Spatial Populations
Giorgio Eduardo Montanari, Giuseppe Cicchitelli
A Critical Look at Compositional Analysis for Assessing Habitat Selection
Caterina Pisani
Small area estimation for panel data
Annamaria Bianchi
Interpreting Deviations from Long-run Parity in an I(2) Model
Giuliana Passamani
Contributions from income components to Zenga's point and synthetic inequality
measures: an application to EU countries
Michele Zenga, Leo Pasquazzi
An income mobility measure based on Zenga’s inequality index
Mauro Mussini
Equivalence scales, inflation, and PPP: a unique (and simple) approach to
estimation
Gustavo De Santis
Sensitivity analysis on a Cellular Automata model for the diffusion of Pleural
Mesothelioma
Claudia Furlan, Cinzia Mortarino
Reproducibility Probability Estimation and Testing for some common
nonparametric tests
Daniele De Martini, Lucio De Capitani
Multilevel algorithmic models to measure item importance on latent variables'
indicators
Marica Manisera, Marika Vezzoli
A multiple imputation procedure of censored values in family-based genetic
association studies
Fabiola Del Greco M., Cristian Pattaro, Cosetta Minelli, Peter P.
Pramstaller, John R Thompson
Ridge analysis through profile likelihoods
Valeria Sambucini, Ludovico Piccinato
Migrant students classroom allocation policy in Italian schools
Andrea Scagni
Testing Phase and Amplitude Variability in Functional Data Analysis: a
Hierarchical Permutation Test Approach
Alessia Pini, Simone Vantini
A hierarchical bayesian model for modelling benthic macroinvertebrates densities
in lagoons
Serena Arima, Alberto Basset, Giovanna Jona Lasinio, Alessio Pollice, Ilaria
Rosati
Life-Course Transitions, Market Work and Domestic Work of Italian Couples
Antonino Di Pino
Adjusting Time Series of Possible Unequal Lengths
Ilaria Lucrezia Amerise, Agostino Tarsitano
Variable selection in competing risks model
Marialuisa Restaino, Alessandra Amendola
A Bayesian Semiparametric Fay-Herriot-type model for Small Area Estimation
Silvia Polettini
Assessing Multivariate Measurement Systems in Multisite Testing
Michele Scagliarini, Stefania Evangelisti
Predicting EQ-5D responses from SF-12: should we take into account dependence
and ordering?
Caterina Conigliani, Andrea Tancredi, Andrea Manca
Large sample properties of Gibbs-type priors
Pierpaolo De Blasi, Antonio Lijoi, Igor Pruenster
Bayesian Unit Root Tests: a Monte Carlo Study
Margherita Gerolimetto, Isabella Procidano
Data fusion in pharmaceutical marketing: new perspective from administrative
data.
Paolo Mariani
How internal and international migrations shape the age structure of the Italian
regions, 1955-2008
Paola Di Giulio, Cecilia Reynaud, Luca Vergaglia
Bayesian modeling of presence-only data
Fabio Divino, Giovanna Jona Lasinio, Natalia Golini
An Evaluation of the Student Satisfaction based on CUB Models
Barbara Cafarelli
Limited Information Estimation Methods for Paired Comparison Data
Manuela Cattelan
Closed Likelihood-Ratio Testing Procedures to Assess Similarity of Covariance
Matrices
Francesca Greselin, Antonio Punzo
Fiducial Distributions for Real Exponential Families
Piero Veronese, Eugenio Melilli
Hedonic Indexes and GDP estimate in the USA
Gabriele Serafini
Some results on stochastic comparisons of ROC curves
Silvia Figini, Chiara Gigliarano, Pietro Muliere
Efficiency in the use of natural non-renewable resources from mining and
quarrying in Italy. Time series analysis and Economy-wide Material Flows
Accounts
Donatella Vignani
Marital Disruption and Subjective Well-being: Evidence from an Italian Panel
Survey
Giulia Rivellini, Alessandro Rosina, Emiliano Sironi
The Decision Making Process of Leaving Home: A Longitudinal Analysis of
Italian Women
Giulia Ferrari, Alessandro Rosina, Emiliano Sironi
Alternative Bayesian analysis of capture recapture data with behavioral effect
modelling
Danilo Alunni Fegatelli
PDE penalization for spatial fields smoothing
Laura Azzimonti, Maurizio Domanin, Laura Maria Sangalli, Piercesare
Secchi
Multivariate Nonlinear Least Squares: Direct and Beauchamp and Cornell
Methodologies
Renato Guseo, Cinzia Mortarino
Handling weak dependence structures with copulas
Enrico Foscolo, Fabrizio Durante
Bayesian nonparametric predictions for count time series
Luisa Bisaglia, Antonio Canale
How to integrate macro and micro perspectives: an example on Human
Development and Multidimensional Poverty.
Silvia Terzi
Composite Indicator of Social Inclusion in the European Union
Erasmo Vassallo, Francesca Giambona
Depth measures for the study of real and simulated ECG signals
Francesca Ieva
Experimental design for the estimation of Rician-distributed intensity fields in
MRI
Stefano Baraldo, Francesca Ieva, Luca Mainardi, Anna Maria Paganoni
A von Mises Markov random field model for the analysis of spatial circular data
Francesco Lagona
A Multivariate VEC-BEKK Model for Portfolio Selection
Andrea Federico Pierini, Alessia Naccarato
Combining the complete-data and nonresponse models for drawing imputations
under MAR
Shahab Jolani, Stef van Buuren, Laurence E. Frank
A Well-being Index Based on the Weighted Product Method
Matteo Mazziotta, Adriano Pareto
A comparison of semiparametric density estimation methods for multivariate risk
management
Marco Bee
Modelling poverty transitions in Luxembourg: true state dependence or
heterogeneity?
Alessio Fusco, Nizamul Islam
Causal analysis of education and birth inequalities through a latent class SEM
Silvia Bacci, Francesco Bartolucci, Luca Pieroni
Prediction of nonstationary functional data: Universal Kriging in a Hilbert Space
Alessandra Menafoglio, Matilde Dalla Rosa, Piercesare Secchi
School tracking and equality of opportunity in a multilevel perspective
Isabella Romeo, Emanuela Raffinetti
Indexing the Worthiness of Social Agents
Giulio D'Epifanio
Asymptotic estimation of right and left kurtosis measures, with applications to
finance
Anna Maria Fiori, Davide Beltrami
Ordinal Lorenz Regression with application in Customer Satisfaction Surveys
Emanuela Raffinetti
A computational method to estimate sparse multiple Gaussian graphical models
Rossella Onorati, Luigi Augugliaro, Angelo Marcello Mineo
Deterministic or stochastic seasonality in daily electricity prices?
Paolo Chirico
Social capital and its impact on poverty reduction: measurement issues in
longitudinal and cross-country comparisons. The case of the EU.
Isabella Santini, Anna De Pascale
The diffusion of nuclear energy in the developing countries
Alessandra Dalla Valle, Claudia Furlan
A model for the joint distribution of income and wealth
Markus Jantti, Eva Sierminska, Philippe Van Kerm
Mothers with children aged 0-2 years: work/family reconciliation and support
networks
Cinzia Castagnaro, Alessandra Fasano, Antonella Guarneri
A novel method for spatial smoothing
Laura M. Sangalli, James O. Ramsay
Poverty transitions in Italy
Lucia Coppola, Davide Di Laurea, Daniela Lo Castro, Mattia Spaziani
Spatial smoothing over non-planar domains
Bree Ettinger, Simona Perotto, Laura M Sangalli
Lattice Models for the analysis of Urban Crime
Enrico di Bella, Luca Persico, Lucia Leporatti
Family resources and cognitive decline among elderly in Italy
Stefano Mazzuco
The median of a set of histogram data
Lidia Rivoli, Rosanna Verde, Antonio Irpino
Estimating the Homeless Population through Indirect Sampling and Weight
Sharing Method
Claudia De Vitiis
Rates for Bayesian estimation of location-scale mixtures of super-smooth densities
Catia Luisa Scricciolo
Data gathering for elusive population. The case of foreigners during the XV
Italian Census. A focus on Prato
Linda Porciani
Immigrant entrepreneurship through the economic crisis in Italy
Benedetta Cassani, Cristina Giudici, Roberta Rizzi
International Mobility of University Students: the Italian case
Domenica Fioredistella Iezzi, Mario Mastrangelo, Scipione Sarlo
Chronological analysis of textual data and curve clustering: preliminary results
based on wavelets
Matilde Trevisani, Arjuna Tuzzi
Exponential Random Graph Model for multivariate networks: an application in
knowledge network analysis
Domenico De Stefano, Susanna Zaccarin
The Role of Social Capital in Preventing Irregular Work in Italian Regions
Maria Felice Arezzo
Bayesian model averaging for financial credit risk measurement
Silvia Figini
Considerations about the Quotient of two Correlated Normals
Angiola Pollastri, Vanda Tulli
Estimating Business Statistics from administrative data: a study on small and
medium enterprises
Orietta Luzi, Giovanni Seri, Viviana De Giorgi, Giampiero Siesto
The Coverage Survey of the 6th Agricultural Census
Matteo Mazziotta, Antonella Bernardini, Loredana De Gaetano, Lorenzo
Soriani
A Local Price Observatory – Price minimarket: innovations and additional
knowledge about prices - The experience of Umbria
Cristina Carbonari, Sabrina Angiona, Francesca Paradisi
Frailty Multi-State Models based on Maximum Penalized Partial Likelihood
Federico Rotolo, Catherine Legrand
Reconciliation of Time Series according to a Growth Rates Preservation Principle
Tommaso Di Fonzo
A decision support system for duopolies with incomplete information
Paola Vicard, Julia Mortera
False discovery rate control and the dependence structure of test statistics
Claudio Lupi
Neural Network Approach Applied for Classification in Business and Trade
Statistics
Jana Juriová
The analysis of the material deprivation of foreigners in Italy
Anna Maria Milito, Annalisa Busetta, Antonino Mario Oliveri
Effective Facebook population: the Italian case
Cristiano Tessitore, Ester Macrì
A Clustream strategy for Functional Boxplots on multiple streaming time series
Antonio Balzanella, Elvira Romano
New approach to the identification of the Inverse Weibull model
Biagio Palumbo, Giuliana Pallotta
Stochastic Frontiers Approach: an Empirical Analysis of Italian Environmental
Spending
Sabrina Auci, Annalisa Castelli, Donatella Vignani
Estimating student learning value-added models from repeated cross-sections
Dalit Contini
Intergenerational Mobility and Gender Gap: Evidence from Mediterranean
Countries
Rosalia Castellano, Gennaro Punzo, Antonella Rocca
The role of Istat territorial offices for data quality control in the 15th Population
and Housing Census. The case of Tuscany
Alessandro Valentini, Sabina Giampaolo
Longitudinal patterns of financial product ownership: a latent growth mixture
approach
Francesca Bassi
Measuring job quality: a composite indicator
Giovanna Boccuzzo, Martina Gianecchini
On Measuring Inequity in Taxation Between Groups of Tax Payers
Achille Vernizzi, Simone Pellegrino, Maria Giovanna Monti
Capital income inequality: evidences from ECHP data
Francesca Greselin, Leo Pasquazzi, Ricardas Zitikis
Note on a new generalization of the skew-normal distribution
Valentina Mameli, Monica Musio
Estimates of Foreign Trade Using Genetic Programming
Miroslav Klucik, Miroslav Klucik
Machine learning techniques for Propensity score matching with clustered data. A
simulation study.
Massimo Cannas, Bruno Arpino, Francesco Billari
Impact of Audio Tools in Web Surveys
Daniele Toninelli, Silvia Biffignandi
Early-life circumstances and late-life income
Omar Paccagnella, Christelle Garrouste
A Simple Risk-Adjusted CUSUM chart for monitoring binary health data
Marco Marchi
From theory to practice: a methodological proposal for operationalising and
summarizing the concept of quality of work
Marco Centra, Maurizio Curtarelli, Valentina Gualtieri
Partners’ income and decision making
Lucia Coppola, Domenica Quartuccio
Dealing With a Potential Bias in Estimating the Share of Discriminated Women
Rosa Giaimo, Giovanni Luca Lo Magno
Border surveys and Time Location Sampling (TLS): an application on incoming
tourism in Sicily
Stefano De Cantis, Mauro Ferrante
The Use of Administrative Data for Short Term Business Statistics: Lessons from
a Cross-Country Experience
Ciro Baldi, Francesca Ceccato, Silvia Pacini, Donatella Tuzi
Price transmission and market power in the food market
Maria Caterina Bramati
Data imputation processes based on statistical analysis: the case of Kosovo census
data
Marco Scarnò, Bekim Canolli, Servete Muriqi, Hisni Ferizi
Dimension reduction for measuring the multidimensional demographic
convergence
Maria Rita Sebastiani
Is financial fragility a matter of illiquidity? An appraisal for Italian households
Marianna Brunetti
Burnout, learning and self-esteem at school: an empirical study.
Cristiana Ceccatelli
Autocorrelated non-normal data in control charts
Claudio Giovanni Borroni, Manuela Cazzaro, Paola Maddalena Chiodini
Social welfare orderings of the Generalized-Lorenz Type: applications of an
extended equivalence theorem
Alessandra Giovagnoli
Timely Indices for Residential Construction Sector
Attilio Gardini, Enrico Foscolo
Dimensions of well-being and their statistical measurements
Carla Ferrara, Francesca Martella, Maurizio Vichi
The diagnostics of the mean squared error of the Eblup in small area estimation
models
Renato Salvatore, Maria Chiara Pagliarella
REGISTRATION
The registration method for this conference is available via the following link:
http://meetings.sis-statistica.org/index.php/sm/sm2012/schedConf/registration
ACCOMMODATION
Rooms are optioned for attendees.
To book a room, please send an e-mail message or make a phone call to the hotel (don't use the form on
the hotel website) and, when booking, specify that you will be attending the SIS 2012 meeting (code
SIS2012).
Daily rates per room. Buffet breakfast is always included. City tax is NOT included (Euro 3,00 per night
per person, to be paid directly upon arrival).
Walking distance to the conference site (from Google Maps) is shown in square brackets. If you need
information on public transportation, here is the link to the bus route planner: (English) (Italiano)
CONFERENCE TIMELINE AND INFORMATION
CONFERENCE
First day of conference June 20, 2012
Last day of conference June 22, 2012
WEBSITE
Go Live (as a Current Conference) December 2, 2011
Move to Conference Archive December 31, 2014
SUBMISSIONS
Author registration opened December 2, 2011
Author registration closed June 20, 2013
Call for Papers posted July 2, 2011
Submissions accepted July 28, 2012
Submissions closed November 5, 2012
REVIEWS
Reviewer registration opened December 2, 2011
Reviewer registration closed April 10, 2014
WEBSITE POSTING
Accepted papers May 1, 2012
XLVI Riunione Scientifica
Roma, 20-22 giugno 2012
Editore: CLEUP – Padova
ISBN 978-88-6129-882-8
Research advances and new challenges in Cluster Analysis
Maurizio Vichi
Abstract Methodologies for Cluster Analysis are among the most well-known and appreciated statistical techniques of multivariate analysis. In the last twenty years they have been increasingly applied in new disciplines, and frequently almost reinvented, in many areas of research such as computer science, engineering and bioinformatics, and in specific fields including machine learning, data mining and pattern recognition. In this presentation we show recent statistical research advances in methodologies for clustering. The illustrated methods have in common the statistical approach of formulating a mathematical model for partitioning or hierarchically clustering multivariate observations, estimating the parameters of the model and finally fitting it to the data.
1. Introduction
The interpretation of the relationships within a set of objects can be helped by obtaining a hard partition of the objects into disjoint classes, with the property that objects in the same class are perceived as similar to one another, while objects in different classes are considered dissimilar. Such partitions can be achieved by applying Cluster Analysis methodologies. Several methods have been proposed for clustering a set of multivariate objects. In this presentation we concentrate on the model-based approach of formulating a clustering model for the data, e.g., a partition or a hierarchy specified for reconstructing the data (multivariate observations or dissimilarities), and then solving the corresponding least-squares or maximum likelihood fitting problem.
1 Maurizio Vichi, Sapienza Università di Roma, Dipartimento di Scienze Statistiche; e-mail: [email protected]
The presentation is divided into three parts: model-based partitioning and hierarchical clustering of a set of units for dissimilarity data; multi-partitioning of the modes of a three- and two-way data matrix including multivariate observations; and clustering of longitudinal multivariate observations.
2. Model-Based partitioning and hierarchical clustering
The Cluster Analysis problem of partitioning or hierarchically clustering a set of units, when dissimilarity data are observed, is here handled with the statistical model-based approach of fitting the "closest" classification matrix to the observed dissimilarities. A classification matrix represents a clustering model expressed in terms of dissimilarities. Three models for partitioning a set of units from dissimilarity data are illustrated, and their estimation via least squares is given together with new fast coordinate descent algorithms. Following the same statistical fitting approach, a new model for hierarchically clustering objects starting from dissimilarity data is also illustrated.
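The least-squares fitting idea can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a deliberately simplified classification-matrix model in which within-cluster dissimilarities are reconstructed as 0 and all between-cluster dissimilarities by a single constant b, fitted by coordinate descent over the cluster labels.

```python
import numpy as np

def fit_partition(Delta, G, n_iter=50):
    """Least-squares fit of a partition to a dissimilarity matrix Delta:
    within-cluster dissimilarities are modeled as 0, between-cluster
    ones as a common constant b, and cluster labels are updated by
    coordinate descent (a simplified stand-in for the models in the talk)."""
    n = Delta.shape[0]
    labels = np.arange(n) % G                      # deterministic start
    for _ in range(n_iter):
        # b: optimal between-cluster constant given the current labels
        between = labels[:, None] != labels[None, :]
        b = Delta[between].mean() if between.any() else 0.0
        changed = False
        for i in range(n):
            errs = []
            for g in range(G):
                same = labels == g
                same[i] = True                     # unit i would join cluster g
                err = (Delta[i, same] ** 2).sum() + ((Delta[i, ~same] - b) ** 2).sum()
                errs.append(err)
            best = int(np.argmin(errs))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels

# Two well-separated blocks of units
Delta = np.array([[0., 1., 9., 9.],
                  [1., 0., 9., 9.],
                  [9., 9., 0., 1.],
                  [9., 9., 1., 0.]])
labels = fit_partition(Delta, G=2)
print(labels)  # units {0, 1} and {2, 3} fall into different clusters
```

The real methods use richer classification matrices and faster update schemes; the sketch only conveys the "fit a clustering model to dissimilarities" viewpoint.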
3. Bi-partitioning, multi-partitioning, clustering and disjoint principal component analysis
New methodologies for three-mode (units, variables and occasions) and two-mode (units and variables) symmetrical or asymmetrical partitioning or multi-partitioning of three- and two-way data are presented. In particular, by reanalyzing the double k-means, which identifies a unique partition for each mode of the data, a relevant extension is discussed that allows the classes of each mode to be synthesized symmetrically, by means of mean vectors or linear combinations (components) for all modes, or asymmetrically, by mixing a different strategy for each mode. Furthermore, the model allows the partition of one mode conditionally on the partition of the other. The performance of such generalized double k-means has been tested by both a simulation study and an application to gene microarray data. Clustering and disjoint principal component analysis allows one to identify a partition of the units and a partition of the variables, together with a principal component for each class of the partition of variables. This technique can be seen as a special case of the generalized double k-means.
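A minimal numerical sketch may help fix the double k-means idea: rows and columns of a data matrix X are partitioned simultaneously, and X is approximated by expanding a small G x H matrix C of block centroids over the row and column labels. The code below is a simplified illustration written for this summary, not the generalized method described above.

```python
import numpy as np

def double_kmeans(X, G, H, n_iter=50):
    """Simplified double k-means: alternate (i) block-centroid update,
    (ii) row reassignment, (iii) column reassignment, so that X is
    approximated by the G x H block-mean matrix C expanded over the
    row labels u and the column labels v."""
    n, p = X.shape
    u = np.arange(n) * G // n                       # initial row labels
    v = np.arange(p) * H // p                       # initial column labels
    for _ in range(n_iter):
        # block centroids: mean of X over each (row-cluster, column-cluster) block
        C = np.array([[X[np.ix_(u == g, v == h)].mean()
                       if (u == g).any() and (v == h).any() else 0.0
                       for h in range(H)] for g in range(G)])
        # reassign each row to the closest centroid profile C[g, v]
        u_new = np.array([np.argmin([((X[i] - C[g, v]) ** 2).sum()
                                     for g in range(G)]) for i in range(n)])
        # reassign each column to the closest centroid profile C[u, h]
        v_new = np.array([np.argmin([((X[:, j] - C[u_new, h]) ** 2).sum()
                                     for h in range(H)]) for j in range(p)])
        if (u_new == u).all() and (v_new == v).all():
            break
        u, v = u_new, v_new
    return u, v, C

# Two row types and two column blocks, rows deliberately shuffled
a, b = [0., 0., 10., 10.], [8., 8., 0., 0.]
X = np.array([a, a, b, b, a, b])
u, v, C = double_kmeans(X, G=2, H=2)
print(u, v)   # rows of type a share one label, rows of type b the other
```

The generalized version replaces block means by linear combinations (components) for some modes and allows one partition to be conditional on the other; the alternating structure stays the same.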
4. Clustering longitudinal multivariate observations
Longitudinal multivariate data involve repeated observations of different features of the same statistical units over a period of time. The aim is to study the developmental trends of the units across at least a part of their life span.
The dynamic evolution of the partitions of units over time is studied in this presentation in an unsupervised clustering context, using a model-based clustering approach. A clustering model and a vector autoregression VAR(P) model, where P is the lag length of the VAR, are combined into a new technique that identifies a homogeneous partition into G classes for each time t and the autoregressive dynamic evolution of the clusters. The proposed clustering/VAR model can also be used to forecast a partition at time T+1. The parameters of the model are estimated in both a least-squares and a maximum likelihood framework, and efficient recursive algorithms are given. A simulation study, together with some applications of the proposed methodologies, is shown to appreciate the performance of the models and the quality of their estimates. In the final part of the presentation, similarities between trajectories describing the histories of units are studied. Trend, velocity and acceleration are three characteristics of trajectories considered to assess pairwise dissimilarities between trajectories. The Tucker model for three-way data, modified for clustering units together with a dimensional reduction of the observed variables, is estimated in the metric space specified by trend, velocity and acceleration. An application is given to show the performance of the methodology.
References
Martella F., Alfò M., Vichi M. (2010). Hierarchical mixture models for biclustering in microarray. Statistical Modelling, 11(6): 489-505.
Martella F., Vichi M. (2012). Clustering microarray data using model-based double K-means. Journal of Applied Statistics, DOI: 10.1080/02664763.2012.683172.
Maruotti A., Vichi M. (2012). Clustering Longitudinal Multivariate Observations: Model-Based Autoregressive K-means. Submitted.
Rocci R., Gattone A., Vichi M. (2011). A New Dimension Reduction Method: Factor Discriminant K-means. Journal of Classification, vol. 28, DOI: 10.1007/s00357-011
Vicari D., Vichi M. (2009). Structural Classification Analysis of Three-Way Dissimilarity Data. Journal of Classification, vol. 26, ISSN: 0176-4268.
Vichi M., Rocci R. (2008). Two-mode Multi-partitioning. Computational Statistics & Data Analysis, vol. 52, pp. 1984-2003, ISSN: 0167-9473.
Vichi M., Saporta G. (2009). Clustering and Disjoint Principal Component Analysis. Computational Statistics & Data Analysis, vol. 53, pp. 3194-3208, ISSN: 0167-9473, DOI: 10.1016/j.csda.2008.05.028.
Vichi M. (2011). Fitting Hierarchical Clustering Models to Dissimilarity Data. Submitted.
Vichi M. (2008). Fitting Semiparametric Clustering Models to Dissimilarity Data. Advances in Data Analysis and Classification, vol. 2(2), pp. 121-161, DOI: 10.1007/s11634-008-0025-4.
A Bayesian nonparametric model for count functional data
Antonio Canale and David B. Dunson
Abstract Count functional data arise in a variety of applications, including longitudinal, spatial and imaging studies measuring functional count responses for each subject under study. The literature on statistical models for dependent count data is dominated by models built from hierarchical Poisson components. The Poisson assumption is not warranted in many applications, and hierarchical Poisson models make restrictive assumptions about over-dispersion in marginal distributions. This article discusses a class of nonparametric Bayes count functional data models introduced in Canale and Dunson [3], which are constructed through rounding real-valued underlying processes. Computational algorithms are developed using Markov chain Monte Carlo and the methods are illustrated through an application to asthma inhaler usage.
Key words: Generalized linear mixed model; Hierarchical model; Longitudinal data; Splines; Stochastic process.
1 Introduction
A stochastic process y = {y(s), s ∈ D} is a collection of random variables indexed by s ∈ D, with the domain D commonly corresponding to a set of times or spatial locations, and y(s) to the random variable observed at a specific time or location s. There is a rich frequentist and Bayesian literature on stochastic processes, with common choices including Gaussian processes and Lévy processes, such as the Poisson, Wiener, beta or gamma process. Gaussian processes provide a convenient and well-studied choice when y : D → ℜ is a continuous function. Our interest focuses on the case in which y : D → N = {0, 1, . . .}, so that y is a count-valued stochastic process over the domain D. There are many applications of such processes, including developmental toxicity epidemiology studies monitoring a count health response over time.
Antonio Canale, University of Turin and Collegio Carlo Alberto, Turin, Italy; e-mail: [email protected]
David B. Dunson, Duke University, Durham, NC; e-mail: [email protected]
Although there is a rich literature on count stochastic process models for longitudinal and spatial data, most models rely on Poisson hierarchical specifications. Although such models have a flexible mean structure, the Poisson assumption is restrictive in limiting the variance to be equal to the mean, with over-dispersion introduced in marginalizing out the latent processes. Such modeling frameworks have several disadvantages. Firstly, the dependence structure is confounded with marginal over-dispersion and, secondly, under-dispersed count data are not accommodated. To relax the usual Poisson parametric assumptions, [10] exploited a hierarchical specification of the Faddy model [6]. Despite the gain in flexibility, computation for this model is challenging.
In considering models that separate the marginal distribution from the dependence structure, it is natural to focus on copulas. Nikoloulopoulos and Karlis [15] proposed a copula model for bivariate counts that incorporates covariates into the marginal model. Erhard and Czado [5] proposed a copula model for high-dimensional counts, which can potentially allow under-dispersion in the marginals via a Faddy or Conway-Maxwell-Poisson [16] model. Genest and Neslehova [8] provide a review of copula models for counts.
An alternative approach relies on rounding of a stochastic process. For classification it is common to threshold Gaussian process regression [4, 9]. For example, [12] rounded a real discrete autoregressive process to induce an integer-valued time series, while [2] used rounding of continuous kernel mixture models to induce nonparametric models for count distributions. In this article we discuss a class of stochastic processes introduced in [3] that map a real-valued stochastic process y* : D → ℜ to a count stochastic process y : D → N.
2 Rounded Stochastic Processes
2.1 Notation and model formulation
Let y ∈ C denote a count-valued stochastic process, with D ⊂ ℜ^p compact and C the set of all D → N step functions with unit step and a finite number of jumps in D. Such an assumption is a count process version of the continuity condition routinely assumed for D → ℜ functions. It ensures that for sufficiently small changes in the input the corresponding change in the output is small, being either zero or one. We are particularly motivated by applications in which counts do not change erratically at nearby times but maintain some degree of similarity.
We choose a prior y ∼ Π, where Π is a probability measure over (C, B(C)), with B(C) the Borel σ-algebra of subsets of C. The measure Π induces the marginal probability mass functions

pr{y(s) = j} = Π{y : y(s) = j} = π_j(s),  j ∈ N, s ∈ D,  (1)

and the joint probability mass functions

pr{y(s_1) = j_1, . . . , y(s_k) = j_k} = Π{y : y(s_1) = j_1, . . . , y(s_k) = j_k} = π_{j_1 . . . j_k}(s_1, . . . , s_k),  (2)

for j_h ∈ N and s_h ∈ D, h = 1, . . . , k, and any k ≥ 1.
In introducing the Dirichlet process, [7] mentioned three appealing characteristics for nonparametric Bayes priors: large support, interpretability and ease of computation. Our goal is to specify a prior Π that gets as close to this ideal as possible. Starting with large support, we would like to choose a Π that allocates positive probability to arbitrarily small neighborhoods around any y_0 ∈ C with respect to an appropriate distance metric, such as L_1. To our knowledge, there is no previously defined stochastic process that satisfies this large support condition. In the absence of prior knowledge that allows one to assume y belongs to a pre-specified subset of C with probability one, priors must satisfy the large support property to be coherently Bayesian. Large support is also a necessary condition for the posterior for y to concentrate in small neighborhoods of any true y_0 ∈ C.
With this in mind, we propose to induce a prior y ∼ Π through

y = h(y*),  y* ∼ Π*,  (3)

where y* : D → ℜ is a real-valued stochastic process, h is a thresholding operator from Y → C, Y is the set of all D → ℜ continuous functions, and Π* is a probability measure over (Y, B(Y)), with B(Y) the Borel sets of Y. Unlike count-valued stochastic processes, there is a rich literature on real-valued stochastic processes. For example, Π* could be chosen to correspond to a Gaussian process, or could be induced through various basis or kernel expansions of y*.
There are various ways in which the thresholding operator h can be defined. For interpretability and simplicity, it is appealing to maintain similarity between y* and y in applying h, while restricting y ∈ C. Hence, using the informal definition of rounding as an operation that reduces the number of digits while keeping the values similar, we focus on a rounding operator that lets y(s) = 0 if y*(s) < 0 and y(s) = j if j − 1 ≤ y*(s) < j, for j = 1, . . . , ∞. Negative values are mapped to zero, the closest non-negative integer, while positive values are rounded up to the nearest integer. This type of restricted rounding ensures that y(s) is a non-negative integer. Using a fixed rounding function h in (3), we rely on the flexibility of the prior y* ∼ Π* to induce a flexible prior y ∼ Π. For notational convenience and generality, we let y(s) = j if y*(s) ∈ A_j = [a_j, a_{j+1}), with a_0 < a_1 < · · ·, and we focus on a_0 = −∞ and a_j = j − 1, j = 1, . . . , ∞.
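The rounding operator h is easy to state in code. A small sketch, using the thresholds a_0 = −∞ and a_j = j − 1, so that a latent value in [j − 1, j) becomes the count j:

```python
import numpy as np

def h(y_star):
    """Rounding operator of equation (3): y = 0 when y* < 0, and
    y = j when j - 1 <= y* < j, i.e. floor(y*) + 1 for y* >= 0."""
    y_star = np.asarray(y_star, dtype=float)
    return np.where(y_star < 0, 0, np.floor(y_star) + 1).astype(int)

print(h([-1.3, -0.2, 0.4, 1.0, 2.7]))  # [0 0 1 2 3]
```

Applying h pointwise to a continuous sample path (e.g. a Gaussian process draw) yields a count-valued step function, which is how the prior Π is induced from Π*.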
In certain applications, count data can be naturally viewed as arising through integer-valued rounding of an underlying continuous process. For example, in longitudinal tumor count studies, it tends to be difficult to distinguish individual tumors, and it is natural to posit a continuous time-varying tumor burden, with tumors fusing together and falling off over time. In collecting the data, tumor biologists attempt to make an accurate count, but counts taken even at the same time can vary. It is natural to accommodate this with a smoothly-varying continuous tumor burden specific to each animal, with measurement errors and rounding producing the observed tumor counts. However, even when there is no clear applied context motivating the existence of an underlying continuous process, the proposed formulation nonetheless leads to a highly flexible and computationally convenient model.
2.2 Count functional data
We have focused on the case in which there is a single count process y observed at locations s = (s_1, . . . , s_n)^T. In many applications, there are instead multiple related count processes y_i, i = 1, . . . , n, with the ith process observed at locations s_i = (s_{i1}, . . . , s_{in_i})^T. We refer to such data as count functional data. As in other functional data settings, it is of interest to borrow information across the individual functions through use of a hierarchical model. This can be accomplished within our rounded stochastic processes framework by first defining a functional data model for a collection of underlying continuous functions y*_i, i = 1, . . . , n, and then letting y_i = h(y*_i), for i = 1, . . . , n. There is a rich literature on appropriate models for y*_i, i = 1, . . . , n, ranging from hierarchical Gaussian processes [1] to wavelet-based functional mixed models [14].
Let y_i(s) denote the count for subject i at time s, y_it = y_i(s_it) the number at the tth observation time, and x_it a predictor for subject i at time t. As a simple model motivated by the asthma inhaler use application described below, we let

y_it = h(y*_it),  y*_it = ξ_i + b(x_it)^T θ + ε_it,  ξ_i ∼ Q,  ε_it ∼ N(0, τ^{-1}),  (4)

where ξ_i is a subject-specific random effect, b(·) are B-spline basis functions that depend on predictors and time, θ are unknown basis coefficients, and ε_it is a residual. To induce a penalization on finite differences of the coefficients of adjacent B-splines, we let p(θ | λ) ∝ exp(−(λ/2) θ^T P θ), where P = D^T D is a penalty matrix with D the rth-order difference matrix, and λ ∼ Ga(ν/2, δν/2), δ ∼ Ga(a, b), so that λ acts as a roughness penalty. This construction is known as the Bayesian P-spline (penalized B-spline) model [13]. The hyperparameter δ controls the dispersion of the prior. By choosing a hyperprior with small a, b values, one induces a prior with heavy tails and good performance in a variety of settings [11]. We additionally choose a hyperprior for the residual precision, p(τ) ∝ τ^{-1}. To allow the random effect distribution to be unknown, we choose a Dirichlet process prior, Q ∼ DP(αQ_0), with α a precision parameter and the base measure Q_0 chosen as N(0, ψ). As commonly done, we fix α = 1.
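The penalty matrix P = D^T D in the P-spline prior can be built directly from differences of the identity matrix. A small sketch of this standard construction (not code from the paper):

```python
import numpy as np

def pspline_penalty(K, r=2):
    """Penalty matrix P = D'D of the prior p(theta | lambda)
    proportional to exp(-(lambda/2) theta' P theta), with D the
    r-th order difference matrix on K B-spline coefficients."""
    D = np.diff(np.eye(K), n=r, axis=0)   # (K - r) x K difference matrix
    return D.T @ D

P = pspline_penalty(K=5, r=2)
theta_linear = np.array([1., 2., 3., 4., 5.])   # second differences all zero
theta_rough = np.array([1., 5., 1., 5., 1.])
print(theta_linear @ P @ theta_linear)  # 0.0: linear trends are unpenalized for r = 2
print(theta_rough @ P @ theta_rough)    # 192.0: wiggly coefficients are heavily penalized
```

The quadratic form θ^T P θ equals the sum of squared rth-order differences of θ, which is why larger λ shrinks the fitted spline toward a polynomial of degree r − 1.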
3 Asthma inhaler use applications
We analyze data on daily usage of albuterol asthma inhalers [10]. Daily counts of inhaler use were recorded for periods of between 36 and 122 days at the Kunsberg School at National Jewish Health in Denver, Colorado, for 48 students previously diagnosed with asthma. The total number of observations was 5209. As discussed by Grunwald and coauthors [10], the data are under-dispersed.
Let y_it denote the number of times the ith student used the inhaler on day t. Interest focuses on the impact of morning levels of PM25 (fine air-pollution particles less than 2.5 µm in diameter) on asthma inhaler use. On each day t, a vector x_t = (x_t1, . . . , x_tp)^T of environmental variables is recorded, including PM25, average daily temperature (Fahrenheit degrees/100), % humidity and barometric pressure (mmHg/1000). We modify (4) to include multiple predictors with an additive model structure as follows:

y_it = h(y*_it),  y*_it = ξ_i + ∑_{j=1}^{4} b_j(x_jt)^T θ_j + ε_it,  (5)

where ξ_i is a random effect modeled as described in the previous section, b_j is a B-spline basis with θ_j the basis coefficients relative to the jth predictor, and ε_i ∼ N(0, τ^{-1} R), with R an AR-1 tridiagonal correlation matrix with correlation parameter ρ. The prior for each θ_j is identical to the prior described above, leading to an additive Bayesian P-splines model. Each predictor is normalized to have mean zero and unit variance prior to analysis. The correlation parameter is given a uniform prior on [−1, 1]. Computational details are reported in [3].
We ran our Markov chain Monte Carlo algorithm for 10,000 iterations, with a 1,000-iteration burn-in discarded. To obtain interpretable summaries of the non-linear covariate effects on the inhaler use counts, we recorded for each predictor, at a dense grid of x_jt values at each sample after burn-in, the conditional expectation of the count for a typical student having ξ_i = 0,

µ_j(x_jt) = E(y_it | x_jt, x_j′t = 0, j′ ≠ j, ξ_i = 0, θ, τ, ρ)
          ≈ ∑_{k=0}^{K} k [Φ{a_{k+1}; µ*_j(x_jt), τ} − Φ{a_k; µ*_j(x_jt), τ}],  (6)

where Φ(·; µ, τ) is the cumulative distribution function of a normal random variable with mean µ and precision τ, K is the 99.99% quantile of N{µ*_j(x_jt), τ^{-1}}, and

µ*_j(x_jt) = b_j(x_jt)^T θ_j + ∑_{l ≠ j} b_l(0)^T θ_l,  (7)

with the other predictors fixed at their mean value. Based on these samples, we calculated posterior means and pointwise 95% credible intervals, with the results reported in Figure 1. Interestingly, each of the predictors had a non-linear impact on the frequency of inhaler use, with inhaler use increasing with morning levels of PM25.
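Equation (6) is simply an expectation over rounded-normal probabilities, so it can be evaluated with normal CDF differences. A standard-library sketch, using the thresholds a_0 = −∞ and a_j = j − 1 from Section 2 (illustrative code, not the authors' implementation):

```python
import math

def expected_count(mu, tau, K=100):
    """E(y) for a rounded N(mu, 1/tau) latent variable, as in
    equation (6): sum_k k * [Phi(a_{k+1}) - Phi(a_k)], with
    a_0 = -inf and a_j = j - 1; the sum is truncated at K."""
    sd = 1.0 / math.sqrt(tau)          # tau is a precision
    def Phi(x):                        # normal CDF with mean mu, precision tau
        return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
    a = [-math.inf] + list(range(K + 1))   # thresholds a_0, a_1 = 0, ..., a_{K+1} = K
    return sum(k * (Phi(a[k + 1]) - Phi(a[k])) for k in range(K + 1))

print(expected_count(mu=2.5, tau=100.0))  # close to 3.0: the latent draw is almost surely in [2, 3)
```

In the paper this expectation is evaluated at µ*_j(x_jt) for each posterior draw of (θ, τ), giving the curves summarized in Figure 1.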
The previous analysis conducted in [10] tackles the problem under a generalized linear mixed model setup with the Faddy distribution. The mean for each subject i at time t was

µ_it = exp(x_1t β_1 + · · · + x_pt β_p + u_i + e_it),  (8)

where u_i is a subject-specific random effect and e_it an error modeled as an AR-1 process. They estimated a coefficient of just 0.013 for PM25, which is close to zero with 95% intervals including zero. In contrast, we obtain clear evidence of non-linear effects of several of the covariates, including PM25.
Fig. 1 Posterior mean and 95% pointwise credible bands for the effect of (a) average daily temperature, (b) % humidity, (c) barometric pressure, and (d) concentration of the PM25 pollutant on asthma inhaler use, calculated with equation (6).
A Bayesian nonparametric model for count functional data 7
4 Discussion
We have discussed a simple approach, introduced in [3], for modeling count stochastic processes based on rounding continuous stochastic processes. The general strategy is flexible and allows one to leverage existing algorithms and code for posterior computation for continuous stochastic processes. Although rounding of continuous underlying processes is quite common for binary and categorical data, such approaches have not, to our knowledge, been applied to induce new families of count stochastic processes. Instead, the vast majority of the literature for count processes relies on Poisson process and hierarchical Poisson constructions, which have some well-known limitations in terms of flexibility. The modeling framework can be easily generalized to the setting of count functional data, i.e. when one observes n different realizations of a stochastic process, and its performance has been shown in an application to asthma inhaler use.
Acknowledgements
This research was partially supported by grant number R01 ES017240-01 from the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (NIH) and by grant CPDA097208/09 from the University of Padua.
References
1. Behseta, S., Kass, R.E., Wallstrom, G.L.: Hierarchical models for assessing variability among functions. Biometrika 92(2), 419–434 (2005)
2. Canale, A., Dunson, D.B.: Bayesian kernel mixtures for counts. Journal of the American Statistical Association 106(496), 1528–1539 (2011)
3. Canale, A., Dunson, D.B.: Nonparametric Bayes modeling of count processes (2012). Submitted
4. Chu, W., Ghahramani, Z.: Gaussian processes for ordinal regression. Journal of Machine Learning Research 6, 1019–1041 (2005)
5. Erhard, V., Czado, C.: Sampling count variables with specified Pearson correlation - a comparison between a naive and a C-vine sampling approach. In: D. Kurowicka, H. Joe (eds.) Dependence Modeling - Handbook on Vine Copulae, pp. 73–87. World Scientific (2009)
6. Faddy, M.J.: Extended Poisson process modeling and analysis of count data. Biometrical Journal 39, 431–440 (1997)
7. Ferguson, T.S.: A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1, 209–230 (1973)
8. Genest, C., Neslehova, J.: A primer on copulas for count data. Astin Bulletin 37, 475–515 (2007)
9. Ghosal, S., Roy, A.: Posterior consistency of Gaussian process prior for nonparametric binary regression. The Annals of Statistics 34(5), 2413–2429 (2006)
10. Grunwald, G.K., Bruce, S.L., Jiang, L., Strand, M., Rabinovitch, N.: A statistical model for under- or overdispersed clustered and longitudinal count data. Biometrical Journal 53(4), 578–594 (2011)
Antonio Canale and David B. Dunson
11. Jullion, A., Lambert, P.: Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics and Data Analysis 51(5), 2542–2558 (2007)
12. Kachour, M., Yao, J.F.: First order rounded integer-valued autoregressive (RINAR(1)) process. Journal of Time Series Analysis 30(4), 417–448 (2009)
13. Lang, S., Brezger, A.: Bayesian P-splines. Journal of Computational and Graphical Statistics 13, 183–212 (2004)
14. Morris, J., Carroll, R.: Wavelet-based functional mixed models. JRSS-B 68, 179–199 (2006)
15. Nikoloulopoulos, A., Karlis, D.: Regression in a copula model for bivariate count data. Journal of Applied Statistics 37(9), 1555–1568 (2010)
16. Shmueli, G., Minka, T.P., Kadane, J.B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. JRSS-C 54(1), 127–142 (2005)
ROI analysis of pharmafMRI data: an adaptive approach for global testing
Giorgos Minas, John A.D. Aston, Thomas E. Nichols and Nigel Stallard
Abstract Pharmacological fMRI (pharmafMRI) is a new, highly innovative technique utilizing the power of functional Magnetic Resonance Imaging (fMRI) to study drug-induced modulations of brain activity. fMRI recordings are very informative surrogate measures for brain activity but still very expensive, and therefore pharmafMRI studies typically have small sample sizes. The high dimensionality of fMRI data and the resulting high complexity require sensitive statistical analysis in which dimensionality reductions are often crucial. We consider Region of Interest (ROI) analysis and propose an adaptive two-stage testing procedure for respectively formulating and testing the fundamental hypothesis as to whether the drug modulates the control brain activity in selected ROI. The proposed tests are proved to control the type I error rate and are optimal in terms of the predicted chance of a true positive result at the end of the trial. Power analysis is performed by re-expressing the high dimensional domain of the power function in a lower dimensional, easily interpretable space which still gives a complete description of the power. Based on these results, we show under which circumstances our procedure outperforms standard single-stage and sequential two-stage procedures, focusing on the small sample sizes typical in pharmafMRI. We also apply our methods to ROI data of a pharmafMRI study.
Giorgos Minas
Department of Statistics and Warwick Centre of Analytical Sciences, University of Warwick, UK e-mail: [email protected]

John A.D. Aston
CRiSM, Department of Statistics, University of Warwick, UK e-mail: [email protected]

Thomas E. Nichols
Department of Statistics and Warwick Manufacturing Group, University of Warwick, UK e-mail: [email protected]

Nigel Stallard
Division of Health Sciences, Warwick Medical School, University of Warwick, UK e-mail: [email protected]
Key words: functional Magnetic Resonance Imaging, global testing, dimension reduction, adaptive designs, predictive power
1 Introduction
Pharmacological fMRI (pharmafMRI) is an exciting new technique employing functional Magnetic Resonance Imaging (fMRI) to study brain activity under drug administration. The so-called Blood Oxygenation Level Dependent (BOLD) fMRI contrast, often used in pharmafMRI studies, measures local blood flow changes known to be associated with changes in brain activity. While becoming more established, pharmafMRI faces a number of challenges, some of which are statistical.
fMRI datasets are extremely high dimensional, with enormous spatial resolution (≈ 3 mm) and moderate temporal resolution (≈ 3 s). The typical fMRI dataset produced by a single scanning session consists of BOLD recordings acquired during a relatively short period of time (a few hundred time points) from around $10^5$ voxels (3-dimensional volume elements) throughout the brain. To handle such high dimensional datasets it is often appropriate to formulate specific regional hypotheses for the drug action and reduce the dimension of the data accordingly. The need for this type of analysis, which can provide regional summary measures of drug effect, is particularly acute in the typical pharmafMRI setting, in which, due to the high cost of fMRI scans, only a small number of subjects can be recruited.
Region of Interest (ROI) analysis can reduce an fMRI dataset to a relatively small number of ROI response summary measures expressing the local strength of the treatment effect across the selected brain regions. If both the definition of the ROI and the computation of the ROI response measures are cautiously conducted, a statistical analysis based on these ROI measures can potentially achieve high levels of sensitivity. We wish to go along this path and apply a multivariate test assessing the fundamental null hypothesis as to whether the new compound of interest changes the underlying brain activity in the selected ROI.
In previous work [5], we showed that tests based on a scalar linear combination of multivariate ROI responses can outperform fully multivariate methods, especially for the typically small sample sizes of fMRI studies. The decisive question for the former tests is the selection of the weights applied to the ROI responses. In his seminal contribution, O'Brien [6] uses equal weights for all coordinates, while Lauter [3] extracts the weighting vector from the data sums of products matrix. In Minas et al. [5] the weights are optimally derived based on prior information and pilot data.
Here, we develop an adaptive two-stage procedure where a weighting vector, initially chosen based on prior information, is optimally adapted at a subsequent interim analysis based on the collected first stage data. The first and the second weighting vector are applied to the first and second stage responses, respectively, to produce the stage-wise linear combination test statistics. A combination function, combining the test statistics of the two stages, is used to perform the final analysis.
Both weighting vectors are optimal in terms of the predictive power [7] of this two-stage test, which is analytically proved to control the type I error rate.
Finally, we perform power analysis of the proposed tests and power comparisons to alternative methods. Note that the performance of a test with such a high dimensional domain of the power function can be hard to interpret. We tackle this problem by proving that the high dimensional power domain can be re-expressed in a lower dimensional, easily interpretable space which still gives a complete description of the power. Using these results, our power analysis shows clearly those circumstances where our procedure outperforms standard single-stage and two-stage sequential procedures. We also apply our methods to ROI data of a pharmafMRI study in which our tests are shown to be far more powerful than the latter methods.
2 Formulation
In this section we formally introduce our problem. We start by giving a brief description of the methods for extracting ROI measures from fMRI data.

ROI measures are typically extracted from mass univariate General Linear Models (GLMs) applied to the preprocessed series of 3-dimensional fMRI images at voxel-by-voxel resolution (see figure 1). Estimates of the treatment effect in each voxel of each subject are first extracted from these GLMs and then averaged across the ROI, predefined based on either brain anatomy or brain function. The coordinates of the produced multivariate outcomes correspond to representative measures of the treatment effect within each ROI of each subject.
In our methods, we assume that the ROI responses of the $n_j$ subjects participating in stage $j$ of the study are independent multivariate Normal random variables

$$ Y_{ji} \sim N_K(\mu, \Sigma), \quad i = 1, 2, \dots, n_j, \quad j = 1, 2, \qquad (1) $$

with mean $\mu$ and covariance matrix $\Sigma$. Normality is typically an acceptable assumption for modeling ROI linear measures in fMRI [2].
We summarise the ROI responses using scalar linear combinations
Fig. 1 Typical steps of fMRI data analysis producing a multivariate ROI outcome. The preprocessed series of fMRI images is modeled at voxel-by-voxel resolution using mass univariate GLMs. Suitable estimates of parameter values ($\beta$) expressing the treatment effect in each voxel are first extracted from the GLM and then averaged across the predefined ROI.
$$ L_{ji} = \sum_{k=1}^{K} w_{jk} Y_{jik}, \qquad (2) $$

where $w_{jk}$ is the non-zero weight applied to the $k$-th ROI response, $k = 1, \dots, K$, of stage $j$. Using these linear combinations, we wish to test the global null hypothesis of no treatment effect across all ROI, $H_0 : \mu = 0$ $(= (0, 0, \dots, 0)^T)$, against the two-sided alternative $H_1 : \mu \neq 0$.
The stage-wise test statistics in our design are the linear combination $z$ and $t$ statistics

$$ Z_j = \frac{\bar{L}_j}{\sigma_j / n_j^{1/2}}, \qquad T_j = \frac{\bar{L}_j}{s_j / n_j^{1/2}}, \qquad (3) $$

for $\Sigma$ known or unknown, respectively. Here, $\sigma_j^2$, $\bar{L}_j$ and $s_j^2$ are the variance, sample mean and sample variance of the linear combination $L_j$, respectively. The two-sided $p$ values, $p_j$, $j = 1, 2$, may be obtained from the $z$ or $t$ statistics in (3). We use a two-stage design which instructs the investigators to:
1. stop the trial (after the first stage) and reject $H_0$ if $p_1 < \alpha_1$, or stop the trial without rejection if $p_1 > \alpha_0$;
2. continue to the second stage if $\alpha_1 \le p_1 \le \alpha_0$ and reject $H_0$ if $p_1 p_2 < c$.
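A minimal sketch of the stage-one computation (the dimension, sample size, weighting vector and simulated data below are hypothetical): apply the weights to the multivariate ROI responses, form the $t$ statistic of (3) with its two-sided $p$ value, and apply the stopping rules above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
K, n1 = 4, 12                      # hypothetical number of ROI and stage-1 subjects
alpha0, alpha1 = 1.0, 0.01         # stopping boundaries

w1 = np.ones(K)                    # stage-1 weighting vector (placeholder choice)
Y1 = rng.multivariate_normal(0.3 * np.ones(K), np.eye(K), size=n1)

L1 = Y1 @ w1                                        # scalar linear combinations L_1i
T1 = L1.mean() / (L1.std(ddof=1) / np.sqrt(n1))     # t statistic in (3)
p1 = 2 * stats.t.sf(abs(T1), df=n1 - 1)             # two-sided p value

# Stage-one decision rule:
if p1 < alpha1:
    decision = "stop and reject H0"
elif p1 > alpha0:
    decision = "stop without rejection"
else:
    decision = "continue to stage two"
print(decision, p1)
```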
Here, Fisher's product combination function [1], $p_1 p_2$, is used for the final analysis. We also consider alternative functions, including the inverse Normal combination function [4]. Under this design, the type I error rate is controlled at the nominal $\alpha$ level if the rejection probability of the two-stage $z$ or $t$ test,

$$ \mathrm{pr}(p_1 < \alpha_1) + \int_{\alpha_1}^{\alpha_0} \mathrm{pr}(p_1 p_2 < c \mid p_1)\, g(p_1)\, dp_1, \qquad g(\cdot)\ \text{the density of}\ p_1, \qquad (4) $$

is equal to $\alpha$ under the null hypothesis $H_0$.

We target maximizing the power of the above two-stage tests, i.e. the rejection probability in (4) under $H_1$, with respect to the weighting vectors $w_1$, $w_2$, while controlling the type I error rate. In other words, we wish to find the optimal direction in which the projection of the treatment effect vector produces optimal power.
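Under $H_0$ the stage-wise $p$ values are independent Uniform(0,1), so for Fisher's product rule with $c \le \alpha_1$ the rejection probability in (4) has the closed form $\alpha_1 + c(\log\alpha_0 - \log\alpha_1)$. A quick numerical check, using the boundary values quoted later in the Fig. 2 caption (the Monte Carlo sample size is an arbitrary choice), confirms the type I error is held at $\alpha = 0.05$:

```python
import numpy as np

alpha0, alpha1, c = 1.0, 0.01, 0.0087   # boundaries from the Fig. 2 caption

# Closed form of (4) under H0 (valid because c <= alpha1, so that
# pr(p1*p2 < c | p1) = c/p1 over the continuation region).
type1 = alpha1 + c * (np.log(alpha0) - np.log(alpha1))

# Monte Carlo check of the same quantity.
rng = np.random.default_rng(2)
p1, p2 = rng.uniform(size=(2, 1_000_000))
reject = (p1 < alpha1) | ((p1 >= alpha1) & (p1 <= alpha0) & (p1 * p2 < c))
print(type1, reject.mean())   # both close to 0.05
```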
3 Methods
Here, we develop the proposed adaptive two-stage testing procedure. We start by providing the optimal weighting vector for the two-stage $z$ and $t$ tests described above.
Theorem 1. Under the assumption in (1), the power of the above two-stage tests, i.e. the rejection probability in (4) under $H_1$, is maximized with respect to $w_1$ and $w_2$ if and only if the latter are both proportional to $\omega = \Sigma^{-1}\mu$.
The optimal weighting vector $\omega$ is unknown, and therefore we use the available information at the planning stage (prior) and at the interim stage (posterior) to select $w_1$, $w_2$.
Prior information $D_0$, elicited from previous studies and expert clinical opinion, is used to inform the following Normal and inverse-Wishart priors for $\mu$ and $\Sigma$, respectively:

$$ (\mu \mid \Sigma, D_0) \sim N_K(m_0, \Sigma/n_0), \qquad (\Sigma \mid D_0) \sim IW_{K \times K}(\nu_0, S_0^{-1}). \qquad (5) $$

Here, $m_0$ represents a prior estimate for $\mu$, $n_0$ the number of observations $m_0$ is based on, and $\nu_0$, $S_0$ respectively represent the degrees of freedom and scale matrix of the inverse-Wishart prior.
Under this Bayesian model, the posterior distributions, given the prior information $D_0$ and the first stage data $y_1$, have the same form as the prior distributions:

$$ (\mu \mid \Sigma, D_0, y_1) \sim N_K(m_1, \Sigma/(n_0 + n_1)), \qquad (\Sigma \mid D_0, y_1) \sim IW_{K \times K}(\nu_0 + n_1, S_1^{-1}), \qquad (6) $$

where the posterior estimates

$$ m_1 = \frac{n_0 m_0 + n_1 \bar{y}_1}{n_0 + n_1}, \qquad S_1 = S_0 + (n_1 - 1) S_{y_1} + \frac{n_0 n_1}{n_0 + n_1} (\bar{y}_1 - m_0)(\bar{y}_1 - m_0)^T \qquad (7) $$

can be thought of as "weighted averages" of the prior and first stage estimates of $\mu$ and $\Sigma$, respectively.
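The conjugate updates in (7) are straightforward to compute; a small sketch with hypothetical prior settings and simulated first stage data:

```python
import numpy as np

rng = np.random.default_rng(3)
K, n0, n1 = 3, 5, 10               # hypothetical dimensions and sample sizes

m0 = np.zeros(K)                   # prior mean estimate (hypothetical)
S0 = np.eye(K)                     # prior scale matrix (hypothetical)
y1 = rng.normal(size=(n1, K))      # simulated first stage data

ybar = y1.mean(axis=0)
Sy = np.cov(y1, rowvar=False)      # sample covariance with divisor n1 - 1

# Posterior estimates in (7)
m1 = (n0 * m0 + n1 * ybar) / (n0 + n1)
S1 = S0 + (n1 - 1) * Sy + (n0 * n1 / (n0 + n1)) * np.outer(ybar - m0, ybar - m0)
print(m1)
```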
We wish to optimally select the weighting vectors of the two stages. Here optimality is defined in terms of the predictive power of the test. Predictive power expresses "the chance, given the data so far, that the planned test rejects $H_0$ when the trial is completed". Given $D_0$, the predictive powers $B_{z,1}$ and $B_{t,1}$ of the two-stage $z$ and $t$ tests, respectively, are defined as

$$ \mathrm{pr}(p_1 < \alpha_1 \mid D_0) + \mathrm{pr}(p_1 \in [\alpha_1, \alpha_0],\ p_1 p_2 < c \mid D_0), \qquad (8) $$

and, if we continue to the second stage, the predictive powers $B_{z,2}$ and $B_{t,2}$, given the prior information $D_0$ and the first stage data $y_1$, are defined as

$$ \mathrm{pr}(p_1 p_2 < c \mid D_0, y_1), \qquad (9) $$

for $p_j$ corresponding to either the $z$ or $t$ statistics in (3), respectively.
Theorem 2. Under the assumptions (1) and (6), the first and second stage predictive powers of the $z$ test, $B_{z,1}$ and $B_{z,2}$, are maximized with respect to $w_1$, $w_2$, respectively, if the latter are proportional to $w_{z,1} = \Sigma^{-1} m_0$ and $w_{z,2} = \Sigma^{-1} m_1$, respectively. Further, for large $\nu_0$, i.e. $\nu_0 \to \infty$, the weighting vectors $w_{t,1} = S_0^{-1} m_0$ and $w_{t,2} = S_1^{-1} m_1$ maximise the predictive power functions $B_{t,1}$ and $B_{t,2}$.
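Given the prior (or posterior) summaries, the weighting vectors of Theorem 2 are obtained by solving a linear system rather than forming an explicit inverse; a sketch with hypothetical values of $m_0$ and $S_0$ (the stage-two vector $w_{t,2} = S_1^{-1} m_1$ is computed in exactly the same way):

```python
import numpy as np

# Hypothetical prior summaries (K = 3)
m0 = np.array([0.2, 0.1, 0.3])
S0 = np.array([[1.0, 0.3, 0.0],
               [0.3, 1.0, 0.2],
               [0.0, 0.2, 1.0]])

# Theorem 2 (large nu_0): stage-1 weights proportional to S0^{-1} m0.
w_t1 = np.linalg.solve(S0, m0)
print(w_t1)
```

Any positive rescaling of the weights leaves the test statistics in (3) unchanged, so only the direction of $w_{t,1}$ matters.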
We can now describe the proposed adaptive two-stage $z$ and $t$ tests. These follow the two-stage design described earlier, with the first and second stage weighting vectors of the stage-wise $z$ and $t$ statistics being equal to the vectors $w_{z,1}, w_{z,2}$ and $w_{t,1}, w_{t,2}$, respectively. These tests are power optimal based on the collected information. We can also prove that they control the type I error rate.
4 Power analysis
The design variables that need to be considered for the analysis of the power function of the above $z$ and $t$ tests are: (i) the stopping boundaries $\alpha_0$, $\alpha_1$ and $c$; (ii) the sample sizes $n_0$, $n_1$ and $n_2$ (and $\nu_0$); (iii) the parameters $\mu$ and $\Sigma$; and (iv) the prior estimate(s) $m_0$ (and $S_0$). While the variables in (i) and (ii) are scalar, those in (iii) and (iv) are high dimensional ($\mathbb{R}^K \times \mathbb{R}^{K \times K} \times \mathbb{R}^K\ (\times\ \mathbb{R}^{K \times K})$). Without any dimensionality reduction, it would be challenging to get a full picture and explain the power performance of our tests. However, we can prove that for the $z$ test, (iii) and (iv) can be replaced by: (a) the Mahalanobis distance $(\mu^T \Sigma^{-1} \mu)^{1/2}$ of the null $N_K(0, \Sigma)$ to the alternative $N_K(\mu, \Sigma)$ distribution, expressing the strength of the treatment effect, and (b) the angle $\theta$ between the transformed optimal weighting vector $\tilde{\omega} = \Sigma^{-1/2}\mu$ and the transformed selected first stage weighting vector $\tilde{w}_{z,1} = \Sigma^{-1/2} m_0$ (both transformations correspond to left multiplication by $\Sigma^{1/2}$). Considering the $t$ test, the angular distance in (b) is replaced by one expressed in terms of easily interpretable vectors in $[0, \pi/2]^K \times [0, \pi/2]^K \times \mathbb{R}_+^K$. In figure 2, we illustrate how these results can be used to compare our procedure to standard testing procedures. For small sample sizes, the power of the single-stage $t$ test is larger (smaller) than the power of the
Fig. 2 Simulation-based approximation of the power, $\beta_t$, of the single-stage (green, dashed) and adaptive (blue, solid) linear combination $t$ test, as well as of Hotelling's $T^2$ test (red, dotted), plotted against the total sample size $n_T$. The angle $\theta$ between $\tilde{\omega}$ and the transformed selected weighting vectors $\tilde{w}$ and $\tilde{w}_{z,1}$ of the single-stage $t$ test and of the first stage of the adaptive $t$ test, respectively, is taken equal to 0, 15, 30, 45, 60, 75 and 90 degrees. Further, $\alpha_0 = 1$, $\alpha_1 = 0.01$, $c = 0.0087$ ($\alpha = 0.05$), $K = 15$, $n_0 = 5$, $\nu_0 = 4$, $f = n_1/n_T = 0.5$ and $D_1 = \Sigma^{-1/2} S_0 \Sigma^{-1/2} = I$.
adaptive $t$ test if the selected weighting vector is relatively close (distant) to the optimal weighting vector. For relatively large sample sizes, in contrast to the single-stage test, the adaptive $t$ test reaches high power levels even for a first stage weighting vector orthogonal ($\theta = 90$ degrees) to the optimal one. For increasing $n_T$, with all other design variables remaining fixed, the angle $\theta$ for which the power of Hotelling's $T^2$ test (applicable only for $n_T > K$) equals the power of the $t$ tests is decreasing.
4.1 Application to a pharmafMRI study
We use the sample mean and sample covariance matrix (see table 1) of ROI data extracted from a GlaxoSmithKline pharmafMRI study ($K = 11$, $n_T = 13$) to perform power comparisons. As we can see in table 1, effect sizes differ across ROI and generally high correlations are observed. Further, the prior estimates presented are fairly poor, resulting in an angle $\theta$ between $\tilde{\omega}$ and $\tilde{w}_{t,1}$ equal to 67 degrees. However, even for these prior estimates and such small sample sizes, the adaptive $t$ test might be considered sufficiently powered ($\beta_t = 0.82$). This is in contrast to standard single-stage tests, such as Hotelling's $T^2$, OLS [6], SS and PC [3] $t$ tests ($\beta_{T^2} = 0.30$, $\beta_{OLS} = 0.13$, $\beta_{SS} = 0.13$, $\beta_{PC} = 0.14$), as well as their corresponding sequential two-stage versions ($\beta^s_{OLS} = 0.10$, $\beta^s_{SS} = 0.09$, $\beta^s_{PC} = 0.10$; the sequential Hotelling's $T^2$ test is not applicable for $n_T = 13$), which give very low power values. Note that for
Table 1 Means (row 1), variances (row 3) and correlations (upper triangle of the matrix in rows 5-15), and the corresponding prior estimates (rows 2, 4 and lower triangle of the matrix in rows 5-15), of the ROI data of the sample ($n_T = 13$) of a GSK pharmafMRI study. The ROI are: Anterior Cingulate (AC), Atlas Amygdala (A), Caudate (C), Dorsolateral Prefrontal Cortex (DLPFC), Globus Pallidus (GP), Insula (I), Orbitofrontal Cortex (OFC), Putamen (P), Substantia Nigra (SA), Thalamus (T), Ventral Striatum (VS).

   ROI      AC     A      C      DLPFC  GP     I      OFC    P      SA     T      VS
1  µ_k     -0.01   0.06  -0.08  -0.08  -0.14  -0.02  -0.08  -0.06  -0.10  -0.10  -0.13
2  m_{0,k}  0      0.10  -0.10  -0.10  -0.15   0     -0.15   0     -0.10  -0.10  -0.15
3  σ_k      0.11   0.11   0.03   0.05   0.11   0.08   0.13   0.15   0.10   0.11   0.10
4  s_{0,k}  0.15   0.10   0.02   0.10   0.10   0.10   0.15   0.15   0.10   0.10   0.10
5  AC       1      0.70   0.87   0.88   0.73   0.89   0.66   0.81   0.26   0.95   0.70
6  A        0.70   1      0.54   0.61   0.72   0.77   0.65   0.68   0.59   0.68   0.66
7  C        0.70   0.50   1      0.89   0.72   0.87   0.47   0.80   0.27   0.90   0.74
8  DLPFC    0.70   0.70   0.70   1      0.71   0.76   0.73   0.77   0.27   0.87   0.62
9  GP       0.70   0.70   0.70   0.70   1      0.86   0.51   0.90   0.54   0.70   0.90
10 I        0.70   0.70   0.70   0.70   0.70   1      0.45   0.85   0.46   0.86   0.84
11 OFC      0.50   0.50   0.50   0.70   0.50   0.50   1      0.44   0.09   0.65   0.30
12 P        0.70   0.70   0.70   0.70   0.70   0.70   0.50   1      0.49   0.82   0.89
13 SA       0.50   0.70   0.30   0.50   0.50   0.50   0.50   0.30   1      0.30   0.55
14 T        0.70   0.70   0.70   0.70   0.70   0.70   0.50   0.70   0.50   1      0.74
15 VS       0.70   0.50   0.70   0.70   0.70   0.70   0.50   0.70   0.50   0.70   1
improved prior estimates (smaller angles), the power of the adaptive $t$ test can be increased further.
5 Discussion
The formulation of specific regional hypotheses for drug action and the associated dimensionality reductions are crucial for the further establishment of pharmafMRI. As we illustrate in our methods, ROI analysis combined with multivariate methods can be successfully used to answer the fundamental question as to whether the drug modulates the brain activity over the regions of greatest interest for a particular study. We show that reduction of the ROI responses to a scalar linear combination may substantially increase sensitivity compared to fully multivariate methods on ROI responses, without any cost in terms of specificity. For the latter reduction, we propose deriving the weights of the linear combination by exploiting the available prior information and allowing for data-dependent adaptation at an interim analysis. These weights are optimal in terms of the predictive power given the available information at each selection time. Further, we show how the high dimensional power function domain can be reduced to a lower dimensional, easily interpretable space which allows us to show clearly under which circumstances the improvement over single-stage and sequential designs is achieved. We finally show that our methods can outperform standard single-stage and sequential two-stage multivariate tests in a pharmafMRI study.
References
1. Bauer, P., Kohne, K.: Evaluation of experiments with adaptive interim analyses. Biometrics 50, 1029–1041 (1994)
2. Friston, K., Ashburner, J., Kiebel, S., Nichols, T., Penny, W.: Statistical parametric mapping: the analysis of functional brain images. Elsevier/Academic Press, Amsterdam; Boston (2007)
3. Lauter, J.: Exact t and F tests for analyzing studies with multiple endpoints. Biometrics 52, 964–970 (1996)
4. Lehmacher, W., Wassmer, G.: Adaptive sample size calculations in group sequential trials. Biometrics 55, 1286–1290 (1999)
5. Minas, G., Rigat, F., Nichols, T.E., Aston, J.A.D., Stallard, N.: A hybrid procedure for detecting global treatment effects in multivariate clinical trials: theory and applications to fMRI studies. Statist. Med. 31, 253–268 (2012)
6. O'Brien, P.C.: Procedures for comparing samples with multiple endpoints. Biometrics 40, 1079–1087 (1984)
7. Spiegelhalter, D.J., Freedman, L.S., Blackburn, P.: Monitoring clinical trials: conditional or predictive power? Control. Clin. Trials 7, 8–17 (1986)
Distance-Based Statistics for Covariance Operators in Functional Data Analysis
Davide Pigoli
Abstract The statistical analysis of covariance operators in a functional data analysis setting is considered. Several suitable distances for comparing covariance operators are presented and, in particular, the problem of estimating the average covariance operator among different groups is addressed. Finally, an applied problem in which this methodology has proved useful is introduced, namely, exploring phonetic relationships among Romance languages by looking at covariance operators across frequencies.
Key words: Trace class Operators, Functional Data, Linguistic Data
1 Introduction
The aim of this work is to set up a framework for the comparison of covariance operators on $L^2(\Omega)$, $\Omega \subseteq \mathbb{R}$. This problem arises in Functional Data Analysis when the features of curve populations lie in their covariance structure rather than in the mean function. In Section 2 some definitions and properties of operators on $L^2(\Omega)$ are recalled. Section 3 illustrates suitable distances to measure differences between covariance operators and explores their properties. In Section 4, the application of the proposed methodology to a linguistic problem is introduced and some preliminary results are shown.
Davide Pigoli
MOX - Department of Mathematics, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy. e-mail: [email protected]
2 Some remarks on compact operators on L2(Ω)
In this section we review some properties and definitions that will be of use when describing our proposed methodology. More details and proofs can be found, e.g., in Zhu (2007).
Definition 1. Let $B_1$ be the closed unit ball in $L^2(\Omega)$, i.e. the set of all $f \in L^2(\Omega)$ such that $||f||_{L^2(\Omega)} \le 1$. A bounded linear operator $T : L^2(\Omega) \to L^2(\Omega)$ is compact if the closure of $T(B_1)$ is compact in the norm of $L^2(\Omega)$. A bounded linear operator $T$ is self-adjoint if $T = T^*$.
An important property of compact operators on $L^2(\Omega)$ is the existence of a canonical decomposition. This means that two orthonormal bases $\{u_k\}_k$, $\{v_k\}_k$ exist so that

$$ T f = \sum_{k=1}^{+\infty} \sigma_k \langle f, v_k \rangle u_k, $$

or, equivalently, $T v_k = \sigma_k u_k$, where $\langle \cdot, \cdot \rangle$ indicates the scalar product in $L^2(\Omega)$ and $\{\sigma_k\}_k$ is called the sequence of singular values of $T$. If the operator is self-adjoint, a basis $\{v_k\}_k$ exists such that

$$ T f = \sum_{k=1}^{+\infty} \lambda_k \langle f, v_k \rangle v_k, $$

or, equivalently, $T v_k = \lambda_k v_k$, and $\{\lambda_k\}_k$ is called the sequence of eigenvalues of $T$.

A compact operator $T$ is said to be trace class if

$$ \mathrm{trace}(T) := \sum_{k=1}^{+\infty} \langle T e_k, e_k \rangle < +\infty $$

for an orthonormal basis $\{e_k\}_k$. It has been proved that this definition is independent of the choice of the basis and

$$ \mathrm{trace}(T) = \sum_{k=1}^{+\infty} \sigma_k, $$

where $\{\sigma_k\}_k$ are the singular values of $T$. We indicate with $S(L^2(\Omega))$ the space of trace class operators on $L^2(\Omega)$.
A compact operator $T$ is said to be Hilbert-Schmidt if its Hilbert-Schmidt norm is bounded, i.e.

$$ ||T||^2_{HS} = \mathrm{trace}(T^* T) < +\infty. $$

This is a generalization of the Frobenius norm for finite-dimensional matrices.
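Both quantities are easy to approximate once an operator's kernel is discretized on a grid. A sketch using the Brownian-motion covariance kernel $k(s,t) = \min(s,t)$ on $[0,1]$ (an illustrative, analytically tractable case: trace $= \int_0^1 t\,dt = 1/2$ and $||T||^2_{HS} = \int_0^1\!\int_0^1 \min(s,t)^2\,ds\,dt = 1/6$):

```python
import numpy as np

# Midpoint grid on [0, 1] with spacing h = 1/n.
n = 2000
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
Kmat = np.minimum.outer(t, t)          # discretized kernel k(s, t) = min(s, t)

trace = np.sum(np.diag(Kmat)) * h      # Riemann sum for ∫ k(t, t) dt ≈ 1/2
hs_sq = np.sum(Kmat**2) * h**2         # Riemann sum for ∫∫ k(s, t)^2 ds dt ≈ 1/6

print(trace, hs_sq)
```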
Definition 2. A bounded linear operator $R$ on $L^2(\Omega)$ is said to be unitary if

$$ ||R f||_{L^2(\Omega)} = ||f||_{L^2(\Omega)} \quad \forall f \in L^2(\Omega). $$

We indicate with $SO(L^2(\Omega))$ the space of unitary operators on $L^2(\Omega)$.

Let now $f$ be a random variable which takes values in $L^2(\Omega)$, $\Omega \subseteq \mathbb{R}$, such that $E[||f||^2_{L^2(\Omega)}] < +\infty$. Then the covariance operator $C_f$, with kernel $c_f(s,t) = \mathrm{cov}(f(s), f(t))$, is a trace class compact operator on $L^2(\Omega)$ (see Bosq, 2000, Section 1.5).
3 Distances between covariance operators
In this section novel distances to compare trace class compact operators are proposed. These are generalizations to the functional setting of metrics that have proved useful in the case of positive definite matrices.
Distance between kernels in $L^2(\Omega \times \Omega)$

Every covariance operator $S$ on $L^2(\Omega)$ can be associated with an integral kernel $s(x,y) \in L^2(\Omega \times \Omega)$, so that

$$ S f = \int_\Omega s(x,y) f(y)\, dy, \quad \forall f \in L^2(\Omega). $$
Thus, a distance between covariance operators can be naturally defined through the distance between their kernels in $L^2(\Omega \times \Omega)$:

$$ d_L(S_1, S_2) = ||s_1 - s_2||_{L^2(\Omega \times \Omega)} = \sqrt{\int_\Omega \int_\Omega (s_1(x,y) - s_2(x,y))^2\, dx\, dy}. $$

This distance is correctly defined, since it inherits all the properties of the distance in the Hilbert space $L^2(\Omega \times \Omega)$. However, it does not exploit in any way the particular structure of covariance operators and therefore may not highlight the significant differences between covariance structures.
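Discretizing the kernels on a common grid, $d_L$ is just a Riemann-sum approximation of the double integral. An illustrative check with $s_1(x,y) = \min(x,y)$ and $s_2 = 2 s_1$ (hypothetical kernels), for which $d_L = (\int\!\int \min(x,y)^2\, dx\, dy)^{1/2} = (1/6)^{1/2}$:

```python
import numpy as np

# Two discretized covariance kernels on a common midpoint grid over [0, 1].
n = 500
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
s1 = np.minimum.outer(t, t)    # Brownian-motion kernel
s2 = 2.0 * s1                  # a scaled version of it

# d_L(S1, S2) = sqrt( ∫∫ (s1 - s2)^2 dx dy ), approximated by a Riemann sum.
d_L = np.sqrt(np.sum((s1 - s2) ** 2) * h**2)

print(d_L)   # ≈ sqrt(1/6) ≈ 0.408, since s1 - s2 = -min(x, y)
```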
Spectral distance

A second possibility is to see the covariance operator as an element of $L(L^2(\Omega))$, the space of bounded linear operators on $L^2(\Omega)$. It follows that the distance between $S_1$ and $S_2$ can be defined as the operator norm of their difference. We recall that the norm of a self-adjoint bounded linear operator on $L^2(\Omega)$ is defined as

$$ ||T||_{L(L^2(\Omega))} = \sup_{v \in L^2(\Omega),\, v \neq 0} \frac{|\langle T v, v \rangle|}{||v||^2_{L^2(\Omega)}} $$

and for a covariance operator it coincides with the absolute value of the first (i.e. largest) eigenvalue. Thus,

$$ d_L(S_1, S_2) = ||S_1 - S_2||_{L(L^2(\Omega))} = |\lambda_1|, $$

where $\lambda_1$ is the first eigenvalue of the operator $S_1 - S_2$. This distance generalizes the matrix spectral norm, which is often used in the finite-dimensional case (see, e.g., El Karoui, 2008). It takes into account the spectral structure of the covariance operators, but it seems somehow restrictive to focus only on the behavior of the first mode of variation.
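After discretization, $S_1 - S_2$ becomes a symmetric matrix (kernel values times the grid spacing) whose extreme eigenvalue approximates the spectral distance. For the same hypothetical pair as above, $S_1 - S_2$ is minus the Brownian-motion operator, whose largest eigenvalue is $4/\pi^2$:

```python
import numpy as np

n = 1000
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
s1 = np.minimum.outer(t, t)    # Brownian-motion kernel
s2 = 2.0 * s1

# Discretize the operator S1 - S2: kernel values times the grid spacing.
A = (s1 - s2) * h

# Spectral distance = largest absolute eigenvalue of the symmetric difference.
lam = np.linalg.eigvalsh(A)
d_spec = np.abs(lam).max()

print(d_spec)   # ≈ 4/pi^2 ≈ 0.405, the top eigenvalue of the min(s, t) operator
```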
Procrustes size-and-shape distance

In Dryden et al. (2009), a Procrustes size-and-shape distance is proposed to compare two positive definite matrices. Our aim is to generalize this distance to the case of covariance operators on $L^2(\Omega)$. Let $S_1$ and $S_2$ be two trace class covariance operators on $L^2(\Omega)$. We define the Procrustes distance in $S(L^2(\Omega))$ as

$$ d_P(S_1, S_2)^2 = \inf_{R \in SO(L^2(\Omega))} ||L_1 - L_2 R||^2_{HS} = \inf_{R \in SO(L^2(\Omega))} \mathrm{trace}\big((L_1 - L_2 R)^*(L_1 - L_2 R)\big), $$

where $||\cdot||_{HS}$ indicates the Hilbert-Schmidt norm on $L^2(\Omega)$ and the $L_i$ are such that $S_i = L_i^* L_i$. The evaluation of the Procrustes distance asks for the solution of a minimization problem. However, an analytical solution is available and the distance therefore has an expression based on the canonical decomposition of the operator $L_2^* L_1$. The unitary operator $R$ that minimizes $||L_1 - L_2 R||^2_{HS}$ is defined by

$$ R v_k = u_k, \quad \forall k = 1, \dots, +\infty, $$

where $\{u_k\}_k$, $\{v_k\}_k$ are the orthonormal bases in the canonical decomposition of $L_2^* L_1$.
Proposition 1. The Procrustes distance in $S(L^2(\Omega))$ satisfies

$$ d_P(S_1, S_2)^2 = ||L_1||^2_{HS} + ||L_2||^2_{HS} - 2 \sum_{k=1}^{+\infty} \sigma_k, $$

where $\{\sigma_k\}_k$ are the singular values of the compact operator $L_2^* L_1$.
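In finite dimensions (e.g. after discretization), Proposition 1 reduces to a computation with matrix square roots and singular values. The sketch below uses two hypothetical 2x2 positive definite matrices standing in for covariance operators, and checks the closed form against a brute-force search over rotations:

```python
import numpy as np
from scipy.linalg import sqrtm

# Hypothetical SPD matrices standing in for discretized covariance operators.
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, 0.2], [0.2, 3.0]])

L1 = sqrtm(S1).real    # symmetric square roots, so that S_i = L_i* L_i
L2 = sqrtm(S2).real

# Proposition 1: d_P^2 = ||L1||_HS^2 + ||L2||_HS^2 - 2 * (sum of singular
# values of L2* L1), i.e. the nuclear norm of L2.T @ L1.
sigma = np.linalg.svd(L2.T @ L1, compute_uv=False)
dP2 = np.trace(S1) + np.trace(S2) - 2.0 * sigma.sum()

# Sanity check: direct search over 2x2 rotation matrices R.
angles = np.linspace(0.0, 2.0 * np.pi, 2001)
best = min(
    np.sum((L1 - L2 @ np.array([[np.cos(a), -np.sin(a)],
                                [np.sin(a),  np.cos(a)]])) ** 2)
    for a in angles
)
print(dP2, best)   # the two values agree closely
```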
Square root operator distance

We can also generalize the square root matrix distance (see Dryden et al., 2009) to compare $S_1, S_2 \in S(L^2(\Omega))$. Since $S_i^{1/2}$ is a Hilbert-Schmidt operator,

$$ d_R(S_1, S_2) = ||S_1^{1/2} - S_2^{1/2}||_{HS}. $$

This is a special case of the Procrustes distance above, when no unitary transformation is allowed.
3.1 Averaging of covariance operators
Once appropriate distances for covariance operators have been defined, many statistical tools can be developed by suitably generalizing traditional methods based on the Euclidean distance. For the sake of brevity, only the estimation of the average from a sample of covariance operators is presented here. Let $S_1, \dots, S_g$ be the covariance operators for $g$ different groups. Then a possible estimator of the common covariance operator $\Sigma$ is

$$ \hat{\Sigma} = \frac{1}{n_1 + \dots + n_g} (n_1 S_1 + \dots + n_g S_g). $$
However, this formula arises from the minimization of squared Euclidean deviations, weighted with the number of observations. If we choose a different distance to compare covariance operators, it is more coherent to average the covariance operators with respect to the chosen distance. A least squares estimator for $\Sigma$ can be defined for a general distance $d(\cdot,\cdot)$:

$$ \hat{\Sigma} = \arg\min_S \sum_{i=1}^g n_i\, d(S, S_i)^2. $$

The actual computation of the sample Frechet mean $\hat{\Sigma}$ depends on the choice of the distance $d(\cdot,\cdot)$. In general it asks for the solution of a high dimensional minimization problem, but some distances allow for an analytic solution, while for others efficient minimization algorithms are available. Concerning the kernel distance, it is easy to see that the kernel of the Frechet average is the weighted average of the kernels $s_1(x,y), \dots, s_g(x,y)$ of the data, i.e.

$$ \hat{\sigma}(x,y) = \frac{1}{n_1 + \dots + n_g} (n_1 s_1(x,y) + \dots + n_g s_g(x,y)). $$
For the square root distance, the following result can be proved.

Proposition 2.

$$ \hat{\Sigma} = \arg\min_S \sum_{i=1}^g n_i\, d_R(S, S_i)^2 = \left( \frac{1}{G} \sum_{i=1}^g n_i S_i^{1/2} \right)^2, \qquad (1) $$

where $G = n_1 + \dots + n_g$.
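Proposition 2 gives the square-root-distance Frechet mean in closed form. A finite-dimensional sketch (the group covariances and sample sizes below are hypothetical), with a perturbation check that the closed form is not beaten by nearby candidates:

```python
import numpy as np
from scipy.linalg import sqrtm

# Hypothetical group covariances and sample sizes.
S = [np.array([[1.0, 0.3], [0.3, 1.0]]),
     np.array([[2.0, 0.0], [0.0, 0.5]]),
     np.array([[1.5, 0.4], [0.4, 1.2]])]
n = [10, 20, 15]
G = sum(n)

# Closed form (1): square of the weighted average of operator square roots.
avg_root = sum(ni * sqrtm(Si).real for ni, Si in zip(n, S)) / G
Sigma_hat = avg_root @ avg_root

# Weighted least-squares objective in the square root distance.
def objective(M):
    R = sqrtm(M).real
    return sum(ni * np.sum((R - sqrtm(Si).real) ** 2) for ni, Si in zip(n, S))

print(objective(Sigma_hat), objective(Sigma_hat + 0.05 * np.eye(2)))
```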
The Procrustes mean can be obtained by an adaptation of the algorithm proposed in Gower (1975) or Ten Berge (1977). This works very well in practice if the algorithm is initialized with the estimate provided by (1).
4 Exploring phonetic relationships among Romance languages
The traditional way of exploring relationships among languages consists in looking at textual similarity. However, this often neglects the phonetic characteristics of the languages. Here a novel approach is proposed to compare languages on the basis of their phonetic structure.
In particular, people speaking different languages (French, Italian, Portuguese, Iberian Spanish and American Spanish) are recorded while pronouncing the words corresponding to the numbers from one to ten in each language. The output of the recording, for each word and each speaker, consists of the intensity of the sound over time and frequencies.
Fig. 1 Frechet average along time of the covariance operators of the log-spectrogram among frequencies for five Romance languages, using the square root distance.
The aim is to use these data to explore linguistic hypotheses concerning the relationships among different languages. However, while many possible phonetic features may be of interest, it has been shown that covariance operators associated with frequencies can provide some phonetic insight (Hajipantelis et al., 2012). Frequency covariances can indeed summarize phonetic information for the language, disregarding particular characteristics of speakers and words. For the scope of this work, we focus on the covariance operators among frequencies obtained from the log-spectrogram, with estimates being obtained using the sample of all speakers of the language. We consider different time points as replicates of the same covariance operator among frequencies. It is clear that this is a major simplification of the rich structure in the data, but it already leads to some interesting conclusions. Here some
preliminary results are reported, focusing on the covariance operator for the word "one".
Fig. 2 Left: distance matrix among the Frechet averages of Fig. 1, obtained with the square root distance. Right: dendrogram obtained from the distance matrix using average linkage, where I=Italian, F=French, P=Portuguese, SA=American Spanish, SI=Iberian Spanish.
Fig. 1 shows the covariance operator estimated for each language via Fréchet averaging along time, using the square root distance, for the word "one". Fig. 2 shows the dissimilarity matrix among the average covariance operators for each language and the corresponding dendrogram, while Fig. 3 compares a two-dimensional projection of the data, obtained with classical (metric) multidimensional scaling, with the map coming from linguistic experts, which encodes historical and geographical relationships among the languages. Indeed, it seems that focusing on the covariance operator captures some important information about the languages. There is an overall similarity between the map predicted by the experts and the relationships among the covariance structures. However, some unexpected features may suggest new research lines. For example, it is worth noticing that the Portuguese covariance structure is at a considerable distance from all the others, thus highlighting particular linguistic influences on that language.
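To make the metric concrete: in the finite-dimensional (discretized) case, the square root distance between two covariance matrices is the Frobenius (Hilbert-Schmidt) norm between their symmetric square roots, and the corresponding Fréchet average has a closed form: square the average of the square roots. A minimal sketch, assuming symmetric positive semi-definite inputs (function names are illustrative, not from the paper):

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root via eigendecomposition (S assumed PSD)."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def sqrt_distance(S1, S2):
    """Square root distance: d(S1, S2) = || S1^{1/2} - S2^{1/2} ||_F."""
    return np.linalg.norm(sym_sqrt(S1) - sym_sqrt(S2), 'fro')

def frechet_mean_sqrt(covs):
    """Fréchet mean under the square root distance:
    average the square roots, then square the average."""
    R = np.mean([sym_sqrt(S) for S in covs], axis=0)
    return R @ R
```

The pairwise distances produced by `sqrt_distance` are exactly what feeds the dendrogram of Fig. 2 and the metric multidimensional scaling of Fig. 3.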
5 Conclusions
In this work the problem of dealing with covariance operators has been addressed. The choice of an appropriate metric is crucial in the analysis of covariance operators. Here some suitable metrics have been proposed and their properties have been highlighted. On the basis of an appropriate metric, statistical methods can be developed to deal with covariance operators in the functional data analysis framework. The notable case of estimating the average from a sample of covariance operators is
Davide Pigoli
Fig. 3 Left: Map of languages built by linguistic experts using historical and geographical information. Some languages for which phonetic data are not available are also shown. Right: Two-dimensional metric multidimensional scaling. The extreme behavior of the Portuguese language leads to a slightly different configuration. Labels correspond to languages: I=Italian, F=French, P=Portuguese, SA=American Spanish, SI=Iberian Spanish.
illustrated. Moreover, in many applications the covariance operator itself is the object of interest, as illustrated by the linguistic data of Section 4. Using the square root distance between covariance operators among frequencies, some significant phonetic features of Romance languages have been found.
References
1. Bosq, D.: Linear processes in function spaces. Springer, New York (2000)
2. Dryden, I.L., Koloydenko, A., Zhou, D.: Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann. Appl. Stat. 3, 1102-1123 (2009)
3. El Karoui, N.: Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat. 36, 2717-2756 (2008)
4. Gower, J.C.: Generalized Procrustes analysis. Psychometrika 40, 33-50 (1975)
5. Hadjipantelis, P.Z., Aston, J.A.D., Evans, J.P.: Characterizing fundamental frequency in Mandarin: a functional principal component approach utilizing mixed effect models. J. Acoust. Soc. Am. In press (2012)
6. Ten Berge, J.M.F.: Orthogonal Procrustes rotation for two or more matrices. Psychometrika 42, 267-276 (1977)
7. Zhu, K.: Operator theory in function spaces (2nd ed.). American Mathematical Society (2007)
Clustering Multivariate Longitudinal Data: Hidden Markov of Factor Analyzers
Antonello Maruotti and Francesca Martella
Abstract Parsimonious Hidden Markov of Factor Analyzers models are developed by using a modified factor analysis covariance structure. This framework can be seen as an extension of the Parsimonious Gaussian mixture models (PGMMs), accounting for heterogeneity in a longitudinal setting. In particular, a class of 12 models is introduced, and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class includes parsimonious models that have not previously been developed. The performance of these models is discussed on benchmark gene expression data. The results are encouraging and deserve further discussion.
Key words: Clustering Longitudinal Data, Factor Analyzers, Hidden Markov Models, Dimensionality reduction
1 Introduction
In a longitudinal setting, repeated measurements are collected on the same (independent) units over several periods of time. Standard methods for longitudinal data analysis focus on the dependence of the variables on covariates, serial dependence, and heterogeneity among the individuals/units (see, e.g., [9]). Growing interest has recently been devoted to appropriately accounting for heterogeneity across the individual sequences (see, e.g., [16]). To capture heterogeneity in a longitudinal setting, it is common to assume the existence of a latent process, driving and characterizing
Antonello Maruotti
Dipartimento di Istituzioni Pubbliche, Economia e Società, Università di Roma Tre, Via Gabriello Chiabrera, 199 - 00145 Roma, e-mail: [email protected]
Francesca Martella
Dipartimento di Scienze Statistiche, Sapienza Università di Roma, P.le Aldo Moro, 5 - 00185 Roma, e-mail: [email protected]
different data generation mechanisms ([6, 10, 12] provide interesting reviews on this topic in different contexts).
Recently, [16] introduced a model-based clustering technique for clustering longitudinal data in a finite mixture framework. Each longitudinal sequence can be considered as a single object or entity belonging to one of the mixture components, and all the individual sequences within the same component are characterized by the same generating mechanism. Other approaches have been developed by using hierarchical models ([4, 5, 11, 1, 13]), at the cost of an increasing computational burden.
In the following we consider a multivariate Gaussian hidden Markov model (HMM; see [23] for a general introduction to HMMs), which can be seen as an extension of the finite mixture model [14] in which individuals are allowed to move between the (hidden) components during the period of observation. Starting from the Parsimonious Gaussian mixture models (PGMMs) introduced by [15] and further extended by [17], we introduce a hidden Markov of factor analyzers by specifying a modified factor analysis covariance structure, including the possibility of imposing constraints, which leads to a family of 12 models, including parsimonious ones.

Parameter estimates can be obtained by an Alternating Expectation Conditional Maximization algorithm (AECM, [18]) in an HMM framework, by adapting the well-known forward-backward algorithm ([3, 21]). The hidden Markov framework of factor analyzers is illustrated in the clustering of a representative dataset in the microarray literature: the yeast galactose data of [8]. The paper is organized as follows. Section 2 introduces the model, specifying some preliminaries on HMMs and providing extensions of the basic HMM in a multivariate clustering setting. Computational details are briefly described in Section 3, while Section 4 provides an illustrative example of the proposed models.
2 Model-based clustering of longitudinal data
In this section we first introduce the basic notation and the main assumptions on HMMs. Afterwards, we introduce in detail the hidden Markov of factor analyzers, pointing out the considered covariance structures and the computational aspects related to the estimation of the model parameters.
2.1 Hidden Markov models
In a basic HMM for longitudinal data, the existence of two processes is assumed: an unobservable finite-state first-order Markov chain $S_{it}$, $i = 1,\dots,n$, $t = 0,\dots,T$, with state space $\mathcal{S} = \{1,\dots,m\}$, and an observed process $Y_{it} = (Y_{it1}, Y_{it2}, \dots, Y_{itJ})$, where $Y_{itj}$ denotes the $j$-th response variable for individual $i$ at time $t$ (similarly for $S_{it}$).
We assume that the distribution of $Y_{it}$ depends only on $S_{it}$; specifically, the $Y_{it}$, $t = 1,\dots,T$, are conditionally independent given the $S_{it}$:

$$f(Y_{it} = y_{it} \mid Y_{i0} = y_{i0}, \dots, Y_{it-1} = y_{it-1}, S_{i0} = s_{i0}, \dots, S_{it} = s_{it}) = f(Y_{it} = y_{it} \mid S_{it} = s_{it}) \quad (1)$$
Typically it is assumed that the state-dependent distributions, i.e. the distributions of $Y_{it}$ given $S_{it}$, come from a parametric family of continuous or discrete distributions. Thus, the unknown parameters in an HMM involve both the parameters of the Markov chain and those of the state-dependent distributions of the random variables $Y_{it}$. In particular, the parameters of the Markov chain are the elements of the transition probability matrices $Q = \{q_{itlk}\}$, where $q_{itlk} = \Pr(S_{it} = k \mid S_{it-1} = l)$, $l,k \in \mathcal{S}$, is the probability that individual $i$ visits state $k$ at time $t$ given that at time $t-1$ he/she was in state $l$, and the initial probabilities $\delta = \{\delta_{il}\}$, where $\delta_{il} = \Pr(S_{i0} = l)$, i.e. the probability of being in state $l$ at time 0. The simplest model in this framework is the homogeneous HMM, which assumes common transition and initial probabilities, i.e. $q_{itlk} = q_{lk}$ and $\delta_{il} = \delta_l$. We will focus on homogeneous HMMs to simplify the discussion, but the hidden Markov chain can of course be assumed to be non-homogeneous: the transition probabilities may be individual- and/or time-varying and modeled via a logit function of explanatory variables.
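The two processes of a homogeneous HMM (common $\delta_l$ and $q_{lk}$) can be illustrated by direct simulation; the following sketch samples one hidden path and the corresponding Gaussian observations (all names are illustrative, not from the paper):

```python
import numpy as np

def simulate_hmm(delta, Q, mu, Sigma, T, rng):
    """Sample (S_i0..S_iT, Y_i0..Y_iT) from a homogeneous Gaussian HMM.
    delta: (m,) initial probabilities; Q: (m, m) transition matrix;
    mu: (m, J) state means; Sigma: (m, J, J) state covariances."""
    m = len(delta)
    s = np.empty(T + 1, dtype=int)
    s[0] = rng.choice(m, p=delta)                    # S_i0 ~ delta
    for t in range(1, T + 1):
        s[t] = rng.choice(m, p=Q[s[t - 1]])          # S_it | S_it-1 = l ~ q_l.
    y = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in s])
    return s, y
```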
2.2 Hidden Markov of Factor Analyzers
Consider an HMM with $Y_{it}$ being multidimensional, with the conditional distribution of $Y_{it}$ given $S_{it} = s_{it}$ being $N(\mu_{s_{it}}, \Sigma_{s_{it}})$, i.e. multivariate Gaussian with state-dependent mean $\mu_{s_{it}}$ and covariance matrix $\Sigma_{s_{it}}$. In line with the more general mixture of factor analyzers framework, we assume that, conditionally on the $s_{it}$-th state, the random vector $y_{it}$ is modelled using an $H$-dimensional vector of latent factors $w_{is_{it}}$ (typically $H \ll J$) as $y_{it} = \mu_{s_{it}} + \Lambda_{s_{it}} w_{is_{it}} + e_{it}$, where $\Lambda_{s_{it}}$ is a $J \times H$ matrix of factor weights, the latent variables $w_{is_{it}} \sim MVN(0, I_H)$, and $e_{it} \sim MVN(0, \Psi_{s_{it}})$, where $\Psi_{s_{it}}$ is a $J \times J$ diagonal matrix. Thus, conditionally on the $s_{it}$-th state, the density of $y_{it}$ is $MVN(\mu_{s_{it}}, \Lambda_{s_{it}}\Lambda'_{s_{it}} + \Psi_{s_{it}})$. Therefore, the marginal density of a hidden Markov of factor analyzers is given by:

$$f(y_i) = \sum_{\mathcal{S}^T} \delta_{s_{i0}} \prod_{t=1}^{T} q_{s_{it-1} s_{it}} \prod_{t=0}^{T} \frac{\exp\left[-\frac{1}{2}(y_{it}-\mu_{s_{it}})'(\Lambda_{s_{it}}\Lambda'_{s_{it}}+\Psi_{s_{it}})^{-1}(y_{it}-\mu_{s_{it}})\right]}{(2\pi)^{J/2}\,|\Lambda_{s_{it}}\Lambda'_{s_{it}}+\Psi_{s_{it}}|^{1/2}} \quad (2)$$
where $\sum_{\mathcal{S}^T}$ denotes summation over all realizations $\{s_{it},\ t = 0,\dots,T\}$ for individual $i$. Note that the proposed model can be seen as an extension of the mixture of factor analyzers model, by allowing time dependence and, following the idea in [17], constraints across groups on the $\Lambda_{s_{it}}$ and $\Psi_{s_{it}}$ matrices, and on whether or not $\Psi_{s_{it}} = \psi_{s_{it}}\Xi_{s_{it}}$, where $\psi_{s_{it}} \in \mathbb{R}^+$ and $\Xi_{s_{it}} = \mathrm{diag}\{\xi_1,\dots,\xi_J\}$ such that $|\Xi_{s_{it}}| = 1$. The
4 Antonello Maruotti and Francesca Martella
full range of possible constraints provides a class of 12 different Hidden Markov of Factor Analyzers models, which are given in Table 1. Note that CCCC and CCCU assume equal isotropic noise, whereas UCCC and CUUU assume unequal isotropic noise. The other eight covariance structures, incorporating constraints on the loading matrices, dramatically reduce the number of covariance parameters and lead to parsimonious models.
Table 1 Covariance structure in a hidden Markov of factor analyzers framework
Model ID   Λ_sit = Λ      Ξ_sit = Ξ      ψ_sit = ψ      Ξ_sit = I_J    Covariance structure
CCCC       Constrained    Constrained    Constrained    Constrained    Σ = ΛΛ' + ψ I_J
CCCU       Constrained    Constrained    Constrained    Unconstrained  Σ = ΛΛ' + ψ Ξ
CCUC       Constrained    Constrained    Unconstrained  Constrained    Σ_sit = ΛΛ' + ψ_sit I_J
CUUU       Constrained    Unconstrained  Unconstrained  Unconstrained  Σ_sit = ΛΛ' + ψ_sit Ξ_sit
UCCC       Unconstrained  Constrained    Constrained    Constrained    Σ_sit = Λ_sit Λ'_sit + ψ I_J
UCCU       Unconstrained  Constrained    Constrained    Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ Ξ
UCUC       Unconstrained  Constrained    Unconstrained  Constrained    Σ_sit = Λ_sit Λ'_sit + ψ_sit I_J
UUUU       Unconstrained  Unconstrained  Unconstrained  Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ_sit Ξ_sit
CCUU       Constrained    Constrained    Unconstrained  Unconstrained  Σ_sit = ΛΛ' + ψ_sit Ξ
UCUU       Unconstrained  Constrained    Unconstrained  Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ_sit Ξ
CUCU       Constrained    Unconstrained  Constrained    Unconstrained  Σ_sit = ΛΛ' + ψ Ξ_sit
UUCU       Unconstrained  Unconstrained  Constrained    Unconstrained  Σ_sit = Λ_sit Λ'_sit + ψ Ξ_sit
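The constraint patterns in Table 1 translate into code directly: each letter of the model ID says whether the corresponding quantity is shared across states (C) or state-specific (U). A hypothetical helper, not taken from the paper, assembling the state covariance for any of the 12 models:

```python
import numpy as np

def covariance_structure(model_id, Lambdas, psis, Xis, state):
    """Assemble Sigma_sit for one of the 12 models in Table 1.
    model_id: 4-letter string over {C, U} for (Lambda, Xi, psi, Xi = I_J);
    Lambdas: per-state (J, H) loading matrices; psis: per-state scalars;
    Xis: per-state diagonal vectors (with prod(Xi) == 1 in the paper)."""
    cL, cXi, cpsi, cI = (c == 'C' for c in model_id)
    L = Lambdas[0] if cL else Lambdas[state]     # constrained -> shared loadings
    psi = psis[0] if cpsi else psis[state]
    J = L.shape[0]
    if cI:                                       # Xi_sit = I_J constrained
        Xi = np.ones(J)
    else:
        Xi = Xis[0] if cXi else Xis[state]
    return L @ L.T + psi * np.diag(Xi)
```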
3 Computational details
Even if this form of the likelihood has several appealing properties, as it stands expression (2) is of little or no computational use, because it involves a sum over $m^T$ terms for each unit $i$ and cannot be directly evaluated. It quickly becomes infeasible to compute even for small values of $m$ as $T$ grows to moderate size. Clearly, a more efficient procedure is needed to compute the likelihood function. This issue may be addressed via the so-called forward variables ([3, 21]). To estimate the model parameters, the Alternating Expectation Conditional Maximization (AECM) algorithm introduced by [18] is used. This algorithm is an extension of the EM algorithm which uses different definitions of missing data at different stages. The AECM algorithm tends to be preferred to its alternatives because of its robustness and ease of application in various scenarios, especially when the model parameters are constrained. For homogeneous HMMs, the AECM reduces to an iterative procedure with simple, closed-form expressions for the parameter estimates at each iteration. It is based on the complete-data log-likelihood, i.e., the log-likelihood of the observations (the incomplete data) plus the states (the missing data). Before deriving the complete-data log-likelihood, we define $u_{itl} = I(S_{it} = l)$ as an indicator variable equal to
1 if unit $i$ is in state $l$ at time $t$ and 0 otherwise, and $v_{itlk} = I(S_{it} = k, S_{it-1} = l)$ as an indicator variable equal to 1 if unit $i$ is in state $l$ at time $t-1$ and in state $k$ at time $t$, and 0 otherwise. Moreover, we partition the vector of unknown parameters $\Phi$ into $(\Phi_1, \Phi_2)$: $\Phi_1$ contains the transition probabilities $q_{lk}$, the initial probabilities $\delta_l$ and the means $\mu_{s_{it}}$; $\Phi_2$ contains the elements of $\Lambda_{s_{it}}$, $\Psi_{s_{it}}$ and $w_{is_{it}}$. At the first stage of the algorithm, we define the state labels as missing data, and the complete-data log-likelihood function has the following form:
$$\ell_{c1}(\theta) = \sum_{i=1}^{n}\left\{\sum_{l=1}^{m} u_{i0l}\log\delta_l + \sum_{t=1}^{T}\sum_{l=1}^{m}\sum_{k=1}^{m} v_{itlk}\log q_{lk} + \sum_{t=0}^{T}\sum_{l=1}^{m} u_{itl}\log f(y_{it}\mid S_{it}=l)\right\}, \quad (3)$$
where
$$f(y_{it}\mid S_{it}=l) = \frac{\exp\left[-\frac{1}{2}(y_{it}-\mu_l)'(\Lambda_l\Lambda'_l+\Psi_l)^{-1}(y_{it}-\mu_l)\right]}{(2\pi)^{J/2}\,|\Lambda_l\Lambda'_l+\Psi_l|^{1/2}}$$
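The forward variables mentioned above reduce the $m^T$-term sum of expression (2) to an $O(Tm^2)$ recursion. A sketch for a single unit of a homogeneous HMM, assuming the state-dependent log-densities $\log f(y_{it} \mid S_{it} = l)$ have already been evaluated (function names are illustrative):

```python
import numpy as np

def log_forward(log_dens, delta, Q):
    """Forward recursion for one unit's HMM log-likelihood.
    log_dens: (T+1, m) array with log f(y_t | S_t = l);
    delta: (m,) initial probabilities; Q: (m, m) transition matrix.
    Works in log-scale with log-sum-exp for numerical stability."""
    T1, m = log_dens.shape
    alpha = np.log(delta) + log_dens[0]               # alpha_0(l)
    for t in range(1, T1):
        a = alpha[:, None] + np.log(Q)                # a[l, k] = alpha_{t-1}(l) + log q_lk
        amax = a.max(axis=0)
        alpha = amax + np.log(np.exp(a - amax).sum(axis=0)) + log_dens[t]
    amax = alpha.max()
    return amax + np.log(np.exp(alpha - amax).sum())  # log f(y_i)
```

For small $m$ and $T$ the result can be checked against the brute-force sum over all $m^{T+1}$ state sequences.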
Thus, the first E-step consists of calculating the conditional expectation of expression (3), replacing the quantities $u_{itl}$ and $v_{itlk}$ with their conditional expectations $\hat{u}_{itl}$ and $\hat{v}_{itlk}$, given the current values of the parameters and the observed data. At the first CM-step, the expected complete-data log-likelihood is maximized with respect to $\mu_l$, $\delta_l$ and $q_{lk}$, obtaining:
$$\hat{\mu}_l = \frac{\sum_{i=1}^{n}\sum_{t=0}^{T}\hat{u}_{itl}\,y_{it}}{\sum_{i=1}^{n}\sum_{t=0}^{T}\hat{u}_{itl}}, \qquad \hat{\delta}_l = \frac{\sum_{i=1}^{n}\hat{u}_{i0l}}{n}, \qquad \hat{q}_{lk} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\hat{v}_{itlk}}{\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{k=1}^{m}\hat{v}_{itlk}}.$$
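Once the expected indicators $\hat{u}_{itl}$ and $\hat{v}_{itlk}$ are available, these closed-form CM-step updates are weighted averages and row normalizations; a sketch (array names are illustrative):

```python
import numpy as np

def cm_step_stage1(u, v, y):
    """Closed-form stage-1 CM-step updates given posterior indicators.
    u: (n, T+1, m) expected indicators u_itl;
    v: (n, T, m, m) expected transition indicators v_itlk;
    y: (n, T+1, J) observations."""
    denom = u.sum(axis=(0, 1))                        # sum_i sum_t u_itl
    mu = np.einsum('itl,itj->lj', u, y) / denom[:, None]
    delta = u[:, 0, :].sum(axis=0) / u.shape[0]       # initial probabilities
    num = v.sum(axis=(0, 1))                          # sum_i sum_t v_itlk
    q = num / num.sum(axis=1, keepdims=True)          # row-normalized transitions
    return mu, delta, q
```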
At the second stage of the AECM algorithm, we use the $\hat{\mu}_l$, $\hat{\delta}_l$ and $\hat{q}_{lk}$ obtained above when estimating $\Lambda_l$ and $\Psi_l$, and consider the state labels and the factors to be the missing data. Therefore, the complete-data log-likelihood is
$$\ell_{c2}(\theta) = \sum_{i=1}^{n}\left\{\sum_{l=1}^{m} u_{i0l}\log\delta_l + \sum_{t=1}^{T}\sum_{l=1}^{m}\sum_{k=1}^{m} v_{itlk}\log q_{lk} + \sum_{t=0}^{T}\sum_{l=1}^{m} u_{itl}\log f(y_{it}\mid S_{it}=l, w_{il}) + \sum_{t=0}^{T}\sum_{l=1}^{m} u_{itl}\log f(w_{il})\right\}. \quad (4)$$
In a similar manner as before, the estimates of $\Lambda_l$ and $\Psi_l$ can be easily derived under the different imposed constraints (not shown here for the sake of brevity). The AECM algorithm iteratively updates the parameters until convergence to the maximum likelihood estimates. As a by-product of the estimation procedure, we have the possibility of classifying genes on the basis of their posterior probability estimates $\hat{u}_{itl}$. In fact, the $i$-th gene can be classified into the $l$-th group (component of
the estimated mixture) if $\hat{u}_{itl} = \max(\hat{u}_{it1}, \hat{u}_{it2}, \dots, \hat{u}_{itm})$. It is worth noticing that each group is characterized by homogeneous values of the estimated parameters.
4 The yeast galactose data
To discuss the empirical performance of the proposed model, we use a typical gene expression dataset where the expression levels are measured at many time points or under different conditions, to elucidate genetic networks or some important biological process. Specifically, this dataset has been used to study integrated genomic and proteomic analyses of a systemically perturbed metabolic network ([8]). The experiments included single gene deletions involving nine of the key genes (GAL1, GAL2, GAL3, GAL4, GAL5(PGM2), GAL6(LAP3), GAL7, GAL10, GAL80) that participate in yeast galactose metabolism. For each experiment, one of the nine genes was deleted, or alternatively, the experiment used a wild-type cell wherein no genes were deleted. For each of those 10 experimental conditions, galactose was available extracellularly in one set of experiments and absent in another set. Thus, there were a total of $T = 20$ different experimental conditions. Since each of those 20 experiments comprises $J = 4$ replicate measurements, the overall dataset contains 80 experiments. As in [22] and [19], we imputed all the missing values using a k-nearest neighbor method. The resulting $n = 205$ gene expression levels reflect four functional categories in the Gene Ontology (GO) listings ([2]). Thus, we applied a hidden Markov of factor analyzers to group genes into $m = 4$ states; we do not discuss the fit for varying numbers of states $m$, since we aim to analyze the performance of our proposal in reproducing the known functional categories. Genes are allowed to move among the states during the period of observation. In fact, a gene can be associated with multiple biological functions, since genes often have several distinct roles in regulation processes. Therefore, the assumption of assigning a gene to only one state (or cluster) is an oversimplification for a biological system.
In the following we summarize the potential of the proposed approach. We look at three of the twelve factorial parameterizations as illustrative examples. The evolution over time is presented in Figure 1, while a comparison with PGMMs in terms of $BIC = 2\ell + \#\text{parameters} \times \log(n)$ and goodness-of-classification is provided in Table 2. We classify each gene into the state maximizing its posterior membership probability, thus deriving the unobserved sequence of states. Figure 1 shows the estimated sequences of hidden states; it is clear that time dependence and heterogeneity play an important role in the classification, since genes seem to change their behavior, moving across states over time.
Furthermore, we provide a measure of the quality of the classification by theindex
$$S = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\left(\max(\hat{u}_{it1},\hat{u}_{it2},\dots,\hat{u}_{itm}) - \frac{1}{m}\right)}{\left(1-\frac{1}{m}\right)\sum_{i=1}^{n} T_i}$$
Index $S$ always lies between 0 and 1, with 1 corresponding to the absence of uncertainty in the classification, since one of the posterior probabilities is then equal to 1 for every individual at every time, with all the other probabilities equal to 0. It helps in identifying whether the population clusters are sufficiently well separated. It is worth noting that each state is characterized by homogeneous values of the estimated random effects; thus, conditionally on the observed covariate values, subjects from that state have a similar propensity to the event of interest. The UCCU HMM is the preferred model, providing the best goodness-of-classification and the best BIC. This confirms the importance of appropriately accounting for all the characteristics of longitudinal data.
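The index is straightforward to compute from the matrix of posterior probabilities; a sketch (names are illustrative), where each row collects the posteriors of one unit at one time point:

```python
import numpy as np

def separation_index(u):
    """Classification-quality index S from posterior probabilities.
    u: (N, m) array, one row per (unit, time) pair, rows summing to 1.
    S = 1 for crisp classification, S = 0 for uniform posteriors."""
    N, m = u.shape
    return (u.max(axis=1) - 1.0 / m).sum() / ((1.0 - 1.0 / m) * N)
```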
Table 2 Summary results
Model   H   PGMM BIC   PGMM S   HMM BIC    HMM S
UUUU    2   14821.53   0.758    16867.12   0.931
UUUU    3   14759.23   0.756    16840.02   0.932
UCCU    2   14659.44   0.795    16820.41   0.934
UCCU    3   14859.23   0.758    16903.45   0.932
UCUC    2   7611.761   0.735    10413.41   0.970
UCUC    3   12162.73   0.864    14131.45   0.919
Fig. 1 Hidden state sequences for the 205 genes over 20 time points (x-axis: Times; y-axis: Genes).
References
1. Alfo, M. and Maruotti, A.: A hierarchical model for time dependent multivariate longitudinal data. In: Data Analysis and Classification. Springer Series on Studies in Classification, Data Analysis and Knowledge Organization. C. Lauro, F. Palumbo (eds), Springer-Verlag, 271-279 (2010).
2. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M. et al.: Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25-29 (2000).
3. Baum, L.E., Petrie, T., Soules, G. and Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41, 164-171 (1970).
4. Celeux, G., Martin, O. and Lavergne, C.: Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments. Stat. Model. 5, 243-267 (2005).
5. De la Cruz-Mesía, R., Quintana, F.A. and Marshall, G.: Model-based clustering for longitudinal data. Comp. Stat. and Data Anal. 52, 1441-1457 (2008).
6. Frühwirth-Schnatter, S.: Panel data analysis: a survey on model-based clustering of time series. Adv. Data Anal. Classif. 5, 251-280 (2011).
7. Ghahramani, Z. and Hinton, G.E.: The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto (1997).
8. Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K. et al.: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929-934 (2001).
9. Laird, N.M. and Ware, J.H.: Random effects models for longitudinal data. Biometrics 38, 963-974 (1982).
10. Martella, F. and Vermunt, J.K.: Model-based approaches to synthesize microarray data: a unifying review using mixture of SEMs. Stat. Methods Med. Res. (2012), doi: 10.1177/0962280211419482.
11. Maruotti, A. and Ryden, T.: A semiparametric approach to hidden Markov models under longitudinal observations. Statist. Comput. 19, 381-393 (2009).
12. Maruotti, A.: Mixed hidden Markov models for longitudinal data: an overview. Int. Stat. Rev. 79, 427-454 (2011).
13. Maruotti, A. and Rocci, R.: A mixed non-homogeneous hidden Markov model for categorical data, with application to alcohol consumption. Stat. Med. (2012), doi: 10.1002/sim.4478.
14. McLachlan, G.J. and Peel, D.: Finite mixture models. Wiley Series in Probability and Statistics. Wiley, New York (2000).
15. McNicholas, P.D. and Murphy, T.B.: Parsimonious Gaussian mixture models. Stat. Comput. 18, 285-296 (2008).
16. McNicholas, P.D. and Murphy, T.B.: Model-based clustering of longitudinal data. The Canadian Journal of Statistics 38, no. 1, 153-168 (2010).
17. McNicholas, P.D. and Murphy, T.B.: Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics 26, no. 21, 2705-2712 (2010).
18. Meng, X.L. and Van Dyk, D.A.: The EM algorithm - an old folk song sung to a fast new tune. J. R. Statist. Soc. B 59, 511-567 (1997).
19. Ng, S.K., McLachlan, G.J., Bean, R.W. and Ng, S.W.: Clustering replicated microarray data via mixtures of random effects models for various covariance structures. In: Boden, M. and Bailey, T.L. (eds.) Conferences in Research and Practice in Information Technology. The Australian Computer Society, Sydney, 73, 29-33 (2006).
20. Tipping, M.E. and Bishop, C.M.: Mixtures of probabilistic principal component analysers. Neural Comp. 11, 443-482 (1999).
21. Welch, L.R.: Hidden Markov models and the Baum-Welch algorithm. IEEE Inf. Theory Soc. Newsl. 53, 1-13 (2003).
22. Yeung, K.Y., Medvedovic, M., Bumgarner, R.E.: Clustering gene-expression data with repeated measurements. Genome Biol. 4(R34) (2003).
23. Zucchini, W. and MacDonald, I.L.: Hidden Markov Models for Time Series: An Introduction Using R. Chapman & Hall, Boca Raton, FL (2009).
Random coefficient based dropout models: a finite mixture approach
Alessandra Spagnoli and Marco Alfo
Abstract In longitudinal studies, subjects may be lost to follow-up (a phenomenon which is often referred to as attrition) or miss some of the planned visits, thus generating incomplete responses. When the probability of nonresponse, once conditioned on observed covariates and responses, still depends on the unobserved responses, the dropout mechanism is known to be informative. A common objective in these studies is to build a general, reliable association structure to account for dependence between the longitudinal and the dropout processes. Starting from the existing literature, we introduce a random coefficient based dropout model where the association between outcomes is modeled through discrete latent effects; these latent effects are outcome-specific and account for heterogeneity in the univariate profiles. Dependence between profiles is introduced by using a bidimensional representation for the corresponding distribution. In this way, we define a flexible latent class structure, with possibly different numbers of locations in each margin, and a full association structure connecting each location in a margin to each location in the other one. By using this representation we show how, unlike standard (unidimensional) finite mixture models, an informative dropout model may properly nest a non-informative dropout counterpart.
Key words: Finite mixtures, informative dropout, concomitant latent variables
Alessandra Spagnoli
Dipartimento di Scienze Statistiche, Sapienza Università di Roma, e-mail: [email protected]
Marco Alfo
Dipartimento di Scienze Statistiche, Sapienza Università di Roma, e-mail: [email protected]
1 Introduction
In longitudinal studies, measurements from the same individuals (units) are taken repeatedly over time. These kinds of studies often suffer from attrition, since individuals may drop out of the study before the scheduled completion time and thus present incomplete data. When the reason for dropout is related to the unobserved responses, even after controlling for available covariates and responses, the missingness is known to be informative. In such studies, scientific interest may focus on the association structure between the longitudinal measurements and the missingness process. In a seminal paper, [10] discuss a class of statistical models for non-ignorable dropout, referred to as Random Coefficient Based Dropout Models (RCBDMs), where the marginal association between the longitudinal and the survival processes arises only through dependent, outcome-specific random coefficients. Separate models are hypothesized for the two partially observed processes, which share a common (correlated) set of random coefficients. In the context of binary responses, [2] propose an extension of these models by defining a semi-parametric selection model where the longitudinal and the dropout processes are linked through correlated random effects. The random effects are usually assumed to be Gaussian, but this assumption has been questioned by some authors, see e.g. [14], since the resulting inferences can be sensitive to assumptions that cannot be verified from the available data. In this perspective, [13] investigated the effect of misspecifying the random effect distribution on parameter estimates and standard errors when a shared parameter model is considered. They showed that, as the number of repeated longitudinal measurements per individual grows, the effect of misspecifying the random effect distribution vanishes for certain parameter estimates, referring implicitly to the theoretical results in [4].
In several contexts, however, for example in clinical research, the follow-up times are usually short, and individual sequences carry only little information on the random effects; therefore, the choice of the random effect distribution may be important. As far as selection models are concerned, just to mention a few, [19] used a Monte Carlo EM algorithm for a linear mixed model with Gaussian random effects, while [8] propose a Laplace approximation to overcome the high-dimensional integration over the distribution of the random effects. Numerical integration techniques, such as standard or adaptive Gaussian quadrature, can be used as well. In this paper, we investigate the association structure between the measurement and dropout processes when the random coefficient distribution is left completely unspecified, adopting a finite mixture perspective. We consider a bivariate distribution for the random coefficients that is equal to the product of the marginal distributions only when the dropout mechanism is ignorable. The structure of the paper is as follows. Section 2 discusses a random coefficient based dropout model where the association between outcomes is modeled through discrete latent effects. Section 3 describes the proposed ML algorithm. The last section contains concluding remarks.
2 Random coefficient-based models
Let $Y_{it}$ represent a set of longitudinal measurements recorded on $i = 1,\dots,n$ subjects at times $t = 1,\dots,T$, associated with a row vector of $p$ covariates $x_{it} = (x_{it1},\dots,x_{itp})$. Let us assume that the observed responses $y_{it}$ are realizations of a random variable with density in the exponential family and canonical parameter $\theta_{it}$. The canonical parameter is defined as follows:
$$\theta_{it} = x_{it}^{T}\beta + z_{it}^{T}b_i \quad (1)$$
The terms $b_i$, $i = 1,\dots,n$, are used to model unobserved individual-specific (time-invariant) heterogeneity common to each lower-level unit (time) within the same $i$-th upper-level unit (individual), while $\beta$ is a $p$-dimensional vector of fixed regression parameters. The effects that vary across individuals are collected in the design vector $z_{it} = (z_{it1},\dots,z_{itm})$. We denote by $R_i$ the missing data indicator vector, with generic element defined as $R_{it} = 1$ if the $i$-th unit drops out at any point in the window $(t-1, t)$, and $R_{it} = 0$ otherwise. Using this representation, we are implicitly assuming a discrete structure for the time to dropout; however, the following arguments apply to continuous-time survival processes as well. We assume that, once a person drops out, he or she is out forever (attrition). If the designed completion time is denoted by $T$, we will have $T_i \le T$ measurements for each unit. We may introduce an explicit model for the dropout mechanism, conditioning on a set of dropout-specific covariates $v_i$ and the random coefficients in the longitudinal response model:
$$h(r_i\mid v_i, y_i, b_i) = h(r_i\mid v_i, b_i) = \prod_{t=1}^{T_i} h(r_{it}\mid v_i, b_i), \quad i = 1,\dots,n \quad (2)$$
where the corresponding canonical parameter is $\phi_{it} = v_{it}^{T}\gamma + d_{it}^{T}b_i$. These models are usually referred to as shared parameter models, see [21], [22], and are based on the assumption of conditional independence between the longitudinal response and the dropout indicator; as can easily be noticed, they assume a perfect linear correlation between the latent variables in the two equations. In this framework, the joint density of the measurement process $Y_{it}$ and the missingness process $R_{it}$ may be written as:

$$\int\left[\prod_{t=1}^{T} f(y_{it}\mid x_{it}, b_i)\prod_{t=1}^{T_i} h(r_{it}\mid v_{it}, b_i)\right] dG(b_i) \quad (3)$$
where $G(\cdot)$ represents a discrete or continuous random coefficient distribution. Here, the measurement and missingness processes are assumed to be independent given the random effects $b_i$; therefore, the association, if any, is completely accounted for by this latent structure. Correlated random effects represent a further alternative, see e.g. [1]: the unobservable latent characteristics control for potential overdispersion in the univariate profiles and for the association between the measurement and missingness processes; this structure, however, avoids unit correlation estimates, and represents a more flexible approach when compared to shared random effects, where conditional
independence still holds. Let $b_i = (b_{1i}, b_{2i})$ denote a set of subject- and outcome-specific random coefficients; then, the joint density of the measurement process $Y_{it}$ and the missingness process $R_{it}$ can be factorized as:
$$\int\left[\prod_{t=1}^{T} f(y_{it}\mid x_{it}, b_{1i})\prod_{t=1}^{T_i} h(r_{it}\mid v_{it}, b_{2i})\right] dG(b_{1i}, b_{2i}) \quad (4)$$
An extension of this association structure between the random coefficients in the two equations may be defined following [5], where a general random effect model is introduced in which common, partially shared and independent (response-specific) random effects influence the measurement and the dropout processes. While it is common to assume that the random effects follow a Gaussian distribution, this does not usually lead to a tractable form of the integral in eqs. (3) and (4). Among others, [20], [15] and [17] show that the choice of the random effect distribution does not have a great impact on parameter estimates, except in extreme cases, such as discrete distributions. On the same line, [13] show that when all subjects have a relatively large number of repeated measurements, the effects of misspecifying the random effect distribution become minimal for model parameter estimates. However, [18] observe that the choice of an appropriate random effect distribution is generally difficult for at least three reasons. First, there is often little information about these unobservables, so any distributional assumption is difficult to justify by looking only at the observed data. Second, when high-dimensional random coefficients are considered, the use of a parametric multivariate distribution imposing the same shape on every dimension can be restrictive. Third, a potential dependence of the random effects on unobserved covariates induces heterogeneity that cannot be captured by common parametric assumptions. In studies where some subjects have few measurements, e.g. due to dropout, the choice of the random coefficient distribution may therefore be important. A finite mixture approach avoids any unverifiable assumptions on this distribution, frequently referred to as the mixing distribution. In this perspective, [18] propose a semi-parametric shared parameter model to analyze continuous longitudinal responses while adjusting for non-monotone missingness.
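Under a finite mixture (discrete mixing) perspective, the integral in (4) collapses to a weighted sum over $K$ support points, so the log-likelihood can be evaluated directly. A sketch assuming the component log-densities have been precomputed at each location (all names are illustrative, not from the paper):

```python
import numpy as np

def mixture_loglik(log_f_y, log_h_r, log_pi):
    """Finite-mixture log-likelihood for the joint measurement/dropout model.
    log_f_y[i, k] = log f(y_i | x_i, b_1k);  log_h_r[i, k] = log h(r_i | v_i, b_2k);
    log_pi[k] = log pi_k. Uses log-sum-exp over the K components."""
    a = log_f_y + log_h_r + log_pi                    # (n, K) joint log terms
    amax = a.max(axis=1, keepdims=True)
    return float((amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))).sum())
```

With a product structure $\pi_k = \pi_{1g}\pi_{2l}$ over a grid of locations, the same function evaluates the ignorable-dropout special case discussed below.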
On the same line, [2] jointly analyze longitudinal binary responses subject to dropout through a selection model with correlated, outcome-specific, random coefficients. Using a finite mixture approach, the log-likelihood function in equation (4) can be written as follows:
\ell(\cdot) = \sum_{i=1}^{n} \log\left\{ \sum_{k=1}^{K} f(y_i \mid x_i, b_{1k})\, h(r_i \mid v_i, b_{2k})\, \pi_k \right\} = \sum_{i=1}^{n} \log\left\{ \sum_{k=1}^{K} f(y_i, r_i \mid x_i, v_i, b_k)\, \pi_k \right\} \qquad (5)
where π_k = Pr(b_k) = Pr(b_{1k}, b_{2k}) is the joint probability of the locations b_k = (b_{1k}, b_{2k}), k = 1, ..., K. The use of finite mixtures has several significant advantages over parametric models; for instance, this approach is computationally efficient, and the discrete nature of the estimate may help classify subjects into components corresponding to clusters characterized by homogeneous values of the random parameters. However, we may notice that the latent variables, as well as the corresponding number of locations, considered in the model to account for individual extra-model departures can
Random coefficient based dropout models: a finite mixture approach 5
be different when the longitudinal and the missingness processes are considered. For this reason, following [3], we propose to consider different numbers of components, locations and/or masses for the latent variables in the two equations. When compared to the previously mentioned proposals, see e.g. equation (5), this is a more flexible representation for the random coefficient distribution and, in particular, this model properly nests a model which describes the dropout as being non-informative. That is, the proposed MNAR model properly nests a MAR counterpart, while in the case of equation (5) this is not true. Let us suppose the joint bivariate distribution of the random effects has the following marginal representation [9]:
P_1 = (u_{1g}, \pi_{1g}),\ g = 1, \dots, K_1 \qquad P_2 = (u_{2l}, \pi_{2l}),\ l = 1, \dots, K_2
with π_{1g} = Pr(b_{1i} = b_{1g}), g = 1, ..., K_1, and π_{2l} = Pr(b_{2i} = b_{2l}), l = 1, ..., K_2. That is, we associate to each couple of random coefficients, say (b_{1g}, b_{2l}), g = 1, ..., K_1, l = 1, ..., K_2, a mass π_{gl} = Pr(b_{1i} = b_{1g}, b_{2i} = b_{2l}), where we do not restrict the two profiles to share the same number of components. While the marginals control for heterogeneity in the univariate profiles, the joint probabilities describe the association between the latent effects in the two submodels. This approach can be related to a standard finite mixture approach where K = K_1 × K_2 components are used and each of the K_1 locations in the first profile is paired with each of the K_2 locations in the second profile. Theorem 1 in [6] shows that the elements of any probability matrix π ∈ Π_{K_1 K_2}, where the latter represents the set of K_1 × K_2 probability matrices, can be decomposed as:
\pi_{gl} = \sum_{h=1}^{M} \tau_h\, \pi_{1g|h}\, \pi_{2l|h} \qquad (6)
for an appropriate choice of M. Obviously, the following constraints hold:
\sum_{h} \tau_h = \sum_{g} \pi_{1g|h} = \sum_{l} \pi_{2l|h} = \sum_{g} \sum_{l} \pi_{gl} = 1
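The decomposition in equation (6) and its constraints can be checked numerically. The dimensions and probabilities below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Illustrative dimensions: K1 locations in the measurement profile,
# K2 in the dropout profile, M upper-level latent classes.
K1, K2, M = 3, 2, 2

rng = np.random.default_rng(0)
tau = rng.dirichlet(np.ones(M))              # tau_h, sums to one
pi1 = rng.dirichlet(np.ones(K1), size=M)     # pi_{1g|h}, each row sums to one
pi2 = rng.dirichlet(np.ones(K2), size=M)     # pi_{2l|h}, each row sums to one

# pi_{gl} = sum_h tau_h * pi_{1g|h} * pi_{2l|h}   (equation 6)
pi = np.einsum("h,hg,hl->gl", tau, pi1, pi2)

assert np.isclose(pi.sum(), 1.0)             # joint masses sum to one
# Marginals recover the profile-specific masses sum_h tau_h * pi_{.|h}
assert np.allclose(pi.sum(axis=1), tau @ pi1)
assert np.allclose(pi.sum(axis=0), tau @ pi2)
# With M = 1 the two profiles are independent: pi_gl = pi_1g * pi_2l
pi_ind = np.einsum("h,hg,hl->gl", np.ones(1), pi1[:1], pi2[:1])
assert np.allclose(pi_ind, np.outer(pi1[0], pi2[0]))
```

The last assertion illustrates the point made below: M = 1 forces the joint mass matrix to be the outer product of its margins, i.e. independence between the two sets of random coefficients.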
Therefore, the two sets of random coefficients b_{1i} and b_{2i}, i = 1, ..., n, are independent conditional on belonging to the h-th (upper level) latent class, h = 1, ..., M. The random coefficients control for heterogeneity in the univariate profiles, while the hierarchy of the latent components controls for potential dependence between the outcome-specific random coefficients; this leads to a separation of univariate heterogeneity and bivariate dependence. In some way, the hierarchical structure for π_{gl} resembles a copula-based model, where dependence between profiles is modeled through a copula function joining the marginal distributions of the outcome-specific random coefficients, see [13]. The independence case arises simply when M = 1; in this case, the dropout mechanism is non-informative: the dropout probability still depends on unobserved sources of variation, but these are independent of those influencing the longitudinal measurements. When M ≥ 2 we have some form of dependence, and we can define different non-ignorable dropout mechanisms according to the values assumed by the parameter M. In this sense, it may be interesting to investigate the
6 Alessandra Spagnoli and Marco Alfo
sensitivity of the results with respect to model assumptions as M moves away from 1, as for example in [16].
3 ML Parameter Estimation
The data vector is composed of an observable part y_i and of the unobservables z_i = (z_{i1}, ..., z_{iK}) and ζ_i = (ζ_{i1}, ..., ζ_{iM}), representing the lower and upper level membership vectors. For fixed K_1, K_2 and M, we assume z_i and ζ_i have multinomial distributions, with probabilities π_{gl}, g = 1, ..., K_1, l = 1, ..., K_2, and τ_h, h = 1, ..., M. The complete data likelihood is given by
L_c(\cdot) = \prod_{i=1}^{n} \prod_{g=1}^{K_1} \prod_{l=1}^{K_2} \left\{ f(y_i, r_i \mid z_{igl}) \prod_{h=1}^{M} \left[ \pi_{1g|h}\, \pi_{2l|h}\, \tau_h \right]^{\zeta_{ih}} \right\}^{z_{igl}}
= \prod_{i=1}^{n} \prod_{g=1}^{K_1} \prod_{l=1}^{K_2} \left\{ \prod_{t=1}^{T_i} f(y_{it} \mid z_{igl})\, h(r_{it} \mid z_{igl}) \prod_{h=1}^{M} \left[ \pi_{1g|h}\, \pi_{2l|h}\, \tau_h \right]^{\zeta_{ih}} \right\}^{z_{igl}}
where τ_h is the prior probability of the h-th upper level latent class, and π_{1g|h} and π_{2l|h} are the probabilities of belonging to the g-th and the l-th lower level components, conditional on being in the h-th class. We partition the parameter vector as Ψ = (Ψ_g, Ψ_l, Ψ_{glh}), where Ψ_g and Ψ_l denote the parameter vectors for the longitudinal and the dropout process, respectively, while Ψ_{glh} = (π_{1g|h}, π_{2l|h}, τ_h). By writing f_{igl} = f(y_i, r_i | z_{igl}) = f(y_i | z_{igl}) h(r_i | z_{igl}), the score functions are:
S_c(\Psi_g) = \sum_{i=1}^{n} \sum_{g=1}^{K_1} \sum_{l=1}^{K_2} w_{igl}\, \frac{\partial}{\partial \Psi_g} \left[ \log f_{igl} + \log \pi_{gl} \right] = \sum_{i=1}^{n} \sum_{g=1}^{K_1} w_{ig}\, \frac{\partial}{\partial \Psi_g} \log f_{ig}

S_c(\Psi_l) = \sum_{i=1}^{n} \sum_{g=1}^{K_1} \sum_{l=1}^{K_2} w_{igl}\, \frac{\partial}{\partial \Psi_l} \left[ \log f_{igl} + \log \pi_{gl} \right] = \sum_{i=1}^{n} \sum_{l=1}^{K_2} w_{il}\, \frac{\partial}{\partial \Psi_l} \log h_{il}

S_c(\Psi_{glh}) = \sum_{i=1}^{n} \sum_{g=1}^{K_1} \sum_{l=1}^{K_2} \sum_{h=1}^{M} w_{igl}\, \omega_{ih|gl}\, \frac{\partial}{\partial \Psi_{glh}} \left[ \log \pi_{1g|h} + \log \pi_{2l|h} + \log \tau_h \right]

where f_{ig} = f(y_i | z_{igl}), h_{il} = h(r_i | z_{igl}), and ω_{ih|gl} is the posterior probability that the i-th unit belongs to the h-th upper level component, given the observed data, the lower level components and the current parameter estimates Ψ^{(r)}. The terms w_{igl} represent the posterior probabilities of the unit being in the g-th and the l-th components of the measurement and dropout profiles, respectively. In this way, we may test for independence of the two processes through standard Wald-type or χ²-based statistics; in particular, when the probability of dropout depends on unobserved sources of variation, e.g. unobserved heterogeneity, which also influence the longitudinal response, then the dropout process is non-ignorable. Molenberghs et al. [12] show
that for every MNAR model there is an MAR counterpart that produces exactly the same fit to the observed data. This can be more easily understood by noting that the previous score equations resemble the score equations of univariate mixture regression models, representing a potential MAR solution. The ML estimates can be obtained, conditional on w_{igl}^{(r)}, in subsequent maximization steps. To speed up the EM algorithm, and to ensure identifiability of a two-level latent structure with only one observation level, we may proceed by discretizing w_{igl}^{(r)} using a MAP rule, as in the CEM algorithm (condition choice = "C" in the algorithm below), or by drawing the component indicator z_{igl}^{(r)} from a multinomial distribution with the posterior probabilities, as in SEM algorithms (condition else below), see [11]. In this case, the last score equation resembles the one for a polytomous latent class model. The resulting EM algorithm is sketched below.
begin
    initialize w_{igl}^{(0)}, Ψ^{(0)}, ε > 0
    repeat
        update w_{igl}^{(t)}                                              // Expectation step
        if (choice = "C") then
            z_{igl}^{(t)} = 1  ⟺  w_{igl}^{(t)} = max_{r,v} w_{irv}^{(t)}  // Classification step
        else
            draw z_{igl}^{(t)} with probabilities given by w_{igl}^{(t)}   // Stochastic step
        estimate β_1^{(t)}, β_2^{(t)}, u_1, u_2 given z_{igl}^{(t)}        // Maximization step
        estimate π_{g|h}^{(t)}, π_{l|h}^{(t)}, τ_h^{(t)}                   // Maximization step
    until Q(·)^{(t)} − Q(·)^{(t−1)} < ε
end

Algorithm 1: Pseudo-code of the proposed SEM-CEM algorithm
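The classification and stochastic steps of Algorithm 1 can be sketched as follows. The posterior weights w_igl here are random placeholders standing in for the output of an actual E-step:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K1, K2 = 5, 3, 2

# w[i, g, l]: posterior weight of unit i for component pair (g, l);
# random placeholders for what the E-step would produce.
w = rng.dirichlet(np.ones(K1 * K2), size=n).reshape(n, K1, K2)

def classification_step(w):
    """CEM: set z_igl = 1 for the (g, l) pair maximizing w_igl (MAP rule)."""
    z = np.zeros_like(w)
    flat = w.reshape(len(w), -1).argmax(axis=1)
    z.reshape(len(w), -1)[np.arange(len(w)), flat] = 1.0
    return z

def stochastic_step(w, rng):
    """SEM: draw z_igl from a multinomial with probabilities w_igl."""
    z = np.zeros_like(w)
    for i in range(len(w)):
        k = rng.choice(w[i].size, p=w[i].ravel())
        z.reshape(len(w), -1)[i, k] = 1.0
    return z

z_c = classification_step(w)
z_s = stochastic_step(w, rng)
# Either way, each unit is allocated to exactly one (g, l) pair
assert (z_c.sum(axis=(1, 2)) == 1).all() and (z_s.sum(axis=(1, 2)) == 1).all()
```

Both steps return hard component indicators, which is what makes the subsequent maximization steps of Algorithm 1 simple weighted-by-membership fits.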
4 Conclusions
We have defined a random coefficient based dropout model where the association between the longitudinal and the dropout processes is modeled through discrete outcome-specific latent effects. A bidimensional representation for the random effect distribution is used, with possibly different numbers of locations in each margin and a full association structure connecting each location in one margin to each location in the other. The proposed approach may also be used, for example, in clinical contexts where only a few repeated measurements are available per subject. The main advantage of a more flexible representation for the random effect distribution is that the general MNAR model properly nests a model where the dropout mechanism is non-informative. This opens the way to a sensitivity analysis of changes in model parameter estimates as the number of upper level components, M, moves away from 1.
References
1. Aitkin, M., Alfò, M.: Variance component models for longitudinal count data with baseline information: epilepsy data revisited. Stat. Comput. 3, 291–303 (2003)
2. Alfò, M., Maruotti, A.: A selection model for longitudinal binary responses subject to non-ignorable attrition. Statist. Med. 28, 2435–2450 (2009)
3. Alfò, M., Rocchetti, I.: A flexible approach to finite mixture regression models for multivariate mixed responses. Submitted (2012)
4. Carlin, B.P., Louis, T.A.: Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, New York (2000)
5. Creemers, A., Hens, N., Aerts, M., Molenberghs, G., Verbeke, G., Kenward, M.: Generalized shared-parameter models and missingness at random. Statist. Model. 11, 279–310 (2011)
6. Dunson, D., Xing, C.: Nonparametric Bayes modeling of multivariate categorical data. J. Am. Statist. Assoc. 104, 1042–1051 (2009)
7. Follmann, D., Wu, M.: An approximate generalized linear model with random effects for informative missing data. Biometrics 51, 151–168 (1995)
8. Gao, S.: A shared random effect parameter approach for longitudinal dementia data with non-ignorable missing data. Statist. Med. 23, 211–219 (2004)
9. Lagona, F.: Model-based classification of clustered binary data with non-ignorable missing values. In: Proceedings of the Italian Statistical Society. CLEUP, Padova (2010)
10. Little, R.J.A.: Modeling the drop-out mechanism in repeated-measures studies. J. Am. Statist. Assoc. 90, 1112–1121 (1995)
11. McCulloch, C.: Maximum likelihood algorithms for generalized linear mixed models. J. Am. Statist. Assoc. 92, 162–170 (1997)
12. Molenberghs, G., Beunckens, C., Sotto, C., Kenward, M.G.: Every missingness not at random model has a missingness at random counterpart with equal fit. J. Roy. Statist. Soc. Ser. B 70, 371–388 (2008)
13. Rizopoulos, D., Verbeke, G., Lesaffre, E., Vanrenterghem, Y.: A two-part joint model for the analysis of survival and longitudinal binary data with excess zeros. Biometrics 64, 611–619 (2008)
14. Scharfstein, D., Rotnitzky, A., Robins, J.: Adjusting for nonignorable drop-out using semiparametric nonresponse models. J. Am. Statist. Assoc. 94, 1096–1120 (1999)
15. Song, X., Davidian, M., Tsiatis, A.: A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics 58, 742–753 (2002)
16. Troxel, A.B., Ma, G., Heitjan, D.F.: An index of local sensitivity to nonignorability. Statist. Sinica 14, 1221–1237 (2004)
17. Tsiatis, A., Davidian, M.: An overview of joint modeling of longitudinal and time-to-event data. Statist. Sinica 14, 793–818 (2004)
18. Tsonaka, R., Verbeke, G., Lesaffre, E.: A semi-parametric shared parameter model to handle nonmonotone nonignorable missingness. Biometrics 65, 81–87 (2009)
19. Verzilli, C.J., Carpenter, J.R.: A Monte Carlo EM algorithm for random-coefficient-based dropout models. J. Appl. Statist. 29, 1011–1021 (2002)
20. Wang, Y., Taylor, J.: Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J. Am. Statist. Assoc. 96, 895–905 (2001)
21. Wu, M., Carroll, R.: Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175–188 (1988)
22. Wu, M., Bailey, K.: Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics 45, 939–955 (1989)
Bayesian inference for causal effects in randomized experiments with noncompliance: The role of multivariate outcomes
Fan Li, Alessandra Mattei and Fabrizia Mealli
Abstract Principal stratification (PS) is a principled framework for addressing noncompliance issues. Due to the latent nature of principal strata, model-based PS analysis usually involves weakly identified models, and identification of causal effects relies on untestable structural assumptions, such as exclusion restrictions. This article develops a Bayesian approach that exploits multivariate outcomes to sharpen inferences for weakly identified models within PS. Simulation studies are performed to illustrate the potential gains in identifiability from jointly modelling more than one outcome. The method is applied to evaluate the causal effect of a job search program on depression.
Key words: Bayesian statistics, Causal inference, Principal stratification, Mixture models, Multivariate outcome, Noncompliance
1 Introduction
Many randomized experiments suffer from noncompliance, which breaks randomization, implying that the assignment to the treatment, rather than the treatment itself, is randomly administered to individuals. In the presence of noncompliance, the treatment actually received is a post-treatment intermediate variable, which is potentially affected by the assignment and may also affect the response. A standard intention-to-treat analysis gives a valid inference of the effect of assignment on outcome,
Fan Li, Department of Statistical Science, Duke University, e-mail: [email protected]
Alessandra Mattei, Dipartimento di Statistica “G. Parenti”, Università di Firenze, e-mail: [email protected]
Fabrizia Mealli, Dipartimento di Statistica “G. Parenti”, Università di Firenze, e-mail: [email protected]
but usually the goal is to study the effect of receiving the treatment rather than the assignment.

A principled framework for noncompliance is principal stratification (PS) (Frangakis and Rubin, 2002), a generalization of the instrumental variable approach to noncompliance by Angrist et al. (1996) and Imbens and Rubin (1997). While PS is applicable to a wide range of situations involving intermediate variables, such as truncation by death and mediation, this paper focuses on the special case of noncompliance. A principal stratification with respect to the intermediate variable “receipt of the treatment” is a cross-classification of units into latent classes defined by the joint potential compliance statuses under both treatment and control. Principal causal effects (PCEs), that is, comparisons of potential outcomes under different treatment levels within compliance principal strata, are in general the causal estimands of primary interest in a PS analysis.
Since at most one potential outcome is observed for any unit, compliance principal strata are generally latent, and the key issue in PS analysis is the identifiability of PCEs. There are two streams of work in the existing literature regarding this: (1) deriving nonparametric bounds for the causal effects under minimal structural assumptions (e.g., Manski, 1990); (2) specifying additional structural or modelling assumptions, such as exclusion restrictions, to identify PCEs, and conducting sensitivity analyses to check the consequences of violations of such assumptions (e.g., Schwartz et al., 2012).
Using auxiliary information from covariates to identify causal effects has also been discussed (e.g., Jo, 2002). However, the importance of exploiting multiple outcomes is less acknowledged. In fact, information on multiple outcomes is routinely collected in randomized experiments and observational studies, but it is rarely used in the analysis unless the goal is to study the relationships between outcomes. Exceptions include Jo and Muthen (2001), Mattei et al. (2012) and Mealli and Pacini (2012). In this article we further investigate the role of multivariate outcomes in sharpening inferences for weakly identified models within PS, proceeding from a parametric perspective, particularly under the Bayesian paradigm.
The article is organized as follows. Section 2 introduces the PS framework and Section 3 proposes a Bayesian approach that exploits multivariate outcomes to sharpen inferences for weakly identified models within PS. In Section 4, we perform simulation studies to examine the benefit of using multivariate outcomes under various scenarios. In Section 5, we re-analyze the Job Search Intervention Study (JOBS II) using the proposed bivariate approach. Section 6 concludes with a discussion.
2 The principal stratification approach to noncompliance
Discussion of causal inference in this article is carried out under the potential outcome framework, also known as the Rubin Causal Model (RCM) (Rubin, 1978). Consider a large population of units, each of which can potentially be assigned a treatment indicated by z, with z = 1 for treatment and z = 0 for control. A random sample of n units from this population comprises the participants in a study designed to evaluate the effect of the treatment on all or a subset of M outcomes Y = (Y_1, ..., Y_M)′.

Assuming the standard Stable Unit Treatment Value Assumption (SUTVA) (Rubin, 1980), for each outcome Y_m we can define two potential outcomes for each unit, Y_im(0) and Y_im(1), corresponding to the two possible treatment levels. For each unit i, let Y_i(z) = (Y_i1(z), ..., Y_iM(z))′ be the vector of the potential outcomes given assignment z.
In the presence of noncompliance, the actual taking of the treatment is beyond the control of the researcher; therefore, there are also two potential treatment receipt indicators for each unit: D_i(0) and D_i(1). Let S_i = (D_i(0), D_i(1)) be the joint potential treatment receipt. Applying the idea of principal stratification, units can be classified into four principal strata according to their compliance behaviour, defined by S_i: compliers (S_i = (0,1) = c); never-takers (S_i = (0,0) = n); always-takers (S_i = (1,1) = a); and defiers (S_i = (1,0) = d). By definition, the principal stratum membership S_i is not affected by treatment assignment. Therefore, comparisons of summaries of Y(1) and Y(0) within a principal stratum, the so-called principal causal effects (PCEs), have a causal interpretation because they compare quantities defined on a common set of units. The causal estimands of interest in this article are the population principal average causal effects for the first outcome:
\tau_s = E(Y_{i1}(1) - Y_{i1}(0) \mid S_i = s), \qquad (1)
for s = c, a, n, where τ_c is the well-known complier average causal effect (CACE).

Since D_i(0) and D_i(1) are never jointly observed, the principal stratum S_i is latent. Specifically, for each unit i and for each post-treatment variable, only one potential outcome is observed. Let Z_i, for i = 1, ..., n, be the binary variable indicating whether unit i is assigned to the treatment (Z_i = 1) or to the control (Z_i = 0). Then the observed potential outcomes are D_i^obs = D_i(Z_i) and Y_i^obs = Y_i(Z_i); the other potential outcomes, D_i^mis = D_i(1 − Z_i) and Y_i^mis = Y_i(1 − Z_i), are missing. Henceforth, bold symbols such as Z, D^obs and Y^obs denote column vectors/matrices of the corresponding unit-level variables. Without loss of generality, we concentrate on the case of two outcomes (M = 2). Since we focus on randomized experiments, the following assumption holds by design:
Assumption 1. Randomization of treatment assignment
Pr(Zi|Di(0),Di(1),Yi(0),Yi(1)) = Pr(Zi).
3 Multivariate Bayesian principal stratification analysis
Following Imbens and Rubin (1997), we model the distribution of the compliance type, π_s = Pr(S_i = s), s = a, c, d, n, and the conditional distribution of the potential outcomes given the compliance type, f_{sz}^i = Pr(Y_i^obs | S_i = s, Z_i = z; θ_{s,z}), z = 0, 1. Let θ = (π_a, π_c, π_d, π_n, {θ_{s,z}}_{s=a,c,d,n; z=0,1}) be the parameter vector and let p(θ) denote its prior distribution. Then the posterior distribution of θ can be written as
\Pr(\theta \mid Z, D^{obs}, Y^{obs}) \propto p(\theta) \prod_{i:\, Z_i=1,\, D_i^{obs}=1} \left[ \pi_c f^i_{c1} + \pi_a f^i_{a1} \right] \prod_{i:\, Z_i=1,\, D_i^{obs}=0} \left[ \pi_n f^i_{n1} + \pi_d f^i_{d1} \right] \prod_{i:\, Z_i=0,\, D_i^{obs}=1} \left[ \pi_a f^i_{a0} + \pi_d f^i_{d0} \right] \prod_{i:\, Z_i=0,\, D_i^{obs}=0} \left[ \pi_n f^i_{n0} + \pi_c f^i_{c0} \right]
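Each of the four products groups units by their observed (Z_i, D_i^obs) cell, and each unit contributes a two-component mixture over the strata compatible with that cell. A minimal sketch, assuming univariate Gaussian outcome densities and made-up parameter values (not the paper's):

```python
import numpy as np

def npdf(y, mu, sd=1.0):
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Strata compatible with each observed (Z, D_obs) cell:
# (1,1): c or a;  (1,0): n or d;  (0,1): a or d;  (0,0): n or c.
compatible = {(1, 1): ("c", "a"), (1, 0): ("n", "d"),
              (0, 1): ("a", "d"), (0, 0): ("n", "c")}

# Hypothetical strata probabilities and stratum/arm-specific outcome means
pi = {"c": 0.5, "n": 0.3, "a": 0.15, "d": 0.05}
mu = {("c", 1): 1.0, ("a", 1): 0.5, ("n", 1): 0.0, ("d", 1): 0.2,
      ("c", 0): 0.0, ("a", 0): 0.4, ("n", 0): 0.1, ("d", 0): 0.3}

def loglik_unit(y, z, d):
    """Observed-data contribution: a mixture over the two compatible strata."""
    return np.log(sum(pi[s] * npdf(y, mu[(s, z)]) for s in compatible[(z, d)]))

ll = loglik_unit(0.8, 1, 1)   # pi_c * f_c1 + pi_a * f_a1 for a (Z=1, D=1) unit
assert -1.38 < ll < -1.37
```

The mixture structure of each contribution is exactly what makes the model only weakly identified: different splits of a cell across its two compatible strata can fit the observed data almost equally well.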
Without additional assumptions, inference on the PCEs τ_s, though possible and relatively straightforward from a Bayesian perspective, can be very imprecise, even in large samples, because the models are only weakly identified. Jointly modelling multiple outcomes may help to reduce uncertainty about the treatment effects on the primary outcomes. Specifically, although additional outcomes play no extra role in the compliance model, they can improve the prediction of principal stratum membership through the outcome model. In addition, some key substantive identifying assumptions, such as the exclusion restriction (ER), may be more plausible for secondary outcomes than for the primary one. This condition is referred to as “partial exclusion restriction (PER)” in Mealli and Pacini (2012):
Assumption 2. Stochastic Partial Exclusion Restriction
Pr(Y_i2(0) | S_i = s) = Pr(Y_i2(1) | S_i = s) for s = a, n.
Restrictions on secondary outcomes reduce the parameter space of the joint distribution of all outcomes and, in turn, of the marginal distribution of the primary one.

In our setting a strong monotonicity assumption holds by design, D_i(1) ≥ D_i(0) and D_i(0) = 0 for all i, implying that π_d = 0 and π_a = 0, so that the population is composed only of compliers and never-takers. Therefore a simple Bernoulli model is used for compliance principal stratum membership: Pr(S_i = c) = π_c, π_c ∈ (0, 1).
The outcome variables we focus on consist of either two continuous variables or a continuous variable and a binary indicator. For two continuous outcomes, conditional on the principal stratum, we assume a bivariate normal distribution:

Y_i(z) | S_i = s ~ N_2(μ_{s,z}, Σ_{s,z}),  where  μ_{s,z} = (μ_1^{s,z}, μ_2^{s,z})′  and  Σ_{s,z} = [ σ_11^{s,z}  σ_12^{s,z} ; σ_12^{s,z}  σ_22^{s,z} ],  s = c, n; z = 0, 1.

In the model for a continuous outcome Y_1 and a binary outcome Y_2, we replace Y_i2 in the previous normal model by a latent variable Y*_i2 and assume in addition that Y_i2(z) = I(Y*_i2(z) > 0), with σ_22^{s,z} = 1. This is equivalent to assuming a generalized linear model with probit link for Y_2: Pr(Y_i2(z) = 1 | S_i = s) = Φ(μ_2^{s,z}). The full set of parameters is θ = (π_c, {μ_{s,z}, Σ_{s,z}}). We assume that the parameters are a priori independent. A conjugate Beta prior distribution is used for the compliance principal strata: π_c ~ Beta(α_0, β_0). Conjugate prior distributions are also used for the parameters of the bivariate continuous outcome model: Σ_{s,z} ~ Inv-Wishart_{ν_0}((Λ_0^{s,z})^{-1}) and μ_{s,z} | Σ_{s,z} ~ N_2(μ_0^{s,z}, Σ_{s,z}/k_0^{s,z}). For continuous-binary outcomes, we use semi-conjugate diffuse normal prior distributions for the mean parameters, μ_{s,z} ~ N_2(μ_0^{s,z}, Σ_0^{s,z}). For the covariance matrices Σ_{s,z} there is no conjugate prior, due to the constraint σ_22 = 1. As in Chib and Hamilton (2000), we assume a flexible truncated bivariate normal prior for the covariance parameters σ_{s,z} = (σ_11^{s,z}, σ_12^{s,z}): σ_{s,z} ~ N_2(σ_0^{s,z}, V_0^{s,z}) I_A(σ_{s,z}), where σ_0^{s,z} and V_0^{s,z} are hyperparameters, A = {σ_{s,z} ∈ ℜ² : σ_11^{s,z} > (σ_12^{s,z})²} is the region where Σ_{s,z} is a positive definite matrix, and I_A is the indicator function taking the value one if σ_{s,z} is in A and zero otherwise.

Table 1  True values of the parameters for simulation scenarios I-III.

Scenario I:
  μ_{c,0} = (2.5, 8)′,  μ_{c,1} = (0.5, 6.5)′,  μ_{n,0} = (2.75, 12)′,  μ_{n,1} = (4.25, 13)′
  Σ_{c,0} = [0.09 0.24; 0.24 1],  Σ_{c,1} = [0.01 0.08; 0.08 1],  Σ_{n,0} = [0.16 0.16; 0.16 4],  Σ_{n,1} = [0.04 0.08; 0.08 4]
Scenario II:
  μ_{c,0} = (2.5, 8)′,  μ_{c,1} = (0.5, 6.5)′,  μ_{n,0} = (2.75, 12)′,  μ_{n,1} = (4.25, 24)′
  Σ_{c,0} = [0.09 0.24; 0.24 1],  Σ_{c,1} = [0.01 0.08; 0.08 1],  Σ_{n,0} = [0.16 0.16; 0.16 4],  Σ_{n,1} = [0.04 0.12; 0.12 9]
Scenario III:
  μ_{c,0} = (2.5, 8)′,  μ_{c,1} = (0.5, 6.5)′,  μ_{n,0} = (2.75, 24)′,  μ_{n,1} = (4.25, 36)′
  Σ_{c,0} = [0.09 0.24; 0.24 1],  Σ_{c,1} = [0.01 0.08; 0.08 1],  Σ_{n,0} = [0.16 0.96; 0.96 9],  Σ_{n,1} = [0.04 0.8; 0.8 25]
The joint posterior distribution, Pr(θ, D^mis | Y^obs, D^obs, Z), is obtained from a Markov chain algorithm that uses the Data Augmentation method (Tanner and Wong, 1987) to impute the missing compliance indicators at each step and to exploit the complete-compliance-data posterior distributions to update the parameter distribution.
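One scan of the Data Augmentation scheme can be sketched for the simplest case of a single continuous outcome with known stratum means (all generating values below are illustrative assumptions): units with Z_i = 1 reveal their stratum through D_i^obs, the stratum of control units is imputed from its posterior probability, and π_c is then drawn from its Beta full conditional.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated observed data under strong monotonicity: S in {complier, never-taker},
# with D_obs = 1 only if Z = 1 and S = complier. Made-up generating values.
n, pic_true = 200, 0.7
S = rng.random(n) < pic_true                 # True -> complier
Z = rng.random(n) < 0.5
D = Z & S
Y = rng.normal(np.where(S, 0.0, 1.0), 1.0)   # outcome mean depends on stratum

def npdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi)

def da_step(pic, rng):
    """One Data Augmentation scan: impute latent strata, then draw pi_c."""
    S_imp = np.where(Z, D, False)            # Z=1 units: stratum observed via D
    amb = ~Z                                 # Z=0 units: stratum latent
    # Posterior probability of being a complier for the ambiguous units
    pc = pic * npdf(Y[amb], 0.0)
    pn = (1 - pic) * npdf(Y[amb], 1.0)
    S_imp[amb] = rng.random(amb.sum()) < pc / (pc + pn)
    # Beta(1, 1) prior => Beta full conditional for pi_c
    return rng.beta(1 + S_imp.sum(), 1 + n - S_imp.sum())

pic = 0.5
for _ in range(200):
    pic = da_step(pic, rng)
assert 0.0 < pic < 1.0
```

The full algorithm of the paper additionally updates the outcome parameters from their (semi-)conjugate full conditionals; here the outcome means are fixed purely to keep the sketch short.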
4 Simulations
To assess the improvement in the estimation of the PCEs obtained by exploiting multivariate outcomes, we conduct simulation studies comparing the posterior inferences obtained by jointly modelling two outcomes with those based on only one outcome. Consistently with the JOBS II data, the primary outcome is taken to be the depression score, measured on a 5-point rating scale (1 = not at all distressed to 5 = extremely distressed). To simplify the computation, we focus on two continuous outcomes, using alcohol use (in percent) as the auxiliary outcome in the simulation study.
Here we present simulation results under three different scenarios, accounting for different correlation structures between the outcomes for compliers and never-takers and various deviations from the ER for the secondary outcome. The true simulation parameters are shown in Table 1, and all simulated data sets have N = 600 sample units, generated using principal strata probabilities of 0.7 for compliers and 0.3 for never-takers. The simulated samples are randomly divided into two groups, half assigned to the treatment and half to the control.
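The data-generating step just described can be sketched as follows, using the scenario I parameter values as reconstructed from Table 1:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 600
# Scenario I parameters (as reconstructed from Table 1)
mu = {("c", 0): [2.5, 8.0], ("c", 1): [0.5, 6.5],
      ("n", 0): [2.75, 12.0], ("n", 1): [4.25, 13.0]}
Sigma = {("c", 0): [[0.09, 0.24], [0.24, 1.0]],
         ("c", 1): [[0.01, 0.08], [0.08, 1.0]],
         ("n", 0): [[0.16, 0.16], [0.16, 4.0]],
         ("n", 1): [[0.04, 0.08], [0.08, 4.0]]}

strata = rng.choice(["c", "n"], size=N, p=[0.7, 0.3])   # compliers / never-takers
Z = rng.permutation(np.repeat([0, 1], N // 2))           # half treated, half control
D = (Z == 1) & (strata == "c")                           # strong monotonicity
Y = np.array([rng.multivariate_normal(mu[(s, z)], Sigma[(s, z)])
              for s, z in zip(strata, Z)])

assert Y.shape == (N, 2) and Z.sum() == N // 2
```

In the analysis only (Z, D, Y) would be passed to the sampler; the `strata` labels are retained here only to check recovery of the PCEs.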
Figure 1 shows the histograms and 95% posterior intervals of the PCEs for compliers and never-takers on the primary outcome, in both the univariate and bivariate cases. The results clearly demonstrate that simultaneous modelling of both outcomes significantly reduces posterior uncertainty about the causal estimates, providing considerably more precise estimates of the PCEs for compliers and never-takers.

Fig. 1  Simulation results: histograms and 95% posterior intervals of the PCEs for compliers and never-takers (scenarios I-III; true value, univariate approach and bivariate approach marked).
In addition, the histograms in the upper and lower panels of Figure 1 suggest that the posterior distributions of the PCEs are much more informative in the bivariate case. Specifically, the histograms for scenario (I) show that the posterior distributions of the PCEs for compliers and never-takers are somewhat flat under the univariate approach, but become much tighter in the bivariate case. The improvement is even more dramatic in scenarios (II) and (III), where the histograms show that the posterior distributions of the PCEs for compliers and never-takers are bimodal in the univariate case, but both become unimodal in the bivariate case. Biases and MSEs (based on the posterior mean) were also calculated (not shown here) and suggest that jointly modelling the two outcomes reduces the average biases by more than 64% and the MSEs by more than 79% in these scenarios. Several other scenarios with additional structural assumptions were also examined: the magnitude of the improvement varies, but the pattern is consistent with what is described here.
5 Application to the JOBS II study
The Job Search Intervention Study (JOBS II) (Vinokur et al., 1995) is a randomized field experiment intended to prevent poor mental health and to promote high-quality reemployment among unemployed workers. The intervention consisted of five half-day job-search skills seminar sessions. The control condition consisted of a mailed booklet briefly describing job-search methods and tips. Our analysis focuses on a sample of 398 subjects who were at high risk of depression.

Table 2  Posterior distributions of the PCEs on depression for compliers and never-takers.

                 Bivariate approach                 Univariate approach
                 Without PER      With PER          Without ER        With ER
                 τ_c     τ_n      τ_c     τ_n       τ_c     τ_n       τ_c
Mean            −0.135  −0.192   −0.211  −0.110    −0.207  −0.097    −0.269
SD               0.157   0.176    0.196   0.229     0.178   0.207     0.170
2.5%            −0.486  −0.526   −0.620  −0.587    −0.573  −0.532    −0.621
50%             −0.122  −0.197   −0.200  −0.100    −0.201  −0.086    −0.262
97.5%            0.143   0.179    0.144   0.306     0.123   0.281     0.045
Width PCI_0.95   0.629   0.706    0.764   0.893     0.696   0.812     0.666

Note: PER in the bivariate model is for reemployment, whereas ER in the univariate model is for depression.
Since the treatment condition is available only to the individuals assigned to the intervention in JOBS II, there are no defiers or always-takers. Noncompliance arises in JOBS II because a substantial proportion (45%) of the individuals invited to participate in the job-search seminar did not show up to the intervention. Our focus is on estimating the causal effects of the intervention on a depression score measured six months after the intervention, relaxing the ER and using reemployment status as the secondary outcome. The ER on depression may be controversial because, for example, never-takers randomized to the intervention might feel more demoralized by their inability to take advantage of the opportunity.
Table 2 reports summaries of the posterior distributions of the PCEs for compliers and never-takers on depression in the bivariate (columns 1 through 4) and univariate (columns 5 through 7) cases. Although the benefits of the bivariate approach are not pronounced, jointly modelling the two outcomes improves inference: the bivariate approach (without PER) provides more precise estimates of the PCEs for compliers and never-takers, and tighter 95% posterior credible intervals. It is also worth noting that the bivariate approach leads to posterior distributions of τ_c and τ_n centered at different means and medians. In the light of the simulation results, which show that jointly modelling two outcomes generally reduces the average biases, these findings lend more credibility to the bivariate estimates, suggesting that the univariate estimates may be affected by larger biases.
6 Discussion
We develop a Bayesian parametric bivariate model that exploits multiple outcomes of different types to improve the estimation of weakly identified causal estimands.
Although we focus on randomized experiments with noncompliance, our approach is immediately applicable to causal inference problems with other confounded post-treatment variables, and also in observational studies, where the exclusion restriction assumptions for the instrument are often questionable.
Our approach has several benefits. First, the Bayesian approach provides a refined map of identifiability, clarifying what can be learned when causal estimands are intrinsically not fully identified, but only weakly identified. Second, in a Bayesian setting, the effect of relaxing or maintaining assumptions (whether structural or modelling assumptions) can be directly checked by examining how the posterior distributions of the causal estimands change, so the approach serves as a natural framework for sensitivity analysis. Third, the use of multiple outcomes improves model identifiability, leading to smaller posterior variances of the parameters. However, the additional information provided by secondary outcomes is obtained at the cost of having to specify more complex multivariate models, which may increase the possibility of misspecification. Therefore, model checking procedures to ensure sensible model specifications are a valuable topic for future research.
References
1. Angrist, J.D., Imbens, G. W., Rubin, D.B.: Identification of causal effects using instrumentalvariables. Journal of the American Statistical Association, 91, 444–455 (1996)
2. Chib, S., Hamilton, B.H.: Bayesian analysis of cross-section and clustered data treatment models. Journal of Econometrics, 97, 25–50 (2000)
3. Frangakis, C.E., Rubin, D.B.: Principal stratification in causal inference. Biometrics, 58, 21–29 (2002)
4. Imbens, G. W., Rubin, D. B.: Bayesian inference for causal effects in randomized experimentswith noncompliance. Annals of Statistics, 25, 305–327 (1997).
5. Jo, B.: Estimation of intervention effects with noncompliance: Alternative model specifica-tions. Journal of Educational and Behavioral Statistics. 27, 385–420 (2002).
6. Jo, B., Muthen, B.: Modeling of intervention effects with noncompliance: a latent variable ap-proach for randomized trials. In G. Marcoulides, R. Schumacker (Eds.), New developmentsand techniques in structrual equation modeling, 57–87. Lawrence Erlbaum Associates, Pub-lishers. Mahwah, New Jersey (2001).
7. Manski, C.: Nonparametric bounds on treatment effects. The American Economic Review,80, 319–323 (1990).
8. Mattei A., Mealli F., Pacini B.: Exploiting Multivariate Outcomes in Bayesian Inferencefor Causal Effects with Noncompliance. In Studies in Theoretical and Applied Statistics(SIS2010 Scientific Meeting). Forthcoming (2012).
9. Mealli, F., Pacini, B.: Using secondary outcomes and covariates to sharpen inference in ran-domized experiments with noncompliance. Technical report, Department of Statistics, Uni-versity of Florence (2012).
10. Rubin, D.B: Comment on ‘Randomization analysis of experimental data: The Fisher random-ization test’ by D. Basu. Journal of the American Statistical Association, 75, 591–593 (1980).
11. Schwartz, S., Li, F., Reiter J.: Sensitivity analysis for unmeasured con- founding in principalstratification. Statistics in Medicine. In press (2012).
12. Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation (withdiscussion). Journal of the American Statistical Association, 82, 528–550 (1987).
13. Vinokur, A., Price, R., Schul, Y.: Impact of the jobs intervention on unemployed workers vary-ing in risk for depression. Journal of American Community Psychology, 23, 39–74 (1995).
Unconditional and Conditional Quantile Treatment Effect: Identification Strategies and Interpretations
Margherita Fort
Abstract This paper reviews strategies that allow one to identify the effects of policy interventions on the unconditional or conditional distribution of the outcome of interest. This distinction is irrelevant when one focuses on average treatment effects, since identifying assumptions typically do not affect the parameter's interpretation. Conversely, finding the appropriate answer to a research question on the effects over the distribution requires particular attention in the choice of the identification strategy. Indeed, quantiles of the conditional and unconditional distribution of a random variable carry a different meaning, even if identification of both these sets of parameters may require conditioning on observed covariates.
Key words: impact heterogeneity, quantile treatment effects, rank invariance
1 Introduction
In recent years there has been a growing interest in the evaluation literature in models that allow essential heterogeneity in the treatment parameters and, more generally, in models that are informative on the impact distribution. The recent increase in the attention devoted to the identification and estimation of quantile treatment effects (QTEs) is due to their intrinsic ability to characterize the heterogeneous impact of the treatment on various points of the outcome distribution. QTEs are informative about the impact distribution when the potential outcomes observed under various levels of the treatment are comonotonic random variables. The variable describing the relative position of an individual in the outcome distribution thus plays a special role in this setting, representing at the same time the main dimension along which treatment effects are allowed to vary as well as a key ingredient to relate potential outcomes. Several identification approaches currently used in the literature for
Department of Economics, University of Bologna, IZA and CHILD; Piazza Scaravilli 2, Bologna, e-mail: [email protected]
the assessment of mean effects have thus been extended to quantiles. Most of these strategies require conditioning on a set of variables to achieve identification. While conditioning on a set of observed regressors does not affect the interpretation of the parameters in a mean regression, this is not the case for quantiles. The law of iterated expectations guarantees that the parameters of a mean regression have both a conditional and an unconditional mean interpretation. This does not carry over to quantiles, where conditioning on covariates affects the interpretation of the residual disturbance term. Indeed, since quantile regression allows one to characterize the heterogeneity of the treatment response only along this latter dimension, conditioning on covariates in quantile regression generally affects the interpretation of the results.

This paper reviews strategies aimed at identifying quantile treatment effects, covering strategies that deal with the identification of conditional and unconditional quantile treatment effects, with particular attention to cross-sectional data applications in which the treatment is endogenous without conditioning on additional covariates. The aim of the paper is to provide useful guidance for users of quantile regression methods in choosing the most appropriate approach while addressing a specific research question.

The remainder of the paper is organized as follows. After introducing the basic notation and the key parameters of interest in Section 2, Section 3 reviews solutions to the identification of quantile treatment effects. The review covers strategies that are appropriate only when the outcome of interest is a continuous variable, i.e. in cases where the quantiles of the outcome distribution are unambiguously defined. It concludes by illustrating some of the methods through two examples aimed at assessing the distributional impacts of training on earnings and of education on wages. Section 4 concludes.
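The asymmetry between means and quantiles noted above can be checked numerically. The sketch below is illustrative only (the grouping variable and distributions are invented for the example): averaging conditional means recovers the marginal mean, while the analogous combination of conditional medians does not recover the marginal median.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A binary covariate X and an outcome Y whose distribution depends on X.
x = rng.integers(0, 2, size=n)
y = np.where(x == 1, rng.normal(2.0, 1.0, n), rng.normal(0.0, 3.0, n))

p1 = (x == 1).mean()

# Law of iterated expectations: E[Y] = E[E[Y|X]] holds exactly.
iterated_mean = p1 * y[x == 1].mean() + (1 - p1) * y[x == 0].mean()
assert np.isclose(iterated_mean, y.mean())

# The same recombination of conditional medians misses the marginal median:
# the conditional medians are about 2.0 and 0.0, so their mixture-weighted
# combination is about 1.0, while the marginal median of the pooled data is
# about 1.5 for these two distributions.
combined_medians = p1 * np.median(y[x == 1]) + (1 - p1) * np.median(y[x == 0])
print(combined_medians, np.median(y))  # clearly different
```

This is the sense in which conditional-quantile parameters have no automatic unconditional-quantile interpretation.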
2 What Are We After: Notation and Parameters of Interest
In this section I first introduce the notation used throughout the paper and then define the objects whose identification is sought.
Y denotes the observed outcome, D the intensity of the treatment received and W a set of observable individual characteristics. W may include exogenous variables X and instruments Z.1 Y is restricted to be continuous, while D and W can be either continuous or discrete random variables. Both Y and D can be decomposed into two components, one deterministic and one stochastic; these components need not be additively separable. The stochastic components account for differences in the distribution of D and Y across otherwise identical individuals. The econometric models reviewed in Section 3 place restrictions on: i) the scale of D; ii) the number of independent sources of stochastic variation in the model; iii) the distribution (joint, marginal, conditional) of these stochastic components and D or W ≡ (X, Z); iv) the scale of Z. Y^d_i denotes the potential outcome for individual i if the value of the treatment is d: it represents the outcome that would be observed
1 Capital letters denote random variables and lower case letters denote realizations.
had individual i been exposed to level d of the treatment. F_{Y^d}(·), f_{Y^d}(·) and F^{-1}_{Y^d}(·) = q(d, ·) denote the corresponding cumulative distribution function, density function and quantile function. The conditional distribution and conditional quantile are denoted by F_{Y^d}(·|x) and F^{-1}_{Y^d}(·|x) = q(d, x, ·).

We are interested in characterizing the dependence structure between Y and D
eventually conditioning on a set of covariates W, in the presence of essential heterogeneity and in the absence of general equilibrium effects. Knowledge of the joint distribution (Y^d)_{d∈D} or the conditional joint distribution (Y^d|x)_{d∈D} would allow one to characterize a distribution for the outcome for any possible level of the treatment. When potential outcomes are comonotonic, they can be described as different functions of the same (single) random variable, and quantile treatment effects (QTEs) are informative on the impact distribution. The potential outcome can be written as y^d = q(d, u), u ~ U(0,1), where q(d, u) is increasing in u and is referred to in the literature as the structural quantile function. If the potential outcomes are not comonotonic, QTEs are informative on the distance between the potential outcome distributions, which may be interesting per se, but not on the impact distribution. We thus concentrate on strategies that focus on QTEs.2 In the binary case, QTEs (see equation (2)) are defined as the horizontal distance between the distribution function in the presence and in the absence of the treatment ([9]; [15]).3
δ(τ) = F^{-1}_{Y^1}(τ) − F^{-1}_{Y^0}(τ),   0 < τ < 1    (2)
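With two samples of potential outcomes in hand, equation (2) can be evaluated empirically as a difference of sample quantiles. The sketch below uses simulated data (the two distributions are invented for illustration) in which the treatment both shifts and stretches the outcome distribution, so δ(τ) grows with τ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical potential outcomes: treatment adds a location shift and
# amplifies dispersion, so gains are larger at higher quantiles.
y0 = rng.normal(10.0, 2.0, n)              # Y^0 ~ N(10, 2^2)
y1 = 11.0 + 1.5 * rng.normal(0.0, 2.0, n)  # Y^1 ~ N(11, 3^2)

def qte(y1, y0, taus):
    """delta(tau) = F^{-1}_{Y^1}(tau) - F^{-1}_{Y^0}(tau):
    the horizontal distance between the two distribution functions."""
    taus = np.asarray(taus)
    return np.quantile(y1, taus) - np.quantile(y0, taus)

effects = qte(y1, y0, [0.25, 0.50, 0.75])
print(effects)  # increasing in tau for this design
```

Note that this only has a causal reading when treatment is randomized (or the comparison is otherwise valid); the later sections deal with the endogenous case.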
We can distinguish conditional and unconditional quantile treatment effects by characterizing the uniformly distributed random variable that describes the quantile of the outcome variable. This distinction becomes clearer if we think about a specific empirical example.

Motivating Example: Returns to Education or Training. There is a large literature that studies the returns to education. Key questions in this literature (e.g. does additional education cause a wage increase? does additional schooling increase wages more for the more able than for the less able? does additional schooling increase or decrease wage inequality?) can be addressed using quantile regression methods. In these applications, the treatment is likely endogenous in the outcome equation without conditioning on additional covariates: typically researchers seek instruments that allow them to isolate exogenous variation in education in the wage equation. Suppose we could measure the individual ability a_i that drives the endogeneity of education in the wage equation. Now, consider the alternative specifications for the wage model presented in equations (3) and (4), where D denotes schooling (the treatment).
2 The review will not cover strategies that focus on other objects and may deliver QTEs as a byproduct, such as [8], for instance.
3 In the continuous case δ(τ) represents the change in Y induced by a change in D from d to d + ε when ε is small:

δ(τ) = ∂Q_Y(τ|d)/∂d,   0 < τ < 1    (1)
4 Margherita Fort
Y_i = α_0(f(ε_i, a_i)) + α_1(f(ε_i, a_i))D    (3)

Y_i = β_0(ε_i)a_i + β_1(ε_i)D    (4)
These specifications differ because they impose different structures on the variables governing the heterogeneity in the returns to education. In equation (3) the relative position of an individual in the wage distribution is determined by (ε_i, a_i), i.e. by both an unobserved uniformly distributed error component ε_i and the observed individual ability level, while in equation (4) the relative position of the individual is determined only by ε_i. In both cases, we can think about the relative position of an individual in the wage distribution as his/her proneness ([9]) to earn a high wage for a given level of schooling D. However, in model (3) we would refer to the total proneness/ability, while in model (4) we would be speaking only about unobserved proneness/ability.4 Using model (3) we can explore whether the returns to education vary depending on the individuals' total ability levels, while using model (4) we can study how the returns to education vary for given observed ability levels. Individuals who earn high wages conditional on some specific level of ability may not be the same individuals who earn high wages in the sample. However, conditioning on observed ability may be important to be able to isolate the causal effect of schooling D on the distribution of wages Y. Equations (5) and (6) represent the structural quantile functions corresponding to models (3) and (4) respectively5: equation (5) is an example of an unconditional quantile regression model, while equation (6) is an example of a conditional quantile regression model. This distinction might be empirically relevant since, in general, for a given τ ∈ (0,1), α_1(τ) ≠ β_1(τ).
f(ε, a) ≡ ε*, ε* ~ U(0,1):   Q_Y(τ|d) = α_0(τ) + α_1(τ)d    (5)

ε ~ U(0,1):   Q_Y(τ|d) = β_0(τ)a_i + β_1(τ)d    (6)
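The gap between α_1(τ) and β_1(τ) is easy to see in a simulation. The data-generating process below is invented for illustration: ability a is binary and observed, schooling d is randomly assigned, and the return to schooling is larger for high-ability individuals. The median effect conditional on each ability group then differs from the effect on the median of the unconditional (pooled) wage distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

a = (rng.random(n) < 0.3).astype(int)     # observed ability (binary, 30% high)
d = rng.integers(0, 2, n)                 # schooling, randomly assigned
# Return to schooling: 1.0 if a = 0, 2.5 if a = 1.
y = 1.5 * a + (1.0 + 1.5 * a) * d + rng.normal(0, 1, n)

def median_effect(y, d, mask=None):
    """Treated-minus-untreated difference of medians, optionally within a
    subgroup defined by mask."""
    if mask is None:
        mask = np.ones(y.size, dtype=bool)
    return np.median(y[mask & (d == 1)]) - np.median(y[mask & (d == 0)])

cond_low  = median_effect(y, d, a == 0)   # conditional effect, low ability
cond_high = median_effect(y, d, a == 1)   # conditional effect, high ability
uncond    = median_effect(y, d)           # effect on the pooled (unconditional) median
print(cond_low, cond_high, uncond)
```

The unconditional estimate is not the population-weighted average of the two conditional estimates at the same τ: it answers a different question, namely how the middle of the overall wage distribution moves.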
Table 1 Moment conditions under assumptions in [2] and [11]

Quantile   conditional                                          unconditional
Y1         E[1(Y < q(1,x)) − τ · w_{y,d,x} · D | X] = 0          E[1(Y < q(1)) − τ · w_{y,d} · D] = 0
Y0         E[1(Y < q(0,x)) − τ · w_{y,d,x} · (1−D) | X] = 0      E[1(Y < q(0)) − τ · w_{y,d} · (1−D)] = 0
weight     w_{y,d,x} = 1 − D[1 − P(Z=1|Y,D,X)]/(1 − P(Z=1|X)) − (1−D)·P(Z=1|Y,D,X)/P(Z=1|X)
           w_{y,d} = E[(Z − P(Z=1|X))/(P(Z=1|X)[1 − P(Z=1|X)]) | Y, D]·(2D − 1)

Note: Positive weights are reported. See [2] and [11] for other definitions of weights.
3 Identification Strategies and Estimation
In cross-sectional applications, two main identification approaches have been extended to QTEs: strategies based on the unconfoundedness assumption and strategies based on the availability of an instrumental variable. In the first case, the researcher must be willing to assume that the joint distribution of the potential outcomes is independent of the treatment conditional on a set of exogenous covariates. Under this assumption, conditional QTEs can be estimated as originally proposed by [14] and unconditional QTEs can be estimated as proposed by [10]. [2] and [6], [7] propose identifying assumptions for conditional quantiles when an instrumental variable is available. The assumptions of [2] guarantee identification of conditional and unconditional QTEs when the treatment is binary and endogenous and a binary instrument is available. They lead to the moment conditions described in Table 1: in both cases, identification relies on previous results ([1], [13]) that guarantee that, in the subpopulation of compliers, comparisons by treatment D, conditional on X, have a causal interpretation. Recall that compliers are individuals whose treatment status is affected by the instrument Z, but that this sub-population cannot be identified directly from the data, because it is defined by means of potential outcomes. The moment conditions highlight that it is possible to construct weights that 'find compliers in the population in an average sense' ([1]). The weights differ depending on whether one is interested in the conditional or in the unconditional quantiles. Only the weights considered in the second case 'simultaneously balance the distribution of the covariates between treated and non-treated compliers' ([12]). In both cases weights are functions of P(Z = 1|X) and observed variables. Estimation thus proceeds in two steps: 1) weights are estimated; 2) weighted quantile regressions are run.6 Estimation also requires two steps under the identification strategy proposed by [6], [7] and [17], [18], but does not involve re-weighting. The crucial assumption for identification in the approach by [6] is rank invariance or rank similarity, i.e. we require that the individual's rank in the potential outcome distribution, conditional on exogenous covariates, is not systematically affected by the treatment.

4 To the best of my knowledge, [17] is the first to distinguish between total and observed proneness.
5 Under comonotonicity of potential outcomes, the structural quantile function describes the link between potential outcomes.
The assumptions in [6] lead to the moment condition in equation (7). Equation (7) suggests an estimation procedure that first computes the conditional quantiles of the random variable Y − q(d, x, τ) given X and Z, and then chooses as estimate of q(d, x, τ) the value that minimizes the absolute value of the coefficient associated with Z in the first step.7
Pr[Y − q(d, x, τ) ≤ 0 | X, Z] = τ.    (7)
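Equation (7) suggests a particularly simple estimator in the special case of a binary treatment, a binary instrument and no covariates: the conditional τ-quantile given Z is then just a within-group sample quantile, so one can scan candidate values α of the treatment effect and keep the one for which the τ-quantile of Y − αD is the same in both instrument groups. The sketch below is a toy implementation under these simplifying assumptions, with an invented data-generating process in which the true effect is 2 at every quantile.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

z = rng.integers(0, 2, n)                      # binary instrument
e = rng.normal(0, 1, n)                        # outcome disturbance
v = rng.normal(0, 1, n)
d = (0.8 * z + 0.5 * e + v > 0.6).astype(int)  # endogenous treatment (depends on e)
y = 2.0 * d + e                                # true effect: 2 at every quantile

def iv_quantile(y, d, z, tau, grid):
    """Grid version of inverse quantile regression: choose the candidate
    effect that equalizes the tau-quantile of Y - a*D across Z groups."""
    best, best_gap = None, np.inf
    for a in grid:
        r = y - a * d
        gap = abs(np.quantile(r[z == 1], tau) - np.quantile(r[z == 0], tau))
        if gap < best_gap:
            best, best_gap = a, gap
    return best

est = iv_quantile(y, d, z, 0.5, np.linspace(0.0, 4.0, 81))
# The naive treated/untreated quantile contrast is biased upward here,
# because D selects on the disturbance e.
naive = np.quantile(y[d == 1], 0.5) - np.quantile(y[d == 0], 0.5)
print(est, naive)
```

With covariates, the first step becomes an actual quantile regression on X and Z, but the logic of inverting on the Z coefficient is the same.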
The instrumental variable approach for the identification of unconditional QTEs proposed by [17] delivers the moment conditions in equation (8):
E[Z · 1(Y ≤ q(d, τ)) − τ_X] = 0,  τ_X ≡ P[Y ≤ q(d, τ)|X];   E[1(Y ≤ q(d, τ)) − τ] = 0.    (8)
These moment conditions reflect the idea that, first, the instrument Z does not affect the distribution of the disturbance once X is controlled for and, second, the joint distribution of X and the disturbance is unrestricted. Estimation involves first an estimation of the quantiles of Y − q(d, τ) given X and Z and of τ_X; a second step then chooses as estimate of q(d, τ) the value that minimizes the coefficient of Z, averaging over all possible values of X.

6 When identification is achieved relying on unconfoundedness, the moment conditions are similar but the weights are identically 1 for conditional quantiles ([14]) and are D/P(D = 1|X) + (1 − D)/(1 − P(D = 1|X)) for unconditional quantiles ([10]).
7 This approach can be used when the treatment and instrument are binary, discrete as well as continuous.
We now apply these strategies to two illustrative examples taken from the literature. Table 2 reports estimates of the effect of training (or education) on the conditional and unconditional distribution of earnings (or log wages), using data on males from [2] and data from [5], respectively.8 Columns (1) and (2) report the results delivered when training or education is treated as exogenous in the estimation of conditional and unconditional quantiles, respectively. Columns (3) and (4) report estimates that address the endogeneity of training or education in the outcome equation relying on [2]. These estimates apply to the sub-population of compliers. Columns (5)-(8) report estimates based on [6] or [17]. These approaches guarantee global identification of conditional and unconditional QTEs. We discuss the top-panel estimates first: in the example from [2] the treatment assignment is randomized, thus covariates are not needed for identification. Indeed, under both the identification approaches considered, training effects on the conditional and unconditional quantiles do not exhibit substantial differences in magnitude, and all suggest that the effect of training is larger at the top of the earnings distribution.9 In addition, both identification strategies deliver similar results, suggesting that key assumptions are unlikely to be violated in either case. Let us now turn to the estimates in the bottom part of the table. In this example, covariates are needed for identification: we need to control for country-specific secular trends in education and for differences across countries in the levels of education and wages to be able to isolate the exogenous variation in education induced by school reforms.
In this example, addressing endogeneity seems to have relevant consequences: the estimates in columns (1) and (2) suggest that returns are increasing over the wage distribution, while the estimates in column (3) suggest the opposite (although the precision of these estimates is low) and in column (4) we find no evidence of heterogeneity.10 Estimates of conditional QTEs under rank invariance are reported in column (5); the estimates of unconditional QTEs in column (6) assume rank invariance and do not use covariates for identification. The estimates in column (5) are unrealistic and suggest that rank invariance is unlikely to hold. The estimates in column (6) are negative and confirm that controlling for covariates is necessary for identification.

8 In the second example, only reforms that increased compulsory schooling by 3 years are considered (i.e. only Greece, Italy and Finland) and the original treatment (years of education) and instrument (years of compulsory schooling) were recoded to binary. Estimates in columns (1), (2), (3), (4) have been computed by the author using the STATA package ivqte by [12], except column (3) for the first example (taken from the article). Estimates in column (1) replicate the original results in the papers, except that standard errors are now robust to heteroskedasticity; estimates in columns (5)-(8) are taken from [18] for the AAI02 example and obtained using the STATA package ivqreg by Do Wan Kwack, available from Christian Hansen's research page.
9 When the endogeneity of training is addressed, point estimates of the returns to training are generally lower in the unconditional distribution than the returns observed holding race, age, education and marital status fixed.
10 In this example, we look at the effect of three additional years of schooling on wages. Assuming linearity and dividing the reported point estimates by three, the results in columns (1)-(3) are fairly consistent with the literature: the association is lower than the causal effects; the causal estimates suggest a return between 10% and 4% for each additional year of education.
Table 2 Effect of Training on the Conditional and Unconditional Distribution of Earnings ([2], males only) and Effect of Education on the Conditional and Unconditional Distribution of Log Wages ([5], males, Italy, Greece and Finland, treatment and instrument recoded to binary)

         Exogenous Training          Endogenous Training
Strategy                             Monotonicity                Rank Invariance
         Conditional  Unconditional  Conditional  Unconditional  Conditional  Unconditional
q        (1) KB78     (2) F07        (3) AAI02    (4) FM10       (5) CH08     (6) CH08 w/o controls  (7) P11 logit  (8) P11 probit

Effect of Training on Earnings, Abadie et al. (2002), Obs. 5102
0.25     2510         3058           702          414            530          200          100          100
         (417)***     (377)***       (670)        (754)          (629)        (746)        (753)        (750)
0.50     4420         4678           1544         1291           310          1320         790          790
         (613)***     (771)***       (1074)       (1239)         (1101)       (1234)       (1151)       (1161)
0.75     4678         4626           3131         2457           2660         1710         1490         1490
         (901)***     (1056)***      (1376)**     (1650)         (1845)       (1712)       (1542)       (1530)
0.85     4806         5532           3378         3971           3190         3580         3410         3410
         (1045)***    (1241)***      (1811)*      (1886)**       (1185)**     (1427)**     (1542)*      (1550)*

Effect of Education on Log Wages, Brunello et al. (2009), Obs. 2292
0.30     0.168        0.223          0.303        0.514          0.836        -0.198       -            -
         (0.024)***   (0.064)***     (0.142)**    (16.48)        (0.063)***   (0.033)***
0.50     0.177        0.208          0.328        0.521          0.985        -5.119       -            -
         (0.024)***   (0.062)***     (0.126)***   (5.95)         (0.063)***   (0.124)***
0.75     0.213        0.297          0.154        0.599          1.868        0.996        -            -
         (0.026)***   (0.072)***     (0.168)***   (10.13)        (0.998)**    (0.037)***

Legend: Column labels refer to the estimation method. KB78: as in [14]; F07: as in [10]; AAI02: as in [2]; FM10: as in [11], [12]; CH08: as in [7]; P11: as in [18].
4 Conclusions
In this paper, I reviewed approaches that guarantee the identification of quantile treatment effects (QTEs). In many cases, these approaches correspond to extensions of strategies conventionally used in linear regression models (selection on observables, instrumental variables, fixed effects) to quantile regressions. An important consequence of the difference between the statistical tools applied in these two settings is that the interpretation of treatment parameters differs between conditional and unconditional quantile regressions, while, conversely, the law of iterated expectations guarantees that the treatment parameter in a linear regression has both a conditional and an unconditional mean interpretation. It is crucial to bear this in mind while using QTEs to answer a specific research question. Consider the recent proposal of [3] to link educator compensation to the ranks of their students within what the authors call appropriately defined comparison sets. The authors suggest employing the methods in [4] to contrast the actual ranks of the students of a given teacher with some predicted counterfactual rank. Betebenner ([4]), however, employs conditional quantile regression methods aimed specifically at answering questions like 'Are there students with unusually low growth who need special attention?', i.e. a value-added specification of achievement. Barlevy and Neal ([3]) instead look for a method that allows one to isolate the teacher's contribution to a student's rank in the achievement distribution in a given period, eventually conditioning on covariates for identification. In other words, Barlevy and Neal would like to avoid attributing to a teacher changes in the performance of a student that are due only to his initial proficiency level. Standard value-added specifications for students' achievement in a quantile regression context are not the appropriate instrument to address questions about the heterogeneity in students' achievement depending on their initial ability level. Those quantile regressions instead describe how students experiencing the largest gains in performance over a given time period perform relative to students experiencing the lowest gains in the same period. Cross-sectionally, some of the high-gain students may be in the lower part of the test score distribution.11
Acknowledgements This paper benefited from comments by E. Rettore, B. Pacini and F. Mealli. Financial support from the MIUR-FIRB 2008 project RBFR089QQC-003-J31J10000060001 grant is gratefully acknowledged.
References
1. Abadie, A.: Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics 113, 231-263 (2003)
2. Abadie, A., Angrist, J., Imbens, G.: Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70, 91-117 (2002)
3. Barlevy, G., Neal, D.: Pay for Percentile. NBER Working Paper 17194 (2010)
4. Betebenner, D.W.: Norm and Criterion-Referenced Student Growth. Educational Measurement: Issues and Practice 28(4), 42-51 (2009)
5. Brunello, G., Fort, M., Weber, G.: Changes in Compulsory Schooling, Education and the Distribution of Wages in Europe. Economic Journal 119(536), 516-539 (2009)
6. Chernozhukov, V., Hansen, C.: An IV model of quantile treatment effects. Econometrica 73, 245-261 (2005)
7. Chernozhukov, V., Hansen, C.: Instrumental variable quantile regression: A Robust Inference Approach. Journal of Econometrics 142(1), 379-398 (2008)
8. Chesher, A.: Identification in nonseparable models. Econometrica 71, 1405-1441 (2003)
9. Doksum, K.: Empirical probability plots and statistical inference for nonlinear models in the two-sample case. The Annals of Statistics 2, 267-277 (1974)
10. Firpo, S.: Efficient semiparametric estimation of quantile treatment effects. Econometrica 75, 259-276 (2007)
11. Froelich, M., Melly, B.: Unconditional Quantile Treatment Effects Under Endogeneity. IZA DP 3288 (2010a)
12. Froelich, M., Melly, B.: Estimation of Quantile Treatment Effects with STATA. The Stata Journal 10(3), 423-457 (2010b)
13. Imbens, G., Rubin, D.: Estimating the Outcome Distribution for Compliers in Instrumental Variables Models. Review of Economic Studies 64, 555-574 (1997)
14. Koenker, R., Bassett, G.S.: Regression Quantiles. Econometrica 46, 33-50 (1978)
15. Lehmann, E.H.: Nonparametrics: Statistical Models Based on Ranks. San Francisco, CA (1974)
16. Powell, D.: Unconditional Quantile Regression for Panel Data with Exogenous or Endogenous Treatment Variables. RAND Working Paper No. WR-710 (2010a)
17. Powell, D.: Unconditional Quantile Treatment Effects in the Presence of Covariates. RAND Working Paper No. WR-816 (2010b)
18. Powell, D.: Unconditional Quantile Regression for Exogenous or Endogenous Treatment Variables. RAND Working Paper No. WR-824 (2011)
11 A similar point was made by [16] in his discussion of the analysis of the effect of vouchers on student achievement.
Dealing with complex problems of confounding in mediation analysis
Stijn Vansteelandt
Abstract Mediation analysis is frequently utilized in diverse scientific fields, such as psychology, sociology and epidemiology, to develop insight into the causal mechanism whereby an exposure affects an outcome. It concerns the study of indirect effects of that exposure that are mediated through a given intermediate variable or mediator, and/or the study of the remaining direct effect. Despite its popularity, the traditional approach to mediation analysis proceeds in a predominantly heuristic fashion, which can largely be ascribed to the lack of precise definitions of direct and indirect effect in the traditional mediation analysis literature. Moreover, problems of confounding bias have been largely ignored.

James Robins, Sander Greenland and Judea Pearl laid the foundations for a rigorous approach to mediation analysis, which is based on counterfactuals. They gave precise definitions of direct and indirect effect and elucidated the kind of data that must be collected in order to control for confounding bias. In addition, they provided generic ways to decompose a total effect into a direct and an indirect effect that are not tied to a specific statistical model. In this presentation, after a brief review of some of these developments, I will concentrate on the (partly unsolved) methodological challenges that arise when confounders of the mediator-outcome association are affected by the exposure. In particular, I will present results on the identification of (natural) direct and indirect effects in such settings, and on the estimation of (controlled) direct effects, thereby focussing on matched case-control studies and/or survival analysis.
Key words: causal inference, direct effect, G-estimation, indirect effect, intermediate confounding, mediation, time-varying confounding
Stijn Vansteelandt, Ghent University, Department of Applied Mathematics and Computer Science, Krijgslaan 281, S9, 9000 Gent, Belgium, e-mail: [email protected]
1 Introduction
For many decades, scientists from diverse scientific fields (most notably psychology, sociology and epidemiology) have been occupied with questions as to whether an exposure affects an outcome through pathways other than those involving a given mediator or intermediate variable. The answer to such questions is of interest because it brings insight into the mechanisms that explain the effect of the exposure on the outcome [12]. Mediation analyses are used for this purpose. They attempt to separate so-called 'indirect effects' from 'direct effects'. The former term is typically used in a loose sense to designate that part of an exposure effect which arises indirectly by affecting a (given) set of intermediate variables; the latter then refers to the remaining exposure effect.
In traditional mediation analysis, the direct effect is commonly identified with the residual association between outcome and exposure after adjusting for the mediator(s); the indirect effect is then obtained through a combination of the exposure's effect on the mediator and the mediator's effect on the outcome [1, 5]. For instance, when the associations between exposure A, mediator M and outcome Y can be modeled through linear regressions as
E(Y|A, M) = β_0 + β_a A + β_m M
E(M|A) = α_0 + α_a A,
then β_a is commonly interpreted as a direct effect and β_m α_a as an indirect effect [1]. It is well known from the causal inference literature that these interpretations are often not justified as a result of confounding of the mediator-outcome association [9, 3]. Even when confounders L of this association have been measured, standard regression adjustment is not applicable when, as often happens, some of these confounders are themselves affected by the exposure, in which case we say that there is intermediate or time-varying confounding [4, 10, 13]. Furthermore, decomposition of a total effect into a direct and an indirect effect becomes subtle when certain nonlinear associations exist between mediator and outcome [9, 7], e.g. when a logistic regression model for a dichotomous outcome is adopted [11].
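Under the linear model above, and only under the strong assumption of no unmeasured mediator-outcome confounding, the traditional quantities can be computed with two least-squares fits. The sketch below simulates data satisfying those assumptions (all coefficients are invented for the example) and recovers β_a as the direct effect and β_m·α_a as the indirect effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Simulated data consistent with the linear mediation model and with NO
# unmeasured confounding -- exactly the assumption criticized in the text.
a = rng.normal(0, 1, n)                       # exposure A
m = 0.4 * a + rng.normal(0, 1, n)             # mediator M:  alpha_a = 0.4
y = 0.3 * a + 0.5 * m + rng.normal(0, 1, n)   # outcome Y:   beta_a = 0.3, beta_m = 0.5

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y)] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

_, beta_a, beta_m = ols(y, a, m)   # outcome regression E(Y|A,M)
_, alpha_a = ols(m, a)             # mediator regression E(M|A)

direct = beta_a                    # ~ 0.3
indirect = beta_m * alpha_a        # ~ 0.5 * 0.4 = 0.2
total = ols(y, a)[1]               # equals direct + indirect in linear models
print(direct, indirect, total)
```

If a confounder of the M-Y association were added to the simulation without being included in the regressions, β_a would no longer equal the direct effect, which is the point developed next.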
Robins and Greenland [9] and Pearl [7] introduced model-free definitions of direct and indirect effect. Unlike the foregoing development due to Baron and Kenny [1], their formalism of so-called natural direct and indirect effects can therefore accommodate nonlinear associations between mediator and outcome. Natural direct and indirect effects are defined in terms of so-called composite or nested counterfactuals such as Y(a, M(0)), which denotes the counterfactual outcome that would have been observed if the exposure A were set to a and the mediator M to the value M(0) that it would have taken at some reference exposure level 0. Because such composite counterfactuals are unobservable when a ≠ 0, strong assumptions must be imposed for identification. The development of Robins and Greenland [9] precludes the existence of moderation, i.e. exposure effect modification by the mediator on the additive scale; it precludes such moderation even at the unit level. The development of Pearl [7] precludes the possibility of intermediate confounding of the
mediator-outcome association. This places severe restrictions on the range of realistic applications that can be addressed. In fact, the prior absence of methodology to deal with intermediate confounding has been one of the difficulties with the causal inference literature on mediation.
This presentation will primarily focus on the problem of intermediate confounding in mediation analysis. First, I will consider the problem of estimating so-called controlled direct effects in the presence of exposure-induced confounding of the association between mediator and outcome. I will thereby focus on diverse settings such as survival analysis and the analysis of matched case-control studies. Next, I will propose novel results on the identification of natural direct and indirect effects in the presence of intermediate confounding.
Fig. 1 Causal diagram with exposure A, mediator M, outcome Y, intermediate confounder L, and U an unmeasured confounder of the L-Y relationship.
2 The problem of intermediate confounding in mediation analysis
The causal diagram of Figure 1 displays a setting with intermediate confounding. It visualizes prognostic factors L of the mediator (other than the exposure) that may also be associated with the outcome, and which thereby confound the association between mediator and outcome. This situation is representative of most empirical studies, including randomized experiments, because the fact that the exposure is randomly assigned does not prevent confounding of the mediator-outcome association. In the presence of such confounding, the residual association between outcome and exposure after adjusting for the mediator(s) (cf. βa in the above model) does not encode a direct exposure effect. This is technically seen because adjustment for a collider M (i.e. a node in which two edges converge) along the path A → M ← L ← U → Y may render exposure A and outcome Y dependent along that path, and may thus induce a non-causal association [8, 3]. One of the major contributions of the causal inference literature has been to point this out and to make clear
that specialized estimation techniques are often needed to adjust for such confounders, as these may themselves be affected by the exposure (as illustrated in Figure 1). Indeed, additional regression adjustment for the confounder L once again amounts to adjustment for a collider L along the path A → L ← U → Y. It thereby renders A and Y dependent along that path, even in the absence of a direct effect.
3 Estimation of controlled direct effects in the presence of intermediate confounding
Let Y(a, m) denote the counterfactual outcome that would have been observed for a given subject if the exposure were set to a and the mediator to m. Then a controlled direct effect [9, 7] refers to a contrast between two counterfactual outcomes for the same subject, corresponding to different exposure levels, but the same fixed mediator level. For instance, the controlled direct effect of exposure level a versus reference exposure level 0, controlling for M at level m, can then be defined as the expected contrast
E{Y(a, m) − Y(0, m)}.
Likewise, the conditional controlled direct effect, given covariates C, of exposure level a versus reference exposure level 0, controlling for M at level m, can then be defined as the expected contrast
E{Y(a, m) − Y(0, m) | C}.
Robins [8] showed that, under specific identification assumptions that we shall describe next, controlled direct effects can be identified in the presence of intermediate confounding. Specifically, provided that data have been recorded on all confounders of the exposure-outcome relationship, as well as all confounders of the mediator-outcome relationship, the conditional controlled direct effect can be identified using the so-called G-formula:
E{Y(a, m) − Y(0, m) | C} = ∫ E(Y | A = a, M = m, L) f(L | A = a, C) dL − ∫ E(Y | A = 0, M = m, L) f(L | A = 0, C) dL.
It thus follows that parametric models for the outcome and intermediate confounders can be combined to yield an expression for the controlled direct effect. However, the G-formula does not readily translate into a practical estimation approach. It requires parametric models for the intermediate confounders, which can be problematic when the confounder is high-dimensional. Moreover, it can be computationally cumbersome as a result of the possibly high-dimensional integration which it involves. Finally, even simple models for the outcome and intermediate confounder may combine into intractable expressions for the controlled direct effect, which depend on the exposure level a and covariate C in a highly contrived way. This not only makes results impractical for reporting, but also makes interesting hypotheses difficult to test [8].
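Before turning to alternatives, the plug-in G-formula itself can be sketched in a few lines for the simple case of a single binary intermediate confounder L and empty C, on simulated data following the structure of Figure 1. The data-generating process and all coefficients below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

# Invented data with the structure of Figure 1: A -> L -> M -> Y,
# and an unmeasured U confounding the L-Y relationship.
U = rng.normal(size=n)
A = rng.binomial(1, 0.5, n)
L = rng.binomial(1, expit(A + 2 * U))
M = rng.binomial(1, expit(A + L))
Y = 1.0 + 0.5 * A + 0.7 * M + L + 2 * U + rng.normal(size=n)

def g_formula_mean(a, m):
    """Plug-in G-formula for E{Y(a, m)} with a single binary L and empty C:
    average E(Y | A=a, M=m, L=l) over the distribution of L given A=a."""
    return sum(Y[(A == a) & (M == m) & (L == l)].mean() * (L[A == a] == l).mean()
               for l in (0, 1))

cde = g_formula_mean(1, 1) - g_formula_mean(0, 1)

# Naive contrast: conditions on the collider M without any adjustment.
naive = Y[(A == 1) & (M == 1)].mean() - Y[(A == 0) & (M == 1)].mean()

# Monte-Carlo ground truth in this design: E{Y(a,m)} = 1 + 0.5a + 0.7m + E{L(a)}.
true_cde = 0.5 + (rng.binomial(1, expit(1 + 2 * U)).mean()
                  - rng.binomial(1, expit(2 * U)).mean())
print(cde, naive, true_cde)
```

In this toy design the G-formula contrast recovers the controlled direct effect, while the naive exposure contrast among subjects with M = 1 is visibly biased by collider stratification.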
Various approaches have been developed to address these problems, some of which we will review in this presentation.
One class of approaches involves weighting each subject's data by the reciprocal of the likelihood of the observed mediator, given exposure and confounders, and then regressing the outcome on exposure and mediator [8, 10]. Since the weighting corrects for confounding bias, the weighted regression analysis of the outcome can ignore confounders and therefore does not suffer from the aforementioned problem of collider-stratification that was observed in Figure 1. However, a limitation of inverse probability weighting approaches is that they can perform poorly when some individuals get assigned large weights.
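A minimal sketch of this weighting idea, again on invented simulated data with the structure of Figure 1 (binary A, L, M), with the mediator probabilities estimated nonparametrically from the A × L cells rather than from a parametric model:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

# Invented data with the structure of Figure 1 (binary A, L, M).
U = rng.normal(size=n)
A = rng.binomial(1, 0.5, n)
L = rng.binomial(1, expit(A + 2 * U))
M = rng.binomial(1, expit(A + L))
Y = 1.0 + 0.5 * A + 0.7 * M + L + 2 * U + rng.normal(size=n)

# Weights 1 / P(M = m_i | A_i, L_i), estimated from the four A x L cells.
p_m1 = np.zeros(n)
for a in (0, 1):
    for l in (0, 1):
        cell = (A == a) & (L == l)
        p_m1[cell] = M[cell].mean()
w = 1.0 / np.where(M == 1, p_m1, 1.0 - p_m1)

# Weighted regression of Y on exposure and mediator only: the weights
# handle the confounder, so L is deliberately left out of the regression.
X = np.column_stack([np.ones(n), A, M]).astype(float)
coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * Y))
cde_ipw = coef[1]

# Monte-Carlo ground truth, as in this design: 0.5 + E{L(1)} - E{L(0)}.
true_cde = 0.5 + (rng.binomial(1, expit(1 + 2 * U)).mean()
                  - rng.binomial(1, expit(2 * U)).mean())
print(cde_ipw, true_cde)
```

Here the largest weights stay moderate, but in sparser data a few subjects with a rarely observed mediator value can dominate the weighted fit, which is the instability the text warns about.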
An alternative class of approaches avoids inverse probability weighting by using G-estimation strategies instead. These involve transforming the outcome in a way that removes the mediator's effect on the outcome and thereby the indirect effect. Next, the resulting transformed outcome is regressed on the exposure to obtain a measure of direct effect. This idea has been considered for additive and multiplicative models [8, 4, 13], for logistic regression models [14], for survival models [6], and for unmatched [13, 14] and matched [2] retrospective studies; see Vansteelandt [15] for a detailed review.
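For an additive model, the two-step transform-then-regress idea can be sketched as follows, once more on invented simulated data with the structure of Figure 1; the saturated adjustment over the binary A × L cells is my own simplification for this toy setting:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

# Invented data with the structure of Figure 1 (binary A, L, M).
U = rng.normal(size=n)
A = rng.binomial(1, 0.5, n)
L = rng.binomial(1, expit(A + 2 * U))
M = rng.binomial(1, expit(A + L))
Y = 1.0 + 0.5 * A + 0.7 * M + L + 2 * U + rng.normal(size=n)

# Step 1: estimate the mediator's effect on the outcome, adjusting for
# exposure and confounder (saturated in the four binary A x L cells).
X1 = np.column_stack([A * L, A * (1 - L), (1 - A) * L,
                      (1 - A) * (1 - L), M]).astype(float)
beta_m = np.linalg.lstsq(X1, Y, rcond=None)[0][-1]

# Step 2: transform the outcome to strip out the mediator's effect, ...
Y_tilde = Y - beta_m * M

# Step 3: ... then regress the transformed outcome on the exposure alone.
X2 = np.column_stack([np.ones(n), A])
cde_gest = np.linalg.lstsq(X2, Y_tilde, rcond=None)[0][1]

# Monte-Carlo ground truth in this design: 0.5 + E{L(1)} - E{L(0)}.
true_cde = 0.5 + (rng.binomial(1, expit(1 + 2 * U)).mean()
                  - rng.binomial(1, expit(2 * U)).mean())
print(beta_m, cde_gest, true_cde)
```

Note that the final regression ignores L entirely: the confounding has already been handled in step 1, so the collider problem of adjusting for L in the exposure-outcome regression never arises.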
4 Identification results for natural direct and indirect effects in the presence of intermediate confounding
These developments on controlled direct effects have a number of limitations. First, the concept of controlling the mediator at level m uniformly in the population is often rather restrictive, as it is often difficult to conceptualize a single level of the mediator that is realistic for all units in the population. Second, the difference between the total effect and a controlled direct effect cannot generally be interpreted as an indirect effect [9]. To overcome these limitations, alternative definitions have been proposed of so-called natural direct and indirect effects [9, 7]. These are more natural in that they allow for variation between subjects in the level at which the mediator is controlled and, moreover, combine to the total effect regardless of the underlying data distribution. However, natural direct effects require stronger identification conditions than controlled direct effects. In particular, it remains unclear to date how natural direct and indirect effects can be identified in the presence of intermediate confounding, except in the unrealistic case where the exposure and mediator do not interact (at the unit level) in the effect that they produce on the outcome.
Vansteelandt and VanderWeele [16] overcome this limitation by basing their development on the following definitions of natural direct and indirect effects in the exposed:
E{Y − Y(0, M) | A = a} and E{Y(0, M) − Y(0, M(0)) | A = a},
respectively. The first expresses, within each exposure stratum, how much the outcome would change on average if the exposure were set to the reference level 0, but the mediator were held fixed at its observed level. The second evaluates how much the outcome would change on average if the exposure's effect acted only through modifying the mediator. These definitions enable decomposition of the total effect (in the exposed) into a direct and indirect effect (in the exposed), as follows:
E{Y − Y(0) | A = a} = E{Y − Y(0, M(0)) | A = a} = E{Y − Y(0, M) | A = a} + E{Y(0, M) − Y(0, M(0)) | A = a}.
Vansteelandt and VanderWeele [16] show that natural direct and indirect effects on the exposed allow for effect decomposition under weaker identification conditions than population natural direct and indirect effects. When no confounders of the mediator-outcome association are affected by the exposure, identification is possible under essentially the same conditions as for controlled direct effects. Otherwise, identification is still possible with additional knowledge on a non-identifiable selection-bias function which measures the dependence of the mediator effect on the observed exposure within confounder levels, and which evaluates to zero in a large class of realistic data-generating mechanisms.
Vansteelandt and VanderWeele [16] furthermore argue that natural direct and indirect effects on the exposed are of intrinsic interest in various applications. They moreover show that these natural direct and indirect effects on the exposed coincide with the corresponding population natural direct and indirect effects when the exposure is randomly assigned. In such settings, their results are thus also of relevance for assessing population natural direct and indirect effects in the presence of exposure-induced mediator-outcome confounding, which existing methodology has not been able to address.
Acknowledgements The author acknowledges support from IAP research network grant nr. P06/03 from the Belgian government (Belgian Science Policy) and is grateful to Carlo Berzuini (University of Cambridge), Torben Martinussen (University of Copenhagen) and Tyler VanderWeele (Harvard University), with whom parts of this work have been developed.
References
[1] R.M. Baron and D.A. Kenny. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol., 51:1173–1182, 1986.
[2] C. Berzuini, S. Vansteelandt, L. Foco, R. Pastorino, and L. Bernardinelli. Direct genetic effects and their estimation from matched case-control data. Technical report, University of Cambridge, 2011.
[3] S.R. Cole and M.A. Hernan. Fallibility in estimating direct effects. International Journal of Epidemiology, 31:163–165, 2002.
[4] S. Goetgeluk, S. Vansteelandt, and E. Goetghebeur. Estimation of controlled direct effects. Journal of the Royal Statistical Society, Series B, 70:1049–1066, 2008.
[5] D.P. MacKinnon. An Introduction to Statistical Mediation Analysis. New York: Lawrence Erlbaum Associates, 2008.
[6] T. Martinussen, S. Vansteelandt, M. Gerster, and J.v.B. Hjelmborg. Estimation of direct effects for survival data using the Aalen additive hazards model. Journal of the Royal Statistical Society, Series B, 73(5):773–788, 2011.
[7] J. Pearl. Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 411–420, San Francisco, 2001. Morgan Kaufmann.
[8] J.M. Robins. Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In Computation, causation, and discovery, pages 349–405. AAAI Press, Menlo Park, CA, 1999.
[9] J.M. Robins and S. Greenland. Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3:143–155, 1992.
[10] T.J. VanderWeele. Marginal structural models for the estimation of direct and indirect effects. Epidemiology, 20:18–26, 2009.
[11] T.J. VanderWeele and S. Vansteelandt. Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface, 2:457–468, 2009.
[12] T.J. VanderWeele. Mediation and mechanism. European Journal of Epidemiology, 24:217–224, 2009.
[13] S. Vansteelandt. Estimating direct effects in cohort and case-control studies. Epidemiology, 20:851–860, 2009.
[14] S. Vansteelandt. Estimation of controlled direct effects on a dichotomous outcome using logistic structural direct effect models. Biometrika, 97:921–934, 2010.
[15] S. Vansteelandt. Estimation of direct and indirect effects. In C. Berzuini, P. Dawid, and L. Bernardinelli, editors, Causal Inference: Statistical Perspectives and Applications. Wiley and Sons, 2012.
[16] S. Vansteelandt and T.J. VanderWeele. Natural direct and indirect effects on the exposed: effect decomposition under weaker assumptions. Biometrics, in press, 2012.
Which family model makes couples more happy - dual earner or male breadwinner?
Anna Baranowska-Rataj and Anna Matysiak
Abstract We investigate the effects of men's and women's employment on their spouses' subjective well-being in Poland. We use panel data techniques that allow us to account for the selection of intrinsically happy individuals into male-breadwinner or dual-earner models. We find that women's employment has a positive impact on women's well-being, but reduces the happiness of their husbands. Our findings suggest that the sex-role specialisation model is rooted in the perceptions of Polish men.
1 Introduction
The research discussion on the effects of partners' involvement in the labour market on marital stability has been ongoing for several decades (e.g. Ross and Sawhill, 1975; Simpson and England, 1981; Oppenheimer, 1997; Cherlin, 2000; Jalovaara, 2003; Raz-Yurovich, 2012). At the core of this debate, yet still hardly investigated in detail, are the effects of spouses' labour force participation on both partners' personal satisfaction with life. This issue is the main point of interest in our study.
Most empirical studies show that husbands' work reduces anxiety and psychological distress among spouses (see Ross et al., 1990 for a review; Stolzenberg, 2001). There is more controversy when it comes to the effects of wives' employment on the subjective well-being of their partners. The role specialisation model suggests that women's labour market participation lowers the gains from marriage (Becker et al., 1977). Moreover, Stolzenberg (2001) argued that women are socialised to promote household members' well-being whereas men are socialised to earn income and simultaneously to ignore their own physical and mental health, which implies that women's involvement outside the household might be detrimental to their spouses' health. Furthermore, a woman's involvement in paid work might be indicative of a man's failure to fulfil his breadwinner duties and consequently may lead to psychological distress among men (Macmillan and Gartner, 1999).

1 Anna Baranowska-Rataj, Institute of Statistics and Demography, Warsaw School of Economics; email: [email protected]
Anna Matysiak, Institute of Statistics and Demography, Warsaw School of Economics; email: [email protected]
These arguments have been challenged recently because of changes in gender roles and a social shift from household production to household consumption (Cherlin, 2000; Raz-Yurovich, 2012). It has been emphasized that in modern societies similarity of economic activities and interests may improve understanding between spouses (Simpson and England, 1981) and hence improve their subjective well-being. Furthermore, it has been argued that women's earnings have the potential to increase living standards (Oppenheimer, 1997; Cherlin, 2000). The benefits of women's employment for their partner's subjective well-being may be particularly evident in countries with less traditional gender roles or lower living standards coupled with high or strongly increasing material aspirations (Rogers and DeBoer, 2001).
Empirical research on the effects of spouses' employment on satisfaction with marriage or, more generally, psychological well-being is relatively scarce. The existing studies, carried out in the US, suggest that while women's employment is usually beneficial or neutral for their own well-being, it seems to be detrimental to their husbands' (Rosenfield, 1992; Stolzenberg, 2001; Rogers and DeBoer, 2001; Schoen et al., 2006). Such effects may not be present in all country contexts, however. For instance, Lee and Ono (2008) found an opposite effect for Japan and attributed it to the strong prevalence of the role specialisation model in that country. Apart from their restricted geographical and cultural range, an important limitation of the empirical studies mentioned above is that they fail to control for the selection of intrinsically (un)happy individuals into unions with (non)working partners; these selection effects may bias the results.
In this paper we contribute to the literature on the effects of spouses' employment on their subjective well-being by extending the discussion to Poland. This country adopted a so-called dual-earner/female-double-burden model, meaning that women are perceived as the major care providers but are also expected to contribute to the household budget. Furthermore, this study takes a methodological step forward and uses panel data techniques to account for time-invariant unobserved characteristics of individuals that jointly determine marriage behaviours and happiness levels. This approach thus allows us to account for the selection of intrinsically happy individuals into male-breadwinner or dual-earner models.
2 Data and Method
In our study we used data from Social Diagnosis, a nationally representative panel survey whose subsequent waves took place in 2003, 2005, 2007, 2009, and 2011. For our analysis, we selected women who entered the survey at ages 18-35. This gave us a sample of 27,251 female observations. Our dependent variable is self-rated happiness, derived from the question: "Taking all things together, would you say you are very happy, quite happy, somewhat happy or not at all happy?", with responses coded on a four-point scale. In order to account for unobserved time-constant individual characteristics we estimated a fixed-effects ordered logit model. The fixed-effects approach removes the potential selection bias, but adapting it to models with dependent
variables measured on an ordinal scale is problematic. In order to solve this problem we employed the "Blow-up and Cluster" estimator (BUC) recently developed by Baetschmann et al. (2011). This method dichotomizes the dependent variable at all possible cutpoints and then estimates the resulting fixed-effects logit models jointly by conditional maximum likelihood. This method is less prone to the incidental parameters problem than the previously developed estimator of Ferrer-i-Carbonell and Frijters (2004), which performs the dichotomization at only one a priori specified cutpoint (i.e. the mean of the dependent variable).
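The data-expansion step at the heart of the BUC method can be sketched as follows. The toy panel, variable names and function are invented for illustration; the subsequent conditional-logit fit, which would require a dedicated routine, is omitted:

```python
# A minimal sketch of the BUC data-expansion step (hypothetical toy data):
# each observation with a K-category ordinal outcome is duplicated K-1
# times, and copy k dichotomizes the outcome at cutpoint k. The K-1 copies
# of a person are then treated as separate "groups" in a standard
# conditional (fixed-effects) logit, with person-clustered standard errors.

def buc_expand(rows, n_categories):
    """rows: list of (person_id, period, y, x) with y in 1..n_categories.
    Returns a list of (group_id, period, y_dichotomized, x)."""
    expanded = []
    for cut in range(2, n_categories + 1):          # cutpoints 2..K
        for person, period, y, x in rows:
            group = (person, cut)                   # person-copy identifier
            expanded.append((group, period, int(y >= cut), x))
    return expanded

# Toy panel: two persons, two periods, happiness on a 4-point scale.
panel = [(1, 2003, 2, 0), (1, 2005, 4, 1),
         (2, 2003, 3, 1), (2, 2005, 3, 0)]
expanded = buc_expand(panel, 4)
print(len(expanded))   # 4 observations x 3 cutpoints = 12 rows
```

Each person-copy keeps its own fixed effect in the conditional logit, which is how the method pools information from all cutpoints instead of a single a priori one.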
Additionally, as a sensitivity analysis, we also used two other estimators: the above-mentioned Ferrer-i-Carbonell and Frijters (2004) ordered-logit estimator and the correlated random-effects ordered probit model proposed by Mundlak (1978). The results from these two additional models were consistent with the main findings presented here; for the sake of brevity we do not include them, but they are available from the authors on request.
3 Results and conclusions
Our findings show that all three variables of interest, namely marital status, the employment status of the individual and the employment status of his/her partner, are important determinants of psychological well-being for both women and men. Consistent with the majority of the empirical studies conducted so far, we find that employment improves the subjective well-being of both women and men, whereas unemployment has a negative effect. Interestingly, we do not observe any significant difference between employment and inactivity.
Table 1. Results from the ordered logit model with BUC estimator

                                            Model for women        Model for men
Covariates                                  Coeff.     S.E.        Coeff.     S.E.
Labour market status (ref. employed)
  Unemployment                              -0.460***  (0.147)     -0.932***  (0.162)
  Inactivity                                -0.018     (0.130)     -0.234     (0.164)
Partnership and partner's employment (ref. non-working spouse)
  Single                                    -0.884***  (0.286)     -0.757**   (0.326)
  Working spouse                             0.121     (0.170)     -0.347**   (0.175)
  Divorced / Widowed                        -1.285***  (0.426)     -1.167**   (0.560)
LL                                          -1321.993              -1139.579
N                                           4395                   3768

Source: authors' calculations based on Social Diagnosis. * p<0.05; ** p<0.01; *** p<0.001. Control variables (not displayed) include: age, education, self-rated health and income, number of children and age of the youngest child.
We observe that life satisfaction is lowest among the divorced or widowed, followed by the single; married persons are clearly the happiest. Nevertheless, there are interesting gender differences among the married with respect to the effects of the partner's employment status. It turns out that a husband's involvement in the labour market does not affect the subjective well-being of his wife. Among men, however, we
observe a clearly detrimental effect of wives' employment on husbands' psychological well-being. Thus men in Poland appear satisfied with the male-breadwinner family model: they are happier if they work and they prefer to have non-working wives.
References
1. Baetschmann, G., Staub, K. E. and Winkelmann, R. (2011) 'Consistent Estimation of the Fixed Effects Ordered Logit Model', IZA Discussion Paper 5443. Bonn, IZA.
2. Becker, G. S., Landes, E. M. and Michael, R. T. (1977) 'An economic analysis of marital instability', Journal of Political Economy, 85: 1141-1188.
3. Cherlin, A. (2000) 'Toward a new home socioeconomics of union formation', in Waite, L. J., Bachrach, C., Hindin, M., Thomson, E. and Thornton, A. (eds) The ties that bind - Perspectives on marriage and cohabitation, pp. 126-144. Hawthorne, New York, Aldine De Gruyter.
4. Ferrer-i-Carbonell, A. and Frijters, P. (2004) 'How Important Is Methodology For The Estimates Of The Determinants Of Happiness?', The Economic Journal, 114: 641–659.
5. Jalovaara, M. (2003) 'The joint effects of marriage partners’ socioeconomic positions on the risk of divorce', Demography, 40(1): 67–81.
6. Lee, K. S. and Ono, H. (2008) 'Specialization and happiness in marriage: A U.S.-Japan comparison', Social Science Research, 37(4): 1216-1234.
7. Macmillan, R. and Gartner, R. (1999) 'When She Brings Home the Bacon: Labor-Force Participation and the Risk of Spousal Violence against Women', Journal of Marriage and Family, 61(4): 947-958.
8. Mundlak, Y. (1978) 'On the pooling of time series and cross section data', Econometrica, 46: 69–85.
9. Oppenheimer, V. K. (1997) 'Women's Employment and the Gain to Marriage: The Specialization and Trading Model', Annual Review of Sociology, 23: 431-453.
10. Raz-Yurovich, L. (2012) 'Economic Determinants of Divorce Among Dual-Earner Couples: Jews in Israel', European Journal of Population/Revue européenne de Démographie: 1-27.
11. Rogers, S. J. and DeBoer, D. D. (2001) 'Changes in Wives' Income: Effects on Marital Happiness, Psychological Well-Being, and the Risk of Divorce', Journal of Marriage and Family, 63(2): 458-472.
12. Rosenfield, S. (1992) 'The Costs of Sharing: Wives' Employment and Husbands' Mental Health', Journal of Health and Social Behavior, 33(3): 213-225.
13. Ross, C. E., Mirowsky, J. and Goldsteen, K. (1990) 'The Impact of the Family on Health: The Decade in Review', Journal of Marriage and Family, 52(4): 1059-1078.
14. Ross, H. L. and Sawhill, I. V. (1975) Time of Transition. The Growth of Families Headed by Women, Washington, DC: The Urban Institute.
15. Schoen, R., Rogers, S. J. and Amato, P. R. (2006) 'Wives' Employment and Spouses' Marital Happiness', Journal of Family Issues, 27(4): 506-528.
16. Simpson, I. H. and England, P. (1981) 'Conjugal Work Roles and Marital Solidarity', Journal of Family Issues, 2(2): 180-204.
17. Stolzenberg, Ross M. (2001) 'It's about Time and Gender: Spousal Employment and Health', American Journal of Sociology, 107(1): 61-100.
Family structures and subjective wellbeing in
Italy
Silvia Montecolle, Francesca Rinesi and Alessandra Tinto
Abstract In recent decades increasing attention has been paid to the issue of subjective wellbeing. This paper aims to shed light on this topic by analysing which characteristics are associated with high levels of subjective wellbeing in Italy. Special emphasis will be given to selected socio-demographic characteristics and to characteristics from other life domains, such as the role of individuals within the family structure.
1 Introduction and aim of the work
In recent decades renewed attention has been given to the concept of subjective wellbeing. At the same time, studies on the interrelation between socio-demographic variables and subjective wellbeing in Italy are relatively scarce. This paper aims at contributing to the existing literature by investigating the association between subjective wellbeing and a set of domains of individual life. We will take into account not only individual socio-demographic characteristics but also socio-economic aspects, health conditions and interpersonal trust. Particular attention will be given to selected socio-demographic characteristics such as marital status, family structure and the role of individuals within the family.
Subjective wellbeing can be seen as a construct made up of two distinct, yet interrelated, components: cognitive and affective [6,9,10]. The cognitive component of subjective wellbeing, measured through life satisfaction, grows out of the process of comparison between individual life conditions and personal standards (expectations, ideals, beliefs). Life satisfaction, as a consequence, can be seen as an individual and retrospective evaluation of one's own life as a whole. The affective component (divided into positive and negative affects) is identified by the emotions or affects that people experience during their daily life, and it is considered conceptually distinct from the cognitive component because it is influenced by different variables [2,3,7]. While the cognitive component implies a retrospective reflection on one's own life at a given point in time, the affective component has to do with the present situation.

1 Silvia Montecolle, Italian National Institute of Statistics (ISTAT); email:
Francesca Rinesi, Italian National Institute of Statistics (ISTAT); email: [email protected]
Alessandra Tinto, Italian National Institute of Statistics (ISTAT); email: [email protected]
The present study focuses on the cognitive component of subjective wellbeing, and
aims at highlighting which characteristics of individuals are associated with higher
levels of life satisfaction in Italy.
2 Data and Method
Data used come from the cross-sectional survey "Aspects of Daily Life", carried out annually since 1993 by the Italian National Institute of Statistics (ISTAT). The survey is based on a representative sample of about 20,000 households, comprising over 50,000 individuals. This annual multipurpose survey collects a set of data on individuals, households and events which enables a wide range of social phenomena to be investigated [8]. The advantages of this data source are its large sample, the prompt release of data and the fact that each member of the household is interviewed.
In the latest available wave of the survey a question on life satisfaction was introduced for the first time, for all individuals aged 14 and over; our subsample therefore consists of approximately 42,000 individuals. The question is harmonized with the majority of international sample surveys and reads: "All things considered, how satisfied are you with your life as a whole these days? Use a 0 to 10 scale, where 0 is completely dissatisfied and 10 is completely satisfied". This wording follows the research path developed in the 1970s in the United States [1,4,5], as each respondent was asked to evaluate their life satisfaction autonomously on a 0 to 10 scale.
A logistic regression model was estimated with the aim of studying the association between the variables considered. The dependent variable y (subjective wellbeing) is equal to 1 when the individual gives a score between 8 and 10, and equal to 0 when the individual gives a lower score.
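A toy sketch of this coding of the dependent variable, and of how an odds ratio for a single binary covariate arises from it. The scores below are entirely synthetic (invented group means and spread), and the actual model adjusts for many covariates simultaneously:

```python
import random

random.seed(0)

# Synthetic 0-10 satisfaction scores for two groups (made-up parameters):
# group 1 tends to score somewhat higher than group 0.
scores = [(g, min(10, max(0, round(random.gauss(7.5 if g else 6.8, 1.8)))))
          for g in [0, 1] * 5000]

# Dependent variable: y = 1 for scores 8-10, y = 0 otherwise.
data = [(g, 1 if s >= 8 else 0) for g, s in scores]

def odds(rows):
    p = sum(y for _, y in rows) / len(rows)
    return p / (1 - p)

odds_ratio = odds([r for r in data if r[0] == 1]) / odds([r for r in data if r[0] == 0])
print(odds_ratio)   # > 1: the higher-scoring group has higher odds of y = 1
```

An odds ratio above 1, as for couples without children in Table 1 below, indicates higher odds of reporting a high satisfaction score relative to the reference group.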
The independent variables considered are:
- socio-demographic variables: age group, gender, marital status, role within the
household, geographical area of residence;
- socio-economic variables: level of education, occupational status;
- social participation: meeting friends, religious participation;
- interpersonal trust: degree of trust in the majority of people;
- health conditions: self-perceived health.
3 Preliminary findings and future developments
The mean life-satisfaction score in Italy is 7.2, and the median is 7.0 out of 10. Moreover, as shown in Figure 1, the distribution of interviewees according to their life-satisfaction score is concentrated toward the upper end of the scale: in particular, 43.3% of the population aged 14 and over scores 8 or more. However, it must be noted that the life satisfaction score varies considerably according to individual characteristics.
Figure 1: Answers to the question "All things considered, how satisfied are you with your life as a whole these days?" Source: ISTAT - Aspects of daily life (2010)
The results of the logistic regression model show that the majority of the explanatory variables introduced in the model are significantly associated with reporting a high level of life satisfaction.
Particularly relevant is the association between the latter and the role of individuals within the household (Table 1): adults living in couples and parents living with children display significantly higher levels of life satisfaction than individuals living alone (the reference category). On the other hand, significantly lower levels of life satisfaction characterize lone-parent households (both the parent and the children living in this kind of household) and members of extended households.
Other variables positively associated with the level of life satisfaction are attendance of places of worship and the degree of trust in the majority of people. In particular, the analysis of interactions between variables shows that the impact of trust on life satisfaction is much stronger among individuals with a higher level of education.
Table 1: Characteristics associated with the expression of a high level of life satisfaction in Italy. Results from a logistic model.

Role within the family                  Odds Ratio   p-value
Living alone (Reference Category)       ---          ---
Member of extended household            0.759        0.0034
Couple without children                 1.395        <0.0001
Parent within a couple with children    1.299        <0.0001
Child within a couple with children     1.055        0.3418
Parent within a lone-parent family      0.861        0.0172
Child within a lone-parent family       0.767        0.0001
Other                                   0.917        0.1704

Source: ISTAT - Aspects of daily life (2010). Results are controlled for: gender, age class, educational level, perceived health, marital status, geographical area of residence, role within the family, occupational status, attendance of places of worship, trust in the majority of people, meeting friends.
In conclusion, the present work gives us the opportunity to analyse which individual characteristics are most strongly associated with a high level of life satisfaction. However, it must be noted that, due to the cross-sectional nature of the available data, it is not possible to take into account possible (non-observable) heterogeneity among individuals and the consequent selection effects. Further developments of this research aim at introducing environmental and social contextual variables which can help capture, through a multilevel approach, the effect of the context on the individual definition of life satisfaction.
References
1. Andrews, F.M., Withey, S.B.: Social Indicators of Well-Being: Americans’ Perceptions of
life quality. Plenum Press, New York (1976)
2. Argyle, M.: The Psychology of Happiness. Methuen, London, (1987)
3. Bradburn, N.M.: The Structure of Psychological Well-being. Aldine, Chicago (1969)
4. Campbell, A., Converse, P.: The Human Meaning of Social Change. Russell Sage
Foundation, New York (1972)
5. Campbell, A., Converse, P., Rodgers, W.: The Quality of American Life. Russell Sage
Foundation, New York (1976)
6. Diener, E.: Subjective Well-being. Psychol. Bull. 95, 542–575 (1984)
7. Diener, E., Emmons, R.A.: The independence of positive and negative affect. J. Pers. Soc.
Psychol. 47 (5), 1105–1117 (1984)
8. ISTAT: Il sistema di indagini sociali multiscopo, Metodi e norme, n. 31, Istat, Rome (2006)
9. OECD: Subjective well-being. In: Factbook 2009: Economic, Environmental and Social
Statistics,
http://miranda.sourceoecd.org/vl=73684696/cl=14/nw=1/rpsv/factbook2009/11/02/02/index.htm.
Cited 15 Mar 2012
10. Stiglitz, J.E., Sen, A., Fitoussi, J.-P.: Report by the Commission on the Measurement of
Economic Performance and Social Progress (2009), http://www.stiglitz-sen-fitoussi.fr. Cited
15 Mar 2012
Identifiability of Discrete Graphical Models with Hidden Variables
Marco Valtorta, Elizabeth S. Allman, and John A. Rhodes
Abstract We define a space of identifiability problems in causal Bayesian networks and concentrate on two of them. The first problem involves the generic identifiability of all parameters with restrictions on the state space of the variables. We present a technique that, given an arbitrary directed graphical model with a single hidden variable, modifies the model in such a way that we can apply Kruskal's theorem and solve the first identifiability problem. The second problem involves the global identifiability of the causal effect of a set T of variables on a set S of variables. Pearl's do-calculus solves the second identifiability problem.
Key words: Causal Bayesian networks, Semi-Markovian models, Intervention, Identifiability, Unidentifiability
1 Two Settings for Identifiability
Markovian models are popular graphical models for encoding distributional and causal relationships. A Markovian model consists of an acyclic directed graph (DAG) G over a set of variables V = {V_1, ..., V_n}, called a causal graph, and a probability distribution over V which satisfies two constraints: each variable in the graph is independent of all its non-descendants given its direct parents, and the directed edges in G represent direct causal influences. A Markovian model for which only the first constraint holds is called a Bayesian network. This explains why Markovian models are also called causal Bayesian networks.
Marco Valtorta
Department of Computer Science and Engineering, University of South Carolina, e-mail: [email protected]

Elizabeth S. Allman and John A. Rhodes
Department of Mathematics and Statistics, University of Alaska, Fairbanks AK 99775 USA, e-mail: e.allman,[email protected]
The chain rule for Bayesian networks states that the joint probability function P(v) = P(v_1, ..., v_n) can be factorized as

  P(v) = \prod_{V_i \in V} P(v_i \mid pa(V_i))   (1)
The simplest kind of intervention [4] is fixing a subset T of V to some constants t, denoted by do(T = t) or just do(t); the post-intervention distribution P_t(v) is then compatible with the excision semantics and given by:

  P_t(v) = \begin{cases} \prod_{V_i \in V \setminus T} P(v_i \mid pa(V_i)) & \text{if } v \text{ is consistent with } t \\ 0 & \text{if } v \text{ is inconsistent with } t \end{cases}   (2)
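The truncated factorization (2) is easy to compute on a toy example. The three-node chain U → T → S and its conditional probability tables below are our own illustration, not a network from the paper:

```python
import itertools

# A toy causal Bayesian network over binary variables: U -> T -> S.
# Each CPT maps (value, parent values) -> probability.
parents = {"U": (), "T": ("U",), "S": ("T",)}
cpt = {
    "U": {(0, ()): 0.6, (1, ()): 0.4},
    "T": {(0, (0,)): 0.7, (1, (0,)): 0.3, (0, (1,)): 0.2, (1, (1,)): 0.8},
    "S": {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.5, (1, (1,)): 0.5},
}
order = ["U", "T", "S"]

def joint(v):
    """Chain rule (1): product of P(v_i | pa(V_i)) over all variables."""
    p = 1.0
    for name in order:
        pa = tuple(v[q] for q in parents[name])
        p *= cpt[name][(v[name], pa)]
    return p

def post_intervention(v, do):
    """Truncated factorization (2): drop the factors of intervened variables."""
    if any(v[name] != val for name, val in do.items()):
        return 0.0  # v inconsistent with t
    p = 1.0
    for name in order:
        if name in do:
            continue
        pa = tuple(v[q] for q in parents[name])
        p *= cpt[name][(v[name], pa)]
    return p

states = [dict(zip(order, bits)) for bits in itertools.product([0, 1], repeat=3)]
print(sum(joint(v) for v in states))                        # both sum to 1
print(sum(post_intervention(v, {"T": 1}) for v in states))
```

Note that `post_intervention` simply skips the factor P(t | pa(T)), which is exactly the excision semantics of do(T = t).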
Let N and U stand for the sets of observable (observed) and unobservable (hidden) variables in graph G, i.e., N and U partition V. The observed probability distribution is:

  P(n) = \sum_{U} \prod_{V_i \in N} P(v_i \mid pa(V_i)) \prod_{V_j \in U} P(v_j \mid pa(V_j))   (3)
One can define a space of identifiability problems based on equation (3). We concentrate on three dimensions of this space: identifiability of all parameters or only some of them; identifiability of parameters in their whole range (global identifiability) or with the exception of some subspace of measure zero (generic identifiability); and identifiability with restrictions on the cardinality of the state space of variables or without them. We call identifiability 1 the generic identifiability of all the probabilities in (3) with appropriate bounds on the state spaces of variables, and identifiability 2 the global identifiability, with no bounds on the state spaces of variables, of the causal effect P_t(s), given by:

  P_t(s) = \begin{cases} \sum_{V_l \in (N \setminus S) \setminus T} \sum_{U} \prod_{V_i \in N \setminus T} P(v_i \mid pa(V_i)) \times \prod_{V_j \in U} P(v_j \mid pa(V_j)) & \text{if } s \text{ is consistent with } t \\ 0 & \text{if } s \text{ is inconsistent with } t \end{cases}   (4)
2 Kruskal Theorem and Its Use to Solve Identifiability 1
Kruskal's theorem applies to a simple latent class model, in which three observed variables are independent when conditioned on a single hidden one. We outline a technique that, given an arbitrary directed graphical model with a single hidden variable, modifies the model in such a way that we can apply Kruskal's theorem. The technique, which we have been developing and which generalizes the one in [1], is based on the following operations:

• Clump several variables (all hidden or all observed) into a single one, with a larger state space.
• Condition on the state of an observed variable.
• Marginalize over an observed variable (making it hidden).
Each of these can be done multiple times, and in combination with one another. The goal in applying these modifications is always to produce a model to which Kruskal's theorem applies, so one needs to use them so that:

• At least 3 observed variables remain, which are independent when conditioned on the hidden variable.
• The resulting hidden state spaces are "not too large" relative to the observed ones. (Letting a, b, c, and q be the sizes of the state spaces of the observed and hidden variables, in order, we need min(a,q) + min(b,q) + min(c,q) ≥ 2q + 2.)
• Parameters for the resulting model are easily related to those of the original one.
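The state-space condition in the second requirement is purely arithmetic and can be checked mechanically; a small sketch (the function name is ours):

```python
def kruskal_condition(a, b, c, q):
    """Check min(a,q) + min(b,q) + min(c,q) >= 2q + 2, the 'not too large'
    condition relating observed state-space sizes a, b, c to hidden size q."""
    return min(a, q) + min(b, q) + min(c, q) >= 2 * q + 2

# Three binary observed variables support a binary hidden variable...
print(kruskal_condition(2, 2, 2, 2))  # True: 2 + 2 + 2 >= 6
# ...but not a ternary one.
print(kruskal_condition(2, 2, 2, 3))  # False: 6 < 8
```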
It is easy to show by a counting argument that all Bayesian networks of four nodes in which there is at least one edge between children of the hidden variable are not identifiable. An example of such a network is in Figure 1(a).

The causal Bayesian network of Figure 1(b) is identifiable by conditioning on variable 2, applying Kruskal's theorem to the resulting network of three observed nodes, and inverting the resulting conditional probability tables.
3 Using the Do-calculus to Solve Identifiability 2
The do-calculus consists of three rules that allow the replacement of interventions with observations in modified graphs [4]. Let X, Y, Z be arbitrary disjoint sets of nodes in a causal graph G. We denote by G_{\overline{X}} the graph obtained by deleting from G all edges pointing to nodes in X, and by G_{\underline{X}} the graph obtained by deleting from G all edges emerging from nodes in X. To represent the deletion of both incoming and outgoing edges, we use the notation G_{\overline{X}\,\underline{Z}}.

(Rules of Do-Calculus) Let G be the DAG of a causal Bayesian network, and let P(·) stand for its probability distribution. For any disjoint subsets of variables X, Y, Z, and W we have the following rules.
Fig. 1 The causal Bayesian network (whose graph is) (a) is not identifiable 1, but the causal effect P_t(s) is identifiable 2. The causal Bayesian network (b) is identifiable 1, but P_t(s) is not identifiable 2.

Rule 1 (Insertion/deletion of observations)

  P_x(y \mid z, w) = P_x(y \mid w) \quad \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X}}}   (5)

Rule 2 (Action/observation exchange)

  P_{x,z}(y \mid w) = P_x(y \mid z, w) \quad \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X}\,\underline{Z}}}   (6)

Rule 3 (Insertion/deletion of actions)

  P_{x,z}(y \mid w) = P_x(y \mid w) \quad \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X},\overline{Z(W)}}}   (7)
where Z(W) is the set of Z-nodes that are not ancestors of any W-node in G_{\overline{X}}. It was shown that the do-calculus is sound and complete [2] for the identifiability 2 problem, i.e., a causal effect is identifiable 2 if and only if the quantity P_t(s) can be transformed into a formula that includes only observable quantities (i.e., quantities derivable from P(N)) by using the rules of the do-calculus and standard probability manipulations. To show that a causal effect is unidentifiable, it is however more convenient to use the algorithm of Tian [6], which was also shown to be sound and complete [5, 3]. For example, P_T(S) is identifiable 2 in the graph of Figure 1(a), because P_T(S) = P(S); in other words, T has no causal effect on S. This can also be shown by applying rule 3 (equation (7)), with X = ∅, Y = S, Z = T, W = ∅; consequently, Z(W) = T, and G_{\overline{X},\overline{Z(W)}} is the graph of Figure 1(a) without the edge (U,T). P_T(S) is not identifiable 2 in the graph of Figure 1(b).
Acknowledgements The authors thank the American Institute of Mathematics for financial support of this research, through the Square initiative, which also involves Elena Stanghellini, whose contribution is also gratefully acknowledged.
References
1. Allman, E.S., Matias, C., Rhodes, J.A.: Identifiability of Parameters in Latent Structure Models with Many Observed Variables. The Annals of Statistics 37, 3099–3132 (2009)
2. Huang, Y., Valtorta, M.: Pearl's Calculus of Intervention Is Complete. In: Proceedings of the 22nd Conf. on Uncertainty in Artificial Intelligence (UAI-06), pp. 217–224, Cambridge, MA, July 13–16, 2006
3. Huang, Y., Valtorta, M.: Identifiability in Causal Bayesian Networks: A Sound and Complete Algorithm. In: Proceedings of the 21st National Conf. on Artificial Intelligence (AAAI-06), pp. 1149–1154, Boston, MA, July 16–20, 2006
4. Pearl, J.: Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge University Press (2009)
5. Shpitser, I., Pearl, J.: Identification of Conditional Interventional Distributions. In: Proceedings of the 22nd Conf. on Uncertainty in Artificial Intelligence (UAI-06), pp. 437–444, Cambridge, MA, July 13–16, 2006
6. Tian, J.: Studies in Causal Reasoning and Learning. Technical Report R-309, Cognitive Systems Laboratory, Department of Computer Science, University of California, Los Angeles (August 2002)
Bayesian T-optimal designs by simulation: a case study on model discrimination
Rossella Berni and Federico M. Stefanini
Abstract In a case study on wine making, total anthocyanins are measured during wine pre-fermentation. An inhomogeneous Markov Chain is developed to obtain the Bayesian T-optimal design for the next year. Results are discussed in view of extensions of the utility function often needed in actual applications.
Key words: Bayesian T-optimal designs, utility function, Monte Carlo simulation
1 Introduction
Optimal design criteria have received growing attention in the last ten years, both at the theoretical and computational levels, in part following the increase in computational power. Since the 1970s there has been a long history of seminal papers in the literature on D- and T-optimality, both to estimate model parameters and to discriminate among models (for example, [7], [2], [3], [1]). Each experimental point and the final optimal design are selected according to the General Equivalence Theorem (G.E.T.). Model dependency may be considered the main disadvantage of an optimal design, since the result depends on the hypothesized statistical model and its parameters: in the presence of uncertainty on model and parameters this dependence is crucial.
More recently, this dependence was considered in a Bayesian framework [5], by introducing prior distributions on models and parameters. Later, [6] extended T-optimality by adopting the Kullback-Leibler distance to address the heteroscedasticity and the non-Gaussian nature of response variables: they defined KL-optimality.
Notwithstanding the generality achieved, in actual applications further flexibility is often needed, for example by defining a utility function in which the cost of each observation depends on the value taken by the independent variable.
Stefanini F.M., e-mail: [email protected] · Berni R., e-mail: [email protected]
Department of Statistics "G. Parenti", University of Florence
In this paper, a utility function based on the T-optimality criterion is defined and an inhomogeneous Markov Chain algorithm is developed [4] to perform Monte Carlo optimization.
2 Basic Theory
The class of linear and non-linear parametric models considered is:

  E(Y_i \mid x_i, \theta_j) = \eta_j(x_i, \theta_j); \quad j = 1, 2; \; x_i \in \mathcal{X}, \; \theta_j \in \Theta_j   (1)
with j = 1, 2 the index of the considered model. The vector x_i = (x_{i1}, ..., x_{ik}) denotes the set of k independent variables for the i-th observation (i = 1, ..., n); for simplicity we assume there are no replicates; the two vectors of unknown parameters θ_j, j = 1, 2, have sizes m_j, with θ_j ∈ Θ_j ⊂ R^{m_j}. We assume that the η_j(x_i, θ_j) are continuous real functions of (x_i, θ_j) ∈ X × Θ_j. In an experimental design setting each x_i is the i-th trial, chosen and/or controlled by the experimenter; x_i belongs to the compact set X, the experimental region defined in R^k. Regarding the random errors ε_{i,j}, we assume that ε_{i,j} ~ i.i.d. N(0, σ_j²).

In this context we consider a discrete or exact design, i.e. a design formed by a set of n points (x_1, ..., x_n) in X and denoted by D_n. Furthermore, ξ is the design measure, defined as a probability measure on the compact set X and satisfying ∫_X ξ(x) dx = 1. It must be noted that a continuous design depends only on the assumed probability measure ξ, without considering the number of experimental points. A discrete design is defined for a specific set of trials and, given the total number of observations n, by assigning masses p_i to each design point x_i of D_n: ξ_n = {(p_i, x_i); i = 1, ..., n; ∑_i p_i = 1}, where ρ(p_i n) is the rounded number of observations taken at x_i, with ∑_i ρ(p_i n) = n. In [5], the theory is extended to situations in which model uncertainty is present; it is described by an elicited prior distribution with parameters π_{0j}, j = 1, 2. Moreover, the elicited prior distributions of the model parameters are p(θ_j), j = 1, 2. The central result is that the Bayesian T-optimal criterion satisfies the G.E.T. while it no longer depends on unknown values of
Fig. 1 Maximum likelihood estimates of the polynomial (left, continuous line) and logistic (right, continuous line) models. Dashed lines join observed values. [Plot: Total Anthocyanins (50–200) against Day (5–15), panels Pol and Log]
the true model parameters. The two noncentrality parameters are:

  \Delta_j(\xi, \theta_{\bar{j}}) = \inf_{\theta_j \in \Theta_j} \int_{\mathcal{X}} \big( \eta_{\bar{j}}(x, \theta_{\bar{j}}) - \eta_j(x, \theta_j) \big)^2 \, \xi(dx)   (2)

with j ∈ {1, 2} and \bar{j} its complement in {1, 2}; thus \bar{j} is in turn the index of the true model, after [2]. A T-optimal design ξ* maximizes the criterion:

  \Gamma(\xi) = \sum_{j \in \{1,2\}} \pi_{0\bar{j}} \int_{\Theta_{\bar{j}}} \Delta_j(\xi, \theta_{\bar{j}}) \, p(\theta_{\bar{j}}) \, d\theta_{\bar{j}}   (3)
3 A Monte Carlo algorithm for wine making
The kinetics of total anthocyanins (TAs) during a vinification procedure which is quite popular in Tuscany is considered. Year 2010 data were collected by daily sampling during maceration, just after a pumping over. One hundred ml were withdrawn from the sampling valve and the UV/VIS spectra of the supernatants were recorded after centrifugation. Two models were considered (Figure 1): a polynomial (η_1), which is motivated by the possible role played by sulfitation, and a logistic curve (η_2), which is based on chemical considerations about the presence of TAs in solution. The expected values are:

  \eta_1(x, \theta_1) = \theta_{1,0} + \theta_{1,1} x + \theta_{1,2} x^2 + \theta_{1,3} x^3 + \theta_{1,4} x^4   (4)
  \eta_2(x, \theta_2) = \theta_{2,2} + \frac{\theta_{2,1} - \theta_{2,2}}{1 + \exp((\theta_{2,3} - x)/\theta_{2,4})}   (5)

thus discrimination is performed between two functions, non-linear in coefficients and/or factors, with only one independent variable (k = 1).
The prior probability values on the models are π_{0,1} = 0.1 and π_{0,2} = 0.9. Discretized posterior distributions on a grid of 25 points, given the observed data d, were derived after calculating Laplace approximations under weakly informative priors, respectively p(θ_1 | d) and p(θ_2 | d). Following [4], we defined an inhomogeneous Markov Chain to optimize the utility function:

  u(\xi, \theta_1, \theta_2) = 1 \cdot 10^{-10} + \pi_{01} \Delta_2(\xi, \theta_1) + \pi_{02} \Delta_1(\xi, \theta_2)   (6)

with a multivariate normal jump distribution g(ξ̃ | ξ) defined on suitably transformed coordinates and weights (h_x(x_i), h_p(p_i)), i = 1, ..., 5. Given the current state (ξ, v) of this chain, the steps of our algorithm are:
1. Generate a candidate design ξ̃ given the current state, using g.
2. Generate J points (θ_1^{(j)}, θ_2^{(j)}), j = 1, ..., J, from the distributions of the model parameters p(θ_1 | d) and p(θ_2 | d). Calculate

     \tilde{v} = J^{-1} \sum_{j=1}^{J} \log\big( u(\tilde{\xi}, \theta_1^{(j)}, \theta_2^{(j)}) \big)

   and set the candidate (ξ̃, ṽ).
3. Calculate α_J = min(1, exp(J ṽ − J v)), and with probability α_J accept the candidate (ξ̃, ṽ); otherwise leave (ξ, v) unchanged.
4. Increase the current value of J according to a suitable cooling schedule J_t, t = 1, 2, ..., with J_{t+1} ≥ J_t; e.g. every fifty steps increase J by 25.
5. Go to step (1) up to convergence.
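The steps above can be sketched in a few lines. Everything below is a toy stand-in under our own assumptions: two illustrative mean functions replace the paper's posterior draws and utility (6), and the jump scales and cooling schedule are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_thetas(J):
    """Toy stand-in for J draws from p(theta1 | d) and p(theta2 | d)."""
    return rng.normal(1.0, 0.1, J), rng.normal(0.5, 0.1, J)

def mean_log_utility(xs, ps, J):
    """Monte Carlo estimate of the mean log-utility over J posterior draws,
    with a single squared-discrepancy term standing in for (6)."""
    t1, t2 = draw_thetas(J)
    disc = t1[:, None] * xs - t2[:, None] * xs**2   # eta1 - eta2, J x n_pts
    return np.mean(np.log(1e-10 + (ps * disc**2).sum(axis=1)))

def mc_design_search(n_pts=3, steps=500):
    xs, ps = rng.uniform(0, 2, n_pts), np.full(n_pts, 1.0 / n_pts)
    J = 25
    v = mean_log_utility(xs, ps, J)
    for t in range(steps):
        # step 1: candidate design from a normal jump around the current one
        xs_new = np.clip(xs + rng.normal(0, 0.05, n_pts), 0, 2)
        ps_new = np.abs(ps + rng.normal(0, 0.02, n_pts))
        ps_new /= ps_new.sum()
        # step 2: estimate the candidate's mean log-utility with J draws
        v_new = mean_log_utility(xs_new, ps_new, J)
        # step 3: accept with probability alpha_J = min(1, exp(J*v_new - J*v))
        if np.log(rng.random()) < J * (v_new - v):
            xs, ps, v = xs_new, ps_new, v_new
        # step 4: cooling schedule, J_{t+1} >= J_t
        if (t + 1) % 50 == 0:
            J += 25
    return xs, ps

xs, ps = mc_design_search()
print(np.round(np.sort(xs), 2), np.round(ps, 2))
```

Raising the Monte Carlo average to the power J (equivalently, multiplying the log by J) is what makes the chain inhomogeneous: as J grows, the stationary distribution concentrates on designs maximizing the expected utility.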
Among the designs made up of six distinct points, we found the optimum at (x_i, p_i): (1.00, 0.48), (3.40, 0.35), (3.26, 0.11), (8.87, 0.02), (11.64, 0.02), (16.06, 0.02).
4 Discussion
The proposed algorithm is based on a utility function which is remarkably close to what is optimized in the classical framework (up to a small constant that makes it positive). Nevertheless, the implementation does not change much if equation (6) is modified to incorporate the cost of observations, which may depend on the value taken by x. Similarly, the total number of observations could be explicitly accounted for.

From the standpoint of the quality of calculations, approximated prior distributions were considered in order to better compare the results of the simulations with the solution provided by strictly following [5]. In [4] a more general algorithm is presented to simultaneously obtain the posterior distribution and the optimized decision. In the literature, it has already been hypothesized that the prior distribution on models could be updated given the observed data, as we did for the model parameters, even though a substantial increase in computational burden is expected after such an extension.
References
1. Atkinson, A.C., Cox, D.R.: Planning experiments for discriminating between models (with discussion). J. R. Stat. Soc. Ser. B 36, 321–348 (1974)
2. Atkinson, A.C., Fedorov, V.V.: The design of experiments for discriminating between two rival models. Biometrika 62, 57–70 (1975)
3. Atkinson, A.C., Fedorov, V.V.: The design of experiments for discriminating between several models. Biometrika 62, 289–303 (1975)
4. Müller, P., Sansó, B., De Iorio, M.: Optimal Bayesian design by inhomogeneous Markov chain simulation. J. Am. Stat. Assoc. 99, 788–798 (2004)
5. Ponce de Leon, A.C., Atkinson, A.C.: Optimum experimental design for discriminating between two rival models in the presence of prior information. Biometrika 78, 601–618 (1991)
6. Tommasi, C., López-Fidalgo, J.: Bayesian optimum designs for discriminating between models with any distribution. Comput. Stat. Data An. 54, 143–150 (2010)
7. Wynn, H.P.: The sequential generation of D-optimum experimental designs. Ann. Math. Stat. 41, 1655–1664 (1970)
Monte Carlo Likelihood Inference in Multivariate Model-Based Geostatistics
Marco Minozzo and Clarissa Ferrari
Abstract Though in the last decade many works have appeared in the literature dealing with model-based extensions of the classical (univariate) geostatistical mapping methodology based on linear kriging, very few authors have concentrated, mainly because of the inferential problems they pose, on model-based extensions of classical multivariate geostatistical techniques like the linear model of coregionalization or the related 'factorial kriging analysis'. Nevertheless, in the presence of multivariate spatial non-Gaussian data, in particular count data, as in many environmental applications, the use of these classical techniques can lead to incorrect predictions about the underlying factors. To overcome this problem, here we discuss a hierarchical geostatistical factor model that extends, following a model-based geostatistical approach, the classical geostatistical proportional covariance model. For this model we investigate a likelihood-based inferential procedure using the Monte Carlo EM algorithm. In particular, we discuss some of its theoretical properties and show, through some thorough simulation studies, its sampling performances.
Key words: Cokriging, generalized linear mixed models, linear model of coregionalization, Monte Carlo EM, spatial factor model, spatial prediction
1 Introduction
The classical linear model of coregionalization, or its simpler counterpart, the proportional covariance model, otherwise known as the intrinsic correlation model, and
Marco Minozzo
Department of Economics, University of Verona, Via dell'Artigliere 19, IT-37129 Verona, Italy, e-mail: [email protected]

Clarissa Ferrari
Department of Economics, University of Verona, Via dell'Artigliere 19, IT-37129 Verona, Italy, e-mail: [email protected]
the related 'factorial kriging analysis', have become standard tools in many areas of application for the analysis of multivariate spatial data. However, in the presence of non-Gaussian data, in particular count or skew data, the use of these geostatistical instruments can lead to misleading predictions and to erroneous conclusions about the underlying factors. To cope with these situations, following the proposal put forward in the univariate case by Diggle et al. (1998), and somehow extending the works of Zhang (2007) and of Zhu et al. (2005), we propose in Section 2 a hierarchical multivariate spatial model, built upon a generalization of the classical geostatistical proportional covariance model. Adopting a non-Bayesian inferential framework, and assuming that the number of underlying common factors and their spatial autocorrelation structure are known, in Section 3 we show how to carry out likelihood inference on the parameters of the model by exploiting the capabilities of the Monte Carlo EM (MCEM) algorithm (see Wei and Tanner, 1990).
2 The Modeling Framework
Let us consider the following hierarchical extension of the classical geostatistical linear model of coregionalization. Let y_i(x_k), i = 1, ..., m, k = 1, ..., K, be a set of geo-referenced data measurements relative to m regionalized variables, gathered at K spatial locations x_k. These m regionalized variables are seen as a partial realization of a set of m random functions Y_i(x), i = 1, ..., m, x ∈ R². For these functions we assume, for any x, and for i ≠ j,

  Y_i(x) \perp\!\!\!\perp Y_j(x) \mid Z_i(x) \quad \text{and} \quad Y_i(x) \perp\!\!\!\perp Z_j(x) \mid Z_i(x),   (1)

and, for x′ ≠ x″, and i, j = 1, ..., m,

  Y_i(x') \perp\!\!\!\perp Y_j(x'') \mid Z_i(x') \quad \text{and} \quad Y_i(x') \perp\!\!\!\perp Z_j(x'') \mid Z_i(x'),   (2)
where Z_i(x), i = 1, ..., m, x ∈ R², are mean-zero jointly stationary Gaussian processes.

Moreover, for any given i and x, we assume that, conditionally on Z_i(x), the random variables Y_i(x) have conditional distributions f_i(y; M_i(x)), that is, Y_i(x) | Z_i(x) ~ f_i(y; M_i(x)), specified by the conditional expectations M_i(x) = E[Y_i(x) | Z_i(x)], and that h_i(M_i(x)) = β_i + Z_i(x), for some parameters β_i and some known link functions h_i(·). For instance, we might assume that for some or all i, and for any given x, the data are conditionally Poisson distributed, that is, that

  f_i(y; M_i(x)) = \exp\{-M_i(x)\} \, (M_i(x))^y / y!, \quad y = 0, 1, 2, \ldots,   (3)
and that the linear predictor β_i + Z_i(x) is related to the conditional mean M_i(x) through a logarithmic link function, so that ln(M_i(x)) = β_i + Z_i(x). On the other hand, for the rest of the i, we might assume that, for any given x, conditionally on Z_i(x), the random variables Y_i(x) are Gamma distributed with conditional expectations M_i(x) = E[Y_i(x) | Z_i(x)] = exp{β_i + Z_i(x)} = νb (here again h_i(·) = ln(·)) and conditional variances Var[Y_i(x) | Z_i(x)] = νb² = ν^{-1} exp{2β_i + 2Z_i(x)} = ν^{-1}(M_i(x))², where ν > 0 and b > 0 are parameters; that is, we might assume

  f_i(y; M_i(x)) = \big( y^{\nu-1} / \Gamma(\nu) \big) \exp\{-y\nu / M_i(x)\} \, (\nu / M_i(x))^{\nu}, \quad y > 0.   (4)

Here the 'shape' parameter ν is constant for x ∈ R², whereas the 'scale' parameter b varies over R² depending on the conditional expectation M_i(x). In addition to the Poisson or Gamma distributions, other discrete or continuous distributions could be considered to account for particular sets of data.
For the latent part of the model, we adopt the following structure. For the m jointly stationary Gaussian processes Z_i(x), let us assume the linear factor model

  Z_i(x) = \sum_{p=1}^{P} a_{ip} F_p(x) + \xi_i(x),   (5)

where the a_{ip} are m × P coefficients, F_p(x), p = 1, ..., P, are P ≤ m non-observable spatial components (common factors) responsible for the cross-correlation between the variables Z_i(x), and ξ_i(x) are non-observable spatial components (unique factors) responsible for the residual autocorrelation in the Z_i(x) unexplained by the common factors. We assume that F_p(x) and ξ_i(x) are mean-zero stationary Gaussian processes with covariance functions Cov[F_p(x), F_p(x+h)] = ρ(h) and Cov[ξ_i(x), ξ_i(x+h)] = ψ_i ρ(h), where h ∈ R², ρ(h) is a real spatial autocorrelation function common to all factors such that ρ(0) = 1 and ρ(h) → 0 as ||h|| → ∞, and the ψ_i are non-negative real parameters. We also assume that the processes F_p(x) and ξ_i(x) have all cross-covariances identically equal to zero.
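The hierarchy above can be simulated directly. The sketch below uses our own illustrative choices (exponential ρ(h), m = 2 variables, P = 1 common factor, Poisson observations with log link); none of the numbers come from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# m variables, P common factor, K random sites, shared rho(h) = exp(-||h||/range_)
m, P, K, range_ = 2, 1, 100, 0.3
coords = rng.random((K, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
R = np.exp(-d / range_)                        # rho(0) = 1, decays with distance
L = np.linalg.cholesky(R + 1e-9 * np.eye(K))   # jitter for numerical stability

A = np.array([[0.8], [0.6]])                   # m x P loadings a_ip (assumed)
psi = np.array([0.4, 0.2])                     # unique-factor variances psi_i
beta = np.array([1.0, 0.5])                    # intercepts beta_i

F = L @ rng.standard_normal((K, P))            # common factors, Cov = R
xi = np.sqrt(psi) * (L @ rng.standard_normal((K, m)))  # unique, Cov = psi_i * R
Z = F @ A.T + xi                               # equation (5), K x m latent field
Y = rng.poisson(np.exp(beta + Z))              # conditionally Poisson, log link
print(Y.shape)  # (100, 2)
```

The Cholesky factor of the common correlation matrix R is reused for every factor, which is exactly what "autocorrelation function common to all factors" buys computationally.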
Assuming that the number P of latent common factors and the spatial autocorrelation function ρ(h) have already been chosen, the model depends on the parameter vector θ = (β, A, ψ), where β = (β_1, ..., β_m)^T, A = (a_1, ..., a_m)^T, with a_i = (a_{i1}, ..., a_{iP}), for i = 1, ..., m, and ψ = (ψ_1, ..., ψ_m)^T. Let us note that, like the classical linear factor model, our model is not identifiable. However, the only indeterminacy lies in a rotation of the matrix A.
3 Inference with the Monte Carlo EM algorithm
Likelihood inference on the parameters of the model would require the maximization, with respect to θ = (β, A, ψ), of the likelihood based on the marginal density function of the observations y_i(x_k). However, since this marginal density is not available, and since the integration required in the E-step of the EM algorithm would not be easy, here, to maximize the log-likelihood, we will resort to the MCEM algorithm.
Our implementation of the algorithm proceeds as follows. Let us define ξ = (ξ_1, ..., ξ_m), where ξ_i = (ξ_i(x_1), ..., ξ_i(x_K))^T, i = 1, ..., m, and F = (F_1, ..., F_P), where F_p = (F_p(x_1), ..., F_p(x_K))^T, p = 1, ..., P, and let f(y, ξ, F; θ) be the joint distribution of the model, that is, the complete likelihood, accounting also for the unobserved factors. Assuming that the current guess for the parameters after the (s−1)-th iteration is given by θ^{s−1}, and that R_s is a fixed positive integer, the s-th iteration of the MCEM algorithm involves the following three steps (stochastic, expectation, maximization):

S step – draw R_s samples (ξ^{(r)}, F^{(r)}), r = 1, ..., R_s, from the (filtered) conditional distribution f(ξ, F | y; θ^{s−1});
E step – compute Q_s(θ, θ^{s−1}) = (1/R_s) ∑_{r=1}^{R_s} ln f(y, ξ^{(r)}, F^{(r)}; θ);
M step – take as the new guess θ^s the value of θ which maximizes Q_s(θ, θ^{s−1}).

In our modeling framework, the S-step of the algorithm can be implemented through importance sampling or Markov chain Monte Carlo (MCMC) techniques, whereas the M-step typically requires the use of numerical procedures.
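The three steps can be sketched on a deliberately simplified model. The non-spatial toy below (independent latent effects, known variance, closed-form M-step) is our own illustration of the S/E/M loop, not the authors' spatial factor model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model: Y_k | Z_k ~ Poisson(exp(beta + Z_k)), Z_k ~ N(0, sigma2) i.i.d.,
# sigma2 known; estimate beta by MCEM.
K, beta_true, sigma2 = 200, 0.7, 0.25
y = rng.poisson(np.exp(beta_true + rng.normal(0.0, np.sqrt(sigma2), K)))

def log_f(z, beta):
    """Complete log-likelihood in z and beta (up to constants)."""
    return y * (beta + z) - np.exp(beta + z) - z**2 / (2 * sigma2)

def s_step(beta, R, n_mcmc=200):
    """S step: R (correlated) draws from f(z | y; beta) via random-walk Metropolis."""
    z, draws = np.zeros(K), []
    for it in range(n_mcmc):
        prop = z + rng.normal(0.0, 0.5, K)
        accept = np.log(rng.random(K)) < log_f(prop, beta) - log_f(z, beta)
        z = np.where(accept, prop, z)
        if it >= n_mcmc - R:           # keep the last R states
            draws.append(z.copy())
    return np.array(draws)             # R x K

beta = 0.0
for s in range(1, 21):
    R_s = 10 + 5 * s                   # increasing sequence R_s
    Z = s_step(beta, R_s)              # S step
    # E + M steps: Q_s(beta) = (1/R_s) sum_r sum_k [y_k (beta + z) - exp(beta + z)]
    # is concave in beta and maximized in closed form:
    beta = np.log(Z.shape[0] * y.sum() / np.exp(Z).sum())
print(round(beta, 2))
```

With this seed the estimate settles near beta_true; in the authors' setting the M-step has no closed form and requires numerical maximization, but the loop structure is the same.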
When the matrix A is known, and the conditional distributions f_i(y; M_i(x)) are, for instance, Poisson, Gamma or Binomial, it is possible to see that the complete log-likelihood belongs to the curved exponential family, and by choosing an appropriate increasing sequence R_s the algorithm converges to the maximum likelihood estimate (Fort and Moulines, 2003). On the other hand, when the matrix A is unknown, the complete likelihood no longer belongs to the curved exponential family and theoretical convergence properties are not available. In this case, and assuming, for instance, f_i(y; M_i(x)) to be Poisson, Gamma or Binomial, it is possible to show that the complete log-likelihood to be maximized in the M-step of the MCEM algorithm is concave and so admits just one local maximum. Although this does not by itself guarantee the convergence of the algorithm to some local maximum, it allows a straightforward computational implementation of the M-step of the algorithm. Despite the lack of theoretical results on the sampling properties of the MCEM estimator, whether A is known or unknown, we show, through some extensive simulation studies, that the MCEM algorithm provides estimates with quite reasonable sampling distributions.
Acknowledgements We gratefully acknowledge funding from the Italian Ministry of Education,University and Research (MIUR) through PRIN 2008 project 2008MRFM2H.
References
1. Diggle, P.J., Moyeed, R.A., Tawn, J.A.: Model-based geostatistics (with discussion). Appl. Stat. 47, 299–350 (1998)
2. Fort, G., Moulines, E.: Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Stat. 31, 1220–1259 (2003)
3. Wei, G.C.G., Tanner, M.A.: A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. J. Am. Stat. Assoc. 85, 699–704 (1990)
4. Zhang, H.: Maximum-likelihood estimation for multivariate spatial linear coregionalization models. Environmetrics 18, 125–139 (2007)
5. Zhu, J., Eickhoff, J.C., Yan, P.: Generalized linear latent variable models for repeated measures of spatially correlated multivariate data. Biometrics 61, 674–683 (2005)
Simulation of random rotation matrices
John T. Kent and Asaad M. Ganeiber
Key words: matrix Fisher distribution, acceptance-rejection simulation, angular central Gaussian distribution, Bingham distribution
1 Introduction
Directional data analysis is concerned with statistical analysis on various non-Euclidean manifolds, starting with the circle and the sphere, and extending to related manifolds [6]. Directional distributions can be used as building blocks in more sophisticated statistical models which are studied using MCMC methods. For example, [2] used the matrix Fisher distribution in a Bayesian model to align two configurations of points in R³ in an unlabelled version of shape analysis, and they applied the model to a problem of protein alignment in bioinformatics. Hence there is a need to develop simulation methods for directional distributions which are efficient over a wide range of concentration parameters. In this paper we focus on the simulation of the matrix Fisher distribution on the space of 3 × 3 rotations, using a new acceptance-rejection method to simulate the Bingham distribution.
John T. Kent
University of Leeds, Leeds LS2 9JT, UK, e-mail: [email protected]

Asaad M. Ganeiber
University of Leeds, Leeds LS2 9JT, UK, e-mail: [email protected]
2 Directional distributions
The following table gives some of the common spaces associated with directional data analysis, together with the main distributions.

Space                      Notation   Distributions
circle                     S^1        von Mises, wrapped Cauchy
sphere                     S^p        Fisher (p = 2) and von Mises-Fisher (p ≥ 1), Fisher-Bingham
real projective space      RP^p       Bingham, angular central Gaussian
special orthogonal group   SO(q)      matrix Fisher
The sphere S^p = {x ∈ R^{p+1} : x^T x = 1} represents the space of "directions" in R^{p+1}. Real projective space consists of the "axes" or "unsigned directions" ±x. In some sense this space is half of a sphere; it can also be represented as the space of rank 1 projection matrices,

  RP^p = \{ P \in R^{(p+1) \times (p+1)} : P = P^T, \; P^2 = P, \; \mathrm{tr}(P) = 1 \}.   (1)

A rank one projection matrix can be written as P = x x^T, where x is a unit vector. The special orthogonal group of q × q rotation matrices is defined by

  SO(q) = \{ R \in R^{q \times q} : \det R = 1, \; R^T R = I_q \}.

On each of these spaces there is a unique uniform distribution which is invariant under rotations. Further, each of these spaces is naturally embedded in a Euclidean space. A natural "linear-exponential" family of distributions can be generated by letting the density (with respect to the uniform measure) be proportional to the exponential of a linear function of the Euclidean variables. This construction generates the first-named distribution in each row of the table above. It should be noted that the Bingham distribution, whose log density is linear in P = x x^T in (1), can also be viewed as a distribution on the sphere whose log density is quadratic in x.
3 The matrix Fisher distribution
The linear-exponential family on SO(p) is known as the matrix Fisher distribution, with density

f(X) = c_F exp{tr(F^T X)}, X ∈ SO(p),
with respect to the underlying invariant Haar measure. This density was introduced by [4]; it is unimodal about a fixed rotation matrix determined by the p × p parameter matrix F.
Now specialize to the case p = 3. A matrix X ∈ SO(3) can be written in the form X = H_23(φ) H_13(θ) H_12(ψ), where for 1 ≤ i < j ≤ 3, H_ij(θ) denotes a 3 × 3 matrix which looks like an identity matrix except for values cos θ in locations (i, i) and (j, j), and values sin θ and −sin θ in locations (i, j) and (j, i). Thus X is constructed as a product of three two-dimensional rotations about each of the coordinate axes in turn. The angles φ, θ, ψ are known as Euler angles. They lie in the ranges 0 ≤ φ, ψ < 2π and −π/2 ≤ θ ≤ π/2. In these coordinates the underlying Haar measure can be represented as
[dX] = cos θ dθ dφ dψ.
Note the presence of the cos θ term. This arises because small circles of constant latitude have a smaller circumference near the poles than near the equator.
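The Euler-angle construction described above can be sketched in a few lines of Python; `H` and `euler_rotation` are illustrative names, not from the paper:

```python
import math

def H(i, j, a):
    """3x3 plane rotation H_ij(a) (0-based indices, i < j): identity except
    cos(a) at (i,i) and (j,j), sin(a) at (i,j) and -sin(a) at (j,i)."""
    M = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    M[i][i] = M[j][j] = math.cos(a)
    M[i][j] = math.sin(a)
    M[j][i] = -math.sin(a)
    return M

def matmul(A, B):
    """Plain 3x3 matrix product."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def euler_rotation(phi, theta, psi):
    """X = H23(phi) H13(theta) H12(psi), in the text's 1-based plane labels."""
    return matmul(H(1, 2, phi), matmul(H(0, 2, theta), H(0, 1, psi)))
```

Since each factor is a plane rotation, the product is automatically orthogonal with determinant 1.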
Let F have the signed singular value decomposition
F = U ∆ V^T,
where U and V are 3 × 3 rotation matrices and ∆ = diag(δ_j) is diagonal with δ_1 ≥ δ_2 ≥ |δ_3|. It differs from the usual singular value decomposition by requiring U and V to be rotation matrices (rather than just orthogonal matrices) and by allowing the smallest singular value to be negative if necessary to compensate.
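The signed SVD can be obtained from an ordinary SVD by flipping signs so that both factors land in SO(3); this is a sketch assuming numpy is available, and `signed_svd` is a hypothetical helper name:

```python
import numpy as np

def signed_svd(F):
    """Signed SVD F = U diag(d) V^T with U, V in SO(3) and
    d[0] >= d[1] >= |d[2]|; d[2] may become negative to compensate."""
    U, d, Vt = np.linalg.svd(F)      # ordinary SVD: d[0] >= d[1] >= d[2] >= 0
    d = d.copy()
    # Force det(U) = +1, absorbing the sign change into the smallest value.
    if np.linalg.det(U) < 0:
        U[:, 2] *= -1
        d[2] *= -1
    # Same for V (stored as Vt = V^T).
    if np.linalg.det(Vt) < 0:
        Vt[2, :] *= -1
        d[2] *= -1
    return U, d, Vt.T
```

Each flip changes a column of U (or a row of V^T) together with d[2], so the product U diag(d) V^T is unchanged at every step.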
This distribution reduces to the uniform distribution if F = 0 and becomes more concentrated about its modal value X = UV^T as the overall concentration ||F|| = √tr(F^T F) increases. For theoretical purposes it suffices to limit attention to the diagonal case F = ∆. As the concentration increases, the distribution becomes concentrated near θ = φ = ψ = 0, and asymptotically f(X) becomes a trivariate normal density,
f(X) ∝ exp{−(1/2)[(δ_2 + δ_3)φ^2 + (δ_1 + δ_3)θ^2 + (δ_1 + δ_2)ψ^2]},
with respect to Lebesgue measure dθ dφ dψ.
4 Simulation
When developing simulation methods for directional distributions, there are several issues to consider:
• the need for good efficiency for a wide range of concentration parameters, from near uniform to highly concentrated. In similar problems on R^p, the task is simpler when distributions are closed under affine transformations; in such cases it is sufficient to consider just a single standardized form of the distribution.
• the need for a tractable envelope distribution.
• the presence of trigonometric factors in the base measure.
Efficient acceptance-rejection methods are available for the simpler directional distributions, most notably the Best-Fisher method [1] for the von Mises distribution. For the more complicated distributions, several MCMC algorithms have recently been proposed, e.g. [2], [5] and [3]. However, acceptance-rejection methods with reasonable acceptance probabilities are to be preferred when available. The following simulation method is based on a new acceptance-rejection algorithm for the Bingham distribution, and rests on the following observations.
• A classic result from differential geometry states that the space SO(3) can be identified with real projective space RP^3 under a one-to-one mapping, or equivalently with the unit sphere S^3 in R^4 under a one-to-two mapping. Each rotation matrix in SO(3) maps to two antipodal points on this unit sphere. This identification is limited to the case p = 3. There does not seem to be any useful analogue for SO(p), p > 3.
• The matrix Fisher distribution on SO(3) corresponds to the Bingham distribution on S^3.
• The PhD thesis of the second author gives a new method to simulate from the Bingham distribution on S^p for any p ≥ 1 using an acceptance-rejection algorithm with the angular central Gaussian distribution as an envelope.
• The angular central Gaussian distribution on S^p is very simple to simulate. Given a (p + 1) × (p + 1) covariance matrix Σ, simulate y ∼ N_{p+1}(0, Σ) and set z = y/||y||. Given the parameters of a Bingham distribution, it is possible to determine a choice of Σ to give a good envelope.
• The use of an angular central Gaussian envelope for the Bingham distribution is closely related to the use of a multivariate Cauchy density as an envelope for simulating a multivariate normal distribution.
• The acceptance ratio is typically about 45% for a wide range of parameters. This value is very reasonable for practical purposes.
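The envelope ingredient above can be sketched directly: the angular central Gaussian is simulated exactly as stated (y ∼ N(0, Σ), z = y/||y||). For simplicity this sketch passes Σ through a factor A with Σ = AA′ rather than computing a Cholesky decomposition; `rACG` is an illustrative name:

```python
import random

def rACG(A, rng):
    """One draw from the angular central Gaussian on the unit sphere:
    y = A g with g standard normal (so y ~ N(0, A A')), then z = y/||y||."""
    m = len(A)                                   # m = p + 1
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    y = [sum(A[r][c] * g[c] for c in range(m)) for r in range(m)]
    norm = sum(v * v for v in y) ** 0.5
    return [v / norm for v in y]
```

With A = I this gives the uniform distribution on the sphere; a non-spherical Σ concentrates mass along its dominant axes, which is what makes it usable as a rejection envelope for the Bingham distribution.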
References
1. Best, D. J. and Fisher, N. I. Efficient simulation of the von Mises distribution. J. Appl. Statist. 28, 152–157 (1979).
2. Green, P. J. and Mardia, K. V. Bayesian alignment using hierarchical models, with applications in protein bioinformatics. Biometrika 93, 235–254 (2006).
3. Hoff, P. D. Simulation of the matrix Bingham-von Mises-Fisher distribution with application to multivariate and relational data. Journal of Computational and Graphical Statistics 18, 438–456 (2009).
4. Khatri, C. G. and Mardia, K. V. The von Mises-Fisher matrix distribution in orientation statistics. J. Roy. Statist. Soc. B 39, 95–106 (1977).
5. Kume, A. and Walker, S. G. Sampling from compositional and directional distributions. Statistics and Computing 16, 261–265 (2006).
6. Mardia, K. V. and Jupp, P. E. Directional Statistics. Wiley, Chichester (2000).
Dynamic modelling of fuzzy sets for flexible data retrieval
Miroslav Hudec
Abstract Flexible querying allows users to use linguistic terms to better qualify the data they wish to obtain and the rules to reveal. The question is how to properly construct fuzzy sets for each linguistic term. This issue is considered from two aspects: the user's view of a particular linguistic term, and the current content of the database. Evidently, the user can obtain a picture of the stored data before running a query. This approach can also be used in situations when a non-commutative operator is required. Rule extraction by linguistic quantifiers is another task where this modelling of fuzzy sets can be applied. Institutions of official statistics deal with large amounts of surveyed data and potentially useful administrative data, which makes them interesting for this approach.
1 Introduction
The increasing use of computers by business and governmental agencies has created mountains of data that contain potentially valuable knowledge (Rasmussen and Yager, 1997). The same holds for agencies of official statistics. Firstly, databases could contain crisp values which are not always accurately surveyed. Secondly, data from administrative sources contain valuable information which should be examined.
Flexible querying allows users to use linguistic terms to better qualify the data they wish to obtain and the rules to reveal. For example, to find municipalities where migration is small and unemployment is high, or to find to what extent the rule most of the companies which report to Intrastat have a value of trade near the exemption threshold is true. The linguistic terms clearly suggest that there is a smooth transition between acceptable and unacceptable records.
1 Miroslav Hudec, Institute of Informatics and Statistics, Bratislava; email: [email protected]
Several fuzzy query implementations have been proposed, e.g. (Bosc and Pivert, 2000; Hudec, 2009; Kacprzyk and Zadrożny, 1995), and fuzzy queries for data mining (Rasmussen and Yager, 1997). In all approaches, the matching degree critically depends on the constructed membership functions (Hudec and Sudzina, 2012).
This paper examines the construction of fuzzy sets for flexible queries and its use in aggregation by fuzzy linguistic quantifiers and in situations when commutative operators are not appropriate.
2 Defining appropriate fuzzy sets for each linguistic term
Let Dmin and Dmax be the lowest and the highest domain values of the attribute A (database column), i.e. Dom(A) = [Dmin, Dmax], and let L and H be the lowest and the highest values in the current database content; that is, [L, H] ⊆ [Dmin, Dmax]. For many attributes in databases [L, H] ⊂ [Dmin, Dmax] holds; that is, the intervals [Dmin, L] and/or [H, Dmax] are non-empty. This fact should be considered in data retrieval and rule extraction. Theoretically, the domain of the attribute value of export is [exemption threshold value, ∞). The highest value of realized export is far from the "upper limit" of R+. In constructing the term high, we need to consider the stored real values.
Let the linguistic domain have the elements small, medium, high. The linguistic domain covers the crisp subdomain of an attribute in the way illustrated in Figure 1.
Figure 1: Linguistic and crisp domain
The first aspect allows users to freely define the parameters of the fuzzy sets (A, B, C, D). If the user is not familiar with the current database content, the query might easily end up with an empty answer. Moreover, the user is usually familiar with the values of Dmin and Dmax but not with the values of L and H.
The second aspect is focused on the construction of membership functions (A, B, C, D) directly from the current content of a database. The first method is the uniform domain covering method (Tudorie, 2008), depicted in Figure 1. At the beginning, the values of L and H are obtained from the current database content. The length of the fuzzy set core β and the slope α (Figure 1) are created in the following way (Tudorie, 2008):
α = (1/8)(H − L)   (1)

β = (1/4)(H − L)   (2)
Consequently, it is easy to calculate the required parameters A, B, C and D.
The uniform domain covering method is appropriate when the distribution of attribute values in the domain is more or less uniform. If this is not the case, the uniform domain coverage could lead to the conclusion that the meaning of the linguistic term is far from the real data. For these situations, the method can be improved by the statistical mean (Tudorie, 2008) or the logarithmic transformation (Hudec and Sudzina, 2012).
For the solution of the data retrieval task both aspects should be taken into account. The above-mentioned methods could be used to suggest the parameters of fuzzy sets. In a second step, users can modify these parameters before running a query, if they are not satisfied with the suggested ones (Hudec and Sudzina, 2012).
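The uniform domain covering construction can be sketched as follows, under the assumption that the slope is α = (H − L)/8 and the core length is β = (H − L)/4, so that three trapezoids tile [L, H] exactly; all function names are illustrative:

```python
def uniform_domain_covering(L, H):
    """Trapezoid parameters (A, B, C, D) for the terms small/medium/high
    on [L, H], assuming core beta = (H - L)/4 and slope alpha = (H - L)/8,
    so that 3*beta + 2*alpha = H - L."""
    beta = (H - L) / 4.0
    alpha = (H - L) / 8.0
    small = (L, L, L + beta, L + beta + alpha)
    medium = (L + beta, L + beta + alpha,
              L + 2 * beta + alpha, L + 2 * beta + 2 * alpha)
    high = (L + 2 * beta + alpha, L + 2 * beta + 2 * alpha, H, H)
    return small, medium, high

def trapezoid(x, A, B, C, D):
    """Membership degree of x in the trapezoid with support [A, D], core [B, C]."""
    if B <= x <= C:
        return 1.0
    if A < x < B:
        return (x - A) / (B - A)
    if C < x < D:
        return (D - x) / (D - C)
    return 0.0
```

Adjacent slopes coincide, so at every point of [L, H] the memberships of the overlapping terms sum to 1, which is the "covering" property the method aims at.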
3 Linguistic quantifiers
Linguistic quantifiers such as most or few play a special role among the aggregation operators. For example, to find out whether in the Intrastat database most businesses have a small value of intra-EU trade (are near the exemption threshold).
This problem is expressed in the form Qx(P(x)), where Q denotes a linguistic quantifier, X = {x} is a universe of discourse (the set of all companies) and P(x) is a predicate corresponding to a query condition. In the first step we need to construct the membership function for the term small value of trade. The uniform domain covering method, (1) and (2), is the best option, because the main goal is not to retrieve data but to reveal rules. The value of L (Figure 1) is the exemption value. The truth value of the statement is computed by the following equation (Zadrożny and Kacprzyk, 2009):
Truth(Qx(P(x))) = µQ( (1/n) ∑_{i=1}^{n} µP(x_i) )   (3)
where n is the cardinality of X and µQ (the quantifier most) might be given as:
µQ(y) = 1 for y ≥ 0.85;  µQ(y) = (y − 0.5)/0.35 for 0.5 < y < 0.85;  µQ(y) = 0 for y ≤ 0.5.
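The quantified truth computation of equation (3) can be sketched as follows, with the quantifier most assumed to be piecewise linear with breakpoints 0.5 and 0.85 (an assumption; only the breakpoints are taken from the text):

```python
def mu_most(y):
    """Assumed membership function of the relative quantifier 'most':
    0 below 0.5, 1 above 0.85, linear in between."""
    if y >= 0.85:
        return 1.0
    if y <= 0.5:
        return 0.0
    return (y - 0.5) / 0.35

def truth_of_quantified(memberships, mu_q=mu_most):
    """Truth of 'Q x (P(x))': mu_Q applied to the mean membership degree,
    as in equation (3)."""
    return mu_q(sum(memberships) / len(memberships))
```

For instance, if 9 of 10 companies fully satisfy the predicate, the mean degree is 0.9 and the statement "most companies satisfy P" is fully true.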
4 Non-commutative aggregation operator
T-norm functions are used for aggregation under uncertainty. Since, by axiom, all t-norm functions are commutative, they are applicable only if the order of the elementary conditions is irrelevant. There exists a class of problems where the elementary conditions are not independent, that is, the second elementary condition depends on the answers obtained from the first one. Obviously this requires a non-commutative operator. The among operator (Tudorie, 2008) meets this requirement:
µ_{P1 AMONG P2}(a1, a2) = min( µ_{P1/P2}(a1), µ_{P2}(a2) )   (4)
where a1 and a2 are database attributes, µ_{P2} is the membership function defining the fulfilment of the independent elementary condition, and µ_{P1/P2} is the fulfilment degree of the dependent elementary condition relative to the independent one.
An example of this query is: select companies which exported a small amount of goods (P1) among companies having a high value of trade (P2).
In the first step, companies with a high value of trade (vt) are selected. The membership function of the linguistic term high is calculated by one of the methods examined in Section 2 for the domain [Lvt, Hvt] from the current content of the database. The companies selected by P2 form a subset of all companies in the database. This subset constitutes a reduced subdomain [Lag-red, Hag-red] ⊆ [Lag, Hag] of amount of goods (ag). The fuzzy set small amount of goods is created on the subdomain [Lag-red, Hag-red]. Even if the user can define the parameters for the membership function µ_{P2} without a suggestion from the current database content, defining the membership function for µ_{P1/P2} is beyond his capabilities.
5 Conclusion
In this paper, we suggested a flexible SQL-like query approach for data retrieval and data mining. The problem of constructing membership functions for data retrieval and data mining tasks can be satisfactorily solved if we merge the user's opinion about linguistic terms with the current content of the database. This approach is also a supporting tool for queries where the elementary conditions are not independent, and for extracting rules by linguistic quantifiers.
In addition, this approach is open to further improvements, such as querying over missing values when users know the functional dependencies between attributes, and querying using priorities between elementary conditions.
The research reported herein was funded by the European Commission via the SeventhFramework Programme for Research (FP7/2007-2013) under Grant agreementn°244767. This work was supported by the Slovak Research and Development Agencyunder the contract No. DO7RP-0024-10.
References
1. Bosc, P., Pivert, O.: SQLf query functionality on top of a regular relational database management system. In: Pons, M., Vila, M.A., Kacprzyk, J. (eds.) Knowledge Management in Fuzzy Databases, pp. 171-190. Physica-Verlag, Heidelberg (2000).
2. Hudec, M.: An approach to fuzzy database querying, analysis and realisation. Comput. Sci. Inf. Syst. 6(2), 127-140 (2009).
3. Hudec, M., Sudzina, F.: Construction of fuzzy sets and applying aggregation operators for fuzzy queries. In: 14th International Conference on Enterprise Information Systems (ICEIS 2012), Wrocław (2012). Accepted for publication.
4. Kacprzyk, J., Zadrożny, S.: FQUERY for Access: Fuzzy querying for Windows-based DBMS. In: Bosc, P., Kacprzyk, J. (eds.) Fuzziness in Database Management Systems, pp. 415-433. Physica-Verlag, Heidelberg (1995).
5. Rasmussen, D., Yager, R.R.: Summary SQL - A Fuzzy Tool for Data Mining. Intell. Data Anal. 1, 49-58 (1997).
6. Tudorie, C.: Qualifying objects in classical relational database querying. In: Galindo, J. (ed.) Handbook of Research on Fuzzy Information Processing in Databases, pp. 218-245. IGI Global, London (2008).
7. Zadrożny, S., Kacprzyk, J.: Issues in the practical use of the OWA operators in fuzzy querying. J. Intell. Inf. Syst. 33, 307-325 (2009).
How text mining measures complex phenomena in official statistics
Come il text mining misura fenomeni complessi nelle statistiche ufficiali
Sergio Bolasco, Dipartimento Memotef, Università di Roma "La Sapienza"; Pasquale Pavone, Scuola Superiore S.Anna di Pisa, [email protected]
Abstract:
This work uses text mining tools to quantitatively measure characteristics of the daily activities described in the individual diaries of the Istat Time Use Survey (TUS). In particular, it studies phenomena concerning the location of the relational activities that can be traced back to "communicating with". The greater potential of an analysis conducted directly on natural-language information is due to the better "resolution" of the measurement, since the analysis of concepts is more flexible, precise and accurate than one based on coding. This improves the production of official statistics, including in traditional form, and opens new perspectives in the evaluation of complex phenomena such as those to be measured in time-use accounts.
Keywords: text mining, information extraction, ETL, linguistic resources
1 Introduction
The applications of textual statistics1 handling information expressed in natural
language (unstructured textual data) in the same way as classical structured (quantitative
and / or categorical) data have increased in recent years. The greatest potential for the
direct analysis of textual information depends on the better "resolution" of the
measurement, because analyses based on concepts are more flexible, precise and
accurate than those conducted through keywords or coding. This paper aims, through
lexical and textual analysis, to measure quantitatively the characteristics of the everyday
activities of individuals described in the diaries of the Istat Time Use Survey (TUS).
The survey aims to establish a free text daily diary to describe the activities carried out
during the day. For the first time in the survey of 2002-2003 (Romano, 2007), Istat has
agreed to acquire the full text of individual diaries, thereby providing an archive of great
importance, not only in size (over 50,000 diaries, equivalent to 16,000 pages of text) but
especially in content, clearing the way for numerous developments. The limits resulting
from the ambiguity of natural language are largely resolved at the start of the treatment,
by appropriate tools for this type of data2. Each application of the models and
1 Lebart et al. (1998); Aureli & Bolasco (2004), Dulli et al. (2004); www.jadt.org : online JADT Proceedings, 2000--2010.
2 There are several software packages for natural language processing and the automatic analysis of texts, which differ according to the type
of analysis to be conducted. In this study, considering the statistical purpose of the analysis, we used the TaLTaC2 software, which
techniques of text mining is characterized by strong multidisciplinary integration
involving statistics, computer science and linguistics in equal measure.
We will illustrate the procedure adopted to automatically extract information from the
non-structured text of the diaries, record them in a structured way (as a Boolean or
frequency) in a matrix of individual data and then cross the variables generated by the
textual analysis with the categorical characteristics of individuals in order to produce
official statistics. In particular, phenomena concerning the intensity of social interaction
– that can be related to the "who you are communicating with" – and the different
locations of this type of activity are regarded here. The study is conducted by
considering individuals as units of analysis, where the diary of a day is regarded as a
single context (see Bolasco et al. 2007).
2 Definition of the resource "place" and relational activities
The places of individual daily activities described in the diaries are captured through a
general model presented in our previous work (Bolasco and Pavone 2010). This model
allows us to identify a wide variety of adverbial locutions indicating place, based on the
linguistic structure of a prepositional syntagm, as follows:
PREPOSITION (ADJECTIVE) SUBSTANTIVE (ADJECTIVE)
where the adjectives are in brackets because their presence is optional and / or repeated.
For example, starting from the primary term "home", the model recognizes sequences
such as "at home", "my second home", "nella mia casa futura (in my future home)".
The whole syntagm may be repeated several times, with the adjectival function of the
first noun, for example: <on the seat | of the car>; <alla festa | di compleanno | di un
amico (at the Birthday Party | of a friend)> (Table 1).
Table 1 - Examples of expressions of place from the model
PREP POSS AGG SOST
a casa
davanti a casa
nella mia seconda casa
nella mia casa futura
a casa mia
a casa di mia madre
a casa del vicino
vicino (a) casa
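The prepositional-syntagm pattern above can be illustrated with a deliberately tiny regular expression; the real model uses 16 semantic dictionaries and 39 sequences in OR, while the dictionaries below are toy stand-ins:

```python
import re

# Tiny illustrative dictionaries, standing in for the semantic dictionaries.
PREP = r"(?:a|in|nella|davanti a|vicino a?)"
POSS = r"(?:mia|mio|sua|del|di)"
ADJ = r"(?:seconda|futura|nuova)"
NOUN = r"(?:casa|ufficio|parco)"

# PREPOSITION (POSS) (ADJECTIVE) NOUN (ADJECTIVE): bracketed slots optional.
PLACE = re.compile(
    rf"\b{PREP}(?:\s+{POSS})?(?:\s+{ADJ})?\s+{NOUN}(?:\s+{ADJ})?\b")

text = "sono nella mia seconda casa, poi vado a casa di mia madre"
matches = [m.group(0) for m in PLACE.finditer(text)]
```

The sketch recognizes "nella mia seconda casa" and "a casa"; chaining a second syntagm ("di mia madre") onto the first, as the full model does, would require repeating the pattern.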
The model, based on a hybrid system consisting of rules and dictionaries, is built in two stages: an initial exploratory training phase, used to develop the basic components of the model, and a second application stage to detect their actualization in the corpus of the TUS. The application of the model is divided into: i) the launch of a query,
consisting of a single regular expression composed of 39 sequences in OR, for a total
of over 150 relations (rules) between the concepts expressed by 16 semantic dictionaries
able to extract locutions, ii) the evaluation of the entities found, iii) the calculation of
the occurrences of each term, for a total number of occurrences (redundant) equal to
stands for Automatic Treatment for Lexical and Textual Analysis of the Contents of a Corpus, developed from research at the University of Rome "La Sapienza" (Bolasco 2010; www.taltac.it).
22% of the entire corpus. These sequences, as space-time modifiers, were divided ex-post into sub-thematic classes, distinguishing activities "at home" (one's own, with relatives, friends or others) from activities "away" related to movement (walking,
cycling, on public transport, ...) or activities related to roles-places (at the hairdressers,
newsagents, ...) or linked to different environments / sites (in the office, at the bank, in a
shop, among the market stalls, ...).
Relational activities are identified by studying a sequence of two "components", interlaced by the keyword <con> (in some cases <a>). In particular, the first component is of the verbal type, limited to verbs expressing communication: "talk / communicate with" and "call / tell". These concepts have been captured even when expressed in similar terms (phone call, phone) or in a compound verb phrase: "make (a) p." or "be (on) the p.". The second component is the "who", i.e. the actor who is addressed by the speaker. Several previously defined classes of actors (parents, spouses, children,
grandchildren, grandparents, friends etc.) are used to reconstruct the sequence, even
with more complex expressions such as: <parlo di politica con mia moglie (I’m talking
about politics with my wife)>. For a list of verbs and actors considered, see Bolasco et
al. (2007).
3 Measuring the characteristic places of relational activities
After having defined the entities and their concepts and created thematic dictionaries, the
search in the text was based on the construction of complex queries, using regular
expressions, in order to identify the sequences in the diaries that realise these activities
in relation to different categories of actors (relatives / friends) in conjunction with the
different classes of place identified by the model as described above. In particular, in
our case the set of queries takes the following form:
"CATSEM(Verb) LAG3 CATSEM(Prep) LAG4 CATSEM(Actor#) LAG8 WH LAG3 CATSEM(Place#) LAG2 |"
where CATSEM denotes the classes of: i) verbs of "communicating", ii) actors
(“relatives / friends"), iii) prepositions “con/a/tra/in (with / to / between / in)”, iv) places
("own home / home of other people / other places / means of transport"). The LAG #
expresses the maximum number of words in the interval between two operands of the
expression and the token <|> denotes the end of the sentence. Some examples of the
sequences extracted are shown in Table 2.
Table 2 – Some examples of sequences
raccontato a mia moglie cosa ho fatto oggi WH a casa mia |
litigo con mia sorella WH a letto a casa |
parlo con mio marito WH a casa di amici |
giocato a calcio con mio fratello e i nostri amici WH parco |
parlavo con i miei familiari con l' autoradio accesa WH in macchina |
chiacchierato con gli amici § ho ascoltato la radio WH a casa mia |
gioco con un amichetto § WH a casa della nonna |
chiacchierato con amici e parenti aspettando gli sposi § WH al ristorante |
chiacchiero con degli amici e ascolto musica WH in corriera |
Each query captures an instance, whenever the sequence is present in the diary. The
result of the query produces a new variable that measures the presence / absence (or
frequency) of the entity for each individual. This new structured information can be
placed in connection with the individual a priori information, such as structural
variables (age, sex, marital status, education level) to produce traditional statistics.
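The step from matched sequences to structured variables crossed with categorical characteristics can be sketched as follows; the diaries and verb markers below are toy stand-ins for the TaLTaC2 queries described in the text:

```python
from collections import Counter

# Toy diaries with a priori structural variables -- illustrative only.
diaries = [
    {"gender": "M", "age": "25-44", "text": "parlo con mio marito a casa"},
    {"gender": "F", "age": "25-44", "text": "chiacchiero con degli amici in corriera"},
    {"gender": "F", "age": "65+", "text": "ho letto un libro"},
]

def has_sequence(text, verb_markers=("parlo", "chiacchiero", "racconto")):
    """Boolean indicator: 1 if the diary contains a 'communicating' verb."""
    return int(any(v in text for v in verb_markers))

# The new structured variable, crossed with gender and age group.
table = Counter()
for d in diaries:
    if has_sequence(d["text"]):
        table[(d["gender"], d["age"])] += 1
```

Aggregating the counter by the categorical variables yields exactly the kind of traditional cross-tabulation shown in Table 3.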
By applying this model to a sub-sample (10,000 units) of the Istat survey, we obtain a
statistic of the type shown in Table 3, corresponding to 18,628 sentences.
Table 3 – Sentences concerning relational activities with relatives/friends of the sub-sample by gender, age groups and type of place (percentage values)
References
Aureli E., Bolasco S. (a cura di) (2004) Applicazioni di analisi statistica di dati testuali
Casa Editrice Università "La Sapienza", Roma.
Bolasco S. (2010). Taltac2.10 Sviluppi, esperienze ed elementi essenziali di analisi
automatica dei testi, LED, Milano.
Bolasco S., Canzonetti A., Capo F. M. (2005) Text mining: uno strumento strategico
per imprese e istituzioni, CISU, Roma.
Bolasco S., D’Avino E., Pavone P. (2007) Analisi dei diari giornalieri con strumenti di
statistica testuale e text mining, in: I tempi della vita quotidiana. Un approccio
multidisciplinare all'analisi dell'uso del tempo, Romano, M. C. (ed.), ISTAT,
Roma, 309-340.
Bolasco S., Pavone P. (2010) Automatic Dictionary and Rule-Based Systems for
Extracting Information from Text, in: Data Analysis and Classification
Proceedings of the 6th Conference of the Classification and Data Analysis Group
of the Società Italiana di Statistica, Palumbo, F. , Lauro, C. N. , Greenacre, M.
(Eds.), Springer, Berlin-Heidelberg, 189-198.
Dulli S., Polpettini P., Trotta M. (2004) Text mining: teoria e applicazioni, Franco
Angeli, Milano.
Lebart L., Salem A., Berry L. (1998) Exploring textual data, Kluwer Academic Publ.,
Dordrecht.
Romano M. C. (ed.) (2007) L'uso del tempo - Indagine multiscopo sulle famiglie "Uso
del tempo" - Anni 2002-2003, Collana: Informazioni, n. 2, ISTAT, Roma.
Relation activities                    |      Men (age groups)           |     Women (age groups)          | Total
                                       | 14-24 25-44 45-64  65+   Total  | 14-24 25-44 45-64  65+   Total  | place
with relatives at own home             |  2.4   9.0   7.6   4.5   23.5   |  2.9  14.7  10.9   6.2   34.7   |  58.2
with relatives at home of other people |  0.1   0.2   0.1   0.1    0.4   |  0.1   0.2   0.1   0.1    0.4   |   0.8
with relatives in other places         |  0.3   1.8   1.5   0.5    4.2   |  0.5   2.4   1.6   0.7    5.2   |   9.4
with relatives on a means of transport |  0.3   1.6   1.4   0.5    3.8   |  0.5   2.4   1.6   0.7    5.2   |   9.0
with friends at own home               |  0.3   0.5   0.2   0.1    1.1   |  0.5   0.8   0.5   0.4    2.1   |   3.3
with friends at home of other people   |  0.0   0.3   0.1   0.0    0.5   |  0.1   0.3   0.1   0.1    0.6   |   1.0
with friends in other places           |  2.2   3.0   1.7   1.1    8.0   |  1.8   1.7   0.6   0.3    4.5   |  12.5
with friends on a means of transport   |  1.1   1.2   0.5   0.4    3.2   |  1.2   1.0   0.2   0.2    2.7   |   5.9
Total gender by age                    |  6.7  17.6  13.2   7.2   44.7   |  7.6  23.4  15.6   8.7   55.3   | 100.0
Robust estimation for multivariate data under the independent contamination model
C. Agostinelli, R.A. Maronna, and V.J. Yohai
Abstract We introduce a new class of robust procedures for estimating the mean vector and covariance matrix of multivariate normal observations when outliers are generated according to an independent contamination model. These estimators, named composite likelihood M-estimates (CLM-estimates), are related to composite likelihood methods.
Key words: Multivariate scatter, independent contamination model, M-estimators
1 Composite Likelihood M-estimates
In Alqallaf et al (2009) a new contamination model, called the independent contamination model, is introduced. In this model each component of a multivariate observation has a probability ε of being replaced by an outlier. Then, even if ε is small, the fraction of observations with at least one contaminated component tends to one as the dimension of the data p increases. Alqallaf et al (2009) showed that for this type of contamination the breakdown point of the usual affine equivariant robust methods for estimating multivariate location tends to 0 as p increases. A similar result can be proved for affine equivariant robust estimates of the scatter matrix.
Scatter estimates which are robust under the independent contamination model can be obtained using separate robust estimates of the covariances for each pair of variables. A shortcoming of this approach is that the resulting covariance matrix may not be positive definite. This is especially true in the presence of outliers.

C. Agostinelli, Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, e-mail: [email protected]

R.A. Maronna, Department of Mathematics, University of La Plata, Argentina

V.J. Yohai, Department of Mathematics, University of Buenos Aires, Argentina
In this talk we will present a new class of robust procedures for estimating the covariance matrix which are related to the composite likelihood methods introduced by Lindsay (1988). The maximum composite likelihood estimates were introduced as an alternative procedure for situations where the maximum likelihood estimates become too complicated.
Suppose we have a sample x_1, ..., x_n of p-dimensional vectors and we want to estimate the location vector µ and the scatter matrix Σ. Let ρ be a non-decreasing function such that tρ′(t) is non-decreasing and bounded. A monotone M-estimate minimizes

L(µ, Σ) = ∑_{i=1}^{n} M(x_i, µ, Σ),   (1)

where

M(x, µ, Σ) = c det(Σ) + ρ( (x − µ)′ Σ^{−1} (x − µ) ),

where c is a positive constant. The estimates we propose minimize (1) but with M(x, µ, Σ) replaced by
M(x, µ, Σ) = ∑_{j=1}^{p−1} ∑_{k=j+1}^{p} [ d det(Σ_jk) + ρ( (x_jk − µ_jk)′ Σ_jk^{−1} (x_jk − µ_jk) ) ],
where Σ_jk is the 2 × 2 submatrix of Σ corresponding to rows and columns j and k, and x_jk and µ_jk are the two-dimensional vectors formed with components j and k of the vectors x and µ respectively. Finally, d is a tuning constant chosen so that the estimate of Σ is Fisher consistent in the case that the observations are multivariate normal. We call these estimators composite likelihood M-estimates (CLM-estimates).
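The pairwise structure of the CLM objective can be sketched as follows; ρ(t) = log(1 + t) is used here as one function satisfying the stated conditions (tρ′(t) = t/(1 + t) is non-decreasing and bounded) — an illustrative choice, not the authors' implementation:

```python
import math

def clm_objective(X, mu, Sigma, d=1.0, rho=lambda t: math.log1p(t)):
    """Composite likelihood M objective: for each observation, sum over all
    2x2 marginal blocks (j, k) the term d*det(Sigma_jk) plus rho of the
    pairwise Mahalanobis distance.  d and rho are illustrative defaults."""
    p = len(X[0])
    total = 0.0
    for x in X:
        for j in range(p - 1):
            for k in range(j + 1, p):
                a, b, c = Sigma[j][j], Sigma[j][k], Sigma[k][k]
                det = a * c - b * b
                u, v = x[j] - mu[j], x[k] - mu[k]
                # (u, v)' Sigma_jk^{-1} (u, v), written out via the 2x2 inverse
                maha = (c * u * u - 2.0 * b * u * v + a * v * v) / det
                total += d * det + rho(maha)
    return total
```

Because only 2 × 2 blocks enter the objective, an outlier in a single component can contaminate only the p − 1 pairwise terms involving that component, which is the intuition behind robustness under independent contamination.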
CLM-estimates can be extended to the case where µ depends on a vector of regressors z and a vector of parameters β, i.e., µ = µ(z, β) and Σ = σ²D, where D is a known correlation matrix. This setup covers linear models. CLM-estimates are defined in this case as
(β̂, σ̂²) = argmin_{β, σ²} L(β, σ²),

where

L(β, σ²) = ∑_{i=1}^{n} M(x_i, z_i, β, σ²),
M(x, z, β, σ²) = ∑_{j=1}^{p−1} ∑_{k=j+1}^{p} [ d σ⁴ det(D_jk) + ρ( σ^{−2} (x_jk − µ_jk(z, β))′ D_jk^{−1} (x_jk − µ_jk(z, β)) ) ].
References
Alqallaf F, Aelst SV, Zamar R, Yohai V (2009) Propagation of outliers in multivariate data. The Annals of Statistics 37(1):311–331, DOI 10.1214/07-AOS588
Lindsay B (1988) Composite likelihood methods. Contemporary Mathematics 80(1):221–239
A comparison of different procedures for combining high-dimensional multivariate volatility forecasts
Alessandra Amendola and Giuseppe Storti
Abstract The aim of this paper is to investigate the effect of model uncertainty on multivariate volatility prediction. This effect is expected to be particularly relevant in applications to vast dimensional datasets since it is well known that, in this case, the need for tractable model structures requires the imposition of severe and often untested constraints on the volatility dynamics. By means of an application to the optimization of a vast dimensional portfolio of stock returns, the paper compares the performances of different models and combination procedures. The main finding is that the results are highly sensitive not only to the choice of the model but also to the specific combination procedure being used.
Key words: multivariate volatility, forecast combination, weights estimation
1 Introduction
In multivariate volatility prediction, model uncertainty is a relevant problem to be faced by researchers and practitioners. The risk of model misspecification is particularly sizeable in large dimensional problems where highly restrictive assumptions on the volatility dynamics are usually required (see e.g. Pesaran, Schleicher & Zaffaroni, 2009). In order to reduce the impact of misspecification at the forecasting stage, a typical approach is to consider the combination of forecasts from different competing models. Although some recent papers have focused on the evaluation of the forecast accuracy of MGARCH models (Patton & Sheppard, 2008; Laurent,
Alessandra Amendola, Department of Economics and Statistics, University of Salerno, Via Ponte don Melillo, 84084 Fisciano, Salerno (Italy), e-mail: [email protected]
Giuseppe Storti, Department of Economics and Statistics, University of Salerno, Via Ponte don Melillo, 84084 Fisciano, Salerno (Italy), e-mail: [email protected]
Rombouts & Violante, 2011), less attention has been paid to the combination of volatility forecasts from different models as a strategy for improving predictive accuracy (Amendola & Storti, 2008). Also, it has to be considered that, in theory, different combination strategies could be implemented but, for a given application, only one must be chosen. A combination strategy is defined by the identification of two different elements: a combination rule, which is a function of the alternative forecasts available, and an estimator of the weights assigned to each model. As a consequence of these choices, an additional source of uncertainty, related to the choice of the combination strategy, is introduced into the analysis.
The aim of this work is to discuss some alternative forecast combination strategies for (possibly high-dimensional) multivariate volatility forecasts and to compare their empirical performances. Section 2 introduces the reference model used for the analysis, while some alternative estimators of the combination weights are discussed in Section 3. The statistical properties of the estimators have been assessed by a Monte Carlo simulation whose results are not presented here but are available upon request. Section 4 concludes by illustrating the results of an application to the optimization of a portfolio of stock returns.
2 The reference model
The data generating process is assumed to be given by
r_t = S_t z_t,   t = 1, …, T, T+1, …, T+N,

where T is the end of the in-sample period, z_t ~ iid(0, I_k), and S_t is any (k × k) positive definite (p.d.) matrix such that S_t S_t' = H_t = Var(r_t | I_{t-1}), with H_t = C(H_{1,t}, …, H_{n,t}; w) and H_{j,t} a symmetric p.d. (k × k) matrix. In practice H_{j,t} is a conditional covariance matrix forecast by a given 'candidate model'. The function C(.) is an appropriately chosen combination function and w is a vector of combination parameters. The weights assigned to each candidate model depend on the values of the elements of w but do not necessarily coincide with them. Different combination functions C(.) can in principle be used and there is no a priori valid procedure for selecting the optimal function. Among all the possible choices of C(.), the most common is the linear combination function
H_t = w_1 H_{1,t} + … + w_n H_{n,t},   w_j ≥ 0,
where w coincides with the vector of combination weights. The assumption of non-negative weights is required in order to guarantee the positive definiteness of H_t, but can be too restrictive. Alternatively, in order to get rid of the positivity constraint on the w_j, two different combination functions can be selected: the exponential and the square root combination functions. The exponential combination is defined as
H_t = Expm[w_1 Logm(H_{1,t}) + … + w_n Logm(H_{n,t})]
where Expm(.) and Logm(.) indicate the matrix exponential and logarithm, respectively. Differently from the other two functions, the square root combination (for S_t) is not directly performed on the H_{j,t} but on the S_{j,t}:

S_t = w_1 S_{1,t} + … + w_n S_{n,t}

with H_t = S_t S_t' and H_{j,t} = S_{j,t} S_{j,t}'.
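As an illustration, the three combination functions above can be sketched in a few lines of Python (a sketch of ours using NumPy/SciPy, not code from the paper; the function names are our own):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def linear_combination(H_list, w):
    """H_t = w_1 H_{1,t} + ... + w_n H_{n,t}; w_j >= 0 preserves positive definiteness."""
    return sum(wj * Hj for wj, Hj in zip(w, H_list))

def exponential_combination(H_list, w):
    """H_t = Expm(w_1 Logm(H_{1,t}) + ... + w_n Logm(H_{n,t})); no sign constraint on w_j."""
    M = sum(wj * logm(Hj) for wj, Hj in zip(w, H_list))
    return expm(M).real  # p.d. by construction, for any real weights

def sqrt_combination(H_list, w):
    """S_t = w_1 S_{1,t} + ... + w_n S_{n,t} with H_{j,t} = S_{j,t} S_{j,t}'; returns H_t = S_t S_t'."""
    S = sum(wj * sqrtm(Hj).real for wj, Hj in zip(w, H_list))
    return S @ S.T
```

Here `sqrtm` picks the symmetric p.d. square root, which is only one of the admissible choices of S_{j,t}; any S_{j,t} with S_{j,t} S_{j,t}' = H_{j,t} fits the definition in the text.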
3 Weights estimators
For the estimation of the combination parameters we consider three different estimation approaches: Composite Quasi ML (CQML), Composite GMM (CGMM) and 'Pooled' Mincer-Zarnowitz (MZ) regressions. All the estimators considered share the following features: i) they do not imply any assumption on the conditional distribution of returns; ii) they can be applied to large dimensional problems. In the CQML method the estimated w_i are obtained by performing the following optimization:
ŵ = argmax_w Σ_{i≠j} L(r^{(ij)} | w, I_N),

where r^{(ij)}_t = (r_{i,t}, r_{j,t})', w = (w_1, …, w_n)' and

L(r^{(ij)} | w, I_N) = −0.5 Σ_{h=1}^{N} log(|H^{(ij)}_{T+h}|) − 0.5 Σ_{h=1}^{N} (r^{(ij)}_{T+h})' (H^{(ij)}_{T+h})^{-1} r^{(ij)}_{T+h}

is the (bivariate) quasi log-likelihood for the pair of assets (i, j) computed over the prediction period [T+1, T+N].
The CGMM estimator extends the same framework to a GMM setting. The w_i are obtained by performing the following optimization:
ŵ = argmin_w Σ_{i≠j} m(r^{(i,j)}; w)' Ω^{(i,j)}_N m(r^{(i,j)}; w),

where r^{(i,j)}_t = (r_{i,t}, r_{j,t})' for t = T+1, …, T+N,

m(r^{(i,j)}; w) = (1/N) Σ_{t=T+1}^{T+N} μ(r^{(i,j)}_t; w),

μ(r^{(i,j)}_t; w) is a (p × 1) vector of moment conditions and Ω^{(i,j)}_N is a consistent p.d. estimator of

Ω^{(i,j)} = lim_{N→∞} N E(m(r^{(i,j)}; w*) m(r^{(i,j)}; w*)'),

with w* being the solution to the moment conditions, i.e. E(m(r^{(i,j)}; w*)) = 0.
Finally, in the 'Pooled' MZ regressions the w_i are the OLS estimates of the parameters of the pooled regression model
vech(Σ̃_{T+h}) = w_1 vech(H̃_{1,T+h}) + … + w_n vech(H̃_{n,T+h}) + e_{T+h}

for h = 1, …, N, where, depending on the type of combination chosen, Σ̃_t and H̃_{i,t} are appropriate transformations of Σ_t and H_{i,t}, with Σ_t = r_t r_t'.
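To make the pooled MZ estimator concrete, the following is a minimal sketch of ours (not the authors' code), assuming a linear combination so that Σ̃_t = Σ_t and H̃_{i,t} = H_{i,t}: stack vech(r_t r_t') over the prediction period as the dependent variable, the vech of each candidate forecast as a regressor column, and run OLS.

```python
import numpy as np

def vech(M):
    """Half-vectorization: stack the lower-triangular part of a symmetric matrix."""
    i, j = np.tril_indices(M.shape[0])
    return M[i, j]

def pooled_mz_weights(Sigma_list, H_cand):
    """OLS weights of vech(Sigma_{T+h}) on vech(H_{i,T+h}), pooled over h = 1..N.

    Sigma_list: length-N list of (k x k) outer products r_t r_t'
    H_cand:     length-n list, each a length-N list of (k x k) candidate forecasts
    """
    y = np.concatenate([vech(S) for S in Sigma_list])
    X = np.column_stack(
        [np.concatenate([vech(H) for H in Hi]) for Hi in H_cand]
    )
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```

In a noise-free check where Σ_t is an exact linear combination of the candidate forecasts, OLS recovers the combination weights exactly.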
4 An application to stock returns
We consider an application to the optimization of a portfolio of stocks using data from Chiriac and Voev (2011). The data refer to 2156 open-to-close daily returns on 6 NYSE stocks from 3/1/2000 to 30/7/2008. Six different candidate models and five combination strategies are considered. For each of these we compute the associated minimum variance portfolio and compare the empirical volatilities of the optimized portfolios (Table 1). The CGMM gives the lowest variance, but the results appear to be very sensitive to the choice of the model or combination strategy used.
Model       Portfolio Variance*    Comb. strategy   Portfolio Variance*
DCC         2.33188                REG(rv)          2.08441
CC          2.37658                REG              2.08733
ES          2.33857                CGMM             2.07337
MCOV(22)    2.67185                CQML             2.10192
MCOV(100)   2.10778                EW               2.08147
VECH        2.09339

Table 1 Realized portfolio variances ((*): ×10^4) for different models: constant conditional correlation (CC), dynamic conditional correlation (DCC), exponential smoothing (ES), k-days moving covariance (MCOV(k)); and weights estimators: CGMM, CQML, equally weighted (EW), MZ regression (REG), MZ regression using realized covariance as dependent variable (REG(rv)). In all cases a linear combination function is used.
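The minimum variance portfolio used in the comparison solves min_u u' H u subject to u'1 = 1, whose well-known closed form is u = H^{-1}1 / (1' H^{-1} 1). A sketch of ours (not the authors' code; u denotes portfolio weights, to avoid confusion with the combination weights w):

```python
import numpy as np

def min_variance_weights(H):
    """Global minimum variance portfolio weights for a covariance forecast H."""
    ones = np.ones(H.shape[0])
    x = np.linalg.solve(H, ones)   # H^{-1} 1 without forming the inverse
    return x / (ones @ x)          # normalize so the weights sum to one

def realized_portfolio_variance(u, Sigma):
    """Ex-post variance u' Sigma u of the optimized portfolio."""
    return float(u @ Sigma @ u)
```

For a diagonal H the resulting weights are proportional to the inverse variances of the individual assets.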
References
1. Amendola, A., Storti, G.: A GMM procedure for combining volatility forecasts. Computational Statistics & Data Analysis, 52(6), 3047–3060 (2008)
2. Patton, A., Sheppard, K.: Evaluating volatility and correlation forecasts. Oxford Financial Research Centre, OFRC Working Papers Series (2008)
3. Laurent, S., Rombouts, J.V.K., Violante, F.: On the forecasting accuracy of multivariate GARCH models. Forthcoming in Journal of Applied Econometrics (2011)
4. Chiriac, R., Voev, V.: Modelling and forecasting multivariate realized volatility. Journal of Applied Econometrics, 26(6), 922–947 (2011)
5. Pesaran, M.H., Schleicher, C., Zaffaroni, P.: Model averaging in risk management with an application to futures markets. Journal of Empirical Finance, 16(2), 280–305 (2009)