Genetic heterogeneity of asthma phenotypes identified by a ... · 2013-12-05 · Respiratory and...
Online Supplementary Material
Genetic heterogeneity of asthma phenotypes identified by a clustering approach
Valérie Siroux1,2*, Juan R González3,4,5*, Emmanuelle Bouzigon6,7, Ivan Curjuric8,9, Anne
Boudier1,2, Medea Imboden8,9, Josep Maria Anto3,5,10,11, Ivo Gut12,13, Deborah Jarvis14,
Mark Lathrop7,12, Ernst Reidar Omenaas15,16, Isabelle Pin1,2,17, Mathias Wjst18,19, Florence
Demenais6,7, Nicole Probst-Hensch8,9, Manolis Kogevinas3,5,11,20, Francine Kauffmann21,22
* Equal first authors
1 Team of Environmental Epidemiology applied to Reproduction and Respiratory Health,
Inserm, U823, Grenoble, France
2 Univ Joseph Fourier, Grenoble, France
3 Centre for Research in Environmental Epidemiology (CREAL), Barcelona;
4 Department of Mathematics, Autonomous University of Barcelona;
5 CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
6 Inserm, UMRS-946, F-75010 Paris, France
7 Univ Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d’Hématologie, F-75007,
Paris, France
8 Swiss Tropical and Public Health Institute, Basel, Switzerland
9 University of Basel, Basel, Switzerland
10 Universitat Pompeu Fabra. Departament de Ciències Experimentals i de la Salut,
Barcelona;
11 IMIM (Hospital del Mar Medical Research Institute), Barcelona
12 CEA-Centre National de Genotypage, 2 rue Gaston Cremieux, 91057 Evry France
13 Centro Nacional de Analisis Genomico, C/Baldiri Reixac 4, 08028 Barcelona, Spain.
14 Respiratory Epidemiology and Public Health, Imperial College, and MRC-HPA Centre for
Environment and Health, London, United Kingdom
15 Bergen Respiratory Research Group, Institute of Medicine, University of Bergen, Bergen,
Norway
16 Center for Clinical Research, Haukeland University Hospital, Bergen, Norway
17 Department of Pediatrics, CHU Grenoble, France
18 Comprehensive Pneumology Center (CPC), Helmholtz Zentrum Muenchen, German
Research Center for Environmental Health (GmbH), Neuherberg, Germany
19 Institute of Medical Statistics and Epidemiology, Klinikum rechts der Isar der TU
Muenchen, Munchen, Germany
20 National School of Public Health, Athens, Greece
21 Inserm, U1018, CESP Centre for research in Epidemiology and Population Health,
Respiratory and environmental epidemiology Team, Villejuif, France
22 Université Paris Sud, UMRS 1018, Villejuif, France
Methods
Cluster analysis
The latent class analysis (LCA) model included the following variables: personal characteristics (age and sex); age at asthma onset; respiratory symptoms over the past 12 months (woken up by an attack of coughing; an asthma symptom score, as previously proposed [1], combining five symptoms: wheezing with breathlessness, woken up with a feeling of chest tightness, attack of shortness of breath at rest, attack of shortness of breath after exercise, and woken by an attack of shortness of breath; chronic cough or phlegm; asthma attacks); asthma exacerbation (either hospitalisation or use of oral steroids in the past 12 months); allergic characteristics (rhinitis; eczema; atopy, defined by at least one positive skin prick test to 11 aeroallergens in EGEA, by specific IgE to cat, Dermatophagoides pteronyssinus, Cladosporium and timothy grass in ECRHS, and by specific IgE to cat, Dermatophagoides pteronyssinus and timothy grass in SAPALDIA; total IgE); lung function (FEV1 % predicted); and bronchial hyperresponsiveness to methacholine (PD20 < 1 mg). Daily use of asthma medication was not included in the present LCA because it was unavailable in SAPALDIA2 and was strongly related to several variables already in the model (presence or absence of an asthma attack in the past 12 months, asthma symptoms, asthma exacerbations).
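The asthma symptom score can be illustrated as a count of positive answers to the five symptom questions, collapsed into the three levels reported in Table E1. This is a minimal sketch assuming, as in [1], that the score is a simple count; the field names are hypothetical, and whether the LCA used the three-level version or the full 0-5 score is not stated here.

```python
# Hypothetical field names for the five symptom questions (past 12 months);
# the cohorts' actual variable names are not given in this supplement.
SYMPTOM_ITEMS = (
    "wheeze_and_breathless",
    "woken_chest_tightness",
    "sob_at_rest",
    "sob_after_exercise",
    "woken_by_sob",
)


def asthma_symptom_score(answers: dict[str, bool]) -> int:
    """Number of positive answers among the five items (0 to 5).

    Raises KeyError if an item is missing: the handling of missing answers
    is not described here, so the sketch does not guess.
    """
    return sum(1 for item in SYMPTOM_ITEMS if answers[item])


def score_category(score: int) -> str:
    """Collapse the 0-5 score into the three levels shown in Table E1."""
    if not 0 <= score <= len(SYMPTOM_ITEMS):
        raise ValueError(f"score out of range: {score}")
    if score == 0:
        return "0 symptoms"
    if score <= 2:
        return "1 or 2 symptoms"
    return ">=3 symptoms"


if __name__ == "__main__":
    answers = dict.fromkeys(SYMPTOM_ITEMS, False)
    answers["sob_after_exercise"] = True
    answers["woken_by_sob"] = True
    score = asthma_symptom_score(answers)
    print(score, score_category(score))  # 2 1 or 2 symptoms
```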
To assess the extent to which the absence of an asthma treatment variable may have affected the classification, the 14-variable model was compared, in ECRHSII and EGEA2, with a 15-variable model that also included a 3-class asthma treatment variable (no treatment, treatment other than daily inhaled corticosteroids (ICS), daily ICS), as previously used [2]; overall, 87.0% of the subjects were assigned to the same cluster by the two models.
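Because LCA class labels are arbitrary, comparing two classifications of the same subjects first requires matching the classes of one solution to those of the other. A minimal sketch, assuming both solutions have the same number of classes coded 0 to k-1 and that classes are matched so as to maximise agreement; how the classes were actually aligned is not stated here, and `cluster_agreement` is a hypothetical name.

```python
from itertools import permutations
from typing import Sequence


def cluster_agreement(labels_a: Sequence[int], labels_b: Sequence[int], k: int) -> float:
    """Proportion of subjects placed in the same class by two k-class solutions.

    Labels are integers 0..k-1. Every relabelling of the second solution is
    tried (k! permutations, 24 for k = 4) and the best match is kept.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label vectors of equal length")
    # counts[i][j]: subjects in class i of solution A and class j of solution B
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        counts[a][b] += 1
    best = max(
        sum(counts[i][perm[i]] for i in range(k))
        for perm in permutations(range(k))
    )
    return best / len(labels_a)


if __name__ == "__main__":
    # Toy labels: solution B numbers the classes differently and disagrees once.
    a = [0, 0, 1, 1, 2, 2, 3, 3]
    b = [2, 2, 0, 0, 3, 1, 1, 1]
    print(cluster_agreement(a, b, k=4))  # 0.875
```

The brute-force search is adequate for four classes; with many classes, the Hungarian algorithm (for example SciPy's `linear_sum_assignment` with `maximize=True` on the count matrix) finds the same optimal matching without enumerating permutations.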
Genotypic data
The subjects were genotyped within the framework of the European GABRIEL consortium. Genotyping was carried out using the Illumina Human610-Quad array at the French national genotyping centre (CNG). Stringent quality control criteria were applied [3]. Family relationships were confirmed or revised based on the results of an identity-by-state (IBS) analysis. An ancestry analysis was carried out and putative non-European samples were excluded from subsequent analyses. The genetic association analysis was restricted to a reliable set of SNPs fulfilling the following quality control criteria in each study: (1) genotype missing rate < 3%; (2) minor allele frequency ≥ 5% in controls; (3) consistency with Hardy-Weinberg equilibrium in controls, assessed by a 1-degree-of-freedom goodness-of-fit test (p > 10^-4).
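The three SNP filters can be sketched as a single predicate. This is an illustration under stated assumptions: the Hardy-Weinberg check is taken to be the usual Pearson goodness-of-fit test on control genotype counts (1 degree of freedom), the missing rate is supplied per SNP, and the function names are hypothetical rather than those of the software actually used.

```python
import math


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no test possible (it fails the MAF filter anyway)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(missing_rate: float, control_counts: tuple[int, int, int]) -> bool:
    """(1) missing rate < 3%, (2) MAF >= 5% in controls, (3) HWE p > 1e-4 in controls."""
    n_aa, n_ab, n_bb = control_counts
    n = n_aa + n_ab + n_bb
    if n == 0:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    maf = min(freq_a, 1.0 - freq_a)
    return missing_rate < 0.03 and maf >= 0.05 and hwe_p_value(*control_counts) > 1e-4


if __name__ == "__main__":
    print(hwe_p_value(25, 50, 25))  # 1.0: counts exactly at equilibrium
    print(passes_snp_qc(0.01, (50, 0, 50)))  # False: no heterozygotes, strong HWE deviation
```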
Results
Asthma phenotypes identified by latent class analysis
LCA models with 1 to 6 clusters were fitted and compared using the Bayesian information criterion (BIC). The BIC values for each model are presented in Figure E2. The decrease in BIC (in absolute value) between the 4-cluster and the 5-cluster models was very small compared with the changes observed between models with fewer clusters. To choose between the 4-cluster and the 5-cluster models, both were fitted and compared: 91% of the subjects assigned to phenotype A in the 4-class model were also assigned to phenotype A in the 5-class model; the corresponding figures for phenotypes B, C and D were 93%, 97% and 84%. The 5-class model identified a fifth class representing only 5% of the population. This additional phenotype E was mainly composed of older men with airflow limitation. The more parsimonious 4-class model was retained, first because the BIC values were very similar for the 4-class and 5-class models, indicating that little was gained by adding a further class, and second because the 5-class model identified a small class (5% of the study population) that was too rare to support a GWAS of this low-prevalence phenotype.
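The model-selection quantities above can be sketched as follows, with BIC = -2 log L + m log n, where L is the maximised likelihood, m the number of free parameters and n the number of subjects. The parameter count assumes every indicator is categorical, which may not match how each of the 14 variables was actually coded, and the retention function assumes the classes of the 4-class and 5-class models have already been given matching letters. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def lca_free_parameters(n_classes: int, n_categories: Sequence[int]) -> int:
    """(K - 1) class proportions + K * sum(r_j - 1) item-response probabilities."""
    return (n_classes - 1) + n_classes * sum(r - 1 for r in n_categories)


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def class_retention(labels_4: Sequence[str], labels_5: Sequence[str]) -> dict[str, float]:
    """Share of each 4-class phenotype kept under the same letter in the 5-class model."""
    if len(labels_4) != len(labels_5):
        raise ValueError("label vectors must have equal length")
    totals = Counter(labels_4)
    kept = Counter(a for a, b in zip(labels_4, labels_5) if a == b)
    return {c: kept[c] / totals[c] for c in sorted(totals)}


if __name__ == "__main__":
    # 14 hypothetical binary indicators: parameter counts for 4 vs 5 classes
    print(lca_free_parameters(4, [2] * 14), lca_free_parameters(5, [2] * 14))  # 59 74
    print(class_retention(["A", "A", "B", "B"], ["A", "E", "B", "B"]))  # {'A': 0.5, 'B': 1.0}
```

Each added class adds a fixed block of parameters, so the m log n penalty grows linearly with the number of classes; a small BIC decrease therefore signals that the likelihood gain barely exceeds that penalty.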
References
1. Sunyer J, Pekkanen J, Garcia-Esteban R, Svanes C, Kunzli N, Janson C, de Marco R, Anto JM,
Burney P. Asthma score: predictive ability and risk factors. Allergy 2007; 62: 142-148.
2. Siroux V, Basagana X, Boudier A, Pin I, Garcia-Aymerich J, Vesin A, Slama R, Jarvis D,
Anto J, Kauffmann F, Sunyer J. Identifying adult asthma phenotypes using a clustering
approach. Eur Respir J 2011; 38: 310-317.
3. Imboden M, Bouzigon E, Curjuric I, Ramasamy A, Kumar A, Hancock DB, Wilk JB,
Vonk JM, Thun GA, Siroux V, Nadif R, Monier F, Gonzalez JR, Wjst M, Heinrich J,
Loehr LR, Franceschini N, North KE, Altmuller J, Koppelman GH, Guerra S, Kronenberg
F, Lathrop M, Moffatt MF, O'Connor GT, Strachan DP, Postma DS, London SJ, Schindler
C, Kogevinas M, Kauffmann F, Jarvis DL, Demenais F, Probst-Hensch NM. Genome-wide
association study of lung function decline in adults with and without asthma. J Allergy Clin
Immunol 2012; 129: 1218-1228.
Figure legends
Figure E1: Quantile-quantile plots for A) phenotype A, B) phenotype B, C) phenotype C
and D) phenotype D.
Figure E2: Bayesian information criterion (BIC) for LCA models with 1 to 6 clusters
Figure E3: Regional association plots for each region identified in the GWAS analyses.
Chromosomal positions refer to NCBI build 36.3 and recombination rates to the hg18 build. The
sentinel SNP is represented as a diamond, and the r² of each other SNP with the sentinel SNP is
indicated (HapMap CEU phase II). A) ALCAM region with phenotype D, B) CD200 region with
phenotype D, C) GRIK2 region with phenotype A, D) region on chromosome 7 with phenotype A,
E) LRRC6 region with phenotype A, F) SBF2 region with phenotype A.
Figure E4: Forest plots for the most significant SNPs within each gene identified in the
GWAS. A) Phenotype D and rs9842772; B) Phenotype D and rs10511245; C) Phenotype D
and rs9851461; D) Phenotype A and rs2579931; E) Phenotype A and rs10230811; F)
Phenotype A and rs13272108; G) Phenotype A and rs7938647
Table E1: Description of the population with asthma in EGEA2, ECRHSII and SAPALDIA2

| | ECRHS II (n=1895) | EGEA2 (n=641) | SAPALDIA2 (n=465) | P value adjusted for age and sex |
|---|---|---|---|---|
| Age ≥ 40, % | 59.9 | 44.0 | 79.3 | <0.0001 |
| Sex, men, % | 41.2 | 52.6 | 45.6 | <0.0001 |
| Age of asthma onset | | | | <0.0001 |
| ≤4 years, % | 15.8 | 31.4 | 15.1 | |
| >4 to ≤16 years, % | 30.5 | 34.9 | 25.9 | |
| >16 years, % | 53.7 | 33.7 | 59.0 | |
| Woken by cough, last 12 months, % | 44.7 | 38.0 | 42.1 | 0.16 |
| Asthma symptom score, last 12 months | | | | <0.0001 |
| 0 symptoms, % | 24.2 | 21.7 | 40.2 | |
| 1 or 2 symptoms, % | 40.2 | 43.8 | 38.4 | |
| ≥3 symptoms, % | 35.6 | 34.5 | 21.4 | |
| Chronic cough or phlegm, % | 22.5 | 14.6 | 18.5 | 0.0001 |
| Asthma attack, last 12 months, % | 43.5 | 36.0 | 29.5 | <0.0001 |
| Exacerbation, last 12 months, % | 10.2 | 14.8 | 5.8 | <0.0001 |
| Eczema, % | 57.2 | 48.7 | 54.1 | 0.003 |
| Rhinitis, % | 65.1 | 72.8 | 50.6 | <0.0001 |
| Atopy*, % | 63.7 | 79.7 | 53.1 | <0.0001 |
| Total IgE ≥100 IU/ml, % | 45.6 | 61.4 | 39.1 | <0.0001 |
| FEV1 < 80% predicted, % | 13.0 | 15.0 | 15.9 | 0.15 |
| BHR, PD20 ≤1 mg, % | 48.5 | 47.6 | 26.3 | <0.0001 |

*Assessed with skin prick tests or specific IgE.
Table E2: Description of smoking status and asthma treatment in the past 3 months for each LCA-derived asthma phenotype

| | Phenotype A, % (n) | Phenotype B, % (n) | Phenotype C, % (n) | Phenotype D, % (n) |
|---|---|---|---|---|
| Smoking status* | | | | |
| Never smokers | 40.8 (225) | 50.3 (550) | 48.4 (388) | 44.9 (240) |
| Past smokers | 33.9 (187) | 25.3 (276) | 29.4 (236) | 29.3 (157) |
| Current smokers | 25.4 (140) | 24.4 (267) | 22.2 (178) | 25.8 (138) |
| Asthma treatment in the past 3 months** (ECRHSII and EGEA2 only) | | | | |
| No treatment | 67.4 (257) | 62.5 (524) | 13.6 (80) | 23.1 (84) |
| Other than daily ICS | 22.6 (86) | 25.1 (211) | 60.8 (358) | 43.8 (159) |
| Daily ICS | 10.0 (38) | 12.4 (104) | 25.6 (151) | 33.1 (120) |

*Chi-square p = 0.003
**Chi-square p < 0.0001
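The footnoted p-values of Table E2 can be checked from the counts in the table. A sketch, assuming Pearson chi-square tests of independence (the table only says "chi2"); the upper-tail probability uses the closed form available for even degrees of freedom, so no statistics library is needed. On the smoking counts it gives a statistic of about 20.1 on 6 degrees of freedom (p ≈ 0.003), in line with the footnote.

```python
import math

# Counts from Table E2 (rows: phenotypes A to D).
SMOKING = [  # never, past, current smokers
    [225, 187, 140],
    [550, 276, 267],
    [388, 236, 178],
    [240, 157, 138],
]
TREATMENT = [  # no treatment, other than daily ICS, daily ICS (ECRHSII and EGEA2)
    [257, 86, 38],
    [524, 211, 104],
    [80, 358, 151],
    [84, 159, 120],
]


def pearson_chi2(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for chi-square with even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("the closed form used here needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    for name, table in (("smoking", SMOKING), ("treatment", TREATMENT)):
        stat, df = pearson_chi2(table)
        print(name, round(stat, 1), df, chi2_upper_tail_even_df(stat, df))
```

Where SciPy is available, `scipy.stats.chi2_contingency` returns the same statistic and p-value for these 4 x 3 tables, since its Yates correction only applies when there is a single degree of freedom.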
Table E3: Fisher p-values for SNPs retained in the GWAS

| Chr | Gene | rs number | Position | Ref/alt allele* | Alt allele freq. | Phenotype A | Phenotype B | Phenotype C | Phenotype D |
|---|---|---|---|---|---|---|---|---|---|
| | | | | | | Fisher p-value (asthma phenotype vs control) | | | |
| 3 | ALCAM | rs9842772 | 106735892 | G/A | 0.86 | 0.64 | 0.08 | 0.40 | 8.8e-05 |
| 3 | ALCAM* | rs9288812 | 106783827 | A/G | 0.80 | 0.90 | 0.50 | 0.88 | 7.6e-05 |
| 3 | ALCAM* | rs10511245 | 106783905 | G/A | 0.80 | 0.92 | 0.45 | 0.89 | 6.8e-05 |
| 3 | CD200* | rs9851461 | 113593771 | C/T | 0.94 | 1.0 | 0.56 | 0.74 | 1.5e-04 |
| 6 | GRIK2 | rs2579931 | 101961354 | A/G | 0.90 | 1.3e-05 | 0.89 | 1.9 | 0.72 |
| 7 | LOC401410* | rs10264996 | 140731438 | A/G | 0.95 | 5.9e-06 | 0.94 | 0.81 | 0.37 |
| 7 | LOC401410* | rs10259042 | 140734967 | A/G | 0.95 | 5.1e-06 | 0.88 | 0.71 | 0.41 |
| 7 | LOC401410* | rs10230811 | 140735196 | T/C | 0.95 | 5.0e-06 | 0.88 | 0.76 | 0.41 |
| 7 | LOC401410* | rs17162196 | 140739352 | T/C | 0.95 | 6.1e-06 | 0.86 | 0.64 | 0.34 |
| 8 | LRRC6 | rs7834760 | 133714562 | C/T | 0.90 | 1.3e-04 | 0.09 | 0.02 | 0.45 |
| 8 | LRRC6 | rs13272108 | 133736255 | G/A | 0.92 | 2.3e-04 | 0.35 | 0.02 | 0.60 |
| 11 | SBF2 | rs4576815 | 9943804 | C/T | 0.71 | 4.8e-06 | 0.94 | 0.65 | 0.04 |
| 11 | SBF2 | rs7938647 | 10017999 | C/T | 0.72 | 5.7e-06 | 0.94 | 0.62 | 0.03 |

*Reference versus alternative allele
Table E4: Linkage disequilibrium assessed with r² between SNPs within the same gene identified in the GWAS

| Gene | SNP 1 | SNP 2 | r² |
|---|---|---|---|
| ALCAM | rs9842772 | rs9288812 | 0.39 |
| ALCAM | rs9842772 | rs10511245 | 0.39 |
| ALCAM | rs9288812 | rs10511245 | 1 |
| LOC401410 | rs10264996 | rs10259042 | 1 |
| LOC401410 | rs10264996 | rs10230811 | 1 |
| LOC401410 | rs10264996 | rs17162196 | 1 |
| LOC401410 | rs10259042 | rs10230811 | 1 |
| LOC401410 | rs10259042 | rs17162196 | 1 |
| LOC401410 | rs10230811 | rs17162196 | – |
| LRRC6 | rs7834760 | rs13272108 | 0.9 |
| SBF2 | rs4576815 | rs7938647 | 0.85 |
| SBF2 | rs4576815 | rs7938491 | 0.85 |
| SBF2 | rs7938647 | rs7938491 | 0.98 |
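For readers computing comparable LD values from their own genotype data, one common approximation is the squared Pearson correlation between allele dosages (0, 1 or 2 copies of the alternative allele). This is a hedged sketch: it is not necessarily how the r² values in Table E4 were obtained (they may come from phased haplotypes or a reference panel), and `dosage_r2` is a hypothetical name.

```python
from typing import Sequence


def dosage_r2(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Squared Pearson correlation between the allele dosages of two SNPs.

    Approximates haplotype-based r^2 from unphased genotypes.
    """
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return cov * cov / (var1 * var2)


if __name__ == "__main__":
    print(dosage_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # 0.0: dosages uncorrelated
```

Because r² squares the correlation, perfectly anti-correlated dosages (one allele always carried with the other SNP's reference allele) also give r² = 1, which is the sense in which Table E4 reports complete LD.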
Table E5: P-values of association from the GWAS analysis using LCA probabilities (linear model). These results are a sensitivity analysis for the p-values reported in Table 3 of the main manuscript.

| Chr | Gene | rs number | Phenotype A | Phenotype B | Phenotype C | Phenotype D |
|---|---|---|---|---|---|---|
| 3 | ALCAM | rs9842772 | 0.48 | 0.91 | 0.63 | 7.8e-7 |
| 3 | ALCAM | rs9288812 | 0.81 | 0.88 | 0.73 | 5.0e-6 |
| 3 | ALCAM | rs10511245 | 0.78 | 0.76 | 0.78 | 8.9e-7 |
| 3 | CD200 | rs9851461 | 0.70 | 0.18 | 0.62 | 5.8e-8 |
| 6 | GRIK2 | rs2579931 | 2.5e-8 | 0.96 | 0.40 | 0.60 |
| 7 | LOC401410 | rs10264996 | 2.3e-7 | 0.97 | 0.63 | 0.15 |
| 7 | LOC401410 | rs10259042 | 1.4e-7 | 0.91 | 0.45 | 0.24 |
| 7 | LOC401410 | rs10230811 | 1.5e-7 | 0.90 | 0.50 | 0.22 |
| 7 | LOC401410 | rs17162196 | 1.4e-7 | 0.88 | 0.53 | 0.19 |
| 8 | LRRC6 | rs7834760 | 8.2e-7 | 0.06 | 0.69 | 0.49 |
| 8 | LRRC6 | rs13272108 | 9.0e-7 | 0.16 | 0.13 | 0.93 |
| 11 | SBF2 | rs4576815 | 3.0e-7 | 0.60 | 0.84 | 0.19 |
| 11 | SBF2 | rs7938647 | 2.2e-7 | 0.61 | 0.74 | 0.17 |
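The linear-model sensitivity analysis in Table E5 relates, for each SNP, the probability of belonging to a phenotype to the genotype. A minimal single-SNP sketch, assuming an ordinary least-squares regression of the posterior class probability on allele dosage with no covariates, a single pooled sample, and a normal approximation to the slope's t-test (reasonable for GWAS-sized samples). The adjustment variables and the way studies were combined in the actual analysis are not described in this supplement, and `linear_association` is a hypothetical name.

```python
import math
from typing import Sequence


def linear_association(dosage: Sequence[float], prob: Sequence[float]) -> tuple[float, float, float]:
    """OLS of class-membership probability on allele dosage (0/1/2).

    Returns (slope, standard error, two-sided p-value); the p-value uses a
    normal approximation to the t distribution of slope / se.
    """
    n = len(dosage)
    if n != len(prob) or n < 3:
        raise ValueError("need equal-length vectors with at least 3 subjects")
    mx, my = sum(dosage) / n, sum(prob) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("dosage is constant (monomorphic SNP)")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, prob))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, prob))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit: standard error is zero")
    z = slope / se
    return slope, se, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Toy data: probability rises by 0.3 per copy of the alternative allele, plus noise.
    xs = [0, 0, 1, 1, 2, 2] * 10
    ys = [0.1 + 0.3 * x + (0.05 if i % 2 == 0 else -0.05) for i, x in enumerate(xs)]
    print(linear_association(xs, ys))
```

Using the continuous membership probability rather than the hard class assignment retains the uncertainty of the LCA classification, which is why this analysis serves as a sensitivity check on the main results.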