WES INO80D CircG SM R3 - Circulation: Genomic and...

1

SUPPLEMENTAL MATERIAL

2

Library preparation, whole exome capture and sequencing

Paired-‐end indexed libraries were prepared following the manufacturer’s (Agilent

Technologies, Santa Clara, CA) protocol. Briefly, target DNA (3ηg in 120 ul TE buffer) was

fragmented using a Covaris E210 sonicator (Covaris Inc, Woburn, Mass.,) using a duty cycle of 10%,

intensity 5, cycles 200, time 360 seconds, resulting in double-‐stranded DNA fragments with blunt or

sticky ends with a fragment size between 150-‐200 bp. The ends were repaired and phosphorylated

using Klenow, T4 polymerase, and T4 polynucleotide kinase, after which “A” base is added to the 3’

ends of double-‐stranded DNA using Klenow exo-‐ (3’ to 5’ exo minus). Paired end Index DNA

adaptors (Agilent Technologies, Santa Clara, CA) with a single “T” base overhang at the 3’ end were

ligated and resulting constructs were purified using AMPure SPRI beads from Agencourt. The

adapter-‐modified DNA fragments were enriched by 4 cycles of PCR using InPE 1.0 forward and

SureSelect Pre-‐Capture Indexing reverse (Agilent Technologies, Santa Clara, Ca) primers. The

concentration and size distribution of the libraries were determined on an Agilent Bioanalyzer DNA

1000 chip (Agilent Technologies, Santa Clara, Ca).

Exome capture was carried out using the protocol for Agilent’s SureSelect Human All Exon

50MB kit (Agilent Technologies, Santa Clara, Ca). This kit encompasses coding exons annotated by

the GENCODE project (www.sanger.ac.uk/gencode/) as well as consensus coding sequence (CCDS,

www.ncbi.nlm.nih.gov/CCDS/) and RefSeq (www.ncbi.nlm.nih.gov/refseq/) databases and

incorporates exomic regions and non-‐coding RNAs from miRBase (v.13) and Rfam databases to

provide a capture size of approximately 50 Mb. 500 ng of the prepped library was incubated for 24

hours at 65 °C with whole exon biotinylated RNA capture baits supplied in the kit. The captured

DNA:RNA hybrids were recovered using Dynabeads MyOne Streptavidin T1 from Dynal (Invitrogen,

Carlsbad, CA). DNA was eluted from the beads and purified using Ampure XP beads from Agencourt

(Beckman Coulter, Brea, CA). The purified capture products were then amplified using the

SureSelect Post-‐Capture Indexing forward and Index PCR reverse primers (Agilent) for 12 cycles.

Libraries were validated and quantified on the Agilent Bioanalyzer (Agilent Technologies, Santa

Clara, Ca).

For individuals L6-‐L9, libraries were loaded onto paired end flow cells at concentrations of

4-‐5 pM to generate cluster densities of 300,000-‐500,000/mm2 following Illumina’s standard

protocol using the Illumina cBot and HiSeq Paired end cluster kit version 1 (Illumina, San Diego,

CA). The flow cells were sequenced as 101 X 2 paired end reads on an Illumina HiSeq 2000 using

3

TruSeq SBS sequencing kit version 1 and HiSeq data collection version 1.1.37.0 software. Base-‐

calling was performed using Illumina’s RTA version 1.7.45.0.

For individual L10, libraries were loaded onto paired end flow cells at concentrations of 7.5

pM to generate cluster densities of 500,000-‐600,000/mm2 following Illumina’s standard protocol

using the Illumina cBot and HiSeq Paired end cluster kit version 3. The flow cells were sequenced as

101 X 2 paired end reads on an Illumina HiSeq 2000 using TruSeq SBS sequencing kit version 3 and

HiSeq data collection version 1.4.8 software. Base calling was performed using Illumina’s RTA

version 1.12.4.2.

Genotype calling and variant filtration

The technical challenge of the modern era of genomic medicine and personalized exome

analytics is in the effective use of combination of tools to find software-‐agnostic, highly concordant,

high-‐quality genetic variants underlying complex, familial diseases[1]. To address this challenge, we

used two computational genomic data analysis pipelines and two complementary variant filtering

methods. Both pipelines included analyses modules for quality control, sequence alignment (two

different aligners: BWA and Novoaligner), base quality score recalibration, and variant calling and

complimentary variant filtering methods.

Illumina fastq files were converted to Sanger fastq files using the MAQ software

(http://maq.sourceforge.net/). We used the FASTX-‐toolkit

(http://hannonlab.cshl.edu/fastx_toolkit/) for preprocessing short-‐read fastq files. The

preprocessing steps included clipping sequencing primers/adapter sequences, trimming sequences

based on the quality scores, and filtering artifacts and low quality sequences. We used FastQC

(http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) to perform QC on raw and QC filtered

sequence data. We aligned sequence-‐reads with the reference genome GRCh37/hg19 from the

1000 Genomes project, using the Burrows-‐Wheeler Aligner (BWA) software. We used a Perl script

(cmpfastq.pl [http://compbio.brc.iop.kcl.ac.uk/software/cmpfastq.php]) to identify paired and un-‐

paired reads. BWA software was then used to align single-‐end and paired-‐end data separately. The

generated SAM files were merged using PICARD (http://picard.sourceforge.net/index.shtml) to

generate sorted BAM files. The BAM files were indexed using SAMtools

(http://samtools.sourceforge.net/).

We used the Genome Analysis Toolkit[2] (GATK v1.0.5777,

http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit) for post-‐

alignment processing of BAM files, including local realignment around insertions or deletions

4

(indels), removal of duplicates, and base quality score recalibration. Due to alignment artifacts or

false positive SNPs, sequence aligners are unable to perfect map reads containing indels. Multiple

steps were employed in the realignment process, i.e., determining (small) suspicious intervals

which are likely in need of realignment, running the realigner over those intervals, and fixing the

mate pairs of realigned reads. The realigned, fixed, and sorted BAM files were generated for each

sample. Duplicate reads were located and removed using PICARD tools (MarkDuplicates). Finally,

we corrected for variation in quality with machine cycle and sequence context by analyzing the co-‐

variation among several features of the base (i.e., reported quality score, the position within the

read, the preceding and current nucleotide observed by the sequencing machine, probability of

mismatching the reference genome, and known SNPs taken into account). The recalibrated quality

scores are more accurate.[2]

Analysis of depth of coverage in the final BAM files indicated that approximately 80% of the

exomic regions were present >8 times in the five patients (-‐-‐minMappingQuality=10 and -‐-‐

minBaseQuality=20). We then used GATK ‘UnifiedGenotyper’ for multiple-‐sample calling to

generate raw variants. GATK applies a Bayesian algorithm for variant discovery and genotyping

that simultaneously estimates the probability that two alleles A (the reference allele), and B (the

alternative allele), are segregating in a sample of N individuals and the likelihoods for each of the

AA, AB and BB genotypes for each of individual samples.[2] If the genotype for this individual could

not be assigned based on the genotype likelihood model, an unknown genotype ‘N’ was assigned.

To generate analysis-‐ready variants, the GATK ‘VariantFiltration’ was used to annotate

suspicious calls from variant calling format (VCF) files based on their failing given filters. Raw SNP

calls were filtered using empirically derived cut-‐offs for the following GATK filter expressions: –

filterExpression “QUAL<30.0 || QD<5.0 || HRun>5 || SB>−0.10 || SNPcluster || InDel” –filterName

“StandardFilters” –filterExpression “DP<8” –filterName “LOW_DEPTH” -‐filterExpression “MQ0 >= 4

&& ((MQ0 / (1.0 * DP)) > 0.1)” –filterName “Hard2Validate”, where DP–sequencing depth at the

SNP position; QD–QUAL/DP ratio at the SNP position; HRun–maximal length of the homopolymer

run; SB–strand bias at the SNP position; SNPcluster– 3 SNPs with 10 bp of each other; InDel–SNP

calls around the raw InDels calls; and MQ0–the number of mapping-‐quality zero reads at the

position. The resulting VCF file was annotated using SeattleSeq Annotation Server and filtered

based on variant quality and localization of variants overlapping the ROH regions.

In the confirmatory pipeline, the fastq files were processed using GenomeGPS pipeline v2.0.

Briefly, the Illumina paired end reads were aligned to the hg19 reference genome using Novoalign

5

(http://novocraft.com) followed by the sorting and marking of duplicate reads using Picard. Local

realignment of INDELs and base quality score recalibration were then performed using the Genome

Analysis Toolkit (GATK). Single nucleotide variants (SNVs) and insertions/deletions (INDELs) were

called across all of the samples simultaneously using GATK's UnifiedGenotyper with variant quality

score recalibration. Variants were annotated using a custom annotation workflow and filtered

using VAAST v1.04 and knowledge-‐based gene lists relevant to the phenotype observed.

Genotype calling and variant filtration

Results from the primary pipeline based were filtered using annotation database following

strict criteria and localization of variants in ROH regions shared by the family (Table S2).

Results from the confirmatory pipeline were filtered using two different methods. The first

variant filtering method focused on annotation driven filtering followed by presence of variants

based on localization of a variant on ROH region shared by the pedigree. The second filtering

method followed a knowledge-‐based approach to look for rare variants in the known genes

associated with leading phenotypes followed by a probabilistic disease variant identification using

VAAST. The analysis was carried out to identify high-‐confidence variants in the affected individuals

using a pipeline that use a different short-‐read aligner. Given the fact that two siblings were

affected, we ruled out the possibility that the mutation arose from a de novo mutation as such a

possibility is extremely low in likelihood.. We also ruled out the possibility of uniparental disomy

(UPD) or a single copy deletion occurring in both siblings, as this is also extremely unlikely.

(i) Method 1: Single-‐ homozygous alternative mutation at a position where both parents are

heterozygous

Variants were processed using VAAST v1.04 (3) configured to fit a recessive mode of

inheritance in a trio-‐mode assessment. Variant calls from the two affected siblings were intersected

and input as the “proband”. The resulting statistically significant candidate variants were filtered to

exclude any findings positive in the unaffected sibling. The program was configured with the

following options:

m lrt -‐o output_trio_pnt_i_02_21_2013 -‐pnt c -‐-‐mp1 8 -‐-‐less_ram -‐-‐fast_gp -‐-‐gp 1e10 -‐-‐

significance 2.4e-‐6 -‐-‐codon_bias -‐iht r -‐-‐locus_heterogeneity n –trio

The VAAST program was configured to run in a recessive mode with no locus heterogeneity. Given

the limitations of the current version of the software, we ran it with the affected siblings’ genotype

6

data intersected and compared against the parents. A post-‐analysis filter was then applied to

remove potential candidates that were found in the unaffected daughter.

(ii) Method 2: Compound heterozygous variant where each parent carries only one variant

Variant calls were filtered for positions that were homozygous alternative in the affected

siblings, heterozygous in the parents, and either heterozygous or homozygous reference in the

unaffected sibling. Results were then filtered to select non-‐synonymous mutations with an MAF <

0.1 in the ESP6500 datasets and all HapMap and 1k Genome population datasets. Results from the

two methods were combined and manually reviewed for relevance to the disease state using

GLAD4U (http://bioinfo.vanderbilt.edu/glad4u/). Two gene lists were created, one with eight

targets using the specific term “aortic hypoplasia-‐atherosclerosis syndrome” and a second list that

generated 196 targets using the generic term “atherosclerosis”.

Initial results were processed to find Method (i) variants. Results were filtered to restrict

the unaffected sibling to being either heterozygous or homozygous reference. Additionally, variants

were be filtered for non-‐synonymous changes. Variant list were initially filtered on the basis of

MAF (< 0.1) are as follows: 595 variants (both affected siblings were homozygous alternative and

both parents were heterozygous). Of these, there were 387 variants (unaffected sister was either

homozygous reference or heterozygous). Of these, 68 variants encoded non-‐synonymous changes.

Further results from variant filtering using MAF were compared against the candidate gene lists

and a common variant in APOB (chr2:21250914 G>A; A618V) was observed. 4/68 variants that

had population frequencies < 0.1 in all population datasets queried. This includes two deleterious

and damaging variants in INO80D (Chr2:206869724 T>A) and NPIPA5 (Chr16: 15463612 G>A),

and two tolerated and benign variants in SLC35B1 (Chr17:47783663 C>T) and SALL3

(Chr18:76754549 T>C).

Sanger sequencing

To validate the presence of the mutation we used Sanger sequencing. The FASTA sequence

of two genes of interest (that satisfied strict filtering of discovery and validation pipelines, VAAST

analyses and ROH filtering results) were obtained using NCBI nucleotide search to design the

primers http://www.ncbi.nlm.nih.gov/nuccore. This FASTA sequence was used as a query to

search in NCBI Primer BLAST http://www.ncbi.nlm.nih.gov/tools/primer-‐

blast/index.cgi?LINK_LOC=BlastHome. Primers pairs were selected based on primer length (18-‐30

7

bp), GC content, theoretical melting temperature (Tm = 59-‐60°C), and product size. BLAST was used

to check the specificity of primers. PAGE-‐purified oligos (Integrated DNA Technologies, IA, USA)

were used for real-‐time PCR. Primers used for resequencing mutation sites in INO80D and

TMPRS11E is provided (Table S4). Sanger sequencing for loci mapping was performed using Big

Dye terminator chemistries on ABI 3730xl (Life technologies; Carlsbad, CA) sequencer.

Sequence, structure and functional annotation of INO8OD

Conserved domains and motifs in INO80D:

We performed a comparative sequence analysis of wild-‐type isoforms and derived sequence

with Ser818Cys mutation. Two wild-‐type isoform sequences and mutated sequences were used to

assess the secondary structure, solvent accessibility and distribution of LCRs. Secondary structure

and solvent accessibility were predicted from the sequence using SABBLE [3], and LCRs were

characterized using SEG program integrated in SMART database[4].

We also characterized conserved functional domains in INO80D using sequence based

protein domain searches: protein sequences (Q53TQ3-‐1 and Q53TQ3-‐2) were scanned against

Pfam database. INO80D encodes two copies of zf-‐C3Hc3H domains in both isoforms. Seven low-‐

complexity regions (LCRs) were found in the sequence of Q53TQ3-‐1 and nine different segments of

(LCR) were found on Q53TQ3-‐2 using SEG [5] program integrated in SMART[4, 6]. LCRs are tandem

sequence repeats in the protein universe and were often excluded in the past prior to detailed

sequence analysis. For example, sequence search algorithms like BLAST mask off sequence with

low compositional complexity. Recent studies on LCR function suggest that protein sequences with

LCRs have several important functional roles. LCRs are common in protein sequence space and

observed in diverse proteins. Proteins with LCRs have the higher number of first-‐degree interaction

partners when compared to proteins without LCRs (Wilcoxon-‐Mann-‐Whitney test; P<0.05). LCRs in

proteins were preferentially positioned in sequence extremities and relative position of functions

could have an impact on the function (Kolmogorov-‐Smirnov test; P = 7.6 x 10-‐6). Gene Ontology

based analysis indicated that LCRs encoded in different regions of proteins may also mediate

different biological roles. From a structural perspective, LCRs do not adopt a definite 3D structure

but could exist as a solvent exposed disordered coils [5, 7]. Scanning the INO80D protein sequence

using Pfam database indicated that presence of two copies of zf-‐C3Hc3H domains in the protein.

The E-‐values of the associations were 4.3e-‐12 and 1e-‐16 respectively. According to Pfam

8

annotations, zf-‐C3Hc3H is considered as a potential DNA binding domain and may found in

chromatin remodeling proteins and helicases (http://pfam.sanger.ac.uk/family/zf-‐C3Hc3H).

Protein structure modeling and fold prediction of INO80D:

For structural characterization of INO80D, we performed a sequence-‐structure template

search using ModBase and ModWeb. No single structural homologs were found with sequence

similarity above the twilight zone (>30%)[8]. A remote homolog (16%) of INO80D sequence was

found in PDB identifier 2VZ9, A chain (structure of mammalian fatty acid synthase) [9]. This could

be a further pointer that IN080D may encode a novel fold or a fold similar to fatty acid synthase

(Figure: S1(a)). As 2VZ9 is not yet incorporated in the database of Structural Classification of

Proteins (SCOP) [10], hence an objective fold recognition approach was not possible to detect

additional structural relationships.

Interactome of INO80D:

To understand the functional context of INO80D from a network perspective, first-‐degree

interactome of INO80D was obtained from IntAct, a database of experimentally characterized

protein-‐protein interactions and visualized using Cytoscape (Figure 3(d)). 20 interactions

originated from spoke-‐expanded co-‐complexes were reported in IntAct. This indicates that INO80D

gene product is involved in multiple protein-‐protein interactions, a hallmark feature of proteins

with LCRs in addition to an important subunit of the INO80 complex.

Phylogenetic analysis of IN080D:

INO80D is a component of human INO80 complex, which has multiple functions including

chromatin remodeling. INO80D is a non-‐conserved subunit in human, yeast and drosophila. Exact

evolutionary lineage of INO80D is unclear. To understand functional role from homologs of

INO80D, we performed a detailed phylogenetic analysis. Sequence of the longest isoform was used

for homology search using PSI-‐BLAST, and a phylogenetic tree was constructed using Phylip v3.6

(http://evolution.genetics.washington.edu/phylip.html)and visualized using iTOL [11]. PSI-‐BLAST

search [12] (E-‐value: 0.05) was performed against non-‐redundant database (nr) with sequences

from GenBank CDS translations, PDB, SwissProt, PIR and PRF. From first iteration, 146 sequences

were obtained. 146 sequences were aligned using Clustal-‐Omega [13]. Bootstrapping of the output

from Clustal-‐Omega was performed using seqboot (1000 iterations). ‘protdist’ program was used to

derive the pairwise distance between 146 sequences. Phylogenetic trees were derived from

9

‘protdist’ output using ‘neighbor’ program (Neighbor-‐joining tree method). Consensus trees with

bootstrap values were derived from ‘neighbor’ output using ‘consense’ program. Nodes of INO80D

phylogenetic tree co-‐clustered with the query sequence (Q53TQ3-‐2) indicate that INO80D is

conserved exclusively in higher eukaryotes and the functions of the co-‐clustered proteins are

largely unknown (Figure S1(b)). This indicates that INO80D is a metazoan specific protein, and it

may have a recent evolutionary history.

MicroRNAs targeting INO80D:

MicroRNA (miRNA) molecules have established role in the regulating genes involved in

cardiovascular and aging phenotypes [14-‐17] via translational repression pathways [18]. To

understand whether any known miRNAs implicated in cardiovascular or aging phenotypes, we

compiled literature reports and miRNA expression data. To perform this analysis we retrieved all

miRNAs targeting the UTR region of INO80D. A list of putative miRNAs that could target INO80D

was identified by TargetScan search [19-‐21] using a library of regulatory targets of mammalian and

vertebrate miRNAs. A list of 27 miRNAs was retrieved, and clinical phenotypes associated with

these miRNAs were obtained from Human MiRNA & Disease Database (HMDD). We noted that

several miRNAs implicated in cardiovascular and aging phenotypes target INO80D (See

Supplementary Table S6) suggesting a regulatory perturbation of INO80D in the setting of various

disease phenotypes.

Disease or quantitative traits associated with INO80 complex subunits:

We compiled results from published genome-‐wide association studies to understand the

genetic role of different subunits of INO80 complex. Subunits of INO80 complex and their

phylogenetic similarity compiled from protein databases are provided (Figure S1(c)). Published

GWAS reports suggest that that subunits of INO80 complex were associated with phenotypes like

extreme obesity, heart rate and capecitabine sensitivity (Table S7).

10

Table S1. Application, results and inferences from various tools employed for functional analysis of Ser818Cys mutation on INO80D Application Tool Result Inference Prediction of conserved domains and motifs

SMART No conserved domains predicted, LCRs are predicted

INO80D encodes multiple LCRs and the mutation site is part of LCR-‐7

Prediction of conserved domains and motifs

Pfam Encodes 2 copies of Potential DNA-‐binding domain “zf-‐C3Hc3H”

Presence of zf-‐C3Hc3H indicates its functional role in mediating protein-‐DNA binding and related functional mechanisms

Prediction of unassigned region

PURE No distant domain association predicted

No known domains could be assigned to INO80D using PURE

Homology modeling (template search)

ModWeb/ModBase Template search identified remote homolog (16%) A chain of structure of mammalian fatty acid synthase, PDB ID: 2VZ9 with no SCOP classification

IN080D may encode a novel fold or a fold similar to fatty acid synthase

Homology modeling ModWeb/ModBase No single structural homologs were found with sequence similarity above the twilight zone (>30%).

Homology model derived using low-‐similarity (< 30%) templates are not ideal for structure analysis

Phylogenetic analysis PSI-‐BLAST, Phylip v3.6, iTOL

A phylogenetic tree was derived using protein homologs. (Figure S1)

Tree depicts that INO80D is conserved exclusively in higher eukaryotes and the functions of the co-‐clustered proteins are largely unknown

Protein-‐protein interaction analysis

IntAct First-‐degree interactome of INO80D was obtained

20 interactions originating from spoke expanded co-‐complexes were reported in IntAct indicating that INO80D is involved in multiple protein-‐protein

11

interactions, a feature of proteins with LCRs

Secondary structure and solvent accessibility

SABBLE Secondary structure and solvent accessibility predicted using the sequence of INO80D isoforms

Comparative analysis of wild-‐type and mutant sequence revealed changes in solvent accessibility due to mutation

12

Table S2: Homozygous regions identified using Runs of Homozygosity (ROH) routine in PLINK

!"#$ %&'($ %&')$ %*+#*$ ,-.$ /0$ &%&'$&1234#$

56$74-48$

!5.9-7$

:589*95-8$

!" #$%!&%'()*" #$&)+)!)," !(&-.(+-(%%" !()-)%)-,,," %!%,/,+" !!," %%" !'-!%&"

;! #8<)=>?@?! #8(=)@?(=($ (A;B@;(B=AC$ (A@B>;;BA>@$ ))>AD;=$ )@<$ (C$ A;BA<C$

?$ #8CA;)@);$ #8CA<<)(($ @(B>;=BC(@$ @)BA<CB@(;$ ()C(DC$ (;;$ E$ E$

(>$ #8())<@;(@$ #8)ACA)A>$ ;=B=A?B>>C$ ;CB>)@B<(=$ ()C)D<($ (;A$ ($ =A@$

F5*+G$ $ $ $ $ $ $ A($ ?(BC>A$

13

Table S3: Two novel missense mutations characterized using variant filtering pipeline rsID Chr Position Reference

Base Sample Genotype

functionGVS annotation

Gene Symbol

-‐-‐ 2q33.3 206869724 T A,A,W,W,T missense INO80D

-‐-‐ 4q13.2 69343287 A G,G,R,R,R missense TMPRS11E 11000 Genomes project (NCBI Build 37); 2From left to right: L6, L7, L8, L9 and L10; W: A/T; R: G/A

14

Table S4: Forward and reverse primers used for amplifying genomic regions around INO80D and TMPRSS11E primers Primer ID Sequence Theoretical

Tm (⁰C) Tm (⁰C)

GC Content(%)

INO80D-‐FP CACGCCTCCAAGTGGCACCTC 60.1 63.1 66.6

INO80D-‐RP ACCCACCTACACCCCTGGCA 57.8 63.5 65

TMPRSS11E-‐FP ACCTGGTCAGAACCCTGAGCCTT 58.7 62.3 56.5

TMPRSS11E-‐RP TGAGTTCCTCTTCCGATGCTCACC 59 60.5 54.1

FP=Forward Primer; RP=Reverse Primer

15

Table S5: Syndromes associated with mutations in genes involved in chromatin remodeling

Syndrome Clinical features Genes Chromatin remodeling complex or gene

References

ATRX-‐syndrome (α-‐thalassemia X-‐linked mental retardation) and α-‐thalassemia myelodysplasia syndrome

Distinctive craniofacial features, genital anomalies, severe developmental delays, hypotonia, intellectual disability, and mild-‐to-‐moderate anemia secondary to alpha-‐thalassemia

ATRX SWI/SNF family

[22]

CHARGE syndrome

Coloboma, heart defect, atresia choanae, retarded growth and development, genital hypoplasia, ear anomalies/deafness

CHD7 CHD family

[23]

Williams-‐Beuren syndrome

Diverse phenotypes including supravalvular aortic stenosis, peripheral pulmonary arterial stenosis, dysmorphism, mental and growth deficiency, aberrant vitamin D metabolism, and hypercalcemia

Deletion of 28 genes in 7q11.23 including a chromatin remodeling complex associated gene

BAZ1B [24-‐26]

COFS (cerebro-‐oculo-‐facio-‐skeletal syndrome) and Cockayne syndrome B (CSB)

UV sensitivity, cataracts, growth failure, and neurological degeneration

ERCC6 and ERCC8

Mutations in chromatin remodelers or proteins interacting with components of remodeling complexes

[27, 28]

Werner's Syndrome

Cataract, premature arteriosclerosis, subcutaneous calcification, diabetes mellitus, wizened and prematurely aged facies, scleroderma-‐like skin changes, especially in the extremities. Chromosomal deletion or instabilities were also reported.

WRN WRN is a member of the RecQ Helicase family

http://www. pathology. washington.edu/ research/werner/ database/

16

Table S6: Clinical phenotypes implicated to microRNAs targeting INO80D

miRNA* PCT Score^ Observed phenotypes Related phenotypes has-‐miR-‐30abcdef/30abe-‐5p > 0.99 Vascular calcification [29]

Periodontitis [30] Periodontal disease [31]

Left ventricular hypertrophy [32]

hsa-‐miR-‐384-‐5p >0.99 _ Myocardial ischemia [33] hsa-‐miR-‐125a-‐5p/125b-‐5p 0.99 _ Heart failure [34]

Hypertrophic cardiomyopathy [35] Myocardial ischemia [36]

hsa-‐miR-‐351 0.99 NR NR hsa-‐miR-‐670 0.99 NR NR hsa-‐miiR-‐4319 0.99 NR NR has-‐miR-‐181abcd 0.99 Coronary artery disease

[37]; Periodontitis [30] Heart failure [34] Hypertrophic cardiomyopathy [38]

hsa-‐miR-‐4262 0.99 NR NR hsa-‐miR-‐128/128ab 0.94 _ _ has-‐miR-‐101/101ab 0.93 _ _ hsa-‐miR-‐27abc/27a-‐p 0.93 Vascular disease [39] Heart failure [34]

Hypertrophic cardiomyopathy [35] Ischemic heart disease [39] Cardiomyopathies [40]

hsa-‐miR-‐199ab-‐5p 0.92 _ Hypertrophic cardiomyopathy [38] Cardiomegaly, expression cardiac myocytes [41] Heart failure [42] Aging [43]

hsa-‐let-‐7 0.91 Cataract [44] Heart failure [34] Cardiac hypertrophy [45]

hsa-‐miR-‐98 0.91 _ Heart failure [34] Cardiac hypertrophy [45] Myocardial infarction [46]

hsa-‐miR-‐4458 0.91 NR NR hsa-‐miR-‐4500 0.91 NR NR hsa-‐miR-‐124/124ab/506 0.89 _ _ hsa-‐miR-‐103a 0.87 Hypertension [47] _ hsa-‐miR-‐107/107ab 0.87 _ Heart failure [34, 48] hsa-‐miR-‐19ab 0.83 Vascular disease [49] Age-‐related heart failure [50] hsa-‐miR-‐17/17-‐5p 0.83 Vascular disease [49]

Periodontitis [51] Heart failure [34]

hsa-‐20ab/20b-‐5p 0.83 Vascular disease [49] Hypertension [52]

_

hsa-‐miR-‐93 0.83 _ _ hsa-‐miR-‐106ab 0.83 Periodontal disease [31] Heart failure [34]

Myocardial infarction [53] hsa-‐miR-‐427 0.83 NR NR hsa-‐miR-‐518a-‐3p 0.83 _ _ hsa-‐miR-‐519d 0.83 _ _

* Preferentially conserved targeting (PCT) score is a Bayesian estimate of the probability that an miRNA binding site in the upstream of INO80D is conserved due to selective maintenance of miRNA targeting rather than by chance or any other reason not pertinent to miRNA targeting, allowing for uncertainty in the S/B ratio * miRNAs targeting INO80D identified using TargetScan 6.2 (human); high confidence miRNAs with PCT <= 0.8 considered for phenotype mapping NR: No phenotype data reported in Human MiRNA & Disease Database -‐: No reported phenotype implications in Human MiRNA & Disease Database

17

Table S7: Published genomics associations of INO80 complex subunits from genome-‐wide

association studies

Gene * Synonyms^ SNP Disease / Trait P-‐value OR/β Genes1; Context

Ref

ACTL6A BAF53, BAF53A,

INO80K

rs7612445 Heart rate 2.00E-‐14 0.36 GNB4-‐ACTL6A; Intergenic

[54]

ACTR5 ARP5 -‐ -‐ -‐ -‐ -‐ -‐

ACTR8 INO80N -‐ -‐ -‐ -‐ -‐ -‐

INO80C C18orf37 rs7603514 Capecitabine sensitivity

3.00E-‐06 NR INO80C-‐MIR3975; Intergenic

[55]

INO80E CCDC95 -‐ -‐ -‐ -‐ -‐ -‐

INO80D INO80 rs7603514 Obesity (extreme) 8.00E-‐06 1.36 NRP2-‐INO80D; Intergenic

[56]

INOC1 INOC1, KIAA1259 -‐ -‐ -‐ -‐ -‐ -‐

MCRS1 INO80Q, MSP58 -‐ -‐ -‐ -‐ -‐ -‐

NFRKB INO80G -‐ -‐ -‐ -‐ -‐ -‐

RUVBL1 INO80H, NMP238, TIP49, TIP49A

-‐ -‐ -‐ -‐ -‐ -‐

RUVBL2 INO80J, TIP48, TIP49B, CGI-‐46

-‐ -‐ -‐ -‐ -‐ -‐

TCF3 E2A, ITF1 -‐ -‐ -‐ -‐ -‐ -‐

ZNHIT4 HMGA1L4, PAPA1, ZNHIT4

-‐ -‐ -‐ -‐ -‐ -‐

* Complex subunits, gene names were obtained from CORUM database (http://mips.helmholtz-‐muenchen.de/genre/proj/corum/complexdetails.html?id=302) ^ Synonyms were compiled from HUGO Gene Nomenclature Committee (HGNC) portal (http://www.genenames.org/) and GeneCards (http://www.genecards.org/) 1 Reported/mapped genes from GWAS Catalog (http://www.genome.gov/gwastudies/)

18

Figure S1: a) Homology model of IN080D protein; location of S818C is highlighted in red circle b) phylogenetic tree of INO80D using protein sequence of longest isoform c) Phylogenetic conservation of INO80 complex derived using data from CORUM and SIMAP databases. Similarities of protein complex subunits to proteins from other organisms are represented as color-‐coded as well as phylogenetic conservation ratios. Organism names are provided as abbreviated two letter codes. INO80D is represented using its synonym INO80.

19

References:

1. O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, et al. Low concordance of multiple variant-‐calling pipelines: practical implications for exome and genome sequencing. Genome Med. 2013; 5(3):28.

2. Caliskan M, Chong JX, Uricchio L, Anderson R, Chen P, Sougnez C, et al. Exome sequencing reveals a novel mutation for autosomal recessive non-‐syndromic mental retardation in the TECR gene on chromosome 19p13. Hum Mol Genet. 2011; 20(7):1285-‐1289.

3. Adamczak R, Porollo A, Meller J. Combining prediction of secondary structure and solvent accessibility in proteins. Proteins. 2005; 59(3):467-‐475.

4. Letunic I, Doerks T, Bork P. SMART 6: recent updates and new developments. Nucleic Acids Res. 2009;37(Database issue):D229-‐232.

5. Wootton JC. Non-‐globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem. 1994; 18(3):269-‐285.

6. Schultz J, Milpetz F, Bork P, Ponting CP. SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Aca Sci. U S A 1998; 95(11):5857-‐5864.

7. Coletta A, Pinney JW, Solis DY, Marsh J, Pettifer SR, Attwood TK. Low-‐complexity regions within protein sequences have position-‐dependent roles. BMC Syst Biol. 2010; 4:43.

8. May AC, Johnson MS, Rufino SD, Wako H, Zhu ZY, Sowdhamini R, et al. The recognition of protein structure and function from sequence: adding value to genome data. Philos Trans R Soc Lond B Biol Sci. 1994; 344(1310):373-‐381.

9. Maier T, Leibundgut M, Ban N. The crystal structure of a mammalian fatty acid synthase. Science 2008, 321(5894);1315-‐1322.

10. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008; 36(Database issue):D419-‐425.

11. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 2007; 23(1):127-‐128.

12. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-‐BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25(17):3389-‐3402.

13. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, et al. Fast, scalable generation of high-‐quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011; 7:539.

14. Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, et al. An analysis of human microRNA and disease associations. PLoS One 2008; 3(10):e3420.

15. Small EM, Frost RJ, Olson EN. MicroRNAs add a new dimension to cardiovascular disease. Circulation 2010; 121(8):1022-‐1032.

16. Smith-‐Vikos T, Slack FJ. MicroRNAs and their roles in aging. J Cell Sci. 2012; 125(Pt 1):7-‐17.

17. Stather PW, Sylvius N, Wild JB, Choke E, Sayers RD, Bown MJ. Differential MicroRNA Expression Profiles in Peripheral Arterial Disease. Circ Cardiovasc Genet. 2013.

20

18. He L, Hannon GJ. MicroRNAs. small RNAs with a big role in gene regulation. Nat Rev Genet. 2004; 5(7):522-‐531.

19. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005; 120(1):15-‐20.

20. Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009; 19(1):92-‐105.

21. Grimson A, Farh KK, Johnston WK, Garrett-‐Engele P, Lim LP, Bartel DP. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol cell. 2007; 27(1):91-‐105.

22. Xue Y, Gibbons R, Yan Z, Yang D, McDowell TL, Sechi S, et al. The ATRX syndrome protein forms a chromatin-‐remodeling complex with Daxx and localizes in promyelocytic leukemia nuclear bodies. Proc Natl Acad Sci. U S A. 2003; 100(19):10635-‐10640.

23. Vissers LE, van Ravenswaaij CM, Admiraal R, Hurst JA, de Vries BB, Janssen IM, et al. Mutations in a new member of the chromodomain gene family cause CHARGE syndrome. Nat Genet. 2004; 36(9):955-‐957.

24. Lu X, Meng X, Morris CA, Keating MT. A novel human gene, WSTF, is deleted in Williams syndrome. Genomics. 1998; 54(2):241-‐249.

25. Meng X, Lu X, Li Z, Green ED, Massa H, Trask BJ, et al. Complete physical map of the common deletion region in Williams syndrome and identification and characterization of three novel genes. Hum Genet. 1998; 103(5):590-‐599.

26. Meng X, Lu X, Morris CA, Keating MT. A novel human gene FKBP6 is deleted in Williams syndrome. Genomics. 1998; 52(2):130-‐137.

27. Woudstra EC, Gilbert C, Fellows J, Jansen L, Brouwer J, Erdjument-‐Bromage H, et al. A Rad26-‐Def1 complex coordinates repair and RNA pol II proteolysis in response to DNA damage. Nature. 2002; 415(6874):929-‐933.

28. Citterio E, Van Den Boom V, Schnitzler G, Kanaar R, Bonte E, Kingston RE, Hoeijmakers JH, et al. ATP-‐dependent chromatin remodeling by the Cockayne syndrome B DNA repair-‐transcription-‐coupling factor. Mol Cell Biol. 2000; 20(20):7643-‐7653.

29. Balderman JA, Lee HY, Mahoney CE, Handy DE, White K, Annis S, et al. Bone morphogenetic protein-‐2 decreases microRNA-‐30b and microRNA-‐30c to promote vascular smooth muscle cell calcification. J Am Heart Assoc. 2012; 1(6):e003905.

30. Lee YH, Na HS, Jeong SY, Jeong SH, Park HR, Chung J. Comparison of inflammatory microRNA expression in healthy and periodontitis tissues. Biocell. 2011; 35(2):43-‐49.

31. Perri R, Nares S, Zhang S, Barros SP, Offenbacher S. MicroRNA modulation in obesity and periodontitis. J Dent Res. 2012; 91(1):33-‐38.

32. Duisters RF, Tijsen AJ, Schroen B, Leenders JJ, Lentink V, van der Made I, et al. miR-‐133 and miR-‐30 regulate connective tissue growth factor: implications for a role of microRNAs in myocardial matrix remodeling. Circ Res. 2009; 104(2):170-‐178, 176p following 178.

33. Bao Y, Lin C, Ren J, Liu J. MicroRNA-‐384-‐5p regulates ischemia-‐induced cardioprotection by targeting phosphatidylinositol-‐4,5-‐bisphosphate 3-‐kinase, catalytic subunit delta (PI3K p110delta). Apoptosis. 2013; 18(3):260-‐270.

21

34. Thum T, Galuppo P, Wolf C, Fiedler J, Kneitz S, van Laake LW, et al: MicroRNAs in the human heart: a clue to fetal gene reprogramming in heart failure. Circulation. 2007; 116(3):258-‐267.

35. Busk PK, Cirera S. MicroRNA profiling in early hypertrophic growth of the left ventricle in rats. Biochem Biophys Res Commun. 2010; 396(4):989-‐993.

36. Ren D, Wang X, Ha T, Liu L, Kalbfleisch J, Gao X, et al. SR-‐A deficiency reduces myocardial ischemia/reperfusion injury; involvement of increased microRNA-‐125b expression in macrophages. Biochim Biophys Acta. 2013; 1832(2):336-‐346.

37. Hulsmans M, Sinnaeve P, Van der Schueren B, Mathieu C, Janssens S, Holvoet P. Decreased miR-‐181a expression in monocytes of obese patients is associated with the occurrence of metabolic syndrome and coronary artery disease. J Clin Endocrinol Metab. 2012; 97(7):E1213-‐1218.

38. van Rooij E, Sutherland LB, Liu N, Williams AH, McAnally J, Gerard RD, et al. A signature pattern of stress-‐responsive microRNAs that can evoke cardiac hypertrophy and heart failure. Proc Natl Acad Sci U S A. 2006; 103(48):18255-‐18260.

39. Bang C, Fiedler J, Thum T. Cardiovascular importance of the microRNA-‐23/27/24 family. Microcirculation. 2012; 19(3):208-‐214.

40. Yeh CH, Chen TP, Wang YC, Lin YM, Fang SW. MicroRNA-‐27a regulates cardiomyocytic apoptosis during cardioplegia-‐induced cardiac arrest by targeting interleukin 10-‐related pathways. Shock. 2012; 38(6):607-‐614.

41. Song XW, Li Q, Lin L, Wang XC, Li DF, Wang GK, et al. MicroRNAs are dynamically regulated in hypertrophic hearts, and miR-‐199a is essential for the maintenance of cell size in cardiomyocytes. J Cell Physiol. 2010; 225(2):437-‐443.

42. Haghikia A, Missol-‐Kolka E, Tsikas D, Venturini L, Brundiers S, Castoldi M, et al. Signal transducer and activator of transcription 3-‐mediated regulation of miR-‐199a-‐5p links cardiomyocyte and endothelial cell function in the heart: a key role for ubiquitin-‐conjugating enzymes. Eur Heart J. 2011; 32(10):1287-‐1297.

43. Ukai T, Sato M, Akutsu H, Umezawa A, Mochida J. MicroRNA-‐199a-‐3p, microRNA-‐193b, and microRNA-‐320c are correlated to aging and regulate human cartilage metabolism. J Orthop Res. 2012; 30(12):1915-‐1922.

44. Peng CH, Liu JH, Woung LC, Lin TJ, Chiou SH, Tseng PC, et al. MicroRNAs and cataracts: correlation among let-‐7 expression, age and the severity of lens opacity. Br J Ophthalmol. 2012; 96(5):747-‐751.

45. Yang Y, Ago T, Zhai P, Abdellatif M, Sadoshima J. Thioredoxin 1 negatively regulates angiotensin II-‐induced cardiac hypertrophy through upregulation of miR-‐98/let-‐7. Circ Res. 2011; 108(3):305-‐313.

46. Zhu W, Yang L, Shan H, Zhang Y, Zhou R, Su Z, et al. MicroRNA expression analysis: clinical advantage of propranolol reveals key microRNAs in myocardial infarction. PLoS One. 2011; 6(2):e14736.

47. Wu WH, Hu CP, Chen XP, Zhang WF, Li XW, Xiong XM, et al. MicroRNA-‐130a mediates proliferation of vascular smooth muscle cells in hypertension. Am J Hypertens. 2011; 24(10):1087-‐1093.

48. Voellenkle C, van Rooij J, Cappuzzello C, Greco S, Arcelli D, Di Vito L, et al. MicroRNA signatures in peripheral blood mononuclear cells of chronic heart failure patients. Physiol Genomics. 2010; 42(3):420-‐426.

22

49. Parmacek MS: MicroRNA-‐modulated targeting of vascular smooth muscle cells. J Clin Invest. 2009; 119(9):2526-‐2528.

50. van Almen GC, Verhesen W, van Leeuwen RE, van de Vrie M, Eurlings C, Schellings MW, et al. MicroRNA-‐18 and microRNA-‐19 regulate CTGF and TSP-‐1 expression in age-‐related heart failure. Aging Cell. 2011; 10(5):769-‐779.

51. Liu Y, Liu W, Hu C, Xue Z, Wang G, Ding B, et al. MiR-‐17 modulates osteogenic differentiation through a coherent feed-‐forward loop in mesenchymal stem cells isolated from periodontal ligaments of patients with periodontitis. Stem Cells. 2011; 29(11):1804-‐1816.

52. Brock M, Samillan VJ, Trenkmann M, Schwarzwald C, Ulrich S, Gay RE, et al. AntagomiR directed against miR-‐20a restores functional BMPR2 signalling and prevents vascular remodelling in hypoxia-‐induced pulmonary hypertension. Eur Heart J. 2012.

53. Liu Z, Yang D, Xie P, Ren G, Sun G, Zeng X, et al. MiR-‐106b and MiR-‐15b modulate apoptosis and angiogenesis in myocardial infarction. Cell Physiol Biochem. 2012; 29(5-‐6):851-‐862.

54. den Hoed M, Eijgelsheim M, Esko T, Brundel BJ, Peal DS, Evans DM, et al. Identification of heart rate-‐associated loci and their effects on cardiac conduction and rhythm disorders. Nat Genet. 2013; 45(6):621-‐631.

55. O'Donnell PH, Stark AL, Gamazon ER, Wheeler HE, McIlwee BE, Gorsic L, et al. Identification of novel germline polymorphisms governing capecitabine sensitivity. Cancer. 2012; 118(16):4063-‐4073.

56. Cotsapas C, Speliotes EK, Hatoum IJ, Greenawalt DM, Dobrin R, Lum PY, et al. Common body mass index-‐associated variants confer risk of extreme obesity. Hum Mol Genet. 2009; 18(18):3502-‐3507.

WES INO80D CircG SM R3 - Circulation: Genomic and...

Documents

Transcript of WES INO80D CircG SM R3 - Circulation: Genomic and...