SUPPLEMENTARY INFORMATION



  • Supplementary Figures and Legends

    Supplementary Figure 1. Detailed TIF-Seq protocol. (1) Non-capped RNA molecules are dephosphorylated. (2) Using the oligo-capping method (ref. 28), a known oligo (orange) is ligated to the capped mRNA molecules. (3) The sample is split, and barcoded full-length cDNA is produced by reverse transcription and PCR amplification. (4) Sticky ends are produced by NotI digestion; samples are pooled and circularized by intramolecular ligation. (5) Circularized molecules are purified and fragmented by sonication, and 5'-3' junctions are captured by biotin-streptavidin purification. (6) Illumina libraries are produced and sequenced using 105-nucleotide paired-end reads.
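The chimera control in steps 3 and 4 can be illustrated with a short sketch. It assumes each sequenced junction has already been reduced to a pair of chimera control barcodes, one from each end of the molecule; the function names and the example counts are hypothetical and not taken from the paper. Matched pairs (A-A, B-B) are what intramolecular ligation produces, while mixed pairs (A-B, B-A) can only come from intermolecular ligation. Because intermolecular ligation can also yield A-A or B-B, the sketch doubles the mixed fraction, which is valid only under the stated assumption of equally sized pools and random pairing.

```python
from collections import Counter


def classify_pair(barcode_5p, barcode_3p):
    """Return 'matched' for A-A / B-B pairs and 'mixed' for A-B / B-A pairs."""
    return "matched" if barcode_5p == barcode_3p else "mixed"


def chimera_estimate(barcode_pairs):
    """Summarise chimera control barcode pairs (hypothetical helper).

    barcode_pairs: iterable of (5' barcode, 3' barcode) tuples.
    """
    counts = Counter(classify_pair(b5, b3) for b5, b3 in barcode_pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no barcode pairs supplied")
    mixed_fraction = counts["mixed"] / total
    # Assumption (not from the source): both sub-samples contribute equally and
    # intermolecular partners pair at random, so half of all intermolecular
    # products carry matching barcodes and are invisible to this control.
    estimated_intermolecular = min(1.0, 2 * mixed_fraction)
    return {
        "matched": counts["matched"],
        "mixed": counts["mixed"],
        "mixed_fraction": mixed_fraction,
        "estimated_intermolecular_fraction": estimated_intermolecular,
    }


# Made-up example: 96 matched pairs and 4 mixed pairs.
pairs = [("A", "A")] * 48 + [("B", "B")] * 48 + [("A", "B")] * 2 + [("B", "A")] * 2
result = chimera_estimate(pairs)
print(result)
```

With unequal pools the correction factor would differ, so the doubled value should be read as a rough upper-level estimate rather than a measured rate.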

    [Figure 1 schematic; recoverable labels only. Steps: 1. Dephosphorylation of non-capped mRNA. 2. ssRNA ligation to full-length mRNAs. 3. Split sample: RT and PCR amplification of FlcDNA. 4. Intramolecular ligation (after pooling samples). 5. Sonication and biotin capture. 6. Library construction, then sequencing. Other labels: chimera control barcode 1, chimera control barcode 2, multiplexing barcodes, -Biot. Chimera control: intramolecular ligation gives A-A or B-B; intermolecular ligation gives A-B, B-A, A-A or B-B.]

    SUPPLEMENTARY INFORMATION, doi:10.1038/nature12121

  • Supplementary Figure 2. Clustering of transcript boundary positions for the definition of major transcript isoforms (mTIFs). a, Criteria for mTIF clustering. When comparing distances between transcription start sites (TSSs) from all overlapped TIFs, start sites separated by 8 nucleotides (red line) are overrepresented. A 5-nucleotide clustering distance for mTIFs was chosen to stay well above our technical precision while maintaining the resolution necessary to identify the 8-nucleotide-spaced start sites, which could have a biological basis. In the following panels, individual transcription start sites (purple) or polyadenylation sites (orange) are displayed in natural and logarithmic scale; sites co-occurring within 5 nucleotides were clustered into mTIFs (dark blue boxes, Supplementary Methods). Individual TIFs are displayed as blue lines, coverage as dark blue. b, Example of small boundary variations with large functional consequences: 5' end variations of only a few nucleotides in HTS1 determine whether or not the first AUG is included, and thus whether the transcript encodes the mitochondrial or the cytoplasmic version of histidyl-tRNA synthetase, respectively (ref. 29), as the former requires an N-terminal signal peptide (also shown in Supplementary Fig. 20 with other examples of isoforms encoding truncated proteins). c, The same clustering was applied to 3' ends, illustrated here for ALT1; the logarithmic plot demonstrates the prevalence of isoforms with minor variations that would have been discarded with a larger clustering window. d, In some cases transcription start sites can also span hundreds of nucleotides, shown here for ALT1. Protein coding sequences (in b and d) are represented as in Fig. 4a.
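The clustering rule in the legend (sites co-occurring within 5 nucleotides are merged) can be sketched as follows. This is a minimal illustration, not the procedure from the Supplementary Methods: it assumes a simple chaining rule in which a site joins the current cluster when it lies at most `max_gap` nucleotides from the previous site, and the function name, the `peak` field and the example counts are all hypothetical.

```python
def cluster_sites(site_counts, max_gap=5):
    """Group boundary positions into clusters by chaining nearby sites.

    site_counts: dict mapping genomic position -> number of supporting reads.
    Assumption: a site joins the current cluster if it is within max_gap
    nucleotides of the previous (sorted) site; otherwise a new cluster starts.
    """
    clusters = []
    for pos in sorted(site_counts):
        if clusters and pos - clusters[-1]["end"] <= max_gap:
            cluster = clusters[-1]
            cluster["end"] = pos
            cluster["count"] += site_counts[pos]
            # Track the best-supported position as the cluster representative.
            if site_counts[pos] > site_counts[cluster["peak"]]:
                cluster["peak"] = pos
        else:
            clusters.append(
                {"start": pos, "end": pos, "peak": pos, "count": site_counts[pos]}
            )
    return clusters


# Made-up start sites; 104 and 112 are 8 nt apart and stay in separate clusters.
sites = {100: 3, 102: 10, 104: 1, 112: 7, 113: 2, 130: 5}
clusters = cluster_sites(sites)
for c in clusters:
    print(c)
```

With a 5-nucleotide window, the two groups of sites spaced 8 nucleotides apart remain distinct, which is the resolution argument made in panel a. Note that chaining lets a cluster grow wider than `max_gap` when sites are densely spaced; a fixed-width window around the peak would be an alternative design.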

    [Figure 2 panels; recoverable labels only. a, Density plot; x axis: distance between TSSs of overlapped TIFs (nt), 0 to 100; y axis: density; marker at 8 nt. b, HTS1, Chr XVI (about 638,940 to 639,080 bp); tracks: 5' Start (-) logarithmic scale [0-1483], 5' Start (-) natural scale [0-1483], mTIF 5' Start clusters, combined 5' Start dataset (Glucose + Galactose), Glucose TIFs; first and second AUG marked. c, ALT1, Chr XII (about 318,000 to 318,200 bp); tracks: 3' End (-) logarithmic scale [0-1068], 3' End (-) natural scale [0-1068], mTIF 3' End clusters, combined 3' End dataset (Glucose + Galactose), Glucose TIFs. d, ALT1, Chr XII (about 320,100 to 320,400 bp); tracks: 5' Start (-) logarithmic scale [0-875], 5' Start (-) natural scale [0-875], mTIF 5' Start clusters, combined 5' Start dataset (Glucose + Galactose), Glucose TIFs.]

  • Supplementary Figure 3. Reproducibility of the TIF-Seq method. Correlation between technical replicates, namely the chimaera control configurations (A-A) and (B-B) of one sample in YPD (YPD1, see Supplementary text), and between biological replicates YPD1 and YPD2. Spearman correlation values (cor) are shown. a, Correlation between the numbers of exactly identical TIFs (identical start and end sites) in A-A and B-B; b, Correlation between the numbers of TIFs overlapping with annotated transcripts in A-A and B-B; c, Correlation between the numbers of exactly identical TIFs (identical start and end sites) in biological replicates YPD1 and YPD2; and d, Correlation between the numbers of TIFs overlapping with annotated transcripts (ref. 4) in YPD1 and YPD2.
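The replicate comparison described in this legend can be sketched as a Spearman correlation over per-TIF counts. The code below is an illustrative, dependency-free implementation (Pearson correlation of average ranks), not the authors' analysis code. The TIF keys, the counts, and the choice to treat a TIF absent from one replicate as a count of zero are assumptions made for the example.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def replicate_correlation(counts_a, counts_b):
    """Correlate counts of identical TIFs (same start and end) in two replicates.

    Assumption: a TIF missing from one replicate is counted as 0 there.
    """
    keys = sorted(set(counts_a) | set(counts_b))
    return spearman([counts_a.get(k, 0) for k in keys],
                    [counts_b.get(k, 0) for k in keys])


# Made-up counts keyed by (chromosome, strand, start, end).
rep_a = {("chrI", "+", 100, 900): 12, ("chrI", "+", 105, 900): 5,
         ("chrII", "-", 2000, 1200): 1, ("chrII", "-", 2010, 1190): 30}
rep_b = {("chrI", "+", 100, 900): 10, ("chrI", "+", 105, 900): 4,
         ("chrII", "-", 2000, 1200): 2, ("chrII", "-", 2010, 1190): 25}
rho = replicate_correlation(rep_a, rep_b)
print(rho)
```

Spearman correlation depends only on rank order, so it is insensitive to differences in sequencing depth between replicates, which is one plausible reason to prefer it over Pearson correlation for count data of this kind.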
