Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between...
Additional File 1
Figure S1 – Quality of PacBio reads
Figure S2 – Comparison of long and short data sets
Figure S3 – Within-transcript variation
Figure S4 – No increased bias for longer ERCC transcripts
Figure S5 – Verification of isoforms by Sanger and amplification-free PacBio sequencing
Figure S6 – Major isoform expression
Figure S7 – Coding and non-coding variation
Figure S8 – Example ideal read
Figure S9 – Example concatenated read
Figure S10 – List of custom oligonucleotides
Figure S1 Quality of PacBio reads. (A) Comparison of read length between the first 5 SMRT cells and the last 2 SMRT cells. As mentioned in Methods, there was an added purification step and a change in chemistry between the first five SMRT cells and the last two. Note the change of scale on the vertical axis for the last two SMRT cells. (B) GC distribution among all pooled PacBio sequences. The theoretical distribution was calculated from the observed data and used to build a reference distribution. (C) Average quality per read. (D) Sequence nucleotide content across all bases. Note the shift in scale on the X-axis to allow for individual bases at the start of the sequence. Note also that few reads are longer than 5 kb. All subfigures in Figure S1 were made with the software FastQC.
[Figure S1 graphic. (A) Read-length histograms for "All PacBio Reads", "First 5 SMRT cells" and "Last 2 SMRT cells"; X-axis: Sequence Length (bp); Y-axis: Number of reads. (B) "GC distribution": GC count per read and Theoretical Distribution; Y-axis: Number of reads. (C) "Average quality per read"; X-axis: Mean Sequence Quality (Phred Score). (D) "Sequence content across all bases"; X-axis: Position in read (bp); series: %T, %C, %A, %G.]
Figure S2 Comparison between annotated and sequenced gene length for the short and long data sets as well as the conservative data set (see Methods), which contains reads from both the long and short data sets. (A) Scatterplot depicting, for each gene, the longest sequenced transcript and the longest annotated isoform in bp. (B) Distribution of transcript lengths. The green bars show the number of observed transcripts, while the red bars show the annotated lengths of the corresponding genes.
Figure S3 Within-transcript variation based on reads sharing the same UMI, showing technical errors in isoform boundaries. (A) Histograms showing the offset from median alignment position for 5ʹ ends, 3ʹ ends and splice sites. (B) Magnified view of 5ʹ and 3ʹ ends. (C) Magnified view of exon junctions.
[Figure S3 graphic. (A) Offset histograms for the 5 prime and 3 prime ends, shown for Transcripts and ERCC; X-axis: offset from the median position, binned from <−1000 to >1000. (B) "Zoom in 5 prime" and "Zoom in 3 prime" for ERCC and Transcripts; X-axis: offset in single bp within ±10 bp and in bins of 10 bp beyond, from <−100 to >100; Y-axis truncated, percentage of reads. (C) "Zoom in exon junctions" for positions a to f of a transcript schematic. Counts printed in the panels: 119793, 24403 and 11307.]
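The within-transcript offsets summarised in Figure S3 can be illustrated with a minimal Python sketch. It assumes that reads have already been grouped by a key identifying one original molecule (for example cell, gene and UMI) and that a single coordinate per read is available for the feature of interest (a 5ʹ end, a 3ʹ end or a splice site). The function names and the input layout are hypothetical and are not taken from the authors' pipeline.

```python
from collections import defaultdict
from statistics import median


def offsets_from_median(reads):
    """Return per-read offsets from the median position within each group.

    `reads` is an iterable of (group_key, position) pairs, where group_key
    identifies reads assumed to derive from the same molecule (hypothetical
    layout, e.g. (cell, gene, UMI)) and position is an alignment coordinate.
    """
    groups = defaultdict(list)
    for key, position in reads:
        groups[key].append(position)

    offsets = []
    for positions in groups.values():
        if len(positions) < 2:
            continue  # a single read carries no within-group variation
        centre = median(positions)
        offsets.extend(p - centre for p in positions)
    return offsets


def zero_offset_fraction(offsets):
    """Fraction of reads whose position equals the group median."""
    if not offsets:
        return 0.0
    return sum(1 for o in offsets if o == 0) / len(offsets)
```

For instance, three reads sharing one key with 5ʹ ends at 100, 100 and 103 give offsets 0, 0 and 3, so two thirds of the reads fall in the zero-offset bin.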
Figure S4 Histogram showing the offset from median for 5ʹ ends, 3ʹ ends and splice sites, in terms of percentage of zero offset, for reads mapping to ERCC, split by ERCC length. Red, observed number of transcripts. Note that the Y-axis is truncated and that the X-axis is shown in bins of 10 bp outside the region of ± 10 bp. Note also that there were no ERCC transcripts with length between 1300 and 1900 bp and that very few transcripts are longer than 1900 bp.
[Figure S4 graphic. Columns: 5ʹ end and 3ʹ end. Rows: ERCC 500-900 bp, ERCC 900-1300 bp and ERCC 1900-2000 bp. X-axis: offset in single bp within ±10 bp and in bins of 10 bp beyond, from <−100 to >100; Y-axis truncated, percentage of reads. Counts printed in the panels, in order of appearance: 1510, 3248 and 11.]
Figure S5A Sanger verification of CNP (2',3'-Cyclic Nucleotide 3' Phosphodiesterase). A| Simplification of gel images. Each band is annotated with a letter, and the corresponding alignment of Sanger sequences is shown in C. Each band is also annotated to show whether it was Sanger sequenced and whether the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1, Oligo 5.1, Oligo 5.2 and VLMC 2, PacBio single-cell data exist (marked in black). For cell Oligo 1.2, PacBio and limited Sanger sequencing data exist (marked in red). Additionally, Sanger sequencing was done on a cell not sequenced by PacBio (Oligo 1 external cell, marked in blue). B| Gel visualization of PCR products. C| Genomic position of the different primer pairs. The number in brackets is the expected PCR product length according to NCBI's Primer-BLAST.
[Figure S5A graphic. Panels A to F. Genome browser view: chr11, mm10, 100,576,000 to 100,582,000. Tracks: Annotated isoforms RefSeq, Sanger sequencing, PacBio sequencing, Primer pairs. Primer pairs (expected product length in bp): PP1 (210), PP2 (Na), PP3 (2069), PP4 (855). Gel lanes: Ladder, PP1 to PP4, for cells Oligo 1.1, Oligo 1.2, Oligo 5.1, Oligo 5.2, VLMC 2 and Oligo 1 independent. Band legend: Sequenced and correct mapping; Sanger seq didn't work; Not sequenced but correct mapping in another cell; Not sequenced. Sanger verified exon isoforms (exons 1 to 4, 5ʹ to 3ʹ), legend: Verified, Not verified; isoform counts: 1159, 2, 333.]
D| Sanger sequencing results mapped to the mouse genome. E| Isoforms verified by Sanger sequencing:
• PP1 verified that exon 1 is not an artifact and that when exon 1 is present, exon 2 is truncated.
• PP2 verified that full-length exon 2 exists, but failed to show the internal introns in exon 2 that could be seen in the sequencing data.
• PP3 verified that isoforms with exon 3 exist, but failed to show isoforms without exon 3.
• PP4 failed to verify exon 4 internal introns.
The number to the left of each isoform indicates how many instances of that exon isoform were sequenced in all single cells combined. Note that most sequences come from Oligo 1.1, and that Sanger sequencing was done on an independent cell. F| Single-cell isoforms from PacBio sequencing. Na, not applicable, i.e. no valid primer product in Primer-BLAST.
Figure S5B Amplification-free PacBio sequencing of CNP (2',3'-Cyclic Nucleotide 3' Phosphodiesterase). The single-cell PacBio exon isoforms that are verified by both Sanger sequencing and amplification-free bulk sequencing are colored green. If only one of Sanger sequencing or bulk sequencing could verify an isoform, it is colored yellow. If neither method could verify the isoform, it is colored red.
[Figure S5B graphic. Genome browser view: chr11, mm10, 100,576,000 to 100,582,000. Tracks: Amplification-free bulk sequencing, Annotated isoforms RefSeq, Verification of PacBio suggested exon isoforms. Isoform counts: 1159, 2, 333. Legend: Verified by both Sanger and amplification-free sequencing; Verified by either Sanger or amplification-free sequencing; Verified by neither Sanger nor amplification-free sequencing.]
Figure S5C Sanger verification of Hmgcs1 (3-Hydroxy-3-Methylglutaryl-CoA Synthase 1). A| Simplification of gel images. Each band is annotated with a letter, and the corresponding alignment of Sanger sequences is shown in C. Each band is also annotated to show whether it was Sanger sequenced and whether the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1 and Oligo 5.2, PacBio single-cell data exist (marked in black). For cell Oligo 1.2, PacBio and limited Sanger sequencing data exist (marked in red). Additionally, Sanger sequencing was done on a cell not sequenced by PacBio (Oligo 1 external cell, marked in blue). B| Gel visualization of PCR products. C| Genomic position of the different primer pairs. The number in brackets is the expected PCR product length according to NCBI's Primer-BLAST.
[Figure S5C graphic. Panels A to F. Genome browser view: chr13, mm10, 119,695,000 to 119,705,000. Tracks: Annotated isoforms RefSeq, Sanger sequencing, PacBio sequencing, Primer pairs. Primer pairs (expected product length in bp): PP1 (151, 210), PP2 (456). Gel lanes: Ladder, PP1, PP2, for cells Oligo 1.2 and Oligo 1 independent; PacBio tracks for Oligo 1.1, Oligo 1.2 and Oligo 5.2. Band legend: Sequenced and correct mapping; Sanger seq didn't work; Not sequenced but correct mapping in another cell; Not sequenced. Sanger verified exon isoforms (exons 1 to 11), legend: Verified, Not verified; isoform counts: 33, 21, 7, 25.]
D| Sanger sequencing results mapped to the mouse genome. E| Isoforms verified by Sanger sequencing:
• PP1 verified the existence of three isoforms with respect to exon 2: the exon is either absent or present in a short or a long version.
• PP2 failed to verify internal introns in exon 11.
The number to the left of each isoform indicates how many instances of that exon isoform were sequenced in all single cells combined. F| Single-cell isoforms from PacBio sequencing.
Figure S5D Amplification-free PacBio sequencing of Hmgcs1 (3-Hydroxy-3-Methylglutaryl-CoA Synthase 1). The single-cell PacBio exon isoforms that are verified by both Sanger sequencing and amplification-free bulk sequencing are colored green. If only one of Sanger sequencing or bulk sequencing could verify an isoform, it is colored yellow. If neither method could verify the isoform, it is colored red.
[Figure S5D graphic. Genome browser view: chr13, mm10, 119,690,000 to 119,710,000. Tracks: Amplification-free bulk sequencing, Annotated isoforms RefSeq, Verification of PacBio suggested exon isoforms. Isoform counts: 33, 21, 7, 25. Legend: Verified by both Sanger and amplification-free sequencing; Verified by either Sanger or amplification-free sequencing; Verified by neither Sanger nor amplification-free sequencing.]
Figure S5E Sanger verification of Mbp (Myelin Basic Protein). A| Simplification of gel images. Each band is annotated with a letter, and the corresponding alignment of Sanger sequences is shown in C. Each band is also annotated to show whether it was Sanger sequenced and whether the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1, PacBio sequencing, PCR amplification and limited Sanger sequencing data exist (marked in green). For Oligo 1.2, PacBio and Sanger sequencing data exist (marked in red). For Oligo 5.1 and Oligo 5.2, PacBio single-cell data exist (marked in black). Additionally, Sanger sequencing was done on a cell not sequenced by PacBio (Oligo 1 external cell, marked in blue). B| Gel visualization of PCR products. C| Genomic position of the different primer pairs. The number in brackets is the expected PCR product length according to NCBI's Primer-BLAST. Note that PP6, PP7 and PP8 span exon junctions.
[Figure S5E graphic. Panels A to F. Genome browser view: chr18, mm10, 82,560,000 to 82,585,000. Tracks: Annotated RefSeq isoforms, Sanger sequencing, PacBio sequencing, Primer pairs. Primer pairs (expected product lengths in bp): PP1 (173, 251), PP2 (258, 291, 336, 369), PP3 (250, 283, 373, 406), PP4 (354, 387, 465, 477, 510, 555, 588), PP5 (1289), PP6 (101), PP7 (185), PP8 (80). Gel lanes: Ladder, PP1 to PP8, for cells Oligo 1.1, Oligo 1.2 and Oligo 1 independent; PacBio tracks for Oligo 1.1, Oligo 1.2, Oligo 5.1 and Oligo 5.2. Band legend: Sequenced and correct mapping; Sanger seq didn't work; Not sequenced but correct mapping in another cell; Not sequenced. Sanger verified isoforms (exons 1 to 7), legend: Verified, Not verified, Not tested; the isoform counts are fused in the extracted text (163516, 85*, 25).]
D| Sanger sequencing results mapped to the mouse genome. E| Isoforms verified by Sanger sequencing:
• PP1 and PP6 verify both isoforms of exon 2. (Note that the PP6 upstream primer spans the junction between exons 2 and 3.)
• PP2 also verifies both isoforms of exon 2 and that exons 3, 4 and 5 are expressed together.
• PP3 verified one of the isoforms with exon 6. The isoform without exon 6 could be seen by gel electrophoresis but could not be verified by Sanger sequencing.
• PP4 verifies the existence of both 2 isoforms of exon 2 and 2 isoforms of exon 6.
• PP5 verified the existence of internal introns in exon 7, and that those isoforms vary from cell to cell.
• PP7 verified the connection between exons 5 and 7. (Note that the upstream primer spans the exon junction between exons 5 and 7.)
• PP8 failed to map. (Note that the downstream primer spans the exon junction between exons 6 and 7.)
The number to the left of each isoform indicates how many instances of that exon isoform were sequenced in all single cells combined. The (*) indicates all isoforms with an internal intron in exon 7. Note that verification by Sanger sequencing was not attempted for all exon isoforms, for example the isoform spanning only exons 1 and 7. F| Single-cell isoforms from PacBio sequencing.
[Figure S5F graphic. Genome browser view: chr18, mm10, 82,560,000 to 82,590,000. Tracks: Amplification-free bulk sequencing, Annotated isoforms RefSeq, Exon isoforms single cell sequencing. Isoform counts as extracted: 8, 1616, 36, 255*. Legend: Verified by both Sanger and amplification-free sequencing; Verified by either Sanger or amplification-free sequencing; Verified by neither Sanger nor amplification-free sequencing.]
Figure S5F Amplification-free PacBio sequencing of Mbp (Myelin Basic Protein). The single-cell PacBio exon isoforms that are verified by both Sanger sequencing and amplification-free bulk sequencing are colored green. If only one of Sanger sequencing or bulk sequencing could verify an isoform, it is colored yellow. If neither method could verify the isoform, it is colored red. The star denotes that all isoforms with 3ʹ UTR introns are counted together, regardless of other exon structures and the length of the intron.
Figure S5G Sanger verification of Mobp (Myelin-Associated Oligodendrocyte Basic Protein). A| Simplification of gel images. Each band is annotated with a letter, and the corresponding alignment of Sanger sequences is shown in C. Each band is also annotated to show whether it was Sanger sequenced and whether the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1, Oligo 5.1 and Oligo 5.2, PacBio single-cell data exist (marked in black). For cell Oligo 1.2, PacBio and limited Sanger sequencing data exist (marked in red). Additionally, Sanger sequencing was done on a cell not sequenced by PacBio (Oligo 1 external cell, marked in blue). B| Gel visualization of PCR products. C| Genomic position of the different primer pairs. The number in brackets is the expected PCR product length according to NCBI's Primer-BLAST. D| Sanger sequencing results mapped to the mouse genome.
[Figure S5G graphic. Panels A to F. Genome browser view: chr9, mm10, 120,155,000 to 120,180,000. Tracks: Annotated isoforms RefSeq, Sanger sequencing, PacBio sequencing, Primer pairs. Primer pairs (expected product length in bp): PP1 (Na), PP2 (408), PP3 (349), PP4 (512), PP5 (2654), PP6 (Na). Gel lanes: Ladder, PP1 to PP6, for cells Oligo 1.2 and Oligo 1 independent; PacBio tracks for Oligo 1.1, Oligo 1.2, Oligo 5.1 and Oligo 5.2. Band legend: Sequenced and correct mapping; Sanger seq didn't work; Not sequenced but correct mapping in another cell; Not sequenced. Sanger verified isoforms (exons 1 to 5), legend: Verified, Not verified, Not tested; isoform counts as extracted: 3, 178, 74, 2, 3.]
E| Isoforms verified by Sanger sequencing:
• PP1 verifies the existence of an exon between exons 2 and 3. Note that the forward Sanger sequencing primer is on exon 2.
• PP2 did not work.
• PP3 verifies two different lengths of exon 4.
• PP4 verifies a long version of exon 5.
• PP5 verifies an internal intron in exon 5.
• PP6 verifies that exon 5 is short if exons 6 and 7 exist.
The number to the left of each isoform indicates how many instances of that exon isoform were sequenced in all single cells combined. Due to space limitations, only exon isoforms with more than 1 UMI are shown. F| Single-cell isoforms from PacBio sequencing. Na, not applicable, i.e. no valid primer product in Primer-BLAST.
Figure S5H Amplification-free PacBio sequencing of Mobp (Myelin-Associated Oligodendrocyte Basic Protein). The single-cell PacBio exon isoforms that are verified by both Sanger sequencing and amplification-free bulk sequencing are colored green. If only one of Sanger sequencing or bulk sequencing could verify an isoform, it is colored yellow. If neither method could verify the isoform, it is colored red.
[Figure S5H graphic. Genome browser view: chr9, mm10, 120,150,000 to 120,180,000. Tracks: Amplification-free bulk sequencing, Annotated isoforms RefSeq, Exon isoforms single cell sequencing. Isoform counts: 3, 178, 74, 23. Legend: Verified by both Sanger and amplification-free sequencing; Verified by either Sanger or amplification-free sequencing; Verified by neither Sanger nor amplification-free sequencing.]
Figure S5I Sanger verification of Plp1 (Proteolipid Protein 1). A| Simplification of gel images. Each band is annotated with a letter, and the corresponding alignment of Sanger sequences is shown in C. Each band is also annotated to show whether it was Sanger sequenced and whether the Sanger sequence mapped to the correct position in the genome. For all cells except Oligo 1.2, only PacBio sequencing data exist (marked in black). For cell Oligo 1.2, PacBio and limited Sanger sequencing data exist (marked in red). Additionally, Sanger sequencing was done on a cell not sequenced by PacBio (Oligo 1 external cell, marked in blue). B| Gel visualization of PCR products. C| Genomic position of the different primer pairs. The number in brackets is the expected PCR product length according to NCBI's Primer-BLAST. D| Sanger sequencing results mapped to the mouse genome.
[Figure S5I graphic. Panels A to F. Genome browser view: chrX, mm10, 136,820,000 to 136,840,000. Tracks: Annotated isoforms RefSeq, Sanger sequencing, PacBio sequencing, Primer pairs. Primer pairs (expected product length in bp): PP1 (280, 385), PP2 (1921), PP3 (1070). Gel lanes: Ladder, PP1 to PP3, for cells Oligo 1.2 and Oligo 1 independent; PacBio tracks for Oligo 1.1, Oligo 1.2, Oligo 5.1, Oligo 5.2, VLMC 1 and VLMC 2. Band legend: Sequenced and correct mapping; Sanger seq didn't work; Not sequenced but correct mapping in another cell; Not sequenced. Sanger verified isoforms (exon labels 1, 2, 3, 4, 5, 6, 8), legend: Verified, Not verified, Not tested; the isoform counts are fused in the extracted text (133408, 2710).]
E| Isoforms verified by Sanger sequencing:
• PP1 verified two isoforms of exon 3.
• PP2 and PP3 verified multiple internal introns of exon 8.
The number to the left of each isoform indicates how many instances of that exon isoform were sequenced in all single cells combined. Due to space limitations, only exon isoforms with more than 1 UMI are shown. F| Single-cell isoforms from PacBio sequencing. Note that for all cells except Oligo 1.2 the PacBio sequencing isoforms have been collapsed due to space limitations.
Figure S5J Amplification-free PacBio sequencing of Plp1 (Proteolipid Protein 1). The single-cell PacBio exon isoforms that are verified by both Sanger sequencing and amplification-free bulk sequencing are colored green. If only one of Sanger sequencing or bulk sequencing could verify an isoform, it is colored yellow. If neither method could verify the isoform, it is colored red.
[Figure S5J graphic. Genome browser view: chrX, mm10, 136,825,000 to 136,840,000. Tracks: Amplification-free bulk sequencing, Plp1, Annotated isoforms RefSeq, Exon isoforms single cell sequencing. The isoform counts are fused in the extracted text (133408, 2710). Legend: Verified by both Sanger and amplification-free sequencing; Verified by either Sanger or amplification-free sequencing; Verified by neither Sanger nor amplification-free sequencing.]
Figure S6 Major isoform usage. A) Percentage major isoform was calculated across all genes with more than 10 transcripts for each cell. The major isoform was defined per gene and cell and could therefore change between cells. B) Percentage major isoform and number of exons per gene. Each dot is a gene. C) Percentage major isoform and gene expression. Each dot is a gene.
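The percentage of the major isoform described for Figure S6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: each record is assumed to represent one transcript (one UMI) already assigned to a cell, a gene and an isoform identifier, and the "more than 10 transcripts" filter is assumed to apply per gene and cell. The function name, the parameter `min_exclusive` and the record layout are hypothetical.

```python
from collections import Counter, defaultdict


def major_isoform_percentage(records, min_exclusive=10):
    """Return {(cell, gene): (major_isoform, percentage)}.

    `records` is an iterable of (cell, gene, isoform_id) tuples, one per
    transcript. Gene/cell pairs with `min_exclusive` or fewer transcripts
    are skipped (assumed reading of "more than 10 transcripts").
    """
    counts = defaultdict(Counter)
    for cell, gene, isoform in records:
        counts[(cell, gene)][isoform] += 1

    result = {}
    for key, isoform_counts in counts.items():
        total = sum(isoform_counts.values())
        if total <= min_exclusive:
            continue
        # The major isoform is defined per gene and cell, so it may
        # differ between cells for the same gene.
        isoform, n = isoform_counts.most_common(1)[0]
        result[key] = (isoform, 100.0 * n / total)
    return result
```

With 8 transcripts of one isoform and 4 of another for a gene in a given cell, the major isoform accounts for 8 of 12 transcripts, i.e. about 66.7%.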
Figure S7 Zoom-in on coding and non-coding exon splice sites. Histogram showing the offset from median for splice sites, in terms of percentage of zero offset. The data for coding and non-coding regions are the same as in Figure 3A, but zoomed in. The data for within-transcript variation are the same as in Additional file 1: Figure S3, but zoomed in. Note that the Y-axis is truncated and that the X-axis is shown in bins of 10 bp outside the region of ± 10 bp.
[Figure S7 graphic. Transcript schematic with 5 prime and 3 prime ends and junction positions a to f. Panels: Position "b", Position "c", Position "d" and Position "e", each with rows Coding, Non-coding and Within UMI variation. X-axis: offset in single bp within ±10 bp and in bins of 10 bp beyond, from <−100 to >100; Y-axis truncated, percentage of reads. Counts printed in the panels: 60558, 2595, 119793, 24403, 9592, 58828, 12075, 3455, 4325 and 972.]
Figure S8: Example ideal read >m150922_014300_42134_c100884922550000001823198604021672_s1_p0/3975/ccs GATCTCTACTATATGCGAATGATACGGCGACCACCGATGGTATAGGGATCAGTAGATTTGAACAGAGCCAGAGGTACCATGCTCTTGCCAGTAAAATTCTTTTCCCATGTTTTTGAATACATTTGTATTTAATGTGGCTGAAATGACAAAAACAAAGGTCCTAAATTTTAGAATAGATGTTATCTTTTTATTTAATCTTTTAAAAAAAAAACTGGATGTTTCAATCTTTTAAATTGCAAGACACAAGGCAATTCCAACTGGGCATTTCAATCTGGTTTTAGGTGCTTTAGAGATAATTGCTAGGCAGTGCCAAATGAAGGCATTAGTGCTTTGTACCCAGAAATTGGCCTCTACATGCAGCACTAAGATTAATGCAGATTTCTCTTGAACCTTCCAGTCTTCCTTGTTGGTGGTAATGTGCTTTGTTCTTTCCTTTTTTTCCTCAAGAAGAAATGTTAGTATGTCTTAGAACCTAGGTAACGAAGTGCACTTATGCTCATAAAATTGCTTGCACCTAATCCAAAAGTCGTGGTCTTGCCTTAGTCCATTAACATGGTCTTCATTTCCAGTTTGCATTGACTGCTACAGCACTTACTAATTTGGCTTTTATGTTTAAATGGTTCTGTGATAATCAAAACTTAAATAGTTACTGTTCGTTCATTGTGGTACTTTTGTATTCAGTTCTGTACTCAGAGATGGCACCAATTGTTCATTTACTATGCACCACCTACTGCTTATATAAGAGAACTAATAACTCTTTCAGCCGATACTTAACACATCATGTATTCCTTGCAGCTTCTAGGACATCTTCCATACATGTATGGATGTGTACATACATATACATGTGTACATACTGGTATGGCCCAGCACATATTTCTTAAGCTCTGCAGCCATATATAGAAGTCCTTGTTATTTGGGAGGACAGATTAAATGCCTTTCATTCACAAGAAAATGAAGAGCTCTCTGGGTTCCAAAATGCTGATGTTGAATACATGATATATATGCACACATATAGGCACTTGCATGTTAGCATGTTCATGTCTGTGACTGTCTTTTGAGTTATATAGGAGTGGCAGAGATTGTGTCACTTTAACTTGTTGTCGACTTAGCCTGAAAACCTTACTAGTTCAGAATCATACTCCTGACTGTCTTCTGGAGCAACAGGGAACTGCATGGCCAACTTAAAGTACTTGTTATAAATGAACCAGAAATCTGATTTTAAAATACATTTTTCATTTTAATTGTTTTTGCATTTTGTTTTGTTAATCCTTAAAGGATGGATTGTGCTTTTAAGAATGTAAGTGTTATGTATTTGTCATGAATCAACAAAACAGCTTTTAAAAAAGACTAATTGGAACAAGTGGGTGGCACTGGCTTAATGCTGTTAATGTTTCTAAACGATTTTACATTTAGATTATATATCGGATTCATATTGAGATACCTCCAAAGCACTGTTTTATAGAAAGATCTGACTTCAATAAATATTTTTGTATTTTACATGGGCCAGTTTATGTACCGCTATTTACACTATTATTTCCTATAGACAATACACACTGTCTTTGTAGTTTTTGTATTTTTGTTTTGTTTTACTGGTTTTGCTTCTTGATGGTAATATACTCTGTCTGGTGTGGGATATTTTCCACACTTTAGAATTTGTATAAGAAACTGGTCCATGTAAGTACTTTCCATGTTTTCTCTTCAAATGTTTAAAGTGCTAGCTGATGTACGTACGATATCTCTTCTCAGATATTTGCCTGTCTGTTTGCCCAAAAATTGCTTCTAAATCAATAAAGATTCTTTTATTTCTTAAGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCGGTGGTCGCCGTATCATTCGCATATAGTAGAGA Illumina 
adaptor PacBio barcodes – start PacBio barcodes – end UMI Site of template switching PolyA / PolyT
Figure S9: Example concatenated read >m150920_000225_42134_c100885002550000001823198604021652_s1_p0/1024/ccs TACTAGAGTAGCACTCGAATGATACGGCGACCACCGATGGAAGAGGGTGTCCTGACTTCCTTTGATGATGAACAGTGATGTGAAAGTGTAAGCCAAACCTTTTCCTCCTCAACTTCCTTTTGGTCATGGTTTTTCGTCACAGCAATAGAAACCCTAATTAGGATATATCCCTGAAACTGGAGTTACAGAAGCTTGTGAGTGGCCATGTGGTGCCAGGAATACAACCCAGCTCCTCCAGAAGGGCAGCCAGTGCACTTAACCACGGAGCCATCTCTCCAGCCCTAGTTAACCTTTAAAAGTAATACATTACTATTTACAAATTAGTATATGTCAAACATTTTCTTCCTCATGTTGAAATGAAATTCTATTCTTCCTGTGATCTTGTTTGTGTATTCTTTGGATGTTATAACTTGCCAAAAATCCTTGATATTATAGCTTGCCAATGTTTGCTGACAAATGTAATTCTTTAGAAAAGTTTGTTTTTCATAAACATATTTAGTTTTATTGTAATATTTATCTGACCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCGGTGGTCGCCGTATCATTCGAGTGCTACTCTAGTATACTAGAGTAGCACTCGAATGATACGGCGACCACCGATCTTGCAGGGACCGTCGGGGCTTCCTCGACGAGGCCGTTCGGAAGGTCTCCTGCTCCGTCTCGAGAGCTGCTTTCTCCTTCCGCACACGCTACCCGGCTGCTGCGGCCCCAGAACGCCCGGGTGAGGAGTTGGTTGTAGTGAGCAGTTCCGATCCCTTGGGCTACCGGCGGCGAGCGCCCGAGCCGCTCCTCCCAATGGCGAAGAAGACGTACGACCTGCTTTTCAAGCTGCTCCTGATCGGGACTCGGGAGTGGGCAAGACCTGCGTCCTTTTTCGTTTTTCGGACGATGCCTTCAATACCACCTTTATTTCCACCATAGGAATAGACTTTAAGATCAAAACAGTGGAACTACAAGGAAAGAAGATCAAGCTACAGATATGGGACACAGCAGGCCAGGAGCGATTTCACACCATCACAACCTCCTACTACAGAGGAGCAATGGGCATCATGCTAGTGTATGACATCACCAACGGTAAAAGCTTTGAGAACATCAGCAAGTGGCTTAGAAACATAGATGAGCATGCCAATGAAGATGTGGAAAGAATGTTACTAGGGAACAAGTGTGACATGGACGACAAGAGAGTTGTACCGAAAGGCAAAGGAGAACAGATTGCAAGGGAGCATGGTATTAGGTTTTTGAGACTAGTGCAAAAGCAAATATAAACATCGAAAAGGCGTTCTCACATTAGCTGAAGACATCCTCCGAAAGACCCCTGTAAAAGAACCCAACAGTGAAAACGTAGATATCAGCAGTGGAGGAGGCGTGACGGGCTGGAAGAGCAAGTGCTGCTGAGTGCTCTCCTGTCCATCTGCTGCCATCCACCATCCGGTTCTCTTCTTGCTGCAAAATAAAACACTCTGTCCATTTTTAACTCTAAACAGATATTTTTGTTTCTCATCTTAACTATTCAATCCACCTATTTTATTTGTTCTTTCATCTGTGACTGCTTGCGGACTATTATAATTTTCTTCAAACAAACAAACAAAAATGTATAGAGAAATCATGTCTGTGAGTTCATTTTGAGATTTACTTGCTCACTCAGCCCTGCACTTCAGTTGTATTATAGTCCAGTTCTTATCAACATTAAACTAGAGCAATCATTTTCAACTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATC
GGTGGTCGCCGTATCATTCGAGTGCTACTCTAGTATCAGACGATGCGTCATGAATGATACGGCGACCACCGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTCGGCTTCCTACTACCTGTGCCTTTAAGTACTTTTCTGTCTTCTAAAGATGCTTTGTCTTCTATTTTCCTATCTTCTCTAACAGCTTTTTCTTCACCTTCCTGTATGTCTTCTTCGTCTTTCACTTCCATTTTCTAATCTTCTTTCTCCAAATACTAGCCTAACCTTTTTGTTTTTTCAAGACAGGGTTTCTCTGTGTAGCCCTGGCTGTCCTGGAACTCACTCTGTAGACCAGGCTGGCCTCGAACTCAGAAATCCGCCTGCCTCTGCCTCCCAAGTGCTAGGATTAAAGGCGTGCGCCACCATCGCCCGGCTAGCCTAACACTTCCATCTTCCCTGTTTCCAACTCATGCCTTCAAACGTACATGCGCCAGTCGAAGGATTTTTATAACGGCCGTCAACTTAACCTACTTTTCTCAATCTTTTAAAACTACACCTCATTTTCAAACTATTCTCTACAACTTTACTGTTTTAAATGCCACTTAGATTCTATTTAGGTAACCTTCGTTTTAATCTACAAGGCCGACCTTCAAACTAGAACCTTTTAGAACTTCACAAAACCTCCCTTTACAATCTCCTAAACTGCTCTGGTCAGCCTCCATTATACAGTACAAATACTTTTCTCCTCATTTCCTTTTTCAAACTGTTTAAATAAGCCTTCATGTTATCTCTTAAGATCTTCTTAGATTATTAAGACTAAGTAATATTTCGATAAGCTACTCTATTAGCTTAAGTTTAGAGTTCTAATTCTTTTTACTGCTCAATCTTTTTAATTAAAAACTTATCTGCGATTTCCTCGGGCTGAGTCTCCTGCCTCACGAGCTCAGCTGTGCTGCTCTACGCTGCTCTGCTCTCGCTGCCTGAATGCCTCCCAACCACATCGCCCTCCAATATCGGTGGTCGCCGTATCATTCATGACGCATCGTCTGA Illumina adaptor PacBio barcodes – start PacBio barcodes – end UMI Site of template switching PolyA / PolyT
Figure S10: List of custom oligonucleotides (5ʹ to 3ʹ direction):
Idx1: TCAGACGATGCGTCATGAATGATACGGCGACCACCGAT
Idx2: CTATACATGACTCTGCGAATGATACGGCGACCACCGAT
Idx3: TACTAGAGTAGCACTCGAATGATACGGCGACCACCGAT
Idx4: TGTGTATCAGTACATGGAATGATACGGCGACCACCGAT
Idx5: ACACGCATGACACACTGAATGATACGGCGACCACCGAT
Idx6: GATCTCTACTATATGCGAATGATACGGCGACCACCGAT
C1-P1-T31: Bio-AATGATACGGCGACCACCGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
C1-P1-RNA-TSO: Bio-AAUGAUACGGCGACCACCGAUNNNNNGGG (RNA)
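The read layouts in Figures S8 and S9 can be related to the oligonucleotides in Figure S10 with a short Python sketch. It looks for an Idx oligonucleotide at the start of a read and counts occurrences of the shared adaptor sequence in both orientations. An ideal read is expected to carry the adaptor once at each end, so more than two occurrences is used here as a rough indicator of a concatenated read. The exact-match strategy, the threshold and all function names are assumptions made for illustration; real PacBio reads contain errors and truncated ends, so a practical implementation would need tolerant matching.

```python
# Idx oligonucleotides as listed in Figure S10 (5' to 3').
OLIGOS = {
    "Idx1": "TCAGACGATGCGTCATGAATGATACGGCGACCACCGAT",
    "Idx2": "CTATACATGACTCTGCGAATGATACGGCGACCACCGAT",
    "Idx3": "TACTAGAGTAGCACTCGAATGATACGGCGACCACCGAT",
    "Idx4": "TGTGTATCAGTACATGGAATGATACGGCGACCACCGAT",
    "Idx5": "ACACGCATGACACACTGAATGATACGGCGACCACCGAT",
    "Idx6": "GATCTCTACTATATGCGAATGATACGGCGACCACCGAT",
}

# Adaptor sequence shared by the Idx oligonucleotides and C1-P1-T31.
ADAPTOR = "AATGATACGGCGACCACCGAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_barcode(read):
    """Name of the Idx oligonucleotide the read starts with, or None.

    Exact matching only (simplifying assumption).
    """
    for name, oligo in OLIGOS.items():
        if read.startswith(oligo):
            return name
    return None


def count_adaptors(read):
    """Number of adaptor occurrences in forward or reverse orientation."""
    return read.count(ADAPTOR) + read.count(reverse_complement(ADAPTOR))


def looks_concatenated(read):
    """Heuristic: more than two adaptor copies suggests a concatemer."""
    return count_adaptors(read) > 2
```

Applied to a synthetic read built as Idx6, a short insert and the reverse complement of Idx6, `find_barcode` returns "Idx6" and `count_adaptors` returns 2. Joining two such reads end to end gives 4 adaptor copies and is flagged by `looks_concatenated`.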