6 Month Allelic Series RNAseq QC


Page 1: 6 Month Allelic Series  RNAseq  QC

6 Month Allelic Series RNAseq QC


Page 2: 6 Month Allelic Series  RNAseq  QC


QC summary

QC was performed on all 192 samples, focusing on identifying failed or outlier samples. Four samples are recommended for omission from the final analysis dataset based on evidence of RNA degradation, PCA, and model-based gene outlier detection. Those four samples are listed on slide 20.

Additionally, two correctable issues were identified. First, one flowcell worth of samples was run an additional time to bring read depth up to the 100 million required. This re-run was inadvertently run as 75-mers instead of 50-mers, so those samples are a mix of read lengths. Second, for a subset of cortex samples (Q92 and Q140) there appears to be an infinitesimal but detectable amount of liver tissue. The overall dilution is 500-1000x, but given the extraordinary sensitivity of RNAseq this is still measurable. We have recommended a simple filter to remove those liver transcripts, based on the fact that they have a recognizable correlation pattern (the filter is listed on slide 28), but other methods may be more sensitive.

Page 3: 6 Month Allelic Series  RNAseq  QC


How does CHDI QC RNAseq data in general?

• Mostly we’re looking for outliers
• Also showing the overall experiment worked
• When we find outliers, we try to determine the cause
– That helps show it is an outlier and not part of the biology
• Methods
– Principal Components Analysis
– RNA degradation plots
– Paired end insert size
– Read lengths
– Read mapping efficiency
– Repetitive sequences and their origin
– Highly expressed genes
– # gene outliers

[Figure: PCA plot; x-axis: Comp.1 [17.40%], y-axis: Comp.2 [6.77%]]

Page 4: 6 Month Allelic Series  RNAseq  QC

PCA whole dataset

[Figure: PCA plot of all samples. Color by Tissue: cortex, Liver, striatum. Shape by Sex: F, M.]

• Not surprisingly, tissues cluster
• Strong sex effect in liver
• Cortex is tightly clustered
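For readers who want to reproduce this kind of plot, here is a minimal sketch of how first-component PCA scores could be computed. It is not the pipeline used for these slides: the function name `first_principal_component`, the log2(x + 1) transform, the use of power iteration, and the toy values are all assumptions made for illustration. In practice a numerical library would normally be used instead.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (scores, explained_fraction) for the first principal component.

    rows: one list of expression values per sample (same gene order in each).
    Assumed preprocessing: log2(x + 1), then centering each gene.
    Power iteration is run on the sample-by-sample Gram matrix.
    """
    logged = [[math.log2(v + 1.0) for v in row] for row in rows]
    n, g = len(logged), len(logged[0])
    means = [sum(row[j] for row in logged) / n for j in range(g)]
    centered = [[row[j] - means[j] for j in range(g)] for row in logged]
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]

    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]

    # Rayleigh quotient gives the largest eigenvalue (squared singular value).
    eigenvalue = sum(vec[i] * sum(gram[i][k] * vec[k] for k in range(n))
                     for i in range(n))
    total = sum(gram[i][i] for i in range(n))
    scores = [x * math.sqrt(eigenvalue) for x in vec]
    return scores, eigenvalue / total

# Toy data: three samples of one "tissue", three of another (made-up values).
samples = [
    [500, 480, 10, 12, 100],
    [520, 510, 11, 9, 95],
    [490, 505, 9, 10, 105],
    [12, 10, 400, 420, 98],
    [9, 11, 410, 390, 102],
    [10, 13, 395, 405, 100],
]
scores, explained = first_principal_component(samples)
```

With toy data like this, the two groups land on opposite sides of the first component, which is the same behavior as the tissue clusters on the slide.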

Page 5: 6 Month Allelic Series  RNAseq  QC

PCA striatum

[Figure: PCA plot of striatum samples. Color by Q Length: Q111, Q140, Q175, Q20, Q80, Q92. Shape by Genotype: HET.]

• Q lengths cluster, good sign the design worked
• Q92, 111, 140, 175 uniquely cluster
• They even stagger in Q length order
• Couple potential outliers (in red outline)

30811_718L_striatum_Q80_HET_F_L5.LB14_1.clipped

35481_833L_striatum_Q92_HET_M_L3.LB10_1.clipped

Page 6: 6 Month Allelic Series  RNAseq  QC

PCA cortex

[Figure: PCA plot of cortex samples. Color by Q Length: Q111, Q140, Q175, Q20, Q80, Q92.]

• Only Q175 stands outside the main cluster
• Possible Q175 outliers, but hard to be certain

20947_452L_cortex_Q175_HET_M_L8.LB2_1.clipped

21051_460L_cortex_Q175_HET_F_L3.LB6_1.clipped

Page 7: 6 Month Allelic Series  RNAseq  QC

PCA liver

[Figure: PCA plot of liver samples. Color by Q Length: Q111, Q140, Q175, Q20, Q80, Q92. Shape by Sex: F, M.]

• Strong sex clustering will need to be accounted for
• No strong Q clusters (sex masking?)
• One potential outlier

450_Liver_Q175_HET_M_L8.LB1_1.clipped

Page 8: 6 Month Allelic Series  RNAseq  QC


Duplication in brain (representative examples)

[Figure: duplication-level histograms for representative brain samples, one panel per FASTQ file (20914_1_449L_striatum_Q175_WT_M_L1.LB1_1.clipped.fastq through 21018_1_457L_striatum_Q175_WT_F_L1.LB9_1.clipped.fastq); x-axis: Duplication Level (1 to 10+), y-axis: Percentage]

• Duplication is consistent and hovers between 13-24%
• No red flags
• Higher in striatum than cortex generally
• Origin of the majority of the duplicated sequences is mitochondrial
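As a rough illustration of what a duplication-level chart summarizes, the sketch below bins distinct read sequences by how many copies were seen, capping at "10+". The slides do not state exactly how their percentages were computed, so the function names, the exact-match definition of a duplicate, and the toy reads are assumptions.

```python
from collections import Counter

def duplication_levels(reads, cap=10):
    """Percentage of distinct sequences seen 1, 2, ..., cap-or-more times."""
    copies = Counter(reads)  # sequence -> number of copies
    levels = Counter(min(c, cap) for c in copies.values())
    distinct = len(copies)
    return {lvl: 100.0 * levels.get(lvl, 0) / distinct
            for lvl in range(1, cap + 1)}

def duplicated_read_rate(reads):
    """Percentage of reads that repeat an already-seen sequence."""
    return 100.0 * (len(reads) - len(set(reads))) / len(reads)

# Toy reads (made up): 5 distinct sequences among 8 reads.
reads = ["ACGT", "ACGT", "TTGA", "GGCC", "GGCC", "GGCC", "CATG", "AACC"]
levels = duplication_levels(reads)
rate = duplicated_read_rate(reads)
```

A real run would stream reads from the FASTQ files rather than hold them in a list; the list keeps the sketch short.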

Page 9: 6 Month Allelic Series  RNAseq  QC

Liver duplication (representative examples)

[Figure: duplication-level histograms for representative liver samples, one panel per FASTQ file (520_Liver_Q111_HET_M_L8.LB12_1.clipped.fastq through 656_Liver_Q20_HET_F_L6.LB27_1.clipped.fastq); x-axis: Duplication Level (1 to 10+), y-axis: Percentage]

• Liver duplication is much higher, 40-50%

• Major duplicated sequences are all mouse major urinary proteins (Mup1-21)

• Hurts our true read depth, but nothing terrible

• Should keep in mind for future liver work

Page 10: 6 Month Allelic Series  RNAseq  QC

5’ -> 3’ degradation charts (representative examples)

[Figure: 5’ -> 3’ degradation curves, one panel per sample (20914_449L_striatum_Q175_WT_M_L1.LB1_1.clipped through 23346_513L_striatum_Q111_WT_M_L4.LB18_1.clipped); x-axis: Bin1 to Bin86, y-axis: 0 to 1. Color by TranscriptBin: 1-499, 500-999, 1000-1999, 2000-2999, 3000-3999, 4000-4999, 5000+.]

Displays the likelihood of getting full length transcripts for various mRNA lengths
• Very high quality samples in general
• Most samples show >70% of all mRNA molecules are >80% complete
• Liver on average more degraded
• Some samples have degradation in the longer mRNA species (one marked in red)

Page 11: 6 Month Allelic Series  RNAseq  QC

Suspect samples by 5’ -> 3’ degradation

454_Liver_Q175_HET_M_L3._1.clipped
456_Liver_Q175_HET_M_L7.LB4_1.clipped
20947_452L_cortex_Q175_HET_M_L8.LB2_1.clipped
845_Liver_Q92_HET_F_L6.LB25_1.clipped
522_Liver_Q111_HET_F_L8.LB13_1.clipped
452_Liver_Q175_HET_M_L1.LB2_1.clipped
776_Liver_Q140_HET_F_L8.LB13_1.clipped
716_Liver_Q80_HET_F_L7.LB6_1.clipped
843_Liver_Q92_HET_F_L6.LB23_1.clipped
21051_460L_cortex_Q175_HET_F_L3.LB6_1.clipped

Page 12: 6 Month Allelic Series  RNAseq  QC

GC content per read has a red flag

[Figure: per-read GC histograms, one panel per FASTQ file (20914_1_449L_striatum_Q175_WT_M_L1.LB1_1.clipped.fastq through 21018_1_457L_striatum_Q175_WT_F_L1.LB9_1.clipped.fastq); x-axis: GC#, y-axis: Count]

8 of the samples have a “shoulder” in the GC# chart
This is usually a really bad thing
• Suggests non-mouse or non-biological sequence
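A minimal sketch of the per-read GC histogram behind a chart like this, assuming the x-axis (GC#) is the number of G or C bases in each read rather than a percentage. The function name and toy reads are illustrative. Note that a raw count depends on read length, so longer reads mixed into a file can shift part of the distribution to the right.

```python
from collections import Counter

def gc_count_histogram(reads):
    """Map number of G/C bases in a read -> number of reads with that count."""
    return Counter(sum(base in "GC" for base in read.upper()) for read in reads)

# Toy reads (made up), including one lower-case read.
reads = ["ACGTACGT", "GGGGCCCC", "ATATATAT", "GCGCATAT", "acgtacgt"]
hist = gc_count_histogram(reads)
```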

Page 13: 6 Month Allelic Series  RNAseq  QC

Those same samples flag for read length as well

[Figure: read-length histograms, one panel per FASTQ file (20914_1_449L_striatum_Q175_WT_M_L1.LB1_1.clipped.fastq through 21018_1_457L_striatum_Q175_WT_F_L1.LB9_1.clipped.fastq); x-axis: SequenceLength, y-axis: Count]

Those same samples have a mix of 50mer reads and 75mer reads
That’s very odd

At this point we asked our sequencing lab for clarification on what happened

Page 14: 6 Month Allelic Series  RNAseq  QC

Our sequencing partner found the cause

The 8 suspect samples

264401  20921_1_450L_cortex_Q175_HET_M   20130523  V02604  VIRT  2
264416  23535_1_528L_cortex_Q111_HET_F   20130523  V02604  VIRT  3
264418  28243_1_644L_cortex_Q20_HET_M    20130523  V02604  VIRT  4
264397  35624_1_844L_striatum_Q92_WT_F   20130523  V02604  VIRT  1
264447  35631_1_845L_cortex_Q92_HET_F    20130604  V02761  VIRT  1
264448  35657_1_847L_cortex_Q92_HET_F    20130604  V02761  VIRT  2
264451  454_Liver_Q175_HET_M             20130604  V02761  VIRT  3
264455  462_Liver_Q175_HET_F             20130604  V02761  VIRT  4

For these 8 samples, the initial run didn’t get a full 100 million reads.  When that happens the lab runs the samples again and then merges the run into a full “virtual run” of the full read depth we paid for.  That’s all good.  The strange thing that happened to us this time was that the run they added our 8 samples to (they add it to ongoing flow cells) happened to be a 75mer run.  Again no big problem usually, and what they do is clip off 25 bases in their processing and all is compatible.  This specific time they forgot to trim, so we saw the ugly intermediate state. What this means is that the data for those 8 are fine.  They are longer, but still good reads from our samples.  
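If uniform read lengths are wanted downstream, the missing trim step can be sketched as below. This is only an illustration: the slide says the lab normally clips 25 bases but does not say from which end, so clipping from the 3' end is an assumption, as are the function name and the toy records.

```python
def trim_fastq_records(lines, length=50):
    """Yield FASTQ lines with sequence and quality clipped to `length` bases.

    Assumes standard 4-line records and clips from the 3' end (an assumption).
    """
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        yield from (header, seq[:length], plus, qual[:length])

# Toy records (made up): one 75-mer and one 50-mer.
records = [
    "@read1", "A" * 75, "+", "I" * 75,
    "@read2", "C" * 50, "+", "I" * 50,
]
trimmed = list(trim_fastq_records(records))
```

Reads that are already 50 bases pass through unchanged, so the same function can be applied to the whole merged file.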

Page 15: 6 Month Allelic Series  RNAseq  QC

Mitochondrial rate in brain

8-9% of the reads are mtRNA
Nothing surprising there

Page 16: 6 Month Allelic Series  RNAseq  QC

Mitochondrial rate in liver

6-7% of reads are mtRNA
Again in line with expectations

Page 17: 6 Month Allelic Series  RNAseq  QC

Other QC parameters that looked great

• Insert sizes: All right around 175 as expected
• Sense/antisense sequence ratio: 1:1 as expected
• Sequence coverage
– 40% of mouse transcriptome detected in brain
– About 30% of mouse transcriptome detected in liver
• Mapped read rate in the upper 90s
– 98% for brain, 96% for liver
• 95-97% of our reads are mapped to known genes
– 3-5% intergenic regions

Page 18: 6 Month Allelic Series  RNAseq  QC


Model based outlier detection

[Figure: per-sample bar chart for model-based outlier detection; x-axis: samples, y-axis: 0 to 8000]

Method by which we look for the number of genes that are outliers after accounting for our modeled effects
• 2 samples stand out, and an additional 4-6 are suspect, but probably OK (Q92 Het males)

30811_718L_striatum_Q80_HET_F_L5.LB14_1.clipped

35481_833L_striatum_Q92_HET_M_L3.LB10_1.clipped
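The slides do not spell out the model, so the following is only a sketch of the general idea: fit an expected value per gene from the design, then count per sample how many genes fall far from that expectation. Here the "modeled effect" is simply the group median, and the spread is a MAD-based robust estimate; the function name, the threshold of 5, and the toy values are all assumptions.

```python
from statistics import median

def outlier_gene_counts(expr, groups, threshold=5.0):
    """Count, per sample, genes whose residual from the group median is extreme.

    expr: {gene: [value per sample]}; groups: [group label per sample].
    """
    counts = [0] * len(groups)
    for values in expr.values():
        fitted = {g: median(v for v, gg in zip(values, groups) if gg == g)
                  for g in set(groups)}
        resid = [v - fitted[g] for v, g in zip(values, groups)]
        scale = 1.4826 * median(abs(r) for r in resid)  # robust SD from MAD
        if scale == 0:
            continue  # no spread to judge against
        for i, r in enumerate(resid):
            if abs(r) > threshold * scale:
                counts[i] += 1
    return counts

# Toy logged expression (made up): the last sample misbehaves in two genes.
groups = ["WT", "WT", "WT", "WT", "HET", "HET", "HET", "HET"]
expr = {
    "gene1": [5.0, 5.1, 4.9, 5.0, 7.0, 7.1, 6.9, 9.5],
    "gene2": [3.0, 3.2, 2.8, 3.1, 3.0, 2.9, 3.1, 1.0],
    "gene3": [8.0, 8.1, 7.9, 8.0, 8.2, 8.1, 8.3, 8.2],
}
counts = outlier_gene_counts(expr, groups)
```

Plotting `counts` per sample gives a bar chart of the same shape as the one on this slide, where a few tall bars mark the samples worth a closer look.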

Page 19: 6 Month Allelic Series  RNAseq  QC


Integrating the sample QC to choose omissions

A very simple way to determine what to throw out is to look for multiple strikes against a sample

5’ -> 3’ charts
454_Liver_Q175_HET_M_L3._1.clipped
456_Liver_Q175_HET_M_L7.LB4_1.clipped
20947_452L_cortex_Q175_HET_M_L8.LB2_1.clipped
845_Liver_Q92_HET_F_L6.LB25_1.clipped
522_Liver_Q111_HET_F_L8.LB13_1.clipped
452_Liver_Q175_HET_M_L1.LB2_1.clipped
776_Liver_Q140_HET_F_L8.LB13_1.clipped
716_Liver_Q80_HET_F_L7.LB6_1.clipped
843_Liver_Q92_HET_F_L6.LB23_1.clipped
21051_460L_cortex_Q175_HET_F_L3.LB6_1.clipped

PCA outliers
30811_718L_striatum_Q80_HET_F_L5.LB14_1.clipped
35481_833L_striatum_Q92_HET_M_L3.LB10_1.clipped
20947_452L_cortex_Q175_HET_M_L8.LB2_1.clipped
21051_460L_cortex_Q175_HET_F_L3.LB6_1.clipped
450_Liver_Q175_HET_M_L8.LB1_1.clipped

Model based outliers
30811_718L_striatum_Q80_HET_F_L5.LB14_1.clipped
35481_833L_striatum_Q92_HET_M_L3.LB10_1.clipped
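The "multiple strikes" rule can be written down in a few lines. The sketch below uses the three sample lists from this slide and treats "multiple" as two or more flags, which is an assumption about the cutoff; the variable names are illustrative.

```python
from collections import Counter

degradation = [
    "454_Liver_Q175_HET_M_L3._1.clipped",
    "456_Liver_Q175_HET_M_L7.LB4_1.clipped",
    "20947_452L_cortex_Q175_HET_M_L8.LB2_1.clipped",
    "845_Liver_Q92_HET_F_L6.LB25_1.clipped",
    "522_Liver_Q111_HET_F_L8.LB13_1.clipped",
    "452_Liver_Q175_HET_M_L1.LB2_1.clipped",
    "776_Liver_Q140_HET_F_L8.LB13_1.clipped",
    "716_Liver_Q80_HET_F_L7.LB6_1.clipped",
    "843_Liver_Q92_HET_F_L6.LB23_1.clipped",
    "21051_460L_cortex_Q175_HET_F_L3.LB6_1.clipped",
]
pca_outliers = [
    "30811_718L_striatum_Q80_HET_F_L5.LB14_1.clipped",
    "35481_833L_striatum_Q92_HET_M_L3.LB10_1.clipped",
    "20947_452L_cortex_Q175_HET_M_L8.LB2_1.clipped",
    "21051_460L_cortex_Q175_HET_F_L3.LB6_1.clipped",
    "450_Liver_Q175_HET_M_L8.LB1_1.clipped",
]
model_outliers = [
    "30811_718L_striatum_Q80_HET_F_L5.LB14_1.clipped",
    "35481_833L_striatum_Q92_HET_M_L3.LB10_1.clipped",
]

# One strike per QC method that flagged the sample.
strikes = Counter()
for flagged in (degradation, pca_outliers, model_outliers):
    strikes.update(set(flagged))

# Assumed cutoff: a sample is omitted when at least two methods flag it.
omit = sorted(sample for sample, n in strikes.items() if n >= 2)
```

With these lists, four samples collect two strikes each: two striatum samples (PCA plus model-based) and two cortex samples (degradation plus PCA).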

Page 20: 6 Month Allelic Series  RNAseq  QC

Final list of proposed samples for omission

30811_718L_striatum_Q80_HET_F_L5.LB14_1.clipped
35481_833L_striatum_Q92_HET_M_L3.LB10_1.clipped
20947_452L_cortex_Q175_HET_M_L8.LB2_1.clipped
21051_460L_cortex_Q175_HET_F_L3.LB6_1.clipped

Page 21: 6 Month Allelic Series  RNAseq  QC

Liver contamination in cortex Q140 and Q92?

While the sequencing lab was looking into the 75mer issue, I ran cortex through some basic statistical modeling (omitting the samples mentioned previously).

I found changes, but the pattern and biology were all wrong.

Page 22: 6 Month Allelic Series  RNAseq  QC

Albumin is the top hit?
Isn’t Albumin liver specific?

[Figure: Alb (ENSMUSG00000029368); y-axis: Logged FPKM, 0 to 8. Color by Q Length: Q111, Q140, Q175, Q20, Q80, Q92.]

Every single change is an increase

Completely off in Q175, 111, 80, and 20
On (but not that strongly) in 140 and 92

It makes no sense for Q111 to be skipped and for Q175 to go back to normal

Page 23: 6 Month Allelic Series  RNAseq  QC

Some of the other changed genes are suspicious

• Albumin
• ApoC3, C1
• Mup3, 10, 18, 19
• FABP1
• Urate oxidase

All reasonably solid liver markers

DAVID functional annotation also suggests the altered genes are liver related (p < 10^-5)

Page 24: 6 Month Allelic Series  RNAseq  QC

[Figure: scatter plot with log-scaled axes; x-axis: 33030_768L_cortex_Q140_HET_M_L4.LB9_1.clipped, y-axis: 768_Liver_Q140_HET_M_L3.LB9_1.clipped]

A subset of genes with good correlation between liver and cortex but shifted from the 1:1 axis

Page 25: 6 Month Allelic Series  RNAseq  QC

[Figure: scatter plot with log-scaled axes; x-axis: 33030_768L_cortex_Q140_HET_M_L4.LB9_1.clipped, y-axis: 768_Liver_Q140_HET_M_L3.LB9_1.clipped]

Same chart with the “significant” genes in red

Page 26: 6 Month Allelic Series  RNAseq  QC

[Figure: scatter plot with log-scaled axes; x-axis: 23353_514L_cortex_Q111_HET_M_L1.LB9_1.clipped, y-axis: 514_Liver_Q111_HET_M_L8.LB9_1.clipped]

Same chart and shading in Q111, notice the lack of linear correlation

Page 27: 6 Month Allelic Series  RNAseq  QC


What we suspect happened

• The basic problem is that liver specific transcripts should not have correlated expression in cortex

• A very small amount of liver contamination has occurred. The shift is 500 to 1000 times lower than normal liver expression

• What this means is only the absolute highest liver expressed genes are detected at all

• The challenge is uniquely identifying the affected genes

          Cortex   Striatum   Liver
Albumin   173      0.01       40979

FPKMs of albumin, which should not exist in brain

Page 28: 6 Month Allelic Series  RNAseq  QC

[Figure: scatter plot with log-scaled axes; x-axis: 33056_770L_cortex_Q140_HET_M_L7.LB10_1.clipped, y-axis: 770_Liver_Q140_HET_M_L2.LB10_1.clipped]

Liver filter created as
• Liver mean count > 2000
• Mean ratio of liver to cortex > 500
• Cortex count > 0

Not a bad first approximation
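The three filter criteria translate directly into code. This is a sketch under stated assumptions: "mean ratio" is read here as the ratio of the liver mean to the cortex mean (it could also mean the mean of per-sample ratios), "cortex count > 0" is applied to the cortex mean, and the function name, gene labels, and counts are made up for illustration.

```python
def liver_filter(liver_counts, cortex_counts, min_liver_mean=2000, min_ratio=500):
    """Return genes that look like liver carry-over in the cortex data.

    liver_counts / cortex_counts: {gene: [count per sample]}.
    """
    flagged = []
    for gene, liver in liver_counts.items():
        cortex = cortex_counts.get(gene, [])
        if not cortex:
            continue
        liver_mean = sum(liver) / len(liver)
        cortex_mean = sum(cortex) / len(cortex)
        if (cortex_mean > 0
                and liver_mean > min_liver_mean
                and liver_mean / cortex_mean > min_ratio):
            flagged.append(gene)
    return flagged

# Hypothetical genes and counts, chosen to exercise each criterion.
liver = {
    "liver_high": [900000, 1100000],
    "shared": [3000, 3400],
    "liver_only": [5000, 7000],
    "low": [100, 300],
}
cortex = {
    "liver_high": [1200, 800],
    "shared": [2900, 3100],
    "liver_only": [0, 0],
    "low": [0, 1],
}
flagged = liver_filter(liver, cortex)
```

Only the gene that is very high in liver and faintly present in cortex is flagged; genes expressed in both tissues, absent from cortex, or low in liver pass through.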

Page 29: 6 Month Allelic Series  RNAseq  QC


Effect of filtering out the liver specific genes from the cortex data

[Figure: bar chart of hits per Q length (Q80, Q92, Q111, Q140, Q175), Hits pre-filter versus Hits post-filter; y-axis: 0 to 90]

Page 30: 6 Month Allelic Series  RNAseq  QC


Summary of QC

• All but 4 of the 192 samples can move forward to the analysis

• A filter to clear out highly expressed liver genes is needed for the cortex Q140 and Q92 sets

• Striatum PCA plots show that CAG length is the single largest global element of variance!