Detecting Copy Number Variation With Short Paired Reads
Department of Computer Science, University of Toronto
Genome Informatics 2009
Paul Medvedev, Marc Fiume, Misko Dzamba,
Tim Smith, Adrian Dalca, Mike Brudno
Copy Number Variants (CNVs)
• Large genomic regions that appear a different number of times in different individuals
• CNVs are associated with a number of diseases
• Input
– reference human genome
– sequenced donor genome
• Output
– CNV annotations in the reference
Previous Approach
Using depth of coverage (DOC):
[Figure: depth-of-coverage example over five reference segments. Values as extracted: Ref 1 1 1 1 1; DOC 0.8 2.3 0.5 0.5 1.7; counts 1 2 1 0 1; regions labeled CNV.]
Campbell et al. 2008; Chiang et al. 2009; Yoon et al. 2009
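A minimal sketch of depth-of-coverage calling, assuming each window's depth has been normalized so that 1.0 corresponds to one copy. The function name and the two cutoffs are hypothetical and are not taken from the talk or from the cited methods, which use more careful statistics.

```python
def call_cnvs_by_depth(ref_copies, doc, loss_ratio=0.6, gain_ratio=1.5):
    """Label each window 'loss', 'gain', or 'normal' from its depth of coverage.

    ref_copies: copy count of each window in the reference (assumed >= 1)
    doc: observed depth per window, normalized so 1.0 means one copy
    loss_ratio, gain_ratio: hypothetical cutoffs chosen for illustration
    """
    calls = []
    for ref, depth in zip(ref_copies, doc):
        ratio = depth / ref  # observed depth relative to what the reference predicts
        if ratio <= loss_ratio:
            calls.append("loss")
        elif ratio >= gain_ratio:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


# Depth values from the slide; the reference has one copy of every window.
ref_copies = [1, 1, 1, 1, 1]
doc = [0.8, 2.3, 0.5, 0.5, 1.7]
calls = call_cnvs_by_depth(ref_copies, doc)
print(calls)
```

Each window is judged on its own here, so a noisy window can flip a call; this is one reason such methods typically work at coarse resolution.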
Our Approach:
• Capture adjacency information about the donor genome in a graph.
• Use these adjacencies together with depth of coverage (DOC)
Donor Graph
Step 1: represent reference adjacencies
Donor Graph
Step 2: represent donor adjacencies
[Figure: reference (Ref) and donor (Donor) segments, with donor adjacencies added to the graph]
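The two steps can be sketched as a small adjacency structure. This assumes the reference has already been cut into numbered segments and that donor adjacencies (for example, ones suggested by paired reads that map inconsistently with the reference) are given as pairs of segment indices. The function name, the segment numbering, and the example adjacencies are hypothetical.

```python
def build_donor_graph(num_segments, donor_adjacencies):
    """Return {segment: set of successor segments} for segments 0..num_segments-1.

    Step 1: connect each segment to the one that follows it in the reference.
    Step 2: add one edge per adjacency observed in the donor.
    """
    graph = {i: set() for i in range(num_segments)}
    # Step 1: reference adjacencies
    for i in range(num_segments - 1):
        graph[i].add(i + 1)
    # Step 2: donor adjacencies
    for src, dst in donor_adjacencies:
        graph[src].add(dst)
    return graph


# Hypothetical donor adjacencies:
#   (2, 1): segment 2 is followed by segment 1 again (a possible tandem duplication)
#   (3, 5): segment 3 is followed directly by segment 5 (a possible deletion of 4)
graph = build_donor_graph(6, [(2, 1), (3, 5)])
for segment, successors in graph.items():
    print(segment, sorted(successors))
```

Any donor genome consistent with these adjacencies is then a walk through this graph, which is what makes the question of choosing a walk well defined.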
Which walk is the donor?
Use depth-of-coverage:
• We find a path that is “most faithful” to the DOC
– using a probabilistic model to score “faithfulness”
– using network flow to find the traversal counts of the walk with the maximum score
[Figure: values as extracted: Ref 1 2 1 1 1 1 1; Path 1 1 1 1 2 2 1; DOC 0.8 2.3 2.6 0.5 1.4 1.7 1.1; region labeled CNV.]
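One way to make "faithfulness" concrete is a per-segment likelihood. The sketch below assumes a Poisson model for read counts, which the slides do not specify, and it only compares two fixed candidate walks instead of searching with network flow as the talk describes. The read counts, candidate traversal counts, `reads_per_copy`, and the `noise` term are all hypothetical.

```python
import math


def poisson_log_pmf(k, mu):
    """Log probability of observing k events under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def walk_log_likelihood(read_counts, traversal_counts, reads_per_copy, noise=0.1):
    """Score how well a walk's traversal counts explain the observed read counts.

    A segment traversed c times is expected to collect c * reads_per_copy reads.
    The small noise term keeps the mean positive for segments traversed 0 times.
    """
    total = 0.0
    for k, c in zip(read_counts, traversal_counts):
        mu = c * reads_per_copy + noise
        total += poisson_log_pmf(k, mu)
    return total


# Hypothetical read counts (the slide's DOC values scaled by 10).
read_counts = [8, 23, 26, 5, 14, 17, 11]
candidates = {
    "walk_a": [1, 1, 1, 1, 1, 1, 1],  # every segment traversed once
    "walk_b": [1, 2, 2, 1, 1, 1, 1],  # second and third segments traversed twice
}
scores = {
    name: walk_log_likelihood(read_counts, counts, reads_per_copy=10)
    for name, counts in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the score is a sum over segments, maximizing it over all walks in the graph can be posed as a flow problem in which the flow through a segment is its traversal count; that search is not implemented here.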
Preliminary Results
• NA18507 individual sampled with Illumina, hg18 reference
• Total of 3730 CNV calls
• 2165 losses, 1565 gains
[Figure: Size Distribution]
[Pie chart; values as extracted: 58%, 6%, 1%, 35%; legend: Just Loss, Both, Just Gain, None]
Preliminary Results
Sensitivity: Kidd et al.’s (2008) LOSS calls (141 calls)
[Pie charts: percentage of Kidd’s calls that overlap one of ours, and the same after randomly shuffling our calls; values as extracted: 88%, 6%, 0%, 6%]
Specificity: Database of Genomic Variants (DGV)
[Pie charts: percent of our calls that overlap with DGV, and the same after randomly shuffling our calls; legend: DGV Loss, DGV Both, DGV Gain, None; values as extracted: 11%, 68%, 9%, 12% and 11%, 7%, 12%, 70%]
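The overlap percentages and the shuffling control on the two preceding slides can be sketched as follows. This assumes calls are half-open (start, end) intervals on a single chromosome and that shuffling means moving each call to a uniformly random position while keeping its length; the talk does not describe its exact shuffling procedure. All coordinates, names, and the chromosome length are hypothetical.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(queries, targets):
    """Fraction of query intervals that overlap at least one target interval."""
    hits = sum(1 for q in queries if any(overlaps(q, t) for t in targets))
    return hits / len(queries)


def shuffle_calls(calls, chrom_length, rng):
    """Place each call at a uniformly random position, preserving its length."""
    shuffled = []
    for start, end in calls:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


ours = [(100, 200), (500, 650), (900, 1000)]                 # hypothetical calls
known = [(150, 180), (640, 700), (2000, 2100), (3000, 3050)]  # hypothetical known set

observed = fraction_overlapping(known, ours)
rng = random.Random(0)  # fixed seed so the control is repeatable
shuffled = shuffle_calls(ours, 10_000, rng)
control = fraction_overlapping(known, shuffled)
print(observed, control)
```

Swapping the arguments gives the other direction: `fraction_overlapping(ours, known)` is the fraction of our calls that overlap the known set. A large gap between the observed value and the shuffled control suggests the overlap is not explained by call density alone.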
Conclusion
• Presented a method for detecting CNVs
• Combines
– depth-of-coverage
– paired-end mapping
• Improves
– compared to paired-end mapping: increased sensitivity in repeating regions (segmental duplications)
– compared to depth-of-coverage methods: better resolution (1 Kb vs. 30 Kb)
• Global optimization approach