Detecting Copy Number Variation With Short Paired Reads

Detecting Copy Number Variation With Short Paired Reads. Department of Computer Science, University of Toronto. Genome Informatics 2009. Paul Medvedev, Marc Fiume, Misko Dzamba, Tim Smith, Adrian Dalca, Mike Brudno.

Transcript of Detecting Copy Number Variation With Short Paired Reads

Page 1: Detecting  Copy Number Variation With Short Paired Reads

Detecting Copy Number Variation With Short Paired Reads

Department of Computer Science, University of Toronto

Genome Informatics 2009

Paul Medvedev, Marc Fiume, Misko Dzamba, Tim Smith, Adrian Dalca, Mike Brudno

Page 2: Detecting  Copy Number Variation With Short Paired Reads

Copy Number Variants (CNVs)

• Large regions that appear a different number of times in different individuals

• CNVs are associated with a number of diseases

• Input
– reference human genome
– sequenced donor genome

• Output
– CNV annotations in the reference

Page 3: Detecting  Copy Number Variation With Short Paired Reads

Previous Approach

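Using depth of coverage:

[Figure: reference segments annotated with per-segment depth of coverage (0.8, 2.3, 0.5, 0.5, 1.7), integer copy counts (1, 2, 1, 0, 1), and reference copy counts (Ref: 1, 1, 1, 1, 1), with CNV regions marked.]

Campbell et al. 2008; Chiang et al. 2009; Yoon et al. 2009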

Our Approach:

• Capture adjacency information about the donor genome in a graph.

• Use these adjacencies together with DOC
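To make the depth-of-coverage idea concrete, here is a minimal sketch, not the method of any of the cited papers. It assumes fixed windows, a known expected read count per copy, and simple rounding; the names `doc_copy_counts`, `call_cnvs`, and `expected_per_copy` are hypothetical.

```python
def doc_copy_counts(read_counts, expected_per_copy):
    """Estimate a copy count per window from depth of coverage (DOC).

    expected_per_copy is an assumed, known number of reads that one
    copy of a window produces.
    """
    ratios = [count / expected_per_copy for count in read_counts]
    # Round to the nearest integer, with halves rounded up.
    counts = [int(ratio + 0.5) for ratio in ratios]
    return ratios, counts


def call_cnvs(donor_counts, ref_counts):
    """Report windows whose estimated donor count differs from the reference."""
    calls = []
    for index, (donor, ref) in enumerate(zip(donor_counts, ref_counts)):
        if donor > ref:
            calls.append((index, "gain"))
        elif donor < ref:
            calls.append((index, "loss"))
    return calls


# Hypothetical read counts for five windows, 50 reads expected per copy.
ratios, counts = doc_copy_counts([40, 115, 25, 2, 85], 50)
print(ratios)                               # normalized DOC per window
print(call_cnvs(counts, [1, 1, 1, 1, 1]))   # windows that differ from the reference
```

A window with a ratio near 0.5 sits on the rounding boundary, which illustrates why DOC on its own can be ambiguous and why extra evidence such as adjacencies may help.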

Page 4: Detecting  Copy Number Variation With Short Paired Reads

Donor Graph

Step 1: represent reference adjacencies
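One way to picture Step 1, as a sketch under simplifying assumptions: cut the reference into segments at a set of breakpoints and join consecutive segments with a reference adjacency edge. The function name `reference_graph`, the half-open coordinates, and the plain directed edges are assumptions for illustration; the slides do not specify the graph representation.

```python
def reference_graph(ref_length, breakpoints):
    """Split [0, ref_length) at the breakpoints and link consecutive segments.

    Returns (segments, edges). Segments are half-open (start, end) pairs;
    each edge is (source_index, target_index, "ref").
    """
    cuts = sorted({b for b in breakpoints if 0 < b < ref_length})
    bounds = [0] + cuts + [ref_length]
    segments = list(zip(bounds[:-1], bounds[1:]))
    # Segment i is followed by segment i + 1 in the reference.
    edges = [(i, i + 1, "ref") for i in range(len(segments) - 1)]
    return segments, edges


segments, edges = reference_graph(100, [30, 60])
print(segments)  # [(0, 30), (30, 60), (60, 100)]
print(edges)     # [(0, 1, 'ref'), (1, 2, 'ref')]
```

Walking the reference edges from the first segment to the last spells out the reference itself, so every segment is traversed exactly once.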

Page 6: Detecting  Copy Number Variation With Short Paired Reads

Donor Graph

Step 2: represent donor adjacencies

[Figure labels: Ref, Donor]
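Step 2 can be sketched in the same spirit. The code below assumes that each donor adjacency is already given as a pair of reference positions (for example, derived from clusters of discordantly mapped read pairs) and ignores strand, so inversions are out of scope. The name `donor_graph` and the `(left_end, right_start)` convention are hypothetical.

```python
from bisect import bisect_left


def donor_graph(ref_length, donor_adjacencies):
    """Build segments, reference edges, and donor edges.

    donor_adjacencies holds (left_end, right_start) pairs: in the donor,
    the base just before reference position left_end is followed directly
    by the base at reference position right_start.
    """
    cuts = {0, ref_length}
    for left_end, right_start in donor_adjacencies:
        if not (0 < left_end <= ref_length and 0 <= right_start < ref_length):
            raise ValueError("adjacency lies outside the reference")
        cuts.update((left_end, right_start))
    bounds = sorted(cuts)
    segments = list(zip(bounds[:-1], bounds[1:]))
    edges = [(i, i + 1, "ref") for i in range(len(segments) - 1)]
    for left_end, right_start in donor_adjacencies:
        source = bisect_left(bounds, left_end) - 1   # segment ending at left_end
        target = bisect_left(bounds, right_start)    # segment starting at right_start
        edges.append((source, target, "donor"))
    return segments, edges


# A deletion of [30, 60) joins the segment before it to the segment after it.
print(donor_graph(100, [(30, 60)])[1])
# A tandem duplication of [30, 60) makes that segment follow itself.
print(donor_graph(100, [(60, 30)])[1])
```

In this simplified picture a deletion shows up as an edge that skips a segment, and a tandem duplication as a self-loop, so different walks through the graph correspond to different candidate donor genomes.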

Page 8: Detecting  Copy Number Variation With Short Paired Reads

Which walk is the donor?

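Use depth-of-coverage:

[Figure: graph segments annotated with depth of coverage (0.8, 2.3, 2.6, 0.5, 1.4, 1.7, 1.1) and two rows of integer counts (1111221, labeled Path; 1 2 1 1 1 1 1, labeled Ref), with a CNV region marked.]

We find a path that is “most faithful” to the DOC
– using a probabilistic model to score “faithfulness”
– using network flow to find the traversal counts of the walk with the maximum score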
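The scoring idea can be illustrated on a toy graph. The sketch below assumes a Poisson model for read counts (the slides only say "probabilistic model") and replaces network flow with a brute-force search over small edge counts, which is feasible only for tiny graphs. All names (`segment_score`, `best_traversal_counts`, `reads_per_copy`, `noise`) are hypothetical.

```python
import math
from itertools import product


def poisson_log_pmf(observed, mean):
    """Log-probability of an observed count under a Poisson with the given mean."""
    return observed * math.log(mean) - mean - math.lgamma(observed + 1)


def segment_score(observed, copies, reads_per_copy, noise=0.01):
    # The noise term keeps the mean positive when a segment has zero copies.
    return poisson_log_pmf(observed, max(copies, noise) * reads_per_copy)


def best_traversal_counts(num_nodes, edges, observed, reads_per_copy, max_count=3):
    """Brute-force stand-in for network flow on a tiny graph.

    Tries every assignment of 0..max_count traversals to each edge, keeps
    those that conserve flow for a walk from node 0 to the last node, and
    returns (score, node_counts, edge_counts) with the highest score.
    Flow conservation alone does not guarantee a single connected walk.
    """
    best = None
    for flow in product(range(max_count + 1), repeat=len(edges)):
        inflow = [0] * num_nodes
        outflow = [0] * num_nodes
        for (source, target), count in zip(edges, flow):
            outflow[source] += count
            inflow[target] += count
        inflow[0] += 1               # the walk enters at the first segment
        outflow[num_nodes - 1] += 1  # and leaves at the last one
        if inflow != outflow:
            continue
        score = sum(segment_score(obs, copies, reads_per_copy)
                    for obs, copies in zip(observed, inflow))
        if best is None or score > best[0]:
            best = (score, inflow, flow)
    return best


# Three segments; the middle one carries a self-loop (candidate duplication).
edges = [(0, 1), (1, 2), (1, 1)]
score, node_counts, edge_counts = best_traversal_counts(3, edges, [48, 103, 52], 50)
print(node_counts, edge_counts)
```

With about 50 reads expected per copy, the middle segment's 103 reads are best explained by traversing it twice, so the self-loop is used once. A flow formulation reaches the same kind of answer without enumerating every assignment.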

Page 9: Detecting  Copy Number Variation With Short Paired Reads

Preliminary Results

• NA18507 individual sampled with Illumina, hg18 reference

• Total of 3730 CNV calls

• 2165 losses, 1565 gains

[Figure: Size Distribution]

Page 10: Detecting  Copy Number Variation With Short Paired Reads

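Preliminary Results

Sensitivity: Kidd et al.’s (2008) LOSS calls (141 calls)

[Figure: two pie charts, "Percentage of Kidd’s calls that overlap one of ours" and the same "After randomly shuffling our calls". Legend: Just Loss, Both, Just Gain, None. Slice labels: 58%, 6%, 1%, 35% on one chart and 88%, 6%, 0%, 6% on the other.]

Specificity: Database of Genomic Variants (DGV)

[Figure: two pie charts, "Percent of our calls that overlap with DGV" and the same "After randomly shuffling our calls". Legend: DGV Loss, DGV Both, DGV Gain, None. Slice labels: 11%, 68%, 9%, 12% on one chart and 11%, 7%, 12%, 70% on the other.]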
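The overlap percentages and the random-shuffle baseline can be computed along the following lines. This is a minimal sketch that assumes a single chromosome, half-open intervals, and any positive overlap counting as a hit; it ignores the loss/gain breakdown shown in the charts. The function names are hypothetical and the intervals in the example are made up.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(queries, targets):
    """Fraction of query intervals that overlap at least one target interval."""
    hits = sum(any(overlaps(q, t) for t in targets) for q in queries)
    return hits / len(queries)


def shuffle_calls(calls, genome_length, rng):
    """Move each call to a uniformly random position, keeping its length."""
    shuffled = []
    for start, end in calls:
        length = end - start
        new_start = rng.randrange(genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


# Made-up intervals, for illustration only.
our_calls = [(100, 200), (500, 650), (900, 950)]
known_calls = [(150, 180), (600, 700), (2000, 2100)]
rng = random.Random(0)

# Sensitivity-style number: known calls recovered by our calls.
print(fraction_overlapping(known_calls, our_calls))
# Baseline: the same measurement after shuffling our calls.
print(fraction_overlapping(known_calls, shuffle_calls(our_calls, 10_000, rng)))
```

Swapping the arguments, `fraction_overlapping(our_calls, known_calls)`, gives the specificity-style number: the share of our calls supported by a known set such as DGV.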

Page 11: Detecting  Copy Number Variation With Short Paired Reads

Conclusion

• Presented a method for detecting CNVs

• Combines
– depth-of-coverage
– paired-end mapping

• Improves
– compared to paired-end mapping:
  • increased sensitivity in repetitive regions (segmental duplications)
– compared to depth-of-coverage methods:
  • better resolution (1 Kb vs. 30 Kb)

• Global optimization approach

Page 12: Detecting  Copy Number Variation With Short Paired Reads

Detecting Copy Number Variation

Paul Medvedev

Marc Fiume

Misko Dzamba

Tim Smith

Adrian Dalca

Mike Brudno

Genome Informatics 2009