LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1....

26
LaByRInth An Improved Algorithm for Low-Coverage Biallelic Genetic Imputation Jason Vander Woude

Transcript of LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1....

Page 1: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

LaByRInthAn Improved Algorithm for Low-Coverage Biallelic Genetic Imputation

Jason Vander Woude

Page 2: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Outline1. Project History

2. Genetics

3. Imputation

a. LB-Impute

b. LaByRInth

i. Modeling

4. Future Work

Page 3: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Kansas State Agronomy1. Correlate physical features with genetics in wheat and wheatgrass

a. Plant height

b. Number of seeds per head

2. Fit equations to data to model distributions

3. LaByRInth imputation

Outline ↣ Project History

Page 4: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Chromosomes1. Chromosomes encode genetic information

2. Chromosome is a sequence of bonded bases

a. A/T and C/G

b. 5’ and 3’

c. 5’ATGACACTGTGACA3’ uniquely identifies

Outline ↣ Genetics

Page 5: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Heterozygous and Homozygous1. Wheat has 42 chromosomes

2. Chromosomes come in pairs (homologs)

a. Homologs serve same genetic purpose

b. Base pairs can be completely different

c. Each parent contributes one chromosome to each homologous pair

Outline ↣ Genetics

Page 6: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Genetic Imputation1. Expensive to collect all genetic information

2. Patterns are expected based on known breeding

3. Imputation is used to fill in the gaps

a. Build a mathematical model of the expected process

b. Use known genetic sites to infer unknown sites

Outline ↣ Imputation

Page 7: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

How well does imputation actually work?

Page 8: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Genetics of a Biallelic Homologous Chromosome Pair

524 positions in the chromosomes

Reference base in both homologs

Alternate base in both homologs

1 homolog with Ref. and 1 with Alt.

100 different plants

Page 9: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Sampled

Page 10: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

LB-Impute

Page 11: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Genetics vs LB-Impute

Page 12: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

LB-Impute1. Leaves large sections

of the chromosome un-imputed

2. Designed for F2 populations

Outline ↣ Imputation ↣ LB-Impute

P1

F1

F2

F3

F4

Homozygous & Polymorphic

Heterozygous

Page 13: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

F2 vs F5

Page 14: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

LaByRInth1. Low-coverage Biallelic R-package Imputation

2. Initially supposed to be re-write of LB-Impute (Java to R)

3. Found many areas for improvement

a. Project took unexpected direction

b. A few weeks became more than a year

4. Open source

Outline ↣ Imputation ↣ LaByRInth

Page 15: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Modeling Strategies1. Option 1: Use a model that ignores some biology (varying levels)

○ Often able to exactly “solve” the model

Outline ↣ Imputation ↣ LaByRInth ↣ Modeling

Page 16: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Modeling Strategies1. Option 1: Use a model that ignores some biology (varying levels)

○ Often able to exactly “solve” the model

2. Option 2: Use a model that accurately captures biology

○ May not be able to “solve” the model exactly

Outline ↣ Imputation ↣ LaByRInth ↣ Modeling

Page 17: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Modeling Strategies1. Option 1: Use a model that ignores some biology (varying levels)

○ Often able to exactly “solve” the model

2. Option 2: Use a model that accurately captures biology

○ May not be able to “solve” the model exactly

3. An analogy: find the area of a circle

○ Use a polygon to approximate the area

i. Area of polygon may be able to be exactly computed

○ Use formula πr2

i. π cannot be represented exactly

Outline ↣ Imputation ↣ LaByRInth ↣ Modeling

Page 18: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

LaByRInth Strategy1. Have not found a good way to do option 2 (capture biology)

2. Two different ideas for option 1 (exact solution to model)

a. Extend LB-Impute strategy to other generations

i. Assumes we can segment the chromosome

b. New method based on different biological assumptions

i. Assume limited genetic change during reproduction

Outline ↣ Imputation ↣ LaByRInth ↣ Modeling

Page 19: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

How well does LaByRInth work?

Page 20: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Genetics

Page 21: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

LaByRInth

Page 22: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Genetics vs LaByRInth

Page 23: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

This Summer1. Implement and test both concept methods

a. Real data

b. Simulated data

2. Write and submit paper

3. Package code and release

Outline ↣ Future Work

Page 24: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Thanks to my advisors,Dr. Nathan Tintle (Dordt)Dr. Jesse Poland (Kansas State)Dr. Mike Janssen (Dordt)

Page 25: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Questions?

Page 26: LaByRInth - University of Nebraska–Lincolnjvanderwoude2/imputation/dordt.pdf · LaByRInth 1. Low-coverage Biallelic R-package Imputation 2. Initially supposed to be re-write of

Fitting Gaussian Curves

Outline ↣ Project History