
Finding Bit Patterns

Applying haplotype models to association study design

Natalie Castellana, Kedar Dhamdhere, Russell Schwartz

August 16, 2005

Problem: Applying haplotype models

Input:

10000010100010010
00010100101101001
01101101001000010
10101011111000010

Output: a set of recurring patterns of the form

(start column, end column, pattern)

(14,17,“0010”)
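As a minimal sketch of the output format, the following Python checks which rows of the example matrix contain a given (start column, end column, pattern) tuple. It assumes columns are 1-based and inclusive, which is consistent with (14,17,“0010”) spanning four bits, and that the rows appear in the order listed. The name `rows_with_pattern` is illustrative and does not come from the slides.

```python
matrix = [
    "10000010100010010",
    "00010100101101001",
    "01101101001000010",
    "10101011111000010",
]

def rows_with_pattern(matrix, start, end, pattern):
    """Return the 1-based indices of rows whose columns start..end equal pattern."""
    return [i for i, row in enumerate(matrix, start=1)
            if row[start - 1:end] == pattern]

# The pattern from the slide recurs in three of the four rows.
print(rows_with_pattern(matrix, 14, 17, "0010"))  # [1, 3, 4]
```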

Background

Major allele

Minor allele

SNP

Haplotype

Association Test: Given that this sample has haplotype 1101, does it have the disease?

1000011010110100000010

Genetic Variation

…1110101…

…1000011…

Mutation:

…1000001…

Recombination:

…1110011…

…1000101…

…1001001…

Because of recombination, similar genetic variation can be found within closely linked regions.
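The two operations can be sketched in a few lines of Python. This is only an illustration: it assumes the fragments …1110101… and …1000011… are the starting sequences, and the mutation position and crossover point are read off the fragments shown on the slide rather than stated there. The function names are hypothetical.

```python
def mutate(seq, pos):
    """Flip the bit at 1-based position pos (a point mutation)."""
    i = pos - 1
    flipped = "1" if seq[i] == "0" else "0"
    return seq[:i] + flipped + seq[i + 1:]

def recombine(left, right, crossover):
    """Take the first `crossover` bits from left and the remaining bits from right."""
    return left[:crossover] + right[crossover:]

a, b = "1110101", "1000011"
print(mutate(b, 6))        # 1000001
print(recombine(a, b, 3))  # 1110011
print(recombine(b, a, 3))  # 1000101
```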

Data Sets

Download from HapMap.org, or generate using MS

Apply Disease

Controls:

Cases:

Apply Haplotype

Perform Association

10010011101

1001001010110001110100

01100101101

Input: 1001001010110

1001001110100

0110010110100

1000111010010

Testing individual SNPs

Go through each SNP and determine which SNPs accurately predict which samples have the disease and which do not.

Case:    0 0 1 1 0 1 0 1
         0 1 0 1 0 0 0 0
         0 0 1 1 1 0 0 0

Control: 0 0 0 0 1 0 1 0
         0 0 1 0 0 1 1 0
         1 1 1 0 0 0 0 1
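A rough sketch of the single-SNP scan on the case and control rows above. It scores each SNP by plain classification accuracy, which is an assumption made for illustration; the slides do not say which statistic is used, and a real study would use a proper association test. `snp_accuracy` is a hypothetical helper name.

```python
cases = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
]
controls = [
    [0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 1],
]

def snp_accuracy(cases, controls, col):
    """Fraction of samples classified correctly by one SNP (0-based column).

    Tries both rules ("allele 1 means case" and "allele 0 means case")
    and returns the better of the two.
    """
    hits = (sum(row[col] == 1 for row in cases)
            + sum(row[col] == 0 for row in controls))
    acc = hits / (len(cases) + len(controls))
    return max(acc, 1 - acc)

scores = [snp_accuracy(cases, controls, c) for c in range(len(cases[0]))]
best = max(range(len(scores)), key=scores.__getitem__)
print(best + 1, scores[best])  # 4 1.0
```

In this toy data the fourth SNP is 1 in every case and 0 in every control, so it separates the two groups perfectly.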

Haplotype block method

Instead of testing each SNP individually, we can test groups of contiguous SNPs.

1101000000…11…

1101100100…01…

0111000000…10…

1101100100…00…
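To make the block idea concrete, here is a small sketch that cuts the first ten columns of the four rows above at fixed boundaries and counts the distinct haplotypes in each block. The boundaries (columns 1 to 4 and 5 to 10) are an assumption chosen for illustration; the slides do not give them.

```python
from collections import Counter

rows = ["1101000000", "1101100100", "0111000000", "1101100100"]
blocks = [(1, 4), (5, 10)]  # assumed block boundaries, 1-based inclusive

def block_haplotypes(rows, blocks):
    """Count the haplotypes seen in each block; every row is cut at the same columns."""
    return [Counter(row[start - 1:end] for row in rows) for start, end in blocks]

for (start, end), counts in zip(blocks, block_haplotypes(rows, blocks)):
    print(start, end, dict(counts))
```

Each block's haplotype can then be tested for association in place of a single SNP.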

Haplotype motif method

A sequence is modeled as a concatenation of segments, as in the block method, but segment boundaries need not be conserved across sequences.

1101000000…
1100100100…
0111000000…
1101100111…

Approximation Algorithm

General idea:

10000100…………………………………

00011100…………………………………

11011110…………………………………

01010110…………………………………

c c c cc c c c

Pick the best partition, minimizing the number of motifs needed to explain all the data.
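The objective can be illustrated with an exhaustive search on a toy input. This is not the approximation algorithm from the slides, only a brute-force sketch of what "minimizing the number of motifs" means: every row picks one partition, and the score is the number of distinct (start, end, pattern) motifs used overall. The six rows are hypothetical.

```python
from itertools import product

def partitions(row):
    """Yield every partition of row as a tuple of (start, end, pattern) motifs."""
    n = len(row)
    for cuts in product([False, True], repeat=n - 1):
        motifs, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:  # cut between column i and column i + 1
                motifs.append((start + 1, i, row[start:i]))
                start = i
        motifs.append((start + 1, n, row[start:]))
        yield tuple(motifs)

def fewest_motifs(rows):
    """Try every combination of one partition per row; keep the smallest motif set."""
    best_motifs, best_choice = None, None
    for choice in product(*(list(partitions(r)) for r in rows)):
        used = set().union(*choice)
        if best_motifs is None or len(used) < len(best_motifs):
            best_motifs, best_choice = used, choice
    return best_motifs, best_choice

rows = ["0000", "0011", "1100", "1111", "0100", "0111"]  # hypothetical toy input
motifs, choice = fewest_motifs(rows)
print(len(motifs))  # 5
```

Six distinct rows are explained by five motifs, because cutting after column 2 lets rows share their halves. The search visits 8^6 combinations here, which is why it is only usable on tiny inputs.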

Finding Motifs

0 1 1 0 1 0 0 1 1 0 0 0 1 1 0 0 1

000…000 000..100

……… 111…111

Problems

Really, really, really slow

Took over a week to partition our biggest data set.

Added a ‘max leaves explored’ feature. Useless for larger c.

Real Data

[Plot: x-axis Penetrance parameter (p), ticks 55 to 100; series: single SNP, Bounded Block, 4-gamete Block, Bounded Block htSNP, 4-gamete Block htSNP, Motif htSNP, Motif Approx]

Simulated Data

[Plot: x-axis Penetrance parameter (p), ticks 50 to 100; series: single SNP, Bounded Block, 4-gamete Block, Bounded Block htSNP, 4-gamete Block htSNP, Motif htSNP, Motif Approx]

False Positives

[Plot: x-axis LOD Cutoff, ticks 3 to 1; series: single SNP, Bounded Block, 4-gamete Block, Bounded Block htSNP, 4-gamete Block htSNP, Motif htSNP, Motif Approx, Expectation]

General Linear Program

Objective function: minimize x + y + z

Constraints:
x + y <= 2
x + 2z <= 5

In matrix form:
[ 1 1 0 ]   [ x ]    [ 2 ]
[ 1 0 2 ] * [ y ] <= [ 5 ]
            [ z ]

Bounds:
0 <= x <= 3
0 <= y <= Inf
-Inf <= z <= 0

A Linear Program

Input: A matrix with M rows and N columns

Output: The minimum number of motifs.

Variables

X’s: each x corresponds to a motif

Define a motif by a tuple:

(start column, end column, string pattern)

Y’s: each y corresponds to a row partition

Define a row partition by a set of motifs:

{(1,e1,“…”),(e1+1,e2,“…”),...,(en+1,N,“…”)}

Constraints

Exactly one partition must be chosen per row.

If a motif used in a row partition is not chosen, then the row partition may not be chosen.

Minimize the sum of all X’s.
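Written out, the program described above might look as follows. The notation is an assumption introduced here, not taken from the slides: $x_k$ is the variable for motif $k$, $y_p$ the variable for row partition $p$, and $P_i$ the set of partitions of row $i$. Variable bounds are not given on the slides and are left out.

```latex
\begin{aligned}
\min \quad & \sum_{k=1}^{K} x_k \\
\text{s.t.} \quad
  & \sum_{p \in P_i} y_p = 1
    && \text{for each row } i = 1, \dots, M \\
  & x_k - \sum_{\substack{p \in P_i \\ k \in p}} y_p \ge 0
    && \text{for each row } i \text{ and each motif } k \text{ appearing in row } i
\end{aligned}
```

The first family of constraints says exactly one partition is chosen per row. The second says a partition can only be chosen if every motif it uses is also chosen.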

Example

10001101

X’s: (1,1,“1”),(1,2,“10”),(1,3,“100”), etc.

Y’s: (1,1,“1”),(2,8,“0001101”)

(1,2,“10”),(3,3,“0”),(4,8,“01101”)

Constraint Matrix (1)

Exactly one row partition must be chosen per row.

         all X’s                                   all Y’s
         (1,1,“1”)  (1,1,“0”)  …  (1,2,“10”)       Y_1  Y_2  …
Row 1    0          0          …  0                1    1    …
Row 2    0          0          …  0                0    0    …
Row 3    0          0          …  0                1    1    …
…
Row M    0          0          …  0

Y_1 := (1,1,“1”),(2,8,“0001101”)

Y_2 := (1,2,“10”),(3,3,“0”),(4,8,“01101”)

Constraint Matrix (2)

If a motif used in a row partition is not chosen, then the row partition may not be chosen.

                       all X’s                                   all Y’s
                       (1,1,“1”)  (1,1,“0”)  …  (1,2,“10”)       Y_1  Y_2  …
Row i:  (1,1,“1”)      1          0          …  0                -1   0    …
        (1,2,“10”)     0          0          …  1                0    -1   …
        (1,3,“100”)    0          0          …  0                0    0    …
        …              …          …          …  …                …    …    …
        (8,8,“1”)      0          0          …  0                0    0

Y_1 := (1,1,“1”),(2,8,“0001101”)

Y_2 := (1,2,“10”),(3,3,“0”),(4,8,“01101”)

Constraint Matrix

                 x’s                      y’s
                 1 … K                    K+1 … K+P

** Constraint 1 ** (== 1)
1                0 0 0 0 0 … 0 0 0 0      1 1 1 0 0 0 0 … 0
2                0 0 0 0 0 … 0 0 0 0      1 0 0 1 1 1 0 … 0
…
M                0 0 0 0 0 … 0 0 0 0      0 0 1 0 0 0 1 … 1

** Constraint 2 ** (>= 0)
1    1           1 0 0 0 0 … 0 0 0 0      -1 0 0 0 … 0 0
     2           0 1 0 0 0 … 0 0 0 0      -1 -1 0 0 … -1 0
     …
     K_1         0 0 1 0 0 … 0 0 0 0      0 0 0 0 … 0 0
…
M

Where K is the number of unique motifs, K_i is the number of motifs appearing in row i, and P is the number of unique partitions.

Problems

Each row has N(N+1)/2 motifs. So there will be a polynomial number of X’s. Good!

Each row can be partitioned in 2^(N-1) ways. So there will be an exponential number of Y’s. Bad!

Solution: column generation
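Both counts are easy to confirm by enumeration for the eight-column example row 10001101. The sketch below lists every motif as a (start, end, pattern) tuple and counts partitions by choosing, for each of the N-1 gaps between columns, whether to cut there.

```python
from itertools import product

row = "10001101"
n = len(row)

# One motif per (start, end) pair with 1 <= start <= end <= n.
motifs = [(s, e, row[s - 1:e]) for s in range(1, n + 1) for e in range(s, n + 1)]

# One partition per cut/no-cut choice at each of the n - 1 gaps.
num_partitions = sum(1 for _ in product((0, 1), repeat=n - 1))

print(len(motifs), n * (n + 1) // 2)  # 36 36
print(num_partitions, 2 ** (n - 1))   # 128 128
```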

Column generation

We find the optimal solution to the problem which contains all X’s and only some of the Y’s.

Then we see if adding any Y’s would improve the solution.
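The slides do not say how the search for an improving Y is carried out. One standard way, sketched here as an assumption, is to price partitions using dual values from the restricted program: under the usual sign convention for a minimization LP, a partition column (objective coefficient 0) has negative reduced cost when the summed duals of its motif constraints are smaller than the dual of its row's "exactly one partition" constraint. Finding the cheapest partition of a row is then a shortest-path style dynamic program over column boundaries. The dual values below are made-up numbers, and `best_partition` is a hypothetical helper.

```python
def best_partition(row, motif_duals, default=1.0):
    """Cheapest partition of row when each motif has a nonnegative price.

    motif_duals maps (start, end, pattern) to a price; motifs that are
    missing from the map are charged `default`. Returns (price, partition).
    """
    n = len(row)
    cost = [0.0] + [float("inf")] * n  # cost[j]: cheapest way to cover columns 1..j
    back = [0] * (n + 1)               # back[j]: start column of the last motif used
    for end in range(1, n + 1):
        for start in range(1, end + 1):
            motif = (start, end, row[start - 1:end])
            c = cost[start - 1] + motif_duals.get(motif, default)
            if c < cost[end]:
                cost[end], back[end] = c, start
    partition, end = [], n
    while end > 0:  # walk the back pointers from the last column
        start = back[end]
        partition.append((start, end, row[start - 1:end]))
        end = start - 1
    partition.reverse()
    return cost[n], partition

# Hypothetical duals: three motifs are cheap, everything else costs 1.0.
duals = {(1, 2, "10"): 0.1, (3, 3, "0"): 0.1, (4, 8, "01101"): 0.1}
row_dual = 1.0  # hypothetical dual of this row's "exactly one partition" constraint

price, partition = best_partition("10001101", duals)
print(partition)
print(price < row_dual)  # True means this partition is worth adding
```

If no row has a partition priced below its row dual, no remaining Y can improve the restricted solution, so the exponential set of Y's never has to be enumerated.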

Where are we now?

Where are we going?
