
A Simplified View of DCJ-Indel Distance

Phillip Compeau
University of California-San Diego
Department of Mathematics


Abstract

• Braga et al., 2010: Solved problem of DCJ-indel sorting in linear time.

• Goals:

1. “Hardwire” DCJ sorting into DCJ-indel sorting.

2. Characterize solution space for DCJ-indel sorting.
   • DCJ solution space known (Braga and Stoye, 2010).


Section 1: Preliminaries

1. Preliminaries

2. Encoding Indels as DCJs

3. DCJ-Indel Sorting

4. The Solution Space of DCJ-Indel Sorting

5. Conclusion


The Discrete Genome

• Genome (Π): formed of two matchings
  • genes (g(Π)): each numbered gene has a head and a tail.
  • adjacencies (a(Π)): a blue matching on V(g(Π))

[Figure: example genomes Π and Γ]

The Discrete Genome

• Chromosome: component of Π (alternating path or cycle)
  • Linear or circular depending on path or cycle of Π

• Telomere: path endpoint of Π; has null adjacency {v, Ø}

[Figure: example genomes Π and Γ]

The Double-Cut-and-Join Operation

• Double-cut-and-join operation (DCJ; Yancopoulos et al., 2005): “cuts” genome in two places and rejoins adjacencies.

• DCJ Distance (d_DCJ(Π, Γ)): minimum # of DCJs required to transform Π into Γ (having the same genes).
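A minimal sketch of a single DCJ, assuming a genome is stored as a set of adjacencies over gene ends such as "1h" and "2t", with a telomere stored as a one-element set. The names `dcj`, `adj`, and `swap` are illustrative and not from the talk.

```python
from typing import FrozenSet, Optional, Set, Tuple

Adjacency = FrozenSet[str]  # {"1h", "2t"} is an adjacency, {"1t"} a telomere


def _ends(a: Adjacency) -> Tuple[Optional[str], Optional[str]]:
    """Return the two ends of an adjacency; a telomere is padded with None (Ø)."""
    items = sorted(a)
    return items[0], (items[1] if len(items) == 2 else None)


def dcj(genome: Set[Adjacency], x: Adjacency, y: Adjacency,
        swap: bool = False) -> Set[Adjacency]:
    """Cut adjacencies x and y and rejoin their four ends in a new way.

    With ends {p, q} and {r, s}, the result has {p, r}, {q, s}
    (or {p, s}, {q, r} when swap is True).
    """
    if x not in genome or y not in genome or x == y:
        raise ValueError("x and y must be distinct adjacencies of the genome")
    p, q = _ends(x)
    r, s = _ends(y)
    if swap:
        r, s = s, r
    rejoined = {frozenset(e for e in pair if e is not None)
                for pair in ((p, r), (q, s))}
    rejoined.discard(frozenset())  # {Ø, Ø} is not an adjacency
    return (genome - {x, y}) | rejoined


def adj(*ends: str) -> Adjacency:
    return frozenset(ends)


# Linear chromosome (1, 2, 3): telomeres 1t and 3h.
genome = {adj("1t"), adj("1h", "2t"), adj("2h", "3t"), adj("3h")}
# One rejoining reverses gene 2; the other excises it as a circular chromosome.
reversal = dcj(genome, adj("1h", "2t"), adj("2h", "3t"))
excision = dcj(genome, adj("1h", "2t"), adj("2h", "3t"), swap=True)
```

The two calls differ only in how the cut ends are paired, which is why one operation can model several kinds of rearrangement.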


The DCJ Incorporates Many Operations


The Breakpoint Graph

• B(Π, Γ) is formed from the adjacencies of Π and Γ.

• B(Π, Γ) also comprises (alternating) red-blue paths and cycles.
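A rough sketch of how the components of B(Π, Γ) might be enumerated, assuming each genome is given as a list of adjacencies (pairs of gene ends) and that null adjacencies are not drawn as edges. The function name `components` and the input layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def components(pi_adj: Iterable[Edge], gamma_adj: Iterable[Edge],
               vertices: Iterable[str]) -> List[Tuple[str, int]]:
    """Return ("cycle" or "path", number of edges) for each component."""
    neighbours: Dict[str, List[str]] = defaultdict(list)
    for u, v in list(pi_adj) + list(gamma_adj):
        neighbours[u].append(v)
        neighbours[v].append(u)

    seen = set()
    result = []
    for start in vertices:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first walk of one component
            u = stack.pop()
            comp.append(u)
            for w in neighbours[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        edges = sum(len(neighbours[u]) for u in comp) // 2
        # Every vertex has degree at most 2, so a component with as many
        # edges as vertices is a cycle; otherwise it is a path.
        kind = "cycle" if edges == len(comp) else "path"
        result.append((kind, edges))
    return result


ends = ["1t", "1h", "2t", "2h"]
# Π = linear (1, 2) and Γ = linear (1, -2).
example = components([("1h", "2t")], [("1h", "2h")], ends)
# Π = Γ = linear (1, 2): the shared adjacency forms a cycle of length 2.
identical = components([("1h", "2t")], [("1h", "2t")], ends)
```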


DCJ Distance Formula

• Bergeron et al., 2006: If Π and Γ share the same genes, then the DCJ distance is given by the following formula:

d_DCJ(Π, Γ) = N − c(Π, Γ) − p_even(Π, Γ)/2

• N = # of genes
• c(Π, Γ) = # of cycles in B(Π, Γ)
• p_even(Π, Γ) = # of even paths in B(Π, Γ)
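A small worked example, assuming the formula takes its usual breakpoint-graph form d_DCJ = N − c − p_even/2 and that null adjacencies are not counted as edges. Take Π = (1, 2) and Γ = (1, −2), both linear. The only adjacencies are {1h, 2t} in Π and {1h, 2h} in Γ, so B(Π, Γ) has no cycles and two even paths: the isolated vertex 1t (length 0) and the path 2t, 1h, 2h (length 2).

```latex
\begin{align*}
N &= 2, \qquad c(\Pi,\Gamma) = 0, \qquad p_{\mathrm{even}}(\Pi,\Gamma) = 2,\\
d_{\mathrm{DCJ}}(\Pi,\Gamma)
  &= N - c(\Pi,\Gamma) - \tfrac{1}{2}\,p_{\mathrm{even}}(\Pi,\Gamma)
   = 2 - 0 - 1 = 1.
\end{align*}
```

This agrees with the single reversal of gene 2 that turns Π into Γ.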


Indels and the DCJ-Indel Distance

• Indel: The insertion or deletion of a chromosome or chromosomal interval (consecutive genes).
  • Assumption: we can’t remove a gene common to Π and Γ

• DCJ-Indel Distance (d^ind_DCJ(Π, Γ)): Minimum # of DCJs and indels required to transform Π into Γ.
  • Braga et al., 2010: Solve DCJ-indel sorting in linear time.
  • Lots of cases…can we simplify it?

[Figure: examples of indels on chromosomes with genes a, b, c, d]

Section 2: Encoding Indels as DCJs


• Ma et al., 2009: View deletion as formation and removal of a circular chromosome.

• Idea: Indel = DCJ creating circular chromosome
  • Wait…what about the deletion of circular chromosomes?
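To make the idea concrete, here is a small gene-order sketch. It assumes a linear chromosome is a Python list of gene names and models the DCJ that cuts on both sides of an interval by returning the rejoined linear chromosome together with the excised circular one. The function `excise` and its index convention are hypothetical.

```python
from typing import List, Tuple


def excise(linear: List[str], i: int, j: int) -> Tuple[List[str], List[str]]:
    """Model a DCJ that cuts before position i and before position j.

    Returns (remaining linear chromosome, new circular chromosome).
    Removing the circular chromosome afterwards completes the deletion.
    """
    if not 0 <= i < j <= len(linear):
        raise ValueError("need 0 <= i < j <= len(linear)")
    return linear[:i] + linear[j:], linear[i:j]


# Deleting the interval (b, c) from (a, b, c, d):
remaining, circle = excise(["a", "b", "c", "d"], 1, 3)
```

Under this view the deleted interval is the circular chromosome produced by the DCJ.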

[Figure: “Deletion” versus “DCJ Creating Circular Chromosome”]


Apparent Exceptions

• Apparent Exception #1: Two deleted circular chromosomes are created from a single DCJ.

[Figure: “1 Operation” versus a DCJ followed by deletions, “3 Operations”]


Apparent Exceptions

• Apparent Exception #2: A deleted circular chromosome is never involved in a DCJ

• Circular singleton of Π: A circular chromosome of Π that shares no genes with Γ.

• Question: Can we delete all circular singletons first? YES!


Handling Circular Singletons

• Proposition: When transforming Π into Γ via a minimum collection of DCJs and indels, no gene belonging to a circular singleton of Π can ever appear in the same chromosome as a gene of Γ.

• Corollary 1: If Π* is formed from Π by removing a circular singleton from Π, then d^ind_DCJ(Π*, Γ) = d^ind_DCJ(Π, Γ) − 1.

• Let sing(Π, Γ) = # of circular singletons of Π and Γ.

• Corollary 2: If Π0 and Γ0 are formed by removing all circular singletons from Π and Γ, then d^ind_DCJ(Π, Γ) = d^ind_DCJ(Π0, Γ0) + sing(Π, Γ).
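A short sketch of how sing(Π, Γ) could be counted, assuming each genome is a list of chromosomes with unsigned gene names and a flag for circularity. The `Chromosome` type and helper names are illustrative only.

```python
from typing import List, NamedTuple, Set


class Chromosome(NamedTuple):
    genes: List[str]   # gene names in order (orientation ignored here)
    circular: bool


def genes_of(genome: List[Chromosome]) -> Set[str]:
    return {g for chrom in genome for g in chrom.genes}


def circular_singletons(genome: List[Chromosome],
                        other: List[Chromosome]) -> List[Chromosome]:
    """Circular chromosomes of `genome` sharing no genes with `other`."""
    other_genes = genes_of(other)
    return [c for c in genome
            if c.circular and other_genes.isdisjoint(c.genes)]


def sing(pi: List[Chromosome], gamma: List[Chromosome]) -> int:
    """Number of circular singletons of Π plus those of Γ."""
    return len(circular_singletons(pi, gamma)) + len(circular_singletons(gamma, pi))


pi = [Chromosome(["a", "b"], False),
      Chromosome(["x", "y"], True),   # circular singleton of Π
      Chromosome(["z"], True)]        # circular singleton of Π
gamma = [Chromosome(["a", "b", "c"], False)]
count = sing(pi, gamma)
```

By Corollary 2, these chromosomes can be set aside first, each contributing one operation to the distance.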


A Novel View of DCJ-Indel Distance

• WLOG we may henceforth assume that sing(Π, Γ) = 0.

• A completion of Π is a genome Π′ such that:
  • g(Π′) = g(Π) ∪ g(Γ)
  • a(Π′) = a(Π) ∪ (a perfect matching on V(Π′) − V(Π))

• New chromosomes of Π′ are circular: the indels of Π′

• Theorem: d^ind_DCJ(Π, Γ) = min d_DCJ(Π′, Γ′), where the minimum is taken over all completions (Π′, Γ′) of (Π, Γ).

• An optimal completion achieves this minimum.
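A brute-force sketch of the completion idea on a tiny instance. It assumes the theorem equates the DCJ-indel distance with the minimum DCJ distance over all completions, and that the DCJ distance is N − c − p_even/2 with null adjacencies not drawn as edges. The function names and the example genomes are illustrative.

```python
from itertools import chain
from typing import Iterator, List, Set, Tuple

Edge = Tuple[str, str]


def perfect_matchings(vertices: List[str]) -> Iterator[List[Edge]]:
    """Yield every perfect matching on an even-sized list of vertices."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for k, partner in enumerate(rest):
        for matching in perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + matching


def dcj_distance(pi_adj: List[Edge], gamma_adj: List[Edge],
                 genes: List[str]) -> int:
    """N - c - p_even/2, counted on the breakpoint graph (assumed formula)."""
    vertices = [g + end for g in genes for end in ("t", "h")]
    nbrs = {v: [] for v in vertices}
    for u, v in chain(pi_adj, gamma_adj):
        nbrs[u].append(v)
        nbrs[v].append(u)
    seen: Set[str] = set()
    cycles = even_paths = 0
    for start in vertices:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in nbrs[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        edges = sum(len(nbrs[u]) for u in comp) // 2
        if edges == len(comp):
            cycles += 1
        elif edges % 2 == 0:
            even_paths += 1
    return len(genes) - cycles - even_paths // 2


# Π = linear (a); Γ = linear (a, b, c). Γ needs no completion here,
# while Π must gain genes b and c through a matching on their four ends.
pi_adj: List[Edge] = []
gamma_adj = [("ah", "bt"), ("bh", "ct")]
distances = [dcj_distance(pi_adj + m, gamma_adj, ["a", "b", "c"])
             for m in perfect_matchings(["bt", "bh", "ct", "ch"])]
best = min(distances)
```

In this instance the best completion places b and c on one circular chromosome, matching the intuition that a single insertion of the interval (b, c) suffices. Enumerating matchings is exponential and is shown only to illustrate the definition.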


Section 3: DCJ-Indel Sorting


Open Vertices

• π-open vertex: vertex not found in Π (must be matched in Π′)
  • path endpoint in B(Π, Γ) must be π-open/γ-open or telomere (or both)
  • Define {π, π}-paths, {π, γ}-paths, π-paths in B(Π, Γ)

• Idea: Construct B(Π*, Γ*) from B(Π, Γ) by matching vertices.
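A possible way to label paths by their open endpoints, assuming the sets of π-open and γ-open vertices are already known. Treating an isolated open vertex as a length-0 path with one open end, and the label "telomere-to-telomere path", are assumptions of this sketch rather than terminology from the slides.

```python
from typing import Set, Tuple


def classify_path(endpoints: Tuple[str, str], pi_open: Set[str],
                  gamma_open: Set[str]) -> str:
    """Name a path of B(Π, Γ) according to which endpoints are open."""
    u, v = endpoints
    ends = [u] if u == v else [u, v]  # isolated vertex: a path of length 0
    opened = [label
              for label, group in (("π", pi_open), ("γ", gamma_open))
              for x in ends if x in group]
    if len(opened) == 2:
        return "{%s, %s}-path" % (opened[0], opened[1])
    if len(opened) == 1:
        return opened[0] + "-path"
    return "telomere-to-telomere path"


# Π = linear (a), Γ = linear (a, b, c): every end of b and c is π-open.
pi_open = {"bt", "bh", "ct", "ch"}
gamma_open: Set[str] = set()
labels = [classify_path(p, pi_open, gamma_open)
          for p in [("ah", "bt"), ("bh", "ct"), ("ch", "ch"), ("at", "at")]]
```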


Necessary Conditions for B(Π*, Γ*)

• Lemma 1: If (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k – 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*).

• Picture: [Figure: B(Π′, Γ′) versus B(Π″, Γ′), in which the path is closed into a cycle; d_DCJ(Π″, Γ′) < d_DCJ(Π′, Γ′)]

• Remaining components of B(Π*, Γ*):
  • bracelet: cycle linking {π, γ}-paths
  • chain: path linking π-paths/γ-paths via intermediate {π, γ}-paths

[Figure: a 2-bracelet, a 2-chain, and a 3-chain]

• Lemma 2: B(Π*, Γ*) can contain only 2-bracelets, 2-chains, and 3-chains.

• Picture: [Figure: B(Π′, Γ′) with paths P1 and P2 versus B(Π″, Γ′) with P1, P2, and a cycle; d_DCJ(Π″, Γ′) < d_DCJ(Π′, Γ′)]

• Lemma 3: B(Π*, Γ*) cannot have one 2-chain joining two odd π-paths and another 2-chain joining two even π-paths. The same holds for γ-paths.

• Picture: [Figure: B(Π′, Γ′), with odd paths P1, P2 and even paths P3, P4, versus B(Π″, Γ′), with two even paths; d_DCJ(Π″, Γ′) < d_DCJ(Π′, Γ′)]

Sorting Algorithm

1. Remove all circular singletons of Π and Γ.

2. Close every {π, π}-path ({γ, γ}-path) into a cycle by adding a single new adjacency to Π* (Γ*) (Lemma 1).

3. Form a maximum set of 2-bracelets (only chains remaining).

4. Form a maximum set of even 2-chains by linking pairs of π-paths (γ-paths) having opposite parity (Lemma 3).

5. If p_{π,γ} is odd, then link the remaining {π, γ}-path with any remaining π-path and γ-path.

6. Arbitrarily link pairs of remaining π-paths, all of which have the same parity. Do the same for any γ-paths remaining.
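Steps 3 to 6 can be tallied from path counts alone. The sketch below assumes the inputs are the numbers of {π, γ}-paths and of odd and even π-paths and γ-paths after steps 1 and 2. The parity check reflects an assumption of this sketch (open vertices come in pairs, so the number of π-paths has the same parity as the number of {π, γ}-paths), and the function name and dictionary keys are hypothetical.

```python
from typing import Dict


def plan_completion(p_pi_gamma: int, p_pi_odd: int, p_pi_even: int,
                    p_gamma_odd: int, p_gamma_even: int) -> Dict[str, int]:
    """Count the components formed by steps 3 to 6 of the algorithm."""
    # Assumed sanity check: each kind of open vertex must be fully matched.
    for total in (p_pi_odd + p_pi_even, p_gamma_odd + p_gamma_even):
        if (total - p_pi_gamma) % 2:
            raise ValueError("path counts have inconsistent parity")

    bracelets = p_pi_gamma // 2                      # step 3
    even_pi = min(p_pi_odd, p_pi_even)               # step 4, π-paths
    even_gamma = min(p_gamma_odd, p_gamma_even)      # step 4, γ-paths
    left_pi = p_pi_odd + p_pi_even - 2 * even_pi
    left_gamma = p_gamma_odd + p_gamma_even - 2 * even_gamma

    three_chains = p_pi_gamma % 2                    # step 5
    left_pi -= three_chains
    left_gamma -= three_chains

    return {
        "2-bracelets": bracelets,
        "even 2-chains (pi)": even_pi,
        "even 2-chains (gamma)": even_gamma,
        "3-chains": three_chains,
        "other 2-chains (pi)": left_pi // 2,         # step 6
        "other 2-chains (gamma)": left_gamma // 2,
    }


plan = plan_completion(p_pi_gamma=3, p_pi_odd=2, p_pi_even=1,
                       p_gamma_odd=0, p_gamma_even=1)
```

The tally only counts components; choosing which specific paths to link is left open, as in step 6.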


DCJ-Indel Distance

• Theorem: The preceding algorithm solves DCJ-indel sorting in linear time, and it implies a DCJ-indel distance formula:

[Equation: formula for d^ind_DCJ(Π, Γ)]

where δ = 1 only if p_{π,γ} is odd and either:

1. p_π^odd > p_π^even, p_γ^odd > p_γ^even; or

2. p_π^odd < p_π^even, p_γ^odd < p_γ^even

Otherwise, δ = 0.
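The rule for δ stated in the theorem can be written as a small predicate. This is a direct transcription under the assumption that the comparisons are strict, as written on the slide. The function and parameter names are illustrative.

```python
def delta(p_pi_gamma: int, p_pi_odd: int, p_pi_even: int,
          p_gamma_odd: int, p_gamma_even: int) -> int:
    """Return 1 only if p_{π,γ} is odd and the π- and γ-path
    statistics lean the same way (both odd-heavy or both even-heavy)."""
    if p_pi_gamma % 2 == 0:
        return 0
    both_odd_heavy = p_pi_odd > p_pi_even and p_gamma_odd > p_gamma_even
    both_even_heavy = p_pi_odd < p_pi_even and p_gamma_odd < p_gamma_even
    return 1 if (both_odd_heavy or both_even_heavy) else 0


same_direction = delta(3, 2, 1, 2, 1)      # odd p_{π,γ}, both odd-heavy
opposite_direction = delta(3, 2, 1, 1, 2)  # odd p_{π,γ}, mixed
even_count = delta(2, 2, 0, 2, 0)          # even p_{π,γ}
```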


Section 4: The Solution Space of DCJ-Indel Sorting

Encompassing all Possible Cases

• The solution space is known for DCJ-sorting (Braga and Stoye, 2010).

• Thus, we only need to find all optimal completions, and the specific operations will fall out in the wash.


Handling Circular Singletons

• The circular singletons of Π must be removed in sing(Π) steps. We have two options:

1. Delete all the circular singletons of Π.

2. Perform k “fusion” DCJs followed by sing(Π) – k chromosome deletions.

• This poses a straightforward (yet tedious) counting problem.


Adding Necessary Conditions on B(Π*, Γ*)

• Proposition 1: Every π-path embedding into a 3-chain of an optimal completion must have the same parity.

• Proposition 2: If p_{π,γ} is even, then B(Π*, Γ*) must contain a maximum collection of even 2-chains.

• Proofs are slightly more involved…


Finishing the Job

• Four cases, depending on path statistics.

1. p_{π,γ} is odd:
   a) p_π^odd > p_π^even, p_γ^odd > p_γ^even (or vice-versa); δ = 1
   b) p_π^odd > p_π^even, p_γ^odd < p_γ^even (or vice-versa); δ = 0

2. p_{π,γ} is even:
   a) p_π^odd > p_π^even, p_γ^odd > p_γ^even (or vice-versa); δ = 0
   b) p_π^odd > p_π^even, p_γ^odd < p_γ^even (or vice-versa); δ = 0

• These cases are tedious but straightforward and can be handled similarly.


Section 5: Conclusion


Future Work

• Correspondence with Braga et al., 2010?

• Varying the indel cost?
  • Charge indel cost ≤ DCJ cost, take minimum total cost.
  • Most of the simplifying sorting lemmas hold, but actually computing the minimum cost appears difficult in this model.

• The problem is solved! (under framework of Braga et al., 2010)


Questions?


Shameless Plug

• www.rosalind.info

• A novel education website that teaches bioinformatics through programming exercises.

• Has a “professor” environment for assigning programming exercises to your bioinformatics classes.