Page 1: Parallel #2 Paper – Phylogeny and Branch and Bound Algorithms George McGinn (georgemcginn@yahoo.com)

Parallel #2 Paper – Phylogeny and Branch and Bound Algorithms

George McGinn (georgemcginn@yahoo.com)

Tree Building (Phylogeny)

• PHYLIP (Phylogeny Inference Package):

• PHYLIP is a free package of programs for inferring phylogenies. One of the most popular, it is currently available for all the major OSes and features over a dozen different algorithms for coming up with the trees.

• I chose the Penny algorithm and looked into possible parallel implementations of it.

Finding the best Tree

Algorithms: 3 types of solutions:

• Exhaustive - search every tree

• Heuristic - use some algorithm to make a good (possibly best) tree

• Most Parsimonious - prove that a given tree is the best tree, (hopefully) without searching every tree

Exhaustive

• Number of elements = N, number of trees = T(N):

T(1) = 1

T(2) = 1

T(3) = T(2) * 3

T(4) = T(3) * 4

T(N) = T(N-1) * N

• Positioning nodes: simple factorial (N!)

• Big O = n! * n! = (n!)^2 = way too big!

• 10 nodes = T(10) * 10! = 6,584,094,720,000

• 10 nodes (Eliminating mirror images) – still about 35 million
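The figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up here, and the reading of the "mirror images" figure as the standard count of rooted bifurcating trees, (2n-3)!!, is an assumption rather than something the slide states. That count gives 34,459,425 for 10 species, which is consistent with "about 35 million".

```python
from math import factorial


def slide_tree_count(n):
    """T(n) from the slide's recurrence: T(1) = T(2) = 1, T(n) = T(n-1) * n."""
    t = 1
    for k in range(3, n + 1):
        t *= k
    return t


def slide_search_space(n):
    """Tree count times the n! ways of positioning the nodes."""
    return slide_tree_count(n) * factorial(n)


def rooted_binary_trees(n):
    """(2n-3)!!, the standard count of rooted bifurcating trees on n labelled tips.

    Assumption: this is what the 'eliminating mirror images' figure refers to.
    """
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


print(slide_search_space(10))
print(rooted_binary_trees(10))
```

Even the smaller count grows faster than exponentially, which is why the exhaustive approach is only practical for a handful of species.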

Heuristic Search:

• Types of PHYLIP searches: Neighbor, Factor, GENDIST

• Algorithm to find the “best” tree. The resulting tree depends on the order in which the species are received, so Jumble options are provided to randomize the input order and show the different trees possible. Very fast, but inexact. Generally returns one tree.

Most Parsimonious Trees

• Penny (DNAPenny) – Uses Branch and Bound to come up with the optimal solution(s).

Also CLIQUE searches.

Branch and Bound sidetrack

• Traveling Salesman example: line up all the possible solutions, fully calculate one, and then attempt all the rest (Depth First Search). When a partial solution must be worse than the best one found so far, disregard that node or subtree. If a solution is better than the previous best one, it becomes the new best solution. If equal, save it to the list.

• Does not HAVE to try all possibilities: efficiency depends very much on input order and data, and the search may actually calculate them all. Note that if all subtrees need to be explored, this can actually be slower due to algorithm overhead!
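The Traveling Salesman description can be sketched as a small depth-first search. This is a minimal illustration, not a tuned solver: the function name and the 4-city distance matrix are invented for the example, and the bound used (the cost of the partial tour so far) assumes non-negative distances.

```python
import math


def tsp_branch_and_bound(dist):
    """Return (best_cost, best_tours) for a distance matrix, starting at city 0."""
    n = len(dist)
    best_cost = math.inf
    best_tours = []

    def extend(tour, cost, unvisited):
        nonlocal best_cost, best_tours
        # Bound: a partial tour already costlier than the best full tour
        # cannot improve, so the whole subtree is discarded.
        if cost > best_cost:
            return
        if not unvisited:
            total = cost + dist[tour[-1]][tour[0]]  # close the loop
            if total < best_cost:
                best_cost, best_tours = total, [tour[:]]  # new best solution
            elif total == best_cost:
                best_tours.append(tour[:])  # equal: save to the list
            return
        for city in sorted(unvisited):  # branch, depth first
            tour.append(city)
            extend(tour, cost + dist[tour[-2]][city], unvisited - {city})
            tour.pop()

    extend([0], 0, frozenset(range(1, n)))
    return best_cost, best_tours


# Hypothetical 4-city example.
example = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
cost, tours = tsp_branch_and_bound(example)
print(cost, tours)
```

How much is pruned depends on how early a good tour is found, which is the input-order sensitivity mentioned above.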

Back to Penny

• Add all nodes in order, then backtrack.

Make tree of first two species: (A,B)
    Add C in first place: ((A,B),C)
        Add D in first place: (((A,D),B),C)
        Add D in second place: ((A,(B,D)),C)
        Add D in third place: (((A,B),D),C)
        Add D in fourth place: ((A,B),(C,D))
        Add D in fifth place: (((A,B),C),D)
    Add C in second place: ((A,C),B)
        Add D in first place: (((A,D),C),B)
        Add D in second place: ((A,(C,D)),B)
        Add D in third place: (((A,C),D),B)
        Add D in fourth place: ((A,C),(B,D))
        Add D in fifth place: (((A,C),B),D)
    Add C in third place: (A,(B,C))
        Add D in first place: ((A,D),(B,C))
        Add D in second place: (A,((B,D),C))
        Add D in third place: (A,(B,(C,D)))
        Add D in fourth place: (A,((B,C),D))
        Add D in fifth place: ((A,(B,C)),D)

And so forth!
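The add-one-species-at-a-time order can be reproduced with a short recursive generator. This is a sketch of the enumeration only: it includes no parsimony scoring or bounding, the function names are hypothetical, and the order in which it visits the "places" is not claimed to match PHYLIP's numbering.

```python
def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `tree`."""
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)
    # Attach on the branch directly above this subtree (or at the root).
    yield (tree, leaf)


def all_trees(species):
    """Build every rooted bifurcating tree by adding species in order."""
    trees = [(species[0], species[1])]
    for leaf in species[2:]:
        trees = [grown for t in trees for grown in insertions(t, leaf)]
    return trees


def newick(tree):
    """Format a nested tuple in the parenthesised notation used on the slide."""
    if isinstance(tree, tuple):
        return "(" + ",".join(newick(child) for child in tree) + ")"
    return tree


for t in all_trees("ABCD"):
    print(newick(t))
```

For four species this prints 15 trees, five for each of the three placements of C. A branch and bound search would score each partial tree as it is built and skip the inner loop whenever the partial score already exceeds the best complete tree.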

Parallelization of the Branch and Bound Algorithm on Distributed memory machines

• Problem groups should be in large enough blocks and uniform in size, so that a single integer can identify which block is currently being examined. Each processor initially takes a certain range, and has its next-block integer set to the number of processors. When a processor is done, it broadcasts that it is taking the next block (so all of the other processors increment their counters), and then starts to process it.

• On really large networks, this would probably best be modified to remove all the communication overhead by running things in lockstep. This is not optimal, as some blocks will have their trees pruned almost immediately, so the work ends up unevenly balanced.
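The block-claiming scheme can be tried out in a single process before committing to a message-passing implementation. The sketch below is a simulation under simplifying assumptions: broadcasts are instantaneous and never race, each block has a known cost, and the function name and cost list are invented for the example.

```python
import heapq


def simulate_block_claims(block_costs, num_procs):
    """Simulate the shared-counter scheme; return {processor: [blocks done]}."""
    num_blocks = len(block_costs)
    # Every processor keeps its own copy of the next-block counter,
    # initially set to the number of processors.
    next_block = [num_procs] * num_procs
    assigned = {p: [] for p in range(num_procs)}
    events = []  # (finish_time, processor)
    for p in range(min(num_procs, num_blocks)):
        assigned[p].append(p)  # processor p starts on block p
        heapq.heappush(events, (block_costs[p], p))
    while events:
        now, p = heapq.heappop(events)
        block = next_block[p]
        if block >= num_blocks:
            continue  # nothing left to claim
        # "Broadcast": every processor increments its local counter.
        for q in range(num_procs):
            next_block[q] += 1
        assigned[p].append(block)
        heapq.heappush(events, (now + block_costs[block], p))
    return assigned


# Hypothetical costs: block 0 is expensive, the rest are pruned quickly.
print(simulate_block_claims([5, 1, 1, 1, 1, 1], num_procs=2))
```

With these costs the second processor ends up handling five of the six blocks, which is the load balancing that a fixed lockstep assignment would give up.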

Parallelization of the Branch and Bound Algorithm on Distributed memory machines, pt 2

• So in the setup above, the possible problem groups might be the different variations for where the C was added (3 variations).

• Problems with this:

• On distributed memory systems, all the messages for taking new blocks might overload the network.

• Potentially, the message that indicates a new best solution has been found might only be transmitted at the end of a block, to ensure that these messages do not also get out of hand. This may cause extra paths to be traversed that otherwise would be skipped.
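The cost of delaying the "new best solution" message can be illustrated with a toy model. Everything here is hypothetical: each worker holds one block of candidates given as (lower bound, cost) pairs, workers advance one candidate per step, and a candidate is skipped when its lower bound exceeds the best cost that worker currently knows about.

```python
import math


def run(blocks, share_immediately):
    """Return (candidates explored, best cost) for one block per worker."""
    best = [math.inf] * len(blocks)  # each worker's local copy of the bound
    explored = 0
    for step in range(max(len(b) for b in blocks)):
        for worker, block in enumerate(blocks):
            if step >= len(block):
                continue
            lower_bound, cost = block[step]
            if lower_bound > best[worker]:
                continue  # pruned with the bound this worker knows about
            explored += 1
            best[worker] = min(best[worker], cost)
        if share_immediately:
            # Broadcast the best bound after every step.
            best = [min(best)] * len(blocks)
    # Without immediate sharing, bounds are only combined at the end of the block.
    return explored, min(best)


blocks = [
    [(1, 3), (5, 6), (5, 7)],   # worker 0 finds a good solution early
    [(4, 9), (4, 8), (4, 10)],  # worker 1 could have skipped most of its block
]
print(run(blocks, share_immediately=True))
print(run(blocks, share_immediately=False))
```

Both policies reach the same best cost, but the deferred version explores twice as many candidates in this example, in exchange for sending far fewer messages.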

• Links and Fun:

• Phylip (Phylogeny Inference Package):

http://www.molbiol.ox.ac.uk/documentation/phylip/index.html

• Penny Algorithms:

http://www.molbiol.ox.ac.uk/documentation/phylip/penny.html

• A parallel synchronized branch and bound algorithm:

http://www.epfl.ch/SIC/SA/publications/SCR94/6-94-page15.html

(EPFL Supercomputing Review - n. 6 - nov. 94)

This was my originally intended paper – however it ended up being too dense for me to sufficiently present it!

• Branch and Bound intro:

http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/discrete/integerprog/section2_1_1.html

• Original idea for implementing B&B with Phylogeny:

Hendy, M. D., and D. Penny. 1982. Branch and bound algorithms to determine minimal evolutionary trees. Mathematical Biosciences 59: 277-290