Testing alternative hypotheses. Outline Topology tests: –Templeton test Parametric bootstrapping...


Testing alternative hypotheses


Outline

• Topology tests:

– Templeton test

• Parametric bootstrapping (briefly)

• Comparing data sets


Topology tests

• The best tree for your data contradicts a prior hypothesis. This does not necessarily mean that the data refute the hypothesis

• Compare the optimality score of the best tree overall with that of the best tree(s) that satisfy the hypothesis


Tree space

[Figure: tree space, with the region satisfying the hypothesis, the optimal tree, and the optimal tree satisfying the hypothesis marked]


Does one tree explain the data significantly better than the other?

• If the data are “significantly” more compatible with the optimal tree than the constrained tree, the hypothesis is rejected

• Parsimony framework

– Constrained tree length = X

– Optimal tree length = Y

– Is the cost (X-Y) significant?


Templeton test

Taxon   1  2  3  4  5  6  7  8  9

A       T  G  T  G  A  A  C  A  A
B       T  G  T  G  A  C  C  A  A
C       T  G  C  G  G  C  C  T  A
D       A  G  C  G  G  C  G  T  A
E       A  A  C  T  A  A  G  T  G
F       A  A  C  T  A  A  G  C  G

L1      1  1  1  1  2  2  1  2  1   = 12
L4      3  2  2  2  2  1  3  3  2   = 20
Diff    2  1  1  1  0 -1  2  1  1   = 8
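The per-character lengths in the table can be recomputed with Fitch parsimony. The sketch below is illustrative only: the slide does not show either tree, so `TREE_1` is a hypothetical topology chosen for the example, and the second tree's lengths are simply copied from the slide's L4 row. The function and variable names are likewise not from the slides.

```python
# Character matrix from the slide: one string of nine sites per taxon.
MATRIX = {
    "A": "TGTGAACAA",
    "B": "TGTGACCAA",
    "C": "TGCGGCCTA",
    "D": "AGCGGCGTA",
    "E": "AACTAAGTG",
    "F": "AACTAAGCG",
}

# ASSUMPTION: hypothetical stand-in for the first tree (not shown on the slide),
# written as nested tuples.
TREE_1 = (("A", "B"), ("C", ("D", ("E", "F"))))

# Per-character lengths on the second tree, copied from the slide's L4 row.
L4_FROM_SLIDE = [3, 2, 2, 2, 2, 1, 3, 3, 2]


def fitch(node, site):
    """Return (state set, number of steps) for one site on a rooted binary tree."""
    if isinstance(node, str):               # leaf: observed state, no steps
        return {MATRIX[node][site]}, 0
    left_set, left_steps = fitch(node[0], site)
    right_set, right_steps = fitch(node[1], site)
    shared = left_set & right_set
    if shared:                               # children overlap: no extra step
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def site_lengths(tree):
    """Parsimony length of every site on the given tree."""
    n_sites = len(next(iter(MATRIX.values())))
    return [fitch(tree, site)[1] for site in range(n_sites)]


l1 = site_lengths(TREE_1)
diffs = [second - first for first, second in zip(l1, L4_FROM_SLIDE)]
print("L1  ", l1, "=", sum(l1))
print("Diff", diffs, "=", sum(diffs))
```

The differences, one per character, are the input to the signed-rank step of the Templeton test.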


Templeton test

Score   Rank

  2      1.5
  2      1.5
  1      5.5
  1      5.5
  1      5.5
  1      5.5
  1      5.5
 -1     -5.5

Sum of the negative ranks = 5.5

N (number of characters that differ in length between the two trees) = 8

P-value = ca. 0.045
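The Templeton test is a Wilcoxon signed-rank test on the per-character differences, and a rough sketch of that calculation follows. Two assumptions are worth flagging: it uses the usual textbook convention of giving rank 1 to the smallest absolute difference, which is the reverse of the ordering in the table above, and it gets a two-sided P-value by enumerating all sign assignments rather than from a lookup table. Its rank sum and P-value therefore need not equal the figures on the slide.

```python
from itertools import product

# Per-character length differences from the slide (second tree minus first).
diffs = [2, 1, 1, 1, 0, -1, 2, 1, 1]


def average_ranks(values):
    """Rank values ascending (smallest = 1); tied values share their mean rank."""
    ordered = sorted(values)
    ranks = []
    for value in values:
        first = ordered.index(value) + 1     # 1-based position of the first tie
        ties = ordered.count(value)
        ranks.append(first + (ties - 1) / 2)
    return ranks


nonzero = [d for d in diffs if d != 0]       # characters with no difference are dropped
ranks = average_ranks([abs(d) for d in nonzero])
w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
t_obs = min(w_minus, w_plus)

# Exact two-sided P-value: try every way of assigning signs to the ranks.
hits = total = 0
for signs in product((False, True), repeat=len(ranks)):
    negative = sum(r for r, is_negative in zip(ranks, signs) if is_negative)
    positive = sum(ranks) - negative
    total += 1
    hits += min(negative, positive) <= t_obs
p_exact = hits / total

print("N =", len(nonzero), " T =", t_obs, " P =", p_exact)
```

For larger numbers of characters, enumerating every sign assignment becomes impractical and a normal approximation or a statistics library would be the more sensible choice.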


Problems of topology tests

• The tests compare trees; they do not compare the competing hypotheses


Tree space

[Figure: tree space, with the region satisfying the hypothesis, the optimal tree, and the optimal tree satisfying the hypothesis marked]


Another problem of topology tests

• Suppose we had a prior hypothesis that species A-B form a clade

• We conduct a phylogenetic analysis of 8 species and find that A-B do not form a clade

• The shortest tree that has them as a clade is 6 steps longer (decay = -6), which is significant under a Templeton test

Another problem of topology tests

• Suppose we had a prior hypothesis that species A-Z form a clade

• We conduct a phylogenetic analysis of 100 species and find that A-Z do not form a clade

• The shortest tree that has them as a clade is 6 steps longer (decay = -6), which is significant under a Templeton test


Are these results equivalent?

• The two hypotheses are differently stringent

– The former delimits a much larger proportion of tree-space

• One solution is to reverse the question: If the hypothesis were true, how likely is it that the optimal tree would reject it?

– Requires parametric bootstrapping
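The difference in stringency between the two hypotheses can be made concrete by counting trees. The sketch below assumes unrooted, fully resolved trees that are all weighted equally, treats "A-Z" as 26 named species, and uses the standard count of (2n-5)!! unrooted trees for n taxa. The function names are made up for this illustration.

```python
from fractions import Fraction


def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1; taken as 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def unrooted_trees(n):
    """Number of unrooted, fully resolved trees for n >= 3 labelled taxa."""
    return double_factorial(2 * n - 5)


def fraction_with_clade(n, k):
    """Share of unrooted trees on n taxa that group k named taxa together.

    Valid for 2 <= k <= n - 2. A rooted subtree on the k taxa is joined by a
    single branch to a rooted subtree on the remaining n - k taxa.
    """
    with_clade = double_factorial(2 * k - 3) * double_factorial(2 * (n - k) - 3)
    return Fraction(with_clade, unrooted_trees(n))


small = fraction_with_clade(8, 2)       # A-B in the 8-species analysis
large = fraction_with_clade(100, 26)    # A-Z in the 100-species analysis
print(small, float(large))
```

Under these assumptions the A-B hypothesis is satisfied by one tree in eleven, while the A-Z hypothesis is satisfied by a vanishingly small share of trees, so the same 6-step cost does not carry the same meaning in the two cases.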


Find the region of tree space that is plausible if the hypothesis is true:

[Figure: tree space, with the optimal tree and the optimal tree satisfying the hypothesis marked. Outcome: hypothesis rejected]


Find the region of tree space that is plausible if the hypothesis is true:

[Figure: tree space, with the optimal tree and the optimal tree satisfying the hypothesis marked. Outcome: hypothesis not rejected]


How do you do this?

• Find the optimal tree under the constraint (not just the optimal topology but also branch lengths, etc.)

• Simulate data on that tree many times

• For each simulated data set, calculate the cost of the hypothesis

• If the observed cost is greater than the costs obtained from the simulated data, the hypothesis is rejected.
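The steps above can be sketched as a small driver function. This is only an outline under stated assumptions: `simulate` and `cost` are hypothetical callables standing in for real phylogenetic software (simulating sequences on the constrained tree, then running constrained and unconstrained searches), and the 0.05 cut-off is a conventional choice rather than something the slides specify.

```python
import random


def parametric_bootstrap_p(observed_cost, simulate, cost, n_reps=1000, seed=0):
    """Share of simulated data sets whose cost is at least the observed cost.

    simulate(rng) -> one data set generated on the optimal constrained tree
    cost(data)    -> cost of the hypothesis for that data set
                     (constrained score minus unconstrained score)
    Both callables are hypothetical placeholders, not a real library API.
    """
    rng = random.Random(seed)
    simulated_costs = [cost(simulate(rng)) for _ in range(n_reps)]
    as_extreme = sum(1 for c in simulated_costs if c >= observed_cost)
    return as_extreme / n_reps


# Toy stand-ins so the sketch runs: each "data set" is just a random cost of 0-3.
def toy_simulate(rng):
    return rng.randint(0, 3)


def toy_cost(data):
    return data


p = parametric_bootstrap_p(6, toy_simulate, toy_cost)
print("P =", p, "-> reject" if p < 0.05 else "-> do not reject")
```

In a real analysis almost all of the run time is spent inside the two callables, which is why the method is laborious.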


Strepsiptera sister to the Diptera (Whiting et al., 1997)


Could be a long-branch problem (Huelsenbeck, 1997)


What if this were the true tree?

Probability of Strepsiptera being sister to Diptera on the MP tree = 92%


Testing hypotheses

• Topology tests are good ways to test hypotheses

• Parametric bootstrapping tests are powerful but laborious

• Other tests are available using likelihood or Bayesian approaches (later)


Multiple data sets for the same set of taxa

• Analyze each data set separately and then compare the trees (consense)

• Concatenate the data and conduct a single combined analysis (combine)


Argument for consensus

• If the same clades appear with multiple data sets we can be more confident

• The method is conservative


Is consensus conservative?

Barrett et al. 1991. Syst. Zool. 40:486



Arguments against combined analysis

• Some data sets might have strong misleading signals (e.g., due to lab errors)

• Different partitions might have tracked different histories


Conditional combined analysis

• Assess if the data look like they have tracked different histories

– If they do not: combine

– If they do: analyze separately

• Can you do this with topology tests?


[Figure: optimal tree for data set 1 and optimal tree for data set 2]

Do they conflict?


But topology tests can be used more carefully

• Two data sets don’t conflict significantly if there is one tree that neither data set rejects

• Two data sets do conflict if:

– Data set 1 rejects all trees that lack a certain clade

– Data set 2 rejects all trees that have that same clade


[Figure: tree space showing the optimal tree for data set 1, the optimal tree for data set 2, the optimal tree with the constraint for data set 1, and the optimal tree without the constraint for data set 2; each of the latter two is marked "significantly worse"]


The Incongruence Length Difference (ILD) test (Farris et al., 1994)

• Conflict is manifest as longer trees (or lower likelihood)

• Look to see how length (or likelihood) increases when we combine data

• Determine significance compared to random partitions
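The comparison with random partitions can be sketched as follows. This is a minimal outline under stated assumptions, not PAUP*'s implementation: `tree_length` is a hypothetical callable that would run a parsimony search and return the length of the shortest tree for a set of characters, and random partitions are drawn by shuffling the pooled characters while keeping the original partition sizes. Because the combined data are the same in every replicate, conflict between the original partitions shows up as an unusually small sum of the two separate lengths.

```python
import random


def ild_test(part1, part2, tree_length, n_reps=100, seed=0):
    """Return (observed sum of lengths, share of random partitions at least as short).

    part1, part2 : lists of characters (alignment columns)
    tree_length  : hypothetical callable giving the shortest-tree length for a
                   list of characters; a real tree search is not shown here
    """
    rng = random.Random(seed)
    observed = tree_length(part1) + tree_length(part2)
    pooled = list(part1) + list(part2)
    at_most = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)                          # random partition, same sizes
        rand1, rand2 = pooled[:len(part1)], pooled[len(part1):]
        if tree_length(rand1) + tree_length(rand2) <= observed:
            at_most += 1
    return observed, at_most / n_reps


# Placeholder length function so the sketch runs; it is NOT a parsimony search.
observed, p = ild_test(["c1", "c2", "c3"], ["c4", "c5"], tree_length=len)
print(observed, p)
```

With a real `tree_length`, a small returned share would indicate that the original partitions fit their own trees better than random partitions of the same sizes do.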


ILD test (= Partition Homogeneity Test in PAUP*)

One    TACATAAACAAGCCTAAAATGCGACACTACGTTCACTGTTACGCTCTCCACTGCCTAGACGAAGAAGCTTCA
Two    TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGCCTAGACGAAGACGCTTCA
Three  TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTACGCTCTTCACTGCCTAGACGAGGATGCCTCG
Four   TACATAAATAAGCCAAAAATGCGACACTACGTTCATTGTTACGCACTCCATTGCCTCGACGAAGAAGCTTCA
Five   TACATAAACAAACCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGTCTAGACGAAGACGCTTCG
Six    TACATAAACAAGCCCAAGATGCGTCACTACGTCCACTGCTACGCCCTCCACTGTCTCGACGAGGAGGCCTCG
Seven  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA
Eight  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA

Partition 1 length = 12

Partition 2 length = 9

Combined L = 21



ILD test (= Partition Homogeneity Test in PAUP*)

(Alignment as on the previous slide)

Partition 1 length = 14

Partition 2 length = 11

Combined L = 25


Results

Sum of tree lengths   Number of replicates
-------------------   --------------------
1661                   1
1662                   2
1663                   1
1665*                  9
1666                   8
1667                   9
1668                   5
1669                  11
1670                  10
1671                   9
1672                  10
1673                   7
1674                   4
1675                   4
1676                   1
1677                   4
1678                   2
1679                   1
1680                   1
1683                   1

* = sum of lengths for original partition

P value = 1 - (87/100) = 0.130000
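The P value in this output can be checked directly from the replicate counts. The short sketch below assumes that 87 is the number of replicates whose summed length is greater than that of the original partition, which is what the counts in the table add up to; the variable names are illustrative.

```python
# Replicate counts copied from the PAUP* output: {sum of tree lengths: replicates}.
counts = {
    1661: 1, 1662: 2, 1663: 1, 1665: 9, 1666: 8, 1667: 9, 1668: 5,
    1669: 11, 1670: 10, 1671: 9, 1672: 10, 1673: 7, 1674: 4, 1675: 4,
    1676: 1, 1677: 4, 1678: 2, 1679: 1, 1680: 1, 1683: 1,
}
ORIGINAL_SUM = 1665  # the row marked with * in the output

total = sum(counts.values())
longer = sum(n for length, n in counts.items() if length > ORIGINAL_SUM)
p_value = 1 - longer / total

print(total, longer, round(p_value, 6))
```

Equivalently, 13 of the 100 replicates have a summed length no greater than that of the original partition.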


What does a positive result mean?

• The data sets have tracked different histories?

• The original partition is non-random

• The test does not even look at topology


Options if you find conflict

• Conduct separate analyses only

• Delete taxa until the conflict disappears, then combine

• Combine anyway


Conditional conditional combined analysis

• You believe that conflict reflects data partitions tracking different histories

– Keep the data separate and find ways to summarize the discrepancy

• You believe that conflict reflects artifactual signals (noise) in one or both data sets

– Combine anyway in the hope that the real signal will come to dominate