MSc in Bioinformatics · mscbioinformatics.uab.cat/base/documents/... · MSc in Bioinformatics Module...

Molecular Evolution and Phylogeny (2)

Sebastián E. Ramos-Onsins

Centre of Research in Agricultural Genomics (CRAG)

Module 2: Core Bioinformatics

MSc in Bioinformatics

Course 2014-15



Phylogeny: a representation of the genealogical relationships among species, genes, populations or even individuals (Ziheng Yang, 2006).


A tree is a graphical representation of the relationships between lineages, drawn as a structure of nodes and branches.

Rooted vs Unrooted Trees:

[Figure: two trees with tips 1-6, illustrating a rooted and an unrooted tree]


Cladogram vs Phylogram Trees:

[Figure: two trees with tips 1-6]

Cladogram: qualitative (only the branching order is shown).

Phylogram: branch lengths are represented.


Unresolved vs Resolved Trees:

[Figure: three panels: Star Tree, Partially Resolved Tree, Resolved Tree]


Species vs Gene Trees:

[Figure: two trees with tips 1-6]

Species tree: based on multiple sources of information about the species.

Gene tree: based on a single region or a few regions of (e.g.) DNA of the species.


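Ultrametric and Additive Trees (not mutually exclusive):

Ultrametric tree: for any three tips, the two largest pairwise distances are equal. Ex: $d_{45} \le d_{43} = d_{53}$.

Additive tree: the distances between species on the tips of the tree are equal to the sum of the lengths of the branches connecting them. Ex: $d_{15} = d_{1i} + d_{ij} + d_{jk} + d_{k5}$, where $i$, $j$ and $k$ are the internal nodes on the path from tip 1 to tip 5.

[Figure: two trees with tips 1-6, one for each definition]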
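The ultrametric condition can be checked mechanically on a distance matrix. The sketch below is illustrative Python (the course exercises use R); the function name `is_ultrametric` and both small matrices are assumptions made for this example and do not come from the slides.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple of tips, the two largest
    pairwise distances must be equal (within a tolerance)."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        smallest, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical clock-like distances: tip 2 is equidistant from tips 0 and 1.
clock_like = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]

# Hypothetical distances in which one lineage has accumulated more change.
not_clock_like = [
    [0, 2, 6],
    [2, 0, 9],
    [6, 9, 0],
]

print(is_ultrametric(clock_like))      # True
print(is_ultrametric(not_clock_like))  # False
```

A matrix that fails this check can still be additive, which is why the two properties are listed as not mutually exclusive rather than equivalent.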


Let’s create a tree history using R:


Tree-reconstruction Methods

- Distance Methods

- Maximum Parsimony

- Maximum Likelihood

- Bayesian Inference


- Distance Methods

Two steps:

- Calculate the distance matrix.

- Reconstruct the phylogenetic tree from the matrix.
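As a rough illustration of the first step, the sketch below computes a matrix of p-distances (the proportion of differing sites) for the five-sequence alignment used later in the bootstrap slides. The choice of p-distance and the function names are assumptions made for this example; the slides do not say which distance measure is intended, and the sketch is in Python although the course exercises use R.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)

def distance_matrix(seqs):
    """Full symmetric matrix of pairwise p-distances."""
    return [[p_distance(a, b) for b in seqs] for a in seqs]

# Alignment taken from the bootstrap slides of this deck.
alignment = ["ATCTTCT", "GTCTTCT", "ATGATCC", "ATGAACC", "AGGAACC"]

matrix = distance_matrix(alignment)
for row in matrix:
    print("  ".join(f"{value:.3f}" for value in row))
```

In practice a model-corrected distance is usually preferred over the raw proportion, because p-distances underestimate divergence when multiple substitutions hit the same site.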


UPGMA (Unweighted Pair Group Method with Arithmetic Mean)

Distance matrix for tips 1-4:

      1     2     3     4
1     0
2     1     0
3     2     4     0
4     3     5     6     0


After joining tips 1 and 2 into node 5:

      3     4     5
3     0
4     6     0
5     3     4     0


After joining nodes 5 and 3 into node 6:

      4     6
4     0
6     4.67  0


Resulting UPGMA tree:

      node1   node2   go.to.node   Div
1     1       -       5            0.5
2     2       -       5            0.5
3     3       -       6            1.5
4     4       -       7            2.33
5     2       1       6            1.0
6     5       3       7            0.83
7     6       4       -            -

[Figure: UPGMA tree with tips 1-4 and internal nodes 5-7; node heights 0.5, 1.5 and 2.33]
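The UPGMA steps shown for this matrix can be sketched in a few lines of Python (the course exercises use R). This is a minimal illustration, not the course's implementation: the function name `upgma`, the dictionary-of-pairs representation and the node numbering scheme are assumptions chosen so that the new nodes come out as 5, 6 and 7, as on the slide.

```python
from itertools import combinations

def upgma(dist, labels):
    """Minimal UPGMA. `dist` maps frozenset({a, b}) to a distance.
    Returns a list of merges: (cluster_a, cluster_b, new_node, height)."""
    d = dict(dist)
    sizes = {label: 1 for label in labels}
    active = set(labels)
    next_id = max(labels) + 1
    merges = []
    while len(active) > 1:
        # Join the closest pair of clusters.
        a, b = min(combinations(sorted(active), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        new = next_id
        next_id += 1
        # Distance to the new cluster: size-weighted arithmetic mean.
        for c in active - {a, b}:
            d[frozenset((new, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[new] = sizes[a] + sizes[b]
        active = (active - {a, b}) | {new}
        merges.append((a, b, new, height))
    return merges

# Lower triangle of the distance matrix from the slides.
lower = {(2, 1): 1, (3, 1): 2, (3, 2): 4, (4, 1): 3, (4, 2): 5, (4, 3): 6}
dist = {frozenset(pair): value for pair, value in lower.items()}

merges = upgma(dist, [1, 2, 3, 4])
for a, b, new, height in merges:
    print(f"join {a} and {b} -> node {new} at height {height:.2f}")
```

With the slide's matrix the node heights come out as 0.5, 1.5 and 2.33, and the Div column of the table is the difference between a node's height and the height of the node it joins (for example 1.5 - 0.5 = 1.0 for node 5).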


NJ (Neighbour-Joining): based on the minimum-evolution criterion, which prefers the tree with the smallest sum of branch lengths.

Starting from a star tree, join the two nodes that give the smallest total tree length, and repeat the process until the tree is resolved.

From Yang 2006

The distances are assumed to be additive when the branch lengths are calculated.
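One joining step can be sketched as follows. The sketch uses the standard neighbour-joining criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), which the slide does not spell out, so treat the formula, the function names and the five-taxon matrix as assumptions added for illustration (in Python, although the course exercises use R).

```python
def nj_pick_pair(d):
    """Return the pair (i, j) that minimises the neighbour-joining
    criterion Q(i, j), together with its Q value."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q

def nj_branch_lengths(d, i, j):
    """Branch lengths from i and j to the new internal node."""
    n = len(d)
    length_i = d[i][j] / 2 + (sum(d[i]) - sum(d[j])) / (2 * (n - 2))
    return length_i, d[i][j] - length_i

# Hypothetical distance matrix for five taxa (not from the slides).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q_value = nj_pick_pair(example)
lengths = nj_branch_lengths(example, *pair)
print(pair, q_value, lengths)
```

Note that the pair with the smallest raw distance (taxa 3 and 4, distance 3) is not the pair that is joined; the criterion corrects each distance for how far both taxa are from everything else.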


-Maximum Parsimony:

-Criterion based on minimum evolution.

-The best tree is the one that requires the minimum number of changes.

-In principle, reconstruct all possible trees, assign states to the internal nodes and score each tree by the number of changes it requires.

-Heuristic search methods are necessary for large samples.

-Long Branch Attraction (LBA) is especially problematic in MP: lineages with long branches tend to be joined together, so MP can support the wrong reconstruction.

[Figure: two alternative trees for taxa a, b, c and d, with nucleotide states A and G at the tips and internal nodes]
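Scoring a tree by its number of changes can be sketched with Fitch's small-parsimony algorithm. The slides do not name an algorithm, so this choice, the nested-tuple tree encoding and the tip states below are assumptions made for illustration (in Python, although the course exercises use R).

```python
def fitch(tree, states):
    """Fitch parsimony for one site on a rooted binary tree.
    `tree` is a tip name or a pair (left, right); `states` maps tips to bases.
    Returns (possible states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical states for one alignment column.
site = {"a": "A", "b": "A", "c": "G", "d": "G"}

tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))

score_1 = fitch(tree_1, site)[1]
score_2 = fitch(tree_2, site)[1]
print(score_1, score_2)  # 1 2
```

Under the parsimony criterion the first tree is preferred for this site because it needs one change instead of two; a full analysis sums the score over all sites and compares trees on the total.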


- Maximum Likelihood:

-The criterion is the tree with the maximum likelihood, i.e. the tree under which the observed data are most probable.

-Calculate the probability of the data given a tree and an evolutionary model.

-Finding the ML tree is computationally expensive.

-Good statistical properties. A popular method that gives reasonable results.
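The idea of a likelihood under a given evolutionary model can be shown on the smallest possible case: two sequences and one parameter, their distance. The sketch below assumes the Jukes-Cantor (JC69) model, which the slide does not mention, and compares a brute-force grid search with the known closed-form estimate; the two sequences are the first and third rows of the alignment used in the bootstrap slides. It is written in Python, although the course exercises use R.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by distance d
    (expected substitutions per site) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site keeps its base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific other base
    loglik = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the stationary frequency of the starting base.
        loglik += math.log(0.25 * (p_same if x == y else p_diff))
    return loglik

seq_1, seq_3 = "ATCTTCT", "ATGATCC"

# Brute-force search over candidate distances 0.001, 0.002, ..., 3.000.
grid = [i / 1000 for i in range(1, 3001)]
d_grid = max(grid, key=lambda d: jc69_log_likelihood(seq_1, seq_3, d))

# Closed-form JC69 estimate from the proportion of differing sites.
p = sum(x != y for x, y in zip(seq_1, seq_3)) / len(seq_1)
d_closed = -0.75 * math.log(1 - 4 * p / 3)

print(d_grid, round(d_closed, 4))
```

For a whole tree the same principle applies, but the likelihood has to be maximised over topologies and all branch lengths at once, which is where the computational cost comes from.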


- Bayesian Inference

Seeks the distribution of compatible trees with the highest posterior probabilities, given a model and a prior distribution for the parameters included.

The main criticism concerns the choice of the prior distributions.

The method is also popular and gives reasonable results.

Based on Bayes' theorem (inverse probability theorem):

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\bar{A})\,P(B \mid \bar{A})}$$
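A small numerical check of the theorem, for the two-hypothesis case written above (A and not-A). The function name and the probabilities are hypothetical values chosen for illustration.

```python
def posterior(prior_a, lik_b_given_a, lik_b_given_not_a):
    """P(A|B) from Bayes' theorem with two exclusive hypotheses, A and not-A."""
    numerator = prior_a * lik_b_given_a
    evidence = numerator + (1.0 - prior_a) * lik_b_given_not_a
    return numerator / evidence

# Hypothetical numbers: equal priors, data four times more probable under A.
flat_prior = posterior(0.5, 0.8, 0.2)

# Same likelihoods, but a prior that strongly disfavours A.
skewed_prior = posterior(0.1, 0.8, 0.2)

print(round(flat_prior, 3), round(skewed_prior, 3))
```

The second call shows why the choice of prior attracts criticism: with identical data, changing the prior alone moves the posterior substantially.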


Let’s do a simple tree reconstruction using R:


Support of the phylogenetic trees obtained

Different methods to assess the support of phylogenetic trees:

-Methods that depend on the reconstruction method (Bremer support in MP)

-Non-parametric resampling methods (no model is assumed)

-Parametric methods (assuming a model)


Non-parametric resampling methods (no model is assumed)

Jackknife

Bootstrap

-Draw a resampled data set (jackknife: a subset of the columns; bootstrap: columns sampled with replacement).

-Use this data set to infer the tree again.

-The support for the obtained tree is given by the number of times the same clusters (nodes) are recovered in the pseudoreplicates.


Assumptions:

-The data set is large, so the estimates of the error are accurate.

-Each position (column in the alignment) is independent of the others.

Results:

The resulting values are not directly probabilities but support values that indicate the reliability of the obtained tree.


Bootstrap

Original alignment (column numbers on the first line):

1234567
ATCTTCT
GTCTTCT
ATGATCC
ATGAACC
AGGAACC

[Figure: tree with tips 1-5]


Resampling (columns are drawn with replacement; here columns 1, 1, 3, 7, 7, 2, 1):

1137721
AACTTTA
GGCTTTG
AAGCCTA
AAGCCTA
AAGCCGA

[Figure: tree with tips 1-5]




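Resampling, then Do Tree:

[Figure: the original tree and the tree inferred from the resampled alignment, both with tips 1-5. Each cluster (node) scores +1 if it is recovered in the pseudoreplicate tree and +0 if it is not; here the scores are +1, +1 and +0.]

… and repeat again n times!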
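The resampling step and the final support count can be sketched as below. The alignment and the column draw 1, 1, 3, 7, 7, 2, 1 are taken from the slides; the function names, the random seed and the clade sets of the three pseudoreplicate trees are hypothetical, and the tree-inference step between resampling and counting is deliberately left out. The sketch is in Python, although the course exercises use R.

```python
import random

def resample_columns(alignment, columns):
    """Build a pseudoreplicate from the given 0-based column indices."""
    return ["".join(seq[c] for c in columns) for seq in alignment]

def bootstrap_replicate(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return resample_columns(alignment, columns)

def clade_support(clade, replicate_clade_sets):
    """Fraction of pseudoreplicate trees that contain the clade."""
    hits = sum(1 for clades in replicate_clade_sets if clade in clades)
    return hits / len(replicate_clade_sets)

alignment = ["ATCTTCT", "GTCTTCT", "ATGATCC", "ATGAACC", "AGGAACC"]

# Reproduce the draw shown on the slide (1-based columns 1,1,3,7,7,2,1).
slide_columns = [c - 1 for c in (1, 1, 3, 7, 7, 2, 1)]
slide_replicate = resample_columns(alignment, slide_columns)
print(slide_replicate)

# A random pseudoreplicate; the seed is arbitrary.
random_replicate = bootstrap_replicate(alignment, random.Random(0))

# Hypothetical clades found in three pseudoreplicate trees.
replicates = [
    {frozenset({1, 2}), frozenset({4, 5})},
    {frozenset({1, 2}), frozenset({3, 4})},
    {frozenset({1, 2}), frozenset({4, 5})},
]
support_12 = clade_support(frozenset({1, 2}), replicates)
support_45 = clade_support(frozenset({4, 5}), replicates)
```

In a real analysis the number of pseudoreplicates is typically in the hundreds or thousands, and each one goes through the same tree-reconstruction method as the original data.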


Let’s do a Bootstrap analysis using R:


-Parametric methods (assuming a model)

- Parametric bootstrapping

- Bayesian inference


- Parametric bootstrapping: the phylogeny is re-estimated repeatedly from data sets generated under a given model.


- Bayesian inference: the method itself collects a set of compatible trees, so the uncertainty of the tree is part of the result.


Phylogenomics: An approach to obtain the Species Tree

When speciation events are close together in time, a gene tree can give an erroneous topology:

[Figure labels: Incomplete Lineage Sorting; Anomalous Region]


-Having a large number of regions (or information from different sources) can help to resolve the incongruence.

-Heuristic methods based on a Supermatrix (all regions concatenated into one alignment) or on a Supertree (a single tree built from the individual trees) are used.

-Likelihood-based methods are computationally expensive but statistically well founded.
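Building a supermatrix is mostly bookkeeping, as the sketch below suggests. The species names, the two short gene alignments and the use of `-` for a species missing from a gene are all hypothetical choices made for this example (in Python, although the course exercises use R).

```python
def build_supermatrix(gene_alignments, missing="-"):
    """Concatenate per-gene alignments (dicts: species -> aligned sequence).
    Species absent from a gene are padded with the `missing` character."""
    species = sorted({sp for aln in gene_alignments for sp in aln})
    supermatrix = {sp: "" for sp in species}
    for aln in gene_alignments:
        length = len(next(iter(aln.values())))
        for sp in species:
            supermatrix[sp] += aln.get(sp, missing * length)
    return supermatrix

# Hypothetical alignments for two regions; sp2 is missing from the second.
gene_1 = {"sp1": "ATCT", "sp2": "GTCT", "sp3": "ATGA"}
gene_2 = {"sp1": "TCT", "sp3": "TCC"}

supermatrix = build_supermatrix([gene_1, gene_2])
for name, sequence in supermatrix.items():
    print(name, sequence)
```

The concatenated alignment is then analysed as if it were a single region, which is what makes the approach simple but also means that every region is forced onto the same tree.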


Let’s try to obtain the species Tree using the library phybase in R:


Use of phylogenies for different objectives:

- Ancestral sequence reconstruction

- Dating ancestral events

- Detection of selection (synonymous vs non-synonymous positions)

- Correlation of the phylogenetic signal with phenotypic traits