BMI/CS 776 Lecture 2

BMI/CS 776 Lecture 2 Colin Dewey 2007.01.25

Transcript of BMI/CS 776 Lecture 2

  • BMI/CS 776 Lecture 2

    Colin Dewey, 2007.01.25

  • Today

    • The biology of nucleic acids
    • Trees
    • Homology forests

  • What is life?

    • living cell: membrane with genetic material

    • organism: one or more connected cells

    [Figure labels: membrane, chromosome, genome]

  • Central dogma

    • Complicated regulation at each step
    • Regulatory cycles

    DNA -> RNA -> Protein (DNA -> RNA: transcription; RNA -> Protein: translation)

  • DNA Composition

    [Figure labels: deoxyadenosine monophosphate, polynucleotide]

  • DNA bases

  • DNA structure

  • The importance of pairing

    • Complementation -> Replication

    “the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material” - Watson & Crick (1953)
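
The copying mechanism suggested by complementary pairing can be sketched in a few lines. This is only an illustration, assuming an idealized, error-free template copy; the names `COMPLEMENT` and `reverse_complement` are hypothetical and not from the lecture.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the strand that pairs with `strand`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


template = "ATGGCA"
partner = reverse_complement(template)  # strand synthesized against the template
copy = reverse_complement(partner)      # strand synthesized against the partner

print(partner)
print(copy == template)
```

Copying the partner strand recovers the original sequence, which is why each strand can act as a template for the other.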

  • DNA Replication

  • RNA

    • Differences
      • Ribose sugar, not deoxyribose
      • Uracil (U) instead of thymine
      • Single stranded
    • Enzymatic activity -> “RNA World”

    [Figure label: adenosine monophosphate]

  • Important Enzymes

    • RNA Polymerase: RNA->RNA, DNA->RNA
    • Primase: DNA->RNA
    • Reverse transcriptase: RNA->DNA
    • Telomerase: RNA->DNA

  • Mutation

    • Substitutions
    • Insertions
    • Deletions
    • Rearrangements

  • Base mispairing

    • Base not paired with complement
    • Causes: replication error, radiation damage, chemical mutagen
    • CpG: highly mutable in animals
    • Causes substitution mutations
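
If CpG sites mutate faster than other dinucleotides, one would expect them to be rarer than base composition alone predicts. The sketch below counts CpG dinucleotides and computes a commonly used observed/expected ratio; the function name `cpg_stats` and the independence-based expectation are assumptions for illustration, not part of the lecture.

```python
def cpg_stats(seq):
    """Count CpG dinucleotides and compute an observed/expected ratio.

    Expected count assumes C and G occur independently along the sequence
    (an illustrative assumption): expected = (#C * #G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    observed = seq.count("CG")  # "CG" cannot overlap itself, so this is exact
    c, g = seq.count("C"), seq.count("G")
    expected = c * g / n if n else 0.0
    ratio = observed / expected if expected else float("nan")
    return observed, ratio


example = "ACGTCGGG"
count, ratio = cpg_stats(example)
print(count, ratio)
```

A ratio well below 1 over a long region would be consistent with CpG depletion; short sequences such as the toy example give noisy values.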

  • Replication Slippage

    • Strand separation during replication
    • Re-pairing at wrong place
    • Repair results in insertion or deletion
    • Common in repetitive regions

  • Recombination - start

    • Interaction of highly similar regions of DNA

    • Formation of Holliday junction

    • Junction migration

  • Recombination - end

    • Junction resolved
    • Possible outcomes
      • No crossing over
      • Crossing over
      • Gene conversion
      • No gene conversion

  • Recombination results

    • Many types of mutations can occur due to recombination:

      • inversion
      • insertion
      • deletion
      • chromosome fissions/fusions

  • Chromosome breaks/joins

    • Breakage
      • Double-stranded cut
      • Causes: radiation damage, endonucleases
    • Joining
      • Ligase

    • Mutations: inversions, transpositions, fusions, fissions

  • Mutation fixation

    • Whether or not a mutation becomes frequent in population depends on natural selection and random drift

    • Multi-cellular organisms: mutations must occur in germline to have evolutionary effect

    • Key distinction between probability of mutation and probability of fixation

  • Mutation summary

    class           causes
    substitution    base mispairing
    insertion       base mispairing, recombination
    deletion        base mispairing, recombination
    rearrangement   recombination, chromosome breaks/joins

  • Homology

    • Common ancestry
    • Characters
      • morphological
      • molecular

    • Richard Owen: “the same organ in different animals under every variety of form and function”

  • Nucleotide homology

    • What is an evolutionary character?
      • Position in DNA or RNA
    • Single-stranded characters
      • Properties: position, state
      • x is a “copy” of y if x was initially base-paired with y during template-dependent synthesis

  • Double-stranded case

    • Double-stranded character x
      • comprised of x+ and x- (single-stranded)
      • properties: position, state, orientation (+ or -)
    • x (ds) is a copy of y (ss) if x+ or x- is a copy of y
    • y (ss) is a copy of x (ds) if y is a copy of x+ or x-
    • x (ds) is a copy of y (ds) if x+ or x- is a copy of y+ or y-

  • Mutation

    • Changes character states or positions, not relationships

    • Repair after damage: can create new relationships if template-driven

  • Nucleotide homology

    • x is “derived” from y if there exist characters x_1, x_2, ..., x_T s.t. y = x_1, x = x_T, and x_{i+1} is a copy of x_i for each i

    • x is “homologous” to y if there existed a character z s.t. both x and y are derived from z
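
The two definitions above amount to following chains of the "copy" relation. Below is a minimal sketch, assuming each character is a copy of at most one other character and that the copy relation is stored in a hypothetical dictionary `copied_from`; all names and the toy data are illustrative only.

```python
def lineage(x, copied_from):
    """Return the set of characters x is derived from, including x itself (T = 1)."""
    chain = [x]
    while chain[-1] in copied_from:
        nxt = copied_from[chain[-1]]
        if nxt in chain:  # guard against malformed (cyclic) input
            break
        chain.append(nxt)
    return set(chain)


def is_derived(x, y, copied_from):
    """x is derived from y if a chain of copies leads from y to x."""
    return y in lineage(x, copied_from)


def is_homologous(x, y, copied_from):
    """x and y are homologous if both are derived from some common character z."""
    return bool(lineage(x, copied_from) & lineage(y, copied_from))


# Toy copy relation (hypothetical): a2 <- a1 <- z, b1 <- z, c <- q
copied_from = {"a1": "z", "a2": "a1", "b1": "z", "c": "q"}

print(is_derived("a2", "z", copied_from))
print(is_homologous("a2", "b1", copied_from))
print(is_homologous("a2", "c", copied_from))
```

Under the single-parent assumption, the characters derived from one ancestor form a tree, and homology is simply membership in the same tree.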

  • Refinements of homology

    • Not all homology relationships are equal
    • Fitch: orthology, paralogy, xenology
    • Each has different biological implications

  • Xenology

    • Result of horizontal transfer

    [Figure labels: XA1, XC, XB, Xanc, XA2]

  • Refinements of Homology

    [Figure labels: ancestor A, species A, species B; XA1, XA2, XB, Xanc; YA, YB, Yanc; ZA, ZB1, ZB2, Zanc]

    Fitch, 1970:
    Orthologous: diverged from the last common ancestor (LCA) due to a speciation event
    Paralogous: diverged from the LCA due to a duplication event
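
One way to read these definitions is as a lookup of the event at the last common ancestor in an annotated gene tree. The sketch below assumes a hypothetical topology for the X family (an internal node `XA` for a duplication within species A); the figure is not recoverable from the transcript, so the tree, the names `parent` and `event`, and the functions are illustrative assumptions.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def last_common_ancestor(x, y, parent):
    above_x = set(path_to_root(x, parent))
    for node in path_to_root(y, parent):
        if node in above_x:
            return node
    return None  # x and y lie in different trees


def classify(x, y, parent, event):
    """Classify two distinct leaves by the event at their LCA."""
    lca = last_common_ancestor(x, y, parent)
    if lca is None:
        return "not homologous"
    return "orthologous" if event[lca] == "speciation" else "paralogous"


# Assumed gene tree: Xanc splits by speciation; the species A copy then duplicates.
parent = {"XA": "Xanc", "XB": "Xanc", "XA1": "XA", "XA2": "XA"}
event = {"Xanc": "speciation", "XA": "duplication"}

print(classify("XA1", "XB", parent, event))
print(classify("XA1", "XA2", parent, event))
```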

  • Directed Duplications

    • Duplication is directed if removing one of A or A’ from G’ does not give G
    • Examples
      • retrotransposition
      • segmental duplications
    • Evolutionary consequence: source more likely to retain ancestral role

    [Figure labels: target, source; G; G’; A, A’]

  • Undirected Duplications

    • Duplication is undirected if removal of A from G’ gives G and removal of A’ from G’ gives G
    • Examples
      • tandem duplication
      • whole genome/chromosome duplication
    • Evolutionary consequence: both copies under very similar evolutionary pressures

    [Figure labels: G; G’; A, A’ (two panels)]
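
The removal test that separates directed from undirected duplications can be sketched on a simplified genome model. The code assumes a genome is a list of segment labels and that the two copies carry the same label; `classify_duplication` and the example genomes are hypothetical.

```python
def classify_duplication(g, g_prime, i, j):
    """Classify the duplication that turned genome g into g_prime.

    g, g_prime: lists of segment labels (a simplified genome model).
    i, j: positions in g_prime of the two copies, A and A'.
    """
    without_i = g_prime[:i] + g_prime[i + 1:]
    without_j = g_prime[:j] + g_prime[j + 1:]
    if without_i == g and without_j == g:
        return "undirected"  # removing either copy restores G
    if without_i == g or without_j == g:
        return "directed"    # only removing the target restores G
    return "not a single duplication of G"


# Tandem duplication: the new copy sits right next to the original.
tandem_before = ["L", "A", "R"]
tandem_after = ["L", "A", "A", "R"]

# Retrotransposition-like event: the new copy lands elsewhere in the genome.
retro_before = ["L", "A", "M", "R"]
retro_after = ["L", "A", "M", "A", "R"]

print(classify_duplication(tandem_before, tandem_after, 1, 2))
print(classify_duplication(retro_before, retro_after, 1, 3))
```

In the directed case, the copy whose removal restores G plays the role of the target and the other copy is the source.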

  • Topoorthology

    • Characters x and x’ are topoorthologous if they are orthologous and neither is derived from the target of a directed duplication since the time of the last common ancestor of x and x’

    • “topo” = position: topoorthologs are more likely to have similar genomic contexts

  • Monotopoorthology

    • Characters x and x’ are monotopoorthologous if they are topoorthologous and neither is derived from an undirected duplication since the time of the last common ancestor of x and x’.

    • Only transitive subrelation of homology
    • One-to-one relation

  • Refinements of Orthology

    [Figure labels: ancestor, species A, species B; directed duplication (target, source); undirected duplication; XA1, XA2, XB, Xanc; YA, YB, Yanc; ZA, ZB1, ZB2, Zanc]

    Topoorthologs: (XA1, XB), (XA2, XB)
    Monotopoorthologs: (YA, YB), (ZA, ZB1)
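
The pairs listed on this slide can be reproduced with a small classifier over annotated gene trees. The tree topologies below are an assumption reconstructed from the slide's labels (an undirected duplication of X in species A, and a directed duplication of Z in species B with ZB2 as target); the function and variable names are hypothetical.

```python
def path_up(node, parent):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def refine_orthology(x, y, parent, event, targets):
    """Return the most specific relation holding between distinct leaves x and y."""
    up_x, up_y = path_up(x, parent), path_up(y, parent)
    in_up_y = set(up_y)
    common = [n for n in up_x if n in in_up_y]
    if not common:
        return "not homologous"
    lca = common[0]
    if event[lca] != "speciation":
        return "paralogous"
    # Nodes on the two lineages since the last common ancestor.
    below = up_x[:up_x.index(lca)] + up_y[:up_y.index(lca)]
    if any(n in targets for n in below):
        return "orthologous"       # a lineage passes through a directed target
    if any(event.get(n) == "undirected" for n in below):
        return "topoorthologous"   # a lineage passes through an undirected duplication
    return "monotopoorthologous"


# Assumed trees for the X, Y and Z families.
parent = {
    "XA": "Xanc", "XB": "Xanc", "XA1": "XA", "XA2": "XA",
    "YA": "Yanc", "YB": "Yanc",
    "ZA": "Zanc", "ZB": "Zanc", "ZB1": "ZB", "ZB2": "ZB",
}
event = {"Xanc": "speciation", "XA": "undirected",
         "Yanc": "speciation",
         "Zanc": "speciation", "ZB": "directed"}
targets = {"ZB2"}  # ZB1 is taken to be the source copy

for pair in [("XA1", "XB"), ("XA2", "XB"), ("YA", "YB"), ("ZA", "ZB1"), ("ZA", "ZB2")]:
    print(pair, refine_orthology(*pair, parent, event, targets))
```

Each label names the most specific relation, so a pair reported as monotopoorthologous is also topoorthologous and orthologous.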

  • Trees in biology

    Darwin’s first tree (1837)

  • Graph basics

    • Graph: vertices (V), edges (E)
    • Path, Cycle, Connected
    • Degree, Leaf
    • Forest, Tree, Binary Tree
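
Several of these terms can be checked mechanically. The sketch below assumes an undirected graph given as a vertex list and an edge list, and uses a union-find structure to detect cycles; the function names are hypothetical and this is one of several possible approaches.

```python
def find(root, v):
    """Find the representative of v's component, compressing the path."""
    while root[v] != v:
        root[v] = root[root[v]]
        v = root[v]
    return v


def is_forest(vertices, edges):
    """A forest is a graph with no cycles."""
    root = {v: v for v in vertices}
    for u, w in edges:
        ru, rw = find(root, u), find(root, w)
        if ru == rw:
            return False  # this edge would close a cycle
        root[ru] = rw
    return True


def is_tree(vertices, edges):
    """A tree is a connected forest; an acyclic graph is connected iff |E| = |V| - 1."""
    return is_forest(vertices, edges) and len(edges) == len(vertices) - 1


def leaves(vertices, edges):
    """Leaves are the vertices of degree 1."""
    degree = {v: 0 for v in vertices}
    for u, w in edges:
        degree[u] += 1
        degree[w] += 1
    return {v for v in vertices if degree[v] == 1}


V = ["a", "b", "c", "d"]
star = [("a", "b"), ("b", "c"), ("b", "d")]

print(is_tree(V, star), leaves(V, star))
print(is_forest(V, star + [("c", "d")]))
```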

  • Phylogenetic X-trees

    • Phylogenetic X-tree
    • Weighted Phylogenetic X-tree
    • Binary Phylogenetic X-tree
    • Rooted Phylogenetic X-tree
    • Binary Rooted Phylogenetic X-tree

  • Dissimilarity Maps

    • Dissimilarity maps
    • Connection between dissimilarity maps and weighted phylogenetic X-trees
    • Metrics
    • Tree metrics <-> weighted phylogenetic X-trees
    • Four point condition <-> tree metric
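
The four point condition requires that for all w, x, y, z in X (not necessarily distinct), d(w,x) + d(y,z) <= max(d(w,y) + d(x,z), d(w,z) + d(x,y)); equivalently, the two largest of the three pairwise sums are equal. A minimal checker is sketched below, assuming the dissimilarity map is a symmetric nested dictionary with zero diagonal; `make_map`, `satisfies_four_point`, and the example distances are hypothetical.

```python
from itertools import combinations_with_replacement


def make_map(taxa, pair_dist):
    """Build a symmetric dissimilarity map with zero diagonal."""
    d = {x: {y: 0.0 for y in taxa} for x in taxa}
    for (x, y), value in pair_dist.items():
        d[x][y] = d[y][x] = value
    return d


def satisfies_four_point(taxa, d, tol=1e-9):
    """Check that the two largest of the three pairwise sums agree for every quadruple."""
    for w, x, y, z in combinations_with_replacement(taxa, 4):
        sums = sorted([d[w][x] + d[y][z],
                       d[w][y] + d[x][z],
                       d[w][z] + d[x][y]])
        if sums[2] - sums[1] > tol:
            return False
    return True


taxa = ["a", "b", "c", "d"]

# Distances read off an assumed tree ((a,b),(c,d)) with pendant edge lengths
# a=1, b=2, c=3, d=4 and internal edge length 5.
tree_like = make_map(taxa, {("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 9,
                            ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11})

# Same map with one distance perturbed, so no weighted tree realizes it.
perturbed = make_map(taxa, {("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 12,
                            ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11})

print(satisfies_four_point(taxa, tree_like))
print(satisfies_four_point(taxa, perturbed))
```

Allowing repeated elements in the quadruple means the check also covers the triangle inequality, at the cost of examining more tuples than strictly necessary for small examples.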

  • Is there a species tree?

    • Not really!
    • Untree-like mechanisms
      • Hybrid speciation
      • Horizontal transfer (xenology)
      • Incomplete lineage sorting

  • Incomplete lineage sorting

    Pollard et al., 2006, PLoS Genetics

  • Nucleotide trees

    • Relationships between nucleotides are trees!
      • nucleotide position has one parent
    • Exceptions (mispairing)
      • heteroduplex DNA from recombination
      • replication slippage

  • Homology forest

    [Page excerpt: Section 1.3, Homology Forests, p. 17]

    ... time to common ancestry of a set of related genes [Hartl and Clark, 1997].

    Exceptions to the nucleotide tree model occur due to mispairing events in which one nucleotide is paired with neither its parent nor its child (Section 1.1). A double-stranded position with a mispairing of this type thus has two parents. Recombination and replication slippage are two mechanisms that can cause such mispairings. However, these exceptions are not important from an evolutionary standpoint. In the case of recombination, we may assume that either no heteroduplex DNA (which contains these mispairings) is formed, or that heteroduplex DNA is always repaired by excising one strand of the region, rather than just those positions that have incorrect complementary bases. Replication slippage can be similarly explained away by assuming excision of one strand of the mispaired region, and replacement via the opposite strand. Although these assumptions are violated, the sequences that result from these assumed scenarios are indistinguishable from those that occur naturally.

    Homology forests

    We now formally define the notion of a homology forest, which represents all evolutionary relationships between nucleotide positions. We use the notation $\sigma^a_i$ to denote the $i$th element of a sequence $\sigma^a = \sigma^a_1, \ldots, \sigma^a_n$ of length $n$. Let $\bar{\sigma}^a = \bar{\sigma}^a_1, \ldots, \bar{\sigma}^a_n$ denote the complement of $\sigma^a$, where $\bar{\sigma}^a_i$ is the character with state complementary to $\sigma^a_i$. That is, $\sigma^a_i$ and $\bar{\sigma}^a_i$ are double-stranded DNA characters with identical position and orientations of $+$ and $-$, respectively. By a set of sequence characters $S = \{\sigma^1, \ldots, \sigma^k\}$ we mean the set of $n_1 + n_2 + \cdots + n_k$ sequence characters that form the sequences $\sigma^1, \sigma^2, \ldots, \sigma^k$ of lengths $n_1, n_2, \ldots, n_k$, respectively. Lastly, for $S = \{\sigma^1, \ldots, \sigma^k\}$, let $\bar{S} = \{\bar{\sigma}^1, \ldots, \bar{\sigma}^k\}$.

    Definition 1.15 Given a set of sequence characters $S$, a homology forest $F$ is a forest with leaves labeled by $S \cup \bar{S}$ and with at most one of $\sigma^a_i$ and $\bar{\sigma}^a_i$ used as a leaf label, $\forall \sigma^a_i \in S$. A phylogenetic X-tree in $F$ represents the evolutionary history of a set of homologous sequence characters, $X$.

    Given a set of sequences, the multiple alignment problem consists of constructing the homology forest for those sequences. In order to do so we need to model the types of events described in Section 1.

    Notes

  • Multiple alignment

    • Given set S of sequences
    • Multiple alignment = construction of homology forest on S
    • Further: annotated homology forest

  • Annotated Trees of Life

    • Alignment trees are annotated with duplications (undirected or directed), speciations, and horizontal transfers
    • Alignment trees:

    [Figure: alignment trees with node labels X1, W38, Y42, Y30, X3, Z15, Z67, X87, Z9, W2, Y4, X10, Z23]

    Legend: speciation, directed duplication, undirected duplication, horizontal transfer

  • Next time

    • Topic: “Modeling nucleotide evolution”
    • Reading: Text 8.1-8.3