Class 1: Introduction. The Tree of Life (Source: Alberts et al.)

Posted on 19-Dec-2015


Class 1: Introduction

The Tree of Life

Source: Alberts et al.

The Cell

Example: Tissues in Stomach

DNA Components

Four nucleotide types: Adenine (A), Guanine (G), Cytosine (C), Thymine (T)

Hydrogen bonds pair the bases across the two strands: A with T, C with G
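As a small illustration of the A-T / C-G pairing rule, the sketch below derives the partner strand of a DNA sequence. It assumes uppercase A/C/G/T input, and the function names `complement` and `reverse_complement` are illustrative rather than taken from the course material. The reversal step reflects the fact that the two strands of the double helix run in opposite directions.

```python
# Base pairing from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (same direction)."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction.

    The two strands are antiparallel, so the complement is reversed.
    """
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```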

The Double Helix

Source: Alberts et al.

DNA Duplication

Source: Mathews & van Holde

DNA Organization

Source: Alberts et al.

Genome Sizes

E. coli (bacteria): 4.6 x 10^6 bases

Yeast (simple fungi): 15 x 10^6 bases

Smallest human chromosome: 50 x 10^6 bases

Entire human genome: 3 x 10^9 bases
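A rough worked example of what these sizes mean computationally: with four nucleotide types, each base can be stored in 2 bits. The sketch below assumes this packed encoding (ignoring ambiguous bases and any file-format overhead) and converts the genome sizes above to megabytes; `GENOME_SIZES` and `packed_megabytes` are hypothetical names.

```python
# Genome sizes from the slide, in bases.
GENOME_SIZES = {
    "E. coli": 4.6e6,
    "Yeast": 15e6,
    "Smallest human chromosome": 50e6,
    "Entire human genome": 3e9,
}

BITS_PER_BASE = 2  # four nucleotide types -> log2(4) = 2 bits per base


def packed_megabytes(bases: float) -> float:
    """Storage needed at 2 bits per base, in megabytes (10**6 bytes)."""
    return bases * BITS_PER_BASE / 8 / 1e6


for name, bases in GENOME_SIZES.items():
    print(f"{name}: {packed_megabytes(bases):,.2f} MB")
```

Under this assumption the entire human genome fits in about 750 MB, roughly 650 times the storage needed for E. coli.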

Genes

The DNA strings include:

Coding regions (“genes”): E. coli has ~4,000 genes; yeast has ~6,000 genes; C. elegans has ~19,000 genes; humans have ~32,000 genes (an early estimate; current counts put human protein-coding genes closer to 20,000)

Control regions: these are typically adjacent to the genes and determine when a gene should be expressed

“Junk” DNA (unknown function)
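To make “coding region” concrete, the sketch below scans one DNA strand for open reading frames: a start codon ATG followed, in the same reading frame, by a stop codon (TAA, TAG or TGA). This is a deliberately naive stand-in for gene finding; it ignores the reverse strand, introns, control regions and minimum-length cutoffs, and `find_orfs` is a hypothetical helper, not a tool from the course.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs of ATG...stop open reading frames.

    `end` is exclusive and includes the stop codon. Only the three reading
    frames of the given strand are scanned; ATGs nested inside an ORF that
    was already found are skipped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        orfs.append((i, j + 3))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```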

Transcription

Coding sequences can be transcribed to RNA

RNA nucleotides: similar to DNA, but with a slightly different backbone (ribose instead of deoxyribose) and Uracil (U) instead of Thymine (T)

Source: Mathews & van Holde
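A minimal sketch of the T-to-U substitution, assuming the input is the coding (sense) strand. RNA polymerase actually reads the complementary template strand, so the resulting RNA has the same sequence as the coding strand except that U replaces T. `transcribe` is an illustrative name.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA transcript for a DNA coding-strand sequence.

    The transcript matches the coding strand, with uracil (U) in place of
    thymine (T).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```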

RNA Editing

Source: Mathews & van Holde

RNA roles

Messenger RNA (mRNA): encodes protein sequences

Transfer RNA (tRNA): adaptor between mRNA molecules and amino acids (protein building blocks)

Ribosomal RNA (rRNA): part of the ribosome, a machine for translating mRNA into proteins

...

Transfer RNA

Anticodon: matches a codon (triplet of mRNA nucleotides)

Attachment site: matches a specific amino-acid
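The codon-anticodon match can be sketched as base pairing in RNA (A with U, C with G). This assumes strict Watson-Crick pairing and ignores the “wobble” pairing allowed at the third codon position. Because the pairing is antiparallel, the anticodon written 5'->3' is the reversed complement of the codon. `anticodon` is a hypothetical helper.

```python
# RNA base pairing: A-U and C-G.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon (written 5'->3') that pairs with an mRNA codon."""
    if len(codon) != 3:
        raise ValueError("a codon is a triplet of nucleotides")
    # Antiparallel pairing: complement each base and reverse the order.
    return "".join(RNA_PAIR[base] for base in reversed(codon.upper()))


print(anticodon("AUG"))  # CAU (anticodon of the methionine tRNA)
```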

Translation

Translation is mediated by the ribosome, a complex of protein and rRNA molecules.

The ribosome attaches to the mRNA at a translation initiation site.

The ribosome then moves along the mRNA sequence, and in the process constructs a polypeptide.

When the ribosome encounters a stop signal, it releases the mRNA. The constructed polypeptide is released and folds into a protein.
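The steps above can be mimicked in a few lines of Python. This is a simplified sketch: it assumes the first AUG is the initiation site (real initiation also depends on the surrounding sequence) and uses the standard genetic code, encoded compactly here as a 64-letter string in UCAG order with `*` marking stop codons. `translate` and `CODON_TABLE` are illustrative names.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Mimic the ribosome: start at the first AUG, read codons until a stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # translation initiation site (simplified)
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop signal: release the chain
            break
        peptide.append(aa)
    return "".join(peptide)


print(translate("GGAUGGCCAAAUGGUAAGC"))  # MAKW
```

Here the ribosome starts at the AUG at position 2, reads GCC, AAA and UGG, and stops at UAA.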

Translation

Source: Alberts et al.


Genetic Code
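The genetic code maps each of the 64 codons to one of 20 amino acids or to a stop signal, so most amino acids are encoded by several codons. The sketch below rebuilds the standard code from a compact string (an encoding choice made here for brevity, not part of the slides) and counts how many codons encode each amino acid.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

codons_per_aa = Counter(GENETIC_CODE.values())
print(len(GENETIC_CODE))                       # 64
print(len(codons_per_aa) - 1)                  # 20 (excluding the stop symbol '*')
print(codons_per_aa["*"])                      # 3
print(codons_per_aa["L"], codons_per_aa["M"])  # 6 1
```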

Protein Structure

Proteins are polypeptides of typically 70-3000 amino acids

The three-dimensional structure of a protein is (mostly) determined by the sequence of amino acids that make up the protein

Protein Structure

Evolution

Related organisms have similar DNA:

Similarity in sequences of proteins

Similarity in organization of genes along the chromosomes

Evolution plays a major role in biology:

Many mechanisms are shared across a wide range of organisms

During the course of evolution existing components are adapted for new functions
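One simple way to put a number on “similar DNA” is percent identity. This sketch assumes two equal-length sequences with no insertions or deletions, so positions can be compared one-to-one; comparing sequences that need gaps is the subject of the sequence-comparison example later in the class. The sequences and the name `percent_identity` are made up for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))  # 80.0
```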

Evolution

Evolution of new organisms is driven by:

Diversity: different individuals carry different variants of the same basic blueprint

Mutations: the DNA sequence can be changed by single base changes, deletion/insertion of DNA segments, etc.

Selection bias
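The mutation types listed above can be written as simple string operations. This is a toy model (uniform choice of replacement base, positions supplied by the caller, selection not modeled), and the helper names are hypothetical.

```python
import random

BASES = "ACGT"


def substitute(dna: str, pos: int, rng: random.Random) -> str:
    """Single base change: replace dna[pos] with a different base."""
    new_base = rng.choice([b for b in BASES if b != dna[pos]])
    return dna[:pos] + new_base + dna[pos + 1:]


def insert(dna: str, pos: int, segment: str) -> str:
    """Insert a DNA segment before position pos."""
    return dna[:pos] + segment + dna[pos:]


def delete(dna: str, pos: int, length: int) -> str:
    """Delete `length` bases starting at position pos."""
    return dna[:pos] + dna[pos + length:]


rng = random.Random(0)  # fixed seed so the run is reproducible
seq = "ATGGCCAAATGG"
print(substitute(seq, 4, rng))
print(insert(seq, 3, "TTT"))  # ATGTTTGCCAAATGG
print(delete(seq, 3, 3))      # ATGAAATGG
```

Because the deletion removes exactly three bases (the codon GCC), the remaining codons ATG AAA TGG stay in frame; deleting a number of bases that is not a multiple of three would shift the reading frame.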

Course Goals

Computational tools in molecular biology

We will cover computational tasks that are posed by modern molecular biology

We will discuss the biological motivation and setup for these tasks

We will understand what kinds of solutions exist and what principles justify them

Four Aspects

Biological: What is the task?

Algorithmic: How to perform the task at hand efficiently?

Learning: How to adapt the parameters of the task from examples?

Statistics: How to differentiate true phenomena from artifacts?

Example: Sequence Comparison

Biological: Evolution preserves sequences, thus similar genes might have similar function

Algorithmic: Consider all ways to “align” one sequence against another

Learning: How do we define “similar” sequences? Use examples to define similarity

Statistics: When we compare to ~10^6 sequences, what is a random match and what is a true one?
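The algorithmic and statistical aspects above can be sketched together. `global_alignment_score` uses dynamic programming (the Needleman-Wunsch recurrence) to find the best score over all global alignments without enumerating them, and `shuffle_p_value` estimates how often a shuffled, unrelated sequence scores at least as well. The scores (+1 match, -1 mismatch, -1 gap), the number of shuffles and the function names are all assumptions for illustration; per the Learning aspect, real scoring schemes are estimated from examples.

```python
import random


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: best score over all global alignments of a and b.

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


def shuffle_p_value(a: str, b: str, trials: int = 200, seed: int = 0) -> float:
    """Empirical p-value: how often does a shuffled copy of b score at least as well?"""
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if global_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # +1 avoids reporting a p-value of exactly 0


print(global_alignment_score("ACGTACGT", "ACGTTCGT"))  # 6
print(shuffle_p_value("ACGTACGT", "ACGTTCGT"))
```

Filling the table costs time proportional to len(a) x len(b), which is what makes “consider all ways to align” feasible.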

Topics I

Dealing with DNA/Protein sequences:

Genome projects and how sequences are found

Finding similar sequences

Models of sequences: Hidden Markov Models

Transcription regulation

Protein Families

Gene finding

Topics II

Gene Expression:

Genome-wide expression patterns

Data organization: clustering

Reconstructing transcription regulation

Recognizing and classifying cancers

Topics III

Models of genetic change:

Long term: evolutionary changes among species

Reconstructing evolutionary trees from current-day sequences

Short term: genetic variations in a population

Finding genes by linkage and association

Topics IV

Protein World:

How proteins fold: secondary & tertiary structure

How to predict protein folds from sequence data alone

How to analyze protein changes from raw experimental measurements (MassSpec, 2D gels)

Class Structure

Two weekly meetings:

Class: Mondays 16:00-18:00

Targil (exercise session): Tuesdays 18:00-20:00

Grade:

60% from five question sets, each containing theoretical problems and practical computer questions

40% test

5% bonus for active participation

Exercises & Handouts

Check regularly

http://www.cs.huji.ac.il/~cbio