Bioinformatics. Not only small molecules and QM, MM techniques rule the world.

Page 1: Bioinformatics. Not only small molecules and QM, MM techniques rule the world.

Bioinformatics

Page 2:

Not only small molecules and QM, MM techniques rule the world.

Page 3:

Central dogma of molecular biology

• Term is due to Francis Crick

• The conversion DNA → protein is not direct, RNA is involved

• DNA is the information store, RNA is messenger (mRNA), transporter (tRNA), biomolecular nanomachine (rRNA)

source: wikipedia.org
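
The DNA → RNA → protein flow described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a full model of gene expression: it assumes the input is the coding (sense) strand, uses the standard genetic code, and ignores splicing and start-codon search. The example sequence and the function names `transcribe` and `translate` are made up for this sketch.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching the coding DNA strand (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTGATTGGTAA"  # hypothetical example sequence
mrna = transcribe(dna)
print(mrna)             # AUGGCUGAUUGGUAA
print(translate(mrna))  # MADW
```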

Page 4:

Nucleic acids

• four letters (DNA: A, C, G, T; RNA: A, C, G, U)

• sequence - AACTAACG (5’ → 3’)

• DNA – double helix

• RNA – “single-stranded” helix, folding (double-helical regions, C2’-OH → secondary and tertiary motifs)

Page 5:

nucleoside

nucleotide

Page 6:

Figure: B-DNA, A-DNA, Z-DNA

Page 7:

RNA secondary motifs

Nowakowski and Tinoco, Seminars in Virology 8, 153, 1997.

Page 8:

RNA

source: http://complex.upf.es/~josep/RNA.jpg, http://www.biosci.ki.se/groups/ljo/images/phe_trna_large.jpg, http://rna.ucsc.edu/rnacenter/images/70s_atrna.jpg

Page 9:

Proteins

• 20 letters

• primary structure - sequence AMNTSSTVG (N-terminus → C-terminus)

Alberts, Molecular Biology of the Cell, 5th Ed.

Page 10:

• secondary structure (random coil, α-helix, β-sheet, loops)

• several secondary structure elements form motifs

• e.g. Greek key, β-α-β, HTH (helix-turn-helix)

Page 11:

• tertiary structure (the arrangement of motifs into one or more domains)

• quaternary structure (multimeric complexes)

Page 12:

Proteins

source: http://calstate.fullerton.edu/news/arts/2003/photos/protein-art.jpg

Page 13:

Proteins

source: Petsko, Ringe – Protein structure and function

Page 14:

http://www.cellsignal.com/reference/pathway/NF_kappaB.html

Page 15:

Systems biology

• focuses on the systematic study of complex interactions in biological systems using a new perspective - holism instead of reductionism

• holism – the properties of a system cannot be determined or explained by its component parts alone

• one of the goals of systems biology is to discover new emergent properties

• new field, boom since 2000, very little covered in CZ

Page 16:

Systems biology

source: wikipedia.org

Page 17:

Systems biology

• based on mathematical modelling of systems, control theory, cybernetics

• engineering view of complex biological systems

• e.g. answers questions about the robustness of a given system when one of its parts fails

• or about the response of a system to a change in environmental conditions

Page 18:

quantum chemistry

molecular dynamics

bioinformatics

systems biology

Page 19:

Bioinformatics

• application of information technology to the field of molecular biology, genomics and related biological disciplines

• tremendous amount of data

• the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve problems arising from the management and analysis of biological data

Page 20:

According to the definitional classification used by Russian scientists, we distinguish two fields of paranormal phenomena: bioinformatics and bioenergetics. Bioinformatics (i.e. extrasensory perception, ESP) covers the acquisition and exchange of information by extrasensory means (not through the normal sensory organs). In essence, we distinguish the following forms of bioinformation: hypnosis (control of consciousness), telepathy, remote perception, precognition, retrocognition, out-of-body experience, "seeing" with the hands or other parts of the body, inspiration and revelation.

source: http://www.esoterika.cz/clanek/2992-mimosmyslova_spionaz_dalkove_pozorovani_i_.htm

Page 21:

Bioinformatics

• sequence analysis (sequence bioinformatics)

• structural analysis (structural bioinformatics)

• functional analysis (systems biology)

Page 22:

• genetic code

• gene

• genome, genomics

• large data sets

• high throughput

• human genome

• DNA localized mainly in the nucleus, each nucleus carries the whole genetic information

• 3.2 billion bp

• 25 000 – 30 000 genes

• ca. 1.5 % codes for proteins, the rest - junk DNA

• what is a proteome?

• proteomics

• Is it more difficult to study genome or proteome?

Page 23:

Sequence bioinformatics

• reconstruction of sequences from fragments

• searching for genes and other interesting regions in the genome

• junk DNA – 95% of the human genome is made up of non-coding sequences, which either have no function or one that is not yet understood

• querying huge genomes for a given sequence

• comparison of genes within a species – similarities between protein functions

• comparison of genes between species – evolutionary relationships between organisms (phylogenetic analysis)
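
Querying a genome for a given sequence can be illustrated with a naive exact search on both strands. This is only a sketch under simplifying assumptions: exact matches only, a short string standing in for a genome, and helper names (`reverse_complement`, `find_all`, `search_both_strands`) invented for this example. Real tools use indexed and approximate search instead.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(genome: str, query: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) exact match."""
    hits = []
    start = genome.find(query)
    while start != -1:
        hits.append(start)
        start = genome.find(query, start + 1)
    return hits


def search_both_strands(genome: str, query: str) -> dict[str, list[int]]:
    """Search the given strand directly and the other strand via the reverse complement."""
    return {
        "forward": find_all(genome, query),
        "reverse": find_all(genome, reverse_complement(query)),
    }


genome = "ACGTCTGATACGCCGTATAGTCTATCT"  # toy stand-in for a genome
print(search_both_strands(genome, "CTGAT"))  # {'forward': [4], 'reverse': []}
print(search_both_strands(genome, "AGAT"))   # {'forward': [], 'reverse': [23]}
```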

Page 24:

Sequence alignment

• Procedure of comparing sequences

• Point mutations – easy

ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT

• More difficult example

ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

gapless alignment

• However, gaps can be inserted to get something like this

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT

gapped alignment (insertion × deletion = indel)
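
To make the difference between the examples on this slide concrete, the columns of an alignment can be classified as match, mismatch or gap. The sketch below assumes both rows are already aligned to equal length and that `-` marks a gap; the function name `alignment_columns` is made up for illustration.

```python
def alignment_columns(row_a: str, row_b: str, gap: str = "-") -> dict[str, int]:
    """Count matching, mismatching and gapped columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            counts["gap"] += 1
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


reference = "ACGTCTGATACGCCGTATAGTCTATCT"

# Point mutations only: no gaps are needed.
print(alignment_columns(reference, "ACGTCTGATTCGCCCTATCGTCTATCT"))
# {'match': 24, 'mismatch': 3, 'gap': 0}

# Shorter sequence aligned with gaps, as on the slide.
print(alignment_columns(reference, "----CTGATTCGC---ATCGTCTATCT"))
# {'match': 18, 'mismatch': 2, 'gap': 7}
```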

Page 25:

Flavors of sequence alignment

pair-wise alignment × multiple sequence alignment

Page 26:

Flavors of sequence alignment

global alignment × local alignment

global – align entire sequence

local – stretches of sequence with the highest density of matches are aligned, generating islands of matches or subalignments in the aligned sequences
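
The slides do not name specific algorithms, so as an illustration the sketch below uses the two textbook dynamic-programming methods: Needleman-Wunsch for global and Smith-Waterman for local alignment. Only the optimal score is computed (no traceback), and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # never below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(global_score("ACGT", "AGT"))            # 2: three matches and one gap
print(local_score("TTTACGTTTT", "GGACGGG"))   # 3: the shared island "ACG"
```

The only difference between the two functions is the floor at zero and the tracking of the best cell, which is what lets a local alignment ignore poorly matching flanks.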

Page 27:

Scoring systems I

• DNA and protein sequences can be aligned so that the number of identically matching pairs is maximized.

A T T G - - - T
A - - G A C A T

• Counting the number of matches gives us a score (3 in this case). A higher score means a better alignment.

• This procedure can be formalized using a substitution matrix.

Identity matrix

    A  T  C  G
A   1
T   0  1
C   0  0  1
G   0  0  0  1
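
The scoring on this slide can be written down directly. The sketch below builds the identity matrix as a dictionary and sums the scores column by column; treating a gapped column as scoring 0 is an assumption that matches the slide's plain match counting, and `score_alignment` is a name invented for this example.

```python
# Identity matrix: 1 for identical letters, 0 otherwise.
IDENTITY = {(x, y): int(x == y) for x in "ATCG" for y in "ATCG"}


def score_alignment(row_a, row_b, matrix=IDENTITY, gap="-", gap_score=0):
    """Sum substitution scores over the columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            total += gap_score  # assumption: gaps are neither rewarded nor penalized
        else:
            total += matrix[(x, y)]
    return total


print(score_alignment("ATTG---T", "A--GACAT"))  # 3, as on the slide
```

Swapping `IDENTITY` for another matrix with the same key layout changes the scoring scheme without touching the function.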

Page 28:

Scoring systems II

• For nucleotide sequences, the identity matrix is usually good enough.

• For protein sequences, the identity matrix is not sufficient to describe biological and evolutionary processes.

• This is because amino acids are not all exchanged with equal probability, as a purely theoretical view might suggest.

• For example, substitution of aspartic acid (D) by glutamic acid (E) is frequently observed, whereas a change from aspartic acid to tryptophan (W) is very rare.

• Why is that?

1. Triplet-based genetic code

GAT (D) → GAA (E) needs one nucleotide change, GAT (D) → TGG (W) needs three

2. Both D and E have similar properties, but D and W differ considerably. D is hydrophilic, W is hydrophobic; a D → W mutation can greatly alter the 3D structure and consequently the function.
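
The first reason can be checked by counting nucleotide differences between codons. The sketch below uses the standard genetic code and reports the minimum number of point mutations needed to turn any codon of one amino acid into any codon of another; the helper names `hamming` and `min_changes` are made up for this example.

```python
from itertools import product

# Standard genetic code in the usual TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def min_changes(aa_from: str, aa_to: str) -> int:
    """Fewest nucleotide changes between any codon pair of two amino acids."""
    sources = [c for c, aa in CODON_TABLE.items() if aa == aa_from]
    targets = [c for c, aa in CODON_TABLE.items() if aa == aa_to]
    return min(hamming(s, t) for s in sources for t in targets)


print(hamming("GAT", "GAA"), hamming("GAT", "TGG"))  # 1 3
print(min_changes("D", "E"), min_changes("D", "W"))  # 1 3
```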

Page 29:

Substitution matrices

Amino acid groups (figure labels): small, polar; small, nonpolar; polar or acidic; basic; large, hydrophobic; aromatic

Zvelebil, Baum, Understanding bioinformatics.

Positive score – frequency of substitutions is greater than would have occurred by random chance.

Zero score – frequency is equal to that expected by chance.

Negative score – frequency is less than would have occurred by random chance.
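
The sign convention above follows from a log-odds score: the logarithm of the observed substitution frequency divided by the frequency expected by chance. The sketch below shows only that general form; the frequencies are invented numbers, the expected value is simplified to the product of the two background frequencies, and real matrices additionally scale and round the result.

```python
import math


def log_odds(observed_pair_freq: float, freq_a: float, freq_b: float) -> float:
    """log2 of observed pair frequency over the frequency expected by chance."""
    expected = freq_a * freq_b  # simplification: independent background frequencies
    return math.log2(observed_pair_freq / expected)


# Hypothetical frequencies, chosen only to show the three cases.
print(log_odds(0.020, 0.1, 0.1) > 0)              # True: seen more often than chance
print(abs(log_odds(0.010, 0.1, 0.1)) < 1e-9)      # True: seen as often as chance
print(log_odds(0.005, 0.1, 0.1) < 0)              # True: seen less often than chance
```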

Page 30:

Sequence database search

BLAST

Google of sequence world
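
One reason a database search can be fast is that short words (k-mers) are indexed in advance, so a query only has to look up its own words. The sketch below illustrates that seeding idea in a heavily simplified form; it is not BLAST's actual implementation, and the database contents, the word length and the function names are assumptions made for this example.

```python
from collections import defaultdict


def build_index(database: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-letter word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def seed_hits(query: str, index, k: int) -> list[tuple[str, int, int]]:
    """Return (sequence name, query position, database position) for each shared word."""
    hits = []
    for q in range(len(query) - k + 1):
        for name, pos in index.get(query[q:q + k], []):
            hits.append((name, q, pos))
    return hits


database = {  # hypothetical two-entry database
    "seq1": "ACGTCTGATACGCCGTATAGTCTATCT",
    "seq2": "TTTTGGGGCCCC",
}
index = build_index(database, k=4)
hits = seed_hits("GATACG", index, k=4)
print(hits)  # [('seq1', 0, 6), ('seq1', 1, 7), ('seq1', 2, 8)]

# Hits with the same offset (database position minus query position) lie on one
# diagonal and are candidates for extension into a longer local alignment.
print({(name, pos - q) for name, q, pos in hits})  # {('seq1', 6)}
```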

Page 31:

Phylogenetic analysis

Page 32:

Structural bioinformatics

• the function of a chemical moiety is given by its structure

• while DNA structure is “given” (double helix), RNA and proteins can adopt very different conformations (i.e. specific arrangements of atoms in 3D space)

• structural bioinformatics covers:

• analysis of nucleic acid (NA) and protein structure

• prediction of structure from the sequence

Page 33:

Protein structure prediction

• secondary structure prediction

• the conformational state of each residue is predicted as H (helix), E (extended, β-sheet), C (coil)

• accuracy: 80%

• tertiary structure prediction

• why?

• many sequences are known, but not that many 3D structures have been solved

• some proteins (e.g. transmembrane) are difficult to characterize experimentally

• many proteins have known function, but unknown structure (which is however needed to understand their behavior in detail)

• ab initio, threading, homology modelling
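
An accuracy figure for secondary structure prediction is commonly reported as the fraction of residues whose three-state label (H, E or C) is predicted correctly, often called Q3. Assuming that is the measure meant on this slide, a minimal sketch looks as follows; both label strings are invented for the example.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose H/E/C state is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


observed = "CCHHHHHHCCEEEECC"   # hypothetical experimental assignment
predicted = "CCHHHHHCCCEEECCC"  # hypothetical prediction
print(q3_accuracy(predicted, observed))  # 0.875 (14 of 16 residues correct)
```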

Page 34:

CASP

• Critical Assessment of Structure Prediction

• http://predictioncenter.org/

• since 1994, every 2 years, CASP10 in preparation

• the task: predict structures that are solved, but not yet publicly released

• competition of individual groups in 3D prediction:

• human groups – answer in 14 days

• software (automated prediction) – answer in 48 hours