Machine Learning & Bioinformatics
Tien-Hao Chang (Darby Chang)
Molecular biology
Nucleic acid
– DNA
– RNA
Central dogma
– Transcription
– Translation
Protein
– Amino acid
– Primary structure
– Secondary structure
– Tertiary structure
Nucleic acid
A nucleic acid is a macromolecule composed of chains of monomeric nucleotides
In biochemistry these molecules carry genetic information or form structures within cells
The most common nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)
http://juang.bst.ntu.edu.tw/BC2008/images/NA%20Fig1.jpg
Nucleic acid components
Sugar
http://www.mun.ca/biology/scarr/Fg10_09b_revised.gif
Nucleic acid components
Base
Purine
– Adenine (A) and guanine (G)
Pyrimidine
– Thymine (T), cytosine (C)
– Uracil (U, only in RNA)
http://www.elmhurst.edu/~chm/vchembook/images/580bases.gif
http://fig.cox.miami.edu/~cmallery/150/chemistry/sf3x14a.jpg
DNA
Chemically, DNA is a long polymer of simple units called nucleotides, with a backbone made of sugars and phosphate groups joined by ester bonds
Attached to each sugar is one of four types of molecules called bases
It is the sequence of these four bases along the backbone that encodes information
http://upload.wikimedia.org/wikipedia/commons/8/87/DNA_orbit_animated_small.gif
DNA
Base pairing
Each type of base on one strand forms a bond with just one type of base on the other strand
Purines form hydrogen bonds to pyrimidines, with A bonding only to T, and C bonding only to G
DNA sequence
– 5’CpGpCpApApTpT
3’TpTpApApCpGpC
– CGCGAATT
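The pairing rule above (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and the input is the shorthand sequence CGCGAATT from the slide, assumed to be written 5' to 3'.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the pairing base for each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand):
    """Return the partner strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement("CGCGAATT"))          # GCGCTTAA
print(reverse_complement("CGCGAATT"))  # AATTCGCG
```

Because the two strands run in opposite directions, the partner strand is usually reported as the reverse complement rather than the plain complement.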
http://www.ucl.ac.uk/~sjjgsca/NucleotidePairing.jpg
http://www.coe.drexel.edu/ret/personalsites/2005/dayal/curriculum1_files/image001.jpg
Double helix
Hydrogen bond
A hydrogen bond exists between an electronegative atom and a hydrogen atom bonded to another electronegative atom
This type of force always involves a hydrogen atom, thus the name hydrogen bonding
The strongest hydrogen bonds (about 155 kJ/mol) approach the energy of weak covalent bonds; most, including those in biomolecules, are much weaker
Biological functions
– DNA/RNA base pairing
– protein secondary/tertiary structure formation
– some properties of the water molecule
– antibody-antigen (and other protein-protein) binding
http://upload.wikimedia.org/wikipedia/commons/4/43/Liquid_water_hydrogen_bond.png
Hydrogen bonds result from differences in electronegativity
http://courses.biology.utah.edu/horvath/biol.3525/1_DNA/Fig2/marty_1.jpg
Grooves
DNA structure
http://www.youtube.com/watch?v=qy8dk5iS1f0&NR=1
Any Questions?
About DNA
http://fig.cox.miami.edu/~cmallery/255/255hist/mcb4.1.dogma.jpg
Central dogma
Central dogma
The process by which information is extracted from the nucleotide sequence of a gene and then used to make a protein is essentially the same for all living things on Earth and is described by the grandly named central dogma of molecular biology
Information in cells passes from DNA to RNA to proteins
http://upload.wikimedia.org/wikipedia/commons/3/3a/Crick's_1958_central_dogma.svg
RNA
Information stored in DNA is used to make a more transient, single-stranded polynucleotide called RNA (ribonucleic acid)
RNA is very similar to DNA, but differs in a few important structural details
– in the cell RNA is usually single stranded, while DNA is usually double stranded
– RNA nucleotides contain ribose while DNA contains deoxyribose (a type of ribose that lacks one oxygen atom)
– in RNA the base uracil substitutes for thymine, which is present in DNA
http://www.dadamo.com/wiki/dna-rna.png
Central dogma
Transcription
Transcription is the synthesis of RNA under the direction of DNA
Both nucleic acid sequences use the same language, and the information is simply transcribed, or copied
The DNA sequence is copied by RNA polymerase to produce a complementary RNA strand, called messenger RNA (mRNA)
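As a rough sketch of what "complementary RNA strand" means here, the snippet below pairs each base of a DNA template with its RNA partner, using U where DNA would use T. The function name and the example template are hypothetical, and the template is assumed to be given 3' to 5' so that the result reads 5' to 3'.

```python
# DNA template base -> complementary RNA base (uracil replaces thymine in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5):
    """Build the mRNA (5'->3') complementary to a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

print(transcribe("TACCGGACC"))  # AUGGCCUGG
```

This ignores everything RNA polymerase actually does (promoters, initiation, termination) and only captures the base-by-base copying rule.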
DNA transcription
http://www.youtube.com/watch?v=vJSmZ3DsntU
Transcription detail
http://www-class.unl.edu/biochem/gp2/m_biology/animation/m_animations/gene2.swf
RNA
Various types
mRNA
– messenger RNA (mRNA) is the RNA that carries information from DNA to the ribosome
– the coding sequence of the mRNA determines the amino acid sequence in the protein that is produced
Non-coding RNA
Various RNA types
Non-coding RNA
Many RNAs do not code for protein
These ncRNAs are encoded by their own genes (RNA genes) or derived from mRNA introns
The most common ncRNAs are transfer RNA (tRNA) and ribosomal RNA (rRNA)
Other ncRNAs such as microRNA (miRNA) are involved in post-transcriptional gene regulation
http://eurheartj.oxfordjournals.org/content/vol0/issue2010/images/large/ehp57301.jpeg
Central dogma
Translation
Translation is the second stage of protein biosynthesis
Translation occurs in the cytoplasm where the ribosomes are located
In translation, mRNA is decoded to produce a specific polypeptide according to the rules specified by the genetic code
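Decoding by the genetic code can be illustrated with a small lookup table. The sketch below assumes the standard genetic code, one-letter amino acid symbols, and a reading frame that starts at the first AUG; the names `CODON_TABLE` and `translate` are hypothetical and the example mRNA is made up.

```python
from itertools import product

# Standard genetic code, codons ordered by base U, C, A, G at each position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGGCCUGGUAAAC"))  # MAW
```

Here the codons read are AUG (M), GCC (A), UGG (W), and UAA (stop).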
From RNA to protein synthesis
http://www.youtube.com/watch?v=NJxobgkPEAo
Protein translation
http://www.youtube.com/watch?v=nl8pSlonmA0
http://biology.kenyon.edu/courses/biol114/Chap05/code.gif
Any Questions?
About central dogma
Protein
Protein
Proteins are large organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues
Proteins can also work together to achieve a particular function, and they often associate to form stable complexes
Protein
Amino acid
In chemistry, an amino acid is a molecule that contains both amine and carboxyl functional groups
In biochemistry, this term refers to alpha-amino acids with the general formula H2NCHRCOOH, where R is an organic substituent
http://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/AminoAcidball.svg/702px-AminoAcidball.svg.png
Amino acid
Various side chains
The various alpha amino acids differ in which side chain (R group) is attached to their alpha carbon
They can vary in size from just a hydrogen atom in glycine through a methyl group in alanine to a large heterocyclic group in tryptophan
http://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Aa.svg/2000px-Aa.svg.png
http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-7.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-9.JPG
http://www.russell.embl-heidelberg.de/aas/other_images/lb3.gif
Amino acid
The building blocks of proteins
Amino acids combine in a condensation reaction, and the resulting amino acid residues are held together by peptide bonds
Proteins are defined by their unique sequence of residues (primary structure)
Just as letters form many different words, amino acids form a vast variety of sequences/proteins
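To get a feel for how vast this variety is, the short calculation below counts the possible sequences of a given length. It assumes the 20 standard amino acids (a number not stated on the slide) and that any residue may appear at any position; the function name is hypothetical.

```python
AMINO_ACID_TYPES = 20  # assumption: the 20 standard amino acids

def possible_sequences(length):
    """Number of distinct amino acid sequences of the given length."""
    return AMINO_ACID_TYPES ** length

for n in (2, 5, 10):
    print(n, possible_sequences(n))
# 2 400
# 5 3200000
# 10 10240000000000
```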
http://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Peptidformationball.svg/2000px-Peptidformationball.svg.png
http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-11.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-13.JPG
Protein
After knowing amino acids
Amino acids form short polymer chains called peptides or longer chains called either polypeptides or proteins
The process of such formation from an mRNA template (obeying the genetic code) is known as translation, which is part of protein biosynthesis
Protein structure hierarchy
http://cropandsoil.oregonstate.edu/classes/css430/lecture%209-07/figure-09-03.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-4.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-8.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-9.JPG
Protein structure hierarchy
Secondary structure
In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids
It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(2)%202007/P2-3.JPG
Protein structure hierarchy
Tertiary structure
The three-dimensional structure of a protein or any other macromolecule, as defined by the atomic coordinates
It describes the spatial relations among its secondary structures
Tertiary structure is considered to be largely determined by the protein’s primary sequence
Protein tertiary structure
Experimental techniques
The majority of protein structures have been solved with X-ray crystallography
The second most common method is NMR (nuclear magnetic resonance)
– lower resolution
– limited to small proteins
– provides time-dependent information in solution
http://campusapps.fullerton.edu/news/arts/2003/photos/protein-art.jpg
Protein structure hierarchy
Quaternary structure
Many proteins are actually assemblies of more than one polypeptide chain, which in the context of the larger assemblage are known as protein subunits
In addition to the tertiary structure of the subunits, multiple-subunit proteins possess a quaternary structure, which is the arrangement into which the subunits assemble
http://courses.cm.utexas.edu/jrobertus/ch339k/overheads-1/ch6_quat-struct1.jpg
Protein sub-structure
Protein sub-structure
Domain
A part of protein sequence and structure that can evolve, function, and exist independently
About 25–500 aa
Often forms functional units
http://upload.wikimedia.org/wikipedia/commons/6/67/1pkn.png
http://upload.wikimedia.org/wikipedia/commons/7/79/Zinc_finger_DNA_complex.png
Zinc fingers are small protein structural motifs that can coordinate zinc ions to help stabilize their folds
Protein sub-structure
Motif
A sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and often has biological significance
For proteins, a sequence motif is distinguished from a structural motif, a motif formed by the three-dimensional arrangement of amino acids, which may not be adjacent
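A sequence motif is often written as a pattern and searched for with a regular expression. The sketch below uses the N-glycosylation pattern N-{P}-[ST]-{P} from PROSITE purely as an illustration (it is not mentioned on the slides); the function name and the example sequence are hypothetical.

```python
import re

# N-{P}-[ST]-{P}: N, then any residue except P, then S or T, then any residue except P.
# The lookahead lets overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (0-based position, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]

print(find_motif("MNGTANPSQNCSA"))  # [(1, 'NGTA'), (9, 'NCSA')]
```

The candidate at position 5 (NPSQ) is skipped because its second residue is proline, which the pattern excludes.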
Protein sub-structure
Structure motif
A 3D structural element or fold that also appears in a variety of other molecules
In the context of proteins, the term is sometimes used interchangeably with “structural domain,” although a domain need not be a motif and, if it contains a motif, need not consist of only one
http://www.biomedcentral.com/content/figures/1471-2164-8-60-8.jpg
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-3.JPG
Molecular biology
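Reference
Website of Prof. Rong-Huay Juang, National Taiwan University
– http://juang.bst.ntu.edu.tw/BC2008/index.htm
Molecular biology website, National Chiao Tung University
– http://www.life.nctu.edu.tw/~mb/c40101.htm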
Any Questions?
About molecular biology