Agraian Revolution Industrial Revolution Comp 555 ...

27
Comp 555 - BioAlgorithms - Spring 2018 Welcome to the 4th Revolution! Agraian Revolution Industrial Revolution Biotechnology Revolution Electronics/Digital Revolution

Transcript of Agraian Revolution Industrial Revolution Comp 555 ...

Page 1: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - BioAlgorithms - Spring 2018

Welcome to the 4th Revolution!

Agraian Revolution Industrial Revolution

Biotechnology RevolutionElectronics/Digital Revolution

Page 2: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Comp 555’s Intended Audience● Suitable for undergraduate and graduate students● CS majors who want to learn bioinformatics● Non CS majors from the statistics, biology,

biostatistics, and general sciences who are interested in the algorithms. Problems,and approaches used in bioinformatics.

● BCB/BBSP students○ Starting next year COMP 555

will be co-listed as BCB 555

2

Page 3: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Why?● Benefits for Computer Scientists

○ See CS fundamentals applied to real problems○ What computer scientists can learn from biology

■ Robust, parallel, self-repairing, and energy efficient● Benefits for Biologist Multidiciplinary

○ Help to close the CS-Bio “language” gap○ Appreciate CS as more than “coding”○ What is a correct algorithm? An efficient one?

● Growth Potential○ Bioinformatics is a very marketable skill○ Future of CS and Biology

3

Page 4: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Bioinformatics = Biology + Information● What is information?

○ Information: that which resolves uncertainty○ Computer scientists measure information in “bits”○ Information = -log2(probability)

■ A coin is tossed and lands heads. How many bits?■ A 7 is rolled on a pair of dice. How many bits?■ A 3 is rolled? How many bits?

○ Information systems need mechanisms for■ Storing information (memory)■ Processing information (logic)■ Transporting information (networks/connectivity)

● Computer science is about information● How about biological systems? ● Are they Information systems?

4

6 out of 36 possible rolls are “7”s.Thus, a roll of 7 conveys:

-log2(6/36) = 2.58 bits

There are only 2 ways to roll a “3”,(1,2) or (2,1)

Thus, a roll of 2 conveys:-log2(2/36) = 4.17 bits

Page 5: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Information in Biological Systems● Information is somehow passed

between successive generations of plants and animals

● Distinguishable traits are inherited (phenotypes)

● Latent (recessive) traits can be masked by dominant traits, yet reappear in later generations

● Laws of Heredity

● Basis for Genetics

5

Page 6: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Where is “Biological” Information?● Enter the Age of Microbiology● Cells: the smallest structural unit

of a living organism capable of functioning independently

● Cell composition by weight○ 70% water○ 7% small molecules

salts, amino acids, nucleotides, lipids (fats, oils, waxes)○ 23% larger polymers

proteins, polysaccharides● Two cell types

○ Prokaryotes: bacteria (no nucleus)○ Eukaryotes: yeast, plants, and animals (with nucleus)

6

Prokaryote

Eukaryote

Page 7: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Looking for the Source of Heredity● In 1879 Walther Fleming, a German anatomist, discovered

threadlike structures clearly visible during cell division. He called this material chromatin, which was later renamed chromosomes

● Zoologists Oskar Hertwig and Hermann Fol first observed the process of fertilization in detail in the early 1880s. In 1881, Edward Zacharias showed that chromosomes contained nucleic acids. In 1884, Hertwig wrote:

"nucleic acid is the substance that is responsible not only for fertilization but also for the transmissionof hereditary characteristics."

7

Fleming

Hertwig Fol

Page 8: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Early 20th Century Genetics● In 1908, Thomas Hunt Morgan and Alfred H. Sturtevant

showed that genes were located on chromosomes. Experimenting with Drosophila (fruit flies) they found sex chromosomes, sex-linked traits, and crossing-over. They were able to associate mutations to specific chromosomal regions, thus mapping gene locations.

● By the 1930's biochemists knew that the nucleic acid present in chromosomes was DeoxyriboNucleic Acid, DNA. They also knew that chromosomes contained proteins in addition to DNA. DNA appeared to be long repetitive chains, and therefore, it seemed unlikely to carry information. Proteins, however, looked more interesting and were generally assumed to contain genetic materials. DNA was considered as just some sort of “glue”.

8

Morgan

Page 9: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Inferring Genetic MapsEven without knowing the mechanisms of how heredity information is represented, clever scientists (Morgan) were able to “map” genes…

9

Page 10: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Steps to Infer a Genetic Map● Verify Mendelian ratios

○ Brown (64 + 21 + 19 + 6 + 5 + 4 + 1 + 1) /484 = 0.250○ Blue-eyes (58 + 20 + 19 + 12 + 6 + 4 + 4 + 1) / 484 = 0.256○ Triangle-nose (54 + 21 + 20 + 12 + 6 + 5 + 4 + 1) / 484 = 0.254○ Short-finger (54 + 21 + 20 + 9 + 5 + 4 + 4 + 1) / 484 = 0.244

● Test pairwise linkages (we expect 1/4 × 1/4 × 484 ≈ 30 if independent)

○ Short-finger & triangle-nose 54 + 21 + 20 + 4 = 99○ Triangle nose & brown 21 + 6 + 4 + 1 = 32○ Short-finger & brown 21 + 5 + 4 + 1 = 31○ Blue-eyes & triangle-nose 20 + 12 + 6 + 4 = 42○ Short-finger & blue-eyes 20 + 4 + 4 + 1 = 29○ Brown & blue-eyes 19 + 6 + 4 + 1 = 30

● Indicates

○ Short-fingers & triangle-nose are closely linked○ Blue-eyes & triangle-nose are probably linked○ Short-finger & blue-eyes appear independent, thus

the triangle nose gene should lie between them○ Brown gene is likely to be on another chromosome

10

Page 11: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

DNA's Central Role● In 1944, Oswald Avery showed that one of those

“nucleic acids”, DNA, not proteins, carries hereditary information.

● In the late 1940’s and early 50’s Linus Pauling and associates develop modeling methods for simultaneously determining structure and chemical make-up of proteins and other large molecules.

● In 1952, James Watson and Francis Crick, are able to determine the structure and chemical makeup of DNA, using X-ray crystallography data collected by Rosalind Franklin and Maurice Wilkins.

11

Beginning of Molecular Biology!

Page 12: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

More on DNA ● The information stored in DNA organizes inanimate molecules into living

organisms and orchestrates their lifelong function

● A long “complementary” chain of nucleotides

● Each nucleotide has 3 components○ A phosphate group○ A ribose sugar○ One of four nitrogenous bases

● The information resides in the sequence of bases

12

Page 13: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

DNA Components● DNA is used by all living organisms

● Differences in nitrogenous bases code sequences that distinguish

○ Plants from animals○ Species○ Individuals

● A complete DNA sequence for an organism is called its “genome”

● Code sequences are composed of 4 bases (Adenine, Cytosine, Guanine, Thymine)

● Each base binds with another specific base (Thymine with Adenine and Cytosine with Guanine)

● A DNA molecule is comprised of primary sequence and a redundant "complementary" copy that allows it to self replicate (each acts like a template for the other sequence)

13

Page 14: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

DNA Schematic● Many more details are required to give a complete picture

of DNA○ Complementary strands are antiparallel and, thus,

oriented○ Not a simple twist, but has a major and minor

grooves which are recognized by interacting with proteins

● Rather than keep track of all the details we will often consider DNA as a string of nucleotides

● By convention DNA sequences are always ordered in the 5’-to-3’ direction. Not coincidentally, this is also the order in which they can be synthesized using an important class of molecules call polymerases, which we will discuss in more detail in the next lecture.

14

Page 15: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Biological Computing Machines● DNA is an “Operating System” with “programs”

○ Collect raw materials and covert chemicals to energy○ Perform specialized functions (neurons, muscle, retinal cones)○ Protect and repair itself○ Replicate itself, or duplicate an entire organism

● How are these “programs” encoded?

● What biological machinery “executes” this program?

● How is the program’s execution sequenced?

15

Page 16: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Genes are the Programs● Specific subsequences of DNA bases determine specific functions (programs) of a cell, these

subsequences are called “genes”

● Genes are distributed throughout a genome

● Not all DNA sequence sections contain genes

● Genes might not be entirely contiguous within the DNA sequence

● Genes can be either active or inactive

● Genes provide instructions for assembling proteins, which are the machinery of life

● How are these instructions encoded?

● What is the output of these programs?

16

Page 17: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

What are Proteins?● Proteins are incredibly diverse Protein● Structural proteins (collagen) provide structural

support and rigidity● Enzymes act as biological catalysts (pepsin) that

hasten critical reactions without taking part in them● Proteins transport small molecules and minerals to

where they are needed within an organism (hemoglobin)

● Used for signaling and intercellular communication (insulin) Protein

● Absorb photons to enable vision (rhodopsin)● Proteins are assembled from simple molecules, called

amino acids.

17

Page 18: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Amino Acids● 20 main ingredients of proteins

● Varying side chain is shown in dark blue

● Orange indicates non-polar and hydrophobic,the remainder are polar or hydrophilic

● Magenta indicates acidic

● Cyan indicates a base

18

Page 19: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Encoding Protein Assembly● Each DNA base can be one of 4 values (G,C,A,T)

● Proteins are polymer chains of amino acids ranging in length from tens to millions

● There are 20 amino acids

● How do you encode variable length chains of 20 amino acids using only 4 bases?

● Do you need other codings?

● Clearly, we can’t encode 20 different amino acids using only one base. How many encodings are possible using a pair of bases? How many with three?

19

Page 20: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Codons● Triplets of nucleotide bases determine

the amino acid sequence of a protein

● All genes begin with a particular code, AUG, for the amino acid Methonine

● Three codes are used to indicate STOP, and thus end the transcription process for the gene

● Most amino acids have redundant encodings

20

Page 21: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

From Genes to ProteinsThe central dogma of molecular biology is that information encoded by the bases of DNA are transcribed by RNA and then converted into proteins.

21

Page 22: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Is the Code Perfect?● Proteins are generally unaffected by small variations in their code sequence,

particularly changes to a small number of bases

● Minor variations in genes, called alleles, are responsible for individual variations (blood-type, hair color, etc.)

● Errors in translation (the substitution for one amino acid for the one encoded by the gene), occur at roughly 0.1% of all residues. This means that a single large protein will have at least one incorrect amino acid somewhere! Many of these will still function, in part because the substituted residue will often be adequate. Still, is a bit curious that this level of error is acceptable.

● In eukaryotes (humans, plants) gene sequences are not contiguous. They are broken into codon carrying segments called exons separated by seemingly meaningless base sequences called introns

22

Page 23: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

How Big is a Genome?● The human genome is roughly 3 billion bases

● A typical gene is roughly 3000 bases

● The largest known human gene (dystrophin) has 2.4 million bases

● The estimated number of human genes is roughly 30K

● The genome is nearly identical for every human (99.9%)

● Human DNA is 98% identical to chimpanzee DNA.

● The functions are unknown for more than 50% of discovered genes.

● Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.

23

Page 24: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Is Bigger Better? More Advanced?● The genome size of a species is relatively constant● Large variations can occur across species lines● Not strictly correlated with organism complexity● Genome lengths can vary as much as 100 fold between similar species● Length and variability are more of an indications of a phylum’s susceptibility

to mutation

24

Amoeba (Amoeba dubia) ~ 670 BbpSalamander (120.60pg, Necturus lewisi, Gulf coast waterdog) ~118 Bbase pairsFrog (13.40pg, Ceratophrys ornata (8n), Ornate horned frog) ~13 Bbase pairsMarbled Lung fish (130pg) ~ 130 BbpOrchids (angiosperms) have the the largest variation within a species (strains that can interbreed and generate fertile progeny) with a range that varies at least 168-fold.

Page 25: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Genome Variation

25

Length and variability are more of an indications of a phylum’s susceptibility to mutations than its complexity

(C-value = the Amount of DNA in an unreplicated gametic nucleus. It is measures in pico Grams, and 1pg = 978M base pairs.)

Page 26: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Summary● Information resolves uncertainty● Heredity provides evidence that information is passed on by biological

systems● Biological information is stored as DNA● Genes are segments of DNA sequence that encode assembly instructions for

proteins● The central dogma of molecular biology is that DNA sequences are

transcribed by RNA polymerases into mRNAs that are then translated by ribosomes into proteins.

● A genome’s length is not a good indicator of its information content

26

Page 27: Agraian Revolution Industrial Revolution Comp 555 ...

Comp 555 - Fall 2019

Next Time● Examining DNA sequences● Looking for Patterns● Data-driven hypotheses● DNA Replication

27