Post on 05-Oct-2020
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.1
Sami Khuri
Department of Computer Science
San José State University
Biology/CS 123A
Fall 2012
Bioinformatics
Two
Introduction to Bioinformatics
©2012 Sami Khuri
What is Bioinformatics?
� The Human Genome
Project (HGP)
� Mapping
� Model Organisms
� Types of Databases
�Applications of
Bioinformatics
� Genome Research
©2012 Sami Khuri
©2012 Sami Khuri
Pathway to Genomic
Medicine
Sequencing of
the human
DNA
Personalized
medicine
Cure for
diseases
Implicating
genetic
variants with
human disease
Interpreting
the human
genome
sequence
Human
Genome
Project
ENCODE
Project
HapMap
Project
Genomic
Medicine
©2012 Sami Khuri
The Human Genome
Project• The HGP is a multinational effort, begun by the USA in
1988, whose aim is to produce a complete physical map of all human chromosomes, as well as the entire human DNA sequence.
• The ultimate goal of genome research is to find all the genes in the DNA sequence and to develop tools for using this information in the study of human biologyand medicine.
• The primary goal of the project is to make a series of descriptive diagrams (called maps) of each human chromosome at increasingly finer resolutions.
©2012 Sami Khuri
Bioinformatics and
the Internet• The recent enormous increase in biological data
has made it necessary to use computer information technology to collect, organize, maintain, access, and analyze the data.
• Computer speed, memory, exchange of information over the Internet has greatly facilitated bioinformatics.
• The bioinformatics tools available over the Internet are accessible, generally well developed, fairly comprehensive, and relatively easy to use.
@2002-10 Sami Khuri
Cytogenetic map
of chromosome
19
telomere
centromeretelomere
©2012 Sami Khuri
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.2
©2012 Sami Khuri
@2002-10 Sami Khuri
Other Species
As part of the HGP, genomes of other organisms, such as
bacteria, yeast, flies and mice are also being studied.
p53 gene
pax6 geneDiabetes
C elegans
Baker’s yeast
DNA repair
Cell division
Chimps are infected with SIV
Very rarely progress to AIDS
©2012 Sami Khuri
@2002-10 Sami Khuri
Other Sequenced
Genomes
©2012 Sami Khuri
Model Organisms
• A model organism is an organism that is extensively studied to understand particular biological phenomena.
• Why have model organisms? The hope is that discoveries made in model organisms will provide insight into the workings of other organisms.
• Why is this possible? This works because evolution reuses fundamental biological principles and conserves metabolic, regulatory, and developmental pathways.
Studying Human Diseases
Copyright © 2006 Pearson Prentice Hall, Inc.
©2012 Sami Khuri
©2012 Sami Khuri
Goals of the HGP
• To identify all the approximately 20,000-
25,000 genes in human DNA,
• To determine the sequences of the 3.2 billion
chemical base pairs that make up human
DNA,
• To store this information in databases,
• To improve tools for data analysis,
• To address the ethical, legal, and social issues
(ELSI) that may arise from the project.
©2012 Sami Khuri
HGP Finished Before
Deadline
• In 1991, the USA Congress was told
that the HGP could be done by 2005 for
$3 billion.
• It ended in 2003 for $2.7 billion,
because of efficient computational
methods.
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.3
©2012 Sami Khuri
What is Bioinformatics?
Set of Tools
• The use of computers to collect,
analyze, and interpret biological
information at the molecular level.
• A set of software tools
for molecular sequence
analysis
©2012 Sami Khuri
What is Bioinformatics?
A Discipline
• The field of science, in which biology,
computer science, and information
technology merge into a single discipline.
Definition of NCBI (National Center for Biotechnology Information)
• The ultimate goal of bioinformatics is to
enable the discovery of new biological insights
and to create a global perspective from which
unifying principles in biology can be discerned.
©2012 Sami Khuri
Why Study Bioinformatics (I)
• Bioinformatics is intrinsically
interesting.
• Bioinformatics offers the prospect
of finding better drug targets earlier
in the drug development process.
– By looking for genes in model organisms that are
similar to a given human gene, researchers can
learn about protein the human gene encodes and
search for drugs to block it.
@2002-10 Sami Khuri
How can Bioinformatics Help?
©2012 Sami Khuri
Rational drug designStructure-based drug design
use docking algorithmsto design molecule thatcould bind the model structure
Scientific American July 2000 ©2012 Sami Khuri ©2012 Sami Khuri
Why Study Bioinformatics (II)
• Molecular biology is the new
frontier of 21st century science.
– DNA, RNA, genes, stem cells,
etc.. are everywhere in the
news.
• Science Magazine celebrated
its 125th anniversary by issuing twenty five
big questions facing science over the next
quarter-century. www.sciencemag.org/sciext/125th
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.4
©2012 Sami Khuri
Science: Top 25 Questions (I)
* What Is the Universe Made Of?
* What is the Biological Basis of Consciousness?
• Why Do Humans Have So Few Genes?
• To What Extent Are Genetic Variation and Personal Health Linked?
* Can the Laws of Physics Be Unified?
* How Much Can Human Life Span Be Extended?
• What Controls Organ Regeneration?
• How Can a Skin Cell Become a Nerve Cell?
• How Does a Single Somatic Cell Become a Whole Plant?
* How Does Earth's Interior Work?
* Are We Alone in the Universe?
* How and Where Did Life on Earth Arise?
©2012 Sami Khuri
Science: Top 25 Questions (II)
• What Determines Species Diversity?
• What Genetic Changes Made Us Uniquely Human?* How Are Memories Stored and Retrieved?
• How Did Cooperative Behavior Evolve?
• How Will Big Pictures Emerge from a Sea of Biological Data?
* How Far Can We Push Chemical Self-Assembly?
* What Are the Limits of Conventional Computing?
• Can We Selectively Shut Off Immune Responses?• Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
• Is an Effective HIV Vaccine Feasible?* How Hot Will the Greenhouse World Be?
* What Can Replace Cheap Oil -- and When?
@2002-10 Sami Khuri @2002-10 Sami Khuri
Red Blood Cells
©2012 Sami Khuri
@2002-10 Sami Khuri
©2012 Sami Khuri ©2012 Sami Khuri
What do Bioinformaticians do?
• They analyze and interpret data
• Develop and implement algorithms
• Design user interface
• Design database
• Automate genome analysis
• They assist molecular biologists in data
analysis and experimental design.
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.5
©2012 Sami Khuri
Databases for Storage
and Analysis
- Databases store data that need to be analyzed
- By comparing sequences, we discover:- How organisms are related to one another
- How proteins function
- How populations vary
- How diseases occur
- The improvement of sequencing methods generated a lot of data that need to be:
- stored - organized - curated
- annotated - managed - networked
- accessed - assessed
Types of Databases
In 2006 there were
858 databases
classified into 14
major categories
©2012 Sami Khuri
Three Major Databases
• GenBank from the NCBI (National Center of Biotechnology Information), National Library of Medicine http://www.ncbi.nlm.nih.gov
• EBI (European Bioinformatics Institute) from the European Molecular Biology Library
http://www.ebi.ac.uk
• DDBJ (DNA DataBank of Japan)
http://www.ddbj.nig.ac.jp
©2012 Sami Khuri
GenBank Taxonomic
Sampling
Homo sapiens
Mus musculus
Drosophila melanogaster
Caenorhabditis elegans
Arabidopsis thaliana
Oryza sativa
Rattus norvegicus
Danio rerio
Saccharomyces cerevisiae
62.1%
7.7%
6.1%
3.3%
2.9%
1.3%
0.8%
0.6%
0.6%
©2012 Sami Khuri
©2012 Sami Khuri
GenBank
GenBank is the NIH genetic sequence
database of all publicly available DNA
and derived protein sequences, with
annotations describing the biological
information these records contain.
©2012 Sami Khuri
What does NCBI do?
NCBI: established in 1988 as a national resource
for molecular biology information.
– it creates public databases,
– it conducts research in computational biology,
– it develops software tools for analyzing genome data,
and
– it disseminates biomedical information,
all for the better understanding of molecular
processes affecting human health and disease.
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.6
©2012 Sami Khuri
Applications of
Genome Research
Current and potential applications of Genome Research include:
– Molecular Medicine
– Microbial Genomics
– Risk Assessment
– Bioarcheology, Anthropology, Evolution and Human Migration
– DNA Identification
– Agriculture, Livestock Breeding and Bioprocessing
©2012 Sami Khuri
Molecular Medicine
• Improve the diagnosis of disease
• Detect genetic predispositions to disease
• Create drugs based on molecular
information
• Use gene therapy and control systems as
drugs
• Design custom drugs on individual genetic
profiles.
©2012 Sami Khuri
©2012 Sami Khuri
Microbal Genomics
• Swift detection and treatment in clinics of disease-causing microbes: pathogens
• Development of new energy sources: biofuels
• Monitoring of the environment to detect chemical warfare
• Protection of citizens from biological and chemical warfare
• Efficient and safe clean up of toxic waste.
©2012 Sami Khuri
DNA Identification I
• Identify potential suspects whose DNA may match evidence left at crime scenes
• Exonerate persons wrongly accused of crimes
• Establish paternity and other family relationships
• Match organ donors with recipients in transplant programs
Louis XVII
Louis XVII: son of Louis XV1 and Marie-Antoinette who
died from tuberculosis in 1795 at the age of 12
©2012 Sami Khuri ©2012 Sami Khuri
DNA Identification II
• Identify endangered and protected species as an aid to wildlife officials and also to prosecute poachers
• Detect bacteria and other organisms that may pollute air, water, soil, and food
• Determine pedigree for seed or livestock breeds
• Authenticate consumables such as wine and caviar
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.7
©2012 Sami Khuri
Agriculture, Livestock
Breeding and Bioprocessing
• Grow disease-resistant, insect-resistant, and drought-resistant crops
• Breed healthier, more productive, disease-resistant farm animals
• Grow more nutritious produce
• Develop biopesticides
• Incorporate edible vaccines into food products
@2002-10 Sami Khuri
©2012 Sami Khuri
Mighty Mouse versus
Human Genome
• Humans and mice have
similar genomes, but
their genes are ordered
differently
• ~245 rearrangements
– Reversals
– Fusions
– Fissions
– Translocation
©2012 Sami Khuri ©2012 Sami Khuri
Translocation
Fusion and Fission
Translocation1 2 3 9 8 7
44 5 6 2 4 1
1 2 6 2 4 1
4 5 3 9 8 7
1 2 3 4
5 61 2 3 4 5 6
Fusion
Fission
Figure 4.13a The Biology of Cancer (© Garland Science 2007)
Chromosomal Translocations
©2012 Sami Khuri
@2002-10 Sami Khuri
Flies have orthologs to
humans disease-causing
genes in categories such
as:
• neurological
• renal
• immunological
• endocrine
• cardiovascular
• metabolic
• blood-vessel and
• cancerous disorders
Flies can provide insights into human disease at the
systems level, revealing how different genes interact in vivoDiscovering Genomics, Campbell, 2007 ©2012 Sami Khuri
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.8
What have we learned
from the HGP?
A small
portion of the
genome
codes for
proteins,
tRNAs
and rRNAs
©2012 Sami Khuri
@2002-10 Sami Khuri
What have we learned
from the HGP?
The small number of genes
©2012 Sami Khuri
©2012 Sami Khuri
Alternative Splicing
Genomic Medicine by Guttmacher et al., NEJM, 2002
@2002-10 Sami Khuri
The Alpha-Tropomyosin
Gene
©2012 Sami Khuri
©2012 Sami Khuri
Gene Prediction
• Problem: Given a genomic DNA sequence, identify where the genes are.
• Input: A genomic DNA sequence.
• Output: Location of gene elements in the raw, genomic DNA sequence, including (for eukaryotes):
– exons
– introns
Anatomy of an Intron
©2012 Sami Khuri
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.9
©2012 Sami Khuri
Building upon the
Foundations of HGP • As we build upon the foundation laid by the
Human Genome Project, our ability to explore
uncharted frontiers will hinge upon melding
biological know-how with expertise in computer
science, physics, math, clinical research,
bioethics, and many other disciplines.
• A firm understanding of the powerful potential
of genomics, proteomics, and bioinformatics
will be essential to success in this amazing new
world. Discovering Genomics, Campbell, 2007 – Preface by Francis Collins
©2012 Sami Khuri
Genomics is a Way of
Seeing Life• Genome: the complete (haploid) DNA content of an
organism.
• Genomics: the field of genome studies.
• Genomics
– is not just a collection of methods
– has become an enhanced way of seeing life.
• Genomics includes the study of interaction of
molecules inside the cell:
DNA Protein Lipids Carbohydrates
• Genomics requires us to analyze, hypothesize, think,
and formulate models.
©2012 Sami Khuri
Pathway to Genomic
Medicine
Sequencing of
the human
DNA
Personalized
medicine
Cure for
diseases
Implicating
genetic
variants with
human disease
Interpreting
the human
genome
sequence
Human
Genome
Project
ENCODE
Project
HapMap
Project
Genomic
Medicine
©2012 Sami Khuri
ENCODE
ENCODE: ENCyclopedia Of DNA
Elements
• Goal: Compile a Comprehensive
Encyclopedia of all Functional Elements in
the Human Genome
• Initial Pilot Project: 1% of Human Genome
• Apply Multiple, Diverse Approaches to Study
and Analyze that 1% in a Consortium Fashion
©2012 Sami Khuri
The ENCODE Project
• 44 regions of the human genome were
selected, spanning 30 megabases (about 1% of
the human genome).
• The ENCODE regions include:
– About 50% randomly selected loci
– About 50% containing well-known genes
• Example: alpha and beta globins, CFTR
• The ENCODE Project Consortium released
its findings in a 2007 article (>250 coauthors)
©2012 Sami Khuri
Major Findings of ENCODE
• The majority of all nucleotides are transcribed
as part of
– Coding transcripts
– Noncoding RNAs
– Random transcripts that may have no biological
function.
• Many genes have multiple, previously
undetected, transcription start sites
– Regulatory sequences are as likely to be upstream
as downstream of the major start sites.
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.10
Constrained Sequences
Many important
functional
sequences are not
conserved.
They are even highly
variable among humans.
©2012 Sami Khuri ©2012 Sami Khuri
Pathway to Genomic
Medicine
Sequencing of
the human
DNA
Personalized
medicine
Cure for
diseases
Implicating
genetic
variants with
human
disease
Interpreting
the human
genome
sequence
Human
Genome
Project
ENCODE
Project
HapMap
Project
Genomic
Medicine
©2012 Sami Khuri
Genomic Variations
• Collection of genomic variations
makes any person a unique human
being. It contributes to that person’s:
– Potential to learn
– Predisposition to disease
– Predisposition to drug addiction
– Response to pharmaceutical interventions
• There are variations within, as well
as, between populations.
• The variation between individual genomes has
sparked a biotech boom in the area of SNP discovery.
GCCCGCCTC
CGGGCGGAG
GCC CTC
CGG GAG
Single Nucleotide Polymorphism (SNP)
GCCCACCTCCTC
CGGGTGGAGGAGCopy Number Variant (CNV)
“Indel” Polymorphism
GCCCACCTC
CGGGTGGAG
Types of
Genomic Variations
©2012 Sami Khuri
Single Nucleotide
PolymorphismA Single Nucleotide Polymorphism is a single base-pair
mutation that occurs at a specific site in the DNA
sequence.
. 90% of all human chromosomes
have the following sequence at a
particular location (i.e., unique locus)
But 10% of all alleles have a
slightly different sequence at that
particular location (i.e., unique locus)
©2012 Sami Khuri
What is a Polymorphism?
• A polymorphism is a difference in DNA
sequence among individuals.
• Genetic variations occurring in more than
1% of a population would be considered
useful polymorphisms for genetic analysis.
• SNP: position in a genome at which two or
more different bases occur in the population,
each with a frequency greater than 1%.
©2012 Sami Khuri
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.11
©2012 Sami Khuri
SNPs and Human
Variations To classify a variation as a SNP it should
occur in at least 1% of the population.
©2012 Sami Khuri
Where do SNPs Fall?
• SNPs may fall:
–within coding sequences of genes,
–noncoding regions of genes, or
– in the intergenic regions between
genes.
©2012 Sami Khuri
International
Haplotype Map Project
• Goal of International Haplotype Map Project
– Develop a haplotype map of the human genome.
• The “HapMap” describes common patterns of human
DNA sequence variation
– key source for researchers to use to find genes affecting
health, disease, and responses to drugs, and environmental
factors.
• Haplotypes are groups of SNPs transmitted in
“blocks”.
• These blocks can be characterized by a subset of their
SNPs (tags).©2012 Sami Khuri
CAGATCGCTGGATGAATCGCATCTGTAAGCAT
CGGATTGCTGCATGGATCGCATCTGTAAGCAC
CAGATCGCTGGATGAATCGCATCTGTAAGCAT
CAGATCGCTGGATGAATCCCATCAGTACGCAT
CGGATTGCTGCATGGATCCCATCAGTACGCAT
CGGATTGCTGCATGGATCCCATCAGTACGCAC
SNP2
↓
SNP3
↓
SNP4
↓
SNP5
↓SNP6
↓SNP1
↓
Block 1 Block 2
SNP7
↓SNP8
↓
Teri A
. Man
olio
, NH
GR
I, 2008
©2012 Sami Khuri
CAGATCGCTGGATGAATCGCATCTGTAAGCAT
CGGATTGCTGCATGGATCGCATCTGTAAGCAC
CAGATCGCTGGATGAATCGCATCTGTAAGCAT
CAGATCGCTGGATGAATCCCATCAGTACGCAT
CGGATTGCTGCATGGATCCCATCAGTACGCAT
CGGATTGCTGCATGGATCCCATCAGTACGCAC
SNP3
↓SNP5
↓SNP6
↓
Block 1 Block 2
SNP7
↓SNP8
↓
Teri A
. Man
olio
, NH
GR
I, 2008
Block 1 Block 2
©2012 Sami Khuri
CAGATCGCTGGATGAATCGCATCTGTAAGCAT
CGGATTGCTGCATGGATCGCATCTGTAAGCAC
CAGATCGCTGGATGAATCGCATCTGTAAGCAT
CAGATCGCTGGATGAATCCCATCAGTACGCAT
CGGATTGCTGCATGGATCCCATCAGTACGCAT
CGGATTGCTGCATGGATCCCATCAGTACGCAC
SNP3
↓SNP6
↓SNP8
↓
Teri A
. Man
olio
, NH
GR
I, 2008
Block 1 Block 2
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.12
©2012 Sami Khuri
Tag SNP
GTT 35%
CTC 30%
GTT 10%
GAT 8%
CAT 7%
CAC 6%
other haplotypes 4%
Block 1 Block 2 FrequencySingleton
Teri A
. Man
olio
, NH
GR
I, 2008
©2012 Sami Khuri
GWAS: The New Wave
Hokusai: The Great Wave
The New Wave:
Genome Wide
Association Studies
©2012 Sami Khuri
Genome-Wide Association
Study• Method for interrogating all 10 million
variable points across human genome.
• Variation is inherited in groups, or blocks, so
not all 10 million points have to be tested.
• NIH is interested in advancing genome-wide
association studies (GWAS) to identify
common genetic factors that influence health
and disease.
Teri A. Manolio, NHGRI, 2008
©2012 Sami Khuri
Mapping Relationships
Among SNPs
Christensen and Murray, 2007
©2012 Sami Khuri
@2002-10 Sami Khuri
SNP with its two allelic possibilities
Associations between
SNP variants: various
shades from whiteto red. Deepest red
indicating the
strongest association.
Patterns of triangular
blocks of strong association
are separated by short nodes with very little association.
Finding set of tags is equivalent to
“Minimum Dominating Set Problem”
©2012 Sami Khuri
@2002-10 Sami Khuri
SNP with its two allelic possibilities
Testing for one
SNP might provide
almost complete genetic
information for that block.
Tagging SNP:
block 1 � 3t
block 2 � 8t They can serve as a surrogate
for any variant within its block
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.13
To produce a genome-wide map of common variation
Genotype 6 Million SNPs in Four populations in Two Phases:
• CEPH (CEU) (Europe - n = 90, trios)
• Yoruban (YRI) (Africa - n = 90, trios)
• Japanese (JPT) (Asian - n = 45)
• Chinese (HCB) (Asian - n =45)
Nature 437: 1299-320, 2005
www.hapmap.orgwww.hapmap.org
©2012 Sami Khuri ©2012 Sami Khuri
SNP: rs2162709 on chromosome 5
©2012 Sami Khuri
Myocardial Infarction
and rs1333049
©2012 Sami Khuri
Genome-wide Association Analysis of Coronary Artery Disease, by Samani et al, NEJM 2007
©2012 Sami Khuri
Methods Used for
SNP Analysis• Regression Models
• Exhaustive Searches
– Two-Locus Interaction
– Higher-Order Interaction
• Recursive Partitioning
Approach
• Random Forest Approach
• Multifactor
Dimensionality Reduction
• Bayesian model selectionDetecting gene-gene interactions that underlie human diseases” by Cordell, Nature, June 2009
©2012 Sami Khuri
Pathway to Genomic
Medicine
Sequencing of
the human
DNA
Personalized
medicine
Cure for
diseases
Implicating
genetic
variants with
human
disease
Interpreting
the human
genome
sequence
Human
Genome
Project
ENCODE
Project
HapMap
Project
Genomic
Medicine
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.14
©2012 Sami Khuri
Understanding the link between -
DNA sequence Biology/Disease
(Genotype) (Phenotype)
Environment
ATTCGCATGGACC
CA
The Next ChallengePersonalized medicine is the use of diagnostic and screening methods to better manage the individual patient’s disease or predisposition toward a disease.
Personalized medicine will enable risk assessment, diagnosis, prevention, and therapy specifically tailored to the unique characteristics of the individual, thus enhancing the quality of life and public health.
Personalized Medicine is Genotype-Specific Treatment.
Personalized Medicine
©2012 Sami Khuri
©2012 Sami Khuri
Prostate Cancer Diagnosis
cancercontrol.cancer.gov/od/phg/presentations/Xu.pdf
©2012 Sami Khuri
@2002-10 Sami Khuri
GWAS and
Prostate Cancer
cancercontrol.cancer.gov/od/phg/presentations/Xu.pdf
23andme.com
decodeme.com
navigenics.com
Genetic Testing
for the Public
©2012 Sami Khuri ©2012 Sami Khuri
Biomarkers and
Human Disease
• Improve clinical trial design
• Identify disease subsets
• Guide disease selection for new therapies
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.15
©2012 Sami Khuri
Pharmacogenomics
• Pharmacogenomics deals with the influence of genetic variation on drug response in patients by correlating gene expression or SNPs with a drug's efficacy or toxicity.
• Pharmacogenomics aims to optimize drug therapy, with respect to the patients' genotype, to ensure maximum efficacy with minimal adverse effects.
• Such approaches promise the advent of “personalized medicine” in which drugs and drug combinations are optimized for each individual's unique genetic makeup. www.wikipedia.com
©2012 Sami Khuri
Genotype and
Phenotype Interaction
• Evans and Relling from St. Jude’s Children’s hospital in Memphis considered the efficacy and toxicity of a drug that requires two genes:
– An activator with 2 alleles, and
– A binding site with 2 alleles.
• There are 9 possible genotypes.
• Therapeutic effects depend on the genotype of the drug receptors in combination with the amount of active drug in circulation.
homozygous
for mutation
very
toxic
Discovering Genomics, Campbell, 2007 ©2012 Sami Khuri ©2012 Sami Khuri
Therapeutic
Effects of Drugs• The example highlights the complex web of
protein interactions that pharmacogenomics hopes to decipher.
• Drug response is polygenic, and new technologies are needed to understand the connections between relevant proteins involved in drug responses.
• If genotype-specific medication becomes viable, the physician will need to know the genotype of the ill person to determine the appropriate medication and dosage for optimal therapy.
©2012 Sami Khuri
@2002-10 Sami Khuri
Origins of African
Americans
Source: Esteban González Burchard ©2012 Sami Khuri
Ancestry Informative
Marker
• An Ancestry-Informative Marker (AIM) is a set of polymorphisms for a locus which exhibits substantially different frequencies between populations from different geographical regions.
• By using a number of AIMs one can estimate the geographical origins of the ancestors of an individual and ascertain what proportion of ancestry is derived from each geographical region.
en.wikipedia.org/wiki/
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.16
©2012 Sami Khuri
SNPs and AIMs
Source: Esteban González Burchard ©2012 Sami Khuri
@2002-10 Sami Khuri
Origins of Latinos and
African Americans
African
Americans
Source: Esteban González Burchard
Self-Identified Race:
Genetic Ancestry
©2012 Sami Khuri ©2012 Sami Khuri
The Superior Doctor
Superior doctors prevent the disease
Mediocre doctors treat the disease before evident
Inferior doctors treat the full blown disease-Huang Dee: Nai - Ching
(2600 B.C. 1st Chinese Medical Text)
©2012 Sami Khuri
Preventive Medicine
• Prevent disease from occurring
• Identify the cause of the disease
• Treat the cause of the disease rather than the symptoms
• Genomics identifies the cause of disease
• “All medicine may become pediatrics” Paul Wise
• Effects of environment, accidents, aging, penetrance …
• Health care costs can be greatly reduced if
– invests in preventive medicine
– one targets the cause of disease rather than symptoms
©2012 Sami Khuri
@2002-09 Sami Khuri
Anatomy Lesson
of Dr. Nicolaes Tulp
1632 oil painting by Rembrandt Harmenszoon van Rijn
©2012 Sami Khuri
Biology/CS 123A
Bioinformatics I
2.17
©2012 Sami Khuri
If Rembrandt was
Around Today
Source: Carlos Cordon-Cardo, Columbia University ©2012 Sami Khuri
Convert all this progress into real riches for science, society, and patients
The Future
©2012 Sami Khuri
Concluding Remarks (I)
• Biology is becoming an information science
• Progression: in vivo to in vitro to in silico
• Are natural languages adequate in predicting quantitative behavior of biological systems?
– Need to produce biological knowledge and
operations in ways that natural languages do not allow
• “Biology easily has 500 years of exciting problems to work on”. Donald Knuth
• We are here to add what we can to, not to get what we can from, Life. William Osler
©2012 Sami Khuri
Concluding Remarks (II)
• Today’s biology courses need to cast a wide net
to capture the imaginations of students
representing many different interests, skills, and
viewpoints.
• Today’s biologists need to think quantitatively
and from a multidisciplinary perspective.
• The role that mathematics and computer science
are playing in biology is still in an embryonic
stage.