Bio-Medical Informatics

Instructor: Hanif Yaghoobi

Website: site444703.44.webydo.com

E-mail: Hyiautcourse@gmail.com

Personal e-mail: hanifeyaghoobi@gmail.com

About this Course

• Activities during the semester (5 points):
1) Homework
2) MATLAB exercises

• Final project (3 points)

• Final exam (12 points)

Shortliffe

“Medical informatics is the rapidly developing scientific field that deals with resources, devices and formalized methods for optimizing the storage, retrieval and management of biomedical information for problem solving and decision making.”

Edward Shortliffe, MD, PhD

Organisms

• Classified into two types:

• Eukaryotes: contain a membrane-bound nucleus and organelles (plants, animals, fungi,…)

• Prokaryotes: lack a true membrane-bound nucleus and organelles (single-celled, includes bacteria)

• Not all single celled organisms are prokaryotes!

• Complex system enclosed in a membrane

• Organisms are unicellular (bacteria, baker’s yeast) or multicellular

• Humans:
– 60 trillion cells
– 320 cell types

Example Animal Cell

Image source: www.ebi.ac.uk/microarray/biology_intro.htm

DNA Basics – cont.

• DNA in Eukaryotes is organized in chromosomes.

Chromosomes

• In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes

• Humans:
– 22 pairs of autosomes
– 1 pair of sex chromosomes

Human Karyotype: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html

www.biotec.or.th/Genome/whatGenome.html

What is DNA?

• DNA: Deoxyribonucleic Acid

• A single strand is a chain of nucleotides (an oligomer or polynucleotide)

• 4 different nucleotide bases:
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)

Nucleotide Bases

• Purines (A and G)

• Pyrimidines (C and T)

• Difference is in base structure

Image Source: www.ebi.ac.uk/microarray/biology_intro.htm

The Central Dogma

Protein Synthesis

Cell Function

Genome -> (Transcription) -> Transcriptome -> (Translation) -> Proteome

Gene Expression

Genome

• chromosomal DNA of an organism

• the number of chromosomes and the genome size vary significantly from one organism to another

• Genome size and number of genes do not necessarily determine organism complexity

Genome Comparison

ORGANISM                              CHROMOSOMES   GENOME SIZE (bp)   GENES
Homo sapiens (Human)                  23            3,200,000,000      ~30,000
Mus musculus (Mouse)                  20            2,600,000,000      ~30,000
Drosophila melanogaster (Fruit Fly)   4             180,000,000        ~18,000
Saccharomyces cerevisiae (Yeast)      16            14,000,000         ~6,000
Zea mays (Corn)                       10            2,400,000,000      ???

• The DNA in each chromosome can be read as a discrete signal over the alphabet {a,t,c,g} (for example: atgatcccaaatggaca…).

• In genes (protein-coding regions), the nucleotides (letters) are read as triplets (codons) when a protein is built from amino acids. Each codon specifies one amino acid for protein synthesis (there are 20 amino acids) or a stop signal.

• There are 6 ways of translating a DNA signal into a codon signal, called the reading frames (3 offsets × 2 directions).
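The six reading frames can be enumerated with a short Python sketch. It is only an illustration: the function names are hypothetical, and incomplete trailing codons are simply dropped.

```python
# Minimal sketch: list the codons of all six reading frames of a DNA string.
# Function names are illustrative, not taken from the course material.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into complete triplets, starting at the given offset (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Map (strand, offset) to the codon list of that reading frame."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = codons(s, offset)
    return frames


if __name__ == "__main__":
    for frame, triplets in six_frames("CATTGCCAGT").items():
        print(frame, triplets)
```

The "-" frames are read from the reverse complement, which is why there are 3 offsets on each of the 2 strands.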

…CATTGCCAGT…

DNA Basics – Cont.

…CATTGCCAGT…

Start: ATG

Stop: TAA, TGA, TAG

[Figure: gene structure, exons separated by introns]

Understanding Genome Sequences

~3,289,000,000 characters:

aattgtgctctgcaaattatgatagtgatctgtatttactacgtgcatat attttgggccagtgaatttttttctaagctaatatagttatttggacttt tgacatgactttgtgtttaattaaaacaaaaaaagaaattgcagaagtgt tgtaagcttgtaaaaaaattcaaacaatgcagacaaatgtgtctcgcagt cttccactcagtatcatttttgtttgtaccttatcagaaatgtttctatg tacaagtctttaaaatcatttcgaacttgctttgtccactgagtatatta tggacatcttttcatggcaggacatatagatgtgttaatggcattaaaaa taaaacaaaaaactgattcggccgggtacggtggctcacgcctgtaatcc cagcactttgggagatcgaggagggaggatcacctgaggtcaggagttac agacatggagaaaccccgtctctactaaaaatacaaaattagcctggcgt ggtggcgcatgcctgtaatcccagctactcgggaggctgaggcaggagaa tcgcttgaacccgggagcggaggttgcggtgagccgagatcgcaccgttg cactccagcctgggcgacagagcgaaactgtctcaaacaaacaaacaaaa aaacctgatacatggtatgggaagtacattgtttaaacaatgcatggaga tttaggttgtttccagtttttactggcacagatacggcaatgaatataat tttatgtatacattcatacaaatatatcggtggaaaattcctagaagtgg aatggctgggtcagtgggcattcatattgagaaattggaaggatgttgtc aaactctgcaaatcagagtattttagtcttaacctctcttcttcacaccc ttttccttggaagaaagctaaatttagacttttaaacacaaaactccatt ttgagacccctgaaaatctgggttcaaagtgtttgaaaattaaagcagag gctttaatttgtacttatttaggtataatttgtactttaaagttgttcca

. . .

Goal: Identify components encoded in the DNA sequence

Open Reading Frame

• Protein-encoding DNA sequence consists of a sequence of 3 letter codons

• Starts with the START codon (ATG)

• Ends with a STOP codon (TAA, TAG, or TGA)

ATGCTCAGCGTGACCTCA . . . CAGCGTTAA

M L S V T S . . . Q R STP
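The codon-to-amino-acid reading shown above can be reproduced with a small Python sketch. It assumes the standard genetic code; the function name `translate` is illustrative, and translation simply stops at the first stop codon.

```python
# Minimal sketch: translate a DNA coding sequence with the standard genetic code.
from itertools import product

BASES = "TCAG"
# One-letter amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...);
# "*" marks the three stop codons TAA, TAG and TGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons from position 0 until a stop codon or the end."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon: end of the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGCTCAGCGTGACCTCA"))  # start of the example ORF
    print(translate("CAGCGTTAA"))           # end of the example ORF
```

The two calls correspond to the beginning and the end of the example sequence on the slide, since the middle of the ORF is elided there.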

Finding Open Reading Frames

Try all possible starting points
• 3 possible offsets
• 2 possible strands

A simple algorithm finds all ORFs in a genome
• Many of these are spurious (not real genes)
• How do we focus on the real ones?
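A simple ORF scan of this kind might look like the following Python sketch. It is one possible reading of the "simple algorithm", not a reference implementation: the name `find_orfs`, the `min_codons` parameter and the test sequence are all assumptions made for illustration.

```python
# Minimal sketch: scan 3 offsets on each of the 2 strands for ATG ... STOP.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, offset, orf_sequence) for every ATG...STOP run found.

    min_codons is a hypothetical filter on ORF length, stop codon included.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first START codon in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, offset, s[start:i + 3]))
                    start = None                   # look for the next ORF
    return orfs


if __name__ == "__main__":
    # Made-up test sequence: the slide's ORF prefix plus a TAA stop, with flanking bases.
    for orf in find_orfs("CCATGCTCAGCGTGACCTCATAAGG"):
        print(orf)
```

On this made-up input the scan reports the intended ORF on the forward strand and also a short ORF on the reverse strand, which illustrates why many hits from a plain scan are spurious.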

ATGCTCAGCGTGACCTCA . . . CAGCGTTAA

M L S V T S . . . Q R STP

Using Additional Genomes

Basic premise: “What is important is conserved”

Evolution = Variation + Selection
– Variation is random
– Selection reflects function

Idea:
• Instead of studying a single genome, compare related genomes
• A real open reading frame will be conserved

Phylogenetic Tree of Yeasts

Kellis et al, Nature 2003

S. cerevisiae

S. paradoxus

S. mikatae

S. bayanus

C. glabrata

S. castellii

K. lactis

A. gossypii

K. waltii

D. hansenii

C. albicans

Y. lipolytica

N. crassa

M. graminearum

M. grisea

A. nidulans

S. pombe

~10M years

Evolution of Open Reading Frame

S. cerevisiae   ATGCTCAGCGTGACCTCA . . .
S. paradoxus    ATGCTCAGCGTGACATCA . . .
S. mikatae      ATGCTCAGGGTGACA--A . . .
S. bayanus      ATGCTCAGG---ACA--A . . .

Annotations: conserved positions; variable positions; a deletion; a frame shift changes the interpretation of the downstream sequence
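The column-by-column comparison of the four aligned sequences can be sketched in Python. The pairing of each sequence with a species follows the order in which they appear on the slide, which is an assumption; the function names and the marker characters are illustrative.

```python
# Minimal sketch: mark conserved, variable and gapped columns of an alignment,
# and flag gap runs whose length is not a multiple of 3 (frame shifts).
import re

ALIGNMENT = {  # species-to-sequence pairing assumed from slide order
    "S. cerevisiae": "ATGCTCAGCGTGACCTCA",
    "S. paradoxus":  "ATGCTCAGCGTGACATCA",
    "S. mikatae":    "ATGCTCAGGGTGACA--A",
    "S. bayanus":    "ATGCTCAGG---ACA--A",
}


def column_marks(sequences):
    """Return one character per column: '*' conserved, '.' variable, '-' gap."""
    marks = []
    for column in zip(*sequences):
        if "-" in column:
            marks.append("-")
        elif len(set(column)) == 1:
            marks.append("*")
        else:
            marks.append(".")
    return "".join(marks)


def gap_run_lengths(seq):
    """Lengths of the consecutive gap runs in one aligned sequence."""
    return [len(run) for run in re.findall(r"-+", seq)]


def has_frame_shift(seq):
    """A gap run that is not a multiple of 3 shifts the downstream reading frame."""
    return any(length % 3 != 0 for length in gap_run_lengths(seq))


if __name__ == "__main__":
    for species, seq in ALIGNMENT.items():
        print(f"{species:14s} {seq}  frame shift: {has_frame_shift(seq)}")
    print(f"{'':14s} {column_marks(ALIGNMENT.values())}")
```

A 3-base deletion removes one whole codon and keeps the frame, whereas the 2-base gap changes how every downstream codon is read.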

Examples [Kellis et al, Nature 2003]

[Figure: aligned examples of a spurious ORF and a confirmed ORF, with conserved and variable positions marked; annotations: frame shift, sequencing error, ATG not conserved]

A greedy algorithm for finding conserved ORFs is surprisingly effective (> 99% accuracy) on verified yeast data

Defining Conservation

Naïve approach
• Consensus between all species

Problem:
• Rough grained
• Ignores distances between species
• Ignores the tree topology

Goal:
• More sensitive and robust methods

[Figure: conserved and variable positions, with 100% conserved positions marked]

Bioinformatics – an area of emerging knowledge

• Each cell of the body contains the whole DNA of the individual (about 40,000 genes in the human genome, each comprising from 50 to a million base pairs: A, T, C or G)

• The Central Dogma of molecular biology: DNA -> RNA -> proteins

• Transcription: DNA (about 5%) -> mRNA
– DNA -> pre-RNA -> splicing -> mRNA (only the exons)

• Translation: mRNA -> proteins
– Proteins make cells alive and specialised (e.g. blue eyes)
– Genome -> proteome

N.Kasabov, 2003

Bioinformatics

• The area of Science that is concerned with the development and applications of methods, tools and systems for storing and processing of biological information to facilitate knowledge discovery.

• Interdisciplinary: Information and computer science, Molecular Biology, Biochemistry, Genetics, Physics, Chemistry, Health and Medicine, Mathematics and Statistics, Engineering, Social Sciences.

• Biology, Medicine --> Information Science --> IT, Clinics, Pharmacy

• Links to Health informatics, Clinical DSS, Pharmaceutical Industry

N.Kasabov, 2003

Bioinformatics: challenging problems for computer and information sciences

• Discovering patterns (features) from DNA and RNA sequences (e.g. genes, promoters, ribosome binding sites (RBS), splice junctions)

• Analysis of gene expression data and predicting protein abundance

• Discovery of gene networks: genes that are co-regulated over time

• Protein discovery and protein function analysis

• Predicting the development of an organism from its DNA code (?)

• Modeling the full development (metabolic processes) of a cell (?)

• Implications: health; social,…

N.Kasabov, 2003

Problems in Computational Modeling for Bioinformatics

• An abundance of genome, RNA, protein and metabolic pathway data is now available (see http://www.ncbi.nlm.nih.gov), and this is just the beginning of computational modeling in bioinformatics

• Complex interactions:
– between proteins, genes, DNA code
– between the genome and the environment
– much is yet to be discovered

• Stability and repetitiveness: Genes are relatively stable carriers of information.

• Many sources of uncertainty:
– Alternative splicing
– Mutation in genes caused by: ionising radiation (e.g. X-rays), chemical contamination, replication errors, viruses that insert genes into host cells, aging processes, etc.
– Mutated genes express differently and cause the production of different proteins

• It is extremely difficult to model dynamic, evolving processes

Bioinformatics Important Challenges

Gene Prediction

Gene Function

Protein Function

Protein 3D Structure

Public Database

DNA sequence {A,T,C,G}

Microarray

Protein sequence: KMLSLLMARTYW

Gene Expression

Microarray

• What can it be used for?

• How does it work?

• What are the advantages?

An Example Application

Microarrays can be used for:
Comparison of transcription levels between two cells

Examples:
Comparison between:
• Cells from a young mouse vs cells from an old mouse
• Drug efficacy: treated cells vs untreated cells

How it works:
Based on hybridization

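[Figure: hybridization. A probe attached to the chip (ACTTGACC) binds complementary sample strands (DNA: TGAACTGG; RNA: UGAACUGG). A pairs with T or U through two hydrogen bonds (=); C pairs with G through three (≡). A second probe (ACTTAACC) is shown with the sample UGAAUUGG.]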
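Hybridization depends on base complementarity between probe and sample, which a short Python sketch can make concrete. This is a deliberately simplified model that counts only Watson-Crick pairs, position by position as drawn on the slide; it ignores strand orientation, wobble pairs and thermodynamics. The function name is illustrative.

```python
# Minimal sketch: count hydrogen bonds between a DNA probe and a DNA/RNA sample.
# Only Watson-Crick pairs are counted; anything else is reported as a mismatch.

HYDROGEN_BONDS = {
    ("A", "T"): 2, ("A", "U"): 2, ("T", "A"): 2,  # two bonds, drawn as "="
    ("C", "G"): 3, ("G", "C"): 3,                  # three bonds, drawn as "≡"
}


def hybridization_score(probe, sample):
    """Return (total hydrogen bonds, list of mismatching positions)."""
    bonds = 0
    mismatches = []
    for i, pair in enumerate(zip(probe.upper(), sample.upper())):
        n = HYDROGEN_BONDS.get(pair)
        if n is None:
            mismatches.append(i)
        else:
            bonds += n
    return bonds, mismatches


if __name__ == "__main__":
    print(hybridization_score("ACTTGACC", "UGAACUGG"))  # fully complementary
    print(hybridization_score("ACTTGACC", "UGAAUUGG"))  # one position does not pair
    print(hybridization_score("ACTTAACC", "UGAAUUGG"))  # variant probe, variant sample
```

A sample that differs from the probe's complement at a single base forms fewer bonds, which is the property that point-mutant control probes rely on.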

Probes and the printing process

[Figure: microtiter plates, print head with pins, slides (100)]

23/2/2008 60

Microarray Technology

probe (on chip)

sample (labelled)

pseudo-colour image

[image from Jeremy Buhler]

Experimental design

• Track what’s on the chip: which spot corresponds to which gene
• Duplicate experimental spots: reproducibility
• Controls
– DNAs spotted on glass: positive probe (induced or repressed), negative probe (bacterial genes on human chip)
– oligos on glass or synthesised on chip (Affymetrix): point mutants (hybridisation plus/minus)

Images from scanner

• Resolution
– standard 10 µm [currently, max 5 µm]
– 100 µm spot on chip = 10 pixels in diameter
• Image format
– TIFF (tagged image file format), 16 bit (65’536 levels of grey)
– 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed)
– other formats exist, e.g. SCN (used at Stanford University)
• Separate image for each fluorescent sample
– channel 1, channel 2, etc.
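The resolution and file-size figures above follow from simple arithmetic, sketched here in Python. The function names are illustrative, and the calculation covers uncompressed pixel data only, ignoring any file header overhead.

```python
# Minimal sketch: pixel dimensions and raw size of a scanned microarray image.

UM_PER_CM = 10_000  # micrometres per centimetre


def scan_geometry(width_cm, height_cm, pixel_um, bits_per_pixel):
    """Return (columns, rows, uncompressed size in bytes) for one channel."""
    cols = round(width_cm * UM_PER_CM / pixel_um)
    rows = round(height_cm * UM_PER_CM / pixel_um)
    size_bytes = cols * rows * bits_per_pixel // 8
    return cols, rows, size_bytes


def spot_diameter_pixels(spot_um, pixel_um):
    """Number of pixels spanned by one spot diameter."""
    return spot_um / pixel_um


if __name__ == "__main__":
    print(scan_geometry(1, 1, 10, 16))    # 1 cm x 1 cm at 10 um, 16 bit
    print(spot_diameter_pixels(100, 10))  # 100 um spot at 10 um per pixel
    print(2 ** 16)                        # grey levels available at 16 bit
```

At 10 µm per pixel a 1 cm side is 1,000 pixels, so one channel holds 1,000 × 1,000 pixels × 2 bytes, about 2 MB. Halving the pixel size to 5 µm quadruples that.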

Images in analysis software

• The two 16-bit images (cy3, cy5) are compressed into 8-bit images
• Goal: display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
• RGB image:
– Blue values (B) are set to 0
– Red values (R) are used for cy5 intensities
– Green values (G) are used for cy3 intensities
• Qualitative representation of results
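The overlay construction can be sketched per pixel in Python. The slide does not say how the 16-bit values are compressed to 8 bits; the linear down-scaling used here (dropping the low 8 bits) is an assumption, and the function names are illustrative.

```python
# Minimal sketch: build one 24-bit RGB overlay pixel from 16-bit cy3/cy5 intensities.


def to_8bit(value16):
    """Assumed compression: keep the 8 most significant of the 16 bits."""
    return value16 >> 8


def overlay_pixel(cy3, cy5):
    """Return (R, G, B): red from cy5, green from cy3, blue fixed at 0."""
    return (to_8bit(cy5), to_8bit(cy3), 0)


if __name__ == "__main__":
    print(overlay_pixel(cy3=65535, cy5=65535))  # both channels strong: yellow
    print(overlay_pixel(cy3=0, cy5=65535))      # cy5 only: red
    print(overlay_pixel(cy3=65535, cy5=0))      # cy3 only: green
```

Equal red and green values display as yellow, which is why spots with similar intensity in both channels look yellow in the overlay.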

Images: examples

[Images: cy5 channel and pseudo-color overlay]

SPOT COLOR   SIGNAL STRENGTH       GENE EXPRESSION
yellow       Control = perturbed   unchanged
red          Control < perturbed   induced
green        Control > perturbed   repressed
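The colour table can be turned into a simple rule on the ratio of the two signals, sketched below in Python. The two-fold cutoff and the function name are assumptions made for illustration; the slides give only the qualitative rule.

```python
# Minimal sketch: classify a spot from control and perturbed intensities.
import math


def classify_spot(control, perturbed, fold=2.0):
    """Return (spot colour, expression call); 'fold' is a hypothetical cutoff.

    Both intensities must be positive.
    """
    log_ratio = math.log2(perturbed / control)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "red", "induced"       # control < perturbed
    if log_ratio <= -cutoff:
        return "green", "repressed"   # control > perturbed
    return "yellow", "unchanged"      # control roughly equal to perturbed


if __name__ == "__main__":
    print(classify_spot(1000, 4000))
    print(classify_spot(4000, 1000))
    print(classify_spot(1000, 1100))
```

Working with the log ratio treats a two-fold increase and a two-fold decrease symmetrically, which a raw ratio does not.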

Data: DNA Microarray

[Figure: expression of gene 1, gene 2 and gene 3 over time (min), from 0 to 60]

Data Required: Gene Expression Matrix

      t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0


Data Required: Gene Expression Matrix

      a1  a2  a3  a4
g1    0   3   1   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0


Snap Shot

      t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0

Time series
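The two views of the gene expression matrix can be sketched in Python. Reading a column as a snapshot (all genes at one time point) and a row as a time series (one gene over time) is an interpretation of the slide labels. The Pearson-correlation check for similar profiles and the 0.99 threshold are added for illustration, as are the function names.

```python
# Minimal sketch: snapshots, time series and similar profiles in an expression matrix.
import math
from itertools import combinations

GENES = ["g1", "g2", "g3", "g4"]
TIMES = ["t1", "t2", "t3", "t4"]
MATRIX = [
    [0, 1, 2, 1],
    [1, 2, 1, 0],
    [0, 1, 1, 1],
    [1, 2, 1, 0],
]


def time_series(gene):
    """One row: expression of a single gene at every time point."""
    return MATRIX[GENES.index(gene)]


def snapshot(time):
    """One column: expression of every gene at a single time point."""
    col = TIMES.index(time)
    return [row[col] for row in MATRIX]


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similar_pairs(threshold=0.99):
    """Gene pairs whose time series correlate above a hypothetical threshold."""
    return [
        (a, b) for a, b in combinations(GENES, 2)
        if pearson(time_series(a), time_series(b)) > threshold
    ]


if __name__ == "__main__":
    print(snapshot("t2"))
    print(time_series("g1"))
    print(similar_pairs())
```

In this matrix g2 and g4 have identical profiles, the kind of pattern one looks for when searching for co-regulated genes.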

• World Health Organization
