Role of Computer and Information Science in
Biology
Presented By
Dr G. P. S. Raghava
Co-ordinator, Bioinformatic Centre, IMTECH, Chandigarh, India
&
Visiting Professor, Pohang Univ. of Science & Technology, Republic of Korea
Email: [email protected]
Web: http://www.imtech.res.in/raghava/
Major Applications & Challenges
Introduction to Biology
Genome Annotation: Gene Prediction
Analysis and Comparison of Sequences
Protein Structure Prediction
DNA Chip (Microarray) technology
Proteomics: Analysis of 2D gel
Fingerprinting Technique
Drug development
Computer-Aided Vaccine Design
Hierarchy in Biology
Atoms
Molecules
Macromolecules
Organelles
Cells
Tissues
Organs
Organ Systems
Individual Organisms
Populations
Communities
Ecosystems
Biosphere
Animal cell
Human Chromosomes
Genes are linearly arranged along chromosomes
Chromosomes and DNA
DNA can be simplified to a string of four letters
GATTACA
(RT)
Sequence to Structure: It’s a matter of dimensions!
1D Nucleic acid sequence
AGT-TTC-CCA-GGG…
1D Protein sequence
Met-Ala-Gly-Lys-His…
M – A – G – K – H…
3D Spatial arrangement of atoms
Genome Annotation
The Process of Adding Biological Information and Predictions to a Sequenced Genome Framework
Importance of Sequence Comparison
Protein Structure Prediction
– Similar sequences have similar structure & function
– Phylogenetic Tree
– Homology based protein structure prediction
Genome Annotation
– Homology based gene prediction
– Function assignment & evolutionary studies
Searching drug targets
– Searching sequences present or absent across genomes
Protein Sequence Alignment and Database Searching
Alignment of Two Sequences (Pair-wise Alignment)
– The Scoring Schemes or Weight Matrices
– Techniques of Alignments
– DOTPLOT
Multiple Sequence Alignment (Alignment of > 2 Sequences)
– Extending Dynamic Programming to more sequences
– Progressive Alignment (Tree or Hierarchical Methods)
– Iterative Techniques
   Stochastic Algorithms (SA, GA, HMM)
   Non Stochastic Algorithms
Database Scanning
– FASTA, BLAST, PSIBLAST, ISS
Alignment of Whole Genomes
– MUMmer (Maximal Unique Match)
Alignment of Two Sequences
Dealing with Gaps in Pair-wise Alignment
Sequence Comparison without Gaps
Sliding window method to get the maximum score
ALGAWDE
ALATWDE
Total score = 1+1+0+0+1+1+1 = 5; percent identity (PID) = (5*100)/7 ≈ 71.4
Sequences of variable length should be compared using dynamic programming
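The ungapped scoring on this slide can be sketched in a few lines. This is a minimal illustration, assuming the slide's scheme of 1 per identical position and 0 otherwise; the function names are hypothetical and not taken from any tool mentioned in the talk.

```python
def identity_score(a, b):
    """Count identical residues at matching positions (1 per match, 0 otherwise)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def percent_identity(a, b):
    """Percent identity (PID) over the aligned length; assumes equal lengths."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison assumes equal-length sequences")
    return 100.0 * identity_score(a, b) / len(a)


def best_ungapped_offset(query, target):
    """Slide the query along the target and return (best_offset, best_score)."""
    best_offset, best_score = 0, -1
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        score = identity_score(query, window)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# The two sequences from the slide share 5 of 7 positions.
print(identity_score("ALGAWDE", "ALATWDE"))
print(round(percent_identity("ALGAWDE", "ALATWDE"), 1))
```

Because the window only shifts one sequence against the other, it cannot place a gap inside either sequence, which is why the slide moves on to dynamic programming.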
Sequence Comparison with Gaps
• Insertions and deletions are common
• Sliding window method fails
• Generate all possible alignments
• A 100-residue alignment requires > 10^75 possible alignments to be examined
Alternate Dot Matrix Plot
Diagonal * shows aligned/identical regions
Dynamic Programming
Dynamic programming allows optimal alignment between two sequences
Allows insertions and deletions, i.e. alignment with gaps
Needleman and Wunsch algorithm (1970) for global alignment
Smith & Waterman algorithm (1981) for local alignment
Important Steps
– Create DOTPLOT between two sequences
– Compute SUM matrix
– Trace Optimal Path
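The three steps above (match matrix, sum matrix, traceback) can be sketched as a small global aligner in the Needleman-Wunsch style. The scoring values (match 1, mismatch 0, gap -1) are assumptions chosen for illustration; the slides do not specify them, and real tools use weight matrices such as PAM together with tuned gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Dot-plot style comparison of residue a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Sum matrix: S[i][j] is the best score for aligning a[:i] with b[:j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + sub(i, j),  # align the two residues
                S[i - 1][j] + gap,            # gap in b
                S[i][j - 1] + gap,            # gap in a
            )

    # Trace the optimal path back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

A local (Smith-Waterman style) variant would additionally clamp each cell at zero and start the traceback from the highest-scoring cell; that is not shown here.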
Alignment of Multiple Sequences
Extending Dynamic Programming to more sequences
– Dynamic programming can be extended to more than two sequences
– In practice it requires prohibitive CPU time and memory (Murata et al., 1985)
– MSA, limited to only 8-10 sequences (1989)
– DCA (Divide and Conquer; Stoye et al., 1997), 20-25 sequences
– OMA (Optimal Multiple Alignment; Reinert et al., 2000)
– COSA (Althaus et al., 2002)
Progressive or Tree or Hierarchical Methods (CLUSTAL-W)
– Practical approach for multiple alignment
– Compare all sequences pair-wise
– Perform cluster analysis
– Generate a hierarchy for alignment
– First align the most similar pair of sequences
– Align that alignment with the next most similar alignment or sequence
Database scanning
Basic principles of Database searching
– Search query sequence against all sequences in the database
– Calculate score and select top sequences
– Dynamic programming is best
Approximation Algorithms
FASTA
– Fast sequence search
– Based on dotplot
– Identify identical words (k-tuples)
– Search significant diagonals
– Use PAM 250 for further refinement
– Dynamic programming for narrow region
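The first two FASTA steps (identical k-tuples, then significant diagonals) can be illustrated with a word index. This is only a sketch of the idea, not the FASTA program itself: the function name and the default word size `k=2` are assumptions, and the later PAM 250 rescoring and banded dynamic programming are not shown.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(query, target, k=2):
    """Count identical k-tuple hits per diagonal.

    A diagonal is identified by (target position - query position), so hits
    that belong to the same ungapped alignment fall on the same diagonal.
    """
    # Index every k-tuple of the query by its start position.
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)

    # Look up every k-tuple of the target and tally the diagonals hit.
    diagonals = Counter()
    for j in range(len(target) - k + 1):
        for i in positions.get(target[j:j + k], ()):
            diagonals[j - i] += 1
    return diagonals


hits = ktuple_diagonals("ALGAWDE", "XXALGAWDE")
print(hits.most_common(1))
```

The diagonal with the most hits marks the region where a narrow-band dynamic programming pass would then be run.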
Principles of FASTA Algorithms
Database Scanning or Fold Recognition
Concept of PSIBLAST
– Improve the sensitivity of BLAST
– Perform the BLAST search (gap handling)
– Generate the position-specific score matrix (PSSM)
– Use PSSM for next round of search
Intermediate Sequence Search
– Search query against protein database
– Generate multiple alignment or profile
– Use profile to search against PDB
Comparison of Whole Genomes
MUMmer (Salzberg group, 1999, 2002)
– Pair-wise sequence alignment of genomes
– Assumes that the sequences are closely related
– Allows detection of repeats, inverse repeats, SNPs
– Domain inserted/deleted
– Identify the exact matches
How it works
– Identify the maximal unique matches (MUMs) in two genomes
– As the two genomes are similar, large MUMs will be present
– Sort the MUMs found and extract the longest set of matches that occur in the same order (Ordered MUMs)
– A suffix tree is used to identify MUMs
– Close the gaps by SNPs, large inserts
– Align regions between MUMs by Smith-Waterman
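The "ordered MUM" step above amounts to a longest increasing subsequence problem: after sorting matches by their position in one genome, keep the longest subset whose positions also increase in the other genome. The sketch below assumes each MUM is given as a hypothetical `(position_in_genome_a, position_in_genome_b)` pair and ignores match lengths, which a real implementation would take into account.

```python
from bisect import bisect_left


def longest_ordered_mums(mums):
    """Return the longest chain of MUMs whose positions increase in both genomes.

    mums: iterable of (pos_in_genome_a, pos_in_genome_b) pairs (illustrative format).
    """
    mums = sorted(mums)          # order by position in genome A
    tail_idx = []                # tail_idx[k]: index of the MUM ending the best chain of length k+1
    tail_val = []                # tail_val[k]: its position in genome B
    prev = [None] * len(mums)    # back-pointers for reconstructing the chain

    for idx, (_, pos_b) in enumerate(mums):
        k = bisect_left(tail_val, pos_b)
        if k > 0:
            prev[idx] = tail_idx[k - 1]
        if k == len(tail_val):
            tail_idx.append(idx)
            tail_val.append(pos_b)
        else:
            tail_idx[k] = idx
            tail_val[k] = pos_b

    # Walk the back-pointers from the end of the longest chain.
    chain = []
    idx = tail_idx[-1] if tail_idx else None
    while idx is not None:
        chain.append(mums[idx])
        idx = prev[idx]
    return chain[::-1]


# The match at (20, 3) is out of order in genome B and is dropped.
print(longest_ordered_mums([(1, 5), (10, 12), (20, 3), (30, 20), (40, 25)]))
```

The regions left between consecutive MUMs in the chain are the gaps that the slide says are closed as SNPs, inserts, or short Smith-Waterman alignments.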
Protein Structure Prediction
Experimental Techniques
– X-ray Crystallography
– NMR
Limitations of Current Experimental Techniques
– Protein DataBank (PDB) -> 24000 protein structures
– SwissProt -> 100,000 proteins
– Non-Redundant (NR) -> 1,000,000 proteins
Importance of Structure Prediction
– Fill gap between known sequences and structures
– Protein Engg. to alter function of a protein
– Rational Drug Design
Protein Structures
Techniques of Structure Prediction
Computer simulation based on energy calculation
– Based on physico-chemical principles
– Thermodynamic equilibrium with a minimum free energy
– Global minimum free energy of protein surface
Knowledge Based approaches
– Homology Based Approach
– Threading Protein Sequence
– Hierarchical Methods
Energy Minimization Techniques
Energy minimization based methods, in their pure form, make no a priori assumptions and attempt to locate the global minimum.
Static Minimization Methods
– Classical many potential-potential can be constructed
– Assume that the atoms in the protein are static
– Problems (large number of variables & minima, and validity of potentials)
Dynamical Minimization Methods
– Motions of atoms also considered
– Monte Carlo simulation (stochastic in nature, time is not considered)
– Molecular Dynamics (time, quantum mechanical, classical equ.)
Limitations
– Large number of degrees of freedom; CPU power not adequate
– Interaction potential is not good enough to model
Knowledge Based Approaches
Homology Modelling
– Needs homologues of known protein structure
– Backbone modelling
– Side chain modelling
– Fails in absence of homology
Threading Based Methods
– New way of fold recognition
– Sequence is fitted into known structures
– Motif recognition
– Loop & side chain modelling
– Fails in absence of known example
Hierarchical Methods
Intermediate structures are predicted, instead of predicting the tertiary structure of a protein directly from its amino acid sequence
Prediction of backbone structure
– Secondary structure (helix, sheet, coil)
– Beta Turn Prediction
– Super-secondary structure
Tertiary structure prediction
Limitation
– Accuracy is only 75-80 %
– Only three-state prediction
[Figure: cDNA microarray workflow. cDNA clones (probes): PCR product amplification, purification, printing onto the microarray (0.1 nl/spot). mRNA (target): hybridise target to microarray. Excitation (laser 1, laser 2), emission, scanning, overlay images and normalise, analysis.]
Major Applications
Identification of differentially expressed genes in diseased tissues (in presence of drug)
Classification of differentially expressed genes, or clustering/grouping of genes having similar behaviour in different conditions
Use the expression profile of a known disease to diagnose and classify unknown genes
Terms/Jargons
Stanford/cDNA chip
– one slide/experiment
– one spot
– 1 gene => one spot or few spots (replica)
– control: control spots
– control: two fluorescent dyes (Cy3/Cy5)
Affymetrix/oligo chip
– one chip/experiment
– one probe/feature/cell
– 1 gene => many probes (20~25 mers)
– control: match and mismatch cells
Images: examples
[Figure: Cy3 image, Cy5 image and pseudo-colour overlay]
Spot colour    Signal strength        Gene expression
yellow         Control = perturbed    unchanged
red            Control < perturbed    induced
green          Control > perturbed    repressed
Processing of images
Addressing or gridding
– Assigning coordinates to each of the spots
Segmentation
– Classification of pixels either as foreground or as background
Intensity determination for each spot
– Foreground fluorescence intensity pairs (R, G)
– Background intensities
– Quality measures
Management of Microarray Data
Magnitude of Data
– Experiments
   50 000 genes in human
   320 cell types
   2000 compounds
   3 time points
   2 concentrations
   2 replicates
– Data Volume
   4*10^11 data-points
   10^15 bytes = 1 petabyte of data
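The data-point figure is simply the product of the experimental factors listed on the slide, which a short calculation confirms. The bytes-per-data-point value is not stated in the slides; the last line only derives what the quoted petabyte total would imply.

```python
# Experimental factors as listed on the slide.
genes = 50_000
cell_types = 320
compounds = 2_000
time_points = 3
concentrations = 2
replicates = 2

data_points = (
    genes * cell_types * compounds * time_points * concentrations * replicates
)
print(f"{data_points:.2e} data points")  # about 4*10^11, as on the slide

# Derived, not from the slide: storage per data point implied by 1 petabyte.
implied_bytes_per_point = 10**15 / data_points
print(f"{implied_bytes_per_point:.0f} bytes per data point implied")
```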
Management of Microarray Data
Major Issues
Large volume of microarray data in last few years
– Storage and efficient access
– Comparison and integration of data
Problem of data access and exchange
– Data scattered around Internet
– Supplementary material of publications
– Difficult for user to access relevant data
Problems with existing databases
– Diverse purpose
– Developed for specific purpose
Management of Microarray Data
Specific Database
– Platform (e.g. Stanford MA Database; SMD)
– Organism (Yeast MA global viewer)
– Project (Life cycle database of Drosophila)
Problem with Supplement and MA databases
– Lack of direct access
– Quality not checked
– No standard format
– Incomplete data
Pre-processed cDNA Gene Expression Data
On p genes for n slides: p is O(10,000), n is O(10-100), but growing.
Rows are genes, columns are slides.

Gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...

Each entry, e.g. the gene expression level of gene 5 in slide 4 (1.09), = Log2(Red intensity / Green intensity)
These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.
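As a small worked example of the log-ratio and its colour convention, the sketch below converts a pair of channel intensities into Log2(Red/Green) and maps the sign onto the red/yellow/green scale. The intensity values and function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def log_ratio(red, green):
    """Expression level as log2(red intensity / green intensity)."""
    return math.log2(red / green)


def expression_colour(value):
    """Conventional display colour for a log ratio: red > 0, yellow = 0, green < 0."""
    if value > 0:
        return "red"
    if value < 0:
        return "green"
    return "yellow"


# Hypothetical intensities: a two-fold increase, no change, a two-fold decrease.
for red, green in [(2000.0, 1000.0), (1000.0, 1000.0), (500.0, 1000.0)]:
    m = log_ratio(red, green)
    print(red, green, m, expression_colour(m))
```

The log scale makes induction and repression symmetric: a two-fold increase and a two-fold decrease give +1 and -1 rather than 2 and 0.5.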
Analysis of Microarray Data
Analysis of images
Preprocessing of gene expression data
Normalization of data
– Subtraction of Background Noise
– Global/local Normalization
– House keeping genes (or same gene)
– Expression in ratio (test/references) in log
Differential Gene expression
– Repeats and calculate significance (t-test)
– Significance of fold change using statistical methods
Clustering
– Supervised/Unsupervised (Hierarchical, K-means, SOM)
Prediction or Supervised Machine Learning (SVM)
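The "repeats and t-test" step can be illustrated with a one-sample t statistic on replicate log ratios for a single gene, testing whether the mean log ratio differs from 0 (no change). This is one common choice and an assumption here; the slides do not say which form of t-test was meant, and the replicate values are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(log_ratios):
    """t statistic for H0: mean log ratio = 0, from replicate measurements."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    return mean(log_ratios) / (stdev(log_ratios) / math.sqrt(n))


# Hypothetical replicate log2 ratios for one gene.
replicates = [1.0, 2.0, 3.0]
print(round(one_sample_t(replicates), 3))
```

The statistic would then be compared against a t distribution with n - 1 degrees of freedom; with thousands of genes tested at once, some correction for multiple testing is usually needed as well.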
Normalization Techniques
Global normalization
– Divide channel value by means
Control spots
– Common spots in both channels
– House keeping genes
– Ratio of intensity of same gene in two channels is used for correction
Iterative linear regression
Parametric nonlinear normalization
– log(CY3/CY5) vs log(CY5)
– Fitted log ratio – observed log ratio
General Non Linear Normalization
– LOESS
– Curve between log(R/G) vs log(sqrt(R.G))
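Two of the quantities above are easy to show concretely: global normalization by channel means, and the pair of values (log(R/G), log(sqrt(R.G))) on which a LOESS curve would be fitted. The sketch assumes base-2 logarithms and uses hypothetical function names; the LOESS fit itself is not implemented here.

```python
import math
from statistics import mean


def global_normalize(red, green):
    """Divide each channel by its own mean, then return per-spot log2(R/G)."""
    r_mean, g_mean = mean(red), mean(green)
    return [math.log2((r / r_mean) / (g / g_mean)) for r, g in zip(red, green)]


def ma_values(red, green):
    """Per-spot (M, A): M = log2(R/G), A = log2(sqrt(R*G))."""
    return [
        (math.log2(r / g), 0.5 * math.log2(r * g))
        for r, g in zip(red, green)
    ]


# Hypothetical slide where the red channel is uniformly twice as bright:
# after global normalization every log ratio is close to 0.
red = [200.0, 400.0, 800.0]
green = [100.0, 200.0, 400.0]
print([round(m, 6) for m in global_normalize(red, green)])
print(ma_values([8.0], [2.0]))
```

Global normalization removes a constant dye bias; intensity-dependent methods such as LOESS instead subtract a fitted curve of M against A from each spot's M value.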
Classification
Task: assign objects to classes (groups) on the basis of measurements made on the objects
Unsupervised: classes unknown, want to discover them from the data (cluster analysis)
Supervised: classes are predefined, want to use a (training or learning) set of labeled objects to form a classifier for classification of future observations
Issues in Clustering
Pre-processing (Image analysis and Normalization)
Which genes (variables) are used
Which samples are used
Which distance measure is used
Which algorithm is applied
How to decide the number of clusters K
Unsupervised Learning
Hierarchical clustering: merging two branches at a time until all variables (genes) are in one tree. [It does not answer the question of “how many gene clusters are there?”]
K-means clustering: assuming there are K clusters. [What if this assumption is incorrect?]
Model-based clustering: the number of clusters is determined dynamically. [Could be one of the most promising methods]
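To make the K-means assumption concrete, here is a minimal sketch in which K is fixed by the number of starting centroids supplied. The data points and starting centroids are hypothetical expression profiles; squared Euclidean distance is an assumption, since the slides leave the distance measure open.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on equal-length tuples; K is fixed by len(centroids)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [
                sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids
            ]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else cen
            for cl, cen in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical two-slide expression profiles for four genes.
profiles = [(1.0, 1.0), (1.2, 0.8), (-1.0, -1.0), (-0.8, -1.2)]
centres, groups = kmeans(profiles, [(1.0, 1.0), (-1.0, -1.0)])
print(centres)
print(groups)
```

Running the same data with a different number of starting centroids always returns that many groups, which is exactly the concern raised in the bracketed remark about K.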
Supervised Analysis
Fisher’s linear discriminant analysis
Quadratic discriminant analysis
Logistic regression (a linear discriminant analysis)
Neural networks
Support vector machine
Traditional Proteomics
1D gel electrophoresis (SDS-PAGE)
2D gel electrophoresis
Protein Chips
– Chips coated with proteins/Antibodies
– Large scale version of ELISA
Mass Spectrometry
– MALDI: Mass fingerprinting
– Electrospray and tandem mass spectrometry
   Sequencing of Peptides (N->C)
   Matching in Genome/Proteome Databases
Overview of 2D Gel
SDS-PAGE + Isoelectric focusing (IEF)
– Gene Expression Studies
– Medical Applications
– Sample Experiments
Capturing and Analyzing Data
– Image Acquisition
– Image Sizing & Orientation
– Spot Identification
– Matching and Analysis
Comparison/Matching of Gel Images
Compare 2 gel images
– Set x and y axes
– Overlap matching spots
– Compare intensity of spots
Scan against database
– Compare query gel with all gels
– Calculate similarity score
– Sort based on score
Differential Proteomics: Fingerprints of Disease
Normal Cells
Phenotypic Changes
Disease Cells
• Differential protein expression
• Protein nitration patterns
• Altered phosphorylation
• Altered glycosylation profiles
Utility
• Target discovery
• Disease pathways
• Disease biomarkers
Fingerprinting Technique
What is fingerprinting
– It is a technique to create a specific pattern for a given organism/person
– To compare the patterns of query and target objects
– To create a phylogenetic tree/classification based on pattern
Types of Fingerprinting
– DNA Fingerprinting
– Mass/peptide fingerprinting
– Properties based (Toxicity, classification)
– Domain/conserved pattern fingerprinting
Common Applications
– Paternity and Maternity
– Criminal Identification and Forensics
– Personal Identification
– Classification/Identification of organisms
– Classification of cells
Fingerprinting Techniques: Principles & Applications
What is fingerprinting
Type of Fingerprinting
Common Applications
Role of Computer in DNA Fingerprinting
– Searching Restriction Enzymes
– Searching VNTRs
– Computation of size of DNA fragments
– Optimization of gels
– Comparison of patterns
– Creation of Phylogenetic tree
Drug Design
History of Drug/Vaccine development
– Plants or Natural Products
   Plants and natural products were the source of medicinal substances
   Example: foxglove used to treat congestive heart failure
   Foxglove contains digitalis and cardiotonic glycosides
   Identification of active component
– Accidental Observations
   Penicillin is one good example
   Alexander Fleming observed the effect of mold
   Mold (Penicillium) produces the substance penicillin
   Discovery of penicillin led to large scale screening
   Soil microorganisms were grown and tested
   Streptomycin, neomycin, gentamicin, tetracyclines etc.
Drug Design
Chemical Modification of Known Drugs
– Drug improvement by chemical modification
– Penicillin G -> Methicillin; morphine -> nalorphine
Receptor Based drug design
– Receptor is the target (usually a protein)
– Drug molecule binds to cause biological effects
– It is also called lock and key system
– Structure determination of receptor is important
Ligand-based drug design
– Search for a lead compound or active ligand
– Structure of ligand guides the drug design process
Drug Design based on Bioinformatics Tools
Detect the Molecular Bases for Disease
– Detection of drug binding site
– Tailor drug to bind at that site
– Protein modeling techniques
– Traditional Method (brute force testing)
Rational drug design techniques
– Screen likely compounds built
– Modeling large number of compounds (automated)
– Application of Artificial intelligence
– Limitation of known structures
Important Points in Drug Design based on Bioinformatics Tools
Application of Genome
– 3 billion base pairs
– 30,000 unique genes
– Any gene may be a potential drug target
– ~500 unique targets
– There may be 10 to 100 variants at each target gene
– 1.4 million SNPs
– 10^200 potential small molecules
Concept of Drug and Vaccine
Concept of Drug
– Kill invading foreign pathogens
– Inhibit the growth of pathogens
Concept of Vaccine
– Generate memory cells
– Train the immune system to face various existing disease agents
VACCINES
A. SUCCESS STORY:
• COMPLETE ERADICATION OF SMALLPOX
• WHO PREDICTION: ERADICATION OF PARALYTIC POLIO THROUGHOUT THE WORLD BY YEAR 2003
• SIGNIFICANT REDUCTION OF INCIDENCE OF DISEASES: DIPHTHERIA, MEASLES, MUMPS, PERTUSSIS, RUBELLA, POLIOMYELITIS, TETANUS
B. NEED OF THE HOUR
1) SEARCH FOR EFFECTIVE VACCINES, NOT YET AVAILABLE, FOR DISEASES LIKE: MALARIA, TUBERCULOSIS AND AIDS
2) IMPROVEMENT IN SAFETY AND EFFICACY OF PRESENT VACCINES
3) LOW COST
4) EFFICIENT DELIVERY TO NEEDY
5) REDUCTION OF ADVERSE SIDE EFFECTS
Computer Aided Vaccine Design
Whole Organism of Pathogen
– Consists of more than 4000 genes and proteins
– Genomes have millions of base pairs
Target antigen to recognise pathogen
– Search vaccine target (essential and non-self)
– Consists of amino acid sequence (e.g. A-V-L-G-Y-R-G-C-T ……)
Search antigenic region (peptide of length 9 amino acids)
Major steps of endogenous antigen processing
Computer Aided Vaccine Design
Problem of Pattern Recognition
– ATGGTRDAR   Epitope
– LMRGTCAAY   Non-epitope
– RTTGTRAWR   Epitope
– EMGGTCAAY   Non-epitope
– ATGGTRKAR   Epitope
– GTCVGYATT   Epitope
Commonly used techniques
– Statistical (Motif and Matrix)
– AI Techniques
Why computational tools are required for prediction.
200 aa protein
Chopped into overlapping peptides of 9 amino acids
192 peptides
In vitro or in vivo experiments to detect which snippets of the protein will spark an immune response
10-20 predicted peptides
Bioinformatics Tools
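The count of 192 peptides follows from sliding a 9-residue window one position at a time along a 200-residue protein: 200 - 9 + 1 = 192. A minimal sketch, using a hypothetical function name and a placeholder sequence:

```python
def overlapping_peptides(protein, length=9):
    """All overlapping peptides of the given length, stepping one residue at a time."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# A placeholder 200-residue sequence gives 200 - 9 + 1 = 192 peptides.
placeholder_protein = "A" * 200
print(len(overlapping_peptides(placeholder_protein)))

# A 10-residue example yields two overlapping 9-mers.
print(overlapping_peptides("AVLGYRGCTK"))
```

Each of these peptides is a candidate that would otherwise need an experiment; a prediction tool scores them all and passes only the top few on for testing.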
Thanks