Basic Overview of Bioinformatics Tools and Biocomputing Applications II Dr Tan Tin Wee Director...
Basic Overview of Bioinformatics Tools and
Biocomputing Applications II
Dr Tan Tin Wee
Director
Bioinformatics Centre
Common Computational Analyses
• Sequence Assembly
• Simple sequence analysis
  – Translation and reverse complement, ORF
  – Composition statistics (protein & DNA)
  – Molecular mass
  – Total charge and pI; local hydropathy
  – Simple determination of secondary structures
  – Restriction site analysis
  – Internal repeat analysis
• Detection of active sites, functional residues, characteristic structures, substrates, and processing signals
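A few of the simple sequence analyses listed above can be sketched in a handful of lines. This is a minimal illustration, not the code of any particular package: the function names are hypothetical, input is assumed to be upper-case DNA, and the ORF scan only looks at the three forward frames.

```python
from collections import Counter

# Hypothetical helper names; input is assumed to be upper-case DNA (A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def composition(seq):
    """Count how often each residue occurs (works for DNA or protein)."""
    return dict(Counter(seq))


def gc_content(dna):
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


def find_orfs(dna, min_codons=2):
    """Yield (start, end) for ATG...stop ORFs in the three forward frames.

    min_codons counts the stop codon as well; end is exclusive.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield (start, i + 3)
                start = None
```

To cover the reverse strand as well, the same scan could be run on `reverse_complement(dna)`.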
Common Computational Analyses
• Database sequence search
• Multiple alignment
• Secondary (2°) and tertiary (3°) structure prediction; transmembrane helix detection
• Structure modeling
• Docking prediction and design
• Hidden Markov model searches
Database Searching
• Text-based Database Searching - using a text string to match an annotation in a sequence database record, i.e. keyword search
• Sequence-based Database Searching - using a biological sequence as the query and matching all or part of it against the sequence of every record in the database
Text-Based Database Searching
• Examples: Entrez, SRS, DBGET, AceDB - common integrated database systems
• Search Concepts
  – Boolean Search - AND, OR, NOT
  – Broadening Search
  – Narrowing the Search
  – Proximity searching, soundex
  – Wild Card, Stemming, e.g. Thala* for thalasemia, thalassemia, thalassemic
• Use standard string search algorithms and Boolean operations, vocabulary matches
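The Boolean and wildcard concepts above can be illustrated with a toy keyword search. This is only a sketch: the records, their IDs and the `search` function are made up for illustration and do not reflect how Entrez or SRS are implemented. Wildcards are handled with the standard-library `fnmatch` module.

```python
from fnmatch import fnmatchcase

# Hypothetical toy records standing in for database annotations.
RECORDS = {
    "rec1": "beta thalassemia mutation in human HBB gene",
    "rec2": "thalassemic mouse model",
    "rec3": "human period homolog",
}


def matches(term, text):
    """True if any word of text matches term; '*' acts as a wildcard."""
    pattern = term.lower()
    return any(fnmatchcase(word, pattern) for word in text.lower().split())


def search(all_of=(), any_of=(), none_of=()):
    """Boolean search: all_of is AND, any_of is OR, none_of is NOT."""
    hits = []
    for rec_id, text in RECORDS.items():
        if not all(matches(t, text) for t in all_of):
            continue
        if any_of and not any(matches(t, text) for t in any_of):
            continue
        if any(matches(t, text) for t in none_of):
            continue
        hits.append(rec_id)
    return hits
```

Adding terms to `all_of` or `none_of` narrows the search; adding terms to `any_of` or using a wildcard broadens it.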
Text-based Database Searching
• Example: To find the human homolog of the Drosophila per gene
• Procedure
  – Web to Entrez
  – All Fields: enter "human" "per"
  – Hits returned, irrelevant - broaden search
  – "human" "period" - more hits
  – Check every one, find the human RIGUI gene
• Hit and miss, clever guesswork. Free form or controlled vocabulary (MeSH terms)? Use Boolean searches?
Sequence-based Database Searching
• Homology Search
• Global or Local Sequence Alignment
• Needleman-Wunsch Algorithm
• Smith-Waterman Algorithm
• Lipman - Pearson FASTA
• Altschul's BLAST
• Take a query sequence and compare it pairwise with each sequence in the database
Sequence-based Database Searching
• Basic Assumptions:
• Sequences of homologous genes/proteins diverge over time even though structure and/or function change little
• Significant sequence similarity is inferred as potential structural/functional similarity or common evolutionary origin
• Based on a well-characterised protein, infer the function of an unknown sequence at gene or protein sequence level.
Sequence-based Database Searching
• Global Alignment - forces complete alignment of the two input sequences in the pairwise comparison
• Local Alignment - looks for local stretches of similarity and tries to align the most similar segments
• The algorithms used may be similar, but the outputs differ; statistics are needed to assess the results
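The difference between the two modes shows up clearly when both are computed with the same dynamic-programming recurrence. The sketch below returns only the best score, not the alignment itself, and assumes a simple match/mismatch score with a linear gap penalty; the parameter values are arbitrary choices for illustration, not those of any real program.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score by dynamic programming.

    local=False: global (Needleman-Wunsch style), both sequences aligned end to end.
    local=True:  local (Smith-Waterman style), scores are floored at zero and
                 the best cell anywhere in the matrix is returned.
    """
    # First row: aligning a prefix of b against nothing costs gaps (global only).
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]
```

For two sequences that share only a short common segment, such as `"GGGACGTCCC"` and `"TTACGTAA"`, the local score rewards the shared `ACGT` stretch, while the global score is dragged down by the forced alignment of the flanking residues.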
Sequence-based Database Searching
• Alignment Scoring
• Substitution score and substitution matrix - PAM, BLOSUM
• Affine gap costs/gap penalty and gap scores
• Optimal alignments, dynamic programming - Needleman-Wunsch algorithm, Smith-Waterman algorithm (SSEARCH)
• Additional heuristics to speed up the search - FASTA, BLAST
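One heuristic shared in spirit by FASTA and BLAST is to look first for short exact word matches and only then spend effort on alignment around them. The sketch below shows just that seeding step, heavily simplified; the function names are hypothetical and the real programs add scoring thresholds, extension and statistics on top.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map every word of length k in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where an exact k-mer is shared."""
    index = kmer_index(subject, k)
    return [
        (i, j)
        for i in range(len(query) - k + 1)
        for j in index.get(query[i:i + k], [])
    ]
```

Hits that lie on the same diagonal (the same value of `subject_pos - query_pos`) are candidates for extension into a longer alignment; database sequences with no hits can be skipped without running the full dynamic programming.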
Some definitions
• Affine gap costs - scoring system for gaps within alignments which charges a penalty for gap formation and an additional per-residue penalty proportional to the size of the gap
• Alignment score - numerical value indicating the overall quality of an alignment; the higher the score, the better the alignment.
• Algorithm - fixed procedure embodied in a computer program
• Heuristics - a computer science term for shortcuts a program takes to approximate results quickly, usually based on arbitrary or predefined rules; the result is not guaranteed to be optimal.
• Gapped Alignment - alignment of sequences where gaps are permitted
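The affine gap cost defined above can be written as a one-line formula. The sketch below assumes the convention "opening penalty plus per-residue penalty times gap length" and uses arbitrary illustrative values; some programs count the first residue as part of the opening penalty instead, so the exact numbers depend on the tool.

```python
def affine_gap_cost(length, gap_open=10, gap_extend=1):
    """Penalty for one gap of the given length.

    Assumed convention: gap_open is charged once for creating the gap and
    gap_extend is charged for every gapped residue. Values are illustrative.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length
```

With these values one gap of four residues costs 14, whereas two separate gaps of two residues cost 24, so the scheme favours a few long gaps over many short ones.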
Computational Genefinding
• Major challenge in genome projects
• Given a DNA sequence, where does a gene begin and stop? - ORF
• Where are the exons and introns?
• Where are the transcription elements?
• Gene structure and other regulatory elements?
Genomic Elements
• Intron-exon splice sites
• Start-Stop codons
• Branch Points
• Promoters and terminators of transcription
• Polyadenylation sites
• Ribosomal binding sites
• Topoisomerase II binding sites
• Topoisomerase I cleavage sites
• Transcription factor binding sites
Detecting Genomic Elements
• Local sites and motifs/patterns for such elements - signals and signal sensors
• Extended variable-length regions, e.g. exons and introns - contents and content sensors
• Linguistic technique - gene structure described in formal grammar - GeneLang genefinding program
Signal sensors
• Simple consensus sequence - use of pattern matching algorithms
• Weight matrices - assign a weighted score to each position of the site; the position scores are summed to give the sensor score
• Use of Artificial Neural Networks (ANN)
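A weight-matrix signal sensor can be sketched as follows. This is a minimal illustration under stated assumptions: the matrix is built as log-odds scores from a handful of aligned example sites with a pseudocount and a uniform background, and the function names and example sites are made up for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from equal-length aligned DNA sites.

    Returns one dict per position mapping each base to log2(freq / background).
    """
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_site(pwm, window):
    """Sum the per-position weights for one window of the same width."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Slide the matrix along seq and return (best_score, start_position)."""
    width = len(pwm)
    return max(
        (score_site(pwm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )
```

Unlike a plain consensus match, a window that deviates from the consensus at a weakly conserved position still receives a reasonably high score.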
Content Sensors
• Long ORF for bacteria
• Statistical models, e.g. Markov models - GeneMark: statistical models of nucleotide frequencies and dependencies in codon structure
• Neural nets, e.g. Grail: exon detection by neural network combined with signal sensors for exon-intron splice sites
Some Definitions
• Artificial Neural Nets - statistical pattern recognition method - a type of nonlinear regression
• Markov Models - statistical models for sequences in which the probability of each residue depends on the residues preceding it.
• Dynamic Programming - type of algorithm widely used for constructing sequence alignments and for evaluating all possible candidate gene structures
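The Markov model definition above can be made concrete with a first-order chain, in which each residue depends only on the one before it. This is a minimal sketch with hypothetical function names and a pseudocount of 1; it is not the model used by GeneMark, which works with higher orders and codon position.

```python
import math


def train_markov(seqs, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next residue | previous residue) from training sequences."""
    counts = {a: {b: pseudocount for b in alphabet} for a in alphabet}
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in alphabet}
        for a in alphabet
    }


def log_prob(model, seq):
    """Log-probability of the transitions in seq (the first residue is not scored)."""
    return sum(math.log(model[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))
```

A content sensor could train one such model on known coding sequence and another on non-coding sequence, then compare the two log-probabilities for a candidate region.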
Other Genefinding methods
• Use of dynamic programming
• Linguistic rules for functional features
• Parameters of a Markov process on hidden variables - hidden Markov models (HMM)
• HMM genefinders - EcoParse, Xpound, GeneMark HMM, Veil, HMMgene, GenScan
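How a hidden Markov model labels a sequence by dynamic programming can be sketched with a two-state toy model and the Viterbi algorithm. All states and probabilities below are invented for illustration (a GC-rich "coding" state and an AT-rich "noncoding" state) and bear no relation to the parameters of the genefinders listed above; the sequence is assumed to be non-empty.

```python
import math

# Toy parameters, made up for illustration only.
STATES = ("coding", "noncoding")
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Most probable hidden-state path for a non-empty seq, in log space."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one back-pointer table per position after the first
    for sym in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][sym])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Real genefinders use many more states (exons in each frame, introns, splice sites, intergenic regions), but the decoding step follows the same pattern.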