Tools and Softwares Group 13 Shashi Ranjan Murali Sivaramakrishnan Ognjen Perisic.

Click here to load reader

  • date post

  • Category


  • view

  • download


Embed Size (px)

Transcript of Tools and Softwares Group 13 Shashi Ranjan Murali Sivaramakrishnan Ognjen Perisic.

  • Slide 1
  • Tools and Softwares Group 13 Shashi Ranjan Murali Sivaramakrishnan Ognjen Perisic
  • Slide 2
  • Introduction Importance of aligning two sequences Identify the structure and function Evolutionary similarities Why use computers to do this alignment? Sequences are very, very, very long Need to evaluate many alignments
  • Slide 3
  • Tools and Softwares BLAST PIP MAKER FASTA Motif Explorer MACAW Five Methods for finding Conserved Segments.
  • Slide 4
  • BLAST Overview Basic Local Alignment Search Tool Dynamic programming O(mn) ! BLAST heuristically finds high scoring segment pairs (HSP) Identical length segments from 2 sequences with high scores I.e. ungapped local alignments
  • Slide 5
  • BLAST Nuts & Bolts Given: query sequence q, word length w, word score threshold T, segment score threshold S: 1.Generate a word set (neighborhood words) that score at least T when compared to words form q 2.Search the dB for matches to words in list, a match indicates a possible alignment 3.Extend all matches to seek high-scoring segment pairs and return those pairs
  • Slide 6
  • Slide 7
  • BLAST Extensions Gapped BLAST 3 Changes to the original BLAST 1.Word extension criteria modified 2.Allow gapped alignments, need to find just one ungapped alignment 3.Use dynamic programming of Smith- Waterman to produce final alignment
  • Slide 8
  • BLAST Extensions Position Specific Iterated BLAST Basic Idea 1.Use results from BLAST query to build a profile 2.Search dB with profile instead of initial query 3.Iterate
  • Slide 9
  • BLAST Programs NCBI BLAST ----
  • Slide 10
  • PipMaker WWW site for comparing DNA sequences Identifies conserved sequences Provides graphical output in PDF or PS format Uses a variant of Gapped BLAST - BLASTz
  • Slide 11
  • Tools And Software FASTA Motif Explorer Accessibility
  • Slide 12
  • FASTA Overview 1.Given: two sequences x,y and k=size of exactly matching strings. 1.Tabulate the offset between x,y for all matching strings of size k (k- tuples) 2.For offset with high frequency, try to combine their k-tuples into regions 2.Extend the best regions without gaps as long as score improves. (using the substitution matrix such as PAM250) (init1) 3.Check if un-gapped regions can be combined with a small gapped region. (initn used to rank the library sequences.) 4.Construct an optimal alignment of the highest ranking seq. (opt)
  • Slide 13
  • FASTA- offset tabulation X = HARFYAAQIVL Y = VDMAAQIA K-tuple offsets of x (k=1) A=(2,6,7), I=(9), , Q=(8), V=(10), Offset difference for each k-tuple of Y: VDMAAQIA 9-2-322-6 21-2 32 Offsets-6-5-4-3-20123489 Frequency11211411
  • Slide 14
  • Slide 15
  • FASTA Package FASTAscan a protein or DNA sequence library for similar sequences. TFASTAcompare a protein sequence to a DNA sequence library. FASTX, FASTYcompare a DNA sequence to a protein sequence database. FASTF, TFASTFcompare an ordered peptide mixture against a protein or translated DNA database. FASTS, TFASTS compare a set of short peptide fragments against a protein or translated DNA database. LFASTA, PLFASTA concentrates more on the local regions and reports more than on one sequence. RDF2for evaluating the statistical significance of a similarity score.
  • Slide 16
  • Motif Explorer
  • Slide 17
  • Slide 18
  • Motif Explorer - Architecture
  • Slide 19
  • Accessibility FASTA: Motif Explorer
  • Slide 20
  • MACAW A Workbench for Multiple Alignment Construction and Analysis Allows user to construct multiple protein alignments by locating, analyzing, editing and combining blocks of aligned sequence segments. MACAW incorporate several features: 1.Regions of local similarity are located by a search algorithm that avoids many of the limitations of individual algorithms; 2.The statistical significance of blocks of similarity is evaluated using a mathematical theory developed during paste 15 years; 3.User can edit each block by moving its boundaries or by eliminating particular segments, and blocks may be linked to form a composite multiple alignment.
  • Slide 21
  • Theoretical boundaries Problem: For a set of n protein sequences each of length l, a region of similarity common to all may begin anywhere in each sequence and it should be compared to all other sequences, that is l n alignments should be checked. This search space is too large. MACAW imposes a single condition on the alignment it seeks: that all segments show a minimal amount of mutual similarity. This can examine all of the search space in O(n 2 l 2 ) time.
  • Slide 22
  • Terminology For a set of n sequences, one subset of segments of some specific length from each of m (m