8/3/2019 Basic Local Alignment and Search Tool
1/33
Basic Local Alignment and Search Tool(BLAST)
Database searching
10-02-2009
Barbera van Schaik
8/3/2019 Basic Local Alignment and Search Tool
2/33
2
Why use BLAST?
Dynamic Programming is not suitable for comparing a query
sequence against a database Takes too much time!
BLAST is a heuristic method to find the highest locally optimal
alignments
BLAST improved overall speed of searches
BLAST maintains good sensitivity
8/3/2019 Basic Local Alignment and Search Tool
3/33
3
BLAST terminology
query sequence
blast
targetdatabase(GenBank/
SwissProt)
output
sequence list:Hits/subject
information aboutinput query sequence, e.g.,function
The aim of a database (blast) search is to discover sequence homologyon basis of sequence similarity
BLAST returns similar sequences, not necessarily biological similarsequences
8/3/2019 Basic Local Alignment and Search Tool
4/33
4
BLAST variants
blastptblastnamino acid query
blastxblastn/tblastxnucleotide query
protein databasenucleotidedatabase
Sequence type
blastn: finds NT sequences similar to your NT sequenceblastp: finds AA sequences similar to your AA sequenceblastx: finds AA sequences similar to translation of your NT
sequence (if you cannot recognize an ORF)tblastn: translate AA sequence and searches against NTdatabase (for finding pseudogenes)
tblastx: keep computers busy
8/3/2019 Basic Local Alignment and Search Tool
5/33
5
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Webinterf
acechange
snowand
then
8/3/2019 Basic Local Alignment and Search Tool
6/33
6
http://blast.ncbi.nlm.nih.gov/Blast.cgi
8/3/2019 Basic Local Alignment and Search Tool
7/33
7
http://blast.ncbi.nlm.nih.gov/Blast.cgi
8/3/2019 Basic Local Alignment and Search Tool
8/33
8
BLASTing a sequence at NCBI programs
8/3/2019 Basic Local Alignment and Search Tool
9/33
9
BLASTing a sequence at NCBI enter accession
8/3/2019 Basic Local Alignment and Search Tool
10/33
10
BLASTing a sequence at NCBI enter sequence
8/3/2019 Basic Local Alignment and Search Tool
11/33
11
Database choice
Protein databases
Also good for protein coding nucleotide queries
Choose a non-redundant database
Nucleotide databases
Non-redundant database
Filter on organism / other Entrez query
8/3/2019 Basic Local Alignment and Search Tool
12/33
12
BLASTing a sequence at NCBI parameters
8/3/2019 Basic Local Alignment and Search Tool
13/33
Blast algorithm: step 1
protein query sequence
protein database
compile list of words
of length W
13
8/3/2019 Basic Local Alignment and Search Tool
14/33
Blast algorithm: step 2Initial search
Use PAM/BLOSUM matrix
Find word of length W thatscores at least T (T=11)
Exact matches only
The parameter T dictates the
speed and sensitivity-increasing T increases speed,decreases sensitivity
14
8/3/2019 Basic Local Alignment and Search Tool
15/33
Join words on same diagonal(ungapped)
database sequence
word hit
High-scoring segment pair (score=S')
distance AScore>T Score>T
15
8/3/2019 Basic Local Alignment and Search Tool
16/33
Join words on same diagnonal
database sequence
High-scoring segment pair (score=S')
Extend HSP until score drops small amount below highest scoreof shorter alignment
score score HSP
extend to left (similar to right)
stop extensiondrops below threshold
score S for this alignment
If S>threshold (based on random sequences) then keep HSP16
8/3/2019 Basic Local Alignment and Search Tool
17/33
Finding HSPs
T>11
T>13
HSP
(Join and extend)
17
8/3/2019 Basic Local Alignment and Search Tool
18/33
Trigger gapped extension
HSP
Gapped extension
18
8/3/2019 Basic Local Alignment and Search Tool
19/33
19
BLASTing a sequence at NCBI parameters
8/3/2019 Basic Local Alignment and Search Tool
20/33
20
Masking of sequences low complexity
Low complexity repeats in genome
Many amino-acid stretches in proteins
BLAST recognizes these regions as similar
but, they are NOT biologically related
8/3/2019 Basic Local Alignment and Search Tool
21/33
21
Masking of sequences highly abundant sequences
First query sequence against database that contains domainsrepresentative of large sequence families
Alu repeatsProtein kinase catalytic domainsVector sequences
Then mask these domains in the query sequence and continuesearch
Masking option replaces these regions with XXXXXXX
8/3/2019 Basic Local Alignment and Search Tool
22/33
22
When do you change the parameters?
The database youre searching OR
filter the reported entries by keywordOR increase the nr of reportedmatches OR increase Expect (theevalue threshold) OR rejectsequences too similar to the query
(those with very low e-values)
BLAST reports too many matches
The substitution matrix or the gappenalties to check the matchsrobustness
Your match has a borderline E-value
The substitution matrix or the gappenalties
BLAST doesnt report any results
Sequence filter (automatic masking)The sequence youre interested in
contains many identical residues; ithas a biased composition
Parameters to changeReason
Parametersare
already
optimiz
ed
8/3/2019 Basic Local Alignment and Search Tool
23/33
23
BLASTing a sequence at NCBI parameters
8/3/2019 Basic Local Alignment and Search Tool
24/33
24
BLASTing a sequence at NCBI job status
8/3/2019 Basic Local Alignment and Search Tool
25/33
25
If it takes too long: try another BLAST server
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.expasy.ch/tools/blast/http://www.ch.embnet.org/software/
bBLAST.html
http://www.ebi.ac.uk/blast
http://blast.ddbj.nig.ac.jp/top-e.html
BLAST / PSI-BLAST
BLASTBLAST
BLAST
BLAST / PSI-BLAST
USA
EuropeEurope
Europe
Japan
URLProgramCountry /continent
Warning:d
ifferentdata
base(versions)!
8/3/2019 Basic Local Alignment and Search Tool
26/33
26
BLASTing a sequence at NCBI blast summary
8/3/2019 Basic Local Alignment and Search Tool
27/33
27
BLASTing a sequence at NCBI used parameters
8/3/2019 Basic Local Alignment and Search Tool
28/33
28
BLASTing a sequence at NCBI graphical display
8/3/2019 Basic Local Alignment and Search Tool
29/33
29
BLASTing a sequence at NCBI hit list
How often would thishit have occurred bychance?
Rule of thumb:E-value < 0.0001
8/3/2019 Basic Local Alignment and Search Tool
30/33
30
BLASTing a sequence at NCBI
8/3/2019 Basic Local Alignment and Search Tool
31/33
31
Alternatives for homology searches
http://genome.ucsc.edu/BLATUSA
http://www.ddbj.nig.ac.jp/search/
search-e.html
SSEARCH/ FASTA
Japan
http://www.ch.embnet.org/software/GMFDF_form.html
SSEARCHEurope
http://www.ebi.ac.uk/fasta33FASTAEurope
http://fasta.bioch.Virginia.edu/fastaFASTAUSA
AddressProgramCountry /continent
8/3/2019 Basic Local Alignment and Search Tool
32/33
32
8/3/2019 Basic Local Alignment and Search Tool
33/33
33
Top Related