EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database
![Page 1: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/1.jpg)
Access to sequences: GenBank – a place to start and then some more...
Links:
• EMBL nucleotide archive: http://www.ebi.ac.uk/ena/
• DNA Data Bank of Japan: http://www.ddbj.nig.ac.jp/
• GenBank: http://www.ncbi.nlm.nih.gov/
![Page 2: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/2.jpg)
![Page 3: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/3.jpg)
contains a wealth of many types of data
![Page 4: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/4.jpg)
…but the main part represents sequences (DNA, RNA, amino acid; short fragments, genomes…)
for an annotated sample GenBank sequence record, click here
there are lots of categories and information, but you can also view the sequence in a much more streamlined form (called FASTA format):
>gi|1293613|gb|U49845.1|SCU49845 Saccharomyces cerevisiae TCP1-beta gene, partial cds; and Axl2p (AXL2) and Rev7p (REV7) genes, complete cds
GATCCTCCATATACAACGGTATCTCCACCTCAGGTTTAGATCTCAACAACGGAACCATTGCCGACATGAGACAGTTAGGTATCGTCGAGAGTTACAAGCTAAAACGAGCAGTAGTCAGCTCTGCATCTGAAGCCGCTGAAGTTCTACTAAGGGTGGATAACATCATCCGTGCAAGACCAAGAACCGCCAATAGACAACATATGTAACATATTTAGGATATACCTCGAAAATAATAAACCGCCACACTGTCATTATTATAATTAGAAACAGAACGCAAAAATTATCCACTATATAATTCAAAGACGCGAAAAAAAAAGAACAACGCGTCATAGAACTTTTGGCAATTCGCGTCACAAATAAATTTTGGCAACTTATGTTTCCTCTTCGAGCAGTACTCGAGCCCTGTCTCAAGAATGTAATAATACCCATCGTAGGTATGGTTAAAGATAGCATCTCCACAACCTCAAAGCTCCTTGCCGAGAGTCGCCCTCCTTTGTCGAGTAATTTTCACTTTTCATATGAGAACTTATTTTCTTATTCTTTACTCTCACATCCTGTAGTGATTGACACTGCAACAGCCACCATCACTAGAAGAACAGAACAATTACTTAATAGAAAAATTATATCTTCCTCGAAACGATTTCCTGCTTCCAACATCTACGTATATCAAGAAGCATTCACTTACCATGACACAGCTTCAGATTTCATTATTGCTGACAGCTACTATATCACTACTCCATCTAGTAGTGGCCACGCCCTATGAGGCATATCCTATCGGAAAACAATACCCCCCAGTGGCAAGAGTCAATGAATCGTTTACATTTCAAATTTCCAATGATACCTATAAATCGTCTGTAGACAAGACAGCTCAAATAACATACAATTGCTTCGACTTACCGAGCTGGCTTTCGTTTGACTCTAGTTCTAGAACGTTCTCAGGTGAACCTTCTTCTGACTTACTATCTGATGCGAACACCACGTTGTATTTCAATGTAATACTCGAGGGTACGGACTCTGCCGACAGCACGTCTTTGAACAATACATACCAATTTGTTGTTACAAACCGTCCATCCATCTCGCTATCGTCAGATTTCAATCTATTGGCGTTGTTAAAAAACTATGGTTATACTAACGGCAAAAACGCTCTGAAACTAGATCCTAATGAAGTCTTCAACGTGACTTTTGACCGTTCAATGTTCACTAACGAAGAATCCATTGTGTCGTATTACGGACGTTCTCAGTTGTATAATGCGCCGTTACCCAATTGGCTGTTCTTCGATTCTGGCGAGTTGAAGTTTACTGGGACGGCACCGGTGATAAACTCGGCGATTGCTCCAGAAACAAGCTACAGTTTTGTCATCATCGCTACAGACATTGAAGGATTTTCTGCCGTTGAGGTAGAATTCGAATTAGTCATCGGGGCTCACCAGTTAACTACCTCTATTCAAAATAGTTTGATAATCAACGTTACTGACACAGGTAACGTTTCATATGACTTACCTCTAAACTATGTTTATCTCGATGACGATCCTATTTCTTCTGATAAATTGGGTTCTATAAACTTATTGGATGCTCCAGACTGGGTGGCATTAGATAATGCTACCATTTCCGGGTCTGTCCCAGATGAATTACTCGGTAAGAACTCCAATCCTGCCAATTTTTCTGTGTCCATTTATGATACTTATGGTGATGTGATTTATTTCAACTTCGAAGTTGTCTCCACAACGGATTTGTTTGCCATTAGTTCTCTTCCCAATATTAACGCTACAAGGGGTGAATGGTTCTCCTACTATTTTTTGCCTTCTCAGTTTACAGACTACGTGAATACAAACGTTTCATTAGAGTTTACTAATTCAAGCC
AAGACCATGACTGGGTGAAATTCCAATCATCTAATTTAACATTAGCTGGAGAAGTGCCCAAGAATTTCGACAAGCTTTCATTAGGTTTGAAAGCGAACCAAGGTTCACAATCTCAAGAGCTATATTTTAACATCATTGGCATGGATTCAAAGATAACTCACTCAAACCACAGTGCGAATGCAACGTCCACAAGAAGTTCTCACCACTCCACCTCAACAAGTTCTTACACATCTTCTACTTACACTGCAAAAATTTCTTCTACCTCCGCTGCTGCTACTTCTTCTGCTCCAGCAGCGCTGCCAGCAGCCAATAAAACTTCATCTCACAATAAAAAAGCAGTAGCAATTGCGTGCGGTGTTGCTATCCCATTAGGCGTTATCCTAGTAGCTCTCATTTGCTTCCTAATATTCTGGAGACGCAGAAGGGAAAATCCAGACGATGAAAACTTACCGCATGCTATTAGTGGACCTGATTTGAATAATCCTGCAAATAAACCAAATCAAGAAAACGCTACACCTTTGAACAACCCCTTTGATGATGATGCTTCCTCGTACGATGATACTTCAATAGCAAGAAGATTGGCTGCTTTGAACACTTTGAAATTGGATAACCACTCTGCCACTGAATCTGATATTTCCAGCGTGGATGAAAAGAGAGATTCTCTATCAGGTATGAATACATACAATGATCAGTTCCAATCCCAAAGTAAAGAAGAATTATTAGCAAAACCCCCAGTACAGCCTCCAGAGAGCCCGTTCTTTGACCCACAGAATAGGTCTTCTTCTGTGTATATGGATAGTGAACCAGCAGTAAATAAATCCTGGCGATATACTGGCAACCTGTCACCAGTCTCTGATATTGTCAGAGACAGTTACGGATCACAAAAAACTGTTGATACAGAAAAACTTTTCGATTTAGAAGCACCAGAGAAGGAAAAACGTACGTCAAGGGATGTCACTATGTCTTCACTGGACCCTTGGAACAGCAATATTAGCCCTTCTCCCGTAAGAAAATCAGTAACACCATCACCATATAACGTAACGAAGCATCGTAACCGCCACTTACAAAATATTCAAGACTCTCAAAGCGGTAAAAACGGAATCACTCCCACAACAATGTCAACTTCATCTTCTGACGATTTTGTTCCGGTTAAAGATGGTGAAAATTTTTGCTGGGTCCATAGCATGGAACCAGACAGAAGACCAAGTAAGAAAAGGTTAGTAGATTTTTCAAATAAGAGTAATGTCAATGTTGGTCAAGTTAAGGACATTCACGGACGCATCCCAGAAATGCTGTGATTATACGCAACGATATTTTGCTTAATTTTATTTTCCTGTTTTATTTTTTATTAGTGGTTTACAGATACCCTATATTTTATTTAGTTTTTATACTTAGAGACATTTAATTTTAATTCCATTCTTCAAATTTCATTTTTGCACTTAAAACAAAGATCCAAAAATGCTCTCGCCCTCTTCATATTGAGAATACACTCCATTCAAAATTTTGTCGTCACCGCTGATTAATTTTTCACTAAACTGATGAATAATCAAAGGCCCCACGTCAGAACCGACTAAAGAAGTGAGTTTTATTTTAGGAGGTTGAAAACCATTATTGTCTGGTAAATTTTCATCTTCTTGACATTTAACCCAGTTTGAATCCCTTTCAATTTCTGCTTTTTCCTCCAAACTATCGACCCTCCTGTTTCTGTCCAACTTATGTCCTAGTTCCAATTCGATCGCATTAATAACTGCTTCAAATGTTATTGTGTCATCGTTGACTTTAGGTAATTTCTCCAAATGCATAATCAAACTATTTAAGGAAGATCGGAATTCGTCGAACACTTCAGTTTCCGTAATGATCTGATCGTCTTTATCCACATGTTGTAATTCACTAAAATCTAAAACGTATTTTTCAATGCATAAATCGTTCTTTTTATTAATAATGCAGATGGAAAATCTGTAAACGTGCGTTAATTTAGAAAGAACATCCAGTATAAGTTC
TTCTATATAGTCAATTAAAGCAGGATGCCTATTAATGGGAACGAACTGCGGCAAGTTGAATGACTGGTAAGTAGTGTAGTCGAATGACTGAGGTGGGTATACATTTCTATAAAATAAAATCAAATTAATGTAGCATTTTAAGTATACCCTCAGCCACTTCTCTACCCATCTATTCATAAAGCTGACGCAACGATTACTATTTTTTTTTTCTTCTTGGATCTCAGTCGTCGCAAAAACGTATACCTTCTTTTTCCGACCTTTTTTTTAGCTTTCTGGAAAAGTTTATATTAGTTAAACAGGGTCTAGTCTTAGTGTGAAAGCTAGTGGTTTCGATTGACTGATATTAAGAAAGTGGAAATTAAATTAGTAGTGTAGACGTATATGCATATGTATTTCTCGCCTGTTTATGTTTCTACGTACTTTTGATTTATAGCAAGGGGAAAAGAAATACATACTATTTTTTGGTAAAGGTGAAAGCATAATGTAAAAGCTAGAATAAAATGGACGAAATAAAGAGAGGCTTAGTTCATCTTTTTTCCAAAAAGCACCCAATGATAATAACTAAAATGAAAAGGATTTGCCATCTGTCAGCAACATCAGTTGTGTGAGCAATAATAAAATCATCACCTCCGTTGCCTTTAGCGCGTTTGTCGTTTGTATCTTCCGTAATTTTAGTCTTATCAATGGGAATCATAAATTTTCCAATGAATTAGCAATTTCGTCCAATTCTTTTTGAGCTTCTTCATATTTGCTTTGGAATTCTTCGCACTTCTTTTCCCATTCATCTCTTTCTTCTTCCAAAGCAACGATCCTTCTACCCATTTGCTCAGAGTTCAAATCGGCCTCTTTCAGTTTATCCATTGCTTCCTTCAGTTTGGCTTCACTGTCTTCTAGCTGTTGTTCTAGATCCTGGTTTTTCTTGGTGTAGTTCTCATTATTAGATCTCAAGTTATTGGAGTCTTCAGCCAATTGCTTTGTATCAGACAATTGACTCTCTAACTTCTCCACTTCACTGTCGAGTTGCTCGTTTTTAGCGGACAAAGATTTAATCTCGTTTTCTTTTTCAGTGTTAGATTGCTCTAATTCTTTGAGCTGTTCTCTCAGCTCCTCATATTTTTCTTGCCATGACTCAGATTCTAATTTTAAGCTATTCAATTTCTCTTTGATC
where the first line, introduced by ‘>’, is the header, and anything after the first line break is considered to be the sequence. FASTA (or Pearson) format is the most widely used sequence format in bioinformatics!
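The header/sequence rule can be sketched in a few lines of Python. This is a minimal reader, not a full FASTA implementation; the function name `parse_fasta` is made up for illustration, and the example record is a shortened copy of the U49845 entry (header trimmed, first 80 bases only).

```python
def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs.

    A line starting with '>' opens a new record; every following
    non-empty line up to the next '>' is joined into the sequence.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened example record (assumption: header and sequence truncated for brevity)
example = """>gi|1293613|gb|U49845.1|SCU49845 Saccharomyces cerevisiae TCP1-beta gene, partial cds
GATCCTCCATATACAACGGTATCTCCACCTCAGGTTTAGATCTCAACAACGGAACCATTG
CCGACATGAGACAGTTAGGT
"""

records = parse_fasta(example)
```

Sequence lines may be wrapped at any width; the reader simply concatenates them.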
![Page 5: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/5.jpg)
But first, you have to find it!
![Page 6: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/6.jpg)
you can search by keyword (could be a name, an abbreviation...)
![Page 7: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/7.jpg)
... or by a unique identifier, the ‘Accession number’
![Page 8: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/8.jpg)
... or first filter for all sequences of a particular organism
![Page 9: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/9.jpg)
... and then use a keyword
![Page 10: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/10.jpg)
check the results you want to save, click ‘Display settings’, then ‘Apply’
![Page 11: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/11.jpg)
and copy results into any text editor
![Page 12: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/12.jpg)
or click ‘Send to’, set Format to FASTA and save the file wherever you want
This way, you can also download the whole protein/nucleotide set of any particular taxonomic unit, or even the genomic sequence. Try to figure out how!
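For a single record, the same FASTA download can also be scripted. The sketch below is not part of the slides: it assumes NCBI's E-utilities `efetch` endpoint with the `db`, `id`, `rettype` and `retmode` parameters, and the function names are made up for illustration. Only the URL construction runs here; `fetch_fasta` needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities efetch endpoint (not mentioned in the slides)
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore"):
    """Build the URL that asks for one record in FASTA format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession, db="nuccore"):
    """Download the record as FASTA text (requires network access)."""
    with urlopen(efetch_url(accession, db)) as response:
        return response.read().decode("utf-8")


url = efetch_url("U49845")
```

Calling `fetch_fasta("U49845")` would return the record shown earlier as plain text, ready to paste into any editor.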
![Page 13: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/13.jpg)
... you can also search by similarity/homology using BLAST
![Page 14: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/14.jpg)
• set of sequence comparison algorithms (1990)
• search sequence databases for optimal local alignments to a query
• heuristic approach based on the Smith-Waterman algorithm
• finds best local alignments
• provides statistical significance
• www, standalone, and network clients
The BLAST programs (Basic Local Alignment Search Tools)
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) “Basic local alignment search tool.” J. Mol. Biol. 215:403-410.
Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” NAR 25:3389-3402.
BLAST+
![Page 15: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/15.jpg)
1) Choose the sequence (query)
2) Select the BLAST program
3) Choose the database to search
4) Choose optional parameters
The BLAST programs (Basic Local Alignment Search Tools)
![Page 16: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/16.jpg)
Program   Description
blastp    Compares an amino acid query sequence against a protein sequence database.
blastn    Compares a nucleotide query sequence against a nucleotide sequence database.
blastx    Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn   Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx   Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
The BLAST programs: Select the BLAST program
![Page 17: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/17.jpg)
Program            Notes
Megablast (nucleotide only)
  Contiguous       Nearly identical sequences
  Discontiguous    Cross-species comparison
Position Specific (protein only)
  PSI-BLAST        Automatically generates a position-specific scoring matrix (PSSM)
  RPS-BLAST        Searches a database of PSI-BLAST PSSMs
The BLAST programs: Select the BLAST program
![Page 18: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/18.jpg)
first choose the appropriate database and algorithm: if you have an amino acid sequence and you are after proteins, use blastp (protein BLAST); if you are looking for the coding sequence, use tblastn (translated BLAST), etc.
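The choice of program follows directly from the type of the query and the type of the database. A small lookup sketch (the function name and the `translate_both` flag are invented for illustration; the program names come from the table of BLAST programs):

```python
# (query type, database type) -> BLAST program
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program from the query and database sequence types.

    translate_both=True selects tblastx, which compares six-frame
    translations of a nucleotide query and a nucleotide database.
    """
    if translate_both:
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, database_type)]


program = choose_blast_program("protein", "nucleotide")
```

Here `program` is `"tblastn"`: a protein query searched against a translated nucleotide database, which is the case of looking for a coding sequence.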
![Page 19: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/19.jpg)
paste your query sequence or accession number here
sometimes it’s handy to narrow the search down to a specific group
![Page 20: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/20.jpg)
How does it work? BLAST Algorithm in layers
“The central idea of the BLAST algorithm is to confine attention to segment pairs that contain a word pair of length w with a score of at least T.” Altschul et al. (1990)
Three heuristic layers: seeding, extension, and evaluation
• Seeding – identify where to start alignment
• Extension – extending alignment from seeds
• Evaluation – Determine which alignments are statistically significant
![Page 21: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/21.jpg)
BLAST Algorithm: Seeding
compile a list of word pairs (w=3) above threshold T
Example: for a human RBP query …FSGTWYA… (query word is in red)
A list of words (w=3) is:
FSG SGT GTW TWY WYA
YSG TGT ATW SWY WFA
FTG SVT GSW TWF WYS
BLAST locates all common words in a pair of sequences, then uses them as seeds for the alignment
Discriminating between real and artificial matches is done using an estimate of probability that the match might occur by chance.
scores (S) and e-values (E) of BLAST hits
word=defined number of letters
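A stripped-down sketch of the seeding step: it cuts both sequences into words of length w and reports the words they share. It covers exact matches only; protein BLAST also accepts similar "neighborhood" words scoring at least T, which is left out here. The function names are made up, and the two sequences are the RBP and lactoglobulin fragments used in the extension example.

```python
from collections import defaultdict


def words(seq, w=3):
    """All overlapping words of length w, in order of position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def find_seeds(query, subject, w=3):
    """Return (word, query_pos, subject_pos) for every shared word.

    Simplification: exact word matches only, no threshold T.
    """
    index = defaultdict(list)
    for j, word in enumerate(words(subject, w)):
        index[word].append(j)
    seeds = []
    for i, word in enumerate(words(query, w)):
        for j in index.get(word, []):
            seeds.append((word, i, j))
    return seeds


query_words = words("FSGTWYA")
# -> ['FSG', 'SGT', 'GTW', 'TWY', 'WYA']

seeds = find_seeds("KENFDKARFSGTWYAMAKKDPEG", "MKGLDIQKVAGTWYSLAMAASD")
# -> [('GTW', 10, 10), ('TWY', 11, 11), ('AMA', 14, 16)]
```

Positions are 0-based offsets into the two fragments, not the residue numbers of the full proteins.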
![Page 22: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/22.jpg)
BLAST Algorithm: Seeding: Score
score=alignment quality
![Page 23: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/23.jpg)
• Substitution matrices are used for amino acid alignments.
- each possible residue substitution is given a score
• A simpler unitary matrix is used for DNA pairs (+1 for a match, -2 for a mismatch)
BLAST Algorithm: Seeding: Scoring matrix
aa frequency, aa properties
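The simple DNA scoring (+1 for a match, -2 for a mismatch) is easy to try out. A minimal sketch for two already aligned, equally long sequences without gaps; the function name and the two example sequences are made up:

```python
def score_ungapped(a, b, match=1, mismatch=-2):
    """Score two aligned sequences of equal length, position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


score = score_ungapped("GATTACA", "GATCACA")
# 6 matches and 1 mismatch: 6 * 1 + 1 * (-2) = 4
```

For proteins, the `match`/`mismatch` pair would be replaced by a lookup in a substitution matrix such as BLOSUM 62.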
![Page 24: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/24.jpg)
BLOSUM vs PAM
• BLOSUM 62 is the default in BLAST 2.0.
- tailored for comparisons of moderately distant proteins, performs well in detecting closer relationships
- search for distant relatives may be more sensitive with a different matrix
BLOSUM 45 BLOSUM 62 BLOSUM 90
PAM 250 PAM 160 PAM 100
More Divergent Less Divergent
PAM (Percent Accepted Mutation)
- theoretical approach
- based on assumptions of mutation probabilities
BLOSUM (BLOcks SUbstitution Matrix)
- empirical
- constructed from multiply aligned protein families
- ungapped segments (blocks) clustered based on percent identity
BLAST Algorithm: Seeding: Scoring matrix
![Page 25: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/25.jpg)
BLAST Algorithm: Seeding: E value
• Low E-values suggest that sequences are homologous
• Statistical significance depends on both the size of the alignments and the size of the sequence database
‣ Important consideration for comparing results across different searches
‣ E-value increases as database gets bigger
‣ E-value decreases as alignments get longer
Suggested BLAST Cutoffs
• For nucleotide based searches, one should look for hits with E-values of 10^-6 or less and sequence identity of 70% or more
• For protein based searches, one should look for hits with E-values of 10^-3 or less and sequence identity of 25% or more
E-value = significance of the alignment
The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.
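The dependence on score and database size can be illustrated with the Karlin-Altschul formula E = K·m·n·e^(−λS), where m is the query length, n the database length and S the raw score. The formula itself is not on the slides, and the values of λ and K below are only illustrative placeholders (they depend on the scoring matrix and gap penalties).

```python
import math


def e_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are statistical parameters of the scoring system;
    the defaults here are placeholders for illustration only.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


small_db = e_value(50, query_len=100, db_len=1_000_000)
large_db = e_value(50, query_len=100, db_len=2_000_000)   # twice the database
better_hit = e_value(60, query_len=100, db_len=1_000_000)  # higher score
```

Doubling the database doubles E, while a higher score (typically from a longer or better alignment) lowers it exponentially, which is why E-values from searches against different databases are not directly comparable.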
![Page 26: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/26.jpg)
when you manage to find a hit (i.e. a match between a “word” and a database entry), extend the hit in either direction.
Keep track of the score (use a scoring matrix)
Stop when the score drops below some cutoff.
KENFDKARFSGTWYAMAKKDPEG 50 RBP (query)
MKGLDIQKVAGTWYSLAMAASD. 44 lactoglobulin (hit)
extend <- Hit! -> extend
BLAST Algorithm: Extension and Evaluation
originally, single hits were extended in either direction; a later refinement of BLAST requires two independent hits before extension
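The extend-and-stop rule can be sketched as an ungapped extension from a single seed. Several things here are assumptions for illustration: identity scoring (+2 match, -1 mismatch) stands in for a real substitution matrix, the "cutoff" is read as an X-drop rule (stop once the running score falls more than `x_drop` below the best score seen), and the function names are made up.

```python
def extend(query, subject, q_start, s_start, step,
           match=2, mismatch=-1, x_drop=3):
    """Extend without gaps in one direction (step=+1 right, -1 left).

    Returns (best_gain, best_len): the best score gained and how many
    positions had to be added to reach it.
    """
    running = best = 0
    best_len = n = 0
    i, j = q_start, s_start
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += match if query[i] == subject[j] else mismatch
        n += 1
        if running > best:
            best, best_len = running, n
        elif best - running > x_drop:
            break  # score dropped too far below the best: give up
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, w=3,
                match=2, mismatch=-1, x_drop=3):
    """Grow an exact word hit of length w into a scored segment pair.

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    right_gain, right_len = extend(query, subject, q_pos + w, s_pos + w, +1,
                                   match, mismatch, x_drop)
    left_gain, left_len = extend(query, subject, q_pos - 1, s_pos - 1, -1,
                                 match, mismatch, x_drop)
    score = w * match + left_gain + right_gain
    return score, q_pos - left_len, q_pos + w + right_len


query = "KENFDKARFSGTWYAMAKKDPEG"    # RBP fragment
subject = "MKGLDIQKVAGTWYSLAMAASD"   # lactoglobulin fragment
hsp = extend_seed(query, subject, q_pos=10, s_pos=10)  # seed word GTW
# -> (8, 10, 14), i.e. the segment query[10:14] == "GTWY"
```

With this toy scoring the seed GTW grows by one residue to GTWY and then stops, because the mismatches on both sides pull the running score down faster than the later matches can recover it.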
![Page 27: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/27.jpg)
BLAST Algorithm: Extension and Evaluation
The BLAST algorithm extends the initial “seed” hit into an HSP
HSP = high-scoring segment pair = locally optimal alignment
![Page 28: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/28.jpg)
BLAST Algorithm: Extension and Evaluation
![Page 29: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/29.jpg)
![Page 30: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/30.jpg)
BLAST-related tools for genomic DNA
• MegaBLAST at NCBI
• BLAT (BLAST-like alignment tool). BLAT parses an entire genomic DNA database into words (11-mers), then searches them against a query: a mirror image of the BLAST strategy
http://genome.ucsc.edu
• SSAHA at Ensembl uses a strategy similar to BLAT's
http://www.ensembl.org
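The "mirror image" idea can be sketched as follows: the database is cut into words once and stored in an index, and each query is then scanned against that index. Cutting the database into non-overlapping words is how BLAT is usually described, but treat that detail as an assumption here; the function names and the toy sequences (with k=4 instead of 11) are made up.

```python
from collections import defaultdict


def build_index(genome, k=11):
    """Index the database: non-overlapping k-mers -> start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def lookup(query, index, k=11):
    """Scan every overlapping k-mer of the query against the index.

    Returns (query_pos, genome_pos) pairs for all word hits.
    """
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


genome = "ACGTTGCAAGGCTTAACCGG"   # toy "database"
index = build_index(genome, k=4)
hits = lookup("GGTGCAAGGCAT", index, k=4)
# -> [(2, 4), (6, 8)]
```

Both hits lie on the same diagonal (genome position minus query position is 2), which is the kind of clustering a real tool would use before attempting a full alignment.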
![Page 31: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/31.jpg)
it’ll even tell you whether it found any known domain
... or level of similarity
![Page 32: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/32.jpg)
scroll down to bottom...
the more the better
![Page 33: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/33.jpg)
check hits you want to save ... then click ‘Download’
![Page 34: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/34.jpg)
Access to sequenced data: Species and Taxa Specific Databases
https://genome.ucsc.edu/ENCODE/
http://www.genecards.org/
http://www.biobase-international.com/product/hgmd
![Page 35: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/35.jpg)
Comparative database of eukaryotic pathogens
![Page 36: Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649dce5503460f94ac1ce2/html5/thumbnails/36.jpg)
gene/metabolic pathway oriented databases