CS 177 Hands-on lab with databases
Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises
Quiz #1
Homework #1
Al-Bawardy, Rasha F. 13
Antonio, Dion 13
Berro, Reem G. 14
Chien, Yu Fung 11
Dharker, Nachiket S. 12
Eunkyung, An 13
Gansberger, Kristen M.
Gupta, Madhur V. 12
Hand, Damon 12
Hua, Dong 13
Karim, Halima R. 11
Kebede, Mikael 11
Koyama, Kaori 9
Kwak, Yoon I. 13
Marwin, Victor M. 5
Mody, Manali 10
Moorjani, Priya G. 14
Qukub, Dunia 12
Ryan, Caitlyn E. 14
Williams, Bernadette 10
Yahan, Lin 12
Yawo, Akrodou 6
Zhou, Leming 14
The International Nucleotide Sequence Database Collaboration
[Diagram: GenBank (NCBI, NIH; retrieval via Entrez), EMBL (EBI; retrieval via SRS) and DDBJ (NIG, CIB; retrieval via getentry) exchange data with one another. Each database receives submissions and updates. Submission routes shown: Sequin, BankIt, ftp.]
Primary vs. Derivative Databases
[Diagram: sequence fragments from sequencing centers and labs are collected in GenBank (primary database). Derivative resources (RefSeq, UniGene, genome assemblies) are built from these records by curators and algorithms.]
The Entrez Databases
The (ever) Expanding Entrez System
[Diagram: Entrez at the center, linked to Nucleotide, Protein, Structure, PubMed, PopSet, Genome, OMIM, Taxonomy, Books, ProbeSet, 3D Domains, UniSTS, SNP, CDD, UniGene, Journals and PubMed Central.]
GenBank
Search and retrieval of sequences
Entrez is a retrieval system for searching several linked databases. It provides access to PubMed, Nucleotide, Protein, Structure, Genome, PopSet, OMIM, Taxonomy and more.
BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA.
BLAST selections
GenBank format
Fasta format
Sequence formats
ASN.1
DNAStrider
EMBL
Fitch
GCG
GenBank/GB
IG/Stanford
MSF
NBRF
Olsen
PAUP/NEXUS
Pearson/Fasta
Phylip
PIR/CODATA
Plain/Raw
Pretty
Zuker
- FASTA is a popular sequence format
- it is also a sequence similarity and homology search tool (similar to BLAST) used by EMBL-EBI
NOTE: Formats are convertible in ReadSeq (Web based)
http://bimas.dcrt.nih.gov/molbio/readseq/
or ForCon (stand-alone application)
http://www.hgmp.mrc.ac.uk/embnet.news/vol6_1/ForCon/forcon.html
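To make the FASTA format concrete, here is a minimal reading sketch. It assumes the usual layout (a header line starting with ">" followed by one or more sequence lines). The function name parse_fasta and the two example records are hypothetical and are not taken from the slides.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted text lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Hypothetical example records (not real database entries).
example = """>example_1 hypothetical description
ATTGACTA
TATAGCCG
>example_2
ACGTGC
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq), seq)
```

A converter such as ReadSeq or ForCon does far more than this (many formats, validation); the sketch only shows why FASTA is easy to read and write by hand.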
Lab exercises
1) How many sequences are available in GenBank for Neanderthals? Depends on your search strategy …
2) Go to Entrez nucleotide. Find all sequences for the following terms:
neander: 1
Neanderthals: 0
Neanderthal: 1
neanderthal: 1
neanderthal*: 6
Homo sapiens neanderthalensis: 6
3) Go to Entrez taxonomy. Try to find all sequences for Neanderthals!
6
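The same search terms could also be sent to Entrez programmatically instead of through the web form. The sketch below only builds the query URLs for NCBI's E-utilities ESearch service and does not contact the server. The helper name esearch_url is hypothetical, and counts returned today will differ from the 2003 figures on the slide.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nucleotide") -> str:
    """Build an ESearch URL for one search term (hypothetical helper)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


terms = [
    "neander",
    "Neanderthals",
    "Neanderthal",
    "neanderthal",
    "neanderthal*",  # trailing * is the Entrez truncation wildcard
    "Homo sapiens neanderthalensis",
]

urls = {term: esearch_url(term) for term in terms}
for term, url in urls.items():
    print(f"{term!r}: {url}")
```

Fetching one of these URLs returns XML whose Count element holds the number of matching records, which makes it easy to compare search strategies side by side.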
Lab exercises
4) How many nucleotide sequences are available for the house mouse Mus musculus? Try both Entrez nucleotides and Entrez taxonomy. How do you explain the difference?
5) A man is found murdered in Yellowstone National Park. A few hairs of unidentified origin are recovered from the victim’s clothes. The samples arrive in the lab and DNA is isolated and sequenced:
CCATGCATATAAGCATGTACATAATATTATATTCTTACATAGGACATATTAACTCAATCTCATAATTCAT
Formulate a hypothesis regarding the origin of the recovered hairs and potential links with the killing!
Result: Canis lupus (Gray Wolf)
Results for exercise 4:
Entrez taxonomy: 5,403,701
Entrez nucleotides: 5,458,506 (Mus musculus); 5,393,552 (house mouse); 5,458,527 (Mus musculus OR house mouse)
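The mouse counts for exercise 4 can be compared with a little set arithmetic. This is only a sketch: it assumes all four numbers come from the same database snapshot and that the OR search behaves as a plain set union. The variable names are illustrative.

```python
# Counts as reported on the slide (2003 snapshot).
taxonomy = 5_403_701        # Entrez taxonomy, Mus musculus
mus_musculus = 5_458_506    # Entrez nucleotides, term "Mus musculus"
house_mouse = 5_393_552     # Entrez nucleotides, term "house mouse"
either = 5_458_527          # Entrez nucleotides, "Mus musculus OR house mouse"

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = mus_musculus + house_mouse - either

# Records matched by only one of the two terms.
only_house_mouse = either - mus_musculus
only_mus_musculus = either - house_mouse

# Gap between the text search and the taxonomy-based count.
text_minus_taxonomy = mus_musculus - taxonomy

print("matched by both terms:   ", both)
print("only 'house mouse':      ", only_house_mouse)
print("only 'Mus musculus':     ", only_mus_musculus)
print("text search vs. taxonomy:", text_minus_taxonomy)
```

The arithmetic does not answer the question by itself; it only shows how large each discrepancy is, which is a useful starting point for explaining it.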
The Poliovirus Problem
Science, Vol. 297, 9 August 2002
Cello, J; Paul, A.V. & Wimmer, E.:
Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template
- they generated about 7.7 kilobases of single-stranded RNA genome based on the known genetic map
- DNA fragments were synthesized from purified oligonucleotides (average length: 69 bases)
- the cDNA was then transcribed into highly infectious RNA
The Poliovirus Problem
17 July 2002
Weiss, R.:
Mail-Order Molecules Brew a Terrorism Debate
- mail-order oligonucleotides can be used to manufacture a deadly virus
- because they are so small, most oligos lack a “fingerprint”
- call for more control and/or institutional oversight
The Poliovirus Problem
- search in GenBank for nucleotide sequences of the poliovirus
- copy about 100 bp from a sequence of your choice and paste it into the search window of blastn. Is the fragment identifiable as poliovirus?
- if so, do a blastn search with a 90 bp, 80 bp, 70 bp … fragment
- what is the length of the shortest fragment still identifiable as poliovirus?
- is this fragment shorter than the average length of 69 bp used to synthesize the poliovirus?
- do these oligos have a “fingerprint” (i.e. can ‘typical’ oligos with lengths of 20-50 be assigned to a particular organism)?
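Cutting the fragments for the blastn searches by hand is error-prone, so the shortening steps above can be sketched in a few lines. The function name shrinking_fragments and its default lengths are assumptions, and the placeholder sequence is not poliovirus; paste in a real GenBank sequence before searching.

```python
from typing import List, Tuple


def shrinking_fragments(seq: str, start: int = 100, step: int = 10,
                        minimum: int = 20) -> List[Tuple[int, str]]:
    """Return (length, fragment) pairs, shortest last.

    Each fragment is the first `length` bases of the cleaned sequence.
    """
    seq = "".join(seq.split()).upper()  # remove whitespace and line breaks
    fragments = []
    for length in range(start, minimum - 1, -step):
        if length <= len(seq):
            fragments.append((length, seq[:length]))
    return fragments


# Placeholder only: a simple repeat, NOT a poliovirus sequence.
placeholder = "ACGT" * 30  # 120 bases

fragments = shrinking_fragments(placeholder)
for length, fragment in fragments:
    # FASTA-style output that can be pasted into the blastn search window
    print(f">fragment_{length}bp")
    print(fragment)
```

Taking every fragment from the same starting position keeps the searches comparable; fragments from a different region of the genome may stay identifiable down to a different length.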
Are these oligos so small that they lack a “fingerprint”?
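A rough way to think about the question is to count how many different sequences of a given length exist. The sketch below assumes four equally likely, independent bases, which real genomes do not follow, and uses a hypothetical database size. Treat the numbers as order-of-magnitude intuition, not as BLAST statistics.

```python
def distinct_sequences(k: int) -> int:
    """Number of different DNA sequences of length k (alphabet A, C, G, T)."""
    return 4 ** k


def expected_chance_hits(k: int, database_bases: int) -> float:
    """Rough expected number of exact chance matches for one k-mer.

    Assumes uniform, independent bases and ignores strand and edge effects.
    """
    return database_bases / distinct_sequences(k)


# Hypothetical database size, chosen only for illustration.
HYPOTHETICAL_DB_BASES = 10 ** 10

for k in (10, 20, 30, 50, 69):
    print(k, distinct_sequences(k),
          expected_chance_hits(k, HYPOTHETICAL_DB_BASES))
```

Under these assumptions the number of possible sequences grows by a factor of four with every added base, so exact chance matches become rare quickly as oligos get longer. Whether a real 20-50 base oligo can be assigned to an organism still has to be checked with blastn, as the exercise asks.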
Homework assignment lecture #4
Explain in your own words and in simple terms the basics of the BLAST tool!
- assignment is due on 6 Oct 2003, 3:30 PM
- send your assignment as an e-mail attachment to [email protected]
(type your name and the term “homework” in the subject line)
- maximum size: 500 words