Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic...
Data Sequences and Other Stuff

Sequence Data
Nucleic Acid and Protein Sequences
Sources of Genetic Sequences
  User
  GCG supplied databases
    Flat File
    Oracle Relational Database
  NCBI supplied databases
  Other databases

Sequence Databases
  Genbank
  EMBL
  DDBJ
  NCBI
  PIR
  Swiss-Prot
  Swiss-Prot TrEMBL

Genbank
Primary nucleic acid sequence database
Maintained by NCBI
  National Center for Biotechnology Information
  http://www.ncbi.nlm.nih.gov
Current Release 122, 2/2001
  11,720,120,326 bases
  10,896,781 sequences

How Many Organisms Are In The Sequence Databases? (April 1, 2001)

| Species | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | Increase (since 1995) | Increase (12 months) |
|---|---|---|---|---|---|---|---|---|---|
| all | 16109 | 23119 | 32880 | 43516 | 61952 | 87751 | 95168 | 490% | 40.9% |
| Viruses | 1845 | 2122 | 2678 | 2968 | 3573 | 4428 | 4857 | 163% | 32.4% |
| Bacteria | 2939 | 3847 | 6091 | 8711 | 14322 | 22758 | 24878 | 746% | 53.3% |
| Archaea | 162 | 235 | 385 | 555 | 1015 | 1709 | 1906 | 1076% | 68.8% |
| Eukaryota | 10366 | 15901 | 22596 | 29926 | 41420 | 56961 | 61571 | 493% | 37.4% |

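
The "Increase (since 1995)" column can be reproduced from the 1995 and 2001 counts. The sketch below is only a cross-check of that arithmetic: it assumes the slide truncated each percentage to a whole number, and the function name `percent_increase` is made up for illustration. The 12-month column cannot be recomputed this way because it depends on counts from April 2000 that the slide does not list.

```python
# Organism counts copied from the table: (1995 count, 2001 count).
counts = {
    "all": (16109, 95168),
    "Viruses": (1845, 4857),
    "Bacteria": (2939, 24878),
    "Archaea": (162, 1906),
    "Eukaryota": (10366, 61571),
}


def percent_increase(old: int, new: int) -> int:
    """Growth from old to new as a percentage, truncated to a whole number."""
    return (new - old) * 100 // old


# Map each group to its truncated percentage increase since 1995.
increase_since_1995 = {
    name: percent_increase(old, new) for name, (old, new) in counts.items()
}

for name, pct in increase_since_1995.items():
    print(f"{name}: {pct}%")
```

Truncating rather than rounding is the assumption that makes every row agree with the slide.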
Other NCBI Databases
  HTGS
  EST
  STS
  GSS
  RefSeq
  Unigene
  Genomic

HTGS
High Throughput Genomic Sequences
"Unfinished" DNA sequences generated by the high-throughput sequencing centers
Phase 0
  Single-few pass reads of a single clone (not contigs)
Phase 1
  Unfinished, may be unordered, unoriented contigs, with gaps
Phase 2
  Unfinished, ordered, oriented contigs, with or without gaps
Phase 3
  Primary division (Genbank)
  Finished, no gaps (with or without annotations)

EST
Expressed Sequence Tags
"Single-pass" cDNA sequences
Generally representative of the 3' ends of cDNAs
More "full-length" ESTs now available

STS
Sequence Tagged Sites
Sequence and mapping data
Short genomic landmark sequences

GSS
Genome Survey Sequences
Similar to the EST division, except that its sequences are genomic in origin, rather than cDNA
  Random "single pass read" genome survey sequences
  Cosmid/BAC/YAC end sequences
  Exon trapped genomic sequences
  Alu PCR sequences

RefSeq
NCBI Reference Sequence project
Provides reference sequence standards for the naturally occurring molecules, from chromosomes to mRNAs to proteins
Stable reference point for:
  mutation analysis
  gene expression studies
  polymorphism discovery

RefSeq…
Curated RefSeq
  transcripts and proteins
Genome Annotation
  contigs, transcripts, and proteins
Complete Genomes
  genomes, chromosomes, and proteins

Unigene
Experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters
Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.
Includes EST and cDNA sequences
Includes human, rat, mouse, cow and zebrafish

HomoloGene
Curated and calculated orthologs and homologs for genes represented in UniGene and LocusLink
Includes human, mouse, rat, zebrafish, cow and Drosophila

LocusLink
Provides a single query interface to curated sequence and descriptive information about genetic loci
  Nomenclature
  Aliases
  Sequence accessions
  Phenotypes
  EC numbers
  MIM numbers
  UniGene clusters
  Homology
  Map locations
  Web sites

EMBL and DDBJ
European Molecular Biology Laboratory
  Hinxton, UK
  http://www.ebi.ac.uk/
DNA Data Bank of Japan
  Mishima, Japan
  http://www.ddbj.nig.ac.jp/

Coordination with Genbank
Prevents duplication
Genbank enters sequences from U.S. journals and researchers
EMBL handles European data
DDBJ handles Asian data
Data exchanged daily

Sequence submissions
Sequences entered from journals
Sequences submitted by individual researchers
  BankIt
    NCBI WWW Site
  Sequin
    Multi-platform program

Sequence Names
DO NOT rely on names to find particular sequences
Few conventions
Organism
  Hum: Human
  Mus: Mouse
  Eco: E. coli
  Syn: Synthetic

Last Letter(s)
Sometimes gives useful information
  cg: Complete genome
    Viruses

Other Letters
Specifies a particular sequence
  vsvcg
    Vesicular stomatitis virus (Indiana serotype) complete genome

EMBL File Names
  Ec: E. coli
  Hs: Human

Locus name
Names are short, fairly non-descriptive, and can change from one release to another
  vsvcg
    The complete sequence for the virus VSV
Most "mnemonic" names already taken
Genbank now using accession numbers as locus names

Accession Numbers
Each sequence submitted to a database is assigned a unique primary accession number
Accession numbers do not change
If a sequence is merged with another, a new accession number is assigned, and the original number becomes a secondary accession number
Accession numbers may include version numbers
  A02428.2

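
An identifier of the form accession.version can be split into its two parts with a few lines of code. This is a minimal sketch, not a full validator: the pattern (one or two letters followed by digits, then an optional `.N` suffix) is an assumption chosen to cover the accessions shown in this deck, and `split_accession` is a hypothetical helper name.

```python
import re

# Assumed pattern: 1-2 letters, then digits, then an optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{1,2}\d+)(?:\.(\d+))?$")


def split_accession(text: str):
    """Return (accession, version); version is None when no suffix is given."""
    match = ACCESSION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable accession: {text!r}")
    accession, version = match.groups()
    return accession, (int(version) if version else None)


# M28668 is an accession that appears in this deck; the ".1" suffix is illustrative.
print(split_accession("M28668"))
print(split_accession("M28668.1"))
```

Keeping the version separate makes it easy to look up a record by its stable accession while still noticing when the sequence itself has been revised.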
Accession Numbers
Using GCG to access sequences via their accession number
  Data Library:Accession Number
  Flatfile - vi:J02428
  RDB - gcgnuc:J02428

The Sequence Record
Different for each database
  Locus (Name)
  Accession Number
  Keywords
  Description
  Properties
  References
  The Sequence

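
As an illustration of how the first of these fields can be read by a program, the sketch below parses a GenBank LOCUS line of the style used in this deck's example records. It assumes the older layout with exactly seven whitespace-separated fields (keyword, name, length, "bp", molecule type, division, date); other releases add fields, so treat this as a sketch rather than a general parser. `Locus` and `parse_locus` are hypothetical names.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    length_bp: int
    molecule: str
    division: str
    date: str


def parse_locus(line: str) -> Locus:
    """Parse a seven-field LOCUS line (assumed layout, see lead-in)."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    _, name, length, _, molecule, division, date = fields
    return Locus(name, int(length), molecule, division, date)


# LOCUS line of the human CFTR mRNA record shown in this deck.
locus = parse_locus("LOCUS HUMCFTRM 6129 bp mRNA PRI 15-DEC-1989")
print(locus)
```

Splitting on whitespace rather than fixed columns keeps the example tolerant of the spacing lost in the slide transcript.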
analyze% typedata ge:humcftrm
!!NA_SEQUENCE 1.0
LOCUS       HUMCFTRM     6129 bp    mRNA            PRI       15-DEC-1989
DEFINITION  Human cystic fibrosis mRNA, encoding a presumed transmembrane
            conductance regulator (CFTR).
ACCESSION   M28668
NID         g180331
KEYWORDS    cystic fibrosis; transmembrane conductance regulator.
SOURCE      Human, cDNA to mRNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 6129)
  AUTHORS   Riordan,J.R., Rommens,J.M., Kerem,B., Alon,N., Rozmahel,R.,
            Grzelczak,Z., Zielenski,J., Lok,S., Plavsic,N., Chou,J.-L.,
            Drumm,M.L., Iannuzzi,M.C., Collins,F.S. and Tsui,L.-C.
  TITLE     Identification of the cystic fibrosis gene: Cloning and
            characterization of complementary DNA
  JOURNAL   Science 245, 1066-1073 (1989)
  MEDLINE   89368940
COMMENT     A three base-pair deletion spanning positions 1654-1656 is
            observed in cDNAs from cystic fibrosis patients.
FEATURES             Location/Qualifiers
     source          1..6129
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     CDS             133..4575
                     /note="cystic fibrosis transmembrane conductance regulator"
                     /codon_start=1
                     /db_xref="PID:g180332"
                     /translation="MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVD
                     SADNLSEKLEREWDRELASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLL
                     LNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLR
                     AYFLQTSQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTAN
                     WFLYLSTLRWFQMRIEMIFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWA
                     VNSSIDVDSLMRSVSRVFKFIDMPTEGKPTKSTKPYKNGQLSKVMIIENSHVKKDDIW
                     PSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRLLN
                     TEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSDQEIWKVAD
                     EVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDP
                     VTYQIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSL
                     FRQAISPSDRVKLFPHRNSSKCKSKPQIAALKEETEEEVQDTRL"
BASE COUNT     1886 a   1181 c   1330 g   1732 t
ORIGIN
HUMCFTRM  Length: 6129  April 13, 1998 13:00  Type: N  Check: 6781  ..
  1 AATTGGAAGC AAATGACATC ACAGCAGGTC AGAGAAAAAG GGTTGAGCGG
 51 CAGGCACCCA GAGTAGTAGG TCTTTGGCAT TAGGAGCTTG AGCCCAGACG
101 GCCCTAGCAG GGACCCCAGC GCCCGAGAGA CCATGCAGAG GTCGCCTCTG
151 GAAAAGGCCA GCGTTGTCTC CAAACTTTTT TTCAGCTGGA CCAGACCAAT
201 TTTGAGGAAA GGATACAGAC AGCGCCTGGA ATTGTCAGAC ATATACCAAA
251 TCCCTTCTGT TGATTCTGCT GACAATCTAT CTGAAAAATT GGAAAGAGAA
301 TGGGATAGAG AGCTGGCTTC AAAGAAAAAT CCTAAACTCA TTAATGCCCT
351 TCGGCGATGT TTTTTCTGGA GATTTATGTT CTATGGAATC TTTTTATATT
401 TAGGGGAAGT CACCAAAGCA GTACAGCCTC TCTTACTGGG AAGAATCATA
451 GCTTCCTATG ACCCGGATAA CAAGGAGGAA CGCTCTATCG CGATTTATCT

analyze% typedata -ref GB_PR:HUMIFNRF1A
!!NA_SEQUENCE 1.0
LOCUS       HUMIFNRF1A   7721 bp    DNA             PRI       10-NOV-1992
DEFINITION  Homo sapiens interferon regulatory factor 1 gene, complete cds.
ACCESSION   L05072
NID         g184648
KEYWORDS    interferon regulatory factor 1.
SOURCE      Homo sapiens Placenta DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 7721)
  AUTHORS   Cha,Y., Sims,S.H., Romine,M.F., Kaufmann,M. and Deisseroth,A.B.
  TITLE     Human interferon regulatory factor 1: intron/exon organization
  JOURNAL   DNA Cell Biol. 11, 605-611 (1992)
  MEDLINE   93000481
FEATURES             Location/Qualifiers
     source          1..7721
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /tissue_type="Placenta"
                     /map="5q23-q31"
     exon            1..219
                     /gene="IRF1"
                     /note="putative"
                     /number=1
     5'UTR           join(1..219,1279..1287)
                     /gene="IRF1"
     gene            join(1..219,1279..1287)
                     /gene="IRF1"
     intron          220..1278
                     /gene="IRF1"
                     /number=1
     exon            1279..1374
                     /gene="IRF1"
                     /number=2
     CDS             join(1288..1374,2738..2837,3630..3806,3916..3965,
                     4073..4202,4386..4508,5040..5089,6248..6383,6670..6794)
                     /gene="IRF1"
                     /codon_start=1
                     /product="interferon regulatory factor 1"
                     /db_xref="PID:g184649"
                     /translation="MPITRMRMRPWLEMQINSNQIPGLIWINKEEMIFQIPWKHAAKH
                     GWDINKDACLFRSWAIHTGRYKAGEKEPDPKTWKANFRCAMNSLPDIEEVKDQSRNKG
                     SSAVRVYRMLPPLTKNQRKERKSKSSRDAKSKAKRKSCGDSSPDTFSDGLSSSTLPDD
                     HSSYTVPGYMQDLEVEQALTPALSPCAVSSTLPDWHIPVEVVPDSTSDLYNFQVSPMP
                     STSEATTDEDEEGKLPEDIMKLLEQSEWQPTNVDGKGYLLNEPGVQPTSVYGDFSCKE
                     EPEIDSPGGDIGLSLQRVFTDLKNMDATWLDSLLTPVRLPSIQAIPCAP"
     intron          1375..2737
                     /gene="IRF1"
                     /number=2
     exon            2738..2837
                     /gene="IRF1"
                     /number=3
     intron          2838..3629
                     /gene="IRF1"
                     /number=3
     exon            3630..3806
                     /gene="IRF1"
                     /number=4
     intron          3807..3915
                     /gene="IRF1"
                     /number=4
     exon            3916..3965
                     /gene="IRF1"
                     /number=5
     intron          3966..4072
                     /gene="IRF1"
                     /number=5
...
     exon            5040..5089
                     /gene="IRF1"
                     /number=8
     intron          5090..6247
                     /gene="IRF1"
                     /number=8
     exon            6248..6383
                     /gene="IRF1"
                     /number=9
     intron          6384..6669
                     /gene="IRF1"
                     /number=9
     exon            6670..7656
                     /gene="IRF1"
                     /number=10
     3'UTR           6795..7656
BASE COUNT     1750 a   1946 c   2253 g   1772 t
ORIGIN

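
The CDS of this record is given as a join() of nine segments, so its length has to be summed across exons. The sketch below does that for the IRF1 location string, written in the standard `start..end` notation. It is deliberately minimal: it assumes every segment is a plain `start..end` pair and ignores `complement()`, `<` and `>` markers, and the helper name `segment_lengths` is made up for this example.

```python
import re


def segment_lengths(location: str):
    """Length in bases of every start..end segment in a feature location."""
    return [
        int(end) - int(start) + 1
        for start, end in re.findall(r"(\d+)\.\.(\d+)", location)
    ]


# CDS location of the IRF1 gene record (L05072).
irf1_cds = (
    "join(1288..1374,2738..2837,3630..3806,3916..3965,"
    "4073..4202,4386..4508,5040..5089,6248..6383,6670..6794)"
)

lengths = segment_lengths(irf1_cds)
cds_length = sum(lengths)   # total coding bases across all exons
codons = cds_length // 3    # includes the stop codon

print(lengths)
print(cds_length, codons)
```

A coding length that is a multiple of three is a quick sanity check that the joined segments were copied correctly.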
analyze% typedata -ref est:hum091226f
!!NA_SEQUENCE 1.0
LOCUS       HUM091226F    152 bp    mRNA            EST       02-APR-1996
DEFINITION  Homo sapiens retinal fovea EST HFV091226 sequence.
ACCESSION   L48850
NID         g1254959
KEYWORDS    EST; expressed sequence tag.
SOURCE      Homo sapiens (clone: EST HFV091226) age normalized retinal
            foveae cDNA to mRNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (sites)
  AUTHORS   Adams,M.D., Kerlavage,A.R., Fields,C. and Venter,J.C.
  TITLE     3,400 new expressed sequence tags identify diversity of
            transcripts in human brain
  JOURNAL   Nature Genet. 4 (3), 256-267 (1993)
  MEDLINE   93364420
REFERENCE   2  (sites)
  AUTHORS   Liew,C.C., Hwang,D.M., Fung,Y.W., Laurenssen,C., Cukerman,E.,
            Tsui,S. and Lee,C.Y.
  TITLE     A catalogue of genes in the cardiovascular system as identified
            by expressed sequence tags
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 91 (22), 10645-10649 (1994)
  MEDLINE   95024171
REFERENCE   3  (bases 1 to 152)
  AUTHORS   Bernstein,S.L., Borst,D.E., Neuder,M.E. and Wong,P.
  TITLE     Characterization of a human fovea cDNA library and regional
            differential gene expression in the human retina
  JOURNAL   Genomics 32 (3), 301-308 (1996)
FEATURES             Location/Qualifiers
     source          1..152
                     /organism="Homo sapiens"
                     /note="Expressed sequence tags (first pass sequencing)
                     from randomly selected bacteriophage clones (mRNA-cDNA)
                     from human retinal fovea. The library is age normalized
                     from ten sets of donor foveae 2-79 years old."
                     /db_xref="taxon:9606"
                     /clone="EST HFV091226"
                     /dev_stage="age normalized"
                     /tissue_type="retinal foveae"
     mRNA            <1..>152
                     /standard_name="EST HFV091226"
BASE COUNT       31 a     42 c     41 g     36 t      2 others
ORIGIN

analyze% typedata -ref sts:humswx153
!!NA_SEQUENCE 1.0
LOCUS       HUMSWX153     192 bp    DNA             STS       24-MAY-1993
DEFINITION  Human chromosome X STS sWXD153; single read.
ACCESSION   L15212
NID         g292645
KEYWORDS    STS; primer; sequence tagged site.
SOURCE      Homo sapiens DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 192)
  AUTHORS   Kere,J., Nagaraja,R., Mumm,S.R., Ciccodicola,A., D'Urso,M. and
            Schlessinger,D.
  TITLE     Mapping human chromosomes by walking with sequence-tagged sites
            from end fragments of yeast artificial chromosome inserts
  JOURNAL   Genomics 14, 241-248 (1992)
  MEDLINE   93052321
COMMENT     Submitted by: David Schlessinger, Center for Genetics in Medicine,
            Washington University School of Medicine, Box 8232
            4566 Scott Avenue, St. Louis, MO 63110, USA
            e-mail: [email protected]
            Primer A: TAAAGGGATCGCCAAGGAC
            Primer B: CTTACTCATTTGCTGGATTCTC
            STS size: 85bp
            Template: 600 ng/100ul
            Primer: 40 pmoles/100ul
            dNTPs: 100 uM
            MgCl2: 1.5 mM
            KCl: 100 mM
            TrisHCl: 10 mM
            Taq Polymerase: 0.125 U
            NH4Cl: 5 mM
            pH: 8.6
            Total Vol: 5 ul
            PCR Profile:
            Denaturation: 94 degrees C for 1.00 minute(s)
            Annealing: 55 degrees C for 2.00 minute(s)
            Polymerization: 72 degrees C for 2.00 minute(s)
            PCR Cycles: 35
            Thermal Cycler: P-E.
FEATURES             Location/Qualifiers
     source          1..192
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="Xq13-q24"
     STS             60..144
                     /standard_name="sWXD153"
     primer_bind     60..78
     primer_bind     complement(123..144)
BASE COUNT       72 a     26 c     60 g     29 t      5 others
ORIGIN
analyze%

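
The STS record's numbers are internally consistent, which a short script can confirm: the STS feature spans 60..144, the two primer_bind features match the lengths of Primer A and Primer B, and the second primer binds the complementary strand. The sketch below is an illustration using only values quoted in the record; `reverse_complement` is a generic helper written for this example, not a GCG command.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


primer_a = "TAAAGGGATCGCCAAGGAC"     # primer_bind 60..78
primer_b = "CTTACTCATTTGCTGGATTCTC"  # primer_bind complement(123..144)

sts_start, sts_end = 60, 144
sts_size = sts_end - sts_start + 1   # inclusive coordinates

# Sequence expected on the reported strand at positions 123..144.
primer_b_site = reverse_complement(primer_b)

print(sts_size, len(primer_a), len(primer_b))
print(primer_b_site)
```

Because feature coordinates are inclusive, the amplicon size is end minus start plus one, which gives the 85 bp stated in the COMMENT block.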
Swiss-Prot
http://www.expasy.ch/sprot/
Protein Database
University of Geneva
Arranged by protein function
Release 39.15 March 19, 2001
  94,152 entries
Provides annotated protein records

Swiss-Prot Names
Protein_Species
Allows easier comparisons when studying evolutionary relationships
H1b_Human
  Human histone 1b

Swiss-Prot Names
Vgl*_*
  Viral glycoproteins
VGLG_HRSVL
  Viral GLycoprotein G
  Human Respiratory Syncytial Virus Long strain

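
The Protein_Species convention means an entry name can be split at the underscore, which is what makes cross-species comparisons convenient. A minimal sketch follows, assuming names always contain exactly one meaningful split point at the first underscore; `split_entry_name` is a hypothetical helper, not part of any Swiss-Prot tool.

```python
def split_entry_name(entry_name: str):
    """Split a Swiss-Prot entry name into (protein, species) mnemonics."""
    protein, sep, species = entry_name.upper().partition("_")
    if not sep or not protein or not species:
        raise ValueError(f"not a Protein_Species name: {entry_name!r}")
    return protein, species


# Entry names taken from the slides.
names = ["H1b_Human", "VGLG_HRSVL"]
parsed = [split_entry_name(name) for name in names]

# Select viral glycoproteins, mirroring the Vgl*_* wildcard.
viral_glycoproteins = [p for p in parsed if p[0].startswith("VGL")]

print(parsed)
print(viral_glycoproteins)
```

Filtering on the protein half finds the same protein across organisms, while filtering on the species half lists every entry for one organism.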
analyze% typedata swp:H1b_Human
!!AA_SEQUENCE 1.0
ID   H1B_HUMAN      STANDARD;      PRT;   218 AA.
AC   P10412;
DT   01-MAR-1989 (REL. 10, CREATED)
DT   01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE)
DT   01-JUN-1994 (REL. 29, LAST ANNOTATION UPDATE)
DE   HISTONE H1B (H1.4).
GN   H1F4.
OS   HOMO SAPIENS (HUMAN).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 92009931.
RA   ALBIG W., KARDALINOU E., DRABENT B., ZIMMER A., DOENECKE D.;
RL   GENOMICS 10:940-948(1991).
RN   [2]
RP   SEQUENCE.
RC   TISSUE=SPLEEN;
RX   MEDLINE; 87057092.
RA   OHE Y., HAYASHI H., IWAI K.;
RL   J. BIOCHEM. 100:359-368(1986).
CC   -!- FUNCTION: HISTONES H1 ARE NECESSARY FOR THE CONDENSATION OF
CC       NUCLEOSOME CHAINS INTO HIGHER ORDER STRUCTURES.
CC   -!- SUBCELLULAR LOCATION: NUCLEAR.
CC   -!- THIS VARIANT ACCOUNTS FOR 60% OF HISTONE H1.
DR   EMBL; M60748; G184074; -.
DR   PIR; A24413; HSHU1B.
DR   PIR; C40335; C40335.
DR   HSSP; P08287; 1GHC.
KW   CHROMOSOMAL PROTEIN; NUCLEAR PROTEIN; DNA-BINDING; MULTIGENE FAMILY;
KW   ACETYLATION; METHYLATION.
FT   INIT_MET      0      0
FT   MOD_RES       1      1       ACETYLATION.
FT   MOD_RES      25     25       METHYLATION (PARTIAL).
FT   DOMAIN       35    113       GLOBULAR.
SQ   SEQUENCE   218 AA;  21734 MW;  5A277FB0 CRC32;
H1B_HUMAN  Length: 218  April 13, 1998 13:19  Type: P  Check: 2701  ..
  1 SETAPAAPAA PAPAEKTPVK KKARKSAGAA KRKASGPPVS ELITKAVAAS
 51 KERSGVSLAA LKKALAAAGY DVEKNNSRIK LGLKSLVSKG TLVQTKGTGA
101 SGSFKLNKKA ASGEAKPKAK KAGAAKAKKP AGAAKKPKKA TGAATPKKSA
151 KKTPKKAKKP AAAAGAKKAK SPKKAKAAKP KKAPKSPAKA KAVKPKAAKP
201 KTAKPKAAKP KKAAAKKK
analyze%

Swiss-Prot TrEMBL
Translation of all EMBL Nucleic Acid coding sequences not yet present in Swiss-Prot
Allows rapid availability without immediate annotation
Release 16.3 March 30, 2001
  436,896 entries

TrEMBL Divisions
Everything in TrEMBL: spt
  sp_bacteria
  sp_fungi
  sp_human
  sp_invertebrate
  sp_mammal
  sp_mhc
  sp_organelle
  sp_phage
  sp_plant
  sp_rodent
  sp_unclassified
  sp_vertebrate

Protein Identification Resource - PIR
http://pir.georgetown.edu/
National Biomedical Research Foundation
Georgetown University
Current Release 67.05 March 23, 2001
  219,178 Entries

National Biomedical Research Foundation
Database begun over twenty years ago by Margaret O. Dayhoff
Originally published sequences in book form
Started with sequences derived from direct amino acid sequencing
analyze% typedata -ref PIR1:HSHU1B
!!AA_SEQUENCE 1.0
P1;HSHU1B - histone H1-4 - human
N;Alternate names: histone H1.4; histone H1b
C;Species: Homo sapiens (man)
C;Date: 31-Dec-1988 #sequence_revision 12-Apr-1996 #text_change 05-Sep-1997
C;Accession: C40335; A24413
R;Albig, W.; Kardalinou, E.; Drabent, B.; Zimmer, A.; Doenecke, D.
Genomics 10, 940-948, 1991
A;Title: Isolation and characterization of two human H1 histone genes within clusters of core histone genes.
A;Reference number: A40335; MUID:92009931
A;Accession: C40335
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-219 <ALB>
A;Cross-references: GB:M60748; NID:g184073; PID:g184074
A;Experimental source: blood
R;Ohe, Y.; Hayashi, H.; Iwai, K.
J. Biochem. 100, 359-368, 1986
A;Title: Human spleen histone H1. Isolation and amino acid sequence of a main variant, H1b.
A;Reference number: A24413; MUID:87057092
A;Accession: A24413
A;Molecule type: protein
A;Residues: 2-219 <OHE>
A;Experimental source: spleen
C;Comment: This variant accounts for 60% of histone H1.
C;Genetics:
A;Gene: GDB:H1F4
A;Cross-references: GDB:120030; OMIM:142220
A;Map position: 12q11-12q21
C;Superfamily: histone H1
C;Keywords: acetylated amino end; chromosomal protein; DNA binding; methylated amino acid; nucleosome; spleen
F;2-219/Product: histone H1-4 #status experimental <MAT>
F;2-32/Domain: amino-terminal <NH2>
F;33-110/Domain: globular <GLB>
F;111-219/Domain: carboxyl-terminal <END>
F;2/Modified site: acetylated amino end (Ser) (in mature form) #status experimental
F;26/Modified site: N6-methyllysine (Lys) (partial) #status experimental

iProClass Database - PIR
http://pir.georgetown.edu/iproclass/
Comprehensive family relationships and structural/functional classifications and features of proteins
  Superfamilies
  Families
  Domains

![Page 53: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/53.jpg)
GCG Supplied Databases
GCG sequence database files are NOT normal UNIX files.
UNIX commands cannot be used to manipulate sequences in these databases.
Stored as Data Libraries
Stored in Oracle RDB
![Page 54: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/54.jpg)
Sequence Data Updates
Genbank: updated daily
GCG Flat file: no longer updated; last update June 2000
GCG SeqStore (Oracle RDB): daily updates
![Page 55: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/55.jpg)
Database listing – GCG-FF
Databases available:
GenBank Release 118.0 (06/2000)
EMBL (Abridged) Release 62.0 (03/2000)
PIR-Protein Release 65.0 (06/2000)
NRL_3D Release 27.0 (03/2000)
SWISS-PROT Release 39.0 (06/2000)
SP-TREMBL Release 14.0 (06/2000)
PROSITE Release 16.0 (07/1999)
Restriction Enzymes (REBASE) (06/2000)
![Page 56: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/56.jpg)
Database listing – SeqStore
Databases available:
GCGNUC updated nightly by DATASERVE
GCGPROT updated weekly by DATASERVE
GCGEST updated nightly by DATASERVE
PROSITE Release 15.0 (07/1999)
Restriction Enzymes (REBASE) (06/2000)
![Page 57: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/57.jpg)
Data Libraries
Allows rapid searches
Sequences organized into groups
Each data library can be referred to by a logical name
Individual sequences can be extracted from the data library
![Page 58: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/58.jpg)
Logical Names: GCG Sequence Databases
http://www.microbio.uab.edu/seqCourse/datalib.htm
![Page 59: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/59.jpg)
GCG SeqStore (Oracle-based Sequences)
Data Library Names
![Page 60: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/60.jpg)
Database Name    Description
Nucleic Acid Sequences
gcgnuc           All Genbank nucleotide sequences (except ESTs), updated nightly by SeqStore
gcgest           All Genbank Expressed Sequence Tags, updated nightly by SeqStore
Protein Sequences
gcgprot          All Swissprot and Swissprot TrEMBL sequences, updated nightly by SeqStore
![Page 61: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/61.jpg)
GCG Flat-file
Data Library Names
![Page 62: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/62.jpg)
Nucleic Acid Databases (Genbank and EMBL)
Database Name(s)                                   Description
GenEMBL, GE                                        Entire database (except tags)
genemblplus, gep, geplus                           Entire database (including tags)
Bacterial, Bacteria, Ba                            Bacterial sequences
HTG                                                High throughput genome
Invertebrate, In                                   Invertebrate sequences
Organelle, Or                                      Organelle sequences
Other_Mammalian, OtherMammal, OtherMamm, Om        Non-rodent, non-primate mammalian sequences
Other_Vertebrate, Ov, OtherVertebrate, OtherVert   Non-mammalian vertebrate sequences
![Page 63: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/63.jpg)
Nucleic Acid Databases…
Database Name(s)                 Description
Patent, Pat                      Sequences from patents and patent applications
Phage, Ph                        Phage sequences
Plant, Pl                        Plant and fungal sequences
Primate, Pr                      Primate (mammalian) sequences
Rodent, Ro                       Rodent (mammalian) sequences
Structural_RNA, Structural, St   Structural RNA sequences (such as rRNAs)
Synthetic, Sy                    Synthetic sequences
Unannotated, Un                  Unannotated sequences
Viral, Vi                        Viral sequences
![Page 64: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/64.jpg)
Sequence Tag Databases
Database Name(s)   Description
EST                Expressed sequence tags
GSS                Genome survey sequences
STS                Sequence-tagged site sequences
Tags               EST, STS, and GSS
![Page 65: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/65.jpg)
Protein Databases
Database Name(s)             Description
PIR, P                       Entire PIR-Protein Protein Sequence Data Library
Protein, Prot, PIR1          PIR-Protein annotated sequences
New, Nw                      PIR-Protein preliminary and unverified sequences
PIR2                         PIR-Protein preliminary sequences
PIR3                         PIR-Protein unverified sequences
SwissProt, Swiss             Entire SwissProt Protein Sequence Data Library
Sptrembl, spt                Newly added preliminary sequences, translated from EMBL
swissprotplus, swplus, swp   SwissProt + SPTrEMBL
![Page 66: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/66.jpg)
NCBI Blast Databases
![Page 67: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/67.jpg)
Nucleotide Databases for NetBlast Searching
nr Non-redundant Genbank+EMBL+DDBJ+PDB sequences (but no ESTs or STSs)
pdb PDB nucleotide sequences
vector Vector subset of Genbank
yeast Saccharomyces cerevisiae genomic nucleotide sequences
est Non-redundant Database of Genbank+EMBL+DDBJ EST Division
sts Non-redundant Database of Genbank+EMBL+DDBJ STS Division
htgs High Throughput Genomic Sequences
mito Database of mitochondrial sequences, Rel. 1.0, July 1995
kabat Kabat Sequences of Nucleic Acid of Immunological Interest
epd Eukaryotic Promoter Database
alu Select Alu Repeats from REPBASE
gss Genome Survey Sequence, includes single_pass genomic data
ecoli E. coli genomic nucleotide sequences
Drosophila genome Drosophila genome provided by Celera and Berkeley
month All new or revised Genbank+EMBL+DDBJ+PDB sequences released in the last 30 days
![Page 68: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/68.jpg)
Protein Databases for NetBlast Searching
nr Non-redundant Genbank CDS translations+PDB+SwissProt+PIR
pdb PDB protein sequences
swissprot SwissProt sequences
yeast Saccharomyces cerevisiae protein sequences
kabat Kabat Sequences of Proteins of Immunological Interest
alu Translations of Select Alu Repeats from REPBASE
ecoli E. coli genomic CDS translations
Drosophila genome Drosophila genome proteins provided by Celera and Berkeley
month All new or revised Genbank CDS translation+PDB+SwissProt+PIR sequences released in the last 30 days
![Page 69: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/69.jpg)
Specifying Sequences
Filename
Data library specification
Accession number specification
![Page 70: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/70.jpg)
Sequences within your own directories
Use the normal file specification:
lefkowit/sequences/vsvcg.seq
![Page 71: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/71.jpg)
Sequences within a Data Library
Flatfile
Data Library:Sequence Name
sw:vglg_vsvsj - VSV G protein in the SwissProt library
primate:humada - the sequence for human adenosine deaminase mRNA
SeqStore
gcgprot:vglg_vsvsj
gcgnuc:humada
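To make the `library:name` convention concrete, here is a small Python sketch that separates a specification into its data library and sequence name, and treats anything without a library prefix as an ordinary file path. The function `parse_sequence_spec` is hypothetical and written only for this illustration; lower-casing the library name is an assumption, since the slides do not say how GCG treats case.

```python
def parse_sequence_spec(spec):
    """Split a 'library:name' specification into (library, name).

    A specification with no library prefix, such as a path inside your own
    directories, is returned as (None, spec).
    """
    library, colon, name = spec.partition(":")
    # Treat the text before the colon as a library only if it looks like a
    # logical name rather than part of a directory path.
    if colon and library and "/" not in library:
        return library.lower(), name  # lower-casing is an assumption
    return None, spec


if __name__ == "__main__":
    for spec in (
        "sw:vglg_vsvsj",
        "primate:humada",
        "gcgnuc:humada",
        "lefkowit/sequences/vsvcg.seq",
    ):
        print(spec, "->", parse_sequence_spec(spec))
```

The same sequence can therefore be reached through different libraries (`primate:humada` in the flat-file system, `gcgnuc:humada` in SeqStore); only the prefix changes.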
![Page 72: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/72.jpg)
Sequence Formats
GCG requires a specific sequence format
Sequences entered from outside GCG must be reformatted
analyze% reformat (GCG program)
analyze% readseq (non-GCG addition)
![Page 73: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/73.jpg)
Non-GCG Sequence File
analyze% cat seq.txt
ACGAAGACAAACAAACCATTATTATCATTAAAAGGCTC
AGGAGAAACTTTAACAGTAATCAAAATGTCTGTTACAG
TCAAGAGAATCATTGACAACACAG
analyze%
![Page 74: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/74.jpg)
analyze% reformat
analyze% reformat -check seq.txt
Reformat rewrites sequence file(s), scoring matrix file(s), or enzyme
data file(s) so that they can be read by GCG programs.
Minimal Syntax: % reformat [-INfile=]reformat.txt -Default
Prompted Parameters: None
Local Data Files:
-DATa=translate.txt three-letter to one-letter codes
![Page 75: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/75.jpg)
Optional Parameters:
[-OUTfile=]NewSeqName names the output file
-EXTension=.seq specifies a file name extension for the output
-LIStfile[=reformat.list] writes a list file of output sequence names
-MSF reformats sequences into an MSF output file
-RSF reformats sequences into an RSF output file
-PROtein or -NUCleotide insists that the sequences are reformatted as protein or nucleotide sequences
-DEGap removes gap characters (. and ~) from the sequence
-LINesize=50 sets number of characters per line
-BLOcksize=10 sets number of characters per block
-BLAnklines=1 puts blank lines between the sequence lines
-NONUMbering suppresses numbering
-NOCOMments suppresses comments
-DNA changes U into T
-RNA changes T into U
-UPPer makes all sequence characters uppercase
-LOWer makes all sequence characters lowercase
-ONEIntothree translates one-letter peptides into three-letter
-THReeintoone translates three-letter peptides into one-letter
-NOHEAding input sequence from stdin contains no header information
![Page 76: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/76.jpg)
-COMparison reformats a scoring matrix instead of a sequence (used with -PROtein or -NUCleotide, insists that the matrix is reformatted as a protein or nucleotide scoring matrix)
-GAPweight=12 specifies the gap creation penalty associated with the scoring matrix
-LENgthweight=4 specified the gap extension penalty associated with the scoring matrix
-SCAle=10 multiplies each value in the scoring matrix by 10 (use any number from .01 to 100.0)
-EQUALSformat writes the scoring matrix in a form that may be more easily read
-OLDCMPformat converts a pre-Version 9 scoring matrix into a Version 9 scoring matrix (all options used with -COMparison can also be used with -OLDCMPformat. -PROtein or -NUCleotide must be specified with -OLDCMPformat)
-TRANSlate=filename.txt lets you name the translation table
-NOMONitor suppresses the screen trace showing each output file

Add what to the command line ?

No ".." divider
seq.txt length: 100 bp
analyze%
![Page 77: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/77.jpg)
analyze% cat seq.txt
!!NA_SEQUENCE 1.0
 REFORMAT of: seq.txt check: 3430 from: 1 to: 100 April 9, 1998 14:31
 (No documentation)
 seq.txt Length: 100 April 9, 1998 14:31 Type: N Check: 3430 ..
       1 ACGAAGACAA ACAAACCATT ATTATCATTA AAAGGCTCAG GAGAAACTTT
      51 AACAGTAATC AAAATGTCTG TTACAGTCAA GAGAATCATT GACAACACAG
analyze%
Reformatted Sequence
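The reformatted file has two parts: free-text documentation ending in a line that closes with `..`, and then numbered blocks of sequence. A minimal Python sketch of reading such a file back into a plain string is shown below. It is an illustration based only on the layout of this slide, not the GCG reader; the function name `read_gcg_sequence` is made up for the example, and it assumes `..` does not occur anywhere in the documentation before the divider.

```python
def read_gcg_sequence(text):
    """Return the sequence letters that follow the '..' divider.

    Position numbers, spaces and line breaks are dropped. Gap characters
    (. and ~) are dropped as well in this sketch.
    """
    header, divider, body = text.partition("..")
    if not divider:
        # Mirrors the situation reformat reports as: No ".." divider
        raise ValueError('no ".." divider found')
    return "".join(ch for ch in body if ch.isalpha()).upper()


if __name__ == "__main__":
    sample = (
        "!!NA_SEQUENCE 1.0\n"
        " REFORMAT of: seq.txt check: 3430 from: 1 to: 100\n"
        " seq.txt Length: 100 Type: N Check: 3430 ..\n"
        "       1 ACGAAGACAA ACAAACCATT ATTATCATTA AAAGGCTCAG GAGAAACTTT\n"
        "      51 AACAGTAATC AAAATGTCTG TTACAGTCAA GAGAATCATT GACAACACAG\n"
    )
    sequence = read_gcg_sequence(sample)
    print(len(sequence), sequence[:10])
```

A raw file such as the original seq.txt has no divider, which is why it has to pass through reformat before other GCG programs will read it.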
![Page 78: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/78.jpg)
GCG Sequence Import Programs
fromstaden fromembl fromgenbank frompir fromig fromfasta fromtrace
![Page 79: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/79.jpg)
GCG Sequence Export Programs
tostaden topir toig tofasta
![Page 80: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/80.jpg)
ReadSeq
General reformatting program
![Page 81: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/81.jpg)
analyze% readseq
analyze% readseq
readSeq (1Feb93), multi-format molbio sequence reader.
Name of output file (?=help, defaults to display):
seq.fasta
  1. IG/Stanford          10. Olsen (in-only)
  2. GenBank/GB           11. Phylip3.2
  3. NBRF                 12. Phylip
  4. EMBL                 13. Plain/Raw
  5. GCG                  14. PIR/CODATA
  6. DNAStrider           15. MSF
  7. Fitch                16. ASN.1
  8. Pearson/Fasta        17. PAUP/NEXUS
  9. Zuker (in-only)      18. Pretty (out-only)
Choose an output format (name or #):
8
![Page 82: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/82.jpg)
Name an input sequence or -option:
seq.txt
Name an input sequence or -option:

analyze% cat seq.fasta
>seq.txt, 100 bases, D66 checksum.
ACGAAGACAAACAAACCATTATTATCATTAAAAGGCTCAGGAGAAACTTTAACAGTAATCAAAATGTCTGTTACAGTCAAGAGAATCATTGACAACACAG
analyze%
ReadSeq Formatted Sequence
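For comparison with the GCG layout, the Pearson/FASTA output above is simply a `>` header line followed by sequence lines. The Python sketch below reads that layout into (header, sequence) pairs. It is a generic illustration, not part of readseq, and the function name `read_fasta` is chosen for this example.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = (
        ">seq.txt, 100 bases, D66 checksum.\n"
        "ACGAAGACAAACAAACCATTATTATCATTAAAAGGCTCAGGAGAAACTTT\n"
        "AACAGTAATCAAAATGTCTGTTACAGTCAAGAGAATCATTGACAACACAG\n"
    )
    for name, sequence in read_fasta(sample):
        print(name, len(sequence))
```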
![Page 83: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/83.jpg)
Sequence File Utilities
Chopup
Break up long lines in a text file prior to running reformat
Breakup
Break up long sequences into individual, overlapping sequence files
![Page 84: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/84.jpg)
>uunt, 751719 bases, 1F08 checksum.
ATGGCTAATAATTATCAAACTTTATATGATTCAGCAATAAAAAGGATTCCATACGATCTTATTTCTGATCAAGCTTATGCAATTCTACAAAATGCTAAAACTCATAAAGTTTGCGATGGTGTTTTATATATAATTGTAGCCAATGCCTTTGAAAAAAGTATTATTAACGGTAATTTTATTAACATTATTTCTAAATATCTAAGCGAAGAATTCAAAAAGGAAAATATTGTTAATTTTGAATTTATTATAGACAATGAAAAATTATTAATTAATAGCAATTTTTTAATTAAAGAAACTAATATTAAAAATCGTTTTAATTTTAGTGATGAACTTTTACGTTACAATTTTAACAATTTAGTAATTAGTAATTTTAATCAAAAAGCGATTAAGGCGATTGAAAATTTATTTTCAAATAACTATGATAATAGTTCAATGTGTAACCCTTTATTTTTATTTGGTAAAGTTGGTGTTGGTAAAACGCATATCGTGGCTGCTGCTGGTAATCGTTTTGCTAATAGTAATCCTAATTTAAAAATTTATTATTATGAAGGGCAAGATTTTTTTCGAAAGTTTTGTTCTGCTTCGTTAAAAGGGACTAGTTATGTTGAAGAGTTTAAAAAAGAAATTGCTTCAGCAGATTTATTAATTTTTGAAGATATTCAAAATATCCAATCACGTGATTCAACGGCTGAATTGTTTTTTAATATCTTTAATGATATAAAATTAAATGGTGGAAAAATTATCTTAACATCTGACCGTACACCAAACGAACTTAATGGTTTTCATAATCGAATTATTTCGAGATTAGCGTCAGGTTTGCAGTGTAAAATTTCTCAACCCGACAAAAATGAAGCTATTAAAATTATTAATAATTGGTTTGAATTCAAAAAAAAATATCAAATTACTGACGAAGCTAAAGAATATATTGCTGAAGGTTTTCACACTGATATTAGACAGATGATtGGTAATCTAAAACAAATTTGTTTTTGAGCGGACAATGATACTAATAAAGATTTAATAATCACAAAAGATTATGTAATTGAGTGTTCAGTTGAAAACGAAATTCCACTAAATATTGTTGTTAAAAAACAATTTAAACC
![Page 85: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/85.jpg)
analyze% readseq
readSeq (1Feb93), multi-format molbio sequence reader.
Name of output file (?=help, defaults to display):
uunt.seq
  1. IG/Stanford          10. Olsen (in-only)
  2. GenBank/GB           11. Phylip3.2
  3. NBRF                 12. Phylip
  4. EMBL                 13. Plain/Raw
  5. GCG                  14. PIR/CODATA
  6. DNAStrider           15. MSF
  7. Fitch                16. ASN.1
  8. Pearson/Fasta        17. PAUP/NEXUS
  9. Zuker (in-only)      18. Pretty (out-only)
Choose an output format (name or #):
5
Name an input sequence or -option:
uunt
Name an input sequence or -option:
![Page 86: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/86.jpg)
analyze% more uunt.seq
uunt uunt, Length: 751719 (today) Check: 7944 ..
      1 ATGGCTAATA ATTATCAAAC TTTATATGAT TCAGCAATAA AAAGGATTCC
     51 ATACGATCTT ATTTCTGATC AAGCTTATGC AATTCTACAA AATGCTAAAA
    101 CTCATAAAGT TTGCGATGGT GTTTTATATA TAATTGTAGC CAATGCCTTT
    151 GAAAAAAGTA TTATTAACGG TAATTTTATT AACATTATTT CTAAATATCT
    201 AAGCGAAGAA TTCAAAAAGG AAAATATTGT TAATTTTGAA TTTATTATAG
    251 ACAATGAAAA ATTATTAATT AATAGCAATT TTTTAATTAA AGAAACTAAT
    301 ATTAAAAATC GTTTTAATTT TAGTGATGAA CTTTTACGTT ACAATTTTAA
    351 CAATTTAGTA ATTAGTAATT TTAATCAAAA AGCGATTAAG GCGATTGAAA
    401 ATTTATTTTC AAATAACTAT GATAATAGTT CAATGTGTAA CCCTTTATTT
    451 TTATTTGGTA AAGTTGGTGT TGGTAAAACG CATATCGTGG CTGCTGCTGG
    501 TAATCGTTTT GCTAATAGTA ATCCTAATTT AAAAATTTAT TATTATGAAG
    551 GGCAAGATTT TTTTCGAAAG TTTTGTTCTG CTTCGTTAAA AGGGACTAGT
...
 751301 GAAAATAAAC TACGATTTGA TTAGAATGAA TTTTTTGTTG TTTCTTAATT
 751351 GTATCAAGTA TATCTTCATT TTTTTTTAGA CTAATAAAAT TAGCCATAAA
 751401 AATTATTTTT CACTAGAAAC TGTTAGACTA TGACGCCCTT TAAGTCTTCT
 751451 TCTAGCTAAA ACATTACGCC CATTTTTTGT TTTCATGCGT GCACGAAAAC
 751501 CATGCACTTT TGCTCTTTTA CGATTATTAG GTTGAAACGT TCTTTTCATA
 751551 AATCCACCGC CCTCTTACTT TTTTGAAAAC ATAATATGGA TTATTATAAC
 751601 ATTTTAGTTA TTTTTTATTT AATATATTTT TTTAAAAAAG TCAATGATAT
 751651 CTTTTTAAAA ATAAACATAT ATAATATGAT AATAGGACAA AGATTATTTA
 751701 TAAAAAATAG AGGTTACTA
![Page 87: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/87.jpg)
analyze% map uunt.seq
Map maps a DNA sequence and displays both strands of the mapped sequence
with restriction enzyme cut points above the sequence and protein
translations below. Map can also create a peptide map of an amino acid
sequence.
***Error: Sequence "uunt.seq" could not be read or is not in GCG format

analyze% breakup uunt.seq
BreakUp reads a GCG-format sequence file containing more than 350,000
sequence characters and writes it as a set of separate, shorter,
overlapping sequence files that can be analyzed by Wisconsin Package programs.
uunt_0.seq length: 110000 bp
uunt_1.seq length: 110000 bp
uunt_2.seq length: 110000 bp
uunt_3.seq length: 110000 bp
uunt_4.seq length: 110000 bp
uunt_5.seq length: 110000 bp
uunt_6.seq length: 110000 bp
uunt_7.seq length: 51719 bp
analyze%
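The file lengths printed by BreakUp are consistent with 110,000-base pieces that each start 100,000 bases after the previous one, that is, with a 10,000-base overlap. That overlap is inferred from the output shown here, not taken from GCG documentation. Under that assumption, the Python sketch below computes the same set of ranges; `breakup_ranges` and its default values are hypothetical names and numbers chosen to match this example.

```python
def breakup_ranges(length, chunk=110_000, overlap=10_000):
    """Return half-open (start, end) ranges of overlapping pieces.

    chunk and overlap are assumptions inferred from the BreakUp output
    on the slide, not documented GCG defaults.
    """
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be smaller than chunk")
    if length <= 0:
        return []
    step = chunk - overlap
    ranges = []
    start = 0
    while True:
        end = min(start + chunk, length)
        ranges.append((start, end))
        if end == length:
            break  # this piece reaches the end of the sequence
        start += step
    return ranges


if __name__ == "__main__":
    for index, (start, end) in enumerate(breakup_ranges(751_719)):
        print(f"uunt_{index}.seq length: {end - start} bp")
```

Because neighbouring pieces share their edges, a feature that spans a cut point still appears whole in at least one piece, provided it is shorter than the overlap.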
![Page 88: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/88.jpg)
Specifying Multiple Sequences
![Page 89: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/89.jpg)
Multiple sequences
If the program prompts with: sequence(s), file(s), or file name(s), then it can accept more than one input file
![Page 90: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/90.jpg)
Specifying Multiple Sequences
Wild Card Specification
File of File Names
List Files
Multiple Sequence Format File
![Page 91: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/91.jpg)
Wild card specification (flatfile)
GenEMBL:* All sequences in Genbank and EMBL
Primate:* All primate sequences in GenBank
Primate:Hum* All Human sequences in GenBank
EMBL uses HS for human
![Page 92: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/92.jpg)
Wild card specification (SeqStore)
gcgnuc:* All sequences in Genbank and EMBL
Must create a query or list for most groupings
![Page 93: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/93.jpg)
File of Sequence Names
List Files
You or certain GCG programs can construct a file containing any number of sequence names.
![Page 94: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/94.jpg)
Specify as @Sequence_names.fil
The @ tells the program that Sequence_names.fil is a file of sequence names
The program uses all listed sequences
![Page 95: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/95.jpg)
Contents of a File of Sequence Names
Begin with a comment
Sequence file names follow a double period at the end of a line: ..
Other comments can be included if preceded by a !
One sequence name per line
![Page 96: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/96.jpg)
File of Sequence Names...
Put an ! in front of a name to have the program ignore that particular entry
A sequence name may include a wild card
The file can contain another file of sequence names as a listing
It must be preceded by an @
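The rules on the last two slides can be read as a small parsing procedure. The Python sketch below follows them: skip everything up to the line ending in `..`, ignore lines that start with `!`, take one name per line, and expand `@` entries as nested list files. It is an illustration of the rules as stated on the slides, not GCG's own reader; `read_list_file` and its `load` callback are hypothetical names, and wild cards and trailing options are left in the names untouched.

```python
def read_list_file(text, load=None):
    """Return the sequence names listed in a file of sequence names.

    text is the content of the list file. load, if given, is a callable
    that takes a nested list file name (the part after @) and returns
    that file's text.
    """
    names = []
    seen_divider = False
    for raw in text.splitlines():
        line = raw.strip()
        if not seen_divider:
            # Everything up to and including the line ending in ".."
            # is the opening comment.
            if line.endswith(".."):
                seen_divider = True
            continue
        if not line or line.startswith("!"):
            continue  # blank line, comment, or an entry switched off with !
        if line.startswith("@") and load is not None:
            names.extend(read_list_file(load(line[1:]), load))
            continue
        names.append(line)
    return names


if __name__ == "__main__":
    main_list = (
        "Heat shock proteins, January 21, 1998 ..\n"
        "SWP:Hs70_Human\n"
        "!SWP:Hs70_Mouse\n"
        "SWP:GR78_Yeast -BEGin=43 -END=682\n"
        "@more.fil\n"
    )
    nested = {"more.fil": "A nested list ..\nSWP:DNAK_EColi\n"}
    print(read_list_file(main_list, nested.__getitem__))
```

Passing `load` as a callback keeps the sketch independent of where the nested files live; in real use it could simply open and read the named file.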
![Page 97: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/97.jpg)
hsp70.fil File
January 21, 1998 ..
SWP:Hs70_Brelc
SWP:Hs70_Chick
SWP:Hs70_Human
SWP:Hs70_Leido
SWP:Hs70_Leima
SWP:Hs70_Maize
SWP:Hs70_Mouse
SWP:Hs70_Pethy
SWP:HS77_Yeast
SWP:GR78_Yeast -BEGin=43 -END=682
sequences/hsp70/ssa4.pep
ob0/users/lefkowit/sequences/hsp70/ssa1.pep
SWP:DNAK_EColi
![Page 98: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/98.jpg)
Multiple Sequence Files (msf)
File containing multiple sequences that are related and have been aligned
Specifying msf files: filename.msf{*}
The {*} indicates which sequences are to be used
You can exclude a sequence in subsequent analyses by preceding its name within the msf file with an ! sign.
![Page 99: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/99.jpg)
hsp70.msf
PileUp of: @Hsp70.Fil
Symbol comparison table: GenRunData:NWSGapPep.Cmp CompCheck: 1254
GapWeight: 3.0
GapLengthWeight: 0.1
Pileup.Msf MSF: 738 Type: P December 26, 1990 13:39 Check: 288 ..
Name: Hs70_Plafa Len: 738 Check: 9820 Weight: 1.00
Name: Hs70_Thean Len: 738 Check: 120 Weight: 1.00
!Name: Hs70_Leido Len: 738 Check: 7985 Weight: 1.00
//
           1                                                   50
Hs70_Plafa .......... .....MASAK GSKPNLPESN IAIGIDLGTT YSCVGVWRNE
Hs70_Thean .......... .......... .......MTG PAIGIDLGTT YSCVAVYKDN
Hs70_Leido .......... .......... ......MTFD GAIGIDLGTT YSCVGVWQNE

           51                                                 100
Hs70_Plafa NVDIIANDQG NRTTPSYVAF T.DTERLIGD AAKNQVARNP ENTVFDAKRL
Hs70_Thean NVEIIPNDQG NRTTPSYVAF T.DTERLIGD AAKNQEARNP ENTIFDAKRL
Hs70_Leido RVDIIANDQG NRTTPSYVAF TSDSERLIGD AAKNQVAMNP HNTVFDAKRL
![Page 100: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/100.jpg)
rsf Files
Rich Sequence Format
Allows entry of additional information about each sequence
File can contain multiple sequences
Allows gaps
Different sequences do not need to be related
Create and edit rsf files within SeqLab
![Page 101: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/101.jpg)
rsf Sequence Information
Creator/author of the sequence
Sequence weight
Creation date
One-line description of the sequence
Offset, or the number of leading gaps in a sequence that is part of an alignment or fragment assembly project
Known sequence features
![Page 102: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/102.jpg)
rsf File Specification
Similar to msf files
hsp70.rsf{*} - use all the sequences in the file
hsp70.rsf{hs70_human} - use only this single sequence
hsp70.rsf{hs70*} - use only sequences whose names start with hs70
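The `file{pattern}` notation used for rsf and msf files amounts to a file name plus a wild-card filter over the sequence names inside it. As a rough illustration, the Python sketch below splits such a specification and applies the pattern to a list of names with the standard `fnmatch` module. The function `select_sequences` is hypothetical, and matching without regard to case is an assumption based on the mixed-case names in these slides.

```python
import fnmatch
import re

# Assumed shape of a specification: file name, then a pattern in braces.
SPEC_RE = re.compile(r"^(?P<file>[^{}]+)\{(?P<pattern>[^{}]*)\}$")


def select_sequences(spec, names_in_file):
    """Return (file name, names matching the pattern inside the braces)."""
    match = SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"not a file{{pattern}} specification: {spec!r}")
    pattern = match.group("pattern").lower()
    # Compare lower-cased names so that hs70* also matches Hs70_Human
    # (case-insensitive matching is an assumption of this sketch).
    chosen = [
        name
        for name in names_in_file
        if fnmatch.fnmatchcase(name.lower(), pattern)
    ]
    return match.group("file"), chosen


if __name__ == "__main__":
    names = ["Hs70_Human", "Hs70_Mouse", "DnaK_Ecoli"]
    for spec in ("hsp70.rsf{*}", "hsp70.rsf{hs70_human}", "hsp70.rsf{hs70*}"):
        print(spec, "->", select_sequences(spec, names))
```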
![Page 103: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/103.jpg)
analyze% more rsb.rsf
!!RICH_SEQUENCE 1.0
..
{
name dc-62-18537
descrip Description: PileUp of: *.seq
type DNA
longname dc-62-18537
checksum 8717
creation-date 4/10/98 15:45:50
strand 1
sequence
 TCCACCGTGCTCGACACAATCACTCCAAAATACACAATCCAACAGCAATCCCTCCACTCA
 ACCACCTCCGAAAACACACCCAGCTCCACACAAATACCCACAGCATCCGAGCCCTCCACA
 TTAAATCCTAAT
}
{
name swed-60-860
descrip Description: PileUp of: *.seq
type DNA
longname swed-60-860
checksum 8595
creation-date 4/10/98 15:45:50
strand 1
sequence
 TCCACCGTGATCGACACAATCACTCCAAAATACACAATCCAACAGCAATCCCTCCACTCA
 ACCACCTCCGAAAACACACCCAGCTCCACACAAATACCCACAGCATCCGAGCCCTCCACA
 TCAAATCCTACT
}
![Page 104: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/104.jpg)
Finding and Displaying Sequences
![Page 105: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/105.jpg)
List Refinement
Run search program 1
Create a list of file names
Use as input to search program 2
Create a second list of file names
Edit the list file at each step as necessary
etc.
![Page 106: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/106.jpg)
Programs Which Create a List of Sequences
Names Blast Lookup StringSearch FindPatterns FastA TFastA
![Page 107: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/107.jpg)
Names
Searches sequence names for a match
analyze% names primate:Hum*
Creates a list file of the human sequences in GenBank (primate entries whose names begin with Hum)
Depends on knowing how entries are named in each database
GenBank:Hum*
EMBL:Hs*
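The effect of a wildcard specification such as Hum* can be sketched in Python with shell-style pattern matching. This is only an illustration of the idea, not how Names is implemented: matches_spec is a hypothetical helper, the case-insensitive comparison is an assumption, and the last entry name in the sample list is made up.

```python
from fnmatch import fnmatchcase

def matches_spec(entry_names, pattern):
    """Return the entry names that match a shell-style wildcard pattern.

    The comparison is case-insensitive; that is an assumption of this sketch.
    """
    pat = pattern.lower()
    return [name for name in entry_names if fnmatchcase(name.lower(), pat)]

# The first three names appear in the Names output; the last is hypothetical.
names = ["huma1aadr", "huma1acm", "huma1at", "mmu12345"]
print(matches_spec(names, "Hum*"))
```

A search for Hum* finds nothing in a database that names its human entries Hs*, which is why the naming convention has to be known in advance.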
![Page 108: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/108.jpg)
analyze% names -check pr:huma*

Names identifies GCG data files and sequence entries by name. It can
show you what set of sequences is implied by any sequence specification.

Minimal Syntax: % names [-INfile=]GenEMBL:Humhb* -Default

Prompted Parameters:

[-OUTfile=]Term     output file name (defaults to your terminal)

Options:

-SHOwfiles=132      limits documentation in the output file to column 132
-NOHEAding          suppresses the heading at the top of the file.
-NOMONitor          suppresses the screen monitor

Add what to the command line ?

What (file of filenames) output file (* TERM *) ?

gb_pr1:
huma1aadr huma1acm huma1acmb huma1ar1
huma1ar2 huma1at huma1ata huma1atb
![Page 109: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/109.jpg)
analyze% more list.file
!!SEQUENCE_LIST 1.0
! NAMES from: pr:huma* April 13, 1998 14:55 ..
gb_pr1:huma1aadr LOCUS HUMA1AADR 2002 bp mRNA PRI 04-NOV-1991 DEFINITION Human alpha-A1-adrenergic receptor mRNA, complete cds. ACCE
gb_pr1:huma1acm LOCUS HUMA1ACM 1520 bp mRNA PRI 30-OCT-1994 DEFINITION Human alpha-1-antichymotrypsin (AACT) mRNA, complete cds. ACC
gb_pr1:huma1acmb LOCUS HUMA1ACMB 559 bp DNA PRI 30-OCT-1994 DEFINITION Human alpha-1-antichymotrypsin gene, exon 1. ACCESSION M18035
gb_pr1:huma1ar1 LOCUS HUMA1AR1 890 bp DNA PRI 30-OCT-1994 DEFINITION Human alpha-1-antitrypsin-related protein gene, exon 2. ACCESSI
gb_pr1:huma1ar2 LOCUS HUMA1AR2 3758 bp DNA PRI 30-OCT-1994 DEFINITION Human alpha-1-antitrypsin-related protein gene, exons 3, 4 and
gb_pr1:huma1at LOCUS HUMA1AT 143 bp mRNA PRI 30-OCT-1994 DEFINITION Human alpha-1-antitrypsin (alpha-1-AT) mRNA, 3' end. ACCESSION M
gb_pr1:huma1ata LOCUS HUMA1ATA 322 bp DNA PRI 30-OCT-1994 DEFINITION Human alpha-1-antitrypsin gene, exon 1 (unexpressed). ACCESSION
gb_pr1:huma1atb LOCUS HUMA1ATB 1345 bp mRNA PRI 30-OCT-1994 DEFINITION Human alpha-1-antitrypsin mRNA, complete cds. ACCESSION M1146
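A list file like the one above can be reduced to its sequence specifications with a few lines of Python. This is a minimal sketch under stated assumptions: the header ends at a line that is, or ends with, the ".." divider; lines starting with "!" are comments; and the first whitespace-separated token on each remaining line is the specification. read_list_file is a hypothetical helper, not a GCG program.

```python
def read_list_file(text):
    """Return the sequence specifications named in a GCG-style list file."""
    specs = []
    in_body = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_body:
            # Everything up to and including the ".." divider is header text.
            if stripped.endswith(".."):
                in_body = True
            continue
        if not stripped or stripped.startswith("!"):
            continue  # skip blank lines and comment lines
        specs.append(stripped.split()[0])
    return specs

example = """!!SEQUENCE_LIST 1.0
! NAMES from: pr:huma* April 13, 1998 14:55 ..
gb_pr1:huma1aadr LOCUS HUMA1AADR 2002 bp mRNA
gb_pr1:huma1acm LOCUS HUMA1ACM 1520 bp mRNA
"""
print(read_list_file(example))
```

Keeping only the specifications makes it easy to edit the list between search steps, as described under List Refinement.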
![Page 110: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/110.jpg)
StringSearch
Old search method
Searches for a particular text pattern in the sequence documentation.
Definition Search
Record Search
Complete search for possible text occurrences
Very Slow!!
![Page 111: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/111.jpg)
Lookup (gcgff only)
Rapid Text Pattern Searching
Uses an index of sequence file documentation
Allows field-specific searches
Allows AND; OR; NOT matching
![Page 112: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/112.jpg)
Lookup Considerations
Be sure that analyze is set to use a vt100 terminal:
analyze% setenv TERM vt100
Lookup may miss some sequences
Dependent on the annotation
Spelling counts
Searches are case insensitive
![Page 113: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/113.jpg)
Logical Operators Within a Field
AND: &
A & B means find all entries that contain both A and B.
OR: |
A | B means find all entries that contain either A or B.
BUT-NOT: !
A ! B means find all entries that contain A but do not contain B.
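The three operators correspond to set intersection, union, and difference. The sketch below shows this with a toy word index; the index contents and entry names (e1 to e4) are invented for illustration and say nothing about how LookUp stores its indices.

```python
# Hypothetical index: word -> set of entries whose annotation contains it.
index = {
    "globin": {"e1", "e2", "e3"},
    "region": {"e2", "e4"},
}

def entries_with(word):
    """Look a word up case-insensitively; unknown words match nothing."""
    return index.get(word.lower(), set())

a = entries_with("Globin")
b = entries_with("Region")

both = a & b       # A & B : entries containing both words
either = a | b     # A | B : entries containing either word
but_not = a - b    # A ! B : entries containing A but not B

print(sorted(both), sorted(either), sorted(but_not))
```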
![Page 114: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/114.jpg)
analyze% lookup -check

LookUp identifies sequence database entries by name, accession number,
author, organism, keyword, title, reference, feature, definition, length,
or date. The output is a list of sequences.

The LookUp program is experimental in this release. LookUp sometimes
crashes or produces incorrect results if you query a nucleic acid
database and request fragment output. Please look carefully at your
results.

Minimal Syntax: % lookup [-ALLtext=]Globin -Default
![Page 115: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/115.jpg)
Prompted Parameters:

-LIBrary=SwissProt[,...]       lookup in specified data libraries
-ALLtext=Globin                searches all text indices for globin
-DEFInition=Globin             words indexed independently "Globin & Region"
-AUThor=Smithies               for more than one "Smithies,O. & Slightom,J.L."
-KEYword=Globin                see document before using keywords
-NAMe=hsggl3                   entry name
-ACCessionnumber=S12345        accession number
-ORGanism="Homo Sapiens"       genus and species
-REFerence=Cell&1981           complete reference: "Cell & 26 & 191- & 1981"
-TITle=History                 title of citation "History & Duplication"
-FEAture=Gamma                 any word in a feature table
-SHOrtest=100                  find only sequences of length 100 or more
-LONgest=400                   find only sequences of length 400 or less
-EARliest=01-apr-1992          sequences modified on or after April 1, 1992
-LATest=30-apr-1992            sequences modified on or before April 30, 1992
-MATch=OR                      specifies inter-field logic (AND is default)
-OUTfile=lookup.list           output file for list of sequences
![Page 116: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/116.jpg)
Optional Parameters:

-NOWILdcardextension turns off automatic wildcard [email protected] searches in lookup.list instead of libraries
-ANNotate=FEAture[,...]        shows fields from original annotation in output
                               acceptable values include: ACCession, AUThor, DATe,
                               DEFinition, FEAture, NAMe, KEYword, ORGanism,
                               REFerence, and TITle
-FRAgments                     shows features as fragments instead of whole entries
-COMplete                      shows only features with unambiguous coordinates
-MONitor                       shows databases searched and how many hits found

Add what to the command line ?
![Page 117: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/117.jpg)
LOOKUP in what sequence libraries:

a) swissprot
b) sptrembl
c) pir
d) embl
e) genbank
f) em_tags
g) gb_tags
h) All libraries
q) quit

Please choose one or more (* h *):
![Page 118: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/118.jpg)
Complete the query form below:

All text:
Definition:
Author:
Keyword:
Sequence name:
Accession number:
Organism:
Reference:
Title:
Feature:
On or after (dd-mmm-yy):
On or before (dd-mmm-yy):
Shortest sequence length:
Longest sequence length:
Inter-field operator: AND
Form of output list: Whole Entries

Press <Ctrl>D to continue.
![Page 119: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/119.jpg)
SeqStore
Sequence searching
![Page 120: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/120.jpg)
Lookup_rdb (gcgrdb)
Seqstore command-line sequence searching
Barebones – Use Seqstore Web interface
![Page 121: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/121.jpg)
SeqStore Web Searching
Set up multiple criteria for selecting sets of sequences
Save as a query or list
Query: Active list. Changes as new sequences are added
List: Static list. No change with database updates
Save to SeqWeb
Powerful but can be slow
![Page 122: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/122.jpg)
NCBI Sequence Services
Obtain sequences directly from NCBI
Sequence Searches
Sequence Retrieval
Other services
BLAST Searches
Sequence Submission
PubMed Searches
![Page 123: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/123.jpg)
Entrez
NCBI Databases on the Web
Sequence retrieval
Text pattern searches
GenBank is updated on a daily basis
Web Site: http://www.ncbi.nlm.nih.gov
![Page 124: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/124.jpg)
Finding Sequences by Similarity
Using GCG
![Page 125: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/125.jpg)
Sequence Similarities
What other sequences have some primary sequence similarity to my query sequence?
Time and cost of the search is dependent on the size of the database Restrict the size of the database
![Page 126: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/126.jpg)
FindPatterns
Look for sequence patterns within sequence files
Allows complex pattern definitions Ambiguous sequence specifications
![Page 127: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/127.jpg)
BLAST; NetBlast
All search combinations possible
blastn: nt vs. nt database
blastp: protein vs. protein database
blastx: translated nt vs. protein database
tblastn: protein vs. translated nt database
tblastx: translated nt vs. translated nt database
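The five combinations can be summarised as a small lookup on the query type, the database type, and whether a nucleotide-versus-nucleotide search should be done at the protein level. The function below is a hypothetical helper written for this summary; it is not part of BLAST or GCG, and the argument names are assumptions.

```python
def choose_blast(query_type, db_type, translated=False):
    """Pick the BLAST program for a query/database pairing.

    query_type and db_type are "nt" or "protein". The translated flag only
    matters for nt vs. nt, where it selects tblastx instead of blastn.
    """
    if query_type == "nt" and db_type == "nt":
        return "tblastx" if translated else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nt" and db_type == "protein":
        return "blastx"   # the query is translated
    if query_type == "protein" and db_type == "nt":
        return "tblastn"  # the database is translated
    raise ValueError("types must be 'nt' or 'protein'")

print(choose_blast("protein", "nt"))
```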
![Page 128: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/128.jpg)
FastA
Search nucleotide sequences with a nucleotide query
Search protein sequences with a peptide query
![Page 129: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/129.jpg)
TFastA
Translates nucleotide sequences in all 6 reading frames
Search the translated sequences with a peptide query
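Translating in all six reading frames means three offsets on the forward strand and three on the reverse complement. The following is a minimal Python sketch of that step only, assuming the standard genetic code and unambiguous A/C/G/T input; it illustrates the idea and is not the TFastA implementation. Codons containing other symbols are translated as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons from the start of seq; unknown codons give X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Return translations keyed by frame: 1, 2, 3 forward and -1, -2, -3 reverse."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(forward[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames

print(six_frames("ATGGCC"))
```

Each of the six protein strings would then be compared with the peptide query.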
![Page 130: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/130.jpg)
Displaying Data
analyze% typedata
Displays on your screen the contents of any GCG data file
-REF
Display documentation only
![Page 131: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/131.jpg)
Copying Data
analyze% fetch
Will copy any GCG data or sequence file to your directory
![Page 132: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/132.jpg)
Sequence Symbols
Sequence symbols
Handout lists the sequence symbols recognized by GCG
Ambiguity codes are as proposed by the IUB nomenclature committee
Used by GenBank, EMBL, and NBRF
![Page 133: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/133.jpg)
Nucleotide Symbols

IUB/GCG    Meaning                 Complement   Staden/Sanger
A          A                       T            A
C          C                       G            C
G          G                       C            G
T/U        T                       A            T
M          A or C                  K            5
R          A or G                  Y            R
W          A or T                  W            7
S          C or G                  S            8
Y          C or T                  R            Y
K          G or T                  M            6
V          A or C or G             B            not supported
H          A or C or T             D            not supported
D          A or G or T             H            not supported
B          C or G or T             V            not supported
X/N        G or A or T or C        X            -/X
(Gap) .    not G or A or T or C    .            not supported
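The Complement column is what a reverse-complement operation needs when a sequence contains ambiguity codes. Here is a short Python sketch built directly from that column; reverse_complement is a hypothetical helper, and mapping N to X simply follows the shared X/N row of the table.

```python
# Complement of each IUB/GCG symbol, taken from the table above.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "X": "X", "N": "X",  # X and N share one row in the table
    ".": ".",            # a gap stays a gap
}

def reverse_complement(seq):
    """Reverse the sequence and complement each symbol, ambiguity codes included."""
    return "".join(COMPLEMENT[symbol] for symbol in reversed(seq.upper()))

print(reverse_complement("ATGCMR"))
```

Symbols outside the table raise a KeyError, which is a deliberate choice in this sketch so that unexpected characters are not silently passed through.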
![Page 134: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/134.jpg)
Amino Acid Symbols

IUB Symbol   3-letter   Meaning                Codons                     Depiction
A            Ala        Alanine                GCT,GCC,GCA,GCG            !GCX
B            Asp,Asn    Aspartic, Asparagine   GAT,GAC,AAT,AAC            !RAY
C            Cys        Cysteine               TGT,TGC                    !TGY
D            Asp        Aspartic               GAT,GAC                    !GAY
E            Glu        Glutamic               GAA,GAG                    !GAR
F            Phe        Phenylalanine          TTT,TTC                    !TTY
G            Gly        Glycine                GGT,GGC,GGA,GGG            !GGX
H            His        Histidine              CAT,CAC                    !CAY
I            Ile        Isoleucine             ATT,ATC,ATA                !ATH
K            Lys        Lysine                 AAA,AAG                    !AAR
L            Leu        Leucine                TTG,TTA,CTT,CTC,CTA,CTG    !TTR,CTX,YTR;YTX
M            Met        Methionine             ATG                        !ATG
N            Asn        Asparagine             AAT,AAC                    !AAY
P            Pro        Proline                CCT,CCC,CCA,CCG            !CCX
Q            Gln        Glutamine              CAA,CAG                    !CAR
R            Arg        Arginine               CGT,CGC,CGA,CGG,AGA,AGG    !CGX,AGR,MGR;MGX
S            Ser        Serine                 TCT,TCC,TCA,TCG,AGT,AGC    !TCX,AGY;WSX
T            Thr        Threonine              ACT,ACC,ACA,ACG            !ACX
V            Val        Valine                 GTT,GTC,GTA,GTG            !GTX
W            Trp        Tryptophan             TGG                        !TGG
X            Xxx        Unknown                                           !XXX
Y            Tyr        Tyrosine               TAT, TAC                   !TAY
Z            Glu,Gln    Glutamic, Glutamine    GAA,GAG,CAA,CAG            !SAR
*            End        Terminator             TAA, TAG, TGA              !TAR,TRA;TRR
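The Depiction column writes each codon set with nucleotide ambiguity codes, so a depiction such as ATH can be expanded back into the codons it stands for. The Python sketch below does this using the meanings from the Nucleotide Symbols table; expand is a hypothetical helper name.

```python
from itertools import product

# Bases covered by each IUB nucleotide symbol.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}

def expand(depiction):
    """Return the set of concrete codons covered by an ambiguous depiction."""
    choices = (IUB[symbol] for symbol in depiction.upper())
    return {"".join(codon) for codon in product(*choices)}

print(sorted(expand("ATH")))                   # isoleucine codons
print(sorted(expand("TTR") | expand("CTX")))   # leucine codons
```

Expanding YTX gives eight codons, including TTT and TTC, which code for phenylalanine, so that depiction covers more than the six leucine codons.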
![Page 135: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/135.jpg)
Other Stuff
Non-sequence Data
![Page 136: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/136.jpg)
NonSequence Data
Non-Sequence Data
Data required to run a program
Copy to your directory with Fetch
![Page 137: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/137.jpg)
Local Data Files
Copies of GCG Data files stored in your own directory.
May be altered as desired.
![Page 138: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/138.jpg)
Using Local Data Files
Programs look first in the default directory for a data file with a particular name.
If it is not found there, the public data file is used.
A user may specify a new name for the data file when running a program.
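The lookup order described above (a user-specified name first, then a local copy, then the public file) can be sketched as follows. This is an illustration of the rule only: resolve_data_file, its arguments, and the directory layout are assumptions made for this example, not GCG internals.

```python
from pathlib import Path

def resolve_data_file(name, local_dir, public_dir, override=None):
    """Return the path of the data file a program would use.

    override   -- a file name given explicitly by the user, if any
    local_dir  -- the user's default (working) directory
    public_dir -- the directory holding the public data files
    """
    if override is not None:
        return Path(override)          # the user named a specific file
    local_copy = Path(local_dir) / name
    if local_copy.is_file():
        return local_copy              # a local copy takes precedence
    return Path(public_dir) / name     # otherwise fall back to the public file

# Example call (the paths are hypothetical):
# resolve_data_file("enzyme.dat", ".", "/gcg/data")
```

A local copy made with Fetch and then edited is therefore picked up automatically the next time the program runs from that directory.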
![Page 139: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/139.jpg)
Restriction Enzyme Files
REBASE (enzyme.dat)
REBASE 6/2000
Dr. Richard J. Roberts
Cold Spring Harbor Laboratory
Used by: Map, MapSort, MapPlot
![Page 140: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/140.jpg)
Prosite
Dictionary of sequence motifs Dr. Amos Bairoch, University of Geneva
Release 16, 7/1999 over 1300 patterns
Used by: Motifs
![Page 141: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/141.jpg)
Profiles
Database of peptide profiles
Drs. Michael Gribskov and Amos Bairoch
Over 600 Profiles
Used by: ProfileScan
![Page 142: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/142.jpg)
Eukaryotic Transcription Factor Recognition Sites
Transcription Factor Database
Dr. David Ghosh, NCBI
Release 7.5, 3/96
genmoredata:tfsites.dat
Used by: FindPatterns; Map, MapSort, MapPlot
![Page 143: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/143.jpg)
Codon Frequency Tables
Frequency of particular codon usage
Look in genmoredata
Organism
Human
E. coli
Drosophila
Used by: BackTranslate, CodonPreference
![Page 144: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/144.jpg)
Translation Tables
Standard Table for translating nucleotide sequences into amino acid sequences
Look in genmoredata
Alternate translation tables
Mitochondria
Mycoplasma
Used by: Translate, Map, Frames
![Page 145: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/145.jpg)
Symbol Comparison Tables
Amino acid similarities
What is the chance that one amino acid can substitute for another without affecting function?
Used by all sequence comparison programs
FastA, TFastA, Blast
Gap, BestFit
PileUp
![Page 146: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/146.jpg)
Protein Analysis Data
Amino acid properties
Charge, hydrophobicity, molecular weight, secondary structure predictions, etc.
Protease digestion sites
Used by: PepPlot; PlotStructure
![Page 147: Data Sequences and Other Stuff. Sequence Data Nucleic Acid and Protein Sequences Sources of Genetic Sequences User GCG supplied databases Flat File Oracle.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d585503460f94a370c4/html5/thumbnails/147.jpg)
Free Energy Values
RNA secondary structure prediction
Used by: Mfold, FoldRNA