Bioinformatics Resources - Genbank...April 27th Sequence Databases (3. sh.) June 15th MongoDB,...

BioinfRes SoSe 18 Bioinformatics Resources - Genbank - Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Institut für Informatik I12

Page 1:

Bioinformatics Resources - Genbank

Lecture & Exercises
Prof. B. Rost, Dr. L. Richter, J. Reeb

Institut für Informatik I12

Page 2:

Preliminary Schedule

* These exercises can earn you a bonus

April 13th   Intro, General Overview (1. sh.)
April 20th   Sequence Databases (2. sh.)
April 27th   Sequence Databases (3. sh.)
May 4th      Structure Databases (4. sh.)
May 11th     Lecture cancelled
May 18th     SQL (5. sh.)
May 25th     SQL, NoSql (6. sh.)
June 1st     Lecture cancelled
June 8th     NoSql 2 (7. sh.)
June 15th    MongoDB, JavaScript (8. sh.)
June 22nd    Node.js Applications (9. sh.)
June 29th    PredictProtein
Jul 6th      Wrap Up, Q&A
Jul 20th     Exam

Page 3:

National Center for Biotechnology Information, NCBI

http://nihrecord.nih.gov/newsletters/2013/07_19_2013/images/milestonesPic6.jpg

● first ideas in the middle of the 80s
● division of the National Library of Medicine (NLM) inside the National Institutes of Health (NIH)
● political mission
● founded in 1988
● David Lipman

Page 4:

NCBI’s political mission as defined by the bill:

1. design, develop, implement, and manage automated systems for the collection, storage, retrieval, analysis, and dissemination of knowledge concerning human molecular biology, biochemistry, and genetics;

2. perform research into advanced methods of computer-based information processing capable of representing and analyzing the vast number of biologically important molecules and compounds;

3. enable persons engaged in biotechnology research and medical care to use systems developed under paragraph (1) and methods described in paragraph (2); and

4. coordinate, as much as is practicable, efforts to gather biotechnology information on an international basis.

Page 5:

Selected NCBI Accomplishments

[Timeline figure, first row; years shown: 1990, 1992, 1994, 1995, 1996, 1997]
Blast, GenBank at NCBI, NCBI website, Genomes, OMIM, PubMed

[Timeline figure, second row; years shown: 1999, 2000, 2003, 2005, 2007, 2008]
Human Genome, PubMed Central, Entrez Gene/DTDs, NIH Public Access, Genome Reference Consortium, 1000 Genomes Project

Page 6:

NCBI Resources

● NCBI currently hosts a large collection of resources: http://www.ncbi.nlm.nih.gov/guide/all/
● grouped according to various criteria
  - metadata, project-centric
  - method oriented
  - topic oriented
● sorted in the sections: databases, downloads, submissions, tools, how-tos

Page 7:

Genbank’s Origin

● Walter Goad, Los Alamos National Laboratory
● Los Alamos Sequence Database 1979
● Creation and release of GenBank in 1982
● End of 1982: 2000 sequences
● Move to NCBI in 1992

http://www.lanl.gov/science-innovation/features/innovations/images/light/thumbnails/21.jpg

Page 8:

Minutes from 20th anniversary of GenBank in 2002

“.... Among them is a memo on Los Alamos National Laboratory stationery dated May 9, 1980, that reads: Monday, May 12 at 10:30 Steve Simon invites you for cake and coffee to celebrate 100,000 bases now in the DNA sequence library.”

taken from https://www.genomeweb.com/genbank-turns-20

Page 9:

Growth of GenBank and WGS

- doubling approx. every 18 months; diagram for release 225, Apr. 2018
- current version, release 225: 260,189,141,631 bases in Genbank, 2,784,740,996,536 bases in WGS
- taken from http://www.ncbi.nlm.nih.gov/genbank/statistics
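As a rough plausibility check of the doubling claim, the sketch below estimates an average doubling time from two figures quoted on these slides: about 100,000 bases in May 1980 (the Los Alamos memo) and 260,189,141,631 GenBank bases in release 225 (Apr. 2018). Treating both numbers as points on one exponential curve is an assumption made only for illustration, and the function name is hypothetical.

```python
import math

def doubling_time_months(n_start, n_end, months_elapsed):
    """Average doubling time, assuming pure exponential growth."""
    doublings = math.log2(n_end / n_start)
    return months_elapsed / doublings

# May 1980 -> April 2018 is 37 years and 11 months.
months = 37 * 12 + 11
estimate = doubling_time_months(100_000, 260_189_141_631, months)
print(f"average doubling time: {estimate:.1f} months")
```

The WGS bases are left out of this estimate; including them would shorten the computed doubling time.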

Page 10:

Growth of GenBank and WGS

- current release 225: 208,452,303 sequences in Genbank, 621,379,029 sequences in WGS
- taken from http://www.ncbi.nlm.nih.gov/genbank/statistics, release 225, Apr. 2018

Page 11:

References for GenBank

● one current citation source: “GenBank”. Nucleic Acids Res. 2014 Jan; 42(Database issue): D32-7. doi:10.1093/nar/gkt1030. Epub 2013 Nov 11.
● PMID: 24217914
● the most recent: “Genbank”. Nucleic Acids Res. 2018 Jan 4; 46(D1): D41–D47. Published online 2017 Nov 13th. doi:10.1093/nar/gkx1094
● PMCID: PMC5753231

Page 12:

References for GenBank

● more general for NCBI services: “Database resources of the National Center for Biotechnology Information”. Nucleic Acids Res. 2016 Jan 4; 44(Database issue): D7–D19. Published online 2015 Nov 28. doi:10.1093/nar/gkv1290
● part of the International Nucleotide Sequence Database Collaboration (INSDC) together with EMBL Nucleotide Sequence Database (EMBL-Bank), part of the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ)

Page 13:

Most Growing Divisions

Division  Description                 Release 197 (8/2013)   Annual Increase (%)
WGS*      Whole-genome shotgun data   2,035,032,639,807      (from Release 219)
TSA*      Transcriptome shotgun data  149,038,907,599        (from Release 219)
WGS*      Whole-genome shotgun data   500.420.412.665        62.4
TSA*      Transcriptome shotgun data  8.6333123.935          49.9
PHG       Phages                      119.812.712            42.5
VRL       Viruses                     1.757.202.472          22.9
BCT       Bacteria                    10.281.048.518         21.8
ENV       Environmental samples       3.743.277.434          10.9
INV       Invertebrates               2.737.140.464          9.8
PAT       Patented sequences          13.290.161.247         9.7
PLN       Plants                      5.963.882.822          8.8
GSS       Genome survey sequences     23.726.384.753         8.1
VRT       Other vertebrates           3.068.956.026          6.3
MAM       Other mammals               911.342.025            5.6
...       ...                         ...                    ...
TOTAL     All GenBank sequences       654.613.333.676        45.1

* not distributed with the release; there are specific project server sections

Page 14:

Top Organisms (Rel. 207)

Organism                      Entries      Non-WGS base pairs
Homo sapiens                  20.921.637   17.714.786.437
Mus musculus                  9.727.522    9.995.696.539
Rattus norvegicus             2.193.812    6.526.236.496
Bos taurus                    2.227.298    5.410.360.312
Zea mays                      4.177.175    5.201.714.457
Sus scrofa                    3.297.029    4.895.127.638
Danio rerio                   1.727.668    3.133.901.682
Triticum aestivum             1.796.780    1.927.718.314
...                           ...          ...
Oryza sativa Japonica Group   1.376.410    1.265.556.227
...                           ...          ...
Arabidopsis thaliana          2.578.785    1.202.100.008
...                           ...

Page 15:

Top Organisms (Rel. 219)

Organism                      Entries      Non-WGS base pairs
Homo sapiens                  24,231,652   18,893,466,733
Mus musculus                  9,883,173    10,229,286,664
Rattus norvegicus             2,197,781    6,528,984,315
Bos taurus                    2,229,235    5,429,379,063
Zea mays                      4,197,803    5,227,077,026
Sus scrofa                    3,298,802    5,071,347,463
Hordeum vulgare ssp. vulgare  1,346,798    3,235,834,212
Danio rerio                   1,729,033    3,190,913,255
Ovis canadensis canadensis    72           2,590,574,434
Triticum aestivum             1,812,814    1,942,831,630
...                           ...          ...
Oryza sativa Japonica Group   1,378,262    1,642,328,218
...                           ...          ...
Escherichia coli              118,884      1,571,576,668
...                           ...

Page 16:

Distribution of Sequence Files (Rel. 207)

Division  Number of Files
BCT       178
CON       317
ENV       81
EST       478
HTG       142
INV       126
PAT       219
PLN       107
TSA       175
VRL       34

Release 207 consists of 2333 text files in total. Release 225 consists of 3120 text files in total.

Page 17:

Distribution of Sequence Files (Rel. 219)

Division  Number of Files
BCT       350
CON       359
ENV       97
EST       483
HTG
INV       153
PAT       290
PHG       4
PLN       145
PRI       56
SYN       10
TSA       230
VRL       48

Release 219 consists of 2225 text files in total.

Page 18:

Database Files (Rel. 225)

● GenBank comes in a set of compressed text files available via FTP
● see ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
● 3120 ASCII files (listed in division plus additional list files) in the range of 0.7-520 MB
● uncompressed ~885 GB
● each file consists of two portions

Page 19:

Database Files

● Part 1: highly conserved database file headers

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
GBBCT1.SEQ          Genetic Sequence Data Bank
                          April 15 2015

                NCBI-GenBank Flat File Release 207.0

                     Bacterial Sequences (Part 1)

   51396 loci,    92682287 bases, from    51396 reported sequences
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

● Part 2: sequence entries for the division described in the header

Page 20:

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
GBSMP.SEQ          Genetic Sequence Data Bank
                         December 15 1992

                 GenBank Flat File Release 74.0

                     Structural RNA Sequences

      2 loci,       236 bases, from     2 reported sequences

LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986
DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
ACCESSION   K03160
VERSION     K03160.1
KEYWORDS    5S ribosomal RNA; ribosomal RNA.
SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
  ORGANISM  Auricularia auricula-judae
            Eukaryota; Fungi; Eumycota; Basidiomycotina; Phragmobasidiomycetes;
            Heterobasidiomycetidae; Auriculariales; Auriculariaceae.
REFERENCE   1  (bases 1 to 118)
  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
  TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
            their use in studying the phylogenetic position of basidiomycetes
            among the eukaryotes
  JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
BASE COUNT       27 a     34 c     34 g     23 t
ORIGIN      5' end of mature rRNA.
        1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
       61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt
//

LOCUS       ABCRRAA       118 bp ss-rRNA            RNA       15-SEP-1990
DEFINITION  Acetobacter sp. (strain MB 58) 5S ribosomal RNA, complete sequence.
ACCESSION   M34766
VERSION     M34766.1
KEYWORDS    5S ribosomal RNA.
SOURCE      Acetobacter sp. (strain MB 58) rRNA.
  ORGANISM  Acetobacter sp.
            Prokaryotae; Gracilicutes; Scotobacteria; Aerobic rods and cocci;
            Azotobacteraceae.
REFERENCE   1  (bases 1 to 118)
  AUTHORS   Bulygina,E.S., Galchenko,V.F., Govorukhina,N.I., Netrusov,A.I.,
            Nikitin,D.I., Trotsenko,Y.A. and Chumakov,K.M.
  TITLE     Taxonomic studies of methylotrophic bacteria by 5S ribosomal RNA
            sequencing
  JOURNAL   J. Gen. Microbiol. 136, 441-446 (1990)
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
BASE COUNT       27 a     40 c     32 g     17 t      2 others
ORIGIN
        1 gatctggtgg ccatggcggg agcaaatcag ccgatcccat cccgaactcg gccgtcaaat
       61 gccccagcgc ccatgatact ctgcctcaag gcacggaaaa gtcggtcgcc gccagayy
//
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79
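The two-part file layout (header first, then entries terminated by //) suggests a simple way to walk through a division file. The following is a minimal sketch, assuming that the header ends at the first LOCUS record; the function name and the abridged sample text are illustrative only.

```python
def iter_entries(lines):
    """Yield each sequence entry as a list of records (terminator excluded)."""
    entry = []
    in_entry = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("LOCUS"):
            in_entry = True          # the file header ends at the first LOCUS record
            entry = []
        if not in_entry:
            continue
        if line.startswith("//"):
            yield entry
            in_entry = False
        else:
            entry.append(line)

# Abridged version of the two entries shown on the slide.
sample = """GBSMP.SEQ          Genetic Sequence Data Bank

LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986
ACCESSION   K03160
ORIGIN      5' end of mature rRNA.
        1 atccacggcc
//
LOCUS       ABCRRAA       118 bp ss-rRNA            RNA       15-SEP-1990
ACCESSION   M34766
ORIGIN
        1 gatctggtgg
//
"""

entries = list(iter_entries(sample.splitlines()))
for e in entries:
    print(e[0].split()[1], len(e), "records")
```

Because the function is a generator, it can be fed an open file object directly, so a large division file never has to be held in memory as a whole.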

Page 21:

The GenBank Flat File Format

● a sequence entry consists of many records (lines)
● each record consists of two parts
● Part 1: columns 1-10 / Entry Field Name
● Part 2: remaining line with the content

Page 22:

Part 1/1

● a keyword, beginning in column 1 of the record (e.g., REFERENCE is a keyword)
● a subkeyword beginning in column 3, with columns 1 and 2 blank (e.g., AUTHORS is a subkeyword of REFERENCE)
● or a subkeyword beginning in column 4, with columns 1, 2, and 3 blank (e.g., PUBMED is a subkeyword of REFERENCE)

Page 23:

Part 1/2

● blank characters, indicating that this record is a continuation of the information under the keyword or subkeyword above it
● a code, beginning in column 6, indicating the nature of an entry (feature key) in the FEATURES table

Page 24:

Part 1/3

● a number, ending in column 9 of the record:
  - This number occurs in the portion of the entry describing the actual nucleotide sequence and designates the numbering of sequence positions
● two slashes (//) in positions 1 and 2, marking the end of an entry

Page 25:

Part 2

● The second part of each sequence entry record contains the information appropriate to its keyword
● in positions 13 to 80 for keywords
● in positions 11 to 80 for the sequence
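The column rules for the first part of a record can be turned into a small classifier. This is only a sketch of the rules listed on the slides, not an official parser; the category names returned by the function are made up for illustration.

```python
def classify_record(line):
    """Classify one flat-file record by what its first ten columns contain."""
    head = line[:10]
    if line.startswith("//"):
        return "end-of-entry"
    if head.strip() == "":
        return "continuation"
    if head.strip().isdigit():
        return "sequence"            # position number, ending in column 9
    indent = len(head) - len(head.lstrip(" "))
    if indent == 0:
        return "keyword"
    if indent in (2, 3):
        return "subkeyword"          # starts in column 3 or 4
    if indent == 5:
        return "feature-key"         # starts in column 6
    return "unknown"

for record in [
    "REFERENCE   1  (bases 1 to 118)",
    "  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.",
    "     rRNA            1..118",
    "       61 gtaccgccca gttagtacca",
    "//",
]:
    print(classify_record(record), "|", record)
```

Checking the terminator and the sequence number before looking at the indentation keeps the three "special" record types from being mistaken for keywords.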

Page 26:

Entry Field Types (incomplete)

● Locus: A short mnemonic name for the entry, chosen to suggest the sequence's definition; mandatory keyword / exactly one record.
● Definition: A concise description of the sequence; mandatory keyword / one or more records
● Accession:
  - the primary accession number is a unique, unchanging identifier assigned to each GenBank sequence record.
  - to be used for citations from GenBank
  - mandatory keyword / one or more records.

Page 27:

Entry Field Types (incomplete)

● Version:
  - compound identifier consisting of the primary accession number and a numeric version number associated with the current version of the sequence data in the record
  - optionally followed by an integer identifier (a "GI") assigned to the sequence by NCBI
  - mandatory keyword / exactly one record

Page 28:

Entry Field Types (incomplete)

● DBLINK: provides cross-references to resources that support the existence of a sequence record; optional keyword / one or more records
● Keywords: short phrases describing gene products and other information about an entry; mandatory keyword in all annotated entries / one or more records

Page 29:

Entry Field Types (incomplete)

● Source: Common name of the organism or the name most frequently used in the literature; mandatory keyword in all annotated entries / one or more records / includes one subkeyword
● Organism: Formal scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines); mandatory subkeyword in all annotated entries / two or more records

Page 30:

Entry Field Types (incomplete)

● Reference:
  - Citations for all articles containing data reported in this entry
  - includes seven subkeywords and may repeat
  - mandatory keyword / one or more records
● Journal: lists the journal name, volume, year, and page numbers of the citation; mandatory subkeyword / one or more records
● optional subkeywords: Authors, Consortium, Title, Medline, Pubmed, Remark

Page 31:

Entry Field Types (incomplete)

● Features: table containing information on portions of the sequence that code for proteins and RNA molecules; sites of biological significance; optional keyword / one or more records
● Origin:
  - specification of how the first base of the reported sequence is operationally located within the genome
  - mandatory keyword / exactly one record
  - followed by sequence data (multiple records)
● //: entry termination symbol; mandatory at the end of an entry / exactly one record
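To make the field types concrete, here is a minimal sketch that collects the content of the top-level keywords of one entry and concatenates the sequence records after ORIGIN. It deliberately skips subkeywords and the feature table; the function name and the returned structure are assumptions made for this example.

```python
def parse_entry(records):
    """Collect top-level keyword content and the sequence of one entry."""
    fields = {}
    sequence = []
    keyword = None
    in_sequence = False
    for rec in records:
        if rec.startswith("//"):
            break
        if not rec.strip():
            continue
        if in_sequence:
            # positions 11-80 hold the bases; drop the spaces between blocks
            sequence.append(rec[10:].replace(" ", ""))
        elif not rec[0].isspace():
            keyword = rec[:12].strip()
            fields.setdefault(keyword, []).append(rec[12:].strip())
            in_sequence = keyword == "ORIGIN"
        elif rec[:12].strip() == "":
            if keyword is not None:                  # continuation record
                fields[keyword].append(rec[12:].strip())
        else:
            keyword = None                           # subkeyword or feature key

    return fields, "".join(sequence)

records = [
    "LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986",
    "DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.",
    "ACCESSION   K03160",
    "VERSION     K03160.1",
    "SOURCE      A.auricula-judae (mushroom) ribosomal RNA.",
    "  ORGANISM  Auricularia auricula-judae",
    "            Eukaryota; Fungi; Eumycota.",
    "ORIGIN      5' end of mature rRNA.",
    "        1 atccacggcc ataggactct",
    "//",
]
fields, seq = parse_entry(records)
print(fields["ACCESSION"], fields["VERSION"], seq)
```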

Page 32:

Detailed Locus Format

Columns Contents

01-05 'LOCUS'

06-12 spaces

13-28 Locus name

29-29 space

30-40 Length of sequence, right-justified

41-41 space

42-43 bp

44-44 space

45-47 spaces, ss- (single-stranded), ds- (double-stranded), or ms- (mixed-stranded)

48-53 NA, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA), mRNA (messenger RNA), uRNA (small nuclear RNA), left justified

54-55 space

56-63 'linear' followed by two spaces, or 'circular'

64-64 space

65-67 The division code

68-68 space

69-79 Date, in the form dd-MMM-yyyy (e.g., 15-MAR-1991)
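Since the LOCUS record is defined by fixed columns, it can be sliced rather than split on whitespace. The sketch below follows the column table above; the field names in the dictionary and the example line (assembled so that every field lands in its columns) are made up for illustration.

```python
LOCUS_COLUMNS = {            # field name: (first column, last column), 1-based
    "name":     (13, 28),
    "length":   (30, 40),
    "strands":  (45, 47),
    "molecule": (48, 53),
    "topology": (56, 63),
    "division": (65, 67),
    "date":     (69, 79),
}

def parse_locus(line):
    """Slice a LOCUS record into its fields using the column table."""
    line = line.ljust(79)
    out = {k: line[a - 1:b].strip() for k, (a, b) in LOCUS_COLUMNS.items()}
    out["length"] = int(out["length"])
    return out

# Hypothetical LOCUS line, built field by field to match the layout.
example = (
    "LOCUS" + " " * 7 + "EXAMPLE1".ljust(16) + " " + "118".rjust(11)
    + " bp " + "ss-" + "rRNA".ljust(6) + "  " + "linear  " + " "
    + "BCT" + " " + "15-MAR-1991"
)
print(parse_locus(example))
```

Slicing by column keeps working when a field is blank (for example no strandedness prefix), a case in which whitespace splitting would shift all later fields.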

Page 33:

Accession Format

● six or eight characters
● six character format:
  - single uppercase letter
  - 5 digits
● eight character format:
  - two uppercase letters
  - 6 digits
● primary accession number always the first one
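The two formats translate directly into a regular expression. This is a minimal sketch covering only the six- and eight-character forms named on the slide; any other accession formats that may exist are not handled, and the function name is illustrative.

```python
import re

# One uppercase letter + 5 digits, or two uppercase letters + 6 digits.
ACCESSION_RE = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")

def is_accession(text):
    """True if text matches one of the two accession formats from the slide."""
    return ACCESSION_RE.fullmatch(text) is not None

for candidate in ["K03160", "M34766", "AB123456", "K0316", "k03160"]:
    print(candidate, is_accession(candidate))
```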

Page 34:

Features (Incomplete)

● authoritative source: http://www.insdc.org/documents/feature-table
● feature table contains information about:
  - gene and gene products
  - regions of biological significance
  - can enumerate differences between various reports
  - provides cross-references to other data collections
  - allows hierarchical relation between the features

Page 35:

Layout

● first line of the feature table is a header
● includes the keyword ‘FEATURES’ and the column header ‘Location/Qualifiers’
● each feature consists of:
  - descriptor line containing a feature key and a location
  - a continuation line for the location may follow
  - feature qualifiers may follow the descriptor line
  - key: column 6-20, location starts in column 22
  - qualifiers on subsequent lines at column 22 starting with a ‘/’
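The layout rules above can be sketched as a small grouping routine: a record with text in columns 6-20 opens a new feature, and records that are blank up to column 21 either extend the location or add a qualifier. The function name, the dictionary keys and the second (CDS) feature in the sample are hypothetical; multi-line qualifier values are joined with a single space, which is a simplification.

```python
def parse_features(lines):
    """Group feature-table records into dicts with key, location, qualifiers."""
    features = []
    for line in lines:
        if line[:5].strip():            # a keyword record such as the FEATURES header
            continue
        key = line[5:20].strip()        # columns 6-20
        rest = line[21:].strip()        # column 22 onwards
        if key:
            features.append({"key": key, "location": rest, "qualifiers": []})
        elif not features:
            continue
        elif rest.startswith("/"):
            features[-1]["qualifiers"].append(rest)
        elif features[-1]["qualifiers"]:
            features[-1]["qualifiers"][-1] += " " + rest   # wrapped qualifier value
        else:
            features[-1]["location"] += rest               # wrapped location

    return features

pad = " " * 21
table = [
    "FEATURES".ljust(21) + "Location/Qualifiers",
    "     " + "rRNA".ljust(16) + "1..118",
    pad + '/note="5S ribosomal RNA"',
    "     " + "CDS".ljust(16) + "join(23..56,",
    pad + "87..110)",
    pad + '/gene="x"',
]
features = parse_features(table)
for f in features:
    print(f)
```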

Page 36:

A Few Frequent Features

● CDS: sequence coding for amino acids in protein (includes stop codon)
● exon: region that codes for part of spliced mRNA
● gene: region that defines a functional gene, possibly including upstream (promoter, enhancer, etc.) and downstream control elements, and for which a name has been assigned
● mRNA: messenger RNA
● ... > 60 features currently

Page 37:

Location and Qualifiers

● Location:
  - a location can be: a single base, a span of bases, a site between two bases, a join of sequences, ...
  - examples: 23, 23..56, 23^24, join(23..56,87..110)
● Qualifiers:
  - format: from column 22 /qualifier_name[=value]
  - types: free text, enumeration or controlled vocabulary, citations, sequences, feature labels
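The four location forms given as examples can be parsed with a few regular expressions. This is a sketch restricted to exactly those forms; further operators defined in the INSDC feature table document, as well as nested joins, are not handled, and the dictionary layout is an assumption for illustration.

```python
import re

def parse_location(text):
    """Parse a single base, a span, a between-bases site, or a flat join."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
        return {"type": "join", "parts": [parse_location(p) for p in parts]}
    m = re.fullmatch(r"(\d+)\.\.(\d+)", text)
    if m:
        return {"type": "span", "start": int(m.group(1)), "end": int(m.group(2))}
    m = re.fullmatch(r"(\d+)\^(\d+)", text)
    if m:
        return {"type": "site", "between": (int(m.group(1)), int(m.group(2)))}
    if re.fullmatch(r"\d+", text):
        return {"type": "base", "position": int(text)}
    raise ValueError(f"unsupported location: {text!r}")

for example in ["23", "23..56", "23^24", "join(23..56,87..110)"]:
    print(example, "->", parse_location(example))
```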

Page 38:

Database Cross References / db_xref

● http://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/
● Qualifier: /db_xref="database:identifier"
● Definition: database cross-reference: pointer to related information in another database
● Scope: all feature keys
● Example: /db_xref="Swiss-Prot:P12345"
● currently >120 databases available
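Splitting a db_xref qualifier into its database and identifier parts is a one-liner in most languages. A minimal sketch, assuming the qualifier text has already been extracted from the feature table (the function name is illustrative):

```python
def parse_db_xref(qualifier):
    """Split '/db_xref="database:identifier"' into (database, identifier)."""
    name, _, value = qualifier.strip().partition("=")
    if name != "/db_xref":
        raise ValueError(f"not a db_xref qualifier: {qualifier!r}")
    # Split at the first colon only, so identifiers may contain colons.
    database, sep, identifier = value.strip('"').partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {value!r}")
    return database, identifier

print(parse_db_xref('/db_xref="Swiss-Prot:P12345"'))
```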

Page 39:

Anatomy of a Genbank Flat File

[Figure: example GenBank flat file]

Page 40:

Anatomy of a Genbank Flat File

[Figure: example GenBank flat file; label: Locus line]

Page 41:

Anatomy of a Genbank Flat File

[Figure: example GenBank flat file; label: Accession Number, Version and GI number]

Page 42:

Anatomy of a Genbank Flat File

[Figure: example GenBank flat file; label: Feature table with annotations]

Page 43:

Useful Resources from NCBI

● Materials:
● Electronic bookshelf
● http://www.ncbi.nlm.nih.gov/education/factsheets/
● ftp://ftp.ncbi.nih.gov/pub/factsheets/Factsheet_Books.pdf
● NCBI manuals
● textbooks

Page 44:

Useful Resources from NCBI

● Processes, e.g. Prokaryotic Genome Annotation Pipeline
● designed for bacterial and archaeal genomes
● multi-level process including protein-coding gene prediction and functional genome units like rRNAs, tRNAs, small RNAs, pseudogenes, control regions, repeats, insertion elements, and so forth
● combination of ab initio prediction and homology-based methods

Page 45:

Useful Resources from NCBI

● reference databases: RefSeq
● http://www.ncbi.nlm.nih.gov/refseq/
● comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins
● stable reference for genome annotation, esp. subset of RefSeqGene
● reference sequences
● reference coordinates
● accessible via BLAST, Entrez and FTP

Page 46:

RefSeq

● created by:
  - Eukaryotic Genome Annotation Pipeline
  - Prokaryotic Genome Annotation Pipeline
  - Manual curation
  - Submission to INSDC members
● reflects current knowledge of sequence data and biology
● format consistency
● Accession number contains an “_”

Page 47:

RefSeq Growth

Page 48:

Databases Accessible via Entrez

http://www.ncbi.nlm.nih.gov/gquery/

Page 49:

Computation: Blast at NCBI


Page 54:

Searching the NCBI / Entrez

● provide an integrated search interface to the different NCBI databases: Entrez Programming Utilities (E-utilities)
● Base-URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
● >40 databases
● stable interface of nine server-side programs
● http://www.ncbi.nlm.nih.gov/books/NBK25501/

Page 55:

Entrez Guidelines

● if you use the eutils against the guidelines you might be banned!
● >100 requests: weekends or outside US peak times (9pm-5am, EST)
● not more than 3 requests per second
● provide email and tool name: &tool=<...>&email=<...>
● registration with email and tool name with NCBI may relax these restrictions
● supported by BioPython
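One way to stay within the limit of three requests per second is a small client-side rate limiter that is called before every request. The class below is a sketch under that assumption, not an NCBI-provided mechanism; the class name and its parameters are hypothetical, and the clock and sleep functions are injectable only so that the behaviour can be checked without waiting.

```python
import time

class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls=3, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []              # timestamps of calls inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                break
            # Sleep until the oldest call drops out of the window.
            self.sleep(self.period - (now - self.calls[0]))
        self.calls.append(now)

limiter = RateLimiter()
for i in range(4):
    limiter.wait()                   # the fourth call is delayed
    print("request", i, "may be sent now")
```

In a real script, `limiter.wait()` would be placed directly before each HTTP request to the E-utilities.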

Page 56:

Constructing URLs

● parameter: &lowerCaseName
● exception: &WebEnv
● no required order
● null values and inappropriate parameters are generally ignored
● no spaces, use + instead
● use URL encodings for special characters like: %22 for " or %23 for # or %40 for @
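These rules are what a standard URL-encoding routine already does, so the URL rarely needs to be assembled by hand. The sketch below uses `urllib.parse.urlencode` from the Python standard library, which replaces spaces with + and percent-encodes special characters. The helper name, the database name, the search term, the tool name and the e-mail address are placeholders chosen for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def build_url(program, **params):
    """Join base URL, program name and URL-encoded parameters."""
    # Drop unset parameters; urlencode turns spaces into '+'.
    clean = {k: v for k, v in params.items() if v is not None}
    return BASE_URL + program + "?" + urlencode(clean)

url = build_url(
    "esearch.fcgi",
    db="nucleotide",                 # assumed database name
    term='"5S ribosomal RNA" #1',    # arbitrary term with special characters
    tool="my_tool",                  # hypothetical tool name
    email="someone@example.org",     # placeholder address
)
print(url)
```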

Page 57:

E-utilities

● EInfo
● ESearch
● EPost
● ESummary
● EFetch
● ELink
● EGQuery
● ESpell
● ECitMatch

Page 58:

External Interfaces to Entrez / API

● there are a number of APIs to access the various services from NCBI, described at:
● http://www.ncbi.nlm.nih.gov/books/NBK25501/
● base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
● basic searching:
  - esearch.fcgi?db=<database>&term=<query>
  - Input: Entrez database (&db); any Entrez text query (&term)
  - Output: List of UIDs matching the Entrez query
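The list of UIDs comes back as XML by default, so a client needs a small parsing step. The sketch below parses a hand-written reply in the shape ESearch returns (Count and IdList/Id elements); the UIDs in it are invented, and the function name is illustrative. No network request is made here.

```python
import xml.etree.ElementTree as ET

def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    uids = [e.text for e in root.findall("./IdList/Id")]
    return count, uids

# Hand-written reply for illustration; the UIDs are invented.
reply = """<eSearchResult>
  <Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>
  <IdList><Id>111111</Id><Id>222222</Id></IdList>
</eSearchResult>"""

count, uids = parse_esearch(reply)
print(count, uids)
```

The count can be larger than the number of returned UIDs, because a single reply only carries one page of results.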

Page 59:

ESearch

● text search
● eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
● responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query

Page 60:

ESummary

● document summary downloads
● eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi
● responds to a list of UIDs from a given database with the corresponding document summaries

Page 61:

EGQuery

● global query
● eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi
● responds to a text query with the number of records matching the query in each Entrez database

Page 62:

EInfo

● database statistics
● eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
● provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases
● without &db: lists all available databases

Page 63:

EFetch

● data record downloads
● eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi
● responds to a list of UIDs in a given database with the corresponding data records in a specified format


ELink

●  Entrez links
●  eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi
●  responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database


ELink

●  checks for the existence of a specified link from a list of one or more UIDs
●  creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs


EPost

●  UID uploads
●  eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
●  accepts a list of UIDs from a given database, stores the set on the History Server, and responds with a query key and web environment for the uploaded dataset


ESpell

●  spelling suggestions
●  eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi
●  retrieves spelling suggestions for a text query in a given database


ECitMatch

●  batch citation searching in PubMed
●  eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi
●  retrieves PubMed IDs (PMIDs) corresponding to a set of input citation strings


Identifiers

●  records are identified by an integer ID called a UID
●  UIDs are database-specific, e.g. GI numbers, PMIDs, MMDB-IDs
●  UIDs serve as both input and output
●  especially useful in combination with the History server
●  a full description of parameters and syntax can be found at: http://www.ncbi.nlm.nih.gov/books/NBK25499/


Selected UIDs

Entrez Database      UID common name   E-utility Database Name
Books                Book ID           books
Conserved Domains    PSSM-ID           cdd
dbVar                dbVar ID          dbvar
EST                  GI number         nucest
Gene                 Gene ID           gene
Genome               Genome ID         genome
MeSH                 MeSH ID           mesh
NCBI Web Site        Web Site ID       ncbisearch
Nucleotide           GI number         nuccore
PubMed               PMID              pubmed
...                  ...               ...


Entrez Core Engine

●  EGQuery, ESearch, and ESummary
●  two tasks:
-  assemble a list of UIDs that match a text query (ESearch)
-  retrieve a brief summary record called a Document Summary (DocSum) for each UID (ESummary)
●  EGQuery: global version of ESearch
●  esearch.fcgi?db=database&term=query
●  esummary.fcgi?db=database&id=uid1,uid2,uid3,...
●  can be expanded into more complicated Entrez queries
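The two tasks above can be sketched in Python as follows. The XML string is a hand-written stand-in for an ESearch response (the UIDs `111` and `222` are placeholders, not real records), and the helper names are hypothetical; only the `IdList`/`Id` layout and the URL syntax are taken from how ESearch and ESummary are described here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # https is assumed

# Hypothetical ESearch response; the UIDs are placeholders.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""

def uids_from_esearch(xml_text):
    """Task 1: collect the UIDs listed in an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]

def esummary_url(db, uids):
    """Task 2: build the ESummary URL that requests one DocSum per UID."""
    # safe="," keeps the comma-separated UID list readable in the URL.
    query = urlencode({"db": db, "id": ",".join(uids)}, safe=",")
    return EUTILS_BASE + "esummary.fcgi?" + query

uids = uids_from_esearch(SAMPLE_ESEARCH_XML)
summary_url = esummary_url("pubmed", uids)
print(uids)
print(summary_url)
```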


Entrez Databases (EInfo, EFetch, and ELink)

●  EInfo:
-  provides detailed information about each database
-  including lists of the indexing fields in the database
-  available links to other Entrez databases


Entrez Databases (EInfo, EFetch, and ELink)

●  added value to the raw data:
-  supports a variety of display formats (EFetch): UID lists in XML and plain text (&retmode) for all databases; other formats (&rettype) are database-specific
-  http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly
-  efetch.fcgi?db=database&id=uid1,uid2,uid3&rettype=report_type&retmode=data_mode
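A minimal sketch of the EFetch URL pattern above, in Python. The function name `efetch_url` is hypothetical, and the combination `rettype=fasta` with `retmode=text` for the `protein` database is an assumption for illustration; valid combinations should be checked against the table linked above. The two UIDs are the ones used in the ELink example in these slides.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # https is assumed

def efetch_url(db, uids, rettype, retmode):
    """Build an EFetch URL for a list of UIDs in a chosen output format."""
    params = {
        "db": db,
        "id": ",".join(str(uid) for uid in uids),
        "rettype": rettype,   # report type, database-specific
        "retmode": retmode,   # data mode, e.g. xml or text
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params, safe=",")

fasta_url = efetch_url("protein", [15718680, 157427902], "fasta", "text")
print(fasta_url)
```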


Entrez Databases (EInfo, EFetch, and ELink)

●  added value to the raw data:
-  links to records in other Entrez databases, manifested as lists of associated UIDs
-  UIDs must be valid in the source database (&dbfrom)
-  elink.fcgi?dbfrom=protein&db=gene&id=15718680,157427902
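The ELink call above can be sketched in Python like this. The response string is hand-written for illustration: its element names follow the layout ELink XML output is generally understood to have, but both that structure and the linked UIDs (`1001`, `1002`) are assumptions and should be verified against a real response. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # https is assumed

def elink_url(dbfrom, db, uids):
    """Build an ELink URL from a source database to a target database."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(str(u) for u in uids)}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params, safe=",")

# Hypothetical response; structure and linked UIDs are illustrative only.
SAMPLE_ELINK_XML = """<eLinkResult>
  <LinkSet>
    <DbFrom>protein</DbFrom>
    <LinkSetDb>
      <DbTo>gene</DbTo>
      <Link><Id>1001</Id></Link>
      <Link><Id>1002</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>"""

def linked_uids(xml_text):
    """Extract the UIDs of the linked records in the target database."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./LinkSet/LinkSetDb/Link/Id")]

link_url = elink_url("protein", "gene", [15718680, 157427902])
gene_ids = linked_uids(SAMPLE_ELINK_XML)
print(link_url)
print(gene_ids)
```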


Entrez History Server

●  simple: in the GUI, accessible via the respective tabs
●  you can temporarily store sets of UIDs as input for later queries through other tools
●  each list of UIDs is specified by:
-  &query_key (integer label)
-  &WebEnv (cookie string)


Creation of a stored UID list

●  EPost:
-  EPost can be used to upload a UID list
-  returns &query_key and &WebEnv
●  ESearch:
-  stores the results if given &usehistory=y
●  ELink:
-  stores the results if given &cmd=neighbor_history
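The three ways of creating a stored list can be written out as request URLs. This Python sketch uses a hypothetical generic helper, `eutil_url`; the databases, query term and UIDs are example values, and for long UID lists an HTTP POST would likely be preferable to the GET-style URL shown for EPost.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # https is assumed

def eutil_url(tool, **params):
    """Build a URL for any of the .fcgi E-utilities from keyword parameters."""
    return EUTILS_BASE + tool + ".fcgi?" + urlencode(params, safe=",")

# EPost: upload a UID list to the History server.
epost = eutil_url("epost", db="protein", id="15718680,157427902")

# ESearch: store the search result instead of only returning UIDs.
esearch = eutil_url("esearch", db="pubmed", term="breast cancer", usehistory="y")

# ELink: store the linked UIDs on the History server.
elink = eutil_url("elink", dbfrom="protein", db="gene",
                  id="15718680,157427902", cmd="neighbor_history")

for url in (epost, esearch, elink):
    print(url)
```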


Usage of stored UID lists

●  use of stored lists: esummary.fcgi?db=database&WebEnv=webenv&query_key=key
●  one web environment can hold multiple result lists
●  lists in the same web environment can be combined with AND, OR, NOT
●  by default every call creates a new environment
●  -> give &WebEnv in subsequent calls to store the lists in the same web environment
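A short Python sketch of using and combining stored lists. It assumes the `#<query_key>` notation for referring to stored lists inside `&term`, which should be checked against the parameter reference; `DEMO_ENV` is a placeholder for a real WebEnv string, and the function names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # https is assumed

def esummary_from_history(db, webenv, query_key):
    """Request DocSums for a list stored on the History server."""
    params = {"db": db, "WebEnv": webenv, "query_key": query_key}
    return EUTILS_BASE + "esummary.fcgi?" + urlencode(params)

def combine_stored_lists(db, webenv, expression):
    """Combine stored lists, e.g. '#1 AND #2', within one web environment.

    Assumes that '#<query_key>' inside the term refers to a stored list.
    """
    params = {"db": db, "term": expression, "WebEnv": webenv, "usehistory": "y"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

summary = esummary_from_history("pubmed", "DEMO_ENV", 1)
combined = combine_stored_lists("pubmed", "DEMO_ENV", "#1 AND #2")
print(summary)
print(combined)
```

Note that `#` must be percent-encoded as `%23` in the URL; `urlencode` takes care of this.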


Sketching Pipelines

●  get DocSummaries or entries for keywords or IDs:
-  ESearch -> ESummary/EFetch
-  EPost -> ESummary/EFetch
●  filter/limit a record set:
-  EPost/ELink -> ESearch
●  more advanced queries:
-  ESearch -> ELink -> ESummary/EFetch
-  EPost -> ELink -> ESearch -> EFetch
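The first pipeline, ESearch -> EFetch, might look like the following in Python. This is a sketch under assumptions: the HTTP client is passed in as a callable (`http_get`, a hypothetical name) so the flow can be tried without network access, the canned responses in `fake_http_get` are made up, and `rettype=abstract` with `retmode=text` is assumed to be a valid choice for `pubmed`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # https is assumed

def search_then_fetch(db, term, rettype, retmode, http_get):
    """ESearch -> EFetch, passing the result set via the History server.

    http_get is any callable that maps a URL to the response text.
    """
    search_url = EUTILS_BASE + "esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "usehistory": "y"})
    root = ET.fromstring(http_get(search_url))
    fetch_url = EUTILS_BASE + "efetch.fcgi?" + urlencode({
        "db": db,
        "WebEnv": root.findtext("WebEnv"),
        "query_key": root.findtext("QueryKey"),
        "rettype": rettype,
        "retmode": retmode,
    })
    return http_get(fetch_url)

def fake_http_get(url):
    """Stand-in for a real HTTP client; returns canned, made-up responses."""
    if "esearch.fcgi" in url:
        return ("<eSearchResult><QueryKey>1</QueryKey>"
                "<WebEnv>DEMO_ENV</WebEnv></eSearchResult>")
    return "fetched via " + url.split("?", 1)[1]

result = search_then_fetch("pubmed", "breast cancer", "abstract", "text",
                           fake_http_get)
print(result)
```

For real requests, `http_get` could be something like `lambda u: urllib.request.urlopen(u).read().decode()`; the other pipelines follow the same pattern with ELink or EPost as additional steps.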


●  storing results:
-  esearch.fcgi?db=<database>&term=<query>&usehistory=y
-  input: any Entrez text query (&term); Entrez database (&db); &usehistory=y
-  output: web environment (&WebEnv) and query key (&query_key) parameters specifying the location on the Entrez history server of the list of UIDs matching the Entrez query
-  example: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science%5bjournal%5d+AND+breast+cancer+AND+2008%5bpdat%5d&usehistory=y
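The example above can be reproduced in Python, together with reading the output parameters from the response. The response string below is a hand-written placeholder, not real server output, and `history_location` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The query from the example, written unescaped; urlencode handles the escaping.
term = "science[journal] AND breast cancer AND 2008[pdat]"
example_url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
               + urlencode({"db": "pubmed", "term": term, "usehistory": "y"}))

# Hypothetical response; the WebEnv value is a placeholder.
SAMPLE_HISTORY_XML = """<eSearchResult>
  <QueryKey>1</QueryKey>
  <WebEnv>PLACEHOLDER_WEBENV</WebEnv>
</eSearchResult>"""

def history_location(xml_text):
    """Return (WebEnv, query_key) locating the stored UID list."""
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), int(root.findtext("QueryKey"))

webenv, query_key = history_location(SAMPLE_HISTORY_XML)
print(example_url)
print(webenv, query_key)
```

`urlencode` emits uppercase hex escapes (`%5B`, `%5D`), which are equivalent to the lowercase `%5b`, `%5d` shown in the example.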


●  Associating Search Results with Existing Search Results:
-  esearch.fcgi?db=<database>&term=<query1>&usehistory=y
-  esearch.fcgi?db=<database>&term=<query2>&usehistory=y&WebEnv=$web1
-  Input: any Entrez text query (&term); Entrez database (&db); &usehistory=y; existing web environment (&WebEnv) from a prior E-utility call
-  Output: web environment (&WebEnv) and query key (&query_key) parameters specifying the location on the Entrez history server of the list of UIDs matching the Entrez query
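One way to keep several searches in the same web environment is to remember the WebEnv from the first call and attach it to later ones. The class below is a hypothetical Python sketch of that bookkeeping; `WEB1_PLACEHOLDER` stands in for the value that would be read from the first response, and the query terms are arbitrary examples.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # https is assumed

class HistorySession:
    """Remembers a WebEnv so that later searches are stored alongside earlier ones."""

    def __init__(self):
        self.webenv = None  # unknown until the first response has been read

    def esearch_url(self, db, term):
        params = {"db": db, "term": term, "usehistory": "y"}
        if self.webenv is not None:
            # Reuse the existing environment instead of creating a new one.
            params["WebEnv"] = self.webenv
        return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

session = HistorySession()
first = session.esearch_url("pubmed", "asthma")
session.webenv = "WEB1_PLACEHOLDER"  # would come from the first response
second = session.esearch_url("pubmed", "breast cancer")
print(first)
print(second)
```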


E-utility Webinar

●  https://www.youtube.com/watch?v=iCFVVexp30o